1. Introduction
We are analysts with access to two historical data sets. The first describes the relationship
between customer characteristics and whether those customers are likely to default on their
credit. The second describes the economy. In the credit data set, we explore the variables needed
to estimate the risk that a customer will default on their credit. The variables used determine
what type of regression model is appropriate for a specific scenario. Our goal in this case study
is to develop a regression model. The economic data set can help predict when a recession will
occur and help us prepare for it, which is essential for government agencies and for financial
analysts who must plan for a crisis. The credit data set lets us analyze financial risk, for
example by showing whether a customer is likely to default. This kind of analysis helps us prepare
for different scenarios, such as identifying what affects a given area. Data sets are essential
for developing regression models, and regression models are widely used in case studies like this
one. Now that data is collected almost everywhere, checking the validity of a model is more
critical than ever.
2. Data Preparation
The first data set comes from credit_card_default.csv. It contains variables related to the risk
that a customer will default on their credit. The data set consists of 8 columns and around 601
rows. Each column is a variable, and each row holds one set of historical values for those
variables. The variables are age, sex, education, marriage, assets, missed_payment,
credit_utilize, and default. These variables can be important to a data analyst developing a
regression model from the data. We want to find the regression model that best fits this data set
in order to see whether customers default on their credit.
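As a quick check of the structure described above, the following is a minimal sketch in Python that counts the rows of a CSV file and confirms that the expected column names are present. The function name summarize_csv is hypothetical, the assumption that the file has a header row with exactly these names is not confirmed by the data description, and the two sample rows are made-up values used only so the sketch runs without the real file.

```python
import csv
import io

# Column names as listed in the text; a header row with these names is an assumption.
EXPECTED_COLUMNS = [
    "age", "sex", "education", "marriage",
    "assets", "missed_payment", "credit_utilize", "default",
]


def summarize_csv(handle, expected_columns):
    """Return (row_count, column_names, missing_columns) for an open CSV file."""
    reader = csv.DictReader(handle)
    columns = list(reader.fieldnames or [])   # reading fieldnames consumes the header row
    row_count = sum(1 for _ in reader)        # counts data rows only, not the header
    missing = [name for name in expected_columns if name not in columns]
    return row_count, columns, missing


# In-memory stand-in with made-up values (NOT taken from the real data set).
sample = io.StringIO(
    "age,sex,education,marriage,assets,missed_payment,credit_utilize,default\n"
    "35,1,2,1,3,0,0.25,0\n"
    "42,2,1,2,1,1,0.60,1\n"
)
rows, columns, missing = summarize_csv(sample, EXPECTED_COLUMNS)
print(rows, len(columns), missing)
```

To run the same check on the real file, open it with open("credit_card_default.csv", newline="") and pass the handle to summarize_csv. If the assumptions hold, the result should show 8 columns, a row count close to 601, and no missing names.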
The second data set comes from economic.csv. It contains variables related to wage growth. The
data set consists of 6 columns and around 49 rows.
Each column is a variable, and the rows are the different values of historical data