Introduction
Nottingham Credit Union (NCU) has been providing financial products, such as loans to the whole of Nottinghamshire area, since 1992. Prior to making decisions regards lending, the credit union usually carries out credit score checks on the applicant. In the past, NCU has used Experian, which is a world leading global information services company, to obtain the data on the credit history. More recently, it has changed to CallCredit- this is a much smaller credit reference agency, which provides a free credit report .
Experian uses two different type of credit scores, namely ALD and Delphi, to report the credit history of a member. CallCredit using the Gauge score.
Aim of this Project
NCU have more than 4 years’ experience of using ALD/Delphi scores and has approximately one year worth of data on the Gauge score. However, to date, there is no evidence of how well Gauge performs.
The main purpose of this project is to identify aspects of the Gauge scores that will enable the NCU to use them more effectively.
This will be achieved by:
- Assessing how the applicant characteristics differ between the types of scores;
- Assessing the NCU’s loan decisions and how this compares between the different scores;
- Analyzing the correlation between the scores and what factors Gauge score gives greater or less weight to;
- Examining how the Gauge score predict bad debt and is it as good as or better than ALD/Delphi scores.
Materials
Nottingham Credit Union have provided 6 separate EXCEL files which contain information on applicants who may have applied for a loan ranging from the dates 15/05/1992 to 09/10/2018.
These files are not exclusive in the data they detail- there are duplications across the files and also applicants apply for loans more than once so a file can detail several rows of data for a given applicant. In brief, the data files are:
Data file Description Number of observations Date of the observation
- Member All membership details 28,764 15/05/1992 to 09/10/2018
- Gauge Contains Gauge score and other information 94 20/02/2918 and 29/06/2018
- Loan tracker 714 28/08/2018 to 28/12/2018
- Mansfield loan tracker- taskstack Mansfield 514 20/02/2018 to 28/12/2018
- Experian Contain ALD/Delphi scores and other information 643 03/01/2018 to 29/06/2018
- Mansfield Loan tracker
- Haywood 2018 180 04/01/2018 to 07/06/2018
- Computer packages used for the data analysis
The data files that have been provided are all in EXCEL 2016. I will use commands in EXCEL to validate and clean the data. The analysis will be carried out EXCEL and R software.
Merging the data files
- The variable names in the three files that contain the Gauge score information, will be re-named. All the variables in this file will be prefixed by G_, to indicate that this is the Gauge score information.
- A similar exercise will be done for the two files containing information on the ALD/Delphi score. The prefix A_ will be used for all the variables.
- For the member data file, the variables will be prefixed by M_.
- The membership identifier will be the only variable that will have the same name across the data files. This is because it will be easier to merge the files on the membership identifier if required.
Data Cleaning and validation
I will use the procedures adopted to clean and validate the data. Each variable in the dataset will be renamed so that it is identifiable with that particular dataset. Each variable will be sorted from the smallest to the largest to identify outliers in the data and checked the accuracy in the data. This will also highlight error typos, duplicates in the data and missing data. Plots and ranges (minimum and maximum values) of the values for each variable will be checked to see if the values are within the acceptable range. Where possible the values will be corrected and duplicates removed. All the data type will also be checked, e.g. date variables are formatted as date type.
Deriving Variables from the datasets
There are some important variables that can be derived from the data such as age (using date of birth and the application date) of the applicant and the area (derived from the postcode) the applicant lives in. These will be used in the analysis of the data.
Research Methods
In line with the questions stated above, the research methods I will use to answer my questions will be:
- Literature review: I will search the literature to assess the use and performance of these credits scores in relation to each other and when used to assess debt.
- Statistical Methods: As some applicants apply more than once for a loan, the time-points will be treated as fixed (1st and last application) – there are few applicants with more than two applications.
In line with the aims detailed above, my analysis will be at each time point, and will be as follows:
- Characteristics of the applicants: will be summarised using summaries such as means, standard deviations and frequencies for each score.
- Spearman’s correlation coefficient (correlation matrix) and scatter diagrams -I will obtain a correlation between the scores. I will also detail the other data on the loans using summary statistics.
- Linear regression models: I will fit a linear regression model, with the scores as dependent variables and looking to see which predictors are important. I will assess the fit of the model and examine the residuals.
- Logistic regression models – To assess which factors drive the decisions made by the union to accept/decline, I will fit a logistic model with decline/accept as the dependent and assess which are strong predictors which contributed to someone being accepted for a loan. Again I will detail the fit of the models and examine the residuals.
- Frequency tables – Using the banding of the two type of scores (very poor to excellent) detailed in, I will display the data in a frequency tables again accept/decline and very-poor to excellent bad debt.
Risk Assessment
The main purpose of my project is to help the NCU with information on how well the Gauge score performs regards loan application. There is some limitation to how much I can do this, as there are only 6 variables in the Gauge data files that I can use and there are 30 in the ALD/Delphi files.