BULETINUL
Universităţii Petrol – Gaze din Ploieşti Vol. LXII
No. 1/2010
88 - 96 Seria
Matematică - Informatică - Fizică
Using Principal Component Analysis in Loan Granting
Irina Ioniţă, Daniela Şchiopu
Petroleum - Gas University of Ploiesti, Informatics Department, Ploieşti, Romania
e-mail: , daniela_
Abstract
This paper describes the utility of Principal Component Analysis (PCA) in the banking domain, more exactly in the consumer lending problem. PCA is a powerful tool for analyzing data of high dimension. When an applicant requests a loan for personal needs, a credit officer collects data from the applicant and computes a score. The factors analyzed can be significant as well as insignificant. Principal component analysis can help in this case to extract those factors which produce a better credit scoring model. The data set used for the analysis is provided by a public database containing credit data from a German bank. The results emphasize the utility of PCA in the banking sector to reduce the dimension of data, without much loss of information.
Keywords: principal component analysis, consumer lending, credit scoring model, banking domain
Introduction
In the banking domain, knowing which decisions are best is a permanent concern for managers. An active banking area with higher risk is represented by the credit department. Here, credit officers analyze the customers' credit application forms and calculate a score. The factors considered can influence the credit scoring model to a greater or lesser extent. Identifying those factors with higher significance is not a simple task.
Principal Component Analysis (PCA) represents a powerful tool for analyzing data by reducing the number of dimensions without important loss of information, and it has been applied to data sets in all scientific domains [16, 17]. PCA is also known as an unsupervised dimensionality reduction technique which transforms the data linearly and projects the original data onto a new set of parameters called factors (further on, we will use the term “factor” with the meaning of “principal component”), while retaining as much as possible of the variation present in the data set.
In this paper we discuss the use of PCA in the credit approval problem, considering a set of records provided by a German bank [24]. The results indicate the utility of eliminating variables with a minimum influence on the credit scoring model in order to make a better decision for consumer loan granting. The instrument used to apply the PCA technique was SPSS [25]. The paper contains a section with theoretical background on PCA, a section regarding the application of PCA in the banking domain and a final section presenting a case study.
Principal Component Analysis
PCA is considered the oldest technique in multivariate analysis. It was first introduced by Pearson in 1901 and underwent several modifications until it was generalized by Loève in 1963 [21].
PCA is a method that reduces the dimensionality of a data set by finding a new set of variables, smaller than the original set of variables [15]. This efficient reduction of the number of variables is achieved by obtaining orthogonal linear combinations of the original variables, the so-called Principal Components (PCs) [12]. PCA is useful for compressing data and for finding patterns in high-dimensional data.
PCA and Factor Analysis (FA) are both methods for data reduction. FA analyzes only the variance shared among the variables, while PCA analyzes all of the variance. Concepts such as eigenvalues, eigenvectors, loadings and scores are characteristic of these statistical methods. The main steps of the PCA algorithm are shown in Figure 1, adapted from [18].
Fig. 1. Principal Components Analysis steps
The mathematical equations for PCA are presented below.
We consider a set of n observations on a vector of p variables organized in a matrix X (n x p):
$$\{x_1, x_2, \ldots, x_n\} \in \mathbb{R}^p. \tag{1}$$
The PCA method finds p artificial variables (principal components). Each principal component is a “linear combination of X matrix columns, in which the weights are elements of an eigenvector to the data covariance matrix or to the correlation matrix, provided the data are centered and standardized” [7]. The principal components are uncorrelated.
The first principal component of the set, obtained by the linear transformation, is:
$$z_1 = a_1^T x_j = \sum_{i=1}^{p} a_{i1} x_{ij}, \quad j = 1, \ldots, n. \tag{2}$$
In equation (2), the vectors $a_1$ and $x_j$ are:
$$a_1 = (a_{11}, a_{21}, \ldots, a_{p1}), \tag{3}$$
$$x_j = (x_{1j}, x_{2j}, \ldots, x_{pj}). \tag{4}$$
One chooses $a_1$ such that the variance of $z_1$ is maximum. All principal components start at the origin of the coordinate axes. The first PC is the direction of maximum variance from the origin, while subsequent PCs are orthogonal to the first PC and describe the maximum residual variance.
For example, when we work with two dimensions, we have the situation depicted in Figure 2, adapted from [23].
Fig. 2. Principal Components
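The definition of the first principal component in equations (2) to (4) can be illustrated with a minimal sketch. It assumes NumPy rather than the SPSS package used in this paper, works on synthetic data, and uses a hypothetical function name; the weight vector is taken as the unit-length eigenvector of the correlation matrix with the largest eigenvalue, which is the usual normalization under which the variance of z_1 is maximized.

```python
import numpy as np

def first_principal_component(X):
    """Return (a1, z1, largest eigenvalue) for an n x p data matrix X."""
    # Center and standardize each column of X.
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    # Correlation matrix of the original variables (p x p).
    R = np.corrcoef(X, rowvar=False)
    # eigh is meant for symmetric matrices; eigenvalues come back ascending.
    eigvals, eigvecs = np.linalg.eigh(R)
    a1 = eigvecs[:, -1]   # unit-length weight vector a_1
    z1 = Z @ a1           # scores z_1 = a_1^T x_j for j = 1..n
    return a1, z1, eigvals[-1]

# Synthetic data, for illustration only: two correlated columns and one
# independent column.
rng = np.random.default_rng(0)
base = rng.normal(size=(200, 1))
X = np.hstack([base + 0.3 * rng.normal(size=(200, 1)),
               base + 0.3 * rng.normal(size=(200, 1)),
               rng.normal(size=(200, 1))])
a1, z1, lam1 = first_principal_component(X)
print(np.var(z1, ddof=1), lam1)   # the two numbers agree
```

In this sketch the sample variance of the scores equals the largest eigenvalue, which is the sense in which the first PC captures the maximum variance.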
In the next section, we survey various applications of the PCA method and consider the use of this data reduction method in the banking sector.
PCA and Banking Domain
PCA is applied in various domains such as medicine [11], face detection and recognition [3], signal processing [5], banking [1] etc.
As discussed earlier in this paper, PCA is an effective transformation method for reducing a large number of correlated variables in situations in which variable selection is hard to achieve. The result of PCA is a set of new uncorrelated variables that can be used directly by credit scoring techniques.
Continuous changes in the banking world require remodeling strategies, adapting to new financial trends and managing knowledge. A bank wants to keep its balance on the market and to obtain benefits with minimum costs. Managers have to find better solutions for making decisions, in order to increase their credibility and to place their institution at the top.
The basic objectives of bank management have focused on the need to balance liquidity, assets, credit and interest rate risks in order to minimize the risk of bankruptcy. Analyzing the factors that may affect these risks is an important job for managers. An example of factor analysis (similar to PCA) applied in the banking domain to identify risk exposure is presented in [14]. The results of this analysis indicated that liquidity and interest, domestic market, international market, business operation and credit are the factors affecting banks’ risk exposure. Managers have to consider these factors when formulating the risk management strategy, in order to avoid any situation of bankruptcy.
The credit department faces various problems in the loan granting process. When a customer applies for a loan, the credit officer requests some financial and nonfinancial data and calculates a score. If the customer obtains a good score, the file with the necessary documents is analyzed and sent to the Central Bank. Other verification procedures are then applied (for example, an analysis of the customer's credit history by the Credit Bureau). An affirmative or negative response is sent back to the credit officer and the customer is informed. In the favorable case, after the contract is signed, the bank credits the customer's account with the value of the loan granted. The credit score system used today was designed to provide lenders with financial profiles of consumers who wish to borrow money. The lenders' biggest concern was whether or not an individual had the ability to repay a loan on established terms, and what percentage of risk might be involved [6, 8, 10]. The credit score is calculated by a mathematical equation that evaluates many types of information found in a consumer’s credit file (duration of loan, loan amount, number of years of service, number of years of residence, marital status, education level etc.) [2, 4, 9]. By comparing this information to the repayment patterns stored in hundreds of thousands of consumers’ past credit reports, the score identifies the lender’s level of future credit risk.
For example, the FICO credit score model takes into consideration five factors to create a model for credit scoring [19]:
o Payment history (35% significance);
o Outstanding credit balances (30% significance);
o Credit history (15% significance);
o Type of credit (10% significance);
o Inquiries (10% significance).
A FICO score can range between 300 and 850 and is a measure of client creditworthiness. Most credit bureau scores used in the U.S. are produced by Fair Isaac and Company, or FICO. FICO scores are provided to lenders by the three major credit reporting agencies: Equifax, Experian, and TransUnion [20]. The FICO score is considered an efficient predictive scoring model designed to evaluate customer credit risk [19]. This credit scoring model has been widely developed and is used by many credit bureaus around the world, in regions such as [19]: Asia/Pacific Rim, including South Korea, Singapore, India, Taiwan and Thailand; Europe/Middle East, including Ireland, Poland, Sweden, Saudi Arabia and Turkey; and Latin America, including Brazil, Mexico, Peru and Panama. The FICO score has been present in Romania since January 2009.
In order to develop a credit scoring system to assist credit officers in their decision process, we formulate the hypothesis of using PCA to identify those variables with minimum effect on the credit score computation and to eliminate them from the scoring model. The next sections present our case study. The software package used in this case is SPSS [25].
Case Study
The data set used in our case study was obtained from a public database that contains credit data of a German bank [22]. We have chosen 500 records from this database and organized them in a table in SPSS [25]. In Table 1 we present the first five rows. The table contains information about duration of the credit, credit history, purpose of the loan, credit amount, savings account, years employed, payment rate, personal status, residency, property, age, housing, number of credits at bank, job, dependents and credit approval (target variable). The first 15 variables will be further on denoted with V1 to V15 and the target variable with VT.
Table 1. Credit Data
No.  V1  V2  V3    V4  V5  V6  V7  V8  V9  V10  V11  V12  V13  V14  V15  VT
1     6   2   0  1169   0   4   4   1   4    2   67    1    2    1    1   1
2    48   0   0  5951   1   2   2   0   2    2   22    1    1    1    1   0
3    12   2   1  2096   1   3   2   1   3    2   49    1    1    0    2   1
4    42   0   2  7882   1   3   2   1   4    3   45    0    1    1    2   1
5    24   1   3  4870   1   2   3   1   4    0   53    0    2    1    2   0

The criteria we take into account for the number of the retained factors are: the cumulated percent in variation explained by the retained factors should be higher than 50% and the variance of each retained factor should be higher than 1.
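The two retention criteria can be expressed as a short check on a list of eigenvalues. The sketch below is only an illustration in NumPy (the paper itself relies on SPSS); the function name, its parameters and the example eigenvalues are hypothetical and are not taken from the case study.

```python
import numpy as np

def n_factors_to_retain(eigenvalues, min_eigenvalue=1.0, min_cumulative=0.50):
    """Count the factors whose variance (eigenvalue) exceeds min_eigenvalue
    and report whether their cumulative share exceeds min_cumulative."""
    lam = np.sort(np.asarray(eigenvalues, dtype=float))[::-1]  # descending
    share = lam / lam.sum()               # proportion of variance per factor
    k = int(np.sum(lam > min_eigenvalue))
    cumulative = float(share[:k].sum())
    return k, cumulative, cumulative > min_cumulative

# Hypothetical eigenvalues for five standardized variables (they sum to 5).
k, cum, ok = n_factors_to_retain([2.1, 1.3, 0.8, 0.5, 0.3])
print(k, round(cum, 2), ok)   # 2 0.68 True
```

Here two factors have a variance above 1 and together explain 68% of the variation, so both criteria would be met for this made-up example.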
First, PCA summarizes the pattern of intercorrelations between variables. The variables that are highly correlated with one another are grouped together into factors. The correlation matrix for the first 13 variables is shown in Table 2.
A value of 0 for the correlation coefficient indicates the absence of a statistical linkage between variables. We note that only the V5 variable is not well correlated with the others.
Table 2. The Correlation Matrix
         V1      V2      V3      V4      V5      V6      V7      V8      V9     V10     V11     V12     V13
V1        1
V2    0.103       1
V3    0.108   0.149       1
V4    0.611   0.115   0.201       1
V5   -0.058  -0.054   0.006  -0.084       1
V6    0.011   0.122  -0.019  -0.074   0.035       1
V7    0.057   0.087  -0.063  -0.288   0.007   0.136       1
V8   -0.053  -0.049   0.060  -0.025   0.075   0.091   0.033       1
V9    0.052   0.104   0.121   0.024   0.011   0.220   0.030  -0.112       1
V10  -0.188  -0.096  -0.069  -0.247  -0.049  -0.094  -0.024   0.002  -0.103       1
V11  -0.036   0.198   0.104   0.021   0.003   0.270   0.124   0.038   0.327  -0.130       1
V12  -0.141  -0.091  -0.067  -0.136   0.058  -0.095  -0.079  -0.036  -0.046   0.318  -0.342       1
V13  -0.031   0.452   0.131   0.016   0.015   0.099   0.043  -0.046   0.064  -0.022   0.149  -0.090       1
KMO (Kaiser-Meyer-Olkin) is a statistical measure that indicates the degree of association of the variables [16]. In our case, the KMO value is 0.555. This value is an argument in favor of the existence of factors, suggesting that factoring is appropriate, since values above 0.5 are commonly taken as the minimum acceptable level.
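For readers who want to see how such a value arises, the overall KMO measure is usually defined as the sum of squared off-diagonal correlations divided by that sum plus the sum of squared off-diagonal partial correlations. The sketch below follows this usual definition in NumPy; it is not the SPSS routine used in the paper, the function name is hypothetical, the example matrix is made up, and the correlation matrix is assumed to be invertible.

```python
import numpy as np

def kmo(R):
    """Overall Kaiser-Meyer-Olkin measure for a correlation matrix R."""
    R = np.asarray(R, dtype=float)
    S = np.linalg.inv(R)                    # requires a non-singular R
    d = np.sqrt(np.diag(S))
    partial = -S / np.outer(d, d)           # pairwise partial correlations
    off = ~np.eye(R.shape[0], dtype=bool)   # mask of off-diagonal entries
    r2 = np.sum(R[off] ** 2)                # squared correlations
    q2 = np.sum(partial[off] ** 2)          # squared partial correlations
    return r2 / (r2 + q2)

# Made-up example: three variables, all pairwise correlations equal to 0.5.
R_example = np.array([[1.0, 0.5, 0.5],
                      [0.5, 1.0, 0.5],
                      [0.5, 0.5, 1.0]])
kmo_value = kmo(R_example)
print(round(kmo_value, 4))   # 0.6923
```

The measure approaches 1 when the partial correlations are small relative to the ordinary correlations, which is the situation in which common factors are plausible.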
The communality of a given variable can be interpreted as the proportion of variation in that variable that is explained by the extracted factors. A factor loading represents the correlation between a variable and a factor that has been extracted from the data. The communality of the variable Vi (i = 1,…,15) is computed as the sum of the squared loadings for that variable. Low values of communality indicate that the analyzed variable is inadequately represented by the factorial model. Here, most of the variable communalities are between 0.5 and 0.9 (see Table 3).
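The computation of communalities from loadings can be sketched as follows. This is an illustrative NumPy version under the assumption that loadings are obtained as eigenvectors of the correlation matrix scaled by the square roots of their eigenvalues; the function name and the 3 x 3 matrix are hypothetical and unrelated to the case study data.

```python
import numpy as np

def communalities(R, k):
    """Communality of each variable when the k largest components are kept."""
    eigvals, eigvecs = np.linalg.eigh(np.asarray(R, dtype=float))
    order = np.argsort(eigvals)[::-1]               # largest eigenvalue first
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    # Loading = correlation between a variable and a component.
    loadings = eigvecs[:, :k] * np.sqrt(eigvals[:k])
    # Sum of squared loadings across the retained components, per variable.
    return np.sum(loadings ** 2, axis=1)

# Hypothetical correlation matrix, for illustration only.
R_example = np.array([[1.0, 0.6, 0.1],
                      [0.6, 1.0, 0.2],
                      [0.1, 0.2, 1.0]])
h2_all = communalities(R_example, 3)   # every component kept
h2_two = communalities(R_example, 2)   # one component dropped
print(h2_all, h2_two)
```

When all components are kept each communality equals 1; dropping components can only lower it, so a low value signals a variable that the retained factors represent poorly.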
Using the PCA method, the fifteen variables are reduced to seven factors, as shown in Table 4. The seven factors can be used further as predictors.
The variance in the correlation matrix is reassembled into 15 eigenvalues. Each eigenvalue represents the amount of variance that has been captured by one component.
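As a short worked relation, assuming the analysis is carried out on the correlation matrix R of the 15 standardized variables (each with unit variance), the eigenvalues $\lambda_k$ add up to the trace of R:

$$\sum_{k=1}^{15} \lambda_k = \operatorname{tr}(R) = 15, \qquad \text{share of variance captured by component } k = \frac{\lambda_k}{15}.$$

A component with $\lambda_k > 1$ therefore captures more than $1/15 \approx 6.7\%$ of the total variance, that is, more than a single standardized variable contributes on its own.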
In general, once the eigenvectors are found from the covariance matrix, the next step is to order them by eigenvalue, from highest to lowest. This gives the components in order of significance, and we can decide to ignore the components of lesser significance. This procedure implies losing some information, but if the eigenvalues are small, the loss of information is minimal.
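The ordering step and the resulting loss of information can be illustrated with a small sketch. It assumes NumPy and synthetic standardized data, and the function name is hypothetical; it is not the procedure run in SPSS for the case study.

```python
import numpy as np

def project_and_reconstruct(Z, k):
    """Project standardized data Z (n x p) on its k leading components and
    return the reconstruction plus the fraction of variance that was dropped."""
    C = np.cov(Z, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(C)
    order = np.argsort(eigvals)[::-1]        # highest eigenvalue first
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    W = eigvecs[:, :k]                       # p x k matrix of kept eigenvectors
    scores = Z @ W                           # n x k component scores
    Z_hat = scores @ W.T                     # back-projection to p dimensions
    lost = eigvals[k:].sum() / eigvals.sum()
    return Z_hat, lost

# Synthetic data, for illustration only: column 1 nearly duplicates column 0.
rng = np.random.default_rng(1)
X = rng.normal(size=(300, 4))
X[:, 1] = X[:, 0] + 0.1 * rng.normal(size=300)
Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
Z_hat, lost = project_and_reconstruct(Z, 3)
# Relative squared reconstruction error equals the dropped share of variance.
rel_err = np.sum((Z - Z_hat) ** 2) / np.sum(Z ** 2)
print(lost, rel_err)
```

Because one column is almost redundant, the smallest eigenvalue is close to zero and dropping its component loses very little of the total variance.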