Determinants of Borrowers’ Default in P2P Lending under Consideration of the Loan Risk Class

We study the determinants of borrowers’ default in P2P lending with a new data set consisting of 70,673 loan observations from the Lending Club. Previous research identified a number of default determining variables but did not distinguish between different loan risk levels. We define four loan risk classes and test the significance of the default determining variables within each loan risk class. Our findings suggest that the significance of most variables depends on the loan risk class. Only a few variables are consistently significant across all risk classes. The debt-to-income ratio, inquiries in the past six months and a loan intended for a small business are positively correlated with the default rate. Annual income and credit card as loan purpose are negatively correlated.


Introduction
Peer-to-peer (P2P) lending platforms are new financial intermediaries connecting borrowers and lenders. Thanks to technological innovations, they facilitate loans with low intermediation costs and thus pose a threat to traditional banks [1]. Unsurprisingly, the popularity of P2P lending is rising rapidly. For example, Lending Club, the biggest P2P lending platform in the world, almost doubled the amount of issued loans from USD 4.4 billion in 2014 to USD 8.4 billion in 2015. The remarkable growth of P2P lending is present in Europe [2] as well as in China [3].
The fundamental problem of lending is information asymmetry between borrowers and lenders: borrowers have more information about their creditworthiness than lenders have. P2P lending platforms try to decrease this information asymmetry. They apply credit scoring techniques and assign a risk grade to each loan that may serve as a signal for lenders. Indeed, existing research [4][5][6] finds a positive correlation between a loan's default and the assigned risk grade. They also find further determinants of the default rate, for instance, the debt-to-income ratio or revolving credit utilization.
We conjecture that the significance of these default determinants depends on the loan's risk grade. Thus, the goal of our study is to evaluate known determinants of borrowers' default for each risk grade separately. We test this with a new data set consisting of 70,673 loan observations from Lending Club. Loans in our data set have a 36-month duration and were issued between January 2009 and December 2012, thus avoiding a structural break in the data due to the financial crisis in 2007/2008.
We identify Annual Income, the Debt-to-Income ratio, Inquiries in Past 6 Months and the loan purposes Credit Card and Small Business as significant determinants of default in the full data set and also across all loan risk classes. The significance of other variables depends on the loan risk class. For example, Revolving Credit Utilization, which is significant in our full data set and in less risky loan classes, is not significant in loan classes with riskier loans. We conclude that whether loan and borrower characteristics can be used to predict a loan's default chances depends on the loan's risk class. We connect our findings to the literature on funding success of P2P loans in an effort to understand to what extent insights about default determinants are anticipated by lenders' choices when funding a loan. Generally, our results contribute to a better understanding of the mechanisms of P2P lending. Potential lenders, especially those investing in high-risk loans, can use our findings to their advantage and allocate their money more effectively.
The remainder of the paper is organized as follows. Related literature is reviewed in Section 2. We describe our data set and analytical approach in Section 3 and report our findings in Section 4. Section 5 concludes.

Related Literature
P2P lending platforms are currently experiencing exponential growth 1 with the USA being the biggest P2P lending market. According to [2], there was an average yearly growth of 113% in P2P consumer lending between 2012 and 2014 in Europe (excluding the UK). The amount of funded P2P consumer loans increased from EUR 62.5 million in 2012 to EUR 274.6 million in 2014. Furthermore, Ref. [3] add that P2P lending has been rapidly growing in China since its launch in 2007. According to [10], there were 1575 P2P lending platforms in 2014 with an estimated volume of funded loans between USD 20 and 40 billion by the end of 2015. These numbers would make China the second largest P2P lending market in the world.

Funding Success of P2P Loans
A number of studies explore what factors contribute to the funding success of P2P loans. Most are based on data from the platform Prosper which used to be the biggest P2P lending platform in the USA. Prosper had many social features, such as a discussion forum and detailed borrowers' characteristics including their photos. Studies, among others, by [11,12], stress the importance of social relationships for funding success. They find that borrowers with better social ties are more likely to get their loans funded and to receive a lower interest rate. However, social features were completely removed by Prosper in 2008. 2 Ref. [13] analyze data from the German platform Smava. Their results indicate that men and women are equally likely to get funded.
Several studies focus on herding behavior in P2P lending. Ref. [14] conclude that a 1% increment in the number of bids represents a 15% increase of the probability of an additional bid (until the loan is fully funded). They also control for borrower/loan characteristics and find that the debt-to-income ratio is negatively correlated with funding, while the credit grade is positively correlated with funding. They find no relationship between funding and home ownership or the requested loan amount. Ref. [15] find that lenders observe their peers' lending decisions and use this information to infer creditworthiness of borrowers. Among their control variables, the debt-to-income ratio is negatively correlated with funding, while the credit grade, home owner status and the amount requested are positively correlated with funding.

Determinants of Borrowers' Default
Investing at P2P lending platforms is a risky activity, because the offered loans are not secured. To decrease the information asymmetry between lender and borrower, borrowers are obliged to provide some personal information, such as annual income or the loan's purpose. For example, borrowers at Lending Club are required to provide detailed information about themselves and their credit history. P2P lending platforms use this information to assess the likelihood of a borrower's default and assign him or her an appropriate interest rate with a given grade class. 3 It is generally assumed that the better the grade, the more likely the borrower is to repay his or her debt.
1 Two reasons for the rapid emergence of P2P lending platforms are put forward in the literature: low intermediation costs [7,8] and credit rationing after the financial crisis in 2007/2008 [9].
2 The Securities and Exchange Commission (SEC) ordered P2P lending companies to register their loans as securities and provide them through a bank.
There are several studies, such as [12,17], examining borrowers' characteristics and their influence on borrowers' default based on data from Prosper. We do not review these studies in detail because of the differences between the platforms Prosper and Lending Club. 4 Instead, we focus on studies examining borrowers' default determinants based on Lending Club data: [4][5][6].
All three agree that the Credit Grade 5 assigned by Lending Club is the best predictor of borrowers' default. Moreover, Revolving Credit Line Utilization is another variable influencing the default rate that is mentioned in all three papers. Findings on other default determinants vary. The discrepancy between the findings of [4][5][6] might be caused by three different factors. The first factor is the selection of variables potentially having an impact on borrowers' default. For example, Ref. [4,5] found that the FICO score has an influence on default, whereas Ref. [6] did not include the FICO score as an independent variable in their study. The second factor potentially creating discrepancy between the findings is the data set used. Specifically, differences in time frames, classification of loan status or loan length might be the cause. For example, Ref. [4,6] used only 36-month loans, whereas Ref. [5] used both 36- and 60-month loans. The last factor which might cause the discrepancy is the research method used. Ref. [5] used dynamic logistic regression to assess determinants influencing the default rate in P2P lending. Ref. [6] conducted their study with a combination of univariate means tests and Cox regression. Ref. [4] chose binary logistic regression for their analysis. For better clarity, we summarize this information in Table 1.

Data and Method
The aim of our study is to evaluate determinants of borrowers' default within given loan grade classes in P2P lending. The data we use come from Lending Club, the biggest P2P lending platform in the world with total loan issuance of almost $16 billion by the end of 2015 [18]. First, we explain the Lending Club process and how a prospective borrower can apply for a loan. Second, we describe our data set. Third, we explain the main variables of interest. At the end of this section, we provide descriptive statistics of our variables and correlation matrices.
3 An accurate credit scoring predictive model is crucial for P2P lending platforms. Ref. [16] conduct an extensive literature review of more than 200 articles about credit scoring models. They conclude that no single best statistical technique exists for the creation of credit scoring models.
4 Before the SEC regulation, as discussed above, Prosper used a Dutch auction to determine the appropriate interest rate for borrowers. Moreover, Prosper used social features enabling social network effects between borrowers and lenders. Even after the SEC regulation, there are still significant differences between the platforms. These differences might make the comparison of determinants influencing borrowers' default inaccurate.
5 To better differentiate and highlight variables, we write them with capital letters and in italics.

The Lending Club Process
The Lending Club connects people who want to borrow money with people who are willing to lend their money. Before applying for a loan at the Lending Club, a prospective borrower should find out the value of his/her FICO score, which is widely used by banks and credit providers in the USA. 6 The FICO score represents the creditworthiness of a person. It is computed based on a borrower's personal credit report provided by national credit bureaus in the USA. The exact formula for the FICO score computation is kept secret; only approximate weights of the given categories are made public. 7 About 90% of borrowers' applications at Lending Club are rejected because of an insufficient FICO score. Only potential borrowers with a FICO score of at least 600 are allowed to apply for a loan at Lending Club.
The potential borrower is further asked to provide some personal and loan information. The self-reported information consists of his/her Annual Income, the current Home Situation (potential options are own, mortgage or rent), the Length of Employment, the Loan Purpose and a Loan Description. All of this information, except the Loan Description, is mandatory.
After checking a borrower's FICO score and his/her self-reported information, Lending Club assigns a risk Loan Grade, from A to G, followed by a more accurate risk Loan Subgrade, from A1 to G5, and a corresponding interest rate. The interest rate charged for A1 was 5.32% and 28.99% for G5 in the first quarter of 2016. Lending Club's credit scoring model is kept secret. The P2P lending platform, however, affirms that the risk Loan Grade and Subgrade are computed based on the borrower's FICO score and his or her personal and loan information.
If the offered loan conditions and the interest rate are accepted by the borrower, Lending Club announces the loan on its website. Potential lenders can then view the loan online and start to fund it. During the loan's funding period, Lending Club might ask the borrower to verify the self-reported information. The loan might be removed from Lending Club's website, if the borrower's self-reported information cannot be verified. However, if the loan gets funded before the verification is done, the verification is not needed anymore and the loan is issued.

Our Data Set
In an effort to be fully transparent about company and loan performance, the Lending Club makes public the data of every loan they have ever issued. The information about these loans used to be updated daily, then monthly and currently is updated quarterly. Our Lending Club data set was downloaded in February 2016. It contains information about 884,633 loans issued between June 2007 and December 2015.
From this data we chose only loans issued between January 2009 and December 2012 with a 36-month duration. We focus on this period for the following reasons. First, the default rate (16.10%) of loans issued before January 2009 is higher than the default rate (12.49%) of loans issued between January 2009 and December 2012 (two-sided t-test, p < 0.001). This difference might be caused by the financial crisis in 2007/2008, which hit many US households hard. We decided to use only observations after 2008 in order to avoid a structural break in our data set. Moreover, we have not included loans issued after December 2012, as their maturity has not yet been reached. For a similar reason, we have not included loans with a 60-month duration either. Loans with a 60-month duration were first introduced in 2010. Therefore, their maturity has not yet been reached in the data set we have.
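The comparison of the two default rates can be illustrated with a short sketch. It treats each loan as a 0/1 outcome, computes a Welch-type t statistic for the difference in means, and obtains the two-sided p-value from a normal approximation, which is reasonable only for large samples. The group sizes below are hypothetical placeholders, since the text reports only the two rates; the function name is likewise an illustration and not the authors' code.

```python
import math


def two_sided_rate_test(rate_a, n_a, rate_b, n_b):
    """Welch-type test for the difference between two default rates.

    Each rate is the mean of a 0/1 default indicator. Returns the test
    statistic and a two-sided p-value based on a normal approximation.
    """
    # Sample variance of a 0/1 variable with mean p is p(1 - p) * n / (n - 1).
    var_a = rate_a * (1 - rate_a) * n_a / (n_a - 1)
    var_b = rate_b * (1 - rate_b) * n_b / (n_b - 1)
    standard_error = math.sqrt(var_a / n_a + var_b / n_b)
    t_stat = (rate_a - rate_b) / standard_error
    # Two-sided tail probability of the standard normal distribution.
    p_value = math.erfc(abs(t_stat) / math.sqrt(2))
    return t_stat, p_value


# Hypothetical group sizes (NOT taken from the paper); rates are from the text.
t_stat, p_value = two_sided_rate_test(0.1610, 3000, 0.1249, 70000)
print(f"t = {t_stat:.2f}, p = {p_value:.2g}")
```

Whether the difference is significant depends on the actual number of loans in each period, so the placeholder sizes should be replaced with the real counts before drawing any conclusion.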
For our analysis, we classify loans in our data set as 'Fully Paid' or as 'Charged Off'. This classification will help us to differentiate between good (Fully Paid) and bad (Charged Off) loans. The loans in our data set, however, have seven different statuses: Fully Paid, Charged Off, Current, Default, Late (31-120 days), Late (16-30 days) and In Grace Period. See Part A in Table 2 for the distribution of loan statuses in our data set. A loan is marked as Fully Paid when the full principal with interest is paid back. A loan with status Charged Off is a loan where the borrower defaulted and the loan will never be paid back in full.
Even though we have chosen our data set's time span so that all loans are supposed to have already reached their maturity, there are still some loans which have not been completely paid back or charged off. This is usually caused by delayed instalments during the life of the loan, which extend its maturity. These loans have the statuses Current, In Grace Period, Late (16-30 days), Late (31-120 days) or Default. There are 33 loans with status Current. These loans are currently being paid back. We do not include them in our analyses, because we do not know whether they will be paid back. In a similar vein, there are six loans with status In Grace Period and six loans with status Late (16-30 days). In Grace Period means that a loan instalment is delayed by at most 15 days. A loan with status Late (16-30 days) has an instalment delayed by 16 to 30 days.
6 According to myf [19], up to 90% of top lenders use the FICO score.
7 The FICO score consists of five categories from a person's financial history: the payment history (about 35% weight), debt burden (30% weight), length of credit history (15%), types of credit used and recent searches for credit (both 10%).
We do not consider loans with statuses In Grace Period and Late (16-30 days) as Charged Off, because these loans are not delayed by more than 30 days. According to the Lending Club statistics, 75% of loans with status Late (31-120 days) are charged off [18]. We believe that this percentage is even higher for loans with an instalment delayed by 90 or more days. There are 81 loans with status Late (31-120 days), and 46 of them are delayed by more than 90 days. We consider these 46 loans as Charged Off. Loans with status Default have an instalment delayed by more than 120 days. We consider them as Charged Off as well. The proportions of Fully Paid and Charged Off loans are shown in Part B of Table 2. The number of Charged Off loans in Part B includes all Charged Off loans from Part A, 46 loans with status Late (31-120 days) and 12 Default loans.
We distinguish between the following loan risk classes in our analysis. A-graded loans belong to the Low-Risk Class, the Medium-Risk Class consists of B-graded loans and C-graded loans are in the High-Risk Class. Loans graded D, E, F and G are aggregated into the Very High-Risk Class in order to make the classes somewhat comparable in terms of the number of observations. Table 3, part A, provides an overview of the four loan classes and their corresponding loan grades, average default rates and average FICO scores. Loans in the Very High-Risk Class, see part B in Table 3, are quite similar in terms of FICO score and default rate. Only the default rate of G-graded loans stands out. However, as there are only 76 G-graded loans, it would not be useful to create a separate group for these loans. Therefore, we added G-graded loans to the same class as D-, E- and F-graded loans.
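The status and risk-class rules described above can be sketched as two small lookup functions. This is a minimal illustration under stated assumptions: the function names and the days_late argument are hypothetical and not fields of the Lending Club data, and the sketch assumes that Late (31-120 days) loans delayed by 90 days or fewer are left out of the analysis, which the text implies but does not state outright.

```python
# Grade-to-class mapping as described in the text.
RISK_CLASS_BY_GRADE = {
    "A": "Low-Risk",
    "B": "Medium-Risk",
    "C": "High-Risk",
    "D": "Very High-Risk",
    "E": "Very High-Risk",
    "F": "Very High-Risk",
    "G": "Very High-Risk",
}


def risk_class(grade):
    """Map a loan grade (A to G) to one of the four loan risk classes."""
    return RISK_CLASS_BY_GRADE[grade]


def loan_outcome(status, days_late=0):
    """Return 'Fully Paid', 'Charged Off', or None if the loan is left out.

    days_late is a hypothetical helper argument: the number of days the
    current instalment is overdue.
    """
    if status in ("Fully Paid", "Charged Off"):
        return status
    if status == "Default":
        # Instalments delayed by more than 120 days count as Charged Off.
        return "Charged Off"
    if status == "Late (31-120 days)" and days_late > 90:
        return "Charged Off"
    # Assumption: Current, In Grace Period, Late (16-30 days) and the
    # remaining Late (31-120 days) loans are excluded from the analysis.
    return None


print(risk_class("E"), loan_outcome("Late (31-120 days)", days_late=100))
```

Returning None for excluded loans keeps the filtering step explicit: rows with no outcome can simply be dropped before the regressions are run.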

Variables of Interest
There are 78 variables in the data set provided by Lending Club. 8 Not all are of interest for us as some do not include any values (such as Personal Finance Inquiries and Finance Trades) or do not contain useful information for our purposes (like Loan URL and Loan ID).
Our variables of interest come from two sources of information. The first source is the borrower's self-reported information: Annual Income, Housing Situation, Length of Employment, Loan Amount, Loan Purpose, and Loan Description. The second source is the borrower's credit file provided by one of three national credit bureaus in the USA. We choose the following variables from a borrower's credit file: Debt-to-Income, Delinquency in Past 2 Years, Date of First Credit Line, Inquiries in Past 6 Months, Months since Last Delinquency, Months since Last Record, Open Credit Lines, and Revolving Credit Utilization. The description of our variables is included in Table A1 in Appendix A.
We modified two variables from the original data set. The first variable is Loan Description. It is provided by a borrower when applying for a loan. There are many ways to use Loan Description as an independent variable that might be the predictor for borrowers' default. For example, Ref. [5] extracted two dummy variables from Loan Description: Borrower's Self-Claimed Creditworthiness and Description Lacking Full Stop. He found that both variables are significant for default prediction. Our approach is different from [5]. We count the number of characters in Loan Description and call this new variable Number of Characters. The second variable of interest from the original data set that was modified is Date of First Credit Line. It is a variable in the form of 'month-year' and represents the reported date of the first open credit line. We transformed this variable into the number of years since the first reported credit line was opened. The name of this new variable is Length of Credit History.
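The two derived variables can be sketched as follows. This is an illustration only: the function names are hypothetical, the 'Jan-1999' style date format is an assumption about how 'month-year' is encoded, and the reference date from which the credit history is measured is passed in explicitly because the text does not say which date was used. Counting every character, including spaces and punctuation, is also an assumption.

```python
from datetime import date

MONTH_NUMBER = {
    "Jan": 1, "Feb": 2, "Mar": 3, "Apr": 4, "May": 5, "Jun": 6,
    "Jul": 7, "Aug": 8, "Sep": 9, "Oct": 10, "Nov": 11, "Dec": 12,
}


def number_of_characters(loan_description):
    """Length of the free-text Loan Description; a missing text counts as 0."""
    return len(loan_description or "")


def length_of_credit_history(first_credit_line, reference_date):
    """Years between the first credit line ('Mon-YYYY') and reference_date."""
    month_name, year_text = first_credit_line.split("-")
    opened_month = MONTH_NUMBER[month_name]
    opened_year = int(year_text)
    elapsed_months = (
        (reference_date.year - opened_year) * 12
        + (reference_date.month - opened_month)
    )
    return elapsed_months / 12


print(number_of_characters("Paying off two credit cards."))
print(length_of_credit_history("Jul-2000", date(2012, 1, 1)))
```

Measuring the history in months and dividing by 12 keeps partial years; rounding to whole years would be an equally plausible reading of the text.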
Thus, our variables are fairly similar to variables used by [4][5][6]. Unlike these papers, though, we do not include information about the Loan Subgrade and the FICO Score. Furthermore, we include neither the Loan Grade nor the Interest Rate. All these variables are highly correlated, because Loan Subgrade, which is more specific than Loan Grade, is largely based on the FICO score. The interest rate is then assigned based on the Loan Subgrade.
8 We have downloaded the data from the download data section at the Lending Club website. Moreover, the download data section's Data Dictionary provides variable descriptions [18].
Table A2 in Appendix B contains the correlation matrix of all non-categorical variables. The correlation matrix is based on our full data set of 70,673 observations. The highest correlation (0.33) is between Debt-to-Income and Open Credit Lines. The second largest correlation, which is 0.29, is between Annual Income and Loan Amount. All correlations between Default and other variables are less than 0.1. The variable most correlated with Default is Revolving Credit Utilization, with a correlation of 0.08.
Table A3 in Appendix B contains descriptive statistics of our full data set. There are 82 missing values of Revolving Credit Utilization and 2538 missing values of Length of Employment. We have excluded all 82 observations with missing values of Revolving Credit Utilization from our data set. Excluding the 2538 observations with missing values of Length of Employment from our data set would mean a significant loss of information for our hypothesis testing. However, we have found that the Length of Employment is not a significant determinant of borrowers' default. This finding allows us to exclude the Length of Employment from our further analysis.

Descriptive Statistics
Overall, there are 15 observations in our data set with a self-reported income exceeding USD 1,000,000 and we have decided to exclude these outliers. Thus our final data set for the remaining analyses includes 70,579 observations. Table A4 in Appendix B presents mean values of the non-categorical variables for the individual loan classes. Interestingly, the highest mean of Annual Income is in the Very High-Risk Class. Furthermore, borrowers from the Very High-Risk Class wrote, on average, the longest loan descriptions. They might be afraid that their loan will not be funded because of their inferior credit grade. Therefore, they might try to provide a sound explanation of the loan need to their potential funders. For Loan Amount, Delinquency in Past 2 Years, Inquiries in Past 6 Months, Months since Last Delinquency, Months since Last Record, Open Credit Lines and Revolving Credit Utilization, the mean values rise from the Low-Risk Class to the Very High-Risk Class. The only variable whose mean value declines is the Length of Credit History.
Finally, we look at the default statistics of our categorical variables. Table A5 in Appendix B contains the Loan Purpose default statistics. The default rate clearly rises with the riskiness of a given loan class, starting with a default rate of 6.59% in the Low-Risk Class and ending with 20.06% in the Very High-Risk Class. The two most frequent loan purposes are Debt Consolidation (51.94% of all loans) and Credit Card (18.28%). Furthermore, the purposes Car and Major Purchase have the smallest default rates across all classes. On the other hand, loans with purpose Small Business or Renewable Energy have the highest default rate in the All Classes category. 9 It is interesting to observe how the default rates of given Loan Purposes change across loan classes. For example, loans with the purpose Moving have a higher default rate in the Medium-Risk Class (17.97%) than in the High-Risk Class

Results
We generally use binary logistic regression specifications to analyze the determinants of borrowers' default. 10 We use backward stepwise elimination to find the most suitable model specification, that is, we start with a full model including all 13 variables of interest. We then drop every variable with a p-value higher than 0.1, starting with the variable with the highest p-value. Backward stepwise elimination is sometimes criticized for producing models which do not fit the data well. Critics of this approach argue that other models might dominate the model achieved by backward stepwise elimination in terms of the Akaike information criterion (AIC), a measure of relative model quality for a given data set. As a robustness check, we have run additional regressions which employ an automated selection of the best model with AIC as the criterion. All of our specifications reached by backward stepwise elimination are the same as the specifications chosen by using AIC as the selection criterion.
We first run a logistic regression on the full data set (All Classes) for two reasons. First, we want to compare our All Classes findings with the results of [4][5][6]. Second, this allows us to highlight the differences between our regression results for the given loan classes and the regression findings based on the whole data set.
9 We do not comment further on loans with the Renewable Energy purpose, because they make up only a small percentage (0.20%) of all loans. The same applies to loans with the purpose Education (0.34%).
10 All statistical analyses are performed using the software R (version 3.2.3) with its integrated development environment RStudio. We use the glm function with the binomial family.
Results from the All Classes regression are in Table 4. We proceed with regressions for the four loan risk classes, see also Table 4. Results for the Low-Risk and Medium-Risk Classes differ only slightly from the All Classes results. In both, the Length of Credit History and the loan purpose Home Improvement are no longer significant. Months since Last Record is not significant in the Low-Risk Class, while it is significant in the Medium-Risk Class. Delinquency in Past 2 Years is not significant in the Medium-Risk Class, while it is highly significant in the Low-Risk Class. In the High-Risk and Very High-Risk Classes, the Number of Characters is no longer significant, nor are the loan purposes Car, Debt Consolidation, Home Improvement and Renewable Energy. The Length of Credit History is not significant in the High-Risk Class but it is significant in the Very High-Risk Class. The loan purpose Major Purchase is no longer significant in the Very High-Risk Class.
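The backward stepwise elimination rule described above can be sketched independently of the regression software. The sketch below is written in Python for illustration, whereas the analysis itself was run in R. The fit_p_values argument is a hypothetical stand-in for refitting the logistic regression and returning one p-value per variable, and the toy p-values and variable names are invented solely to exercise the loop; they are not results from the paper.

```python
def backward_eliminate(variables, fit_p_values, threshold=0.1):
    """Drop the least significant variable until all p-values <= threshold.

    fit_p_values(selected) must refit the model on the selected variables
    and return a dict mapping each of them to its p-value.
    """
    selected = list(variables)
    while selected:
        p_values = fit_p_values(selected)
        worst = max(selected, key=lambda name: p_values[name])
        if p_values[worst] <= threshold:
            break  # every remaining variable passes the threshold
        selected.remove(worst)  # drop one variable, then refit
    return selected


def toy_fit(selected):
    """Invented p-values; var_c becomes significant once var_d is dropped."""
    p_values = {"var_a": 0.001, "var_b": 0.03, "var_c": 0.15, "var_d": 0.62}
    if "var_d" not in selected:
        p_values["var_c"] = 0.04
    return {name: p_values[name] for name in selected}


print(backward_eliminate(["var_a", "var_b", "var_c", "var_d"], toy_fit))
```

Refitting after every single removal matters because p-values change once a correlated variable leaves the model, as the toy example shows: var_c would have been dropped if all variables above the threshold had been removed at once.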
Revolving Credit Utilization has been found to be a significant predictor for borrowers' default in all related studies [4][5][6] as well as in our All Classes data. However, it is only significant in our Low-Risk and Medium-Risk Classes. It is not a significant determinant in the High-Risk and Very High-Risk Class.

Result 1. Revolving Credit Utilization is a positive determinant of borrowers' default only in low loan risk classes.
The Debt-to-Income ratio is significant in all loan classes. In fact, the Debt-to-Income ratios for defaulted/non-defaulted loans have almost identical values across risk classes.

Result 2. The Debt-to-Income ratio is a positive determinant of borrowers' default in all loan risk classes.
The current Housing Situation is a significant determinant of default in All Classes, as well as in the Low-Risk, Medium-Risk and High-Risk Classes. It is, however, not significant in the Very High-Risk Class. Defaulting on a loan when having a mortgage on a house would mean the loss of the house. Therefore, borrowers with a mortgage might be more motivated to avoid default than borrowers living in a rented home. One way to avoid default is to take out a further loan. Borrowers from the Very High-Risk Class may not have such an opportunity, which might explain why there is no effect of the current Housing Situation.

Result 3. Home ownership is not a significant determinant of borrowers' default in the highest loan risk class.
Overall, creditworthy borrowers write, on average, 169 characters in their loan descriptions compared to 157 characters in loan descriptions of defaulted loans. This difference is highly significant (p < 0.001). Moreover, it is interesting to observe that borrowers in the Very High-Risk Class write, on average, the most characters in their Loan Description compared to borrowers from other classes. Borrowers from the Very High-Risk Class might feel that their Loan Description must be comprehensive in order to get funding with a risky loan grade. However, the Number of Characters is significant neither in the Very High-Risk Class nor in the High-Risk Class, while it is significant in the low risk classes.

Result 4. In low loan risk classes, creditworthy borrowers write, on average, a longer Loan Description than borrowers who defaulted.
It seems that a loan used for Credit Card consolidation has a significantly higher chance to be paid back even in the Very High-Risk Class, while loans used for a Small Business generally bear a higher risk of default independently of the associated risk class. For example, the default rates of loans with purpose Small Business are twice as high as default rates of loans with Car or Wedding as the purpose.

Result 5. The loan purposes Credit Card and Small Business are significant determinants of borrowers' default in all loan risk classes.
The Length of Credit History is negatively correlated with loan default in our All Classes regression results. This finding is in line with the results of [5,6]. However, it is only supported in the Very High-Risk Class. The Length of Credit History is not a significant determinant of default in the Low-Risk, Medium-Risk and High-Risk Classes. It seems that experience with loans is an advantage in the Very High-Risk Class, as people get used to living close to their credit limits. For example, a young man without any previous credit experience who is classified in the Very High-Risk Class, and who has no financial buffer, can easily overdraw his credit. This might cause a default because of insufficient credit experience and a lack of possibilities of obtaining an additional loan.

Result 6. The Length of Credit History is a negative determinant of borrowers' default only in the Very High-Risk Class.

Discussion
In our full data set, all variables of interest turn out to be significant determinants of default except the variable Open Credit Lines. Table 5 provides a comparison of our All Classes findings and the previously mentioned studies. Generally, discrepancies between results could be due to the fact that our data avoid the structural break in loan defaults possibly caused by the 2007/08 financial crisis. 11 See Table 1 for differences in the data and methodology. The only difference from the results of [5] is that Debt-to-Income is not a significant predictor of borrowers' default in his study. This difference might be caused by the fact that [5] used loans with status 'current' in his analyses. Comparing our results to [6], two discrepancies are worth noting. Loan Amount is not significant in their study but is in ours, and Open Credit Lines is significant in theirs but not in ours. Finally, our All Classes results are quite different from the results of [4]. Besides differences in the time frame of the data set, Ref. [4] include the Loan Credit Grade and FICO score as explanatory variables in their regression. A high correlation between the FICO score and other variables of interest is to be expected, because the FICO score is computed based on these values. The same may apply to the Loan Grade.
11 We show that loans issued before 2009 have significantly higher default rates than loans issued between 2009 and 2012.
Note: The mark x denotes that a given variable was found to be a significant determinant of borrowers' default.
Overall, we find the following determinants of borrowers' default which are significant in All Classes as well as in each loan class separately: Annual Income, Debt-to-Income, Inquiries in Past 6 Months and the loan purposes Credit Card and Small Business. The significance of other variables varies class by class. Revolving Credit Utilization, Number of Characters and the Length of Credit History are significant for All Classes but not in every loan class. Only Open Credit Lines is significant neither in any class nor in the full data set.
In terms of economic significance, the following changes in the predicted default probability result from a one-standard-deviation change in our main explanatory variables: Annual Income (0.005), Debt-to-Income (0.01), Inquiries in Past 6 Months (0.02) and the loan purposes Credit Card (−0.06) and Small Business (0.08). Given that, on average, 12.5% of all loans default, a one-standard-deviation change in the Debt-to-Income ratio increases the chances of default by 6%. This value increases to 12% for high risk loans.
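The step from an absolute change in the predicted default probability to a relative change can be sketched as a division by the average default rate. The snippet below is an illustrative calculation only; the function name relative_change is hypothetical. With the rounded figures quoted above (0.01 and 12.5%) the ratio evaluates to 8%, so the 6% stated in the text presumably rests on the unrounded estimate, which is an assumption here.

```python
def relative_change(abs_change: float, base_rate: float) -> float:
    """Express an absolute change in default probability relative to the base rate."""
    if base_rate <= 0:
        raise ValueError("base_rate must be positive")
    return abs_change / base_rate


BASE_DEFAULT_RATE = 0.125      # average share of defaulted loans quoted in the text
DEBT_TO_INCOME_EFFECT = 0.01   # rounded absolute effect quoted in the text

ratio = relative_change(DEBT_TO_INCOME_EFFECT, BASE_DEFAULT_RATE)
print(f"{ratio:.0%}")  # 8% with the rounded inputs
```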
As a robustness check of our loan grade classification, we merge the two smallest classes, the High-Risk and Very High-Risk Classes, and consider them jointly. All regressors that are significant in the High-Risk Class are at least as significant in the regression specification with observations from both risk classes. In addition, the variables Loan Amount and Months since Last Record are positively correlated with default, while the loan purposes Car and Wedding are negatively correlated with default.
Finally, we address the extent to which lenders take these insights into account when they decide whether to fund a loan. For this purpose, we draw on evidence from existing studies that analyze the funding success of P2P loans. According to both [14,15], a loan is more likely to attract funding the lower the debt-to-income ratio and the better the credit grade. Moreover, Ref. [15] find a positive correlation between funding success and the amount requested as well as the borrower owning a home. Our analysis supports lenders' skepticism about a borrower's debt-to-income ratio. Other characteristics warrant more caution. We find that home ownership status is a good indicator of a loan being paid back only if the loan is not from the highest risk class.
Our results provide insights for potential P2P lenders, especially those who seek strictly to maximize their profit. As mentioned in Section 3, investing in the Very High-Risk Class at Lending Club has historically yielded the highest net profit (after accounting for defaulted loans). Thus, investors whose primary goal is a high return on their investment will target the high risk segment and will try to optimize their loan portfolio choices. Our results contribute to a better understanding of the extent to which existing knowledge about loan default determinants applies in this high risk segment. It seems that for high risk loans Revolving Credit Utilization and the Home Situation status are unreliable predictors of default. Instead, the mindful investor should rely on the Length of Credit History, Inquiries in Past 6 Months, Annual Income, the Debt-to-Income ratio and the loan purposes Credit Card or Small Business as reliable predictors of default.

Conclusions
P2P lending connects people in need of a loan with people willing to lend their money. The intermediation of credit is handled through more or less automated online platforms with very low transaction costs. The benefits of automation can translate into lower interest rates for borrowers and higher interest earnings for lenders in comparison to traditional banks.
However, information asymmetries between borrowers and lenders remain a central issue faced by P2P lending platforms. Credit scoring techniques are employed to address this. They assign a credit grade to each loan based on the perceived risk of default. Riskier loans are associated with higher interest rates as higher interest rates serve as compensation for a potential loan default. Besides the credit grade and interest rate, P2P lending platforms usually provide a prospective lender with a large amount of information about a loan's and borrower's characteristics.
Previous research [4][5][6] identified some of the borrower's and loan's information as useful determinants of borrowers' default. We conjecture that the significance of default determining variables might not be the same in different loan risk classes. In other words, some variables may be significant default determinants only in specific loan classes.
While results on our full data set are largely in line with findings of previous studies, our set of separate regressions for each loan risk class identifies only Annual Income, Debt-to-Income, Inquiries in Past 6 Months and the loan purposes Credit Card and Small Business as significant determinants of loan default in all loan risk classes. Revolving Credit Utilization, Delinquency in Past 2 Years and Number of Characters are only significant for low loan risk classes. Length of Credit History is only significant for high loan risk classes.
Our analysis confirms that loan and borrower characteristics can indeed be used to predict a loan's default chances. However, since default determinants depend on the loan's risk class, caution is warranted. What seems to be a good predictor of loan default based on overall data may not be reliable in the highest loan risk class. This is relevant because the high risk segment is the most attractive to some lenders, as it offers the highest attainable returns.

Conflicts of Interest:
On behalf of all authors, the corresponding author states that there is no conflict of interest.

Name of variable: Description of variable
Annual Income: The self-reported annual income provided by the borrower during registration.
Housing Situation: The home ownership status provided by the borrower during registration. Our values are: RENT, OWN, MORTGAGE, OTHER.
Length of Employment: Employment length in years. Possible values are between 0 and 10 where 0 means less than one year and 10 means ten or more years.
Loan Amount: The listed amount of the loan applied for by the borrower.
Loan Purpose: A category provided by the borrower for the loan request.
Number of Characters: The number of characters used by the borrower for the loan description.
Debt-to-Income: A ratio calculated using the borrower's total monthly debt payments on the total debt obligations, excluding mortgage and the requested LC loan, divided by the borrower's self-reported monthly income.
Delinquency in Past 2 Years: The number of 30+ days past-due incidences of delinquency in the borrower's credit file for the past 2 years.
Length of Credit History: The number of years since the first reported credit line was opened.
Inquiries in Past 6 Months: The number of inquiries in the past 6 months (excluding auto and mortgage inquiries).
Months since Last Delinquency: The number of months since the borrower's last delinquency.
Months since Last Record: The number of months since the last public record.
Open Credit Lines: The number of open credit lines in the borrower's credit file.
Revolving Credit Utilization: Revolving credit line utilization rate or the amount of credit the borrower is using relative to all available revolving credit.

Appendix B
Descriptive Statistics