Adverse Selection in P2P Lending: Does Peer Screening Work Efficiently?—Empirical Evidence from a P2P Platform

The rapid development of online lending in the past decade, while providing convenience and efficiency, also generates large hidden credit risk for the financial system. Will removing financial intermediaries really provide more efficiency to the lending market? This paper used a large dataset with 251,887 loan listings from a pioneer P2P lending platform to investigate the efficiency of the credit-screening mechanism on the P2P lending platform. Our results showed the existence of a TYPE II error in the investors’ decision-making process, which indicated that the investors were predisposed to making inaccurate diagnoses of signals, and gravitated to borrowers with low creditworthiness while inadvertently screening out their counterparts with high creditworthiness. Due to the growing size of the fintech industry, this may pose a systematic risk to the financial system, necessitating regulators’ close attention. Since investors can better diagnose soft signals, an effective and transparent enlargement of socially related soft information together with a comprehensive and independent credit bureau could mitigate adverse selection in a disintermediated environment.


Introduction
Peer-to-peer (P2P) lending has passed the shakeout period and entered a steady growth period. Its development experience can provide valuable inspiration for current market players. The fast development of disintermediated online lending in the past decade, while providing convenience and efficiency, also generates significant concealed credit risk for the financial system (Huang 2018). For example, due to the fragile auditing process and high default rate, in August 2018, the Chinese P2P market ushered in its consolidation period and experienced a reduction of 42% in P2P platforms when 168 platforms ended operation. Even after the Interim Administrative Measures for the Business Activities of P2P Lending Information Intermediaries was established, the default rate in the P2P industry was still at a high level (You 2018). According to (Gao et al. 2021), Chinese P2P lending platforms have an astonishing default rate of 87.2%, based on data available in 2019. Thus, questions are generated. Does disintermediation really provide more efficiency to the lending market, or does it actually add unforeseen credit risk to the system? Does peer screening work efficiently? This paper used a large dataset with 251,887 loan listings from the pioneer P2P lending platform RenrenDai to investigate the efficiency of the credit-screening mechanism under a disintermediated environment by comparing the performance of loan funding signals and repayment determinants.
A group of scholars (Dorfleitner et al. 2016;Santoso et al. 2020;Liao et al. 2015;Lin et al. 2013;Pötzsch and Böhme 2010;Khan and Xuan 2021) attempted to investigate the determinants of credit rationing in the field. However, the findings in the literature regarding the determinants of loan application success and repayment behavior were inconsistent. Moreover, due to data limitation, the analyses of the default determinants were insufficient. The purpose of our paper was, therefore, to contribute to the literature that explores the determinants of the loan application's performance and the default behavior of the online P2P lending platform. More importantly, the comparison of the results can provide evidence for our research question: Does the peer screening mechanism in the P2P platform efficiently diagnose the signals provided by the borrowers in their loan applications? Due to limitations in the repayment history data, no similar study has been performed using an emerging-market dataset. The only reference is (Iyer et al. 2016), who explored the question by using a Prosper dataset and US credit bureau data. However, this paper did not go deeper and explore the specific determinants which resulted in the misspecification. Our paper will fill this gap and also enrich the literature for emerging markets. We used the dataset from P2P pioneer RenrenDai to test our hypothesis. We divided the information provided by the borrowers into two categories: hard (financial) information and soft (social) information. Our findings showed that the hard (financial) indicators were given great importance when lenders were deciding whether to lend money. However, hard information was either unimportant or even acted in the opposite direction when it came to predicting the repayment behavior of a borrower. Meanwhile, soft information had much less inconsistency in the two models. 
This proved the existence of a TYPE II error in the investors' decision-making process, which indicated that the investors were predisposed to making inaccurate diagnoses of signals, and gravitated to borrowers with low creditworthiness while inadvertently screening out their counterparts with high creditworthiness. Due to the growing size of the fintech industry, this may pose a systematic risk to the financial system, necessitating regulators' close attention. Since investors can better diagnose soft signals than hard, financially based signals, enlarging the set of socially related soft signals and building up a comprehensive credit bureau could mitigate adverse selection in a disintermediated environment.
The paper is divided into five sections. In the literature review, we provide an overview of the existing literature concerning the determinants of loan application success and loan defaults in the P2P market. We compare inconsistencies to find the gaps, then we define our scope. In Section 3, general information about the dataset will be introduced, and our model and the descriptive summary of the chosen variables will be presented. In Section 4, the results of the model are analyzed in detail. Finally, we conclude and discuss the policy implications in the discussion and conclusions section.

Literature Review
In the 1950s and 1960s, (Arrow 1964; Debreu 1959) were the first to explore optimal contracts under uncertainty, and laid the foundation for contract theory. In the late 1960s and 1970s, George Akerlof, Joseph Stiglitz, and Michael Spence developed incentive theory as a branch of contract theory, and introduced the concepts of "hidden information" and "hidden actions". The asymmetric information problem in incentive theory has long been discussed in modern contract economics. Credit rationing (Stiglitz and Weiss 1981) and information signaling (Spence 1973) were the two major branches of the discussion.
One major class of the contracting problem lies in hidden information, which is also regarded as adverse selection. It describes a situation in which one party to the contract has private information that the other does not. When the contract is made by the party that lacks private information, the uninformed party needs to screen the information possessed by the informed party. This is the so-called screening problem. If the contract is offered by the informed party, it is a signaling problem, since the informed party can signal the information they have through the type of contract offered. (Akerlof 1970) used the automobile market as an example to explain the situation in which one party has private information, and regards the second-hand automobile market as a market for "lemons", since the seller has private information about the condition of the car, and thus they have the incentive to sell cars of below-average-quality. Therefore, the entire market quality has been dragged down, but due to the asymmetric information, the buyer can only bargain according to the average price, so would only like to buy lower-quality cars, which results in above-average-quality cars exiting the market. This situation, when low-quality products replace high-quality products, resulting in the entire market quality declining, is so-called adverse selection. In the loan market, this refers to a situation in which high-risk borrowers are usually those who are most eagerly looking for money, and most likely to obtain the loan. Thus, how to mitigate adverse selection and how to efficiently use signals to screen the borrower becomes a crucial and heated discussion topic. Credit appraisal is the application of screening in the financial market; the borrower has private information about the quality of the project and the incentives of paying back. Our research investigated the efficiency of the screening mechanism in online lending and a possible approach for improvement.
Empirical research concerning credit analysis in peer-to-peer lending can be divided into two groups. One is targeted at analyzing the trust of the lenders. This research area studies how lenders screen borrowers, or what the determinants are for the success of loan funding. The other trend investigates the borrower's repayment behavior, which indicates their creditworthiness; in other words, the potential factors that may signal the possibility of default.
From the perspective of lenders, (Pötzsch and Böhme 2010), "The role of soft information in trust building: Evidence from online social lending", is representative of the literature analyzing lenders' trust. They used data from Germany's largest P2P platform, Smava, to analyze trust-building between borrowers and lenders, with the interest rate as a proxy for trust level. They introduced the concept of soft information as the personal information the borrower was willing to disclose. The results showed that communicating personal information increased lenders' trust, but the impact was small and limited to educational and professional information. In addition, if the borrower used statements aimed at arousing pity, they were given a higher interest rate, indicating a loss of trust. (Herzenstein et al. 2008), on the other hand, more comprehensively summarized the determinants of success in P2P lending into several groups: demographic characteristics, including gender, race, and marital status; financial strength, including credit ratings from credit bureaus, debt ratio, and house ownership; effort indicators, i.e., the effort to increase reputation, mainly through group activity and loan description; and loan decision variables, i.e., the loan features, such as amount, interest rate, and duration. Their results showed that all variables representing financial strength had a significant influence on funding success except house ownership, which was insignificant. Credit ratings from A to E were all positively related to success, except high-risk grading, but debt-to-income ratio was negatively related to success. Results for demographic characteristics showed that women were more likely to receive funding, which was opposite to expectations; marital status was not significant in the decision to grant a loan; and African American racial identity had a negative effect on loan funding success. The effort to include a picture had no significant influence on success, but the effort to join in group activity and give a loan description had a positive effect.
Besides these two representative works, which summarized the determinants of success in funding applications, a large group of researchers examined the impact of a specific screening variable on the success of the loan application. (Barasinska and Schäfer 2014) analyzed the impact of gender on the possibility of successful funding on the German P2P platform Smava; (Gonzalez and Loureiro 2014) and (Pope and Sydnor 2011) analyzed whether a profile picture would influence funding success; similarly, (Duarte et al. 2012) analyzed appearance and funding success; (Greiner and Wang 2009), (Herrero-Lopez 2009), and (Lin et al. 2013) focused on the impact of social capital on loan success; (Wang et al. 2019) led the analysis of the impact of video information on loan success. Research in this field provides evidence of the screening determinants from the lender's perspective, but lacks a comparison with the borrower's repayment behavior. This may be due to data limitations, but without this comparison we cannot diagnose the efficiency of these determinants. Looking from the lender's perspective can only provide information about the lenders' preferences; it cannot show whether these preferences correctly recognize the borrower's creditworthiness. Our research is based on the determinants that previous studies provided, but in addition we compared the results with the borrowers' repayment behavior to explore the real efficiency of the lenders' screening mechanism.
From the perspective of borrowers, (Santoso et al. 2020) used data from three Indonesian P2P platforms to analyze the determinants of loan interest rates and default status. As an inconsistency in the existing literature, they also observed that factors such as age and gender have different results on three different platforms. The paper investigated the relationship of the chosen determinants with default probability and the loan interest rate. However, they did not link these two results together and further investigate the phenomenon behind and the origin of the problem. Our paper's target is to fill this gap and analyze whether borrower signals are correctly diagnosed by lenders. (Dorfleitner et al. 2016) studied the effect of soft factors derived from descriptive text on the probability of successful funding and probability of default on two European P2P lending platforms. Their results showed that typos, text length, and keywords evoking positive emotions are significantly related to funding success but have no impact on default probability. Their research provided the first evidence of linguistic factors in credit analysis; however, they only focused on linguistic factors and did not further investigate the misdiagnosis of other soft factors when comparing lenders' judgment and borrowers' real behavior.
The first paper that touched on the efficiency of the lenders' diagnosis is that of (Iyer et al. 2016). In their 2016 paper, "Screening peers softly: Inferring the quality of small borrowers", they exploited the fact that they had acquired the true credit scores of the borrowers from the credit bureau, while the lenders on the Prosper platform only had information about the credit grading. As a predictor, they used the final interest rate obtained by the borrower to assess whether the lenders on the platform would use the details available to assess the borrower's true credibility. The results showed that, within one credit category, the lenders were able to infer one-third of the variation in creditworthiness that was captured by credit scores. Their results also suggested that, on top of the traditional financial factors, non-standard "softer" information was also used in analyzing the borrower's credit risk, especially for borrowers with lower credit ratings. Although the paper established that lenders on the platform could infer one-third of the variation in the borrower's real creditworthiness, it also indicated that misspecification existed, since capturing only one-third implies that two-thirds had not been captured. This paper opened the first debate on whether the use of soft information would complement the traditional credit analysis model and add more choice for credit model development after the 2008 financial crisis. However, this paper did not delve into the specific determinants which resulted in the misspecification. Our paper is an extension of that of (Iyer et al. 2016), whereby we provide empirical evidence for the misspecification of the lenders' screening mechanism in P2P lending.
We further compared the literature on these two trends, and found inconsistent results for the same variable in different models; for example, gender was insignificantly correlated with success in (Pötzsch and Böhme 2010) but significantly correlated with success in (Herzenstein et al. 2008) and (Pope and Sydnor 2011). At the same time, female gender was shown to be positively related to default in (Santoso et al. 2020) but negatively related to default in (Ge et al. 2017) and insignificantly related in (Pope and Sydnor 2011). Moreover, the results of (Dorfleitner et al. 2016) showed that typos, text length, and keywords evoking positive emotions were significantly related to funding success but had no impact on default probability. People who mentioned education in their loan descriptions were more likely to obtain loans (results significant), but mentioning education was shown to be insignificant in predicting default. However, in (Liao et al. 2015), people with higher degrees of education had a lower probability of default (significant) but were not more likely to get funding (insignificant). In (Freedman and Jin 2008), mentioning education in loan descriptions had an insignificant influence on funding success but people who did so were significantly less likely to default. Mentioning car ownership was not significantly related to success but was significantly and positively related to default. In addition, mentioning family was significantly and positively related to success but also significantly and positively related to default. Due to these inconsistencies, we doubt whether investors can truly diagnose the credit signals given by borrowers. If there are misdiagnoses, which factors resulted in these mismatches?
Thus, we come up with our hypotheses:
Hypothesis 1: Investors on the P2P platform can correctly diagnose the credit signals that borrowers provide and efficiently screen out borrowers with low creditworthiness.
Hypothesis 2: Investors can diagnose hard, financially related signals more efficiently than soft, socially related signals.

Data, Model and Variables
The data we used is from one of the world's pioneer P2P platforms, RenrenDai, which was established in 2010. By October 2016, the total amount of its transactions exceeded 21.2 billion yuan. The platform targets microloans, 71,000 yuan being the average loan amount. Our dataset consists of 251,887 listings from 2010 to 2014. Borrowers fill out a loan application online to be published on the website. Peer investors conduct their own credit analyses and choose which loans to invest in. Like crowdfunding, a single loan may have multiple investors, and the funding process is completed only when the entire loan amount has been filled by investors. As a result, among the total listings, only 65,394 loans were funded. The borrowers can repay the loan in full or in monthly installments until it matures. Among the funded loans, 50,819 loans are still in the repayment process and 14,575 loans have reached maturity. Of the matured loans, 13,901 completed the repayment process while the other 674 defaulted, representing a relatively modest default rate of about 4.6%. Detailed variable descriptions are presented below.
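The sample breakdown above can be cross-checked with simple arithmetic. The short sketch below uses only the counts quoted in the text (the variable names are ours) and confirms that the sub-samples add up and that 674 defaults out of 14,575 matured loans correspond to a default rate of roughly 4.6%, the total default rate referred to in the results.

```python
# Counts reported in the text for the RenrenDai sample.
total_listings = 251_887
funded = 65_394
in_repayment = 50_819
matured = 14_575
repaid = 13_901
defaulted = 674

# Each sub-sample should add up to its parent sample.
assert in_repayment + matured == funded
assert repaid + defaulted == matured

funding_rate = funded / total_listings  # share of listings that were funded
default_rate = defaulted / matured      # share of matured loans that defaulted

print(f"funding rate: {funding_rate:.1%}")
print(f"default rate: {default_rate:.1%}")
```

Note that the default rate is computed over matured loans only, since the outcome of loans still in repayment is not yet known.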
Since the dependent variable is binary, we use the logit model to test the determinants of loan funding and default in P2P lending. We estimate two models, Model I and Model II. The dependent variable for Model I, the funding probability model, is a dummy variable which equals 1 when the loan has been successfully funded and 0 otherwise.
Model II is the default predicting model; the dependent variable default represents whether the loan has been repaid completely without delay. 1 represents 'defaulted'; 0 represents 'repaid'.
All the chosen hard and soft information variables are listed in Appendix A, Table A1. All the chosen variables are based on the references from the literature review. We used financially related information, income level and collaterals as the hard information. Socially and psychologically related information such as age, gender, loan description, marital status, educational level and social media information are used as the soft information. Loan features are used as the control variables.
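As a sketch of the specification described above, and not necessarily in the authors' exact notation, both models can be written as standard logit regressions on the same set of regressors. The symbols $\text{Hard}_i$, $\text{Soft}_i$ and $\text{Control}_i$ (the vectors of hard information, soft information and loan-feature controls for listing $i$) and the coefficient names are labels assumed here for illustration.

```latex
\begin{align*}
\text{Model I:}\quad
\Pr(\text{Funded}_i = 1) &= \Lambda\!\left(\alpha_1 + \beta_1^{\top}\,\text{Hard}_i
  + \gamma_1^{\top}\,\text{Soft}_i + \delta_1^{\top}\,\text{Control}_i\right),\\
\text{Model II:}\quad
\Pr(\text{Default}_i = 1) &= \Lambda\!\left(\alpha_2 + \beta_2^{\top}\,\text{Hard}_i
  + \gamma_2^{\top}\,\text{Soft}_i + \delta_2^{\top}\,\text{Control}_i\right),\\
\Lambda(z) &= \frac{1}{1 + e^{-z}}.
\end{align*}
```

Under this formulation, a signal is read correctly by investors when its coefficient in Model I has the opposite sign to its coefficient in Model II, that is, when a characteristic that raises the funding probability also lowers the default probability.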
The hard information is represented by key financial determinants that indicate the wealth and solvency of the borrower. They are the four key fundamental financial indicators that are available in our dataset: monthly income, home ownership, car ownership and existing mortgage loans. Car and home ownership are dummy variables, with 1 indicating 'ownership' and 0 indicating 'none'. We include verification of income in the model to certify accuracy.
As soft information is difficult to measure, proxies must be employed. Table 1 summarizes the proxies used in our model. Our approach to soft data is similar to that in the literature: we employ education duration (e.g., (Liao et al. 2015)), age (e.g., (Gonzalez and Loureiro 2014)), and gender (e.g., (Gonzalez and Loureiro 2014;Barasinska and Schäfer 2014;Ravina 2019;Pope and Sydnor 2011)). We also employed the length of the loan purpose statement as a linguistic indicator, as suggested by (Lin et al. 2013;Kim et al. 2020).
Since social impact has been proved to be a significant factor on loan success by (Greiner and Wang 2009;Herrero-Lopez 2009;Lin et al. 2013), we used the verification data from Weibo (the largest Chinese social network) as our indicator of social impact. If an applicant's social network was verified, it is represented as "1", otherwise "0".
Profile photos were shown to influence funding success in the (Pope and Sydnor 2011) study. Since the profile photos on Renrendai.com were not always real pictures of the applicants, we chose video verification as the proxy for the picture indicator. During the verification process, borrowers must film themselves holding their ID cards and reading a statement accepting the general rules and conditions of Renrendai.com, and then upload the video with their loan application. If the applicant accepted video verification, this is recorded as "1", otherwise as "0".
The expansion of mobile services is a fundamental component of Fintech 2.0, and mobile usage data is the preferred verification tool for Fintech firms, particularly big data firms. Since mobile numbers were introduced to China's real-name system, allowing tracking and verifying of real cellphone users, it has become a critical source for anti-fraud efforts. Furthermore, one of the most powerful indicators of default in the consumer finance market is mobile usage behavior. As a result, we included a variable for mobile verification in our model. This is also a dummy variable: "1" means verified, "0" means not verified.
Based on (Nigmonov et al. 2022) and (Khan and Xuan 2021), we included the interest rate, the length of the loan, and the amount of the loan. The average interest rate is 14.9%, and the highest interest rate is 24.4%. The average amount is 60,637.93 yuan. Since the amount is quite large, we used the log of amount as the proxy to normalize the distribution. The loan term is from 1 month to 36 months. The average term is 16 months.
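To make the variable construction concrete, the following minimal sketch encodes one listing into regressors of the kind described above: income-group dummies with group 4 as the omitted reference category, 0/1 indicators for ownership and verification, the length of the loan purpose statement, and the log of the loan amount. The field names, the example listing, the treatment of the mortgage variable as a dummy, and the measurement of statement length in characters are all assumptions made for illustration; only a subset of the soft variables is shown.

```python
import math

REFERENCE_INCOME_GROUP = 4    # income group 4 is the reference category in the text
INCOME_GROUPS = range(1, 8)   # the sample includes 7 income groups

# Dummy indicators described in the text; the key names are hypothetical.
DUMMY_FIELDS = (
    "house_owner", "car_owner", "has_mortgage", "income_verified",
    "weibo_verified", "video_verified", "mobile_verified",
)


def encode_listing(listing):
    """Map one raw listing (a dict with hypothetical keys) to model regressors."""
    features = {}
    # One dummy per income group, leaving out the reference group so that
    # the remaining coefficients are read relative to group 4.
    for group in INCOME_GROUPS:
        if group != REFERENCE_INCOME_GROUP:
            features[f"income_{group}"] = int(listing["income_group"] == group)
    # Ownership and verification dummies: 1 = owned / verified, 0 = otherwise.
    for field in DUMMY_FIELDS:
        features[field] = int(bool(listing[field]))
    # Length of the loan purpose statement as the linguistic indicator
    # (measured in characters here, which is an assumption).
    features["description_length"] = len(listing["description"])
    # Loan-feature controls; the amount enters in logs.
    features["log_amount"] = math.log(listing["amount_yuan"])
    features["interest_rate"] = listing["interest_rate"]
    features["term_months"] = listing["term_months"]
    return features


# A made-up listing, for illustration only.
example = {
    "income_group": 5, "house_owner": True, "car_owner": False,
    "has_mortgage": False, "income_verified": True, "weibo_verified": False,
    "video_verified": True, "mobile_verified": False,
    "description": "Working capital for a small shop",
    "amount_yuan": 60000, "interest_rate": 0.149, "term_months": 16,
}
features = encode_listing(example)
print(features)
```

Omitting the reference group avoids perfect collinearity with the intercept, which is why the regression results are reported relative to income group 4.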
We summarize the descriptive statistics of all the independent variables in Table 1. Table 2 shows the logit regression results for Model I and Model II. Taking the mean income group 4 as the reference group, the results show that income has a positive relationship with funding success: income groups lower than 4 are less likely to receive loans, while groups higher than 4 are more likely than the average group to have loans funded. This reflects the common sense of peer investors, who believe higher income means better solvency and more trustworthiness, and it is consistent with most of the research in the field, such as (Pötzsch and Böhme 2010). However, the default results suggest that this is not the case: the lower-income groups are negatively correlated with default and thus actually have a lower default probability (e.g., income groups 2 and 3), while the high-income groups default more (e.g., income groups 6 and 7 are more likely to default than income group 4). This may be because borrowers have an incentive to overstate their income to present a more trustworthy image to lenders, while the lenders did not recognize the risk of false information. Moreover, the value of income verification has not been recognized: the high verified income group has a lower default probability. Nevertheless, compared to income group 4, investors give more loans to income group 3 than to groups 5, 6, and 7, which is evidently a TYPE II error that provides loans to those with lower creditworthiness. This results from the misdiagnosis of the income signals. It also implies the necessity of verifying key information on the P2P platform. Since there is no credit rationing process on the platform, the judgment rests purely on non-professional lenders, and the validity of the information provided on the platform becomes critical.
Table 2 presents the logit regression results for the funding probability model and the default prediction model. Heteroscedasticity-robust standard errors are in parentheses. *** p < 0.01, ** p < 0.05, * p < 0.1. The numbers associated with the variable 'income' refer to income groups. The sample includes 7 income groups.

Results
After comparing the logit regression results from both models, we can see that, except car ownership, all other hard information variables have either opposite results when compared to each other or different significance levels.
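The comparison logic can be sketched as a small classification rule. This is our own illustrative formalization, not the authors' procedure: the labels, the 5% significance threshold, and the coefficient values in the example are all hypothetical and are not taken from Table 2. The idea is that a signal is read correctly when it moves funding probability and default probability in opposite directions, and is misread when investors reward a characteristic that is associated with more defaults (or penalize one associated with fewer).

```python
def diagnose(funding_coef, funding_p, default_coef, default_p, alpha=0.05):
    """Classify how investors read one signal from its two logit estimates."""
    funding_sig = funding_p < alpha
    default_sig = default_p < alpha
    if not funding_sig and not default_sig:
        return "ignored and uninformative"
    if funding_sig and not default_sig:
        return "over-weighted"   # investors react, but the signal does not predict default
    if not funding_sig and default_sig:
        return "overlooked"      # the signal predicts default, but investors do not react
    # Both significant: a correctly read signal has opposite signs in the two models.
    if funding_coef * default_coef < 0:
        return "correctly diagnosed"
    return "misdiagnosed"


# Hypothetical (coefficient, p-value) pairs: funding model first, default model second.
illustrative_signals = {
    "signal_a": (0.80, 0.001, -0.50, 0.010),
    "signal_b": (-0.40, 0.020, -0.60, 0.030),
    "signal_c": (0.30, 0.010, 0.10, 0.400),
}

for name, estimates in illustrative_signals.items():
    print(name, "->", diagnose(*estimates))
```

In this scheme, signals labeled "misdiagnosed" or "over-weighted" are the ones that would push funds toward borrowers who are not in fact more creditworthy.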
The median income group 4 is used as the reference category, revealing that lower-income groups (1, 2, 3) are less likely to receive loan funding than the median income group (4), whereas higher-income groups (5, 6, 7) are more likely to be funded. The funding probability model shows an interesting result: the interaction of verified income and declared income elicits the opposite effect. Surprisingly, verified higher-income groups are less preferred by investors. Combined with the results of the default prediction model, we find that verified higher-income groups show a lower default probability. However, higher-income groups without income verification demonstrate a higher probability of default. The implication may be that people in higher-income groups are more inclined to be dishonest regarding their incomes. In Table 3, we further analyze the distribution of income verification; the results show that the income verification percentage increases with income level. Applicants in income groups 1 and 2 are very unlikely to verify their income, the verification percentage being only around 0.3%. On the other hand, the high-income groups all have a verification percentage above 14%. However, as we can see from the regression results, investors are less willing to lend to verified high-income groups than to the average income group, although verified high-income groups have a lower probability of default. At the same time, investors are more willing to lend to unverified high-income groups, who actually have a higher probability of default. This induces TYPE II errors among the investors: they fail to read income verification in high-income groups as a positive signal of creditworthiness, and they lend more funds to those who have a higher probability of default. Table 3 shows the distribution of verified income and the percentage it represents of total applications in each income group.
Lenders tend to prefer borrowers with fixed assets such as houses or cars. However, only car ownership is seen to be a significant indicator of reduced probability of default. House ownership is unable to secure loan repayment, a finding consistent with (Jiménez and Saurina 2004), in which loans with collateral are often linked to higher default rates. Additionally, since loans in the P2P market are usually small, a car is easier to monetize, whereas liquidating a house for loan repayment is more time-consuming and complicated. As far as mortgage loans are concerned, investors prefer borrowers without any debt. However, the default model suggests that the probability of default is lower for people with mortgage loans. This could be because people with mortgage loans are more concerned about their creditworthiness.
For soft information, mobile verification exhibits the opposite result in the logit regression. It is negatively correlated to funding probability, but also negatively correlated to default. This means that borrowers who have mobile verification are less likely to default but are also less likely to get the loan funded. From Table 4, we can see that the percentage of mobile verified in successful loans (4.77%) is much less than in defaulted loans (17.87%).
Additionally, the percentages of successful and non-defaulted loans differ substantially between mobile-verified and video-verified loans. Successful mobile-verified loans represent 26.6% of all verified loans, among which only 3.9% defaulted. This is lower than the total default rate of 4.6%. This substantiates a positive relationship between mobile verification and the creditworthiness of the borrowers. However, lenders cannot effectively diagnose the signal and categorize the borrowers by this feature.
Non-financial information can improve the prediction model and can sometimes even outperform financial information in predicting default, as shown by (Fernando et al. 2020) and (Bhimani et al. 2013) using business loans. We now add further evidence from a microfinance dataset. Table 4 shows the distribution of mobile verification in funded and unfunded loans, and in non-defaulted and defaulted loans. Video verification also showed opposite results in the logit regression comparison, which is consistent with (Duarte et al. 2012), where borrowers' willingness to show their appearance does not indicate that they have higher creditworthiness. However, most lenders place great trust in video verification, since the indicator is significantly correlated with loan success. As shown in Table 5, in contrast to mobile-verified loans, 61.29% of video-verified loans succeeded in funding, while 8.2% defaulted, which is 3.6 percentage points higher than the total default rate of 4.6%. This may be because borrowers who bear higher risk are willing to offer more information, indicating a classic case of adverse selection and the existence of a TYPE II error. Table 5 shows the distribution of video verification in funded and unfunded loans, and in non-defaulted and defaulted loans. We can also see from the significance levels of the variables that all the hard information is significant in the funding probability model except house ownership, but becomes less significant in the default prediction model. This phenomenon does not exist for the soft information variables, whose results are more consistent across both models. This suggests that lenders were less capable of diagnosing the signals from hard information than those from soft information.
From our regression results, we can see that investors were not able to effectively diagnose most of the useful information in the signals provided by borrowers, especially the hard, financially related signals. This indicates that investors on the P2P platform may have lacked financial literacy regarding credit appraisal. Their biased investment decisions may have created credit risk for the disintermediated financial system. On the other hand, the P2P investors reacted surprisingly well to soft signals. They correctly diagnosed the effect of age, gender, educational level, marital status, and social media on creditworthiness. This has an important policy implication: in a financial environment with a weak credit bureau and limited financial literacy, soft information may even perform better in credit screening. Adding more socially related soft information to the credit rationing model could mitigate adverse selection in disintermediated financial institutions.

Discussion and Conclusions
This paper examines whether online P2P investors can accurately and effectively diagnose signals of creditworthiness during their decision-making process. According to our findings, TYPE II errors exist in the investors' decision-making process. Comparing the coefficient signs of the determinants of loan default with those of loan funding shows that the investors were predisposed to making inaccurate diagnoses of signals and gravitated to borrowers with low creditworthiness, while inadvertently screening out their counterparts with high creditworthiness.
This happens particularly with hard, financially based signals. Specifically, signals such as income and property ownership were insignificant or provided contradictory guidance in terms of default, yet investors allocated disproportionate weight to them when deciding which loans to fund. Surprisingly, investors were more adept at diagnosing soft social signals than hard financial signals. That is, the directions of the soft signals in the loan funding process were found to accurately reflect those in the default prediction model, with the exception of softer signals such as video and mobile verification. These results suggest that soft social information can be a compensatory solution when hard information is not solid enough. The absence of a solid credit bureau is typically the main problem for credit appraisal in developing countries, and as our results show, soft information can provide an alternative solution to this problem in credit analysis. Due to data limitations, our soft information is restricted to social identity information. However, with the development of artificial intelligence and machine learning, softer information relevant to social behavior, such as social networks and mobile usage behavior, can provide more comprehensive angles for credit analysis in microfinance and deserves further research.
Our paper demonstrated the existence of TYPE II errors in the disintermediated lending market, indicating a high potential credit risk in financial markets. Due to the growing size of the Fintech industry, this may pose systematic risk to financial systems, requiring regulators' close attention. In addition, we believe the misidentification of creditworthiness signals can be alleviated by a sophisticated and independent credit bureau and by increasing public financial literacy. Meanwhile, expanding the use of social soft information could also mitigate adverse selection in disintermediated financial institutions. This process must be accompanied by transparent and effective oversight of the use of soft information in order to avoid abuse.

Conflicts of Interest:
The authors declare no conflict of interest.

Appendix B. Robustness Check
Since the dataset covers 2010 to 2014, changes in the macroeconomic environment during these years may have influenced the decisions of the investors and the behavior of the borrowers. As China has 36 different regions, regional differences may be found in financial behavior. Thus, we added region and year dummy variables to the model to control for time and region fixed effects. The loan application distributions by region and by year are listed in Tables A2 and A3, respectively. The regression results with region and year dummies are presented in Table A4. The results are in line with the original regression, in that most of the hard information variables have opposite results in the two models while most of the soft variables have consistent results. This further supports the existence of TYPE II errors in the investors' decision-making process.
Notes: Heteroscedasticity-robust standard errors in parentheses. *** p < 0.01, ** p < 0.05, * p < 0.1.
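The region and year controls described above are ordinary dummy variables. As a minimal sketch of how such columns could be constructed, the following Python snippet one-hot encodes a categorical column and drops one category as the reference level. The helper `make_dummies`, the example listings, and the region labels are hypothetical; the paper does not state which reference categories were used, so dropping the first category in sorted order is an assumption.

```python
def make_dummies(values, prefix):
    """One-hot encode a categorical column, dropping the first sorted category
    as the reference level (an assumption; any category could serve)."""
    categories = sorted(set(values))
    reference, kept = categories[0], categories[1:]
    columns = {
        f"{prefix}_{cat}": [1 if v == cat else 0 for v in values]
        for cat in kept
    }
    return reference, columns


# Hypothetical listings: the years fall in the 2010-2014 sample window,
# while the region labels are placeholders.
years = [2010, 2011, 2014, 2012, 2010]
regions = ["A", "B", "A", "C", "B"]

year_ref, year_dummies = make_dummies(years, "year")
region_ref, region_dummies = make_dummies(regions, "region")

print("reference year:", year_ref, "| dummy columns:", list(year_dummies))
print("reference region:", region_ref, "| dummy columns:", list(region_dummies))
```

The resulting columns would be appended to the regressors of both the funding and the default model, so that the remaining coefficients are estimated within year and region.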
To check for multicollinearity, we analyzed the variance inflation factors (VIF) of our chosen variables. The results show that the VIFs of all the independent variables are below 2, with an average of 1.27. In other words, the variance of the estimated coefficients is inflated only slightly, well within the common rule-of-thumb threshold of 10. For verification, we also calculated the square root of the VIF, the R-squared from regressing each independent variable on the remaining independent variables, and the tolerance, which is computed as 1 minus R-squared. The results indicate that multicollinearity is not a concern.
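The diagnostics listed above are linked by the standard definition VIF = 1 / (1 - R²), where R² comes from regressing one independent variable on the others. The short Python sketch below derives the tolerance, the implied R², and the square root of the VIF from a given VIF value. The function name `vif_diagnostics` is introduced here for illustration; the inputs 1.27 and 2 are the average and upper bound reported above, and 10 is the rule-of-thumb threshold. Per-variable VIFs are not reproduced because they are not given in the text.

```python
import math


def vif_diagnostics(vif):
    """Derive the companion diagnostics implied by a variance inflation factor."""
    if vif < 1:
        raise ValueError("a VIF cannot be smaller than 1")
    tolerance = 1.0 / vif          # tolerance = 1 - R^2
    r_squared = 1.0 - tolerance    # R^2 of the auxiliary regression
    sqrt_vif = math.sqrt(vif)      # factor by which the standard error is inflated
    return {"tolerance": tolerance, "r_squared": r_squared, "sqrt_vif": sqrt_vif}


cases = [
    ("reported average", 1.27),
    ("reported upper bound", 2.0),
    ("rule-of-thumb threshold", 10.0),
]

for label, vif in cases:
    d = vif_diagnostics(vif)
    print(
        f"{label}: VIF={vif:.2f} tolerance={d['tolerance']:.3f} "
        f"R^2={d['r_squared']:.3f} sqrt(VIF)={d['sqrt_vif']:.3f}"
    )
```

Read this way, an average VIF of 1.27 corresponds to an auxiliary R² of roughly 0.21, whereas the threshold of 10 would require an auxiliary R² of 0.9.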