Lender Trust on the P2P Lending: Analysis Based on Sentiment Analysis of Comment Text

Abstract: Lender trust is important to ensure the sustainability of P2P lending. This paper uses web crawling to collect more than 240,000 unique pieces of comment text data. Based on the mapping relationship between emotion and trust, we use the lexicon-based method and deep learning to measure lenders' trust in P2P lending. Further, we use the Latent Dirichlet Allocation (LDA) topic model to mine the topics that concern lenders. The results show that lenders are positive about P2P lending, though this tendency fluctuates downward with time. The security, rate of return, and compliance of P2P lending are the issues of greatest concern to lenders. This study reveals the core subject areas that influence a lender's emotions and trust and provides a theoretical basis and empirical reference for relevant platforms to improve their operational level while enhancing competitiveness. This analytical approach offers insights for researchers to understand the hidden content behind the text data.


Introduction
P2P (Peer-to-Peer) lending has garnered significant attention in recent years due to its competitive fees and expedited loan approvals relative to those offered by banks, thus offering important economic benefits to individuals and SMEs [1]. The number of platforms offering loans and the loan amounts themselves have increased significantly in many countries. In China, the P2P lending industry expanded by a factor of 60 from 2013 to 2017 [2]. However, in recent years, many countries have become increasingly stringent in their regulation of P2P lending [3]. For example, since 2017, hundreds of P2P lending platforms have collapsed in China. Monthly transaction volume in P2P lending dropped from 250.844 billion yuan in March 2017 to 42.889 billion yuan in December 2019, a total decrease of 82.9% [4]. The plunge in transaction volume highlights the current crisis in lenders' trust in P2P lending. P2P lending directly connects borrowers with lenders. To ensure the sustainability of the P2P lending industry, it is critical that lenders have a robust understanding of P2P lending [5]. Only by maintaining the full trust of a large number of lenders can it be ensured that borrowers will be able to raise sufficient funds, which in turn ensures the normal operation and sustainability of a given P2P lending platform. In the context of the many platforms that have collapsed and the steady stream of bad news regarding the P2P lending industry, whether lenders can trust P2P lending, how their level of trust has changed, and which issues concern them most are significant questions. None of these questions has been previously answered. Existing research on P2P lending focuses on how lenders select platforms and lending targets [6], but pays less attention to lenders' trust in and perception of P2P lending.
Although a small number of inquiries into the trustworthiness of P2P lending have used questionnaires, traditional questionnaires have notable drawbacks, such as high cost and significant time consumption [7]. The questionnaire survey method also limits the sample to those who answer the questionnaire, limits the thinking of the respondents to the designed questions, and restricts the research perspective to that of the surveyor.
Lenders have expressed their attitudes and perceptions regarding P2P lending on the Internet. Such data provide rich material for conducting P2P lending research. Based on the mapping relationship between lenders' emotions and trust, we collected lenders' comment data regarding P2P lending platforms and used text sentiment analysis to explore the degree of lender trust in P2P lending and how it changes over time. We then used the LDA topic model to examine the main issues that concern lenders, mining hidden information to clarify the topics associated with P2P lending that lenders deem important.
The research based on the text data of online reviews has greatly expanded the scope of available information. All groups using online resources can be used as data points, expanding on the original scope limited to the questionnaires. The review text is the personal experience or sentiment of the lender, and enables the lender to clarify the issues of concern more succinctly so that researchers can better understand the industry as well as the topics associated with the P2P lending that the lender cares about.
The remainder of the paper is organized as follows. Section 2 presents the literature review. Section 3 provides the research design and data processing. Section 4 presents the results. Finally, conclusions are presented in Section 5.

Literature Review
Trust has been defined by many scholars in psychology, sociology, and other fields. Among these definitions, the viewpoint of Guiso et al. [8] is widely accepted: trust is a person's comprehensive judgment of the party being trusted, in which the person believes that the trusted party will not harm their interests and is willing to bear the associated risks; that is to say, trust concerns the possibility of being deceived by the person in whom trust is placed. Specifically, a lender forms trust by making a comprehensive judgment of the relevant information, weighing the benefits and risks of investing in P2P lending platforms, and then determining how much to trust P2P lending.
P2P lending is a new financial method that still falls prey to a high level of information asymmetry between borrowers and lenders [9]. When faced with this new model, lenders must transition from a lack of trust to initial trust. Kim et al. [10] show that the lender's own tendency to trust plays an important role in the formation of initial trust. Lenders who are willing to try new things are more likely to invest in P2P lending after learning about the benefits of this new financial method. The lender's propensity to trust, borrower-related information, and the level of security of the P2P lending platform jointly determine the lender's transactional trust [11]. The difference in each person's growth environment determines the trust tendencies of a given lender, which is something that P2P lending platforms cannot directly affect. However, to address the trust problem caused by asymmetric information, platforms can require borrowers to provide relevant information. This increases a lender's trust in the P2P platform. Some borrowers will also add information under the encouragement of the platform or as a means of increasing the possibility of obtaining a loan. This information is mainly divided into three categories: information directly related to the loan, such as the loan amount, the borrower's credit rating, interest rate, term, and purpose; personal information about the borrower, such as age, gender, education level, and financial status; and borrower social network information. Many factors affect the lender's trust. In addition to the borrower's credit rating, loan amount, loan rate, loan term, and other information directly related to the loan [12], factors such as age [13], gender [14], appearance [15], and education premiums [16] also affect the lender's decision to allocate funds. The research on PPDAI shows that the more friends a borrower has, the easier it is to win the trust of the lender.
Likewise, if the borrower's friend's credit rating is higher, the borrower is deemed to be more trustworthy and can get a loan more quickly. The empirical results underscore the role of social network information [17]. With the increase in the number of investments, the lenders can more clearly identify the risk of lending and determine which borrowers are more trustworthy [18].
In addition to hard information, soft information also affects lender trust. When applying for a loan, a borrower states the reason for borrowing. This text cannot be directly compared in the way that the loan amount, credit rating, or other hard data can, but it still affects the lender's judgment. Herzenstein et al. [19] showed that, after controlling for other factors, borrowers who provided a reason for borrowing had higher success rates than those who did not. Larrimore et al. [20] show that the more a borrower explains their financial situation and the more specific the content is, the more likely they are to obtain a loan. Research on Chinese P2P lending platforms suggests that borrowers can improve their credit image and enhance the possibility of obtaining loans by modifying text information [21]. Gao and Lin [22] demonstrated that the text explaining a borrower's reason for borrowing is related to loan default: borrowers whose descriptions are highly readable and objective have a lower default probability. This research shows that textual data plays an important role in P2P lending. In recent years, a series of studies have attempted to quantify textual sentiment, for example by using Twitter data to gauge investor mood [23], building tourist recommendation systems [24], and analyzing product feature evaluations [25]. However, less research has been done on textual analysis regarding P2P lending. In sum, there has been considerable interest in P2P lending, with a focus on lender behavior as well as borrower information. Research on lender emotion is sparse, and the large amount of unstructured lender-generated data is rarely used.

Research Design
From 2017 to now, hundreds of P2P lending platforms have collapsed in China, and many savers cannot get their money back. The monthly transaction volume of P2P lending has dropped from 250.844 billion yuan in March of 2017 to 42.889 billion yuan in December of 2019, a decrease of 82.9% [4]. Under the circumstances, some lenders will reduce or even no longer trust P2P lending. Likewise, some lenders deliberately ignored the warning signs from previous bad investments and continued to lend through various P2P lending platforms. Relevant risks change dynamically throughout the process. It is also difficult for lenders to maintain trust in P2P lending at a stable level while following changes in the environment. Lenders' trust in P2P lending is a psychological state formed by the combination of internal emotional factors and external environmental factors [26]. This mental state is directly reflected in the emotional tendencies shown by their behavior and attitude. Specifically, the positive emotions of the public reflect trust, while the negative emotions of the public reflect distrust [27]. Some scholars believe that emotion and trust are fused and cannot be separated [28]. Moreover, one's emotional state is one of the key factors affecting trust [29]. Negative emotions reduce interpersonal trust, while positive emotions have the opposite effect [30]. In other words, lender emotion is both a reflection and an influencing factor of the lender's trust.
Using the Internet, the lender can express views on P2P lending that can be seen as a true reflection of their own opinions. Therefore, based on the inner relationship between emotion and trust, the lender's emotion is the reflection and influence factor of trust. There is a mapping relationship between the lender's emotion and trust, and the lender's emotion can be used as an effective criterion in measuring trust. Trust is the cognition and evaluation of a lender that is formed during the interaction between the lender and the P2P lending process. The lender's emotion, on the other hand, is the psychological state expressed by the lender through the Internet after having invested in a P2P lending platform. The lender's emotion is an effective representation of trust. Therefore, we can describe the tendency of trust, the strength of trust, the changes in trust, and the core factors that affect trust by measuring and analyzing the emotional polarity, emotional intensity, emotional temporal changes, and the emotional structure of the lender. We used Python to write a web crawler capable of collecting comment texts within a limited period on wdzj.com and preprocessing the collected comment texts to form a data set suitable for analysis. To ensure the validity of the results, the lexicon-based method and Bidirectional Encoder Representations from Transformers (BERT) method are used to perform sentiment analysis on the comment text. Based on the sentiment calculation results, we analyzed the overall distribution of different types of sentiments to measure the lender's trust in P2P lending. We then perform a time series analysis of the lender's emotions to reveal the evolution of the lender's trust in P2P lending. Finally, we use word frequency analysis to show high-frequency vocabulary and deepen the understanding of the comment text. 
The LDA topic model is used to analyze the overall text as well as negative-emotion comments as a means of obtaining the relevant topic and clarifying the lender's cognitive topic of P2P lending.
Sentences are segmented with the Python package jieba, and the sentiment calculations are performed with scripts written in Python. The BERT model uses the open-source BERT package on GitHub in a Python 3.6.4 and TensorFlow 1.5.0 environment. Word clouds and high-frequency word statistics are produced with the Python packages jieba and wordcloud. The LDA topic analysis is implemented with the Python package gensim.

Text Processing
This article uses the comment text in the "Comment" column of wdzj.com as its data source. Wdzj.com is the largest and most authoritative P2P lending consulting platform in China in terms of comprehensive industry data. A large number of lenders rate P2P lending platforms on wdzj.com after using them. Furthermore, lenders browse the relevant platform information on wdzj.com when selecting an investment platform and refer to previous lenders' comments to judge whether to lend on a given platform. Because the comment section of wdzj.com has been enriched by a large number of comments since October 2015, text collection commenced on 1 November 2015 and ended on 31 December 2019, a collection period of 50 months. The data contain a total of 243,533 reviews of 2460 P2P lending platforms. To ensure the validity of the results, we removed duplicate content posted by the same person, as well as Uniform Resource Locators (URLs), @user mentions, and other such markers, and deleted blank text along with text that contained only punctuation. Comments containing traditional characters were converted to simplified characters using Excel's simplified conversion function. After data preprocessing, a total of 243,405 comments remained.
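The cleaning steps described above can be sketched as follows. This is a minimal illustration rather than the authors' script: the `(user_id, text)` record layout, the regular expressions, and the function name `clean_comments` are assumptions, and the traditional-to-simplified conversion (done in Excel in the paper) is not reproduced.

```python
import re

# Assumed patterns; the paper does not specify how URLs and @users were matched.
URL_RE = re.compile(r"https?://\S+|www\.\S+")
MENTION_RE = re.compile(r"@\S+")
# \w matches letters, digits and CJK characters, so a text without any \w
# match is treated as blank or punctuation-only.
HAS_CONTENT_RE = re.compile(r"\w")


def clean_comments(records):
    """Clean an iterable of (user_id, text) pairs (hypothetical layout)."""
    seen = set()
    cleaned = []
    for user_id, text in records:
        text = URL_RE.sub("", text)
        text = MENTION_RE.sub("", text).strip()
        if not HAS_CONTENT_RE.search(text):
            continue  # drop blank or punctuation-only comments
        key = (user_id, text)
        if key in seen:
            continue  # drop duplicate content posted by the same person
        seen.add(key)
        cleaned.append((user_id, text))
    return cleaned


sample = [
    ("u1", "收益不错 http://a.com"),
    ("u1", "收益不错"),
    ("u2", "！！！"),
    ("u3", "@平台 很安全"),
]
print(clean_comments(sample))
```

Note that `@\S+` removes everything up to the next whitespace, so a mention that is not followed by a space would need a stricter pattern on real data.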

Sentiment Lexicon
Sentiment analysis is used to analyze texts and divide them into a binary classification of positive and negative emotions. There are two main approaches to sentiment analysis: one is based on machine learning, and the other is lexicon-based. Machine learning methods learn the characteristics of positively and negatively charged sentences from training data and classify new sentences according to the learned features. Lexicon-based methods calculate the sentiment scores of words or evaluation units in sentences and then add them up to determine the emotion of the sentence according to a set threshold [31]. To ensure the validity of the results, we used both the lexicon-based method and BERT to perform sentiment analysis on the lenders' comments. The specific operation of the lexicon-based method is as follows:

Vocabulary Configuration

1. Proper noun lexicon. P2P lending is a highly specialized field, and using only a conventional lexicon generates a large number of erroneous word segmentations. To improve word segmentation accuracy as much as possible, a manually preset "proprietary dictionary" is required. Taking the P2P lending platform "Small and Micro Finance" as an example, without such a dictionary the segmentation algorithm splits "Small and Micro Finance" into two segments: "Small and Micro" and "Finance." In this article, using the web crawler and manual selection, the names of the 2460 P2P lending platforms listed on wdzj.com, product names, and names of key persons are used as proprietary dictionaries to help the program perform correct word segmentation.

2. Expansion and correction of the lexicon. This article uses the sentiment vocabulary provided by Boson NLP, which handles non-standard texts containing a large amount of Internet vocabulary well. Relative to the National Taiwan University Semantic Dictionary (NTUSD) and HowNet lexicon used in the past, it avoids the problem of insufficient Internet vocabulary [32]. P2P loan comments come from the Internet and contain numerous Internet expressions and semantic shifts. The Word2Vec model [33] introduced by Google is used to compensate for the incompleteness of the sentiment vocabulary and the diverse meanings of words. The Word2Vec model maps words into n-dimensional vectors and uses the similarity between two word vectors to determine semantic correlation. Using this property, we can easily collect several synonyms of related words in the corpus. Table 1 shows the top 5 synonyms for the word "scam" returned by the Word2Vec model.

3. Negation words. The negation vocabulary is supplemented by the HowNet vocabulary, with a total of 21 Chinese negation words.

4. Adverbs of degree. The list of adverbs of degree is taken from HowNet's sentiment analysis word set. A total of 218 adverbs of degree are obtained, with intensity weights ranging from 0.8 to 2, where a weight >1 strengthens the emotion and a weight <1 weakens it.

5. Stop words. The stop word list is the one released by the Chinese Natural Language Processing Open Platform of the Institute of Computing Technology, Chinese Academy of Sciences, with a total of 1208 stop words.
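To make the synonym lookup in item 2 concrete, the sketch below ranks words by the cosine similarity of their vectors, which is the usual similarity measure for Word2Vec embeddings. It is only an illustration: the three-dimensional vectors and the English words are hypothetical placeholders, not the trained model or the contents of Table 1, and `most_similar` here is a hand-written helper rather than a library call.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def most_similar(word, vectors, topn=5):
    """Return the topn words whose vectors are closest to that of `word`."""
    target = vectors[word]
    scored = [
        (other, cosine(target, vec))
        for other, vec in vectors.items()
        if other != word
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:topn]


# Hypothetical toy embeddings; real Word2Vec vectors have far more dimensions.
toy_vectors = {
    "scam": [0.90, 0.10, 0.00],
    "fraud": [0.80, 0.20, 0.10],
    "swindle": [0.85, 0.05, 0.05],
    "return": [0.00, 0.90, 0.30],
}
for candidate, similarity in most_similar("scam", toy_vectors, topn=2):
    print(candidate, round(similarity, 3))
```

Words returned this way can then be reviewed manually before being added to the sentiment lexicon.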

Emotion Computing
Word segmentation is carried out using "jieba" (https://github.com/fxsjy/jieba). To calculate the sentiment polarity and intensity of a text, the segmented words are processed as follows: emotional words are identified together with the negative words and adverbs of degree that accompany them, and each emotional word and its modifiers form an emotional phrase. For example, in "is not very friendly (不是很友好)", "is not (不是)" is a negative word, "very (很)" is an adverb of degree, and "friendly (友好)" is an emotional word, so the score of this emotional phrase is: score = (−1) × 1.25 × 1.14 = −1.425. Here, −1 and 1.25 are the values of the negative word and the adverb of degree, respectively, and 1.14 is the emotional score of the emotional word "friendly." The score of each sentence represents its emotional intensity. A sentence with a score greater than 0 is classified as positive, and a sentence with a score less than 0 is classified as negative.
To limit the influence of extremely positive and extremely negative sentiment values, after all sentences have been scored we sort the sentiment values and replace the extreme values outside the 1-99% interval: values below the 1st percentile are replaced with the 1st-percentile value, and values above the 99th percentile are replaced with the 99th-percentile value.
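The phrase scoring in the example can be sketched as below. This is a simplified illustration under stated assumptions: the input is already segmented (jieba is not called), the tiny dictionaries contain only the three words and values from the example, and the rule that modifiers accumulate until the next emotional word is an assumption about how phrases are delimited.

```python
# Toy lexicons holding only the values quoted in the example.
SENTIMENT = {"友好": 1.14}   # emotional word -> sentiment score
DEGREE = {"很": 1.25}        # adverb of degree -> multiplier
NEGATION = {"不是"}          # each negative word contributes a factor of -1


def sentence_score(tokens):
    """Sum the scores of the emotional phrases in a segmented sentence."""
    total = 0.0
    modifier = 1.0
    for token in tokens:
        if token in NEGATION:
            modifier *= -1.0
        elif token in DEGREE:
            modifier *= DEGREE[token]
        elif token in SENTIMENT:
            total += modifier * SENTIMENT[token]
            modifier = 1.0  # assumed: modifiers reset after each emotional word
    return total


score = sentence_score(["不是", "很", "友好"])
label = "positive" if score > 0 else "negative" if score < 0 else "neutral"
print(round(score, 3), label)
```

Tokens that appear in none of the lexicons are ignored, so a production version would also need the stop word list and a rule for how far a modifier may sit from the word it modifies.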
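The replacement of extreme values is a form of winsorization, which can be sketched as follows. The percentile definition (linear interpolation between order statistics) and the function names are assumptions; the paper does not say which percentile convention was used.

```python
def percentile(sorted_values, q):
    """Percentile q in [0, 100] of a sorted list, by linear interpolation."""
    if not sorted_values:
        raise ValueError("percentile() requires at least one value")
    position = (len(sorted_values) - 1) * q / 100.0
    lower = int(position)
    upper = min(lower + 1, len(sorted_values) - 1)
    fraction = position - lower
    return sorted_values[lower] * (1 - fraction) + sorted_values[upper] * fraction


def winsorize(values, lower_q=1.0, upper_q=99.0):
    """Clip values to the [lower_q, upper_q] percentile range."""
    ordered = sorted(values)
    low_cut = percentile(ordered, lower_q)
    high_cut = percentile(ordered, upper_q)
    return [min(max(v, low_cut), high_cut) for v in values]


scores = [-28.24, 0.5, 1.0, 3.65, 6.15, 8.0, 97.56]  # illustrative values only
print(winsorize(scores))
```

The original order of the comments is preserved; only the magnitudes of the most extreme scores change.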

BERT
The BERT model [34] is a language representation model proposed by Google that performs well in natural language processing. On the Stanford Question Answering Dataset 1.1 (SQuAD1.1), a leading benchmark for machine reading comprehension, it surpasses human performance on all evaluation indicators. Because of its excellent performance in the field of natural language processing, we use BERT to perform sentiment analysis on each comment to obtain its positive or negative classification, and then aggregate the results by month to analyze monthly sentiment changes.
Sustainability 2020, 12, 3293
The BERT model is a pre-trained language model. It uses a deep neural network trained on large amounts of text to capture common features of the language. Previously, commonly used language models such as Embeddings from Language Models (ELMo) and the OpenAI Generative Pre-Training (GPT) model had also been used as pre-trained models, but most of them calculate the probability of a word from left to right. As a result, these models do not make good use of contextual information during learning, which affects the learning of word polysemy and sentence-level grammatical features. BERT uses a bidirectional Transformer encoder, which incorporates the information on both the left and right sides of a word into learning, and uses the Masked Language Model (MLM) and next sentence prediction tasks to deepen its understanding of words and sentences.
For word-level recognition, the BERT model uses the MLM model to overcome one-way learning in the language. During the learning process, MLM randomly masks some words in the model input and then predicts those words that are blocked. The next sentence prediction scheme learns the relationship between sentences by pre-training a binary classification model. Specifically, the scheme randomly replaces some sentences and uses prior sentences to predict subsequent sentences. The prediction accuracy during pre-training can reach 97-98%.
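The masking step of MLM can be illustrated with a short sketch. It is deliberately simplified: each token is independently replaced by a `[MASK]` placeholder with a fixed probability, and the default of 0.15 is an assumption taken from the original BERT description rather than from this text. The published BERT procedure also sometimes substitutes a random token or leaves the selected token unchanged, which is omitted here.

```python
import random


def mask_tokens(tokens, mask_prob=0.15, seed=0):
    """Randomly mask tokens; return the masked sequence and the targets.

    `targets` maps each masked position to the original token that the
    model would be trained to predict.
    """
    rng = random.Random(seed)
    masked = []
    targets = {}
    for index, token in enumerate(tokens):
        if rng.random() < mask_prob:
            masked.append("[MASK]")
            targets[index] = token
        else:
            masked.append(token)
    return masked, targets


tokens = ["这个", "平台", "的", "收益", "很", "稳定"]
masked, targets = mask_tokens(tokens, mask_prob=0.3, seed=42)
print(masked)
print(targets)
```

Because the model sees the words on both sides of each masked position, predicting the targets forces it to use bidirectional context.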
The core improvement of BERT lies in the use of a bidirectional Transformer encoder structure. Unlike the recurrent structure of a recurrent neural network (RNN), the most important part of the bidirectional Transformer encoder unit is its self-attention mechanism. The core idea of self-attention is to calculate the correlations between all words in each sentence. These correlations reflect the relevance and importance of different words and are used to adjust the weight of each word. The resulting weighted representation captures the relationships between words and is more comprehensive than previous word vectors.
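The self-attention idea can be sketched in a few lines. This is a simplified, single-head illustration in which the input vectors serve directly as queries, keys, and values; an actual Transformer layer applies learned projection matrices and several attention heads, which are omitted here. The toy input vectors are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(vectors):
    """Scaled dot-product self-attention without learned projections."""
    dim = len(vectors[0])
    outputs = []
    attention = []
    for query in vectors:
        # Correlation of this word with every word in the sentence.
        scores = [
            sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
            for key in vectors
        ]
        weights = softmax(scores)
        # New representation: weighted sum of all word vectors.
        mixed = [
            sum(w * vec[i] for w, vec in zip(weights, vectors))
            for i in range(dim)
        ]
        attention.append(weights)
        outputs.append(mixed)
    return outputs, attention


word_vectors = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]  # hypothetical embeddings
outputs, attention = self_attention(word_vectors)
for row in attention:
    print([round(w, 3) for w in row])
```

Each row of `attention` sums to 1 and shows how strongly one word attends to every word in the sentence, including itself.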
In general, BERT uses the information before and after the words to get a better word distribution and deepen the understanding of the language.

Comment Text Topic Analysis
Generally speaking, an article will have multiple topics, with some words appearing more frequently than others. The topic model is a statistical model used to find abstract topics in a series of documents. In recent years, the application of topic models in social networks and computer vision has achieved great success. Commonly used topic models include Latent Semantic Analysis, Probabilistic Latent Semantic Analysis, and LDA. Of the three, the LDA model is the most widely used and effective. The LDA model uses the Dirichlet distribution as prior knowledge of the topic distribution and describes the document generation process in detail. After a document is processed by the LDA model, the topic distribution of the document and the distribution of words in each topic are generated. Figure 1 graphically shows the relevant variables of the LDA model and how they are obtained.
In the LDA model, θ, β, and Z are hidden variables and W is the observed data; in Figure 1, observable values are represented by gray nodes and hidden variables by white nodes. The parameters α and η of the Dirichlet distributions are treated as constants: α is the prior parameter of the topic distribution of each document, and η is the prior parameter of the word distribution of each topic. K is the number of topics, and β_k is the word distribution of topic k. W is the only observable variable: W_{d,n} is the nth word in document d, θ_d is the topic distribution of the dth document, and Z_{d,n} is the topic assignment of the nth word in document d. With D documents and N words per document, the entire generative process of the LDA model can be expressed by Formula (1):
$$p(\beta_{1:K}, \theta_{1:D}, Z_{1:D}, W_{1:D}) = \prod_{k=1}^{K} p(\beta_k \mid \eta) \prod_{d=1}^{D} p(\theta_d \mid \alpha) \left( \prod_{n=1}^{N} p(Z_{d,n} \mid \theta_d)\, p(W_{d,n} \mid \beta_{1:K}, Z_{d,n}) \right) \quad (1)$$
The goal of the LDA model is to obtain the structure of the topic model (θ, Z, and β). By observing the variable W, the relevant posterior distribution can be obtained, as shown in Formula (2):
$$p(\beta_{1:K}, \theta_{1:D}, Z_{1:D} \mid W_{1:D}) = \frac{p(\beta_{1:K}, \theta_{1:D}, Z_{1:D}, W_{1:D})}{p(W_{1:D})} \quad (2)$$
In this paper, the Gibbs sampling method is used to solve the LDA topic model.
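For readers who want to see how Gibbs sampling recovers θ and β from W, the following is a minimal from-scratch sketch of collapsed Gibbs sampling for LDA. It is not the gensim implementation used in the paper; the function name `lda_gibbs`, the hyperparameter defaults, and the toy documents are assumptions chosen for illustration.

```python
import random


def lda_gibbs(docs, num_topics, alpha=0.1, eta=0.01, iterations=200, seed=0):
    """Collapsed Gibbs sampling for LDA on tokenized documents."""
    rng = random.Random(seed)
    vocab = sorted({word for doc in docs for word in doc})
    word_id = {word: i for i, word in enumerate(vocab)}
    vocab_size = len(vocab)

    doc_topic = [[0] * num_topics for _ in docs]                # n_{d,k}
    topic_word = [[0] * vocab_size for _ in range(num_topics)]  # n_{k,w}
    topic_total = [0] * num_topics                              # n_k
    assignments = []                                            # Z_{d,n}

    # Start from a random topic assignment for every token.
    for d, doc in enumerate(docs):
        doc_assignments = []
        for word in doc:
            k = rng.randrange(num_topics)
            doc_assignments.append(k)
            doc_topic[d][k] += 1
            topic_word[k][word_id[word]] += 1
            topic_total[k] += 1
        assignments.append(doc_assignments)

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for n, word in enumerate(doc):
                v = word_id[word]
                old = assignments[d][n]
                # Remove the current token from all counts.
                doc_topic[d][old] -= 1
                topic_word[old][v] -= 1
                topic_total[old] -= 1
                # p(Z=k | rest) is proportional to
                # (n_{d,k} + alpha) * (n_{k,w} + eta) / (n_k + V * eta)
                weights = [
                    (doc_topic[d][k] + alpha)
                    * (topic_word[k][v] + eta)
                    / (topic_total[k] + vocab_size * eta)
                    for k in range(num_topics)
                ]
                new = rng.choices(range(num_topics), weights=weights)[0]
                assignments[d][n] = new
                doc_topic[d][new] += 1
                topic_word[new][v] += 1
                topic_total[new] += 1

    # Smoothed estimates of theta (per document) and beta (per topic).
    theta = [
        [(count + alpha) / (len(doc) + num_topics * alpha) for count in counts]
        for doc, counts in zip(docs, doc_topic)
    ]
    beta = [
        {
            vocab[v]: (topic_word[k][v] + eta)
            / (topic_total[k] + vocab_size * eta)
            for v in range(vocab_size)
        }
        for k in range(num_topics)
    ]
    return theta, beta


toy_docs = [
    ["rate", "return", "rate"],
    ["security", "risk", "security"],
    ["return", "rate"],
    ["risk", "security"],
]
theta, beta = lda_gibbs(toy_docs, num_topics=2, iterations=100, seed=1)
for topic in beta:
    print(sorted(topic, key=topic.get, reverse=True)[:2])
```

Each row of `theta` and each topic in `beta` is a probability distribution, matching the roles of θ_d and β_k in Formula (1).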

Comment Text Topic Analysis
There are a total of 243,405 valid comment texts, an average of 4868 per month and 160 per day, involving 2460 platforms. The results of the lexicon-based method show that 22.55% of the review texts are negative and 77.45% are positive. When extreme values are not replaced, the average sentiment value of the comment text is 6.61, the minimum is −28.24, and the maximum is 97.56. The first quartile is 0.92, the median is 3.65, and the third quartile is 6.15. Table 2 shows the summary statistics of the lexicon-based method. The results of BERT show that 23.30% of the comments are negative and 76.70% are positive. Whether the lexicon-based method or BERT is used, the proportion of negative sentiment is low. Overall, lenders' perception of P2P lending is relatively positive.

Negative Sentiment Timing Analysis
We divide the comments by month and then calculate the proportion of monthly negative comments to the total monthly texts. Figure 2 shows the proportion of negative sentiment in each month obtained from the lexicon-based and BERT methods.
Figure 2 shows that during the observation period, the trend of the negative sentiment ratio is the same whether it is obtained by the lexicon-based method or the BERT method. The proportion of negative texts fluctuated over time but, in general, showed an upward trend. This indicates that lenders' overall level of trust in P2P lending has deteriorated.
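The monthly negative ratio plotted in Figure 2 can be computed with a simple grouping step, sketched below. The `(date, score)` input layout with ISO-formatted date strings and the function name are assumptions, and the sample values are illustrative rather than taken from the data set.

```python
from collections import defaultdict


def monthly_negative_share(comments):
    """Share of negative comments per month.

    `comments` is an iterable of (date_string, score) pairs, where the
    date is formatted as 'YYYY-MM-DD' (assumed layout) and a score below
    0 marks a negative comment.
    """
    totals = defaultdict(int)
    negatives = defaultdict(int)
    for date_string, score in comments:
        month = date_string[:7]  # 'YYYY-MM'
        totals[month] += 1
        if score < 0:
            negatives[month] += 1
    return {month: negatives[month] / totals[month] for month in sorted(totals)}


sample_comments = [
    ("2015-11-03", 2.0),
    ("2015-11-20", -1.0),
    ("2015-12-01", -0.5),
    ("2015-12-15", 4.2),
    ("2015-12-28", 1.3),
]
for month, share in monthly_negative_share(sample_comments).items():
    print(month, round(share, 3))
```

The same grouping works for the BERT results if each comment's predicted label is encoded as a negative or non-negative score.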

Sentiment Changes of the Lender
After obtaining the sentiment value of each comment, we compute the monthly average of the lexicon-based sentiment values of the review texts; the results are shown in Figure 3. Figure 3 shows that lenders' sentiment toward P2P lending becomes increasingly negative over time. From November 2015 to December 2019, the sentiment value falls significantly: it dropped from an initial 8.99 points to 1.17 points in December 2019, though there were occasional rebounds during the decline (the highest point was 9.06 points in March 2016). In general, however, lenders' sentiment toward P2P lending decreases, reflecting a decline in their trust in P2P lending. It should be noted that lenders' trust in P2P lending declined to a certain extent from October 2016 to November 2016 and from June 2018 to July 2018. During these periods, a large number of P2P lending platforms collapsed, and these collapses were accompanied by a decline in the level of lender trust.
To demonstrate the changes in the P2P lending industry, Figure 4 illustrates the changes in the number of platforms, transaction amounts, frequency of comments, and sentiment values. It can be seen that transaction amounts started to decrease after reaching their peak in March 2017, while the number of platforms has been decreasing throughout. The reduction in both transaction amounts and the number of platforms is very large, which is another reflection of the declining trust of lenders in P2P lending. Figure 4 also shows a strong linear correlation between the number of platforms and the sentiment value, with a Pearson correlation coefficient of 0.902. In addition, the overall market yield on P2P lending has fallen from 12.45% in November 2015 to 9.46% in December 2019. During this time, the interbank interest rate first rose and then fell, but P2P lending market yields decreased. Lenders invest in P2P lending platforms to earn revenue, so a decrease in yields is likely to lead to lower levels of lender trust.
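For readers who want to reproduce a correlation of this kind, the Pearson coefficient between two monthly series can be computed as in the sketch below. The function `pearson` is a plain implementation of the standard formula, and the two short series are hypothetical values chosen only to make the example runnable; they are not the study's data and do not reproduce the reported 0.902.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long numeric sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if sd_x == 0 or sd_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (sd_x * sd_y)


# Hypothetical monthly values (number of platforms, mean sentiment value).
platforms = [3400, 3000, 2500, 1800, 1000]
sentiment = [8.5, 7.9, 6.0, 4.2, 1.5]
r = pearson(platforms, sentiment)
print(round(r, 3))
```

A coefficient close to 1 indicates that the two series move together almost linearly; it does not by itself establish which one drives the other.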
To analyze the lender's trust in P2P lending on a month-by-month basis in more detail, we show how the values of positive and negative emotions change from month to month. The comment data of each month are divided into positive and negative sentiment texts, and the average values of positive emotion and negative emotion in each month are calculated, as shown in Figure 5.
It can be seen from Figure 5 that the average positive emotion of the lender decreases, which indicates that the lender's trust in P2P lending has declined. Positive sentiment texts account for a large share of all the texts, so the trend of the positive sentiment texts strongly affects the overall perception of P2P lending. It can also be seen from Figure 5 that the trend of the positive emotion value is similar to the overall trend. The negative sentiment value first increases and then decreases, and with the collapse of P2P lending platforms, negative emotions become more negative.
Overall, both positive and negative sentiment values are decreasing, which is one reason why lenders have less trust in P2P lending.
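The monthly averages discussed in this subsection (overall, positive-only, and negative-only) can be sketched as follows. This is a minimal illustration that assumes each comment already carries a signed sentiment score and that positive and negative texts are separated by the sign of that score; the function name `monthly_sentiment_means` and the sample scores are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def monthly_sentiment_means(comments):
    """Return per-month mean scores: overall, positive-only and negative-only.

    `comments` is an iterable of (month, score) pairs. Splitting by the sign of
    the score is an assumption of this sketch; months without positive (or
    negative) comments get None for that entry.
    """
    by_month = defaultdict(list)
    for month, score in comments:
        by_month[month].append(score)
    result = {}
    for month in sorted(by_month):
        scores = by_month[month]
        positive = [s for s in scores if s > 0]
        negative = [s for s in scores if s < 0]
        result[month] = {
            "overall": mean(scores),
            "positive": mean(positive) if positive else None,
            "negative": mean(negative) if negative else None,
        }
    return result


# Hypothetical scored comments; not data from the study.
sample = [
    ("2016-03", 10.0), ("2016-03", 8.0), ("2016-03", -3.0),
    ("2019-12", 4.0), ("2019-12", -2.0), ("2019-12", -6.0),
]
means = monthly_sentiment_means(sample)
for month, values in means.items():
    print(month, values)
```

Plotting the three series against the month gives curves comparable in form to those described for Figures 3 and 5.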

Comment Text High-Frequency Word Analysis
Regardless of whether the BERT method or the lexicon-based method is used, the overall proportions of negative and positive sentiment texts obtained by the two methods differ very little. As a result, the overall trends of the two remain the same, with no significant difference between them. Thus, we choose one of the two methods to perform high-frequency word analysis on the comment text. Because word segmentation has already been carried out during the lexicon-based calculation, we choose the lexicon-based method for the high-frequency word analysis. Figure 6a,b show the comment word cloud and the top ten high-frequency words, respectively.
By observing the word cloud and high-frequency word distribution results, we find that the words "investment," "good," and "return" occur most frequently. This shows that although a lot of negative news about P2P lending platforms has surfaced in recent years, lenders still trust this investment channel. Among the twenty words with the highest frequency, words such as "good," "very good," and "reassured" are positive words, and there are no obvious negative words, which also confirms that the lender has a higher level of trust in P2P lending. However, lenders still have some reservations about the security of P2P lending, which is reflected in the high frequency of the word "security." Lenders contact customer service to check information on related issues such as the existence of the platform and the arrival of funds, which causes the term "customer service" to appear more frequently.
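The high-frequency word counts behind a figure of this kind can be sketched with a simple token counter. The sketch assumes the comments have already been segmented into word lists (segmentation itself is outside the example), and the function name `top_words`, the stop-word set, and the sample tokens are hypothetical.

```python
from collections import Counter


def top_words(segmented_comments, stopwords=frozenset(), n=10):
    """Return the n most frequent tokens as (token, count) pairs.

    `segmented_comments` is a list of token lists, i.e. the comments are
    assumed to be segmented already; tokens in `stopwords` are ignored.
    """
    counts = Counter(
        token
        for comment in segmented_comments
        for token in comment
        if token not in stopwords
    )
    return counts.most_common(n)


# Hypothetical, already segmented comments; not data from the study.
sample = [
    ["investment", "good", "return"],
    ["investment", "security", "good"],
    ["customer service", "investment", "the"],
]
top = top_words(sample, stopwords={"the"}, n=3)
print(top)
```

Running the same counter on only the negative sentiment comments gives the corresponding ranking for the negative subset.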
Research by Yin et al. [35] showed that negative sentiment comments have a greater impact on their readers than positive sentiment comments. In the case of a P2P lending platform, a lender who reads a negative sentiment comment may decide not to lend on the platform. Most negative sentiment texts are generated because the platform fails to meet the lender's expectations, so they better reflect the problems in P2P lending. Analyzing negative sentiment review texts helps to understand the problems associated with a platform, and addressing those problems can even enhance a lender's trust in P2P lending. Figure 7 shows the word cloud and high-frequency vocabulary distribution of the negative sentiment review text.
By observing the word cloud and high-frequency words of negative sentiment reviews, we find that the two words that appear most frequently are "customer service" and "runaway." The high frequency of the word "customer service" in the negative sentiment comment text is unexpected, and it reminds relevant platforms to pay more attention to customer service. Even online, lenders learn about the construction of the platform and the security of funds through channels such as customer service. If customer service cannot be reached, or if satisfactory responses are not received through the customer service channel, dissatisfaction with the P2P lending platform increases greatly.
Platform loss of contact is another term that triggers a lot of negative sentiment among lenders, because it not only makes the lender unable to obtain revenue, but also puts the principal itself at risk. Platforms that have not arranged bank deposits, interest that is not high, and failure to withdraw cash on time also cause negative sentiment among lenders. Further, many of the words appearing in negative sentiment review texts are insults directed toward the platforms, such as "liar" and "junk."

Comment Text Topic Analysis
In the topic analysis, we limit the number of topics for the comment text to four. Table 3 shows the results of the topic analysis on the comment text using the LDA topic model.
The main topics of the lender's review text are as follows:
Topic 1: yield. Putting funds into P2P lending to obtain income is a financial management method for lenders. When lenders choose a platform for investment, they compare the yields of various platforms and other financial management products.
Topics 2 and 3: safety. Due to the risk of platform collapse, lenders have become more cautious about P2P lending and are very concerned about the safety of their funds. This is specifically reflected in the lender's concern about whether the funds can be withdrawn smoothly when they mature.
Topic 4: negative platform evaluations. The collapse of a large number of platforms is an important reason why lenders' trust in P2P lending has decreased. Whether the platform has run away or is overdue is another major issue for lenders. Overdue payments mainly arise when the relevant project is not repaid on time after it matures. The failure of the repayment to arrive on time causes the lender to doubt the legitimacy of the platform.
Because negative comments can better show the lender's doubts about certain issues, we further analyze the negative sentiment text of P2P lenders and use the topic model to explore the topics of the negative sentiment review text.
For the negative sentiment review text, we extract four topics, as shown in Table 4. The topics that trigger negative emotions among lenders are mainly concentrated on the following points: Topic 1: the collapse of the P2P lending platform; Topic 2: the compliance of the platform, in particular whether the platform has realized bank deposits, which is an important part of its compliance and fund security measures; Topic 3: failure to withdraw cash when due, which does not necessarily mean that a problem has occurred on the platform, but may simply mean that the funds cannot arrive in time. In this case, the lender is likely to contact customer service to resolve the doubt, and if customer service cannot give a satisfactory response, the lender's evaluation of the platform is likely to be negative; Topic 4: the fees that lenders need to pay during the recharge or withdrawal process, such as handling fees or management fees. The level of such fees is also a concern of the lender. Likewise, if the lender's funds are left idle during the withdrawal period, this is likely to generate negative sentiment.
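The text does not state which LDA implementation or hyperparameters were used. Purely to illustrate how four topics can be extracted from segmented comments, the sketch below fits LDA by collapsed Gibbs sampling using only the standard library. The function names `lda_gibbs` and `top_terms`, the values of `alpha`, `beta`, and `iters`, and the toy documents are all assumptions of this example and should not be read as the authors' settings.

```python
import random


def lda_gibbs(docs, num_topics=4, alpha=0.1, beta=0.01, iters=200, seed=0):
    """Fit LDA by collapsed Gibbs sampling on a list of token lists.

    Returns (vocab, phi), where phi[k][w] is the smoothed probability of
    vocabulary word w under topic k. Hyperparameters are illustrative.
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    index = {w: i for i, w in enumerate(vocab)}
    size = len(vocab)
    doc_topic = [[0] * num_topics for _ in docs]          # counts n_dk
    topic_word = [[0] * size for _ in range(num_topics)]  # counts n_kw
    topic_total = [0] * num_topics                        # counts n_k
    assignments = []
    # Random initial topic assignment for every token.
    for d, doc in enumerate(docs):
        z_d = []
        for w in doc:
            k = rng.randrange(num_topics)
            z_d.append(k)
            doc_topic[d][k] += 1
            topic_word[k][index[w]] += 1
            topic_total[k] += 1
        assignments.append(z_d)
    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                wi = index[w]
                k = assignments[d][i]
                # Remove the token's current assignment from the counts.
                doc_topic[d][k] -= 1
                topic_word[k][wi] -= 1
                topic_total[k] -= 1
                # Conditional topic weights given all other assignments.
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][wi] + beta)
                    / (topic_total[t] + size * beta)
                    for t in range(num_topics)
                ]
                k = rng.choices(range(num_topics), weights=weights)[0]
                assignments[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][wi] += 1
                topic_total[k] += 1
    phi = [
        [(topic_word[k][w] + beta) / (topic_total[k] + size * beta)
         for w in range(size)]
        for k in range(num_topics)
    ]
    return vocab, phi


def top_terms(vocab, phi, n=3):
    """Return the n highest-probability words for each topic."""
    return [
        [vocab[w] for w in sorted(range(len(vocab)), key=lambda w: -row[w])[:n]]
        for row in phi
    ]


# Hypothetical segmented comments; not data from the study.
docs = [
    ["yield", "interest", "return", "yield"],
    ["withdraw", "funds", "security", "withdraw"],
    ["platform", "collapse", "runaway", "liar"],
    ["fee", "withdraw", "recharge", "fee"],
    ["bank", "deposit", "compliance", "platform"],
]
vocab, phi = lda_gibbs(docs, num_topics=4)
terms = top_terms(vocab, phi, n=3)
for k, words in enumerate(terms, start=1):
    print("Topic", k, words)
```

On a corpus this small the topics are not meaningful; the point is only the mechanics. The topic labels (such as "yield" or "safety") are assigned by the analyst after inspecting the top words of each topic.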

Conclusions
As a new type of financial model, the P2P lending platform provides services that allow lenders to lend money directly to borrowers. In the context of the frequent emergence of bad news regarding P2P lending, we have sought to determine whether the lender's level of trust in P2P lending has changed, and which issues lenders pay most attention to when investing. This article uses lenders' emotions as a trust-measurement tool and uses the lexicon-based method as well as the deep learning method to quantitatively measure lenders' trust in P2P lending and how it changes. Our research also uses the LDA topic model to mine the P2P lending-related issues that lenders care about. We find that the lender's sentiment toward P2P lending becomes increasingly negative over time. Lenders are most concerned about the yield, security, and compliance of P2P lending. The topics of the negative sentiment texts also focus primarily on security and compliance.
Compared to traditional lenders, P2P lending platforms offer lower fees, more flexible terms, and faster loan approvals. However, the trust crisis in P2P lending is affecting the sustainability of the industry, so P2P lending platforms need to improve compliance, yields, and security to improve their operations and ensure the industry's sustainability. Although our research used data from China, the same approach can be applied to data from other countries to analyze the development of P2P lending and promote the sustainability of the P2P lending industry.
In the future, other data, such as data from Twitter, may be collected and used for sentiment analysis of P2P lending. Other deep learning algorithms may also be adapted as a dynamic update mechanism to handle streamed text data.