Assessing and Predicting Green Credit Risk in the Paper Industry

The paper industry is closely related to forestry resources, which constitute an essential part of achieving sustainable development. Green credit can provide financial support to assist the paper industry in achieving carbon neutrality. To develop a method for performing green credit risk assessments in the paper industry, first, an initial index system was established on the basis of two dimensions: financial risk and socio-environmental risk. Then, the KMV model was applied to measure credit risk. The combined results of this model, along with the environmental penalties of an enterprise, formed the basis for the classification of green credit risk. Third, the Gini index was used to filter out, one by one, the indexes with the least influence among the factors, and then random forest iterations were performed until the prediction accuracy reached the optimum, thus establishing a green credit risk prediction model for the paper industry. The results show that the accuracy of the sample classification reached 93.75%, and the accuracy of the sample classification for high-risk enterprises reached 100%. The established index system offers good guidance for the assessment of green credit risk in the paper industry, in which the interest coverage ratio, current ratio, asset-liability ratio, and green emissions are the main factors affecting green credit risk.


Introduction
Nowadays, much more attention is paid to environmental issues in the pursuit of economic growth [1,2]. To speed up reform, green credit will be an important aspect of promoting and achieving ecological progress. Highly polluting industries cause great damage to the environment. It is necessary for the traditional paper industry to cut down a large number of trees, destroying the balance of the original ecosystem, while the paper-making process is accompanied by pollution such as sewage and odors. Therefore, industrially upgrading the paper industry is an imminent necessity. Industrial upgrade and green credit are closely linked, as the paper industry requires the support of green credit in order to be able to carry out technological innovation. The green trade barrier refers to a series of import restriction measures formulated by importing countries in order to protect natural resources, the ecological environment, and human health when carrying out activities related to international trade. As a traditional industry with high pollution and high energy consumption, if the level of green paper-making technology employed by enterprises remains relatively low, these enterprises could easily suffer negative consequences due to green trade barriers, which will not be conducive to the future development of the industry. The paper industry also needs the support of green credit in order to be able to implement policy. China launched "the combination project of forest and paper" policy with the aim of developing commercial forests while taking forest carbon sequestration into account in order to achieve the strategic goal of sustainable development. Green credit risk assessments in the paper industry help commercial banks avoid the risk of default arising due to the imposition of environmental penalties on enterprises whose emissions do not meet environmental requirements. It is helpful to provide financial support for genuine environmental projects and green transformation in the paper industry, as these not only enhance the profitability of commercial banks in terms of green credit, but also promote the flourishing of green industries, assisting with the realization of the goal of sustainable development.
The management of credit risk in banks involves two main aspects: on the one hand, the systemic risk faced by the bank in the course of its own operation, and on the other hand, the credit risk generated by the bank in the course of its operation as a credit business. There are many different methods for measuring systemic risk. Marginal expected shortfall (MES) is often used as an index for measuring systemic risk, and this index has been used to establish a model for risk prediction [3,4]. The SRISK function was constructed to make systemic risk assessment results more stable, such that they would not be affected by short-term market noise [5]. To measure the spillover effect of systemic risk, the widely used CoVaR model was constructed [6]. Domestic and foreign banks focus greatly on the assessment and prediction of credit risk. As a result, many scholars have begun to study credit risk. The Kealhofer, Mcquown and Vasicek (KMV) model, based on the Black-Scholes-Merton (BSM) model, is often used to monitor credit risk [7], and effectively predicts credit risk and reflects the default risk degree of an enterprise [8,9]. To improve risk resistance ability, scholars have used the generalized autoregressive conditional heteroskedasticity (GARCH) model and the threshold regression model to modify the important parameters of the KMV model, which was then used to assess the impact of changes caused by the COVID-19 pandemic on the credit risk of China's healthcare industry [10]. The KMV model has also been used to study the impact of China's insurance industry on credit risk, providing a reference for the formulation of financial stability policies [11]. The KMV model has been applied not only in the assessment of the credit risk of listed companies, but also in the assessment of government debt risk. The governments of all countries are facing an increasing number of financial risks and uncertainties [12]. Scholars have used the KMV model to measure government debt risks, and have studied and analyzed the government debt of emerging countries [13], Chinese government debt [14], and local government debt [15]. With the advent of the big data era, machine learning methods such as support vector machines (SVMs), artificial neural networks (ANNs), and the classification and regression tree (CART), which present few limitations and high data processing power, have gradually begun to be applied in the field of credit risk assessment [16][17][18]. Some scholars have combined the correlation vector machine and the decision tree to form a decision support model (EDSM), and used multicriteria decision-making methods to determine early warning mechanisms, thus improving the accuracy of credit risk assessments, overcoming issues related to timeliness, and reducing the cost of credit risk [19]. To overcome the failures of the fuzzy comprehensive evaluation in the credit risk evaluation of commercial banks, some scholars have built risk evaluation models based on the Hopfield neural network, which can reflect the current credit risk status [20]. Scholars have extended credit risk assessment research to different fields. To alleviate the financing difficulties of small and medium-sized enterprises (SMEs), the credit risk prediction of supply chain finance has become the key to making financing decisions [21][22][23]. With the increasing proportion of personal loans, personal credit risk has been rising, whether for bank loans or peer-to-peer (P2P) network loans. Thus, scholars have conducted an individual credit risk assessment to reduce personal credit risk [24,25]. Some scholars improved the BP neural network model and constructed an integrated IDGSO-BP model for micro-and small enterprises that can effectively improve the accuracy of credit risk assessments [26].
Green finance integrates environmental protection and economic interests [27]. In China, banks use corporate environmental protection as one of the bases for extending green credit. It not only improves the net profit of commercial banks [28], but also reduces the credit risk [29] and environmental pollution risk [30] of commercial banks. Because of the gradual expansion of green credit in China, many scholars have researched green credit risk assessment. Some scholars found that the risks of the wind power and photovoltaics industries were similar to those of traditional industries, but the risk gap between companies was more extensive [31]. To address the issue of innovation benefits and risks arising from green credit in commercial banks, the grey relational analysis (GRA) was used to find that more than half of the banks studied developed green credit with higher benefits and lower risks than the traditional credit business [32]. An improved KMV model was used to study the green credit risk in the new-energy vehicle industry and its internal subsectors. It was found that the risk of the new-energy vehicle industry was similar to that of traditional industries. Still, there were differences between subsectors [33]. Some scholars quantitatively analyzed the green credit risk of listed companies in different sectors based on the backpropagation neural network (BPNN) model. It was found that mediumto high-risk companies accounted for 11.5%, and most companies in the coal industry faced a higher green credit risk [34].
According to previous studies, scholars have conducted deep research on credit risk assessments, using artificial intelligence, big data, and other methods to perfect them. Later, the credit risk assessment method was applied to the field of green credit, where green factors have not been really considered. The green credit research fields are mostly in lowpolluting and low-energy-consuming sectors, such as the new-energy industry, and there is no research on high-polluting and high-energy-consuming sectors, such as the paper industry. Therefore, the characteristics and contributions of this paper are: Firstly, the index system not only takes the green factors into account, but also has paper industry specificity. Considering the paper industry's financial and socio-environmental responsibility, a set of green credit risk evaluation index systems are established. Secondly, compared to other models, the random forest model has a higher accuracy in data mining. This paper combines the random forest with the KMV model, which is widely used in credit risk assessments, to establish the assessment model.

Materials and Methods
This study was divided into four steps (see Figure 1). The first step was to set the initial evaluation index system for the green credit risk of listed paper-making enterprises as the input value of the random forest. The initial evaluation index system included two dimensions: the financial index and the socio-environment index. The second step was to classify the green credit risk for enterprises based on the KMV model. The combined results of this model, along with the environmental penalties of an enterprise, formed the basis for the classification of green credit risk. The third step was to conduct random forest iterations. The Gini index was used to continuously screen out the least significant indicators to explore the main factors affecting the green credit risk of the paper industry. The fourth step was to establish the paper industry's final green credit risk assessment prediction model.

Construct the Initial Evaluation Index System
A reasonable evaluation index system is the basis of the green credit risk assessment, whereas a practical and professional evaluation index system can improve the accuracy and validity of the evaluation results. The green credit implementation regulations of major Chinese commercial banks and the Green Credit Guidelines issued by the China Banking Regulatory Commission (CBRC) in 2012 indicate that the green credit risk assessment index system should not only involve the financial status of enterprises, but also take social and environmental risks into account. Moreover, profitability is related to environmental performance [35], and the assessment of green credit risk can consider more than just the financial performance of enterprises. Therefore, based on the physical truth, this paper established a green credit risk assessment index system for the paper industry to provide a comprehensive response to corporate green credit risk from the financial and socio-environmental perspectives.

Construct the Initial Evaluation Index System
A reasonable evaluation index system is the basis of the green credit risk assessment, whereas a practical and professional evaluation index system can improve the accuracy and validity of the evaluation results. The green credit implementation regulations of major Chinese commercial banks and the Green Credit Guidelines issued by the China Banking Regulatory Commission (CBRC) in 2012 indicate that the green credit risk assessment index system should not only involve the financial status of enterprises, but also take social and environmental risks into account. Moreover, profitability is related to environmental performance [35], and the assessment of green credit risk can consider more than just the financial performance of enterprises. Therefore, based on the physical truth, this paper established a green credit risk assessment index system for the paper industry to provide a comprehensive response to corporate green credit risk from the financial and socio-environmental perspectives.
This paper selected solvency, profitability, operating capacity, and growth ability as the primary indexes to measure the financial credit risk of listed enterprises [36,37]. Each dimension selected four related subindicators as secondary indexes. Solvency is the main index to determine whether an enterprise can repay debts. We divided the solvency into four secondary indexes: the current ratio, quick ratio, asset-liability ratio, and interest coverage ratio. The current ratio, quick ratio, and asset-liability ratio were used to measure the liquidity of the company's assets, and the interest coverage ratio was used to measure the company's ability to pay interest. Profitability is the main index to determine the efficiency of debt repayment. We divided the profitability into four secondary indexes: ROE, ROA, net asset value per share, and net interest rate in sales. Operating capacity is the main index that determines the efficiency of the capital operation of an enterprise. We This paper selected solvency, profitability, operating capacity, and growth ability as the primary indexes to measure the financial credit risk of listed enterprises [36,37]. Each dimension selected four related subindicators as secondary indexes. Solvency is the main index to determine whether an enterprise can repay debts. We divided the solvency into four secondary indexes: the current ratio, quick ratio, asset-liability ratio, and interest coverage ratio. The current ratio, quick ratio, and asset-liability ratio were used to measure the liquidity of the company's assets, and the interest coverage ratio was used to measure the company's ability to pay interest. Profitability is the main index to determine the efficiency of debt repayment. We divided the profitability into four secondary indexes: ROE, ROA, net asset value per share, and net interest rate in sales. Operating capacity is the main index that determines the efficiency of the capital operation of an enterprise. We divided the operating capacity into four secondary indexes: inventory turnover, total assets turnover, accounts receivable turnover, and current assets turnover. Growth capability is the main index that determines the future development capability of an enterprise. We divided the growth capability into four secondary indexes: the revenue growth ratio, net profit growth ratio, net asset growth ratio, and EPS growth rate.
We measured the socio-environmental index from two perspectives: social responsibility and environmental performance. Enterprise social responsibility was measured by the social responsibility score published by Hexun.com. The Hexun Social Responsibility Score for listed companies is the score that Hexun calculates by systematically evaluating corporate social responsibility undertakings through a system of relevant indicators, with scores published on Hexun.com. There is a strong correlation between enterprise environmental disclosure and environmental pollution [38]. Therefore, we divided the environmental performance index into two secondary indexes: green culture and green emissions. The connotation of green culture is rich, and thus, it is impossible to define green culture accurately. In this paper, the keywords appearing in corporate social responsibility reports (CSRs) or annual reports were used to represent enterprise green culture [39], which include social responsibility, energy saving, emission reduction, green ecology, sustainable development, biodiversity, low-carbon production, and recycling. Each keyword appearing in the report was assigned a score of 1. The maximum cumulative score was 8 points, and the minimum was 0 points.
The granting of green credit also needs to be based on the evaluation of the environmental protection situation of polluting industries, such as resource consumption, pollution emissions, and environmental disclosure, in their daily production and business activities. The better the environmental credentials of the enterprise, the higher the degree of pollution treatment. Therefore, this paper used the major pollution emissions of the industry as a proxy for environmental performance [40]. The main pollutant emissions from the paper industry include chemical oxygen demand (VC) and ammonia nitrogen (VN) in wastewater, and sulfur dioxide (VS) and nitrogen oxides (VO) in the exhaust gas. Therefore, VC, VN, VS and VO were included as measures when considering the green value of an enterprise. To exclude the influence of the size of listed companies on pollutant emissions, this paper used annual emissions of a particular pollutant divided by the company's total operating revenue in the current year to represent unit emissions. After that, the entropy-weighted-TOPSIS [41] model was used to measure the green emission score of the sample.
According to the above process, the final green credit risk assessment index system was obtained under two dimensions of the financial and social environment (see Table 1), totaling 19 initial indexes. The index system not only avoided the subjectivity of qualitative indicators, but also made the selection of indicators more industry-specific than the traditional index system.

Enterprise Green Credit Risk Classification
This paper comprehensively considered credit risk and environmental risk to measure whether paper-making enterprises had a high potential for green credit risk.

The Basis for Classifying Enterprise Green Credit Risk
In terms of credit risk, the KMV model was used to calculate the credit risk of enterprises in different years. We must select a standard as the main basis for classifying enterprise credit risk. That is, the paper-making enterprise did not have a high credit risk and was deemed qualified if the default distance (DD) was not lower than a certain number, whereas the paper-making enterprise had a high credit risk and was deemed unqualified if the DD was lower than a certain number. The focus of green credit origination also includes environmental risks. This paper decided whether there was an environmental pollution incident as the basis for "green" judgment. Reviewing the environmental penalty information of each paper-making enterprise from 2017 to 2021, if there were major environmental pollution incidents or fines, that means that the company lacks environmental protection and should be set as unqualified. If there were no major environmental pollution incidents or fines, that means that the company works well with regard to environmental protection and should be set as qualified. We used the combined results of the DD and environmental penalties to classify green credit risk for paper-making enterprises. The output value is 0 or 1. That is, the random forest output value is 1 if one of the paper-making company's credit risk assessment and environmental risk assessment fails, and the random forest output value is 0 if the paper-making company passes both.

Credit Risk Assessment Model
The KMV model has wide applicability and is often used in credit risk assessment. The model evaluates the credit risk of listed companies by calculating the expected default frequency (EDF) from the asset volatility and asset value. The discriminatory accuracy of EDF in China is inferior to that of DD [42]; therefore, DD is often used to judge the credit risk of listed companies. The credit risk assessment based on the KMV model is divided into three steps: The first step is to figure out the asset value and the annual volatility of asset values. The key to the KMV model is to figure out each enterprise's asset value and annual volatility of asset value through the enterprise equity value and annual volatility of equity value. The simultaneous equations are as Equation (1) shows: V E is the equity value of a listed enterprise, σ E is the equity value volatility of a listed enterprise, V A is the asset value of a listed enterprise, σ A is the asset value volatility is the enterprise book value, r is the risk-free rate, and T is the repayment term, all of which can be acquired from publicly available market data. The equity value volatility in Equation (1) is calculated using the historical volatility method, and the formula is as follows: The daily standard deviation of the stock is σ n = , and n is the number of days the stock of the listed company is traded in a year.
The second step is to figure out the default point (DP). By studying historical default data, KMV found that the default condition is only partially valid when the enterprise asset value is lower than the debt value due to long-term debt (LTD). Enterprises choose to default when the value of assets is less than half of the value of short-term debt (STD) plus the value of long-term debt. The formula is as follows: The third step is to figure out the value of the DD. The value of the DD represents the distance of the enterprise market asset value from the DP. Enterprises are less likely to default if the value of the DD is long, and enterprises are more likely to default if the value of the DD is short. Apply σ A and V A , obtained by the Newton iteration method, to Equation (4):

Green Credit Risk Prediction Model
In this paper, we used the random forest model for green credit risk prediction in the paper industry. The initial evaluation index value was used as the input value of the random forest, and the conclusion of green credit risk classification was used as the output value of the random forest.
Random forest is a classification model that combines random sampling and decision trees. The indexes of all samples are entered into the random forest. After that, the sample size and indexes are randomly selected by the bootstrap sampling method to construct the decision tree models. Each decision tree model obtains a different classification result, and thus, the decision tree classifier produces a judgment with the principle of the minority obeying the majority to improve the prediction accuracy. The principle of the random forest algorithm is shown in Figure 2. random forest, and the conclusion of green credit risk classification was used as the output value of the random forest. Random forest is a classification model that combines random sampling and decision trees. The indexes of all samples are entered into the random forest. After that, the sample size and indexes are randomly selected by the bootstrap sampling method to construct the decision tree models. Each decision tree model obtains a different classification result, and thus, the decision tree classifier produces a judgment with the principle of the minority obeying the majority to improve the prediction accuracy. The principle of the random forest algorithm is shown in Figure 2. As a machine learning algorithm, the prediction accuracy of random forest can be further improved by iterative computation on the underlying model. The Gini index applied to the random forest model can discern the importance of the input indexes [43]. The index importance analysis can identify important variations to reduce the dimensions of the original index system, save computing costs, and improve prediction accuracy. The Gini index score determines the importance of each index in predicting green credit risk, and the larger the value that is indicated, the greater the importance. The specific calculation process is as follows: represents the Gini index; represents the category; and represents the proportion of category in node .
Equation (6) shows the importance of the index at node of the th tree. represents the score of variable importance; represents the number of decision trees; represents the indexes 1, 2, 3, … , ; and ( ) and ( ) represent the Gini index of two new nodes after branching. As a machine learning algorithm, the prediction accuracy of random forest can be further improved by iterative computation on the underlying model. The Gini index applied to the random forest model can discern the importance of the input indexes [43]. The index importance analysis can identify important variations to reduce the dimensions of the original index system, save computing costs, and improve prediction accuracy. The Gini index score determines the importance of each index in predicting green credit risk, and the larger the value that is indicated, the greater the importance. The specific calculation process is as follows: GI represents the Gini index; C represents the category; and P qc represents the proportion of category c in node q.
Equation (6) shows the importance of the index X j at node q of the ith tree. V I M represents the score of variable importance; I represents the number of decision trees; J represents the indexes X1, X2, X3, . . . , X J; and GI Suppose Q is the ensemble of nodes where index X j appears in decision tree i; then, the importance of X j in the ith tree is shown in Equation (7): Suppose there are I trees in the random forest. Equation (8) can determine the V I M: The normalized processing of V I M is calculated by Equation (9):

Date Sources
We selected 26 A-share listed paper-making enterprises based on the research period between 2017 and 2021 for the study. The total number of A-share listed paper-making enterprises in China is 40. There was no mandatory requirement for enterprises to publish pollution emission data until 2022, and therefore, the published data were not comprehensive. This paper screened out the paper-making enterprises that were missing serious environmental information disclosures, and the remaining enterprises numbered 26. This paper selected 2017-2021 as the measurement interval because of the incomplete disclosure of corporate environment-related data before 2017. December 31 of each of the five years was used as the base date for the measurements. If there were paper-making companies listed after 2017, the year after listing was taken as the empirical interval.
The paper-making industry is a highly polluting industry, and its main pollutant emissions contain chemical oxygen demand (VC) and ammonia nitrogen (VN) in wastewater, and sulfur dioxide (VS) and nitrogen oxides (VO) in exhaust gases. As a result, we took the pollutants mentioned above as the influencing factors of green emissions. The annual reports and CSRs of each company from 2017 to 2021 were reviewed to obtain the emissions of chemical oxygen demand (VC), ammonia nitrogen (VN), sulfur dioxide (VS), and nitrogen oxides (VO). Then, we screened out the years with incomplete environmental information disclosures of 26 paper companies. Finally, we obtained 104 samples for the study. All financial data of listed enterprises came from the CSMAR, and pollution emission data came from corporate annual reports or CSRs.

Case Study of Green Emissions
The green emission score of each enterprise was calculated with the help of MATLAB (see Table 2). On the one hand, the degree of environmental pollution levels of different paper-making enterprises varies greatly. The highest score was for Yuto Packaging Technology from 2019 to 2020, almost reaching a full score and indicating that the enterprise had light environmental pollution. The lowest score was for Yueyang Forest & Paper in 2017, with a score of 0.3717, indicating that the enterprise had serious environmental pollution. On the other hand, the green emission score fluctuated widely in different years, even for the same paper-making enterprise. The most volatile fluctuation was found for Songyang Recycle Resources, with a variance of 0.02792, indicating that the severity of environmental pollution was variable. The least volatile fluctuation was found for Yuto Packaging Technology, whose score was the top score over the past five years, indicating that the enterprise pays more attention to pollutant emissions.

Parameter Settings for Enterprise Credit Risk Assessment
We evaluated credit risk using the KMV model with the following parameter settings: (1) Parameter setting for the risk-free rate (r).
In this paper, we decided to use the one-year fixed deposit interest rate published by the People's Bank of China (PBC) to make up for the gap in China's official risk-free rate data. If an interest rate adjustment was encountered during the year, the calculation formula would be r = ∑ n 1 r * * the number o f sequential days f or r *

the number o f days in the year
. r * is the real interest rate announced by the People's Bank of China.
(2) Parameter setting for the repayment term (T).
The repayment period was set as one fiscal year, T = 1.
(3) Parameter setting for the equity value (V E ).
In the complex situation that the shareholding reform has been completed and the stock market is mixed with "tradable shares" and "non-tradable shares" at the same time, this paper defined the equity value as follows: enterprise equity value (V E ) = closing price of shares at the end of the year (p) * the number of tradable shares (Q C ) + net assets per share (N APS) * the number of non-tradable shares (Q n ).

Case Study of Credit Risk Assessment
The credit risk assessment of each enterprise was calculated with the help of MAT-LAB (see Table 3). Observing the default distance of each company, we can find that the greatest default distance was obtained by Forest Packing in 2020, with a distance of 5.1820, indicating a relatively low credit risk in that year. The least default distance was obtained by the Chenming Group in 2019, with a distance of −7.9213, indicating a relatively high credit risk in that year. The maximum difference in default distance between listed enterprises was 13.1033, indicating a large credit risk gap between enterprises. In terms of the default distance of the same enterprise over five years, Xianhe Corp. had the most volatile fluctuation in the default distance with a variance of 5.6656, indicating that the credit risk was unstable. Songyang Recycle Resources had the least volatile fluctuation in the default distance with a variance of 0.0287, indicating that the credit risk was stable. First, we can find that most paper-making enterprises had a default distance between −2 and 3. Thus, the probable criteria for judging enterprise default should be between −2 and 3. Moreover, determining the average value of the default distance of these 104 samples was used to clarify the criteria for judging enterprise default. The mean value was 1. That is, if the default distance was less than 1, the credit risk of the paper-making enterprise should be unqualified. Meanwhile, the credit risk of the paper-making enterprise was determined to be qualified if the default distance was greater than or equal to 1.

Parameter Settings for Random Forest
Firstly, we divided all samples into a training set and test set. Bootstrap sampling in the random forest model only contains 37% of the original samples, which means that about 67% will not appear in the training set. Therefore, the ratio of the training set to the test set is generally 7:3. In this paper, we put 72 samples into the training set (n_train), and 32 samples into the test set (n_test).
Secondly, we determined the sampling times (n_tree). The accuracy of the random forest improves to be stable with the increase in sampling times. Therefore, the sampling times were set to 3000 in this paper (n_tree = 3000).
Finally, the number of indexes (m_features) needed to be adjusted, which is one of the most important parameters of the random forest. As the number of indexes increases, the accuracy of the random forest will also improve. However, if the number of indexes continues to rise, the correlation between decision trees will also increase. In other words, the accuracy of the random forest model will decrease as the number of indexes increases.
Generally, the random forest complexity is moderate when m_features = √ M M . We assumed that the initial value of m_features was 0.3.
We used Python to construct the initial random forest algorithm based on the parameters set above. To figure out the optimal parameters given by printing the gradient optimization, we input 0.2, 0.3, and 0.4 into the random forest, which was carried out to find the initial value of m_features and its proximal minima. It can be seen that when m_features=0.4, the random forest model error rate of the test set was the smallest (12.5%), and the model accuracy was the highest (see Table 4). The importance of the 19 indexes in the random forest was measured by the Gini index (see Figure 3). Generally, the random forest complexity is moderate when m_features = √M . We assumed that the initial value of m_features was 0.3.
We used Python to construct the initial random forest algorithm based on the parameters set above. To figure out the optimal parameters given by printing the gradient optimization, we input 0.2, 0.3, and 0.4 into the random forest, which was carried out to find the initial value of m_features and its proximal minima. It can be seen that when m_fea-tures=0.4, the random forest model error rate of the test set was the smallest (12.5%), and the model accuracy was the highest (see Table 4). The importance of the 19 indexes in the random forest was measured by the Gini index (see Figure 3). X18 had the least influence on the model and was removed according to the importance ranking, and then, the remaining 18 indexes were used to train the random forest again until the misclassification rate was minimized. After the above steps, we can find that the error rate of the model was the lowest when the number of indexes decreased to X18 had the least influence on the model and was removed according to the importance ranking, and then, the remaining 18 indexes were used to train the random forest again until the misclassification rate was minimized. After the above steps, we can find that the error rate of the model was the lowest when the number of indexes decreased to 11, and at this point, m_features = 0.2 (see Table 5). In the end, the minimum misclassification rate of the random forest was 6.25%, which meant that the prediction accuracy reached 93.75%. The model covered 11 indexes (see Figure 4), which are, in order of importance, the interest coverage ratio (X04), current ratio (X02), asset-liability ratio (X01), green emissions (X19), inventory turnover (X09), revenue growth ratio (X13), net profit growth ratio (X14), ROE (X05), EPS growth ratio (X16), current assets turnover (X12), and total assets turnover (X10).  Figure 4), which are, in order of importance, the interest coverage ratio (X04), current ratio (X02), asset-liability ratio (X01), green emissions (X19), inventory turnover (X09), revenue growth ratio (X13), net profit growth ratio (X14), ROE (X05), EPS growth ratio (X16), current assets turnover (X12), and total assets turnover (X10). We set the socio-environmental indexes from the perspective of the degree of corporate social responsibility (social responsibility score), environmental awareness (green culture), and actual environmental protection actions (green emissions). After the random forest iteration, green emissions was the only one left, which meant that enterprises' actual environmental protection actions played a more important role in the classification of green credit risk.

Robustness Test
This paper had assumed that a DD less than 1 or enterprises subjected to environmental penalties had potential green credit risks. This assumption was adjusted to test the model's robustness further. That is, a DD less than 0.5 or the enterprises subjected to environmental penalties had potential credit risks. Firstly, the variable parameter involved in the above assumption was set as i. Secondly, the green credit risk classification result obtained as i=0.5 was substituted into the random forest model. Finally, the above model construction and iteration process was repeated. The results show that when the misclassification rate of the random forest drops to a low level, the final indexes of the model include the interest coverage ratio, current ratio, asset-liability ratio, green emissions, inventory turnover, current assets turnover, revenue growth ratio, quick ratio, net profit growth ratio, total assets turnover, EPS growth ratio, and other indicators, but the order of importance was slightly adjusted. It can be concluded that the above indicators were exactly the main factors which affected the green credit risk (see Table 6), and the previously established random forest model was robust. We set the socio-environmental indexes from the perspective of the degree of corporate social responsibility (social responsibility score), environmental awareness (green culture), and actual environmental protection actions (green emissions). After the random forest iteration, green emissions was the only one left, which meant that enterprises' actual environmental protection actions played a more important role in the classification of green credit risk.

Robustness Test
This paper had assumed that a DD less than 1 or enterprises subjected to environmental penalties had potential green credit risks. This assumption was adjusted to test the model's robustness further. That is, a DD less than 0.5 or the enterprises subjected to environmental penalties had potential credit risks. Firstly, the variable parameter involved in the above assumption was set as i. Secondly, the green credit risk classification result obtained as i = 0.5 was substituted into the random forest model. Finally, the above model construction and iteration process was repeated. The results show that when the misclassification rate of the random forest drops to a low level, the final indexes of the model include the interest coverage ratio, current ratio, asset-liability ratio, green emissions, inventory turnover, current assets turnover, revenue growth ratio, quick ratio, net profit growth ratio, total assets turnover, EPS growth ratio, and other indicators, but the order of importance was slightly adjusted. It can be concluded that the above indicators were exactly the main factors which affected the green credit risk (see Table 6), and the previously established random forest model was robust.

Discussion of Results
(1) The DD calculated by the KMV model showed the credit difference between paper enterprises, which can distinguish the credit risk status faced by other paper enterprises. Even the DD of the same paper enterprise in different years could fluctuate sharply; thus, enterprises should strengthen the stability of their risk management. (2) The random forest model had a high prediction accuracy for the green credit risk assessment of the paper industry, as shown in Table 5. The prediction accuracy of the model on the test set reached 93.75%. This indicates that the random forest can effectively assess the green credit risk of paper-making enterprises, which further proves the scientificity of the index system and the effectiveness of the assessment model constructed in this paper. (3) Commercial banks will decide whether to extend green credit based on the forecast results. Generally speaking, commercial banks will suffer a greater loss when they extend credit to enterprises with a high credit risk and normal prediction results, compared with enterprises with a normal credit risk and high prediction results. It can be seen from Table 5 that the correct rate for the classification of high green credit risk enterprises in the paper industry reached 100%. It was further shown that the green credit risk assessment model based on the random forest algorithm for paper enterprises was applicable, and can effectively prevent the loss of commercial banks because of misjudgment. (4) Figure 4 shows 11 indexes, including 5 primary indexes from two perspectives. The interest coverage ratio, current ratio, and asset-liability ratio belong to solvency. The inventory turnover, current assets turnover, and total assets turnover belong to operating capacity. The revenue growth rate, net profit growth rate, and EPS growth rate belong to growth capacity. The inventory turnover rate belongs to operating capacity. The green emission belongs to Socio-environmental. Therefore, the green credit risk of paper-making enterprises was the result of a combination of two dimensions: credit risk and socio-environmental risk. (5) Based on the Gini index, the interest coverage ratio (X04), current ratio (X02), assetliability ratio (X01), and green emissions (X19) had been ranked relatively high in terms of their impact on the green credit risk of the paper industry. They can be considered as the main factors affecting the green credit risk of the paper industry.

Conclusions
This paper extended the application of the random forest model to the field of green credit research. The empirical analysis used 104 samples from 26 paper-making enterprises and established a green credit risk assessment model for the paper industry based on the random forest. The conclusions are as follows: Environment and finance, two main parts of the green credit risk evaluation index system, had a significant impact on assessing the credit risk of the enterprise. On one hand, it was reflected that the green credit risk index system of the paper industry established in this paper had a good reference for green credit risk assessment. Among them, the interest coverage ratio, current ratio, asset-liability ratio, and green emissions had a greater impact on green credit risk in the paper industry. Therefore, commercial banks need to focus on solvency and environmental performance when assessing the green credit risk in the paper industry. On the other hand, the random forest had an excellent performance in green credit risk assessment. The accuracy of sample classification for high-risk enterprises reached 100%. This means that the random forest had an excellent evaluation effect and a high practical value in green credit risk assessment, and can effectively reduce the influence of subjective factors.
The innovation point of this paper mainly involved two aspects. Firstly, the green credit risk index system of the paper industry was constructed with strong industry pertinence, considering both financial and socio-environmental aspects. In particular, the secondary index of green emissions was designed to identify pollution discharge in the paper industry. Secondly, the research method combined the random forest and KMV model. We used the Gini index to continuously select indexes and conducted random forest iterations to establish the model with the best prediction effect. The green credit risk prediction model of the paper industry established in this paper had a high accuracy, which provided some reference for domestic and foreign commercial banks to issue green credit. The development of green credit is an irreversible trend, and thus, we should promote the implementation of green credit to achieve the goal of carbon neutrality as early as possible.
Due to the limited number of Chinese paper enterprises and the polluting emission data of each enterprise not being fully disclosed, the advantages of machine learning with large random forest samples was not the best due to the limitation of sample size. In the future, the model can be adjusted to improve the prediction accuracy by continuously expanding the sample size. Moreover, the green index system in China has not been standardized. If China publishes a complete set of green index systems in the future, the selection of socio-environmental indexes in this paper can be further strengthened.

Data Availability Statement:
The data analyzed in this study are subject to the following licenses/ restrictions: Belong to Study Group Requirements. Requests to access these datasets should be directed to Y.C., sanchen007@njfu.edu.cn.

Conflicts of Interest:
The authors declare no conflict of interest.