Identifying and Predicting the Credit Risk of Small and Medium-Sized Enterprises in Sustainable Supply Chain Finance: Evidence from China

COVID-19 has created strong demand for supply chain finance (SCF) among small and medium-sized enterprises (SMEs). However, the rapid development of SCF has led to more complex credit risks. How to effectively discriminate and manage SMEs to reduce credit risk has become one of the most critical issues in SCF. In addition, sustainable SCF (SSCF) has received increasing attention, and credit risk management is important for achieving SSCF. It is therefore important to identify the key factors influencing the credit risk of SMEs and to construct a prediction model to promote SSCF. This study uses the lasso-logistic model to identify the factors influencing the credit risk of SMEs and to predict that risk. The empirical results show that (i) the key factors influencing SMEs' credit risk include six variables, among them the matching degree of order data, ratio of contract enforcement, number of contract defaults, degree of business concentration, and number of administrative penalties; and (ii) the lasso-logistic model can identify the key factors influencing credit risk and has better prediction performance. Moreover, transaction credit and reputation supervision significantly influence the credit risk of SMEs.


Introduction
In recent years, China's logistics industry has developed rapidly and generated numerous employment opportunities nationwide, with more than 50 million employees in the industry. According to the National Logistics Operation Bulletin 2020, the total social logistics expenditure in China was CNY 14.9 trillion, accounting for 14.7% of GDP, with transportation expenditures of CNY 7.8 trillion. With the increasing importance of sustainable development (SD) for firms to gain competitive advantage, more firms are adopting various mechanisms to achieve their SD [1]. Corporate sustainable management should not be limited to internal resource conservation and emission control but should be extended to the supply chain [2]. As a bridge connecting every link of production, the logistics industry plays an important strategic role in the global sustainable supply chain. Most studies on SD focus on the environment [3], but the challenges of SD include not only environmental but also economic issues [1]. As the mainstay of economic development, the logistics industry faces various challenges in the process of promoting SD. Financial institutions (FIs) are more inclined to support large and reputable logistics enterprises, while the lack of fixed assets and financial statements makes it difficult for small and medium-sized logistics enterprises to obtain external funding [4,5]. This in turn makes it difficult for them to access opportunities for SD. Supply chain finance (SCF) has received widespread attention for alleviating the financing issues of small and medium-sized enterprises (SMEs) by providing alternative financial services that are conducive to sustainable supply chain development, with lower interest rates and more fixed payment terms [6-8].
With the continuous integration and application of technologies such as the Internet of Things, artificial intelligence, blockchain, and big data, the development of financial technology has accelerated. The development of SCF has also been significantly affected, and data have gradually become a core factor. This has changed the credit-granting standard of traditional SCF and opened new ideas for the innovative development of SCF [9]. In traditional SCF, SMEs obtain financial services by relying on the credit transmission of core enterprises (CEs) [10]. Because the informatization level of FIs is limited, significant information asymmetry arises [11]. Traditional SCF business objects are mostly upstream and downstream enterprises that have direct transactions with CEs. Therefore, the scope of financing entities covered at this stage is relatively small. The extensive application of financial technology in the SCF field has reconstructed the internal relationships of enterprises in the supply chain and broken the dependence on CEs' credit. SCF can use big data, blockchain, cloud computing, and other financial technologies to analyze transactional behavior and data to help SMEs obtain credit [12]. Therefore, financial technology enables SCF to overcome the narrow coverage of traditional SCF and truly include all participants in the supply chain network. Finally, sustainable SCF (SSCF) is achieved by establishing inclusive finance throughout the supply chain.
SMEs have limited operating histories and weak market competitiveness [13]. Additionally, the spread of COVID-19 has made the survival and development of SMEs even more challenging. Several enterprises urgently need capital to resume their operations. Financing issues are regarded as an obstacle for SMEs in supply chains to engage in SD [14]. SMEs are often cash-constrained or have difficulty obtaining loans from banks [2], which impedes their ability to improve their sustainable performance, while low levels of sustainability participation by SMEs can seriously affect the efficiency and sustainability of the entire supply chain. As a result, to broaden the financing channels for SMEs, government agencies in China have issued policies to encourage FIs to provide services to SMEs. Under the current SCF business model, FIs rely on the significant amount of data accumulated over a long period in the supply chain for the credit evaluation and risk monitoring of SMEs. This business model not only effectively solves the capital shortage problem of SMEs but also expands the SME customer base of FIs. However, because this form of SCF lacks pledged assets and the credit guarantee of CEs, FIs face complex risks, the most significant of which is credit risk [15]. This study defines credit risk as the possibility that borrowing enterprises cannot or are unwilling to repay the principal and interest according to the financing contract [16]. Previous studies on SCF have focused on credit risk evaluation systems but failed to identify the key factors influencing credit risk in SCF. Identifying the key factors influencing the credit risk of SMEs, efficiently discriminating and managing SMEs, and providing targeted financial services can demonstrate the social responsibility of SCF [17,18]. To efficiently promote SSCF, an analysis that can identify and predict the credit risk of SMEs is essential.
The overall objective of this paper is to mine the credit information of SMEs to improve their credit rating, reduce information asymmetry, and alleviate financing difficulties. Based on the credit risk evaluation index system, the key factors influencing credit risk are identified to strengthen credit risk management and achieve SD. This study's contributions are summarized as follows. (1) We propose an SME credit risk evaluation index system specifically for SCF that relies on data to grant credit, and 177 small and medium-sized transportation logistics enterprises are selected as empirical samples. This system evaluates credit risk in three dimensions, which include not only the financial and non-financial status of SMEs but also transaction credit and reputation supervision based on the supply chain. (2) We use the lasso-logistic regression approach to study the factors influencing the credit risk of SMEs. On the one hand, it can effectively select the key factors that influence credit risk; on the other hand, it can depict the influence of the key factors on the probability of credit risk occurrence and predict the credit risk level, providing a reference for the prevention and control of SCF credit risk. (3) We construct the SME credit risk evaluation index system from the perspective of transaction credit and reputation supervision, aiming to find more credit information to make up for the lack of corporate credit. The contribution to the practice of SCF is to show that the transaction credit and reputation supervision of SMEs have an important impact on credit risk, which is of great significance for alleviating the financing dilemma of SMEs caused by the lack of corporate credit. When FIs carry out SCF business based on data to grant credit, they can rely on the data advantages of, for instance, third-party platforms, CEs, and logistics service providers (LSPs) to reduce the dependence on the corporate credit of SMEs.
The rest of this paper proceeds as follows. Section 2 provides the literature review. Section 3 discusses the methodology of the study. Section 4 presents the credit risk-triggering mechanism and describes the variables. Section 5 shows the data simulation. Section 6 presents the empirical results. Finally, Section 7 concludes the study.

Literature Review
As an important part of supply chain management, SCF is an innovation produced by the intersection of supply chain and finance. Currently, studies on SCF mainly take two perspectives: finance-oriented and supply chain-oriented [19]. From the finance-oriented perspective, SCF is interpreted as a short-term financial solution, of which FIs are an important part. Camerinelli [20] argued that SCF is an innovative financial solution that effectively connects FIs and SMEs through the supply chain to reduce the risk of mismatch between the supply and demand of capital flows. From the supply chain-oriented perspective, SCF expands the working capital management framework and emphasizes that members of the supply chain must cooperate to optimize the management of capital flow. Hofmann [21] and Pfohl and Gomm [22] indicated that SCF can be used to optimize the flow of funds in the supply chain. The CEs and LSPs are the main participants and supporters of SCF, respectively. Enterprises with strong credit in the supply chain, such as third-party platforms, CEs, and LSPs, transmit their credit to the upstream and downstream SMEs in the supply chain to help them obtain credit. In addition, SCF emphasizes the real transaction information and transaction credit generated by SMEs relying on the supply chain [23]. This is because the status of transactions and capital flow in the supply chain network can reflect the operational situation of SMEs. In this paper, transaction credit refers to the long-term accumulation of credit based on the supply chain network that can reflect the business operation status and repayment ability. At the same time, SMEs do not operate in an isolated environment. To maintain the reputation of the supply chain, their business behaviors are supervised and bound by the network in which they are embedded [24]. Therefore, reputation supervision can also reflect the real operational situation of SMEs.
Based on the supply chain, the data of transaction credit and reputation supervision are transmitted to FIs. Finally, information asymmetry between SMEs and FIs is alleviated and SMEs obtain credit. In summary, the essence of SCF is credit transmission and credit creation through the integration of supply chain resources and the reduction of information asymmetry.
Normally, SCF increases trust and profitability throughout the entire chain [25]. The acquisition of transaction information and reputation supervision in SCF can reduce information asymmetry and control potential risks, so it can provide easier access to credit for SMEs [26,27]. However, SCF cannot completely avoid the credit risk of SMEs [15]. Due to issues such as low credit ratings, a high probability of fraud, and an imperfect credit guarantee system, the credit risk of SMEs in China is still regarded as the main source of risk for SCF [28]. Credit risk management helps FIs be more confident in credit-granting decisions [29] and helps them expand the scope of financial services. Thus, it is important to identify and understand the credit risk of SCF in order to achieve SSCF. The following literature review is divided into two sections. The first presents the research status of SSCF. The second discusses the influencing factors and prediction models of credit risk in SCF.

Sustainable Supply Chain Finance
Sustainability is widely summarized as development that meets the needs of the present without compromising the ability of future generations to meet their own needs [30,31]. The goal of SD is to balance the economic, environmental, and social subsystems [32]. The integration of SD and SCF is essential [29]. As an innovative financing mechanism, SCF has the function of encouraging SD [2,33]. Some scholars have begun to study SCF from the perspective of SD [34-36]. Based on the triple bottom line of sustainability, these studies analyze SCF along the three dimensions of economic, environmental, and social sustainable development, which provides new insights for supply chain management and is conducive to improving the sustainability of the supply chain. By researching Vietnam's textile industry, Tseng et al. [8] pointed out that SSCF could promote the establishment of equilibrium among the triple bottom line aspects, and SCF suppliers could gain benefits in risk control and SD. Li et al. [36] consider SSCF to be an operational framework that not only emphasizes the logistics, information flow, and capital flow of each enterprise in the supply chain, but also influences the operation of other enterprises through positive externalities in the economic, environmental, and social subsystems. SSCF creates environmental, social, and economic benefits for all stakeholders in a way that minimizes negative impacts [37]. SSCF is reflected in two aspects: on the one hand, SCF providers should consider environmental, social, and economic sustainability; on the other hand, financial inclusion that radiates throughout the chain should be established on the basis of SCF to achieve SSCF [9].
SMEs in the supply chain should be offered greater access to financial resources based on sustainable behavior because this may encourage their sustainability engagement [35,38]. We understand that the goal of SSCF is to provide financial services for SMEs, which is important in improving their livelihood, realizing the common development of upstream and downstream enterprises, and thus promoting the sustainability of the supply chain.
The key to SSCF is to promote the development of the entire supply chain. In addition, risk management assists FIs to be more confident in granting credit decisions.

Influencing Factors and Prediction Models of Credit Risk in SCF
Most of the research on credit risk comes from the theory of default risk prevention in the process of bank loans. Research on the influencing factors is based on typical long-term repayment judgment standards, mainly considering hard information, such as the size of the enterprise, fixed assets, and financial data [39,40]. However, SCF breaks the bank credit model and no longer focuses only on the financial information of borrowing enterprises; rather, it emphasizes the stability of the entire supply chain and the credit level of transaction partners, using the authenticity of trade with counterparties to grant credit [10]. Research on the factors influencing the credit risk of SMEs in SCF is mostly carried out from the supply chain perspective. An increasing number of researchers include non-financial information and non-standard soft information in the research on the factors influencing credit risk, such as the macroeconomic environment, core corporate credit, supply chain level, market trends, corporate credit records, and management team quality [27,41]. In addition, some scholars have studied the factors influencing the credit risk of SMEs from the perspective of SD. Cao and Xiong [42] used the fuzzy analytic hierarchy process to construct a credit rating evaluation index system for sustainable financing of Chinese SMEs from the perspective of sustainability. Based on the triple bottom line theory of environmental, social, and economic elements from the perspective of SD, Liang et al. [33] proposed a credit risk evaluation model for SMEs in SCF by combining the fuzzy multi-indicator evaluation method with TOPSIS.
Methods such as the support vector machine (SVM), backpropagation (BP) neural network, logistic regression, and the analytic hierarchy process have been developed and used to predict the credit risk of SMEs [18,43]. Zhang et al. [44] adopted a view of the supply chain that considers the credit status of the leading enterprise as well as the relationships in the supply chain and established a credit risk assessment model based on SVM. Tang et al. [45] established a credit risk assessment model based on a random forest model to measure the credit risk of China's energy industry. Zhang et al. [46] predicted the probability of loan default based on a feedforward artificial neural network. Although machine learning methods such as SVM and BP neural network models perform well in terms of prediction accuracy, they make it difficult to identify the key factors influencing credit risk, which is not conducive to risk management [47]. Logistic regression has the advantages of explanatory power and robustness and has received increasing attention in recent years in the field of credit risk prediction [48]. Xiong et al. [43] selected the data of 102 listed companies and used a logistic regression model to predict credit risk from four dimensions: borrowing enterprise qualification, counterparty qualification, the characteristic factors of items, and the supply chain operation status. Wang et al. [49] used a logistic regression model to predict the credit risk of commercial bank customers. The model had high prediction accuracy and good performance. However, numerous highly correlated factors influence the credit risk of SMEs in SCF. Therefore, to improve the security of SCF and develop SSCF, it is necessary to add variable selection ability to logistic regression in order to identify and predict the credit risk factors of SMEs [50].

Lasso Model
Lasso is a variable selection method proposed by Tibshirani [51]. Its advantage is that it can select variables and adjust model complexity during the fitting process. Variable selection means that not all variables are put into the model for fitting; rather, the key factors are identified and selectively fitted in order to obtain well-performing parameters. The complexity adjustment prevents over-fitting of the model. The lasso model was originally introduced in the context of least squares; its objective function can be summarized as Equation (1), where $x_{ij}$ is the explanatory variable, which satisfies $\frac{1}{n}\sum_i x_{ij} = 0$ and $\frac{1}{n}\sum_i x_{ij}^2 = 1$, and $y_i$ is the explained variable. Equation (1) compresses the estimated regression coefficients $\beta_j$ by controlling the non-negative factors $c_j$. Let $\hat{\beta} = (\hat{\beta}_1, \cdots, \hat{\beta}_k)^T$; then the lasso estimate $(\hat{\alpha}, \hat{\beta})$ can be defined as:

$$(\hat{\alpha}, \hat{\beta}) = \arg\min \sum_{i=1}^{n} \Big( y_i - \alpha - \sum_{j=1}^{k} \beta_j x_{ij} \Big)^2 \quad \text{subject to} \quad \sum_{j=1}^{k} |\beta_j| \le t \tag{2}$$

where $t \ge 0$ is a tuning parameter and $\hat{\alpha} = \bar{y}$. Without loss of generality, we suppose $\bar{y} = 0$ and adjust Equation (2) to Equation (3):

$$\hat{\beta} = \arg\min \sum_{i=1}^{n} \Big( y_i - \sum_{j=1}^{k} \beta_j x_{ij} \Big)^2 \quad \text{subject to} \quad \sum_{j=1}^{k} |\beta_j| \le t \tag{3}$$
The parameter $t$ controls the compression of the regression coefficients. Let $\hat{\beta}_j^0$ be the least squares estimates of the regression parameters and $t_0 = \sum_j |\hat{\beta}_j^0|$; then $t < t_0$ compresses some regression coefficients to 0. By introducing a penalty function, Equation (3) can be expressed as Equation (4):

$$\hat{\beta} = \arg\min \Big\{ \sum_{i=1}^{n} \Big( y_i - \sum_{j=1}^{k} \beta_j x_{ij} \Big)^2 + \lambda \sum_{j=1}^{k} |\beta_j| \Big\} \tag{4}$$

In Equation (4), the first and second terms represent the fitting performance of the model and the penalty, respectively, where $\lambda \ge 0$. As the value of $\lambda$ increases, the penalty on the model also increases and the number of remaining explanatory variables gradually decreases [52]; therefore, the key factors can be selected. The methods commonly used to select the penalty parameter $\lambda$ include bootstrap, cross-validation, and generalized cross-validation. This study uses the generalized cross-validation method to determine the value of $\lambda$.
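To illustrate how the penalty in Equation (4) drives coefficients to exactly zero as λ grows, the following is a minimal sketch in plain Python of coordinate descent with soft-thresholding for the penalized least squares objective. The toy design matrix, the response, the λ values, and all function names are hypothetical and chosen only for illustration. The sketch does not implement the generalized cross-validation used in this study to select λ.

```python
def soft_threshold(z, gamma):
    """Soft-thresholding operator: sign(z) * max(|z| - gamma, 0)."""
    if z > gamma:
        return z - gamma
    if z < -gamma:
        return z + gamma
    return 0.0


def lasso_coordinate_descent(X, y, lam, n_iter=100):
    """Minimise sum_i (y_i - sum_j b_j x_ij)^2 + lam * sum_j |b_j|.

    X is a list of rows with centred columns and y is centred,
    so no intercept is fitted (the y-bar = 0 case).
    """
    n, k = len(X), len(X[0])
    beta = [0.0] * k
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) for j in range(k)]
    for _ in range(n_iter):
        for j in range(k):
            # rho: inner product of column j with the residual that
            # ignores the current contribution of variable j
            rho = 0.0
            for i in range(n):
                fitted_without_j = sum(
                    beta[m] * X[i][m] for m in range(k) if m != j
                )
                rho += X[i][j] * (y[i] - fitted_without_j)
            # Setting the subgradient of the objective to zero gives:
            beta[j] = soft_threshold(rho, lam / 2.0) / col_sq[j]
    return beta


# Hypothetical toy data: two standardised, mutually orthogonal columns,
# with y = 3 * x1 + 0.5 * x2, so x1 is the stronger factor.
X = [[1.0, 1.0], [1.0, -1.0], [-1.0, 1.0], [-1.0, -1.0]]
y = [3.5, 2.5, -2.5, -3.5]

for lam in (0.0, 8.0, 30.0):
    print(lam, lasso_coordinate_descent(X, y, lam))
# prints:
# 0.0 [3.0, 0.5]
# 8.0 [2.0, 0.0]
# 30.0 [0.0, 0.0]
```

With λ = 0 the least squares fit is recovered; a moderate λ removes the weaker variable while shrinking the stronger one; a large λ removes both, which mirrors the selection behaviour described above.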

Lasso-Logistic Model
The lasso method is mainly applied to linear models. When predicting the credit risk of SMEs, the explained variable is binary; therefore, the lasso-logistic model should be used. This model can not only select the key factors influencing the credit risk of SMEs but also predict the probability of credit risk.
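Suppose the explained variable $y_i$ is a binary response variable that takes the value of either 0 or 1. Then, the conditional probability of the logistic regression is:

$$p_i = P(y_i = 1 \mid x_i) = \frac{\exp(\eta_i)}{1 + \exp(\eta_i)} \tag{5}$$

where $p_i$ is the probability of the occurrence of the credit risk of sample enterprise $i$ and $\eta_i = \beta_0 + \sum_{j=1}^{k} \beta_j x_{ij}$. The coefficient estimate in the lasso-logistic regression is determined by the minimum value of the convex function of Equation (6):

$$S_\lambda(\beta) = -l(\beta) + \lambda \sum_{j=1}^{k} |\beta_j| \tag{6}$$

where $l(\beta)$ is the log-likelihood function of the original logistic regression, which can be expressed as Equation (7):

$$l(\beta) = \sum_{i=1}^{n} \big[ y_i \eta_i - \log\big(1 + \exp(\eta_i)\big) \big] \tag{7}$$

The coefficient estimate $\hat{\beta}$ of the lasso-logistic regression model can be expressed by Equation (8):

$$\hat{\beta} = \arg\min_{\beta} \Big\{ -l(\beta) + \lambda \sum_{j=1}^{k} |\beta_j| \Big\} \tag{8}$$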
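As a rough illustration of how an L1-penalized logistic regression both selects variables and yields default probabilities, the sketch below minimizes the negative log-likelihood plus λ times the sum of absolute slope coefficients using proximal gradient descent (ISTA) in plain Python. This is an assumed solver chosen for brevity, not necessarily the one used in this study; the toy data, λ values, and function names are hypothetical, and the intercept is left unpenalized by assumption.

```python
import math


def sigmoid(z):
    """Logistic function, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def soft_threshold(z, gamma):
    """Soft-thresholding operator: sign(z) * max(|z| - gamma, 0)."""
    if z > gamma:
        return z - gamma
    if z < -gamma:
        return z + gamma
    return 0.0


def predict_proba(b0, beta, row):
    """Predicted probability that y = 1 for one observation."""
    return sigmoid(b0 + sum(b * v for b, v in zip(beta, row)))


def fit_lasso_logistic(X, y, lam, n_iter=2000):
    """Proximal gradient descent on -loglik + lam * sum_j |beta_j|."""
    n, k = len(X), len(X[0])
    # Step size 1/L, where L is bounded by (n + sum of squared entries) / 4
    step = 4.0 / (n + sum(v * v for row in X for v in row))
    b0, beta = 0.0, [0.0] * k
    for _ in range(n_iter):
        # p_i - y_i for every observation at the current coefficients
        resid = [predict_proba(b0, beta, row) - yi for row, yi in zip(X, y)]
        g0 = sum(resid)
        g = [sum(r * row[j] for r, row in zip(resid, X)) for j in range(k)]
        b0 -= step * g0  # plain gradient step for the intercept
        # gradient step followed by soft-thresholding for the slopes
        beta = [
            soft_threshold(bj - step * gj, step * lam)
            for bj, gj in zip(beta, g)
        ]
    return b0, beta


# Hypothetical toy data: column 0 is related to default (y = 1),
# column 1 is an uninformative +/-1 pattern.
X = [[-2.0, 1.0], [-1.5, -1.0], [-1.0, 1.0], [-0.5, -1.0],
     [0.5, -1.0], [1.0, 1.0], [1.5, -1.0], [2.0, 1.0]]
y = [0, 0, 0, 1, 0, 1, 1, 1]

for lam in (0.5, 5.0):
    b0, beta = fit_lasso_logistic(X, y, lam)
    print(lam, round(b0, 3), [round(b, 3) for b in beta])
```

With the larger penalty, both slope coefficients stay at exactly zero because the gradient at the origin is smaller in magnitude than λ; with the smaller penalty, the informative variable enters the model with a positive coefficient.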

Model Evaluation
This study uses the accurate classification ratio and the risk discrimination accuracy as the main criteria for evaluating the performance of the model. The accurate classification ratio is determined from the classification matrix, and the risk discrimination accuracy is determined from the receiver operating characteristic (ROC) curve and the area under the curve (AUC). In binary classification, there are four prediction results, as listed in Table 1. The prediction accuracy, Type I error, and Type II error are calculated using Equations (9)-(11).
$$\text{Accuracy of prediction} = \frac{a + d}{a + b + c + d} \tag{9}$$

$$\text{Type I error} = \frac{c}{c + d} \tag{10}$$

$$\text{Type II error} = \frac{b}{a + b} \tag{11}$$

The ROC curve was first used to analyze classification errors in signal detection, after which it was widely used in credit evaluation. AUC is the area under the ROC curve. Generally, the closer the AUC value is to 1, the higher the risk discrimination accuracy [53].
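The evaluation criteria of this section can be sketched in a few lines of plain Python. The first function applies the accuracy and Type I error formulas to the four cell counts a, b, c, and d, and takes the Type II error as b/(a + b), the symmetric counterpart of the Type I formula. The second computes AUC as the share of (default, non-default) pairs in which the defaulting enterprise receives the higher predicted score, with ties counted as one half, which equals the area under the ROC curve. The counts, labels, scores, and function names below are hypothetical.

```python
def classification_rates(a, b, c, d):
    """Accuracy and error rates from the four cells of the classification matrix.

    a and d are the correctly classified counts; b and c are the
    misclassified counts, as implied by the accuracy formula.
    """
    total = a + b + c + d
    return {
        "accuracy": (a + d) / total,
        "type_I_error": c / (c + d),
        "type_II_error": b / (a + b),
    }


def auc(labels, scores):
    """Area under the ROC curve via pairwise comparison of scores."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0   # defaulter correctly ranked above non-defaulter
            elif p == q:
                wins += 0.5   # ties count as half
    return wins / (len(pos) * len(neg))


# Hypothetical cell counts for 100 enterprises
rates = classification_rates(a=40, b=10, c=5, d=45)
print(rates)
# prints: {'accuracy': 0.85, 'type_I_error': 0.1, 'type_II_error': 0.2}

# Hypothetical labels (1 = default) and predicted default probabilities
print(auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))
# prints: 0.75
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation of defaulting and non-defaulting enterprises.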

Credit Risk Triggering Mechanism
Figure 1 shows that SSCF is based on the supply chain, which grants credit to and supervises SMEs according to transaction, logistics, and capital flow information. First, enterprises with large amounts of data resources in the supply chain, including third-party platforms, CEs, and LSPs, rely on their own data advantages to turn the transaction, logistics, and capital information of SMEs into data packages. We call this process "business datafication," which makes the transaction process of SMEs visible and transparent. Second, these enterprises use financial technology to cross-validate and analyze the transaction behavior and data of SMEs to estimate the authenticity of the data and share information with FIs. Through the business datafication of SMEs, FIs increase their knowledge of SMEs and find new credit information channels. We define the process of turning SME data into credit enhancement assets as data capitalization. In conclusion, FIs rely on data assets to provide financial services to SMEs and achieve data asset valuation. Based on the supply chain, SCF provides credit to and supervises SMEs in the chain through logistics, information flow, document flow, and other data, which changes both the traditional financing model based mainly on bank credit and the traditional credit risk management mode. The development of SSCF is based on the whole supply chain, with numerous participants covering various groups in the chain, such as SMEs, third-party platforms, CEs, LSPs, and FIs, forming a stable ecosystem.
In the ecosystem, each cluster can gain benefits, and the value added in the whole supply chain can continue to grow and achieve sustainability.
Under SCF based on data to grant credit, the credit rating process no longer pays much attention to pledged assets and the credit guarantee of CEs. It pays more attention to real transaction data and reputation supervision data to make up for the lack of corporate credit. Figure 2 shows the credit risk-triggering mechanism of SMEs. SMEs are granted credit based on real transaction information in SCF [54]. Therefore, the credit risk of SMEs mainly stems from two aspects: (1) the corporate behavior and (2) the transaction behavior of SMEs. On the one hand, SMEs are major beneficiaries of SCF business activities; their own operating and managing abilities are a source of credit risk. Specifically, in the process of obtaining financing, SMEs' own basic conditions, profitability, operating ability, and solvency may trigger credit risk. On the other hand, in the absence of pledged assets, the transaction revenue of SMEs is the main repayment source. The authenticity of a transaction is the most important basis for SCF. The sustainability of transactions reflects the durability of SMEs' cooperative transaction relationships, which is directly related to SMEs' future income and can be used as a basis for predicting SMEs' future solvency. The performance of transactions reflects the transaction history of SMEs, which can be used as a reference for credit records. If SMEs fail to fulfill their orders, the sustainability of the entire supply chain will be threatened [50]. Accordingly, the authenticity, performance, and sustainability of transactions influence credit risk.
Additionally, some scholars have shown that supervision in the supply chain network can help reduce moral hazard behavior and encourage enterprises to pay attention to their reputation through corresponding rewards and default punishments [13,55,56], hence reducing the probability of the occurrence of credit risk [57]. Drawing on the results of previous studies, this paper weakens the dependence on the corporate credit of SMEs and constructs a three-dimensional SME credit risk evaluation index system of "corporate credit + transaction credit + reputation supervision". Based on the supply chain transaction process, the paper comprehensively considers the authenticity, performance, and sustainability of transactions, as well as the business reputation in the supply chain ecosystem and the records of rewards and punishments from government departments, so that the evaluation scope of FIs is wider. By mining the credit information of SMEs, the difficulty SMEs face in obtaining financing is alleviated, and the development of sustainable supply chain finance is promoted.

Variable Definitions
In the absence of the credit transmission of CEs and of pledged assets, FIs applying the traditional credit risk evaluation index system assign SMEs a low credit rating, so it is difficult for SMEs to obtain credit. We construct the SME credit risk evaluation index system from the perspective of transaction credit and reputation supervision, aiming to mine more credit information to make up for the lack of corporate credit. This helps SMEs improve their credit rating and obtain financing, alleviating their financing difficulties for sustainable development. At the same time, constructing a credit risk evaluation index system for SMEs under SCF based on data to grant credit is conducive to achieving a win-win situation for multiple parties and promoting the SD of the supply chain.
The data of SMEs are not transparent and their financial systems are not sound, so it is difficult to obtain key factors that truly reflect the possibility of default. SMEs can take advantage of the supply chain and use financial technology to reduce information asymmetry, and thus obtain financial services and alleviate their financing difficulties. Under SCF based on data to grant credit, third-party platforms, CEs, LSPs, and other data-advantaged companies provide FIs with data that reflect the true operating status and future funding capabilities of SMEs. On the one hand, this helps these companies expand their business scope and adds new sources of profit; on the other hand, it builds an information bridge between FIs and SMEs, reduces information asymmetry, and gives SMEs better access to financial resources. SMEs are thereby encouraged to establish long-term cooperative relations with these enterprises and improve the competitiveness of the entire supply chain. FIs can accurately grasp the credit risk of SMEs by examining the operation of the entire supply chain and reduce the uncertainty of loans with the help of the operating model of supply chain finance, thereby improving their sustainability performance and controlling credit risk.
The explained variable represents whether an SME defaults and is denoted Y; a value of 1 indicates default, which means that the credit risk of the SME is high; a value of 0 indicates non-default, which means that the credit risk of the SME is low. Based on the above analysis, FIs can evaluate the credit risk of SMEs in the following three dimensions: corporate credit (financial and non-financial factors), transaction credit (transaction authenticity, performance, and sustainability), and reputation and supervision behavior (records of rewards and punishments from government departments and business reputation). According to the suggestions of Liang et al. [33], Zhu et al. [16], Zhang et al. [44], Chu et al. [58], and Chi et al. [59], the factors influencing these three dimensions are divided into 20 explanatory variables. As shown in Table 2, X1, X2, and X3 are the non-financial factors; X4, X5, X6, X7, and X8 are the financial factors; X9 indicates the authenticity of transactions; X10, X11, and X12 indicate the performance of transactions; X13 and X14 indicate the sustainability of transactions; X15, X16, X17, X18, and X19 indicate records of rewards and punishments from government departments; and X20 indicates the business reputation of SMEs. With reference to the Likert scale, X9, X13, and X20 are each divided into five levels and obtained through the expert scoring method. X9 is graded according to whether the order information, tax information, and financial information of small and medium-sized logistics enterprises match each other after cross-validation. Logistics delays and uncertainty in delivery time may hurt manufacturers and are not conducive to SD [60]. X10 refers to the ratio of goods that small and medium-sized logistics enterprises deliver on time according to the contract and reflects the reliability of delivery. X11 reflects the proportion of deliveries made according to the contract. A high contract enforcement ratio can reduce secondary transportation and is conducive to environmental protection. X12 reflects the small and medium-sized logistics enterprise's past credit history. X13 is graded according to the number of cargo owner enterprises and reflects whether a small and medium-sized logistics enterprise's business is dispersed among several enterprises or concentrated in a single enterprise. X14 reflects the interpersonal relationships of management. Management that is embedded in the social and economic environment can better understand customer needs and use interpersonal relationships to expand business for the enterprise, which significantly affects the actual operation of the enterprise [61]. Reputation supervision can encourage enterprises to accumulate reputation and form a good credit culture [62], which to a certain extent reflects the credit environment of the whole society. X15 to X19 are all drawn from the records of government departments from 2018 to 2019. X15 reflects the litigation history of small and medium-sized logistics enterprises; X16 reflects the number of times they have been recorded on a credit blacklist; X17 reflects the number of times their tax rating was A; X18 reflects the number of administrative penalties; and X19 reflects the number of abnormal operations. Business reputation can help enterprises obtain financing and is also an intangible asset [63,64]. X20 is based on the list of A-level logistics enterprises published by the China Federation of Logistics and Purchasing, the first organization in the logistics and purchasing industry in China. The more often an enterprise is selected for the list of A-level logistics enterprises, the higher its industry recognition.

Data Sources and Pre-Processing
Small and medium-sized logistics enterprises are the sample object of this study. On the one hand, small and medium-sized logistics enterprises provide intangible services and cannot offer effective pledged assets. It is difficult for these enterprises to obtain credit from CEs following the multilevel subcontracting service relationship. In recent years, driven by new technologies such as big data, cloud computing, the Internet of Things, and artificial intelligence, the logistics industry has continued to change rapidly in the direction of datafication, informatization, and platformization [65]. This change provides a basis for small and medium-sized logistics enterprises to rely on data to obtain credit. The characteristics of small and medium-sized logistics enterprises therefore suit the SCF based on data to grant credit studied in this paper. On the other hand, as a bridge connecting every link of production, the logistics industry plays an important strategic role in the global sustainable supply chain. However, in the current state of China's logistics development, some relatively common problems still hinder the sustainable development of China's logistics industry. In China's "Medium and Long-term Plan for the Development of the Logistics Industry (2014-2020)", small and medium-sized logistics enterprises are encouraged to use low-energy, low-emission vehicles, and achieving energy conservation and emission reduction will inevitably generate huge financing needs. Therefore, to promote the SD of the logistics industry and alleviate financing difficulties, it is vital to construct a credit risk evaluation index system that meets the characteristics of small and medium-sized logistics enterprises. The data sample comes from the cross-sectional dataset of credit information from a fintech company in Shanghai at the end of December 2019, with a total sample of 177 enterprises. The company carries out SCF businesses based on data to grant credit.
Data normalization is a prerequisite for excluding dimensional differences and improving the robustness of the model. This study uses the Z-score standardization method to process the data. To evaluate the prediction effect of the model, we divide the samples into training and test sets. Twenty percent of the dataset is randomly selected from the original sample as the test set, so 35 sample enterprises are used as the test set and the remaining 142 are used as the training set [66].
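The pre-processing described above can be sketched as follows. This is a minimal illustration on synthetic numbers, not the study's data or code: the function names, the random seed, and the synthetic 177 x 20 matrix are assumptions. The sketch also estimates the Z-score parameters on the training rows only, which is a design choice to keep test information out of the scaling; to follow the order described in the text (normalize first, then split), `fit_zscore` could instead be called on all rows.

```python
import random
import statistics


def train_test_split_indices(n_samples, test_fraction=0.2, seed=0):
    """Randomly split the indices 0..n_samples-1 into (train, test) lists."""
    rng = random.Random(seed)
    indices = list(range(n_samples))
    rng.shuffle(indices)
    n_test = int(n_samples * test_fraction)  # 177 * 0.2 rounds down to 35
    return indices[n_test:], indices[:n_test]


def fit_zscore(rows):
    """Return the per-column means and standard deviations of the given rows."""
    columns = list(zip(*rows))
    means = [statistics.fmean(col) for col in columns]
    # Fall back to 1.0 for constant columns to avoid division by zero.
    stds = [statistics.pstdev(col) or 1.0 for col in columns]
    return means, stds


def apply_zscore(rows, means, stds):
    """Standardize each value as (x - mean) / std, column by column."""
    return [[(x - m) / s for x, m, s in zip(row, means, stds)] for row in rows]


# Synthetic stand-in for the 177 enterprises x 20 explanatory variables.
rng = random.Random(42)
data = [[rng.gauss(10.0 * (j + 1), j + 1.0) for j in range(20)] for _ in range(177)]

train_idx, test_idx = train_test_split_indices(len(data))
train_rows = [data[i] for i in train_idx]
test_rows = [data[i] for i in test_idx]

means, stds = fit_zscore(train_rows)
train_z = apply_zscore(train_rows, means, stds)
test_z = apply_zscore(test_rows, means, stds)
print(len(train_z), len(test_z))  # prints: 142 35
```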

Simulations
We applied data simulation to test the variable selection ability and prediction accuracy of the lasso-logistic model. This study refers to Example 1 of Tibshirani [51] for data simulation from the model.
In the data simulation, we set the sample size to 100, 200, and 500, and randomly select 20% of the sample as the test set and the rest as the training set. Table 3 shows the simulation results of the three models: the lasso-logistic, ridge regression, and logistic regression models. In Table 3, the number of non-zero coefficients indicates the number of key variables estimated by the model; the number of correctly estimated zero coefficients indicates the ability of the model to reject unimportant variables; the number of correctly estimated non-zero coefficients indicates the ability of the model to identify the key variables; and the prediction accuracy ratio and AUC indicate the prediction ability of the model. Table 3 shows that although the lasso-logistic model retains some unimportant variables, it correctly identifies all the key variables. Additionally, the AUC and prediction accuracy of the lasso-logistic model are higher than those of the other two models. Therefore, the lasso-logistic model not only has a good variable selection ability but also obtains a better prediction effect.
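A sketch of how such a simulation can be set up is given below. The coefficient vector (3, 1.5, 0, 0, 2, 0, 0, 0) and the pairwise correlation 0.5^|i-j| are the design of Example 1 in Tibshirani [51]. How that design was adapted to a binary response is not stated in the text, so drawing Y from a Bernoulli distribution with a logistic link is an assumption here, as are all function names and the hypothetical estimate used to illustrate the counts reported in Table 3.

```python
import math
import random

TRUE_BETA = [3.0, 1.5, 0.0, 0.0, 2.0, 0.0, 0.0, 0.0]  # Example 1 design
RHO = 0.5  # corr(x_i, x_j) = RHO ** |i - j|


def sigmoid(t):
    """Numerically stable logistic function."""
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)


def simulate(n_samples, seed=0):
    """Draw correlated Gaussian features and a binary response (assumed logit link)."""
    rng = random.Random(seed)
    scale = math.sqrt(1.0 - RHO ** 2)
    X, y = [], []
    for _ in range(n_samples):
        # An AR(1) recursion yields unit variances and corr RHO ** |i - j|.
        row = [rng.gauss(0.0, 1.0)]
        for _ in range(len(TRUE_BETA) - 1):
            row.append(RHO * row[-1] + scale * rng.gauss(0.0, 1.0))
        eta = sum(b * x for b, x in zip(TRUE_BETA, row))
        X.append(row)
        y.append(1 if rng.random() < sigmoid(eta) else 0)
    return X, y


def selection_summary(estimated, true_beta, tol=1e-8):
    """Count the variable-selection quantities described for Table 3."""
    pairs = list(zip(estimated, true_beta))
    return {
        "nonzero": sum(abs(b) > tol for b, _ in pairs),
        "correct_zero": sum(abs(b) <= tol and t == 0.0 for b, t in pairs),
        "correct_nonzero": sum(abs(b) > tol and t != 0.0 for b, t in pairs),
    }


X_sim, y_sim = simulate(500)

# Hypothetical estimate: keeps all three key variables plus one unimportant one.
hypothetical_estimate = [2.1, 0.9, 0.0, 0.2, 1.4, 0.0, 0.0, 0.0]
summary = selection_summary(hypothetical_estimate, TRUE_BETA)
print(summary)  # {'nonzero': 4, 'correct_zero': 4, 'correct_nonzero': 3}
```

The hypothetical estimate mirrors the pattern described in the text: one unimportant variable is retained, yet every key variable is identified.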

Identification of the Factors Influencing the Credit Risk of SMEs
In this study, we built a lasso-logistic model based on the glmnet package in R software. The corresponding penalty coefficient λ is selected by generalized cross-validation; Figure 3 shows the corresponding trend of the penalty parameter λ and the number of variables. In Figure 3, the horizontal axis represents the logarithmic value of λ, and the vertical axis represents the model mean square error. The numbers at the top show the number of selected variables, and the interval between the two dashed lines shows the range of positive and negative standard deviations. Furthermore, the model prediction bias varies less when λ is taken between the dashed lines [51]. The dashed line on the left marks the value of λ at which the model error is minimized, while the dashed line on the right marks the value of λ at which the model error and the number of explanatory variables are jointly kept small. As the value of λ changes, the degree to which the model is compressed varies, which changes the number of variables selected by the model. In Figure 4, when λ = e^(−1), the model selects variable X12. To obtain relatively important variables, referring to the results of Tibshirani [51], this study considers the value of λ near the left dashed line in Figure 3; that is, λ = 0.021. The explanatory variables corresponding to the non-zero regression coefficients are the key variables selected by the lasso model, as shown in Table 4.
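The effect of λ on the number of selected variables can be illustrated with a small sketch. The study itself uses the glmnet package in R; the code below is not that implementation. It is a plain proximal-gradient (soft-thresholding) solver for an L1-penalized logistic regression with an unpenalized intercept, run on synthetic data, and the function names, step size, iteration budget, and λ grid are all assumptions chosen for illustration.

```python
import math
import random


def sigmoid(t):
    """Numerically stable logistic function."""
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)


def soft_threshold(value, threshold):
    """Proximal operator of the L1 penalty: shrink toward zero, clip at zero."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def fit_lasso_logistic(X, y, lam, step=0.1, n_iter=300):
    """Minimize mean logistic loss + lam * sum(|beta_j|); intercept unpenalized.

    Uses a fixed iteration budget instead of a convergence test. The step
    size is assumed small enough for roughly standardized features.
    """
    n, p = len(X), len(X[0])
    intercept, beta = 0.0, [0.0] * p
    for _ in range(n_iter):
        grad0, grad = 0.0, [0.0] * p
        for row, target in zip(X, y):
            eta = intercept + sum(b * x for b, x in zip(beta, row))
            resid = sigmoid(eta) - target
            grad0 += resid / n
            for j in range(p):
                grad[j] += resid * row[j] / n
        intercept -= step * grad0
        # Gradient step on the smooth loss, then soft-threshold for the penalty.
        beta = [soft_threshold(b - step * g, step * lam) for b, g in zip(beta, grad)]
    return intercept, beta


# Synthetic data: two informative features and two pure-noise features.
rng = random.Random(0)
X = [[rng.gauss(0.0, 1.0) for _ in range(4)] for _ in range(200)]
y = [1 if rng.random() < sigmoid(2.0 * r[0] - 2.0 * r[1]) else 0 for r in X]

path = {}
for lam in (0.01, 0.05, 1.0):
    _, beta = fit_lasso_logistic(X, y, lam)
    path[lam] = beta
    print(lam, sum(b != 0.0 for b in beta), [round(b, 3) for b in beta])
```

With the largest penalty every coefficient is compressed to exactly zero, while smaller penalties let the informative features enter the model, which is the behaviour the coefficient path in Figure 4 describes.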
In conclusion, we obtain five key variables: the matching degree of order data, ratio of contract enforcement, number of contract defaults, degree of business concentration, and number of administrative penalties. Table 4 shows the multicollinearity test for these variables. None of the VIF values is greater than 10, indicating that there is no multicollinearity among the key variables [67]. Figure 5 shows a correlation analysis of the key variables, indicating that the β that corresponds to the selected variables and the parameter λ can be used in the logistic regression model of credit risk prediction. Each curve represents a coefficient; the labels are on the lower right side of the figure.
Among them, the matching degree of order data, ratio of contract enforcement, and number of administrative penalties negatively influence the credit risk of SMEs, while the number of contract defaults and business concentration positively influence it. SMEs lack effective pledged assets, so their business income is the main guarantee of repayment. The matching degree of order data reflects the authenticity of business transactions, and real transactions can reduce the risk of credit default and fraudulent loans [68]. The ratio of contract enforcement and the number of contract defaults directly reflect the recent transaction performance of the enterprise. The number of contract defaults indicates the willingness to repay and the probability of credit default. The degree of business concentration reflects the SME's reliance on counterparties. Transaction interruption, loss, or delayed payment by a single large counterparty can greatly increase the business risk of SMEs. Jiang and Yao [69] found that enterprises whose business is concentrated in a single counterparty are subject to increased influence from that counterparty. This greatly increases their operational or financial risk and consequently indirectly reduces banks' willingness to provide credit to SMEs. Administrative penalties reflect whether SMEs operate in accordance with regulations, which is fundamental to their growth and profitability. Therefore, this variable is the most fundamental variable in the credit risk assessment of SMEs.
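The multicollinearity check reported in Table 4 can be sketched as follows. The text does not say how the VIF values were computed, so this sketch relies on a standard identity: the VIF of variable j equals the j-th diagonal entry of the inverse of the correlation matrix of the explanatory variables. The data are synthetic and the function names are assumptions; only the threshold of 10 comes from the text.

```python
import math
import random


def correlation_matrix(rows):
    """Pearson correlation matrix of the columns of `rows`."""
    n = len(rows)
    cols = list(zip(*rows))
    means = [sum(c) / n for c in cols]
    centered = [[v - m for v in c] for c, m in zip(cols, means)]
    norms = [math.sqrt(sum(v * v for v in c)) for c in centered]
    p = len(cols)
    return [
        [sum(a * b for a, b in zip(centered[i], centered[j])) / (norms[i] * norms[j])
         for j in range(p)]
        for i in range(p)
    ]


def invert(matrix):
    """Gauss-Jordan inversion with partial pivoting."""
    n = len(matrix)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("singular matrix: perfect collinearity")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def vif(rows):
    """VIF_j is the j-th diagonal entry of the inverse correlation matrix."""
    inverse = invert(correlation_matrix(rows))
    return [inverse[j][j] for j in range(len(inverse))]


# Synthetic example: column c is almost a copy of column a, column b is unrelated.
rng = random.Random(1)
rows = []
for _ in range(500):
    a = rng.gauss(0.0, 1.0)
    b = rng.gauss(0.0, 1.0)
    c = a + 0.1 * rng.gauss(0.0, 1.0)
    rows.append([a, b, c])

values = vif(rows)
flagged = [j for j, v in enumerate(values) if v > 10]  # threshold used in the text
print([round(v, 2) for v in values], flagged)
```

In this synthetic case the two nearly identical columns are flagged, whereas the unrelated column has a VIF close to 1, the pattern that Table 4 reports for all five key variables.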

Evaluation of the Lasso-Logistic Model
We compared the lasso-logistic model with classical prediction models, namely the ridge regression and BP neural network models, following the evaluation criteria given in Section 3.3, to demonstrate the performance of the lasso-logistic SME credit risk prediction model. Table 5 presents the experimental results for the training set of the three models. Among the three models, the lasso-logistic model has the highest prediction accuracy rate, 96.5%. In addition, the Type II error of the lasso-logistic model is the lowest, at 0.037. In practice, a Type II error means that FIs incorrectly classify non-default enterprises as default enterprises, which results in FIs losing potential customers [27]. Figure 6 shows the ROC curves of the three models, where the red, green, and blue lines indicate the ridge regression, lasso-logistic, and BP neural network models, respectively. Evidently, the lasso-logistic model proposed in this study performs better and can accurately predict the credit risk of SMEs. To further demonstrate the effectiveness of the lasso-logistic warning model, as described in Section 4.3, we divided the raw data sample into two sets and repeated the random selection 20 times to obtain 20 test sets. Predictions for the test sets were generated with the model obtained from the training sets. The prediction accuracy, Type I error, Type II error, and AUC of the three models were recorded, and the averages of the 20 results are shown in Table 5. Table 5 shows that the proposed lasso-logistic method outperforms the other two models. Scholars point out that decreasing the Type I error is more important to a forecasting model than decreasing the Type II error [70,71]. In the test set, the Type I error of the lasso-logistic model is the lowest, at 0.044, which can greatly reduce the credit risk faced by FIs. In addition, as shown in Table 5, the prediction accuracy rate of the lasso-logistic model on the training set is 96.5%, and its average prediction accuracy rate on the test set samples is 96.4%, which further illustrates the prediction accuracy of the lasso-logistic credit risk prediction model. Moreover, the evaluation indexes of the lasso-logistic model change little between the training set and the test set, indicating that the model is stable. The lasso-logistic model can select variables and compress the coefficients of some unimportant variables to zero, hence avoiding the interference of redundant variables and improving the validity of the model. It also has several good characteristics, such as interpretability and numerical stability.
Note: the test set results are averaged over 20 random samplings.
Figure 6. ROC curves of the three models based on the same training set.
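The four evaluation indexes used in Table 5 can be computed as in the sketch below. Following the usage in this paper, a Type I error is a defaulting enterprise classified as non-default and a Type II error is a non-default enterprise classified as default. The exact formulas of Section 3.3 are not reproduced here, so expressing each error as a rate within its own actual class, the 0.5 cut-off, the function names, and the toy scores are assumptions.

```python
def confusion_metrics(y_true, scores, threshold=0.5):
    """Accuracy, Type I error rate, and Type II error rate (1 = default)."""
    tp = fp = tn = fn = 0
    for truth, score in zip(y_true, scores):
        pred = 1 if score >= threshold else 0
        if truth == 1 and pred == 1:
            tp += 1
        elif truth == 1:
            fn += 1  # defaulter predicted as non-default: Type I error
        elif pred == 1:
            fp += 1  # non-defaulter predicted as default: Type II error
        else:
            tn += 1
    return {
        "accuracy": (tp + tn) / len(y_true),
        "type_i": fn / (tp + fn) if tp + fn else 0.0,
        "type_ii": fp / (fp + tn) if fp + tn else 0.0,
    }


def auc(y_true, scores):
    """AUC as the share of (default, non-default) pairs ranked correctly."""
    pos = [s for s, t in zip(scores, y_true) if t == 1]
    neg = [s for s, t in zip(scores, y_true) if t == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def average_reports(reports):
    """Average each index over repeated random test sets."""
    return {key: sum(r[key] for r in reports) / len(reports) for key in reports[0]}


# Toy predictions for eight enterprises (three defaults, five non-defaults).
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.3, 0.6, 0.2, 0.1, 0.4, 0.05]

report = confusion_metrics(y_true, scores)
report["auc"] = auc(y_true, scores)
print({k: round(v, 3) for k, v in report.items()})
# {'accuracy': 0.75, 'type_i': 0.333, 'type_ii': 0.2, 'auc': 0.867}
```

Calling `average_reports` on a list of 20 such reports, one per random test set, gives averages of the kind described for the test set in Table 5.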

Conclusions
In China, the logistics industry has encountered obstacles in implementing SSCF due to credit risk. The identification and prediction of credit risk in SCF have strong implications for FIs and the SD of the economy. Although the credit risk prediction of SMEs has been studied in recent years, few studies have identified the key factors influencing the credit risk of SMEs. In addition, most previous studies take listed enterprises as research objects and use financial data to construct an SCF credit risk prediction model. However, the financial reports of SMEs are incomplete and difficult to access. With the continuous innovation of SCF, SMEs can rely on data to obtain credit. Therefore, including non-financial data in the SCF credit risk evaluation index system of SMEs and strengthening the credit risk management of SMEs to promote the development of SSCF have become urgent problems to be solved.
To address these problems, we constructed a new SME credit risk evaluation index system and an SME credit risk prediction model for China's FIs in SCF. Specifically, we investigated the financial and non-financial data of 177 SMEs. We first normalized the source data, excluded the dimensional differences, and randomly divided the samples into two sets for model construction and testing. Additionally, we introduced the lasso-logistic model into the credit risk identification model to select the key influencing factors to further construct the credit risk prediction model. Based on the key influencing factors, credit risk control should be strengthened to reduce the possibility of credit default and promote the development of SSCF.
The factors influencing the credit risk of SMEs are divided into three dimensions: corporate credit, transaction credit, and reputation supervision, with a total of 20 variables. According to the empirical results of the lasso-logistic model, the degree of matching of order data, ratio of contract enforcement, number of contract defaults, degree of business concentration, and number of administrative penalties are the key factors influencing the credit risk of SMEs. Therefore, FIs can focus on these variables when assessing credit risk.
The lasso-logistic model can overcome the multicollinearity problem of the logistic model and accurately select the key explanatory variables, which improves the interpretability of the model. Additionally, compared with the ridge regression and the BP neural network models, the lasso-logistic model has the best prediction accuracy and discrimination ability.
However, this study also has a few limitations, which can be addressed in further research. First, this study only uses the data of SMEs in the logistics industry to verify the effectiveness of the model. Second, the sample size is not large enough. Future studies can select a greater number of SMEs from several industries to improve and enrich the model and theory proposed in this study. In addition, in the current practice of SCF, the Type I error is too high; that is, loans are granted to the wrong borrowers, which makes FIs more reluctant to approve loan applications and is not conducive to the development of SSCF.