Forecasting Corporate Failure in the Chinese Energy Sector: A Novel Integrated Model of Deep Learning and Support Vector Machine

Abstract: Accurate forecasts of corporate failure in the Chinese energy sector are drivers for both operational excellence in the national energy systems and sustainable investment in the energy sector. This paper proposes a novel integrated model (NIM) for corporate failure forecasting in the Chinese energy sector that considers textual data and numerical data simultaneously. Given the features of textual data and numerical data, convolutional neural network oriented deep learning (CNN-DL) and support vector machine (SVM) are employed as the base classifiers to forecast using textual data and numerical data, respectively. Subsequently, soft set (SS) theory is applied to integrate the outputs of CNN-DL and SVM. Hence, NIM inherits the advantages and avoids the disadvantages of CNN-DL, SVM, and SS. It is able to improve forecasting performance by making full use of textual data and numerical data. For verification, NIM is applied to real data of Chinese listed energy firms. Empirical results indicate that, compared with benchmarks, NIM demonstrates superior performance in corporate failure forecasting in the Chinese energy sector.


Introduction
Energy is an essential material basis for human survival and development. Along with economic development and social progress in China, large amounts of investments are required in the energy sector to meet the increasing needs of energy. According to the estimation of the International Energy Agency, China will be the world's largest consumer of energy by 2040, accounting for 22% [1]. This means that one trillion dollars should be invested in the energy sector in China. Meanwhile, the Chinese energy sector is experiencing challenges due to the geopolitical uncertainty [2]. To address concerns of climate changes, the Chinese energy sector is also working for national energy policies and actions of strengthening energy security and sustainability. Hence, it is of great significance to keep investing in the Chinese energy sector. However, it suffers high risks.
As the main operators and investment targets of the sector, energy corporates have attracted tremendous interest from both practitioners and academic researchers for their financial performance [3,4], because the financial and social damage inflicted by energy corporate failure cannot be overstated [2]. More specifically, energy corporate failure, in which the firm is legally bankrupt or cannot pay its bills, etc. [5], not only makes investors and energy firms suffer huge economic losses but also has strong negative impacts on both national economies and social stability.
[...] the unanimous voting method (IMUV) [30], and the Dempster-Shafer evidence theory [31] (IMET) are included as benchmarks.
The important contributions of this study can be summarized as follows.
• Compared to developed economies, energy investment in emerging economies climbs faster. Due to immature business environments, it is easier for energy firms in emerging economies to fail. It is significant to identify the failure as early as possible, though it is complex. To date, little attention has been paid to firm failure forecasting in the energy sector of emerging economies. This paper complements prior literature with new empirical evidence from China.
• A novel integrated model is proposed for corporate failure forecasting in the Chinese energy sector. It integrates CNN-DL and SVM into SS. More specifically, CNN-DL is employed to forecast with textual data, SVM is applied to forecast with numerical data, and the results of CNN-DL and SVM are integrated by SS. The algorithm enables NIM to effectively improve performance by making full use of textual data and numerical data.
• Empirical results demonstrate that textual data can play an important role in corporate failure forecasting in the Chinese energy sector as the complement of numerical data, but the validity decreases with a longer forecasting horizon.
The rest of this paper is organized as follows. Section 2 reviews the pertinent literature on corporate failure forecasting. In Section 3, we introduce the proposed NIM in detail. Section 4 presents the application of NIM to real data. Section 5 reports and compares empirical results. We conclude and discuss the future work in Section 6.

Literature Review
During past decades, some literature has reviewed corporate failure forecasting in detail, such as Sun et al. [5], Alaka et al. [20], Prusak [32], etc. Here, we briefly review more recent literature on corporate failure forecasting (shown in Table A1) and summarize the recent development as follows.
First, because each sector has its own characteristics (financial, organizational, environmental, etc.) [33], more and more recent studies forecast corporate failure within a specific sector to improve forecasting performance. However, these studies mainly focus on sectors such as manufacturing, banking, hotels, agribusiness, etc. [22][23][24][34]. To the best of our knowledge, only Doumpos et al. [2] have explored corporate failure forecasting in the energy sector. Their samples were collected from developed European economies, and little attention has been paid to forecasting corporate failure in the energy sector of developing economies.
Second, although financial ratios are still the most popular variables [9,20], non-financial variables such as market information, macroeconomic information, and industry information are being more widely applied to corporate failure forecasting [2,24,34]. With the development of artificial intelligence, some literature has started to adopt textual data to forecast corporate failure [5,27,28]. Textual data can be applied to forecast corporate failure as the complement of financial ratios.
Third, forecasting methods proposed by recent literature can be divided into two categories: individual models and integrated models. Individual statistical methods are widely employed to forecast corporate failure, such as discriminant analysis and its expansions [35], logistic regression and its expansions [21], a proportional hazards model and its expansions [36], etc. It is easy to analyze and explain the impact of each variable on corporate failure using individual statistical models. However, some stringent model assumptions about the sample data have to be met to apply those models [9]. To overcome these limitations, more and more individual machine learning methods have been proposed for corporate failure prediction, such as neural networks [10], genetic algorithm [13], decision tree [14], support vector machine [15], rough set [16], deep learning [17], etc. The main advantage of a machine learning algorithm is that it is able to consider multiple features simultaneously and capture the hidden relationship between them, which enables it to perform better when compared to the statistical models [37][38][39][40]. This gives machine learning methods better flexibility in corporate failure forecasting, which is significant for both academic research and real practices.
Energies 2019, 12, 2251
To achieve a better forecasting performance, integrated models, which employ individual models as base classifiers, have become a new exploring trend. A great deal of literature has demonstrated how to integrate base classifiers for corporate failure forecasting [5,32]. To date, integrated models can be divided into two groups. One includes horizontal integrated models, such as a UV ensemble model [30], a spline-rule ensemble model [41], etc. Horizontal integrated models employ a combination technique to integrate base classifiers. The other group includes vertical integrated models, in which one method is mainly employed to improve another. For example, Chen [42] adopts particle swarm optimization (PSO) techniques to obtain appropriate parameter settings for subtractive clustering. Integrated models can capture more information and result in much more accurate and stable forecasting performance.
Based on the review above, the main contribution of this paper to corporate failure forecasting can be summarized as follows. First, this study is a pioneering work on corporate failure forecasting in the Chinese energy sector. To date, little attention has been paid to corporate failure forecasting in the energy sector of emerging economies. China is one of the largest energy consuming countries, and it is also the largest developing state. This paper complements prior literature by providing new empirical evidence from China. Second, given the role of textual data in discriminating failed corporates from normal ones [17] and the characteristics of the Chinese energy sector [26], we propose a novel integrated model with CNN-DL and SVM based on SS to make full use of textual data and numerical data. Until now, no literature has reported such an integrated model. This paper complements prior literature by proposing a novel integrated model for corporate failure forecasting.

The Proposed NIM
Deep learning models, which can obtain better identification performance than conventional methods in text analysis [17], have been applied to forecast corporate failure with textual data and numerical data simultaneously and have achieved excellent performance [10,17]. However, past studies have also demonstrated that deep learning models seem to be more suitable for identifying images and less suitable for numerical data analyses [10]. Furthermore, considering the big challenge in forecasting with textual data and numerical data simultaneously (due to their widely different features), we believe that an effective way to get a better forecasting performance is to adopt a separate model for textual data and for numerical data, as shown in Figure 1.
In the framework, we propose an NIM to improve the performance of corporate failure forecasting in the Chinese energy sector by making full use of textual data and numerical data. Specifically, we divide the raw samples and data into two groups: textual data and numerical data. CNN-DL is applied to textual data, and SVM is used for numerical data. Subsequently, SS is employed to integrate the outputs of CNN-DL and SVM. Details of NIM are demonstrated as follows.

Corporate Failure Forecasting with Textual Data
CNN-DL is employed for corporate failure forecasting in the Chinese energy sector with textual data. There are three key points of CNN-DL: data preprocessing, word embedding, and convolutional neural network (CNN).

Data Preprocessing and Word Embedding
Textual data are written in natural language [17] and cannot be directly employed as inputs in many conventional forecasting models. They have to be transformed into numerical data so that mathematical models can be adopted. First, raw data have to be cleaned to reduce noise: numbers and Hypertext Markup Language (HTML) tags included in the textual data are removed. Second, given the difference between Chinese and English, the Jieba package of Python is employed to segment the textual document into words [43]. To reduce dimensionality, words with low frequency are deleted.
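The cleaning, segmentation, and low-frequency filtering steps can be sketched in Python as follows. This is a minimal illustration rather than the script used in this paper: the regular expressions, the `min_count` threshold, and all function names are assumptions. `jieba.lcut` is used for segmentation only when the Jieba package is installed; otherwise the sketch falls back to whitespace splitting.

```python
import re
from collections import Counter


def clean_text(raw):
    """Remove HTML tags and numbers, then collapse whitespace."""
    no_tags = re.sub(r"<[^>]+>", " ", raw)
    no_numbers = re.sub(r"\d+(?:[.,]\d+)*%?", " ", no_tags)
    return re.sub(r"\s+", " ", no_numbers).strip()


def segment(text, cut=None):
    """Split text into words; `cut` is any callable returning a list of tokens."""
    if cut is None:
        try:
            import jieba  # third-party package, assumed to be installed
            cut = jieba.lcut
        except ImportError:
            cut = str.split  # fallback for illustration only
    return [tok for tok in cut(text) if tok.strip()]


def drop_rare_words(docs, min_count=2):
    """Delete words whose corpus frequency is below min_count (assumed threshold)."""
    counts = Counter(tok for doc in docs for tok in doc)
    return [[tok for tok in doc if counts[tok] >= min_count] for doc in docs]


# Toy English documents keep the example readable; real inputs would be Chinese text.
raw_docs = ["<p>Revenue fell 12.5% in 2017</p>", "<div>Revenue rose in the period</div>"]
tokens = [segment(clean_text(d), cut=str.split) for d in raw_docs]
filtered = drop_rare_words(tokens, min_count=2)
```

Passing the tokenizer as the `cut` argument keeps the cleaning logic independent of the segmentation package, so the same functions can be checked on small English strings.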
After the preprocessing of textual data, words need to be converted to numerical representations. Many techniques have been proposed for this purpose, such as one-hot representation, distributed representation, word embedding, FastText, embedding from language, etc. [44]. One can refer to the review literature [45] for details. Here, given that understanding the semantics of textual data is much more important for corporate failure forecasting, we employ the skip-gram model to convert textual words to numerical word vectors. The skip-gram model, one of the most famous word embedding models, has proved to be an excellent tool for understanding the meaning of textual documents and converting them to word vectors [46]. Assuming that there are $N$ training words, $w_1, w_2, w_3, \cdots, w_N$, the objective of the skip-gram model is to maximize the average log probability, as shown in Formula (1):
$$\frac{1}{N}\sum_{n=1}^{N}\ \sum_{-c \le j \le c,\ j \ne 0} \log p\left(w_{n+j} \mid w_n\right) \tag{1}$$
where $c$ is the size of the context. Past studies have demonstrated that the skip-gram model is useful for representing a word $w$ using a numerical vector $v_w$ with $d$ dimensions. However, the time cost of the skip-gram model is high, thus we adopt a negative sampling technique to address this issue. For more details about the skip-gram model, please refer to the literature [46].
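To make the role of the context size c and of negative sampling concrete, the following Python sketch enumerates the (center, context) training pairs for a word sequence and evaluates the standard negative-sampling loss for one pair. It is an illustration under assumptions, not the training code of this paper: the function names are hypothetical, and vectors are plain Python lists.

```python
import math


def skipgram_pairs(words, c):
    """List every (center, context) pair within a window of c words on each side."""
    pairs = []
    for n, center in enumerate(words):
        for j in range(-c, c + 1):
            if j != 0 and 0 <= n + j < len(words):
                pairs.append((center, words[n + j]))
    return pairs


def negative_sampling_loss(v_center, v_context, v_negatives):
    """Loss for one pair: -log s(v_c . v_o) - sum_k log s(-v_c . v_k), s = sigmoid."""
    def dot(a, b):
        return sum(x * y for x, y in zip(a, b))

    def log_sigmoid(z):
        return -math.log1p(math.exp(-z))

    loss = -log_sigmoid(dot(v_center, v_context))
    for v_neg in v_negatives:
        loss -= log_sigmoid(-dot(v_center, v_neg))
    return loss


pairs = skipgram_pairs(["a", "b", "c"], c=1)
# With all-zero vectors every sigmoid equals 0.5, so the loss is 3 * log(2).
loss = negative_sampling_loss([0.0, 0.0], [0.0, 0.0], [[0.0, 0.0], [0.0, 0.0]])
```

Negative sampling replaces the full softmax over the vocabulary with a handful of sampled "noise" words per pair, which is why it reduces the time cost.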

Convolutional Neural Network (CNN)
Each document can be converted to an $n \times d$ numerical matrix by the vectorized representation of words. The matrix can be used as an input of the CNN to forecast corporate failure in the Chinese energy sector. CNN is widely used for mining textual data and has been successfully applied in some financial forecasting fields recently [10,17]. The most important point of CNN is that it can detect local features of documents by adopting $m$ convolving filters. For more information about CNN, please refer to the literature [47].
Assume there are some convolutional filters, denoted as $\theta = (\theta_1, \theta_2, \theta_3, \cdots, \theta_M)$. Then, $\theta$ is a function mapping $\mathbb{R}^m$ to $\mathbb{R}$. Given an input word vector of a document $x \in \mathbb{R}^m$, the $i$-th entry of the output $u$, obtained by applying the filter $\theta$ to a phrase of length $h$ in $x$, can be calculated using Formula (2):
$$u_i = f\left(\theta \cdot x_{i:i+h-1} + b\right) \tag{2}$$
where $x_{i:i+h-1}$ denotes the phrase of length $h$ starting at position $i$, $b \in \mathbb{R}$ is the bias parameter of CNN, and $f$ is the activation function, such as the sigmoid function, the tanh function, the ReLU function, etc. [11]. Here, we employ the ReLU function as $f$ to expedite the convergence of CNN, shown as Formula (3):
$$f(z) = \max(0, z) \tag{3}$$
By applying the method above to the textual document, a full version of features $U \in \mathbb{R}^{i-h+1}$ is obtained. To forecast corporate failure in the Chinese energy sector, whether or not the document includes a key phrase (feature) is the discriminating criterion. Therefore, the pooling operation is employed to take the maximum value of each feature map vector, $U = \max\{u\}$. At last, a sigmoid output unit and two layers of hidden filters can be added to obtain the forecasting outputs [17].
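The convolution, ReLU activation, and max pooling described here can be sketched for a single filter in plain Python. The sketch assumes that a document is a list of n word vectors of dimension d and that one filter spans h consecutive word vectors (an h × d weight array); these shapes and the function names are illustrative assumptions, not the configuration used in the experiments.

```python
def relu(z):
    """ReLU activation: max(0, z)."""
    return max(0.0, z)


def conv_feature_map(doc, theta, b):
    """Slide one filter over the document and return its n - h + 1 activations."""
    h = len(theta)
    features = []
    for i in range(len(doc) - h + 1):
        z = b
        for k in range(h):
            # Dot product of the k-th filter row with the (i + k)-th word vector.
            z += sum(t * x for t, x in zip(theta[k], doc[i + k]))
        features.append(relu(z))
    return features


def max_pool(features):
    """Max-over-time pooling keeps the strongest response of the filter."""
    return max(features)


doc = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]   # n = 3 words, d = 2
theta = [[1.0, 1.0], [-1.0, 0.0]]            # one filter with h = 2
feature_map = conv_feature_map(doc, theta, b=0.0)
pooled = max_pool(feature_map)
```

Max pooling makes the pooled value independent of where in the document the key phrase occurs, which matches the criterion of whether a key phrase is present at all.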

Corporate Failure Forecasting with Numerical Data
Various statistical models and machine learning models have been proposed for corporate failure forecasting with numerical data [32]. SVM has been widely used in various fields [48][49][50][51][52], including corporate failure forecasting [44], and has received great attention in recent years. By mapping original data into the high dimensional space using different kernel functions, SVM not only has advantages in forecasting with linear and non-linear financial data but also performs well in forecasting with high dimensional data and small sample sizes. Hence, given the character of Chinese energy firms, SVM is adopted to forecast with numerical data. Here, we present a brief review; for more information, please refer to the literature.
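Given the training data set $(x_n, y_n)$, $n = 1, 2, 3, \cdots, N$, $x_n \in \mathbb{R}^M$ represents a vector in the $M$-dimensional feature space, and $y_n \in \{-1, 1\}$: $y_n = 1$ means that $x_n$ belongs to one category, and $y_n = -1$ means that $x_n$ belongs to the other category. The calculation of SVM can be presented as Formula (4), and constraint conditions are shown as Formula (5):
$$\min_{w,\, b,\, \xi}\ \frac{1}{2}\lVert w \rVert^2 + C\sum_{n=1}^{N}\xi_n \tag{4}$$
$$\text{s.t.}\quad y_n\left(w^{\top}\phi(x_n) + b\right) \ge 1 - \xi_n,\quad \xi_n \ge 0,\quad n = 1, 2, \cdots, N \tag{5}$$
where $C$ and $\xi$ are the key parameters ($C$ is the penalty parameter and $\xi_n$ are the slack variables), and $\phi$ maps the original data into the high dimensional space. Generally, there are four widely applied kernel functions for this mapping: the linear kernel, the polynomial kernel, the radial basis function (RBF, also called Gaussian) kernel, and the sigmoid kernel.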
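As a small numerical illustration, the sketch below evaluates an RBF kernel and the objective of a soft-margin SVM for a fixed linear decision function. It assumes the standard soft-margin formulation, in which the slack of sample n is max(0, 1 − y_n(w·x_n + b)); the function names, the kernel width `gamma`, and the toy data are assumptions made for this example only.

```python
import math


def rbf_kernel(x, z, gamma):
    """RBF (Gaussian) kernel: exp(-gamma * ||x - z||^2)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * sq_dist)


def soft_margin_objective(w, b, X, y, C):
    """Return (objective, slacks) for a linear decision function w.x + b."""
    slacks = []
    for x_n, y_n in zip(X, y):
        margin = y_n * (sum(w_i * x_i for w_i, x_i in zip(w, x_n)) + b)
        slacks.append(max(0.0, 1.0 - margin))  # zero when the margin is satisfied
    objective = 0.5 * sum(w_i * w_i for w_i in w) + C * sum(slacks)
    return objective, slacks


X = [[2.0, 0.0], [0.5, 0.0], [-1.0, 0.0]]
y = [1, 1, -1]
objective, slacks = soft_margin_objective([1.0, 0.0], 0.0, X, y, C=1.0)
similarity = rbf_kernel([1.0, 0.0], [0.0, 0.0], gamma=1.0)
```

Only the second sample violates the margin, so it alone contributes a slack of 0.5; a larger C would penalize that violation more heavily.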

Integration of Individual Outputs
An important innovation of this paper is that we introduce the SS as the integration method to integrate outputs of CNN-DL and SVM. To date, unanimous voting algorithm, equal weighted method, Borda count, Bayesian, neural network, evidence theory, rough set theory, etc., are well-known integration methods [20]. However, it is a big challenge to determine the weight of each individual model. To overcome the limitations above and make full use of individual outputs effectively, SS is employed as the integration method of NIM. SS, initiated by Molodtsov [29], has advantages in decision making and information discovering [9,53]. Details of applying SS to integrate individual outputs are demonstrated as follows.
Let U be a non-empty universe of objects, let E be a non-empty set of parameters related to objects in U, and let P(U) be the power set of U. A soft set over U is a pair S = (F, A), where F : A → P(U) is the approximate function of S and A ⊆ E. In other words, S is a parameterized family of subsets of U. With the definition of SS, a binary operation, named the uni-int decision making method, is proposed to improve the performance of decision making by making full use of the information in SSs [9].
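Let $S = (F, A)$ and $T = (G, B)$ be two SSs over the universe $U$. The $\wedge$-product (and product) of $S$ and $T$ is the soft set $S \wedge T = (P, A \times B)$, where $P(x, y) = F(x) \cap G(y)$ for all $(x, y) \in A \times B$. The uni-int operators for $S \wedge T$, denoted as $\mathrm{uni}_x\mathrm{int}_y$ and $\mathrm{uni}_y\mathrm{int}_x$, are defined as follows:
$$\mathrm{uni}_x\mathrm{int}_y(S \wedge T) = \bigcup_{x \in A}\Big(\bigcap_{y \in B} P(x, y)\Big) \tag{6}$$
$$\mathrm{uni}_y\mathrm{int}_x(S \wedge T) = \bigcup_{y \in B}\Big(\bigcap_{x \in A} P(x, y)\Big) \tag{7}$$
Then, the uni-int decision set is the union of the two uni-int operators, shown as Formula (8):
$$\mathrm{uni\text{-}int}(S \wedge T) = \mathrm{uni}_x\mathrm{int}_y(S \wedge T) \cup \mathrm{uni}_y\mathrm{int}_x(S \wedge T) \tag{8}$$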
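The ∧-product and the uni-int decision set can be illustrated with Python sets. In this sketch a soft set is represented as a dictionary that maps each parameter to a subset of the universe, following the standard definition of the uni-int method. The firm names and the parameter names (`a1`, `a2`, `b1`, `b2`) are hypothetical, and how the outputs of CNN-DL and SVM populate the two soft sets is not modelled here.

```python
def and_product(F, G):
    """Return P with P[(x, y)] = F[x] & G[y] for every (x, y) in A x B."""
    return {(x, y): set(F[x]) & set(G[y]) for x in F for y in G}


def _union_of_intersections(outer, inner, cell):
    """Union over `outer` of the intersection over `inner` of cell(o, i)."""
    result = set()
    for o in outer:
        common = None
        for i in inner:
            common = set(cell(o, i)) if common is None else common & cell(o, i)
        if common:
            result |= common
    return result


def uni_int_decision(F, G):
    """Uni-int decision set: uni_x int_y united with uni_y int_x."""
    P = and_product(F, G)
    uni_x_int_y = _union_of_intersections(F, G, lambda x, y: P[(x, y)])
    uni_y_int_x = _union_of_intersections(G, F, lambda y, x: P[(x, y)])
    return uni_x_int_y | uni_y_int_x


# Hypothetical soft sets over the universe {firm_1, ..., firm_4}.
F = {"a1": {"firm_1", "firm_2"}, "a2": {"firm_2", "firm_3"}}
G = {"b1": {"firm_1", "firm_4"}, "b2": {"firm_1", "firm_2", "firm_3"}}
decision = uni_int_decision(F, G)
```

In this toy case uni_x int_y yields {firm_1} and uni_y int_x yields {firm_2}, so the decision set contains both firms; no weights have to be assigned to the two soft sets.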

Algorithm
Based on the analysis above, the algorithm of NIM, which is the key innovation of this paper, is illustrated in Figure 2.
1. The collected raw data are divided into two groups: textual data and numerical data. Textual data are text documents. Numerical data are financial ratios.
2. Textual data are cleaned by removing numbers and HTML tags and are segmented using the Jieba package of Python. At the same time, numerical data are normalized using Formula (9).
3. Apply the skip-gram model to convert each word of the textual document to a numerical vector.
4. Train CNN-DL with transformed textual data and train SVM with normalized financial ratios.
5. Obtain individual outputs of CNN-DL and SVM.
6. Input a universe $U$ to be the set of energy firms and $E$ to be the set of parameters. In particular, $E$ is the set of selected textual variables and financial ratios for corporate failure forecasting in the Chinese energy sector.
7. Construct two soft sets $S = (F, A)$ and $T = (G, B)$ over $U$. For soft set $S$, the approximate function $F$ is CNN-DL, and parameter set $A$ is the set of textual variables, $A \subseteq E$. For soft set $T$, the approximate function $G$ is SVM, and parameter set $B$ is the set of selected financial ratios, $B \subseteq E$.
8. Find the $\wedge$-product (and product) of SSs $S$ and $T$.
9. Apply the uni-int operations on $S \wedge T$.
10. Obtain the final integrated outputs of NIM.
In such a way, the proposed NIM integrates CNN-DL and SVM into SS and hence inherits the advantages of the three methods. We expect NIM to perform well on corporate failure forecasting in the Chinese energy sector with textual data and numerical data.

Model Evaluation Metrics
Many metrics have been proposed to evaluate the performance of forecasting models, such as accuracy (ACC), Matthews correlation coefficient (MCC), F1-score (F1), the area under curve (AUC) of receiver operating characteristic (ROC), etc. In this paper, due to the imbalanced testing sample set, AUC is employed as the evaluation metric [54]. AUC is widely used to measure the overall discriminatory power of models for its flexibility and comprehensiveness [17]. AUC scores range from zero to one: zero means that the classification performance is the worst, and one indicates the best classification performance. The bigger the AUC score is, the better the classification performance is.
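For readers who want to see how an AUC score arises from model outputs, the sketch below computes it directly from its rank interpretation: the fraction of (failure, non-failure) pairs in which the failure sample receives the higher score, with ties counted as one half. The function name and the toy labels and scores are assumptions made for illustration; a library routine would normally be used on real data.

```python
def auc_score(y_true, scores):
    """AUC as the share of positive/negative pairs ranked correctly (ties count 0.5)."""
    positives = [s for y, s in zip(y_true, scores) if y == 1]
    negatives = [s for y, s in zip(y_true, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one sample of each class")
    wins = 0.0
    for p in positives:
        for q in negatives:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# 1 marks a failure sample, 0 a non-failure sample.
labels = [1, 0, 1, 0]
predicted = [0.9, 0.8, 0.4, 0.1]
auc = auc_score(labels, predicted)
```

Because the score depends only on the ranking of samples, it does not change with the class proportions of the test set, which is one reason it suits imbalanced samples.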

Sample and Data
In this paper, real samples and data from Chinese listed energy firms are adopted for empirical experiments to verify the performance of NIM on corporate failure forecasting in the Chinese energy sector. In China, if the net profit of a listed firm is negative in two consecutive years, the firm will be labeled as special treatment (ST). According to the China Securities Supervision and Management Committee (CSSMC), negative net profit will increase the possibility of corporate failure [9,11].
Here, we treat ST as the corporate failure. Such listed energy firms are viewed as failure samples. The rest of the listed energy firms that have not been labeled as ST are regarded as non-failure samples.
During 1998-2018, 705 energy-related corporates were listed on the Shenzhen Stock Exchange and the Shanghai Stock Exchange. With no missing observations, there are 651 Chinese listed energy-related corporates adopted as empirical samples in this work, including 605 non-failure firms and 46 failure firms. In terms of modeling, all samples are divided into the energy training data set and the testing data set using the 10-times split technique. Here, 80% of the non-failure samples (484 samples) and the failure samples (36 samples) are employed as training samples, and the rest of the samples are used to evaluate the performance of models as testing samples. This percentage was widely used in many prior studies [2,17].
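A class-wise (stratified) 80/20 split that reproduces the sample counts reported above can be sketched as follows. This is an assumption about how such a split may be implemented, not the procedure of the 10-times split technique itself: the function name, the fixed random seed, and the rounding down of each class size are choices made for this example.

```python
import random


def stratified_split(labels, train_percent=80, seed=0):
    """Split indices class by class so each class keeps the same training share."""
    rng = random.Random(seed)
    train_idx, test_idx = [], []
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        cut = len(idx) * train_percent // 100  # integer arithmetic, rounds down
        train_idx += idx[:cut]
        test_idx += idx[cut:]
    return sorted(train_idx), sorted(test_idx)


# 605 non-failure firms (label 0) and 46 failure firms (label 1), as in the sample.
labels = [0] * 605 + [1] * 46
train, test = stratified_split(labels)
n_train_normal = sum(1 for i in train if labels[i] == 0)
n_train_failure = sum(1 for i in train if labels[i] == 1)
```

Splitting within each class keeps the scarce failure samples represented in both the training set and the testing set.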
In addition, to observe the performance change of corporate failure forecasting with samples from the energy sector, we randomly collect another comprehensive training data set, including 484 non-failure samples and 36 failure samples in all sectors from the Shenzhen Stock Exchange and the Shanghai Stock Exchange during the period of 1998-2018. All data employed in this paper are collected from the CSMAR database and the CNINF database.

Variables
It is more difficult to forecast corporate status at the year t using data of the year (t − 2) or (t − 3) than it is using data of the year (t − 1) [9]. Here, we take on this more difficult task. Given that the characteristics of listed firm failure for the years (t − 2) and (t − 3) are different [11], two variable sets are selected for forecasting corporate failure in the Chinese energy sector using data of the year (t − 2) and (t − 3). Numerical variables and textual variables are included in each selected variable set.

Numerical Variables Selection
For numerical data, we treat financial ratios as variables. Various financial ratios have been selected for corporate failure forecasting [7,9,55]. In this paper, based on the literature review, financial ratios that have been widely adopted in prior studies are summarized in Table A2. Then, we select numerical variables from Table A2 with the training data sets of the years (t − 2) and (t − 3) using the following approach. First, financial ratios with null values are removed. Second, key financial ratios are selected by a significance test at the 95% confidence level. Third, the multi-collinearity test is employed to remove variables with high multi-collinearity relationships. The final financial ratios are treated as numerical variables for corporate failure forecasting in the Chinese energy sector, as listed in Tables 1-4.

Table 1. Financial ratios selected using the energy training data set of the year (t − 2).

| No. | Financial Ratio |
|---|---|
| $x_{2}$ | Net income/total asset |
| $x_{15}$ | Earning per share |
| $x_{19}$ | Cash flow/total debt |
| $x_{25}$ | Debt ratio |
| $x_{29}$ | Market value equity/total debt |
| $x_{38}$ | Account receivable turnover |
| $x_{45}$ | Working capital turnover |
| $x_{52}$ | Sales growth rate of major operation |
| $x_{57}$ | One if total liabilities exceeds total assets, zero otherwise |
| $x_{58}$ | $(NI_t - NI_{t-1})/(\lvert NI_t \rvert + \lvert NI_{t-1} \rvert)$, $NI_t$: latest net income |

Table 2. Financial ratios selected using the energy training data set of the year (t − 3).

| No. | Financial Ratio |
|---|---|
| $x_{2}$ | Net income/total asset |
| $x_{13}$ | Equity growth ratio |
| $x_{15}$ | Earning per share |
| $x_{19}$ | Cash flow/total debt |
| $x_{25}$ | Debt ratio |
| $x_{32}$ | Long-term debt ratio |
| $x_{38}$ | Account receivable turnover |
| $x_{48}$ | Net cash flow of investing activities per share |
| $x_{49}$ | Growth ratio of net profit |
| $x_{52}$ | Sales growth rate of major operation |
| $x_{57}$ | One if total liabilities exceeds total assets, zero otherwise |
| $x_{58}$ | $(NI_t - NI_{t-1})/(\lvert NI_t \rvert + \lvert NI_{t-1} \rvert)$, $NI_t$: latest net income |

Table 3. Financial ratios selected using the comprehensive training data set of the year (t − 2).

| No. | Financial Ratio |
|---|---|
| $x_{2}$ | Net income/total asset |
| $x_{9}$ | Continuous 4 quarterly EPS |
| $x_{19}$ | Cash flow/total debt |
| $x_{25}$ | Debt ratio |
| $x_{29}$ | Market value equity/total debt |
| $x_{30}$ | Equity ratio |
| $x_{38}$ | Account receivable turnover |
| $x_{45}$ | Working capital turnover |
| $x_{47}$ | Net assets per share |
| $x_{57}$ | One if total liabilities exceeds total assets, zero otherwise |
| $x_{58}$ | $(NI_t - NI_{t-1})/(\lvert NI_t \rvert + \lvert NI_{t-1} \rvert)$, $NI_t$: latest net income |

EPS: earning per share.

Table 4. Financial ratios selected using the comprehensive training data set of the year (t − 3).

| No. | Financial Ratio |
|---|---|
| ... | ... |
| $x_{19}$ | Cash flow/total debt |
| $x_{25}$ | Debt ratio |
| $x_{30}$ | Equity ratio |
| $x_{32}$ | Long-term debt ratio |
| $x_{38}$ | Account receivable turnover |
| $x_{43}$ | Total assets turnover |
| $x_{45}$ | Working capital turnover |
| $x_{47}$ | Net assets per share |
| $x_{55}$ | Cash flow to current liability |
| $x_{56}$ | Cash to main business income ratio |
| $x_{57}$ | One if total liabilities exceeds total assets, zero otherwise |
| $x_{58}$ | $(NI_t - NI_{t-1})/(\lvert NI_t \rvert + \lvert NI_{t-1} \rvert)$, $NI_t$: latest net income |
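The two derived variables that appear in every table, the insolvency indicator (x 57) and the scaled change in net income (x 58), can be computed as in the following sketch. The function names are hypothetical, and returning zero when both net income figures are zero is an assumption made here to avoid division by zero; the tables do not specify that case.

```python
def insolvency_flag(total_liabilities, total_assets):
    """x57: one if total liabilities exceed total assets, zero otherwise."""
    return 1 if total_liabilities > total_assets else 0


def net_income_change(ni_t, ni_prev):
    """x58: (NI_t - NI_{t-1}) / (|NI_t| + |NI_{t-1}|), bounded between -1 and 1."""
    denominator = abs(ni_t) + abs(ni_prev)
    if denominator == 0:
        return 0.0  # assumed convention when both values are zero
    return (ni_t - ni_prev) / denominator


flag = insolvency_flag(total_liabilities=120.0, total_assets=100.0)
drop = net_income_change(ni_t=50.0, ni_prev=100.0)      # income halves
reversal = net_income_change(ni_t=-20.0, ni_prev=20.0)  # profit turns into loss
```

Scaling by the sum of absolute values keeps the variable within [−1, 1] even when net income changes sign, which a plain growth rate does not.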

Textual Variables Description
For textual data, as pointed out in prior literature [17,56], the management discussion and analysis section included in the annual report of listed firms can be used to distinguish firm risks. In China, the CSSMC intends for the management discussion and analysis section to offer more information for readers to improve the understanding of the current operating and financial status and to forecast the future status with higher accuracy. Hence, we employ the management discussion and analysis section as the textual data of a Chinese listed energy firm. It can be downloaded from the CNINF database.
More specifically, a Perl script is first applied to extract the management discussion and analysis section. Then, samples with empty management discussion and analysis sections are excluded. To reduce noisy data, numbers, HTML tags, etc., are removed from the extracted documents. The final preprocessed document is segmented into words using the Jieba package of Python. After the numerical vector representation of words using the skip-gram model, we apply the convolutional process of CNN to extract features as textual variables.

Experiment Design
To investigate whether NIM has an acceptable performance for corporate failure forecasting in the Chinese energy sector, we design a comprehensive empirical experiment. CNN-DLT, SVM, CNN-DLM, IMUV, and IMET are included as benchmarks. Figure 3 illustrates the empirical experiment. Details of the empirical experiment are presented as follows.
Step 1. Energy samples are randomly divided into the energy training data set and the testing data set using the 10-times split technique. Meanwhile, the comprehensive training data set with samples from different sectors is proposed as well.
Step 2. Select financial ratios for SVM and CNN-DLM with the numerical training data of the year (t − 2) and (t − 3).
Step 3. Train CNN-DLT, SVM, CNN-DLM, IMUV, IMET, and NIM with the energy training data set and the comprehensive training data set.
Step 4. Output forecasting results with the testing data set and compare the performance of each forecasting model.
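One possible reading of the random division in Step 1 is ten independent random splits of the energy samples into training and testing sets. The sketch below follows that reading; the function name, the `test_fraction` of 0.3, and the seed are illustrative assumptions and are not taken from the paper.

```python
import random


def repeated_random_splits(sample_ids, n_repeats=10, test_fraction=0.3, seed=0):
    """Yield (train_ids, test_ids) for n_repeats independent random splits.

    test_fraction and seed are assumed values chosen for illustration.
    """
    ids = list(sample_ids)
    n_test = max(1, round(len(ids) * test_fraction))
    rng = random.Random(seed)          # fixed seed keeps the splits reproducible
    for _ in range(n_repeats):
        shuffled = ids[:]
        rng.shuffle(shuffled)
        # The first n_test shuffled samples form the testing set.
        yield shuffled[n_test:], shuffled[:n_test]
```

Each model would then be trained on `train_ids` and evaluated on `test_ids` for every split, as in Steps 3 and 4.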

Results and Discussion
As the key process of corporate failure forecasting is mapping inputs to binary outputs, we use the back propagation algorithm to train the CNN-DL. The early stopping technique is employed to prevent overfitting [57]. The empirical experiment is repeated 20 times for the CNN-DLs, and the optimal set of forecasting results is selected as their final output. For SVM, the RBF function is applied as the kernel function, and the optimal parameters (C, ξ) are searched using the grid search technique and the cross validation method. The experiments are implemented in Matlab (2016b) and Python (3.6). Some of the code is presented in the Supplementary Materials.
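The early stopping rule mentioned above can be illustrated with a small, framework-independent sketch. It assumes that a validation loss is recorded after every epoch; the `patience` and `min_delta` parameters are common conventions assumed here, not settings reported in the paper.

```python
def early_stopping_epoch(val_losses, patience=5, min_delta=0.0):
    """Return (stop_epoch, best_epoch) for a sequence of validation losses.

    Training stops once the loss has failed to improve by more than
    min_delta for `patience` consecutive epochs. Epochs are 0-indexed.
    """
    best_loss = float("inf")
    best_epoch = -1
    waited = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss - min_delta:
            # Improvement: remember this epoch and reset the counter.
            best_loss, best_epoch, waited = loss, epoch, 0
        else:
            waited += 1
            if waited >= patience:
                return epoch, best_epoch
    # Patience never ran out: training used every epoch.
    return len(val_losses) - 1, best_epoch
```

In practice the weights saved at `best_epoch` would be restored before the model is evaluated on the testing data set.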

Forecasting Results and Analysis
The out-of-sample forecasting results of SVM, CNN-DLT, CNN-DLM, IMUV, IMET, and NIM using the testing data set of the year (t − 2) and (t − 3) are illustrated in Figures 4 and 5, respectively.

Results of Models Trained Using the Energy Training Data
For models trained using the energy training data set, the forecasting results using the testing data set of the year (t − 2) are illustrated in Figure 4a. The proposed NIM has the highest AUC score, and CNN-DLT has the lowest. Unsurprisingly, all integrated models (IMUV, IMET, and NIM) have a much better forecasting performance than CNN-DLM does, because the integrated models can mine numerical data and textual data more efficiently. Consistent with the study of Mai et al. [17], CNN-DLM, which is trained and tested using both numerical data and textual data simultaneously, performs better than CNN-DLT and SVM. According to the out-of-sample forecasting performance, the models can be ranked as follows: NIM > IMET > IMUV > CNN-DLM > SVM > CNN-DLT.
For forecasting models trained using the energy training data set, the forecasting results using the testing data set of the year (t − 3) are shown in Figure 4b. The performance is similar to that using the testing data set of the year (t − 2), with two differences. First, the forecasting performance of IMUV is better than that of IMET. This is because evidence theory (ET) has disadvantages in integrating the outputs of SVM and CNN-DLT when those outputs conflict seriously, and longer forecasting terms decrease the consistency of the outputs of SVM and CNN-DLT. Second, the forecasting performance of CNN-DLM becomes worse than that of SVM. This is not a surprise, given the loss of timely textual information. Moreover, the useless textual data becomes noise for forecasting and results in inferior performance.
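The AUC scores used to compare the models can be computed directly from forecast scores and true labels. The following is a minimal sketch based on the pairwise (rank) interpretation of AUC; the function name and the labelling convention (1 for failed firms, 0 for healthy firms) are assumptions for illustration, not the code used in the study.

```python
def auc_score(labels, scores):
    """AUC as the probability that a random positive outranks a random negative.

    labels: 1 for failed firms, 0 for healthy firms (assumed convention).
    scores: higher means the model considers failure more likely.
    A tie between a positive and a negative counts as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))
```

This quadratic-time version is adequate for small testing sets; a rank-based implementation would be preferable for large ones.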

Results of Models Trained Using the Comprehensive Training Data
To verify the performance of corporate failure forecasting in one sector, we train models using the comprehensive training data sets of the years (t − 2) and (t − 3), and evaluate them using the testing data sets of the years (t − 2) and (t − 3), respectively. Out-of-sample forecasting ROC curves of each model are summarized in Figure 5.
For forecasting models trained using the comprehensive training data set, forecasting results using the testing data set of the year (t − 2) are illustrated in Figure 5a. The proposed NIM performs the best, and CNN-DLT performs the worst. More importantly, SVM performs better than CNN-DLM. Because the management discussion and analysis sections of listed firms in different sectors are quite different, a huge volume of useless textual data decreases the AUC score.
For models trained using the comprehensive training data set, forecasting results using the testing data set of the year (t − 3) are presented in Figure 5b. The conclusion is similar to that drawn from the testing data set of the year (t − 2).

Comparisons and Discussions
For comparison and analysis, AUC scores of each model are summarized in Tables 5 and 6. From Tables 5 and 6, one can easily find that models trained using the energy training data set uniformly outperform models trained using the comprehensive training data set, no matter which year of the testing data set is employed for evaluation. Therefore, focusing on the Chinese energy sector is an effective way to improve the performance of corporate failure forecasting in this sector.
Specifically, the proposed NIM has the highest AUC score no matter which year of the testing data set is employed or which training data set is used. As shown in Figure 6, the performance of NIM changes the least when the energy training data set is replaced by the comprehensive training data set, regardless of the year of the data set used for forecasting. This means that NIM is an effective model for corporate failure forecasting in the Chinese energy sector. However, NIM trained using the energy training data set still performs better than NIM trained using the comprehensive training data set. IMUV, IMET, and SVM have similar performances. For Chinese listed firms in different sectors, the management discussion and analysis sections differ greatly [58]. As a result, much useless textual data is included in the training data set if it is collected from different sectors. In such a context, it is more difficult to mine valuable information for corporate failure forecasting. The performance of CNN-DLT and CNN-DLM changes considerably when the energy training data set is replaced by the comprehensive training data set, no matter which year of the testing data set is used. This means that textual data can play a much more significant role in forecasting corporate failure when the focus is on one sector.
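The stability comparison described above amounts to measuring, for each model, how much its AUC changes when the training data set is swapped. A minimal sketch is given below. The model names and AUC values in the usage example are hypothetical placeholders chosen for illustration; they are not the values reported in Tables 5 and 6.

```python
def auc_change(auc_energy, auc_comprehensive):
    """Absolute AUC change per model when the training data set is swapped."""
    return {
        model: round(abs(auc_energy[model] - auc_comprehensive[model]), 4)
        for model in auc_energy
    }


def most_stable(changes):
    """Name of the model whose AUC changes the least."""
    return min(changes, key=changes.get)


# Hypothetical placeholder values, NOT results from the paper.
example_energy = {"model_a": 0.90, "model_b": 0.80}
example_comprehensive = {"model_a": 0.88, "model_b": 0.70}
example_changes = auc_change(example_energy, example_comprehensive)
```

The same two functions apply when comparing forecasting horizons instead of training data sets.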

Results Comparison and Discussion with the Year of (t − 2) and (t − 3)
From Tables 5 and 6, for each employed model, one can easily find that corporate failure forecasting in the Chinese energy sector with the data set of the year (t − 2) outperforms forecasting with the data set of the year (t − 3). This is not a surprise due to timely information loss with longer forecasting periods. Forecasting corporate failure on a long horizon is more complex and difficult than short term forecasting.
Similar to the results above, no matter which year of the testing data set is used, our NIM performs the best. As shown in Figure 7, no matter which training data set is applied, NIM not only has the highest AUC score but also shows the least change when the data set of the year (t − 2) is replaced by the data set of the year (t − 3). This means that NIM can effectively forecast corporate failure in the Chinese energy sector with textual data and numerical data under the longer forecasting period. SVM and IMUV have similar results. IMET, CNN-DLM, and CNN-DLT have worse performances.

Summary
For corporate failure forecasting in the Chinese energy sector, the empirical results support three important conclusions. First, NIM can improve the performance of corporate failure forecasting in the Chinese energy sector by integrating CNN-DLT and SVM based on SS: CNN-DLT is applied to the textual data, SVM is applied to the numerical data, and their outputs are then integrated by SS. Second, focusing on the Chinese energy sector improves the performance of corporate failure forecasting in this sector. Third, textual data can play an important role in corporate failure forecasting in the Chinese energy sector, but its validity decreases with longer forecasting horizons.

Conclusions
In this study, we extend the research of corporate failure forecasting by proposing a novel integrated model with convolutional neural network oriented deep learning and support vector machine based on soft set theory for corporate failure forecasting in the Chinese energy sector. Given the characteristics of energy firms in China, both numerical data and textual data are considered as inputs here. Due to the different features of numerical data and textual data, CNN-DL is employed to forecast corporate failure based on the textual data, and SVM is used to forecast based on the numerical data. Then, the outputs of CNN-DL and SVM are integrated using SS. Hence, NIM inherits advantages and simultaneously avoids disadvantages of CNN-DL, SVM, and SS. This algorithm enables NIM to make full use of numerical data and textual data. Compared with benchmarks, NIM shows superior performance for corporate failure forecasting in the Chinese energy sector. Empirical results also demonstrate that focusing on the Chinese energy sector is an effective way to improve the performance of corporate failure forecasting in this sector.
Though the empirical results are satisfactory, some work remains to be done in the future to improve the forecasting performance. First, as the key component for the success of NIM, word segmentation techniques with high computing efficiency should be studied further with consideration of the features of Chinese. Second, the management discussion and analysis section is used as textual data for corporate failure forecasting in the Chinese energy sector. Related national policies and news should be investigated as textual data in future research. Third, financial ratios are employed as numerical variables in this study. More numerical variables, such as market data, governance data, and national economic data, should be included for corporate failure forecasting in the Chinese energy sector.

x3: Net profit margin of total assets
x4: Retained earnings/total asset
x5: Tax rates
x6: Earnings before interest and taxes/total asset
x7: Equity value per share
x8: No-credit interval
x9: Continuous 4 quarterly EPS (earning per share)
x10: log(total assets/Gross National Product price-level index)
x11: Operating earnings per share
x12: Return on equity
x13: Equity growth ratio
x14: Return on total assets
x15: Earning per share
x16: Return on invested capital
x17: Current ratio
x18: Operating margin
x19: Cash flow/total debt
x20: Profit margin
x21: Cash flow/total asset
x22: Asset-liability ratio
x23: Cash flow/sales
x24: Tangible net debt ratio
x25: Debt ratio
x26: Working capital ratio
x27: Working capital/total asset
x28: Working capital/net assets
x29: Market value equity/total debt
x30: Equity ratio
x31: Current assets/total asset
x32: Long-term debt ratio
x33: Quick asset/total asset
x34: Equity to liability ratio
x35: Sales/total asset
x36: Interest coverage ratio
x37: Current debt/sales
x38: Account receivable turnover
x39: Quick asset/sales
x40: Account payable turnover
x41: Working capital/sales
x42: Inventories turnover
x43: Total assets turnover
x44: Fixed assets turnover
x45: Working capital turnover
x46: Net operating cash flow per share
x47: Net assets per share
x48: Net cash flow of investing activities per share
x49: Growth ratio of net profit
x50: Capital maintenance and appreciation
x51: Provident fund per share
x52: Sales growth rate of major operation
x53: Growth ratio of total assets
x54: Price-to-book ratio
x55: Cash flow to current liability
x56: Cash to main business income ratio
x57: One if total liabilities exceeds total assets, zero otherwise
x58: (NI_t − NI_{t−1})/(|NI_t| + |NI_{t−1}|), where NI_t is the latest net income