Data were collected for failed and surviving banks, both public sector banks (public sector banks (PSBs) are a major type of bank in India, in which the government holds a majority stake, i.e., more than 50%) and private sector banks (banks in which the majority of the shares or equity is held by private shareholders rather than the government), in India for the period January 2000 to December 2017. In this research study, we assumed that a bank failed if any of these conditions occurred: merger or acquisition, bankruptcy, dissolution, and/or negative assets (Pappas et al. 2017; Shrivastava 2019). Data were collected only for those banks for which data existed for the four years before their failure. Likewise, for the surviving banks, the last four years of data were included in the sample. In our final sample of 59 banks, 42 were surviving banks and 17 were failed banks. For each bank, 25 features and financial ratios were calculated for each of the four years (Pappas et al. 2017). In this way, the sample contained a total of 25 × 4 = 100 variables per bank (Pappas et al. 2017). Information about the 25 features and financial ratios is given in Appendix A. During data collection, we treated the data of the immediately preceding year (t−1) as recent information and the data of (t−2), (t−3), and (t−4) as past information.
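To make the layout of this design matrix concrete, the following is a minimal R sketch of one bank per row with 25 ratios × 4 years flattened into 100 columns; the variable names, the random values, and the −1/1 coding of the target are illustrative assumptions, not the paper's actual data.

set.seed(1)
ratios <- paste0("R", 1:25)            # placeholder ratio names (assumption)
years  <- paste0("t", 1:4)             # t1 = most recent year (t-1 in the text)
cols   <- as.vector(outer(ratios, years, paste, sep = "_"))  # 100 column names
banks  <- data.frame(matrix(rnorm(59 * 100), nrow = 59,
                            dimnames = list(NULL, cols)))
banks$failed <- c(rep(1, 17), rep(-1, 42))  # 17 failed, 42 surviving banks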
The data collected for the failed and surviving banks for the period January 2000 through December 2017 contained a mix of important and redundant features. Not all features of the collected data are equally important for modeling: some make a significant contribution, while others have no significance. A model that contains redundant and noisy features may suffer from overfitting. To remove non-significant features and reduce the computational complexity of the forecasting model, a well-known two-step feature selection technique, combining the relief algorithm and a support vector machine (SVM), was used.
3.1. Two-Step Feature Selection
In this study, we used the relief algorithm, an instance-based learning algorithm from the family of filter-based feature selection techniques (Kira and Rendell 1992; Aha et al. 1991). By adapting to the characteristics of the data, this algorithm maintains a trade-off between the complexity and the accuracy of a statistical or machine learning model. In this feature selection method, the relief algorithm assigns each explanatory feature a weight that measures its quality and relevance with respect to the target feature. This weight ranges between −1 and 1, where −1 indicates the worst (most redundant) features and +1 the best (most useful) features. The relief algorithm is a non-parametric method that computes the importance of each input feature relative to the other input features and makes no assumptions about the distribution of the features or the sample size.
This algorithm only reports the weight assigned to each explanatory feature; it does not directly provide a subset of the features. Based on the output, we discarded all features with a weight less than or equal to zero; the features with weights greater than zero are considered relevant and explanatory. Pseudo-code for the relief feature selection algorithm is given below (Kira and Rendell 1992; Aha et al. 1991).
Initial requirement: we need the features of each instance, with target classes coded as −1 or 1 for bankruptcy prediction. In this study, we used the 'R' programming language for the relief algorithm, where "M" denotes the number of records or instances in the training data, "F" denotes the number of features in each instance of the training data, "N" denotes the number of training instances drawn at random from the "M" instances to update the feature weights, and "A" indexes the feature whose weight is being updated for the selected training instance.
Initially, we assume that the weight of each feature is zero, i.e., W[A] := 0.0
For i := 1 to N do
Select a random target instance, say Li
Find the closest hit 'H' (nearest neighbour of the same class) and the closest miss 'M' (nearest neighbour of a different class) for Li
For A := 1 to F do
W[A] := W[A] − diff(A, Li, H)/N + diff(A, Li, M)/N
End for (inner loop over features)
End for (outer loop over instances)
Return the weights of all features.
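In R, the paper's implementation language, this pseudo-code translates almost line by line. The following is a minimal sketch under stated assumptions: numeric features, a target vector coded as −1/1, Euclidean distance between instances, and sampling without replacement. The function and variable names are illustrative, not the paper's implementation.

relief_weights <- function(X, y, N = nrow(X)) {
  # X: numeric matrix (rows = instances, columns = features); y: -1/1 target
  F <- ncol(X)
  W <- rep(0, F)                                   # W[A] := 0.0
  rng <- apply(X, 2, function(a) max(a) - min(a))  # max(A) - min(A)
  rng[rng == 0] <- 1                               # guard against constant features
  diff_fun <- function(i, j) abs(X[i, ] - X[j, ]) / rng  # continuous diff, in [0, 1]
  for (i in sample(nrow(X), N)) {        # N random instances, without replacement
    d <- rowSums(sweep(X, 2, X[i, ])^2)  # squared distance of Li to all instances
    d[i] <- Inf                          # exclude the target instance itself
    same  <- which(y == y[i] & d < Inf)
    other <- which(y != y[i])
    H <- same[which.min(d[same])]        # closest hit (same class)
    M <- other[which.min(d[other])]      # closest miss (different class)
    W <- W - diff_fun(i, H) / N + diff_fun(i, M) / N
  }
  setNames(W, colnames(X))               # one weight per feature, in [-1, 1]
}

Features whose returned weight is less than or equal to zero are then discarded, as described above.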
The relief algorithm selects instances, say "Li", from the training data without replacement. For each selected instance, the weights of all features are updated based on the differences observed between the target instance and its neighbour instances. This is a cyclic process: in each round, the distance of the target instance from all other instances is computed. The method selects the closest neighbour instance of the same class (−1 or 1), called the closest hit ('H'), and the closest neighbour of the other class, called the closest miss ('M'). The weights are then updated accordingly: when a feature's value differs between the target and the closest hit (instances of the same class), the weight decreases by 1/N, and when a feature's value differs between the target and the closest miss (instances of different classes), the weight increases by 1/N. This process continues until the loop has covered all features and all selected instances. The following is an example of the relief algorithm:
Class of Instances
Target Instance (Li) CDCDCDCDCDCDCD 1
Closest Hit (H) CDCDCDCCCDCDCD 1
Here, because a feature value mismatches between instances of the same class (the eighth value, D versus C), a negative weight of −1/N is allocated to that feature.
Class of Instances
Target Instance (Li) CDCDCDCDCDCDCD 1
Closest Miss (M) CDCDCDCCCDCDCD −1
Here, because a feature value mismatches between instances of different classes (again the eighth value, D versus C), a positive weight of 1/N is allocated to that feature. This process continues through the last instance and, in this form, is valid only for discrete features.
The diff function in the above pseudo-code computes the difference in the value of feature "A" between two instances I1 and I2, where I1 is the target instance Li and I2 is either "H" or "M" when performing weight updates. The diff function for a discrete feature is defined as

$$ \text{diff}(A, I_1, I_2) = \begin{cases} 0, & \text{if } \text{value}(A, I_1) = \text{value}(A, I_2) \\ 1, & \text{otherwise,} \end{cases} $$

and the diff function for a continuous feature is defined as

$$ \text{diff}(A, I_1, I_2) = \frac{\lvert \text{value}(A, I_1) - \text{value}(A, I_2) \rvert}{\max(A) - \min(A)}. $$
The maximum and minimum values of feature "A" are calculated over all instances. Because the diff function is normalized, the weight update for both discrete and continuous features always lies between 0 and 1. While updating the weight of feature "A", we divide the output of the diff function by "N" so that the final feature weights lie between −1 and 1. The weight of each feature calculated through the relief algorithm is listed in Appendix B. The features selected by the relief algorithm are then fed into the SVM to find the combination of significant features through an iterative search guided by the target feature; the feature set that yields the highest SVM accuracy is taken as the optimal feature set. There are several benefits of using the relief algorithm for the initial screening of features. First, it assesses the quality of a feature by comparing it with the other features. Second, it requires no assumptions about the features of the dataset. Third, it is a non-parametric method.
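The paper does not spell out the iterative search used with the SVM. As one concrete illustration, the following sketch performs a greedy forward search over the relief-screened features, scoring each candidate subset by five-fold cross-validated SVM accuracy. The use of e1071::svm, the 'status' response name, and the forward-search strategy are assumptions, not the paper's exact procedure.

library(e1071)  # provides svm(); an assumed choice, not cited by the paper

# Cross-validated SVM accuracy for a given feature subset.
# data$status must be a factor so that svm() performs classification.
svm_accuracy <- function(features, data) {
  fit <- svm(reformulate(features, response = "status"),
             data = data, cross = 5)   # 5-fold cross-validation
  fit$tot.accuracy                     # overall CV accuracy (percent)
}

# Greedy forward search: repeatedly add the feature that most improves
# accuracy, stopping when no remaining feature helps.
forward_select <- function(candidates, data) {
  chosen <- character(0); best <- -Inf
  repeat {
    remaining <- setdiff(candidates, chosen)
    if (length(remaining) == 0) break
    accs <- sapply(remaining, function(f) svm_accuracy(c(chosen, f), data))
    if (max(accs) <= best) break       # no improvement: stop searching
    best <- max(accs)
    chosen <- c(chosen, names(which.max(accs)))
  }
  list(features = chosen, accuracy = best)
}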
We used the "relief (formula, data, neighbours.count, sample.size)" function in R to compute the relief scores given in Appendix B, where the argument "formula" is a symbolic description of the model, the argument "data" is the data to process, the argument "neighbours.count" is the number of neighbours to find for every sampled instance, and the argument "sample.size" is the number of instances to sample.
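The quoted signature matches the relief() function in the FSelector package. Assuming that is the function used, a call might look like the following sketch; 'bank_data' and 'status' are placeholder names, and the argument values shown are illustrative.

library(FSelector)  # provides relief() with the signature quoted above

# bank_data: data frame of the ratio variables plus the target 'status'
scores <- relief(status ~ ., data = bank_data,
                 neighbours.count = 5, sample.size = 20)
# Keep only features with positive relief weight, as described above
selected <- rownames(scores)[scores$attr_importance > 0]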