Input/Output Variables Selection in Data Envelopment Analysis: A Shannon Entropy Approach

The purpose of this study is to provide an efficient method for the selection of input–output indicators in the data envelopment analysis (DEA) approach, in order to improve the discriminatory power of the DEA method in the evaluation process and performance analysis of homogeneous decision-making units (DMUs) in the presence of negative values and data. For this purpose, the Shannon entropy technique is used as one of the most important methods for determining the weight of indicators. Moreover, due to the presence of negative data in some indicators, the range directional measure (RDM) model is used as the basic model of the research. Finally, to demonstrate the applicability of the proposed approach, the food and beverage industry has been selected from the Tehran stock exchange (TSE) as a case study, and data related to 15 stocks have been extracted from this industry. The numerical and experimental results indicate the efficacy of the hybrid data envelopment analysis–Shannon entropy (DEASE) approach to evaluate stocks under negative data. Furthermore, the discriminatory power of the proposed DEASE approach is greater than that of a classical DEA model.


Introduction
Data envelopment analysis (DEA) is a non-parametric approach based on mathematical programming and multi-criteria decision making (MCDM) that is capable of evaluating the performance, ranking, classification, and benchmarking of a set of homogeneous decision-making units (DMUs), according to the desired inputs and outputs [1][2][3][4][5][6]. The DEA approach is one of the most powerful, applicable, and effective methods in the field of performance evaluation among researchers and is widely used in various fields such as agriculture, airline, airport, bank, gas, hospital, hotel, information technology, insurance, manufacturing, mutual fund, power, production system, research and development, school, sport, stock exchange, supply chain, university, water, etc.
The main advantages of the DEA approach are as follows: it needs no knowledge about the production function and its constraints; it can use multiple inputs and multiple outputs simultaneously; it needs no knowledge about the weight of each input and output indicator; it can use various inputs and outputs with different measurement scales; it compares inefficient DMUs with reference sets directly; it ranks decision-making units; and it provides benchmarks for inefficient DMUs. The main disadvantages of the DEA approach can be summarized as follows: it measures relative efficiency instead of absolute efficiency; large problems are difficult to solve because of the high computational cost; measurement error can cause large deviations in the results; the performance evaluation results may change when the type or number of inputs and outputs changes; statistical tests, including hypothesis tests, are difficult because of its non-parametric nature; and the performance scores are fragile because the results are sensitive to changes in the sample [28][29][30][31][32][33][34][35].
In addition to the above advantages and disadvantages, the lack of general agreement on the selection and determination of input and output variables is one of the most important challenges in applying the DEA method in various applications [36][37][38][39][40][41]. Another important point that should be considered in using the DEA approach is to increase the discriminatory power of the model in evaluating the performance of DMUs, and to differentiate as much as possible their performance results. Accordingly, in the current study, the hybrid data envelopment analysis-Shannon entropy (DEASE) approach is proposed. Notably, the proposed DEASE can be employed under negative data and values.
The rest of this paper is organized as follows. The concepts, definitions, and explanations of the Shannon entropy technique are introduced in Section 2. Then, the steps of the hybrid data envelopment analysis-Shannon entropy approach, the approach proposed in the current research, are presented in Section 3. Next, the proposed DEASE approach is implemented in a real-world case study from the Tehran stock exchange (TSE), and the experimental results are analyzed in Section 4. Finally, the conclusions and future research directions are discussed in Section 5.

Shannon Entropy Technique
Determination of the relative weights of indicators in multi-criteria decision making (MCDM) is always one of the basic and required steps in the problem-solving process. Well-known and widely used methods for determining the weights of indicators include expert opinion-based approaches, the least squares method, the eigenvector technique, and Shannon entropy. In the following, the Shannon entropy technique is introduced as one of the most important methods for determining the weight of criteria.
Entropy in information theory is a measure of the amount of uncertainty, and it is expressed by a discrete probability distribution [42][43][44][45][46][47]. Notably, in the entropy method, more fluctuation and scattering in the values of a criterion indicate a greater importance factor and weight for that criterion [48][49][50][51][52][53]. Accordingly, the steps of the Shannon entropy technique to determine the weights of the indices are as follows:
Step (1) First, the decision matrix is created with m alternatives and n criteria in the form of Equation (1), where x_ij is the value of the i-th alternative in terms of the j-th criterion.
Step (2) The decision matrix is normalized using Equation (2). By dividing each value in a column by the sum of that column, the normalized value p_ij is obtained.
Step (3) The entropy E_j of each criterion is calculated using Equation (3). A constant factor keeps the value of E_j between 0 and 1.
Step (4) The degree of deviation d_j of the information generated for the j-th criterion is calculated from Equation (4). The degree of deviation indicates the amount of useful information that the relevant criterion provides to the decision maker.
Step (5) Finally, the weight w_j is calculated from Equation (5), in which the weight of the j-th criterion is obtained by dividing d_j by the sum of all d_j.
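Equations (1) to (5) are not reproduced in this text. The textbook form of the entropy weighting method, which agrees with the verbal description of Steps (1) to (5), can be sketched as follows. The constant k = 1/ln m and the definition d_j = 1 − E_j are the usual conventions and are assumptions here, not taken from the source; the summation index l in the last line is likewise introduced only for illustration.

```latex
\begin{align}
X &= \left[ x_{ij} \right]_{m \times n}, \qquad i = 1,\dots,m,\; j = 1,\dots,n \tag{1} \\
p_{ij} &= \frac{x_{ij}}{\sum_{i=1}^{m} x_{ij}} \tag{2} \\
E_j &= -k \sum_{i=1}^{m} p_{ij} \ln p_{ij}, \qquad k = \frac{1}{\ln m} \tag{3} \\
d_j &= 1 - E_j \tag{4} \\
w_j &= \frac{d_j}{\sum_{l=1}^{n} d_l} \tag{5}
\end{align}
```

With this choice of k, a criterion whose values are identical for all alternatives has E_j = 1 and therefore d_j = 0.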
Thus, the criterion with the larger weight w_j is chosen, because a smaller weight indicates that the criterion takes almost the same value for all the alternatives and therefore contributes little to distinguishing among them.
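The five steps can be illustrated with a minimal Python sketch. It assumes a strictly positive decision matrix stored as a list of rows, the conventional constant k = 1/ln m, and d_j = 1 − E_j; the function name `entropy_weights` is hypothetical, and this is not the spreadsheet implementation used in the case study.

```python
import math

def entropy_weights(matrix):
    """Shannon entropy weights for the criteria (columns) of a strictly
    positive decision matrix given as a list of rows (alternatives)."""
    m = len(matrix)            # number of alternatives
    n = len(matrix[0])         # number of criteria
    if m < 2:
        raise ValueError("at least two alternatives are required")
    k = 1.0 / math.log(m)      # assumed constant keeping each E_j in [0, 1]

    deviations = []
    for j in range(n):
        column = [row[j] for row in matrix]
        if min(column) <= 0:
            raise ValueError("entropy requires strictly positive values")
        total = sum(column)
        p = [x / total for x in column]                      # Step 2
        e_j = -k * sum(p_i * math.log(p_i) for p_i in p)     # Step 3
        deviations.append(1.0 - e_j)                         # Step 4

    d_sum = sum(deviations)
    if d_sum <= 1e-12:
        raise ValueError("all criteria are constant; weights are undefined")
    return [d / d_sum for d in deviations]                   # Step 5

# Illustrative data only: the first criterion is constant across the
# alternatives, so nearly all of the weight goes to the second criterion.
weights = entropy_weights([[1, 1], [1, 2], [1, 3]])
```

Within a group of similar indicators, the criterion to keep is then the one at the index of the largest returned weight.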

The Proposed Approach
In this section, the hybrid data envelopment analysis-Shannon entropy approach is presented. It is used for input/output selection in order to improve the discriminatory power of the model for performance measurement of the DMUs. To make the proposed approach more comprehensive and applicable, its basic steps are presented under the assumption that negative values and data are present:
Step (1) Modifying Indicators with Negative Values: As observed in the steps of the Shannon entropy technique, the logarithm ln applied to p_ij in the entropy computation means that the entropy method can be used only for positive indicators and quantities. To solve this problem, the following method is suggested in this research. First, in each criterion column that contains negative values for some alternatives, the largest and smallest numbers are determined, and their difference (Max − Min) is calculated. Then, the values Max − Min + 1 and 1 are assigned to the largest and the smallest number in the column, respectively. The values of the other numbers in the column are obtained using the relation Value − Min + 1. Thus, using the suggested method, all the values of the criterion become positive.
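The shift described in Step (1) amounts to applying Value − Min + 1 to every entry of the column, which automatically maps the minimum to 1 and the maximum to Max − Min + 1. A minimal sketch, with a hypothetical function name and made-up numbers:

```python
def shift_to_positive(column):
    """Apply Value - Min + 1 to every entry of a criterion column so that
    the smallest entry becomes 1 and the largest becomes Max - Min + 1."""
    lowest = min(column)
    return [value - lowest + 1 for value in column]

# Illustrative column with negative entries (not data from the case study).
shifted = shift_to_positive([-3, 0, 5])
print(shifted)  # [1, 4, 9]
```

The transformation is a pure translation, so it preserves the ordering of the alternatives and the differences between them.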
Step (2) Implementing the Shannon Entropy Technique: After the negative values have been shifted in the previous step and the new data structure has been prepared, the Shannon entropy technique is applied separately to each group of similar input and output indicators.
Step (3) Selecting Inputs and Outputs: According to the weights obtained from the Shannon entropy technique, the criterion with the largest weight among the similar indicators in each input and output group is selected as the final input or output of the DEA model.
Step (4) Checking the Isotonicity Relations between Inputs and Outputs: Since the inputs and outputs used in DEA should satisfy the condition that greater quantities of inputs provide increased output, the appropriateness of the inputs and outputs selected in the previous step is tested by conducting an isotonicity test [54][55][56][57][58][59]. An isotonicity test involves the calculation of all inter-correlations between inputs and outputs to identify whether increasing amounts of inputs lead to greater outputs [60][61][62][63][64][65][66].
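One simple way to carry out such a check is to compute the Pearson correlation of every input with every output and confirm that each is positive. The sketch below assumes that this sign check is sufficient; it does not test statistical significance, and the function and variable names are hypothetical.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

def isotonicity_holds(inputs, outputs):
    """True if every input column is positively correlated with every
    output column. Both arguments map indicator names to lists of values,
    one value per DMU."""
    return all(
        pearson(x, y) > 0
        for x in inputs.values()
        for y in outputs.values()
    )

# Made-up values for four DMUs, for illustration only.
inputs = {"input_1": [1, 2, 3, 4]}
outputs = {"output_1": [2, 4, 5, 9]}
ok = isotonicity_holds(inputs, outputs)
```

A constant column has zero variance and would cause a division by zero in `pearson`; such an indicator carries no information and would be excluded before this step.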
Step (5) Choosing the Data Envelopment Analysis Model: After determining the inputs and outputs, the data envelopment analysis model should be selected. In this study, due to the presence of negative data, the range directional measure (RDM) model [67] is used. Now, suppose that there are g homogeneous decision-making units that convert v inputs θ_k = (θ_1k, θ_2k, ..., θ_vk) into u outputs ϕ_k = (ϕ_1k, ϕ_2k, ..., ϕ_uk). The envelopment form of the RDM model is given as Model (6).
Ψ* expresses the measurement of inefficiency, and the efficiency of the DMU under evaluation is equal to 1 − Ψ* and 1/(1 + Ψ*) in the input-oriented model and output-oriented model, respectively. Moreover, ξ_αq and ξ_βq in Model (6) are the ranges of possible improvement for the DMU under evaluation, which are defined as Equations (7) and (8), respectively.
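Model (6) and Equations (7) and (8) are not reproduced in this text. For orientation, the RDM model is commonly written in the DEA literature in the non-oriented, variable-returns-to-scale form sketched below, here transcribed into the notation of this section. The intensity variables λ_k, the index ranges (k over DMUs, α over inputs, β over outputs, q for the DMU under evaluation), and the exact layout are assumptions and may differ from the published model, in particular for the input-oriented and output-oriented variants.

```latex
\begin{align}
\Psi^{*} = \max \;& \Psi \notag \\
\text{s.t.}\;
& \sum_{k=1}^{g} \lambda_k \theta_{\alpha k} \le \theta_{\alpha q} - \Psi\, \xi_{\alpha q},
  && \alpha = 1,\dots,v, \notag \\
& \sum_{k=1}^{g} \lambda_k \phi_{\beta k} \ge \phi_{\beta q} + \Psi\, \xi_{\beta q},
  && \beta = 1,\dots,u, \tag{6} \\
& \sum_{k=1}^{g} \lambda_k = 1, \qquad \lambda_k \ge 0,
  && k = 1,\dots,g, \notag \\[4pt]
\xi_{\alpha q} &= \theta_{\alpha q} - \min_{k} \theta_{\alpha k},
  && \alpha = 1,\dots,v, \tag{7} \\
\xi_{\beta q} &= \max_{k} \phi_{\beta k} - \phi_{\beta q},
  && \beta = 1,\dots,u. \tag{8}
\end{align}
```

Because the ranges in (7) and (8) are differences between observed values, they are non-negative even when the underlying data are negative, which is what allows the model to handle negative inputs and outputs.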
Step (6) Calculating the Efficiency Scores of DMUs: Finally, the selected data envelopment analysis model is applied to the data extracted for the selected input and output indicators, and the performance assessment results of the decision-making units are calculated and analyzed.

Case Study
In order to implement the proposed DEASE approach, 15 stocks from the food and beverage industry of the Tehran stock exchange are selected as a case study for the research. To evaluate the stocks fundamentally and comprehensively, 15 indicators in five groups (liquidity, asset utilization, leverage, profitability, and growth) have been considered [68][69][70][71][72][73]. The Shannon entropy technique is implemented in Microsoft Excel, and the obtained results are presented in Table 1. A description of all the financial parameters is given in Table 2. Then, in each of the five groups of financial ratios, the criterion with the largest weight is selected from the three criteria. Based on the results of the Shannon entropy technique, the inputs and outputs of the data envelopment analysis model are presented in Figure 1. In the DEA approach, performance metrics can be classified as the larger the better for the outputs, and the smaller the better for the inputs. In other words, positively connoted (the more the better) factors are used as outputs; conversely, negatively connoted (the fewer the better) factors are classified as inputs [74,75].

[Figure 1. Inputs and outputs of the DEA model (recoverable label: O (3), Cash Ratio).]

Now, after determining the selected variables of the DEA model, the information related to the mentioned indicators in the form of two inputs and three outputs for 15 stocks that are active in the food and beverage industry is presented in Table 3. Pearson's correlation is used to test the isotonicity relationship between the chosen input and output parameters. Accordingly, the inter-correlations of all the indicators are positive and significant, suggesting that the specification of the DEA model is valid. Finally, using the range directional measure model and LINGO Software, the performance of all 15 stocks is calculated based on the data that are extracted from the Tehran stock market. The results can be seen in Table 4. According to stock market experts' views, the proposed DEASE approach is an efficient, applicable, and powerful approach with the ability to calculate performance and evaluate all stocks in the presence of negative data. Similar to the findings of Xie et al. [47], the experimental results also indicate the acceptable discriminatory power of the hybrid data envelopment analysis-Shannon entropy approach. Notably, the results that are obtained from the DEASE approach can be applied for the construction of desirable investment portfolios in the stock market by recognizing good stocks and filtering bad stocks.

Data Availability Statement:
The data used in the study are available from the authors upon reasonable request.