Article

A Global Search Method for Inputs and Outputs in Data Envelopment Analysis: Procedures and Managerial Perspectives

School of Management, Universiti Sains Malaysia, Penang 11800, Malaysia
Symmetry 2021, 13(7), 1155; https://doi.org/10.3390/sym13071155
Submission received: 24 May 2021 / Revised: 8 June 2021 / Accepted: 10 June 2021 / Published: 28 June 2021

Abstract

Effective decision-making techniques depend essentially on the capacity to balance (symmetry) requirements and their fulfilment, that is, the capacity to accurately identify the set of factors that have the greatest influence on performance. Data envelopment analysis (DEA) is a useful nonparametric method in operations research for estimating performance by measuring the efficiency scores of decision-making units. In this paper, we develop a global search method (GSM) for selecting the key input and output variables in DEA models. The GSM measures the effects of variables on the efficiency scores directly, i.e., by considering the average change when a variable is added to or removed from the analysis. It aims to produce DEA models that include only the key variables with the largest impact on the results. The effectiveness of the GSM is demonstrated with a case study of 15 US banks, and the results are analyzed and discussed. The outcomes indicate that the GSM yields useful insights that help decision-makers make informed decisions when addressing their problems.

1. Introduction

Data envelopment analysis (DEA) has been regarded as a powerful technique for selecting and combining models for general k-class classification problems in machine learning [1,2]. The application of DEA as an ensemble method for classifiers is inspired by the ROCCH (receiver operating characteristics convex hull) [3], which was designed mainly for two-class classification problems. DEA was first proposed by [1] to construct ensembles of classifiers, and the authors showed that, for a two-class classification problem, DEA identifies a convex hull identical to the ROCCH. Since then, DEA has been used to build ensembles of classifiers for problems with multiple classes [2]. Baumgartner and Serpen [4] further showed that integrating multiple base classifiers into an aggregated outcome (or ensemble) is an efficient strategy for achieving superior prediction performance.
DEA is a nonparametric approach that addresses the problem of determining the efficiency of "decision-making units" (DMUs) based on how they convert inputs into outputs [5]. A DMU is rated as fully efficient (100%) if and only if the performance of other DMUs does not show that some of its inputs or outputs can be improved without worsening some of its other inputs or outputs [6]. DEA, which has been used extensively to study a wide range of industries [7,8] and has recently been incorporated into the big-data toolbox [9], employs mathematical programming to identify the efficient DMUs, which constitute an efficient frontier. Efficiency scores in DEA depend heavily on the set of input and output variables used in the efficiency measure. Hence, if DEA is to be fully exploited in evaluating as many different classifiers as possible, the selection of input and output variables in a DEA model is critical. We therefore address this problem by developing a global search method (GSM) for optimizing variable selection.
The contributions of this paper are as follows. First, this study enhances DEA for efficiency measurement, a key concept in performance evaluation. Second, it develops a search algorithm for variable selection that retains the variables with the largest impact on the DEA results; the algorithm is grounded in an optimization approach. Finally, this study yields useful managerial insights that decision-makers can use to make reliable judgements and as guidelines to adjust or balance (symmetrize) their strategies and needs through proper allocation of resources.
This paper is organized as follows. Section 2 reviews the literature on variable selection in DEA. Section 3 presents the methodology of the global search method (GSM). In Section 4, we illustrate the method using a sample dataset and discuss the new managerial insights resulting from the GSM. In Section 5, further illustrations and validations of the GSM are presented using two established numerical examples and a case study on US banks. Section 6 discusses the implications, and concluding remarks are presented in Section 7.

2. Past Research on Variables Selection in DEA

It is very important to select the appropriate variables to be considered in a DEA model. In general, any resource used by a DMU should be treated as an input variable, and the outputs are the performance and activity measures that result when the DMU converts its resources into products or services. However, how to choose the right input and output variables has received little attention in the existing literature. Most existing DEA studies simply treat the input and output variables as "givens" and proceed directly to the analysis. As early as 1989, Golany and Roll [10] argued that a DEA application should focus on the choice of variables in addition to the methodology itself. Attention to variable selection is important because increasing the number of input and output variables gives each DMU more flexibility in choosing the weights assigned to the variables, which makes the analysis less discerning. Morita and Avkiran [12] proposed a statistical approach to find an optimal inputs/outputs combination by using diagonal layout experiments.
While there is no consensus on how best to select the variables, many guidelines in the literature suggest limiting the number of variables relative to the number of DMUs. A common rule of thumb for the envelopment model of DEA is to choose n (the number of DMUs) equal to or greater than max{m × s, 3 × (m + s)}, where m and s are the numbers of input and output variables, respectively (see [13] for details). The challenge in DEA is to find a 'parsimonious' model, using as many input and output variables as needed but as few as possible. The greater the number of input and output variables in a DEA model, the higher the dimensionality of the linear programming solution space, and the less discerning the analysis [11].
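The rule of thumb can be checked mechanically. The short Python sketch below returns whether n DMUs suffice for m inputs and s outputs; the function name `satisfies_rule_of_thumb` is a hypothetical illustration, not something defined in [13]. The usage line applies it to the logistics example of Section 4 (n = 18, m = 6, s = 3).

```python
def satisfies_rule_of_thumb(n: int, m: int, s: int) -> bool:
    """Check n >= max{m * s, 3 * (m + s)} for n DMUs, m inputs and s outputs."""
    return n >= max(m * s, 3 * (m + s))


# Logistics example of Section 4: max{18, 27} = 27 > 18, so the rule is violated.
print(satisfies_rule_of_thumb(18, 6, 3))  # False
```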
Several methods have been proposed that analyze the correlation among the variables, with the goal of choosing a set of variables that are not highly correlated with one another. These methods presume that variables highly correlated with existing model variables are merely redundant and should be omitted from further analysis. However, Nunamaker [14] found that these methods often yield inconsistent results, in the sense that removing variables that are highly correlated with others can still have a large effect on the analysis results. In addition, a parsimonious model typically shows low correlations among the input variables and among the output variables [15,16]. Appa et al. [17] proposed a method of adding variables to the DEA model one at a time. They claimed that high statistical correlation indicated that a particular variable influenced performance, although they noted that high correlation alone was not sufficient. Jenkins and Anderson [11] later applied regression and correlation analysis to identify which variables could be omitted from the DEA model with the minimum loss of information, where information was related to the variance of an input or output variable about its mean value. Their statistical approach, based on partial correlation analysis, yielded a measure of the information contained in each variable. The authors found that the DEA results could vary greatly depending on which highly correlated variables were included in or omitted from the DEA model.
Other studies evaluate the marginal impact on efficiency of adding or omitting a given variable, focusing on the statistical significance of the resulting changes in efficiency [18]. Another statistical approach to variable selection was developed in [19], which focused on nested models whose data differ in a single input or output variable. The authors evaluated a reduced DEA model without a particular variable and an extended model that included it. For each DMU, the efficiency scores were calculated under both the reduced and the extended model, and a statistical test was conducted to determine the significance of the efficiency contribution of the variable being evaluated. Amirteimoori et al. [20] developed an approach that aggregates selected highly correlated inputs/outputs to reduce the total number of variables and increase the degree of discrimination. Ref. [21] pointed out that this approach is unstable because the epsilon value is not unique, and improved it so that only a single-step iteration is required.
In contrast to correlation-based methods, which examine the input and output variables before DEA is applied in order to anticipate the effect on the efficiency scores, other approaches directly examine the effect on the efficiency scores when the DEA input and output variables are changed. In these approaches, the results of an initial model are compared with those of a new model in which one variable has been added. Ref. [22] developed a "stepwise" selection approach that examines the changes in efficiency as variables are added to and removed from the DEA model, often focusing on when these changes can be considered statistically significant.
In addition, this approach does not consider the rule of thumb, and each selection step is based only on the minimum efficiency change relative to the previous step. Such a step is only locally optimal and may not lead to the globally optimal decision. Toloo et al. [21] developed models for selecting performance measures in DEA; their models apply the rule of thumb to balance the number of DMUs against the number of inputs/outputs by solving a series of mixed-integer linear programming (MILP) models. However, whether viewed from the perspective of individual DMUs or in aggregate, such models still cannot determine exactly which variables should be selected, because they retain the performance measures that "appear the most often" and thus risk losing important managerial information.
In this study, we advance the work on variable reduction methods in DEA by formalizing a "global search method (GSM)" for the selection process, and we examine the managerial insights gained from using this method. The proposed GSM measures the influence of variables directly on the efficiencies by considering their average change as variables are added to or removed from the analysis. The method is intended to produce DEA models that include only those variables with the largest impact on the DEA results. It is also useful for models that do not have a sufficient number of DMUs and therefore violate the DEA rule of thumb. This can happen in niche classifications (e.g., markets), where the number of comparable DMUs is small, or in new classifications (e.g., industries), where the number of measures far exceeds the total number of DMUs. The method is easy to understand and does not require extensive additional calculations, which makes it useful to managers and decision-makers.

3. A Global Search Method for Selecting Variables in DEA

We begin by describing the procedures of GSM. The GSM aims to optimize the number of DEA variables and to find the key input and output variables which influence the efficiency scores. We now explain in detail the GSM procedure for effective omission of DEA inputs and outputs.
This approach starts by considering all possible combinations of input and output variables in the DEA model. Assume that the original DEA model has m inputs and s outputs and that the total number of DMUs is n. The rule of thumb in [13] provides guidance on the numerical relation between the number of DMUs and the number of inputs/outputs, i.e.,
$$n \ge \max\{3(m + s),\, m \times s\} \tag{1}$$
Suppose that $a_1$ input variables and $a_2$ output variables are to be kept in the model, where $a_1, a_2 \in \mathbb{N}^*$ (the positive integers). The selection procedure is divided into N cases, where N depends on condition (1):
$$N = \begin{cases} \operatorname{card}\left(\left\{(a_1, a_2) \mid a_1 + a_2 \le \dfrac{n}{3}\right\}\right), & \text{if } 3(m + s) \ge ms,\\[4pt] \operatorname{card}\left(\left\{(a_1, a_2) \mid a_1 a_2 \le n\right\}\right), & \text{if } 3(m + s) < ms, \end{cases} \tag{2}$$
where card(A) denotes the number of elements in a set A. For each case $I \in \{1, 2, \ldots, N\}$, $N_I$ denotes the number of possible combinations of inputs and outputs:
$$N_I = \binom{m}{a_1}\binom{s}{a_2} \tag{3}$$
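To make the case-count formula for N and the formula for $N_I$ concrete, the following Python sketch enumerates the admissible $(a_1, a_2)$ pairs and counts the combinations in one case. It assumes, as the text implies but the formula for N does not state explicitly, that $1 \le a_1 \le m$ and $1 \le a_2 \le s$; the function names are hypothetical.

```python
from math import comb


def admissible_cases(n, m, s):
    """Return the (a1, a2) pairs counted by N, assuming 1 <= a1 <= m and 1 <= a2 <= s."""
    pairs = []
    for a1 in range(1, m + 1):
        for a2 in range(1, s + 1):
            if 3 * (m + s) >= m * s:
                ok = a1 + a2 <= n / 3   # first branch: a1 + a2 <= n/3
            else:
                ok = a1 * a2 <= n       # second branch: a1 * a2 <= n
            if ok:
                pairs.append((a1, a2))
    return pairs


def combinations_in_case(m, s, a1, a2):
    """N_I: number of ways to keep a1 of the m inputs and a2 of the s outputs."""
    return comb(m, a1) * comb(s, a2)


cases = admissible_cases(18, 6, 3)       # logistics example of Section 4
print(len(cases))                        # 12
print(combinations_in_case(6, 3, 5, 1))  # 18
```

For the logistics data, the first branch applies (27 ≥ 18), so N counts every pair with $a_1 + a_2 \le 6$.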
The selection procedure consists of the following steps.
  • Step 1: Run the original DEA model with the full set of m input variables and s output variables. Record the efficiency score of each DMU j in this run (set $E^*$, with elements $E^*_j$).
  • Step 2: Run $k = 1, \ldots, N_I$ DEA analyses, each keeping a different combination of $a_1$ input variables and $a_2$ output variables. For each run, record the efficiency score of each DMU (set $E_{I,k}$, with elements $E_{I,k,j}$).
  • Step 3: For each run k, calculate the average difference $AD_{I,k}$ between the original efficiency scores and those of run k over the n DMUs:
    $$AD_{I,k} = \frac{1}{n}\sum_{j=1}^{n}\left(E^*_j - E_{I,k,j}\right) \tag{4}$$
  • Step 4: Choose the optimal variable combination $C_I^*$ to be kept as the combination with the minimum average difference in efficiency scores:
    $$C_I^* = \arg\min_{k}\, AD_{I,k} \tag{5}$$
  • Step 5: For the selected combination, label the DEA results $E_I^*$, i.e., the efficiency scores of the DMUs computed with the retained input and output variables.
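Steps 1 to 5 for a single case I can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the author's implementation: it assumes the input-oriented CCR (constant returns to scale) model in multiplier form as the DEA model and strictly positive data. The normalization is written as $v \cdot x_o \le 1$ rather than $= 1$; for positive data this gives the same optimum and keeps every right-hand side non-negative, so a plain tableau simplex can start from the slack basis. The function names, the hand-written solver, and the toy data are hypothetical; a practical implementation would call an established LP solver.

```python
from itertools import combinations


def simplex_max(c, A, b, eps=1e-12):
    """Maximize c.w subject to A w <= b, w >= 0, with every b_i >= 0 so that
    the all-slack basis is feasible. Bland's rule is used to avoid cycling."""
    rows, cols = len(A), len(c)
    # Tableau rows are [A | I | b]; the last row is the objective row [-c | 0 | 0].
    T = [[float(v) for v in A[i]]
         + [1.0 if k == i else 0.0 for k in range(rows)] + [float(b[i])]
         for i in range(rows)]
    T.append([-float(v) for v in c] + [0.0] * (rows + 1))
    basis = [cols + i for i in range(rows)]
    while True:
        # Entering column: smallest index with a negative reduced cost.
        col = next((j for j in range(cols + rows) if T[-1][j] < -eps), None)
        if col is None:
            return T[-1][-1]  # optimal objective value
        # Leaving row: minimum ratio, ties broken by the smallest basic index.
        pick = None
        for i in range(rows):
            if T[i][col] > eps:
                ratio = T[i][-1] / T[i][col]
                if (pick is None or ratio < pick[0] - eps
                        or (abs(ratio - pick[0]) <= eps and basis[i] < basis[pick[1]])):
                    pick = (ratio, i)
        if pick is None:
            raise ValueError("LP is unbounded")
        r = pick[1]
        pivot = T[r][col]
        T[r] = [v / pivot for v in T[r]]
        for i in range(rows + 1):
            if i != r and T[i][col] != 0.0:
                f = T[i][col]
                T[i] = [a - f * p for a, p in zip(T[i], T[r])]
        basis[r] = col


def ccr_input_efficiency(X, Y, o):
    """Input-oriented CCR score of DMU o (multiplier form):
    max u.y_o  s.t.  v.x_o <= 1,  u.y_j - v.x_j <= 0 for every DMU j,  u, v >= 0."""
    m, s = len(X[0]), len(Y[0])
    c = [0.0] * m + [float(y) for y in Y[o]]     # variables w = (v_1..v_m, u_1..u_s)
    A = [[float(x) for x in X[o]] + [0.0] * s]   # normalization row
    b = [1.0]
    for xj, yj in zip(X, Y):                     # one ratio constraint per DMU
        A.append([-float(x) for x in xj] + [float(y) for y in yj])
        b.append(0.0)
    return simplex_max(c, A, b)


def gsm_case(X, Y, a1, a2, efficiency=ccr_input_efficiency):
    """One GSM case: keep a1 of the m inputs and a2 of the s outputs."""
    n, m, s = len(X), len(X[0]), len(Y[0])
    e_star = [efficiency(X, Y, o) for o in range(n)]               # Step 1: E*
    best = None
    for ins in combinations(range(m), a1):                          # Step 2: N_I runs
        for outs in combinations(range(s), a2):
            Xk = [[row[i] for i in ins] for row in X]
            Yk = [[row[r] for r in outs] for row in Y]
            e_k = [efficiency(Xk, Yk, o) for o in range(n)]
            ad = sum(es - ek for es, ek in zip(e_star, e_k)) / n    # Step 3: AD_{I,k}
            if best is None or ad < best[0]:                        # Step 4: minimum AD
                best = (ad, ins, outs, e_k)
    return best  # Step 5: (AD, kept input indices, kept output indices, E_I*)


# Toy data (hypothetical): three DMUs, two inputs, one output.
X = [[2, 1], [4, 1], [5, 1]]
Y = [[2], [3], [5]]
ad, kept_in, kept_out, scores = gsm_case(X, Y, a1=1, a2=1)
print(kept_in, kept_out)  # (0,) (0,)
```

In the toy data the second input is identical for every DMU, so keeping only the first input reproduces the original scores, while keeping only the second gives an average difference of 0.25. Because `gsm_case` takes the efficiency function as a parameter, a VRS or other DEA model could be substituted, provided the same model is used in every step.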
Through Steps 1 to 5, the optimal variable combination $C_I^*$ and the corresponding DEA results $E_I^*$ are obtained by searching all variable combinations for case I; that is, the $a_1$ input variables and $a_2$ output variables retained in the model are those with the minimum average difference in efficiency scores. Figure 1 shows the flow chart of the GSM algorithm for case I.
The same search is then carried out for all N cases: the efficiency scores are calculated for every combination of input and output variables, and the changes in efficiency are compared with those of the original model. The total number of possible combinations is:
$$T_c = \sum_{I=1}^{N} N_I \tag{6}$$
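As a worked check on $T_c$ (an illustration added here, not a result from the original analysis), suppose the rule of thumb is ignored, as in Section 4.1, so that every pair with $1 \le a_1 \le m$ and $1 \le a_2 \le s$ is evaluated. Summing (3) over all such pairs and using $\sum_{a=1}^{m}\binom{m}{a} = 2^m - 1$ gives
$$T_c = \sum_{a_1=1}^{m}\sum_{a_2=1}^{s}\binom{m}{a_1}\binom{s}{a_2} = \left(2^{m} - 1\right)\left(2^{s} - 1\right).$$
For the logistics example of Section 4 (m = 6, s = 3), this is 63 × 7 = 441 combinations, one of which ($a_1 = 6$, $a_2 = 3$) is the original model itself.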
Theoretically, the method iterates until only one input and one output variable remain in the model (i.e., case I = 1). In practice, how many cases should be evaluated depends on the decision criterion used to create a parsimonious DEA model. Note also that the GSM procedure does not rely on a particular form of DEA model. It can be used with either constant returns to scale (CRS) or variable returns to scale (VRS), and with static or stochastic data, as long as the same model is used consistently in all steps. The complexity analysis of the method is given in Appendix A.

4. Results

The proposed GSM for selecting DEA variables can easily be demonstrated with an example. We consider a data set of eighteen logistics companies (shown in Table 1), labeled DMU1 to DMU18. The data set contains six input variables and three output variables. The inputs are the following operational indicators.
  • I1: total assets
  • I2: total capital
  • I3: total current liabilities
  • I4: total operating expenses
  • I5: no. of employees
  • I6: selling, general & administrative expenses
The outputs are the following variables:
  • O1: operating income
  • O2: net sales or revenues
  • O3: net profit

4.1. Search the Best Combination in All Possible Cases

In this illustration, we first ignore the rule of thumb and let N = m + s − 1 = 8; that is, we consider all possible combinations of input and output variables in the DEA model and run the GSM through Steps 1 to 5 for every case. Figure 2 shows the trend of the average change in efficiency with the number of omitted variables. It indicates that as the number of variables decreases, the average change in efficiency increases.
Table 2 shows the optimal combinations in all possible cases. For managers, the GSM model not only provides a method of efficiency analysis for decision-making but also offers alternative options even when the number of variables has been fixed. Examining which input and output variables can be kept, and how doing so affects the previously efficient DMUs, provides valuable managerial information. We can also see that the output variable "net sales or revenues" has a vital effect on the analysis, because it is kept in every optimal case and never omitted.

4.2. Search the Best Combination under the Rule of Thumb

In this example, m = 6, s = 3, and n = 18. Applying the rule of thumb gives 3(m + s) = 27 and ms = 18. Hence,
$$n = 18 < \max\{3(m + s),\, ms\} = 3(m + s) = 27$$
This indicates that some inputs/outputs should be omitted to satisfy condition (1). If $a_1$ input variables and $a_2$ output variables are kept, they must satisfy
$$a_1 + a_2 \le \frac{n}{3} = 6, \quad \text{where } a_1, a_2 \in \mathbb{N}^*$$
Therefore, the total number of retained input/output variables should be no more than six. If the manager chooses to keep six input/output variables, three of the nine variables must be omitted. Because at least one input and one output must be kept in a standard DEA model, the possible cases are those shown in Table 3.
In Table 3, the number of combinations for each case can be calculated by (3). Using the GSM model, the best combination for each case can easily be identified by comparing its efficiency scores with those of the original DEA model. As a result, the input and output variables retained in the model are those with the minimum average difference in efficiency scores. Table 4 shows the optimal combination for each case with six variables.
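As a worked instance of (3) for the three admissible splits of six retained variables (at least one input and one output, with at most s = 3 outputs), the counts are as follows; this arithmetic is added for illustration and can be compared with Table 3:
$$\begin{aligned}
(a_1, a_2) = (5, 1):&\quad \binom{6}{5}\binom{3}{1} = 6 \times 3 = 18,\\
(a_1, a_2) = (4, 2):&\quad \binom{6}{4}\binom{3}{2} = 15 \times 3 = 45,\\
(a_1, a_2) = (3, 3):&\quad \binom{6}{3}\binom{3}{3} = 20 \times 1 = 20.
\end{aligned}$$
In total, the GSM evaluates 18 + 45 + 20 = 83 DEA models when six variables are retained.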
From Table 4, the combination (I2, I3, I4, I5, I6, and O2) in Case 1 shows the minimum average difference in efficiency scores and is therefore selected as the optimal combination when six variables are to be retained. About 99.83% of the information is kept after omitting three variables. This means that the input variable "total assets" and the output variables "operating income" and "net profit", which contribute less to the efficiency scores, can be omitted with minimal loss of information and almost no change in the DEA scores.

4.3. Find the Key Input and Output Variables

The GSM model can also be used to identify the key variables, i.e., the factors that play a significant role in the company's operations. Identifying key variables is important to managers because it helps them focus on the company's primary issues. In Table 2, I4 and O2 are identified as the key input and key output because, after the other variables are omitted, these two variables still retain about 68.93% of the information in the original nine-variable model (an average efficiency change of 31.07%). Nevertheless, in many applications this loss may be outweighed by the gains that result from developing a more parsimonious model.

5. Further Illustration and Validations

In this section, the proposed GSM method is further tested and validated using two established numerical examples, followed by a case study. The examples are taken from [11,22].

5.1. Example 1: Compared with Partial Correlation in Jenkins and Anderson

We begin with a simple exercise using the input-oriented CCR (CCR-I) primal model and compare our results with those of Jenkins and Anderson [11]. The data in Table 5 comprise six inputs, two outputs, and only eight DMUs.
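For reference, one common statement of the input-oriented CCR model is the multiplier form below. The notation ($x_{ij}$ and $y_{rj}$ for the i-th input and r-th output of DMU j, with weights $v_i$ and $u_r$) is a standard textbook convention assumed here rather than taken from [11]:
$$\begin{aligned}
\theta_o^* = \max_{u,\,v}\ & \sum_{r=1}^{s} u_r y_{ro}\\
\text{s.t.}\ & \sum_{i=1}^{m} v_i x_{io} = 1,\\
& \sum_{r=1}^{s} u_r y_{rj} - \sum_{i=1}^{m} v_i x_{ij} \le 0, \quad j = 1, \ldots, n,\\
& u_r \ge 0,\ v_i \ge 0.
\end{aligned}$$
A score of $\theta_o^* = 1$ places DMU o on the CCR frontier; the GSM compares these scores across variable subsets.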
To compare with the partial correlation method of [11], we omitted the same number of input variables and kept all outputs. Table 6 compares the results of the GSM with those of Jenkins and Anderson [11].
Table 6 shows the advantage of the GSM model, which yields a smaller efficiency change. When two input variables are to be kept, the GSM model selects I3 and I5, whereas the partial correlation method selects I1 and I3. The GSM analysis shows that, to retain as much information as possible (measured by the average efficiency change), I3 and I5 are the best pair to keep. Perhaps the most notable results are that the partial correlation method does not identify the best variables to keep, and that a judicious choice of fewer variables can retain a large share of the information. Partial correlation is only indirectly related to the resulting changes in efficiency, whereas the GSM model retains as much information as possible for the same number of input variables.

5.2. Example 2: Compared with Wagner and Shimshak

In this section, we conduct a further analysis by comparing the GSM model with other related variable selection methods, namely the stepwise method [22] and the selective-measures method [21]. Using the data in Table 1, we obtain the following results.
Table 7 shows the results of the GSM and the stepwise method. Overall, the GSM model selects the more important variables with smaller efficiency changes, and its results show a 5.63% improvement over the stepwise model. Suppose we want to choose the 'core' variables of the DEA model, that is, one representative input and one representative output with the least loss of information. The GSM model selects I6 and O2, with an average efficiency change of 0.302, which is less than the 0.304 of the stepwise method, which selects I4 and O2. In addition, the GSM method can provide valuable and accurate managerial information to the decision-maker that is not available from a traditional DEA analysis.
To compare with the selective-measures method [21], suppose that managers choose to keep five input/output variables; the results are shown in Table 8.
The results in Table 8 indicate that, when five variables are to be kept, the GSM model gives three alternative options (four inputs and one output, three inputs and two outputs, or two inputs and three outputs), whereas the stepwise model and selective measures each give only one choice. If the manager chooses to keep four inputs and one output, both the GSM and the stepwise method select the inputs "total capital", "total current liabilities", "total operating expenses", and "selling, general & administrative expenses" and the output "net sales or revenues". This option is the best choice because it loses the least information, retaining 99.47% of the information in the original model. However, the stepwise method does not consider the rule of thumb, and each of its selection steps is based only on the minimum efficiency change relative to the previous step, which is only locally optimal and may therefore fail to reach the globally optimal decision in some cases. The selective-measures approach yields a greater efficiency change and may lose more managerial information, because it focuses mainly on maximizing individual or aggregate efficiency without considering the information lost from a global perspective. In addition, selective measures cannot determine exactly which and how many variables should be selected, because they retain the performance measures that "appear the most often"; here, for comparison, we use the case with the smallest efficiency change, even though doing so may risk losing important information.
The above analysis shows that the GSM model offers several advantages for selecting performance variables in the standard DEA model. First, it considers the rule of thumb to keep the balance between the number of DMUs and the number of variables. Second, it determines exactly which variables should be selected and provides alternative options for different decision-making situations. Third, it helps decision-makers find the key input and output variables that contribute most to efficiency.

5.3. Case Study: US Banks

The GSM model helps to select variables in DEA and provides a framework for a number of alternative implementations. As previously mentioned, as long as the same standard DEA model is used in every step, the GSM algorithm can be used with a variety of efficiency models. In this section, we apply it to the banking industry using the model of [23]. The data, for fifteen US banks in 2011, comprise six ratios. The GSM is well suited to this example because efficiency analyses of this kind involve many ratios, and the number of DMUs is often too small to meet the rule of thumb; the GSM helps to overcome this problem. Table 9 shows the fifteen US banks with six ratios. The ratios are as follows.
  • R1: Current Ratio
  • R2: Return on Total Assets
  • R3: Price Earning Ratio
  • R4: Profit Margin
  • R5: Equity/Total Assets
  • R6: Dividend Pay-Out
Table 9 shows the ratios of the banks, and Table 10 shows the efficiency scores of each DMU; the last row of Table 10 gives the average change in the efficiency score. Initially, the ratio model containing all six ratio variables yields four efficient banks (B6, B12, B14, and B15). For Case 1, removing "Current Ratio" gives the smallest average change in the efficiency scores ($2.62 \times 10^{-10}$), and when it is omitted from the model, the same four banks remain efficient. For Case 2, with four ratio variables, "Current Ratio" and "Profit Margin" are dropped, with an average change in efficiency score of 0.008, and the same banks remain efficient.
For Case 3, the ratio variables "Return on Total Assets", "Price Earning Ratio", and "Dividend Pay-Out" are kept, and the average change in the efficiency score is 0.0223. For Case 4, with two ratio variables, "Return on Total Assets" and "Price Earning Ratio" are kept, and the average change is 0.0457. For Case 5, with only one variable ("Dividend Pay-Out") kept, a fairly large average change of 0.094 occurs, and the efficiency scores of some DMUs (e.g., B6) fall by as much as 59%. In this case, there is only one efficient bank, B15. When the GSM algorithm is taken to its conclusion, one ratio variable is always identified as the most important for the efficiency score. In this US banks analysis, the key variable is "Dividend Pay-Out" (the single remaining ratio). Managerially, we interpret this result as indicating that the core strategy for banks is to focus on their capacity to make profits and thereby pay out larger dividends.

6. Implications

The illustrations and case study presented in Section 5 point to several implications of the proposed method. Effective decision-making approaches are fundamentally based on the ability to identify precisely the set of factors or criteria that have the greatest effect on performance. Decision-makers need knowledge of these factors to adopt appropriate strategies for improving performance. This study shows how the proposed methodology, which is based on information about changes in efficiency scores, is useful for evaluating efficiency and offers prescriptive recommendations that managers can follow in controlling the performance of their business. This study improves the DEA method for measuring efficiency, a crucial notion in performance evaluation. It provides a search method for variable selection that retains the factors with the greatest influence on the DEA findings, and the methodology is based on an optimization approach.
This research provides important managerial insights that decision-makers can use to make trustworthy decisions and as recommendations to adjust or symmetrize their plans and needs through effective resource allocation. According to the results of the preceding analyses, the proposed GSM model compared favorably with the other variable selection approaches examined (partial correlation, stepwise selection, and selective measures) in selecting performance variables for the standard DEA model. The GSM model respects the rule of thumb of maintaining a symmetry between the number of DMUs and the number of variables. The model also specifies which variables should be used and provides alternatives for various decision-making scenarios. The method can help decision-makers identify the important input and output factors that have the greatest impact on efficiency.

7. Concluding Remarks

In conclusion, this study has proposed a GSM model to select the optimal combinations of input and output variables in DEA efficiency analysis. The method acts directly on information about changes in the efficiency scores, and it indicates to DMUs which input or output variables have the most influence on maintaining efficiency. It is important to note, however, that strategic decision-making is complex and can be affected by many factors (e.g., negotiation, persuasion, and environment). Future research could therefore focus on efficient variable selection and its impact on ensemble selection for fuzzy and big datasets, which would help decision-makers refine performance estimation. In particular, whether the required number of variables relative to the number of classes can be relaxed should be investigated, and the effect of using different DEA models needs further analysis.

Funding

The APC was funded by the British Academy and the Academy of Sciences Malaysia (304/PMGT/650912/B130).

Data Availability Statement

The data presented in this study are available on request from the corresponding author.

Conflicts of Interest

The author declares no conflict of interest.

Appendix A

Appendix A.1. Complexity Analysis of GSM

The performance of the algorithm can be evaluated by analyzing its computational time with big-O notation [24]. Big-O analysis bounds the worst-case computational time of an algorithm; consider, say, the function $f(n) = an^2 + bn + c$, where n is the independent variable of the algorithm and a, b, and c are constants. Big-O notation describes the asymptotic efficiency of an algorithm: $f(n) = O(g(n))$ if there exist positive constants $c$ and $n_0$ such that $f(n) \le c\,g(n)$ for all $n \ge n_0$ [25]. That is, f(n) lies below c g(n) for sufficiently large n, so O(g(n)) is an asymptotic upper bound on f(n), and f(n) is a member of the set O(g(n)). In other words, the quadratic f(n) above has an asymptotic upper bound of $n^2$ as n grows very large, i.e., $f(n) = O(n^2)$.
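As a worked instance of this definition (added for illustration), for all $n \ge 1$ the quadratic above satisfies
$$f(n) = an^2 + bn + c \le |a|\,n^2 + |b|\,n^2 + |c|\,n^2 = \left(|a| + |b| + |c|\right) n^2,$$
so the constants $c' = |a| + |b| + |c|$ (assumed positive) and $n_0 = 1$ witness $f(n) = O(n^2)$.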
The time complexity of the GSM for a total of N = m + s − 1 cases, with m inputs and s outputs as its independent variables, is analyzed asymptotically below.
Suppose $N_I$ is defined under the assumptions $a_1 \le m$ and $a_2 \le s$, as shown in Figure 1. Case I involves the m and s variables in each round of processing. The time for looping over N cases is at most $m \times s \times N$, as shown in line 4. In other words, the time required to compute $E_I^*$ is $ms(m + s - 1)$ when $N = m + s - 1$. Note that another set of $N_I$ combinations is formed for each I, as shown in line 7. The worst case occurs when I equals N − 1, i.e., at the last case, where $a_1 = m$ and $a_2 = s$. Its looping time is at most $ms(m + s - 1) \times N$. As such, the time required to compute $E_{I,k}$ is expected to be $ms(m + s - 1)(m + s - 1)$. Algorithm A1 shows the algorithm of the GSM.
Algorithm A1 The algorithm of the GSM
1: Procedure Global Search Method
2:   Create a combination of the $m$ and $s$ variables ($C^*$)
3:   set $I = \{1, 2, 3, \ldots, N\}$
4:   while $I < N$ do
5:     Compute $E_I^*$ based on the $m$ and $s$ variables
6:     set $N_I = \left\{ \binom{m}{a_1} \times \binom{s}{a_2} \mid a_1 + a_2 = I + 1 \right\}$
7:     while $k < N_I$ do
8:       Compute $E_{I,k}$ based on the $a_1$ and $a_2$ variables
9:     end while
10:    set $AD_I = \frac{1}{n} \sum_{k=0}^{N_I} \left( E_I^* - E_{I,k} \right)$
11:    set $C_I^* \leftarrow$ $a_1$ and $a_2$ of $\min(AD_I)$
12:  end while
13: return $C^*$
The computation of each $AD_I$ averages the differences $E_I^* - E_{I,k}$ over the $N_I$ cases, as shown in line 10. The expected time until the $(N_I - 1)$-th case is at most $m + s - 1$. The combination of $a_1$ and $a_2$ variables that attains the minimum $AD_I$ is assigned to $C_I^*$ (line 11) at the end of the looping for a particular $I$. Note that assigning values to $C_I^*$ and to $N_I$ (line 6; see also Figure 1) takes at most one unit of time.
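To make the search in Algorithm A1 concrete, here is a minimal Python sketch under stated assumptions; it is an illustration, not the author's implementation. The DEA solver is abstracted as a caller-supplied function, `efficiency(inputs, outputs)` (a hypothetical name), that returns one efficiency score per DMU, for example from a CCR model, which is not shown. The average efficiency change is assumed to be the mean of $E^* - E$ over the DMUs. For every split $(a_1, a_2)$ with at least one input and one output, the sketch scores all $\binom{m}{a_1}\binom{s}{a_2}$ subsets and keeps the best one, as listed per split in Tables 2 and 4; the per-case minimum of line 11 is then the smallest of these.

```python
from itertools import combinations
from typing import Callable, Dict, Sequence, Tuple

Indices = Tuple[int, ...]
Result = Tuple[float, Indices, Indices]
# Hypothetical evaluator: (kept input indices, kept output indices) -> one
# efficiency score per DMU. A real one would wrap a DEA solver (not shown).
Evaluator = Callable[[Indices, Indices], Sequence[float]]


def global_search(m: int, s: int, efficiency: Evaluator) -> Dict[Tuple[int, int], Result]:
    """Exhaustively score every model with 1..m inputs and 1..s outputs.

    For each split (a1, a2), keep (average change, inputs, outputs) of the
    subset whose scores move least, on average, from the full-model E*.
    """
    full = efficiency(tuple(range(m)), tuple(range(s)))  # E*: all variables
    best: Dict[Tuple[int, int], Result] = {}
    for kept in range(2, m + s + 1):  # one pass per case I, with kept = I + 1
        for a1 in range(max(1, kept - s), min(m, kept - 1) + 1):
            a2 = kept - a1  # the a1 range guarantees 1 <= a2 <= s
            for ins in combinations(range(m), a1):
                for outs in combinations(range(s), a2):
                    reduced = efficiency(ins, outs)
                    # Average efficiency change: mean of (E* - E) over DMUs
                    change = sum(f - r for f, r in zip(full, reduced)) / len(full)
                    candidate = (change, ins, outs)
                    if (a1, a2) not in best or candidate < best[(a1, a2)]:
                        best[(a1, a2)] = candidate
    return best


# Toy stand-in evaluator (NOT a DEA model): every DMU scores the share of
# hypothetical variable weights that the reduced model keeps.
IN_WEIGHTS = [5, 2, 3]
OUT_WEIGHTS = [6, 4]


def toy_efficiency(ins: Indices, outs: Indices) -> Sequence[float]:
    kept = sum(IN_WEIGHTS[i] for i in ins) + sum(OUT_WEIGHTS[r] for r in outs)
    share = kept / (sum(IN_WEIGHTS) + sum(OUT_WEIGHTS))
    return [share, share]  # two identical DMUs


result = global_search(3, 2, toy_efficiency)
for split, (change, ins, outs) in sorted(result.items()):
    print(split, ins, outs, round(change, 2))
```

With these toy weights, each split keeps the heaviest variables: for the split (1, 1) the sketch keeps input 0 and output 0, and the full model (3, 2) has an average change of 0. Replacing `toy_efficiency` with a wrapper around a real DEA solver is all that the search itself requires.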
In short, the GSM procedure yields an optimized combination of the $m$ and $s$ variables through $C^*$ at its end. Because $f(n)$ is an increasing function as $C^*$ is built up, the constant $c$ and the lower-order terms become insignificant compared with $m^3 s + m s^3$, the dominant term in computing $E_{I,k}$, when $m$ and $s$ grow very large. Hence the function $f(n)$ that represents the GSM procedure is $O(m^3 s + m s^3)$ as both $m$ and $s$ grow to infinity.
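The step from the loop count $ms(m + s - 1)^2$ to the stated bound can be checked directly. For $m, s \ge 1$, the cross term $2m^2 s^2$ that appears on expansion is absorbed by the two leading terms, because $m^3 s + m s^3 - 2m^2 s^2 = ms(m - s)^2 \ge 0$:

```latex
\begin{aligned}
ms(m + s - 1)^2 &\le ms(m + s)^2 = m^3 s + 2m^2 s^2 + m s^3 \\
&\le 2\left(m^3 s + m s^3\right)
\end{aligned}
```

The work of computing every $E_{I,k}$ is therefore $O(m^3 s + m s^3)$.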

References

  1. Zhu, D. A hybrid approach for efficient ensembles. Decis. Support Syst. 2010, 48, 480–487.
  2. Zheng, Z.; Padmanabhan, B. Constructing Ensembles from Data Envelopment Analysis. INFORMS J. Comput. 2007, 19, 486–496.
  3. Provost, F.; Fawcett, T. Robust Classification for Imprecise Environments. Mach. Learn. 2001, 42, 203–231.
  4. Baumgartner, D.; Serpen, G. Performance of global-local hybrid ensemble versus boosting and bagging ensembles. Int. J. Mach. Learn. Cybern. 2013, 4, 301–317.
  5. Charnes, A.; Cooper, W.; Rhodes, E. Measuring the efficiency of decision-making units. Eur. J. Oper. Res. 1979, 3, 339.
  6. Cooper, W. Data Envelopment Analysis. In Encyclopedia of Operations Research and Management Science; Gass, S., Fu, M., Eds.; Springer: Berlin/Heidelberg, Germany, 2013; pp. 349–358.
  7. Jomthanachai, S.; Wong, W.-P.; Lim, C.-P. A Coherent Data Envelopment Analysis to Evaluate the Efficiency of Sustainable Supply Chains. IEEE Trans. Eng. Manag. 2021, PP, 1–18.
  8. Misiunas, N.; Oztekin, A.; Chen, Y.; Chandra, K. DEANN: A healthcare analytic methodology of data envelopment analysis and artificial neural networks for the prediction of organ recipient functional status. Omega 2016, 58, 46–54.
  9. Zhu, Q.; Wu, J.; Song, M. Efficiency evaluation based on data envelopment analysis in the big data context. Comput. Oper. Res. 2018, 98, 291–300.
  10. Golany, B.; Roll, Y. An application procedure for DEA. Omega 1989, 17, 237–250.
  11. Jenkins, L.; Anderson, M. A multivariate statistical approach to reducing the number of variables in data envelopment analysis. Eur. J. Oper. Res. 2003, 147, 51–61.
  12. Morita, H.; Avkiran, N.K. Selecting inputs and outputs in data envelopment analysis by designing statistical experiments (Operations Research for Performance Evaluation). J. Oper. Res. Soc. Jpn. 2009, 52, 163–173.
  13. Cooper, W.W.; Seiford, L.M.; Tone, K. Data Envelopment Analysis: A Comprehensive Text with Models, Applications, References and DEA-Solver Software, 2nd ed.; Springer: Berlin/Heidelberg, Germany, 2007.
  14. Nunamaker, T.R. Using data envelopment analysis to measure the efficiency of non-profit organizations: A critical evaluation. Manag. Decis. Econ. 1985, 6, 50–58.
  15. Chilingerian, J.A. Evaluating physician efficiency in hospitals: A multivariate analysis of best practices. Eur. J. Oper. Res. 1995, 80, 548–574.
  16. Salinas-Jimenez, J.; Smith, P. Data envelopment analysis applied to quality in primary health care. Ann. Oper. Res. 1996, 67, 141–161.
  17. Appa, G.; Norman, M.; Stoker, B. Data Envelopment Analysis: The Assessment of Performance. J. Oper. Res. Soc. 1992, 43, 919.
  18. Banker, R.D. Hypothesis tests using data envelopment analysis. J. Prod. Anal. 1996, 7, 139–159.
  19. Pastor, J.T.; Ruiz, J.L.; Sirvent, I. A Statistical Test for Nested Radial DEA Models. Oper. Res. 2002, 50, 728–735.
  20. Amirteimoori, A.; Despotis, D.K.; Kordrostami, S. Variables reduction in data envelopment analysis. Optimization 2012, 63, 735–745.
  21. Toloo, M.; Barat, M.; Masoumzadeh, A. Selective measures in data envelopment analysis. Ann. Oper. Res. 2015, 226, 623–642.
  22. Wagner, J.M.; Shimshak, D.G. Stepwise selection of variables in data envelopment analysis: Procedures and managerial perspectives. Eur. J. Oper. Res. 2007, 180, 57–67.
  23. Halkos, G.E.; Salamouris, D.S. Efficiency measurement of the Greek commercial banks with the use of financial ratios: A data envelopment analysis approach. Manag. Account. Res. 2004, 15, 201–224.
  24. Hofri, M. Introduction. In Probabilistic Analysis of Algorithms: On Computing Methodologies for Computer Algorithms Performance Evaluation; Springer: New York, NY, USA, 1987; pp. 1–10.
  25. Rayward-Smith, V.J.; Cormen, T.H.; Leiserson, C.E.; Rivest, R.L. Introduction to Algorithms. J. Oper. Res. Soc. 1991, 42, 816.
Figure 1. The flow chart of the GSM algorithm for case I.
Figure 2. The average efficiency change will increase if more variables are omitted.
Table 1. Data of 18 logistics companies.
DMUI1I2I3I4I5I6O1O2O3
DMU17,173,0394,665,5462,220,17311,430,10911,0001,076,631815,16112,245,269577,488
DMU2153,707145,47671817277280219490581824457
DMU3939,409902,44936,960290,0851832,467415,204705,289379,699
DMU4493,906307,173147,059517,766154937,47317,141534,90726,262
DMU535,33325,084982622,173974559191224,0851441
DMU6466,368396,44570,260530,22249324,63039,389569,61125,323
DMU798,99466,52932,388112,5528316,2479994122,5469641
DMU8719,315505,479192,045293,4212288162,686142,624436,045150,716
DMU9638,625528,93672,211173,320139232,95217,494190,81415,970
DMU10466,216334,537125,959225,573144546,28627,270252,84321,727
DMU11213,201166,99838,928134,98556327,05428,037163,02216,580
DMU122,187,7082,117,11469,256257,92037129,239350,222608,142481,361
DMU1374,54769,426551867,645154018,799623473,8794441
DMU14130,82694,92935,848227,19527615,6282880230,0752418
DMU15522,852232,016266,412222,26476222,8859358231,62212,690
DMU16305,799232,07969,433277,17155113,41310,697287,8688080
DMU1727,951,84525,189,7362,700,8678,688,4228916909,2242,510,52311,198,9452,861,949
DMU18930,044748,004163,564492,28957333,75665,324557,61335,763
Table 2. Optimal combinations in all possible cases.
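| No. of Kept Variables | Inputs | Outputs | Average Efficiency Change |
|---|---|---|---|
| 2 | I6 | O2 | 0.3107 |
| 3 | I2 | O2, O3 | 0.2769 |
| 3 | I6, I4 | O1 | 0.0406 |
| 4 | I2 | O1, O2, O3 | 0.2718 |
| 4 | I1, I6 | O1, O3 | 0.0389 |
| 4 | I1, I4, I5 | O1 | 0.0169 |
| 5 | I2, I3, I4, I6 | O2 | 0.0053 |
| 5 | I2, I4, I6 | O2, O3 | 0.0152 |
| 5 | I2, I4 | O1, O2, O3 | 0.0486 |
| 6 | I2, I4, I6 | O1, O2, O3 | 0.0152 |
| 6 | I2, I3, I4, I6 | O2, O3 | 0.0036 |
| 6 | I1, I2, I4, I5, I6 | O1 | 0.0017 |
| 7 | I2, I3, I4, I6 | O1, O2, O3 | 0.0117 |
| 7 | I1, I2, I3, I4, I6 | O1, O3 | 1.04E−09 |
| 7 | I1, I2, I3, I4, I5, I6 | O2 | 0.0017 |
| 8 | I1, I2, I3, I4, I6 | O1, O2, O3 | 9.94E−10 |
| 8 | I1, I2, I3, I4, I5, I6 | O2, O3 | 9.63E−10 |
| 9 | I1, I2, I3, I4, I5, I6 | O1, O2, O3 | 0 |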
Table 3. Possible cases of combinations with six variables.
| Cases | No. of Inputs (a1) | No. of Outputs (a2) | No. of Combinations |
|---|---|---|---|
| Case 1 | 5 | 1 | 18 |
| Case 2 | 4 | 2 | 45 |
| Case 3 | 3 | 3 | 20 |
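The counts in Table 3 follow from the set $N_I$ in line 6 of Algorithm A1: a model with $a_1$ of the $m$ inputs and $a_2$ of the $s$ outputs can be chosen in $\binom{m}{a_1}\binom{s}{a_2}$ ways. The short Python sketch below (the function names are illustrative, not from the paper) reproduces Table 3 for the six inputs I1 to I6 and three outputs O1 to O3. It also totals the DEA models that an exhaustive search fits over all model sizes, which equals $(2^m - 1)(2^s - 1)$ because every non-empty input subset is paired with every non-empty output subset.

```python
from math import comb


def candidate_counts(m: int, s: int, kept: int) -> dict:
    """Models of size kept = a1 + a2 (at least one input and one output),
    counted per split (a1, a2) as C(m, a1) * C(s, a2)."""
    counts = {}
    for a1 in range(max(1, kept - s), min(m, kept - 1) + 1):
        a2 = kept - a1
        counts[(a1, a2)] = comb(m, a1) * comb(s, a2)
    return counts


def total_models(m: int, s: int) -> int:
    """DEA models fitted by an exhaustive search over sizes 2 .. m + s."""
    return sum(sum(candidate_counts(m, s, k).values()) for k in range(2, m + s + 1))


# Six inputs (I1-I6) and three outputs (O1-O3), keeping six variables
print(candidate_counts(6, 3, 6))  # {(3, 3): 20, (4, 2): 45, (5, 1): 18}
print(total_models(6, 3))         # 441
```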
Table 4. Optimal combinations with six variables.
| Cases | Inputs | Outputs | Average Efficiency Change |
|---|---|---|---|
| Case 1 | I2, I3, I4, I5, I6 | O2 | 0.0017 |
| Case 2 | I2, I3, I4, I6 | O2, O3 | 0.0036 |
| Case 3 | I2, I4, I6 | O1, O2, O3 | 0.0152 |
Table 5. Data for Example 1.
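| DMU | I1 | I2 | I3 | I4 | I5 | I6 | O1 | O2 |
|---|---|---|---|---|---|---|---|---|
| A | 1.5 | 2.7 | 70 | 2.3 | 1.8 | 3.3 | 85 | 82 |
| B | 0.5 | 0.2 | 70 | 1.5 | 1.1 | 0.5 | 96 | 93 |
| C | 2.5 | 2.6 | 75 | 2.2 | 2.4 | 3.2 | 78 | 87 |
| D | 1.8 | 1.5 | 75 | 1.8 | 1.6 | 2.3 | 87 | 88 |
| E | 0.9 | 0.4 | 80 | 0.5 | 1.4 | 2.6 | 89 | 94 |
| F | 0.6 | 0.2 | 80 | 1.3 | 0.9 | 2.8 | 93 | 93 |
| G | 1.4 | 0.6 | 85 | 1.4 | 1.3 | 2.1 | 92 | 91 |
| H | 1.7 | 1.7 | 90 | 0.3 | 1.7 | 1.8 | 97 | 92 |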
Table 6. The results of GSM model and partial correlation.
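| No. of Input Variables | GSM: Inputs Kept | GSM: E* | Partial Correlation: Inputs Kept | Partial Correlation: E* |
|---|---|---|---|---|
| 2 | I3, I5 | 0.005 | I1, I3 | 0.063 |
| 3 | I3, I4, I5 | 0 | I1, I3, I6 | 0.063 |
| 4 | I2, I3, I4, I5 | 0 | I3, I4, I5, I6 | 0 |
| 5 | I1, I2, I3, I4, I5 | 0 | I2, I3, I4, I5, I6 | 0 |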
Table 7. Results of GSM and Stepwise.
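| No. of Variables to Be Kept | GSM: Input Kept | GSM: Output Kept | GSM: E* | Stepwise: Input Kept | Stepwise: Output Kept | Stepwise: E* | Improved by (%) |
|---|---|---|---|---|---|---|---|
| 2 | I6 | O2 | 0.302 | I4 | O2 | 0.304 | 0.13% |
| 3 | I1, I6 | O1 | 0.197 | I2, I4 | O2 | 0.290 | 9.21% |
| 4 | I1, I4, I5 | O1 | 0.174 | I2, I4, I6 | O2 | 0.288 | 11.48% |
| 5 | I1, I4, I5 | O1, O3 | 0.173 | I2, I3, I4, I6 | O2 | 0.288 | 11.49% |
| 6 | I1, I2, I4, I5, I6 | O1 | 0.217 | I2, I3, I4, I5, I6 | O2 | 0.288 | 7.10% |
| 7 | I1, I2, I3, I4, I5, I6 | O2 | 5.45E−16 | I1, I2, I3, I4, I5, I6 | O2 | 5.45E−16 | 0.00% |
| 8 | I1, I2, I3, I4, I5, I6 | O2, O3 | 1.86E−16 | I1, I2, I3, I4, I5, I6 | O2, O3 | 1.86E−16 | 0.00% |
| Average | - | - | - | - | - | - | 5.63% |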
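The Average row of Table 7 is consistent with a plain arithmetic mean of the seven per-row improvements, including the two 0.00% rows in which both methods keep the same variables. A quick Python check, using only the values printed in the table:

```python
# "Improved by (%)" column of Table 7, for 2 to 8 variables kept
improved = [0.13, 9.21, 11.48, 11.49, 7.10, 0.00, 0.00]
average = sum(improved) / len(improved)
print(f"Average improvement: {average:.2f}%")  # Average improvement: 5.63%
```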
Table 8. GSM model vs other methods.
| DMU | GSM Model | GSM Model | GSM Model | Step-Wise | Selective Measures | E* |
|---|---|---|---|---|---|---|
| Variables to Be Kept | I2, I3, I4, I6, O2 | I2, I4, I6, O2, O3 | I2, I4, O1, O2, O3 | I2, I3, I4, I6, O2 | I2, I4, I5, I6, O2 | All |
| DMU1 | 1.00000 | 1.00000 | 1.00000 | 1.00000 | 1.00000 | 1.00000 |
| DMU2 | 0.46245 | 0.46281 | 0.46281 | 0.46245 | 0.46245 | 0.46281 |
| DMU3 | 1.00000 | 1.00000 | 1.00000 | 1.00000 | 1.00000 | 1.00000 |
| DMU4 | 0.93322 | 0.93322 | 0.88721 | 0.93322 | 0.93322 | 0.93322 |
| DMU5 | 0.75687 | 0.75687 | 0.75687 | 0.75687 | 0.75687 | 0.75687 |
| DMU6 | 1.00000 | 1.00000 | 0.86486 | 1.00000 | 1.00000 | 1.00000 |
| DMU7 | 0.93591 | 0.93591 | 0.93591 | 0.93591 | 1.00000 | 1.00000 |
| DMU8 | 0.85763 | 0.85763 | 0.85763 | 0.85763 | 0.85763 | 0.85763 |
| DMU9 | 0.45843 | 0.45843 | 0.45843 | 0.45843 | 0.45843 | 0.45843 |
| DMU10 | 0.69527 | 0.69527 | 0.69527 | 0.69527 | 0.69527 | 0.69527 |
| DMU11 | 0.81139 | 0.81139 | 0.81139 | 0.81139 | 0.81139 | 0.81139 |
| DMU12 | 0.96979 | 1.00000 | 1.00000 | 0.96979 | 0.96979 | 1.00000 |
| DMU13 | 1.00000 | 0.79007 | 0.79007 | 1.00000 | 0.79007 | 1.00000 |
| DMU14 | 1.00000 | 1.00000 | 0.94100 | 1.00000 | 1.00000 | 1.00000 |
| DMU15 | 0.74907 | 0.74907 | 0.74907 | 0.74907 | 0.74907 | 0.74907 |
| DMU16 | 0.93189 | 0.93189 | 0.80683 | 0.93189 | 0.93189 | 0.93189 |
| DMU17 | 0.56520 | 0.56520 | 0.55444 | 0.56520 | 0.56520 | 0.56520 |
| DMU18 | 0.74498 | 0.74498 | 0.69470 | 0.74498 | 0.74498 | 0.74498 |
| Average change with E* | 0.0053 | 0.0152 | 0.0486 | 0.0053 | 0.0134 | 0 |
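The bottom row of Table 8 can be recomputed from the per-DMU scores. A minimal Python check, assuming the average efficiency change is the mean of E* − E over the 18 DMUs, reproduces the 0.0053 entry for the GSM model that keeps I2, I3, I4, I6, and O2 and the 0.0134 entry for the selective-measures model:

```python
def average_efficiency_change(full, reduced):
    """Mean over DMUs of E* - E for a reduced model."""
    return sum(f - r for f, r in zip(full, reduced)) / len(full)


# Columns of Table 8, DMU1 .. DMU18
e_star = [1.00000, 0.46281, 1.00000, 0.93322, 0.75687, 1.00000,
          1.00000, 0.85763, 0.45843, 0.69527, 0.81139, 1.00000,
          1.00000, 1.00000, 0.74907, 0.93189, 0.56520, 0.74498]
gsm = [1.00000, 0.46245, 1.00000, 0.93322, 0.75687, 1.00000,  # I2, I3, I4, I6, O2
       0.93591, 0.85763, 0.45843, 0.69527, 0.81139, 0.96979,
       1.00000, 1.00000, 0.74907, 0.93189, 0.56520, 0.74498]
selective = [1.00000, 0.46245, 1.00000, 0.93322, 0.75687, 1.00000,  # I2, I4, I5, I6, O2
             1.00000, 0.85763, 0.45843, 0.69527, 0.81139, 0.96979,
             0.79007, 1.00000, 0.74907, 0.93189, 0.56520, 0.74498]

print(round(average_efficiency_change(e_star, gsm), 4))        # 0.0053
print(round(average_efficiency_change(e_star, selective), 4))  # 0.0134
```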
Table 9. Fifteen US banks with six ratios.
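| | Bank Name | R1 | R2 | R3 | R4 | R5 | R6 |
|---|---|---|---|---|---|---|---|
| B1 | CITIGROUP INC | 0.62 | 0.78 | 6.86 | 15.15 | 9.58 | 0.72 |
| B2 | ZIONS BANCORPORATION | 0.19 | 0.98 | 9.30 | −19.70 | 13.14 | 2.28 |
| B3 | CAPITAL ONE FINANCIAL CORP | 0.05 | 2.23 | 6.18 | 26.67 | 14.40 | 2.89 |
| B4 | DISCOVER FINANCIAL SERVICES | 0.10 | 5.10 | 5.88 | 19.06 | 11.98 | 4.93 |
| B5 | ASSOCIATED BANC-CORP. | 0.04 | 0.84 | 13.87 | −4.20 | 13.07 | 5.01 |
| B6 | FIRST MIDWEST BANCORP, INC | 0.10 | 0.52 | 20.64 | −10.33 | 12.07 | 8.14 |
| B7 | WEBSTER FINANCIAL CORP | 0.02 | 1.12 | 11.79 | 11.84 | 9.86 | 9.23 |
| B8 | SUNTRUST BANKS | 0.06 | 0.42 | 14.40 | 0.24 | 11.35 | 9.70 |
| B9 | METLIFE, INC. | 0.64 | 1.25 | 4.74 | 8.06 | 7.52 | 11.31 |
| B10 | MORGAN STANLEY | 1.62 | 0.82 | 6.28 | 20.54 | 9.35 | 13.82 |
| B11 | WELLS FARGO & COMPANY | 0.12 | 1.80 | 8.97 | 22.50 | 10.78 | 15.65 |
| B12 | TD AMERITRADE HOLDING CORP | 15.73 | 5.94 | 13.04 | 37.61 | 24.03 | 17.91 |
| B13 | PRUDENTIAL FINANCIAL INC | 0.12 | 0.85 | 6.49 | 27.06 | 6.05 | 18.94 |
| B14 | PNC FINANCIAL SERVICES GROUP | 0.05 | 1.50 | 9.88 | 26.20 | 13.73 | 19.67 |
| B15 | US BANCORP | 0.05 | 1.95 | 10.78 | 23.29 | 10.28 | 20.07 |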
Table 10. GSM in US banks with ratios.
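| | Case 5 | Case 4 | Case 3 | Case 2 | Case 1 | E* |
|---|---|---|---|---|---|---|
| Ratio Kept | R6 | R3, R4 | R2, R3, R6 | R2, R3, R5, R6 | R2, R3, R4, R5, R6 | All 6 Ratios |
| B1 | 0.0359 | 0.4874 | 0.3722 | 0.4574 | 0.4874 | 0.4874 |
| B2 | 0.1136 | 0.4506 | 0.4995 | 0.6235 | 0.6235 | 0.6235 |
| B3 | 0.144 | 0.7091 | 0.4355 | 0.5993 | 0.7091 | 0.7091 |
| B4 | 0.2456 | 0.5068 | 0.8586 | 0.8586 | 0.8586 | 0.8586 |
| B5 | 0.2496 | 0.7052 | 0.7042 | 0.7833 | 0.7833 | 0.7833 |
| B6 | 0.4056 | 1 | 1 | 1 | 1 | 1 |
| B7 | 0.4599 | 0.7192 | 0.7033 | 0.7033 | 0.7192 | 0.7192 |
| B8 | 0.4833 | 0.7598 | 0.8136 | 0.8136 | 0.8136 | 0.8136 |
| B9 | 0.5635 | 0.3167 | 0.5674 | 0.5745 | 0.5745 | 0.5745 |
| B10 | 0.6886 | 0.5461 | 0.6886 | 0.701 | 0.7165 | 0.7165 |
| B11 | 0.7798 | 0.6598 | 0.7975 | 0.7995 | 0.8071 | 0.8071 |
| B12 | 0.8924 | 1 | 1 | 1 | 1 | 1 |
| B13 | 0.9437 | 0.7195 | 0.9437 | 0.9437 | 0.9748 | 0.9748 |
| B14 | 0.9801 | 0.7385 | 0.9801 | 1 | 1 | 1 |
| B15 | 1 | 0.7616 | 1 | 1 | 1 | 1 |
| Average change with E* | 0.0940 | 0.0457 | 0.0223 | 0.0080 | 2.62E−10 | 0 |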
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Share and Cite

MDPI and ACS Style

Wong, W.-P. A Global Search Method for Inputs and Outputs in Data Envelopment Analysis: Procedures and Managerial Perspectives. Symmetry 2021, 13, 1155. https://doi.org/10.3390/sym13071155

AMA Style

Wong W-P. A Global Search Method for Inputs and Outputs in Data Envelopment Analysis: Procedures and Managerial Perspectives. Symmetry. 2021; 13(7):1155. https://doi.org/10.3390/sym13071155

Chicago/Turabian Style

Wong, Wai-Peng. 2021. "A Global Search Method for Inputs and Outputs in Data Envelopment Analysis: Procedures and Managerial Perspectives" Symmetry 13, no. 7: 1155. https://doi.org/10.3390/sym13071155

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers.
