deaR-Shiny: An Interactive Web App for Data Envelopment Analysis

Abstract: In this paper, we describe an interactive web application (deaR-shiny) to measure efficiency and productivity using data envelopment analysis (DEA). deaR-shiny aims to fill the gap that currently exists in the availability of online DEA software by offering practitioners and researchers free access to a very wide variety of DEA models (both conventional and fuzzy). We illustrate how to use the web app by replicating two sets of results: the main results of Carlucci, Cirà and Coccorese (2018), who investigate the efficiency and economic sustainability of Italian regional airports using two conventional DEA models, and the results of Kao and Liu (2000, 2003), who calculate the efficiency scores of university libraries in Taiwan using a fuzzy DEA model in which missing data are treated as fuzzy numbers.


Introduction
Data Envelopment Analysis (DEA) is a widely used mathematical programming technique for evaluating the relative efficiency of a set of homogeneous DMUs (Decision Making Units) which consume the same inputs (in different quantities) to produce the same outputs (in different quantities).
In the last two decades, there have been remarkable advances in both DEA methodologies and practical applications in many different fields (education, banking and finance, sustainability, arts and humanities, hospitals and healthcare, industrial sectors, agriculture, transportation, etc.). Although several bibliographic reviews are available (see, for instance, [1][2][3][4]), the recent compilation by Emrouznejad and Yang [5] is noteworthy: it provides a full listing of more than 10,000 DEA-related articles ranging from 1978 to late 2016. Recently, [6] reviewed the literature on DEA applications in the field of sustainability, covering papers published in journals indexed by the Web of Science from 1996 to 2016 (a total of 320 relevant papers were analyzed).
The original DEA model is known as the CCR model, since it was proposed by Charnes, Cooper and Rhodes [7], and assumes that the technology exhibits constant returns to scale (CRS). However, as the assumption of constant returns to scale is not always realistic, Banker et al. (1984) [8] relax it by including the so-called convexity restriction. The resulting model, popularly known as the BCC model (or Banker, Charnes and Cooper's DEA model), allows the efficient frontier to exhibit variable returns to scale (VRS). Both the CCR and BCC models can be either input- or output-oriented and provide radial efficiency measures.
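For reference, the output-oriented envelopment form of the CCR model can be written as below. The notation is an assumption introduced here for illustration, not taken from the cited papers: $x_{ij}$ and $y_{rj}$ denote the $i$-th input and $r$-th output of DMU $j$ ($j = 1, \dots, n$), DMU $o$ is the unit under evaluation, and $\lambda_j$ are the intensities.

```latex
\[
\begin{aligned}
\max_{\phi,\,\lambda}\quad & \phi \\
\text{s.t.}\quad & \sum_{j=1}^{n} \lambda_j x_{ij} \le x_{io}, \qquad i = 1,\dots,m, \\
& \sum_{j=1}^{n} \lambda_j y_{rj} \ge \phi\, y_{ro}, \qquad r = 1,\dots,s, \\
& \lambda_j \ge 0, \qquad j = 1,\dots,n.
\end{aligned}
\]
```

The BCC model adds the convexity restriction $\sum_{j=1}^{n} \lambda_j = 1$ to these constraints. In this orientation, an optimal value $\phi^{*} = 1$ indicates a radially efficient DMU, while $\phi^{*} > 1$ indicates that all outputs could be expanded proportionally by the factor $\phi^{*}$ without using more inputs.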
Based on these basic DEA models, several extensions have been proposed in the DEA literature. Without being exhaustive, these include the possibility of considering non-discretionary (or uncontrollable) inputs and/or outputs [9], the presence of categorical or ordinal inputs and/or outputs [10,11], restrictions on the weights of inputs and/or outputs [12][13][14][15][16], and the presence of undesirable factors [17,18].
Over the years, and at the same time as these variations of the basic radial DEA models were emerging, methodological developments have also led to a wide variety of new DEA models. One of the first is the additive model [19], which simultaneously allows input reductions and output increases. The additive model is non-radial (that is, it does not preserve the mix within inputs and within outputs in movements toward the frontier [20]) and non-oriented. Its disadvantage is that it identifies inefficient DMUs but does not provide an efficiency score. To solve this problem, other non-radial models were proposed, for example the Slacks-Based Measure (SBM) of efficiency introduced by Tone (2001) [21], the Russell measure of efficiency models [22][23][24], the preference structure model [25], and the range-adjusted measure of efficiency model [26].
Typically, the efficiency scores obtained by executing a DEA model allow us to classify the DMUs into two groups: efficient DMUs, which define the best practice frontier, and inefficient DMUs. However, in principle these scores do not allow a complete ranking of the DMUs. To obtain such a ranking (see [27] for a review of methods), super-efficiency models [28][29][30] or cross-efficiency models [31,32] are commonly used.
On the other hand, if we have panel data and we want to compare the performance of a set of DMUs over time, the most frequently used method is the Malmquist productivity index [33][34][35][36][37][38] and, to a lesser extent, window analysis [39].
So far, the models reviewed have one common feature: they consider that inputs and outputs are precise data. However, what if it is more reasonable to assume that the inputs and/or outputs are imprecise? For these situations, although bootstrapping [40][41][42][43] and stochastic DEA [44][45][46] models have been proposed, fuzzy set theory has prevailed as the way to quantify imprecise or vague inputs and/or outputs. DEA models with imprecise data are often called fuzzy DEA (FDEA) models. Hatami-Marbini, Emrouznejad and Tavana [47] provide a taxonomy and review of FDEA methods. Zhou and Xu [48] review the research status, method development, practical applications, and theoretical trends of FDEA.
In our opinion, many of the theoretical models referred to in the previous paragraphs are not widely applied because many practitioners and researchers lack the basic tool they need: the software. Therefore, to fill this gap between theory and practice, we developed deaR-shiny, a free web application that allows them to run a wide variety of DEA models, both traditional and fuzzy, as well as decompositions of the Malmquist index.

DEA Software: A Brief Review
In parallel to the development of DEA models, software has also emerged to facilitate their use by practitioners. Barr (2004) [49] provides an overview and compares the features and capabilities of several commercial and non-commercial DEA software packages.

More recently, Daraio, Kerstens and Cavalcante [51] conducted a comprehensive review of current software options for productivity and efficiency analysis and carried out a comparative analysis of them. It is worth noting the great diversity of models implemented in MaxDEA (http://www.maxdea.cn (accessed on 12 June 2021)), which is developed with VBA Access. MaxDEA offers two versions to users: (1) MaxDEA Basic, which allows running basic models (CCR, BCC, Slacks Based Model and Cost model) under different orientations (input, output and non-oriented) and returns to scale with no limit on the number of DMUs, and (2) MaxDEA Ultra, the commercial version, in which a wide variety of models are available (a list of them can be consulted at: http://maxdea.com/Features.htm (accessed on 12 June 2021)). It should also be mentioned that some of the most popular software used by mathematicians, engineers and econometricians (such as GAMS, STATA, SAS or Matlab) has also incorporated modules to estimate efficiency through DEA models (for example: [52,53]). The main drawback we see in all these options is that they are commercial software, so access to them is limited for many practitioners.
Regarding non-commercial software, one of the alternatives that has grown the most in recent years is R, a programming language and free software environment for statistical computing [54]. One of the greatest strengths of R is the ease of creating extensions through packages contributed by a constantly growing user community. Currently (as of 2020), there are more than 16,000 user-contributed packages available at CRAN (Comprehensive R Archive Network). These packages contain applications in almost every scientific field, e.g., Biology, Finance, Data Science, Meteorology, etc. Data Envelopment Analysis is no exception. In recent years, the number of R packages for estimating efficiency has increased significantly, and a wide variety of DEA models can be applied. One of the first R packages is FEAR [55], which is also the first DEA software to include functions for running the bootstrap methods proposed in [40]. Other popular R packages (with more than 10,000 downloads since 2019-01-01) are Benchmarking [56], nonparaeff [57], rDEA [58], DJL [59], and deaR [60,61].
All the applications we have referred to so far need to be installed locally on the user's computer, and many of them (applications in R, Matlab, Python, etc.) require an intermediate level of knowledge to be used effectively and to their full potential.
There are also some commercial and non-commercial web-based DEA solutions. For example:

1. Commercial online software:
• There is a free version (lite version) limited to a maximum of 15 DMUs and 4 indexes (inputs/outputs) in a maximum of 2 projects.
• DEA online (http://www.onlineoutput.com/dea-software/ (accessed on 12 June 2021)). The company offers a demo version with a limited number of inputs/outputs and number of DMUs.

2. Non-commercial online software:
• WebDEA (https://sites.google.com/site/dsslabunipi/home (accessed on 12 June 2021)). This application, which is currently not working, includes DEA models under constant and variable returns to scale assumptions as well as their oriented and non-oriented variants. Additionally, weight restrictions can be incorporated into the aforementioned models and post-DEA analysis can be carried out based on cross-efficiency analysis (aggressive and benevolent versions).

deaR-shiny aims to fill the gap that currently exists in the availability of online DEA software, and it offers practitioners free access to a wide variety of DEA models without any requirement of prior registration.

deaR-Shiny: A Web App for Data Envelopment Analysis
deaR-shiny is an interactive web application for DEA built with Shiny [64] and shinydashboard [65]. It serves as a frontend for the functions defined in the deaR package [60], which execute the selected DEA model and return the results. Anyone can access the web app deaR-shiny at the URL: https://rbensua.shinyapps.io/deaR/ (accessed on 12 June 2021).
All shiny applications, from the simplest to the most complex, consist of two components: a user interface (UI) object and a server function. In the case of deaR-shiny:

• The UI contains the R code that controls the design and appearance of the application, that is, what the application will look like: sidebar panels, main or body panels, tabs, colors, etc. For example, as Figure 1 shows, the sidebar of deaR-shiny consists of five items: Data, DEA Models, Results, Plots, and About. By default, the application displays the Data body panel, which consists of two main horizontal boxes. The first box is related to the selection of data and has two tabs: Data Import and Data Table. The second box is related to the variable selection: DMUs and inputs/outputs. We will discuss this in more detail in the next section.
• The server function contains the instructions needed to build the application. It defines the relationships between the inputs introduced in the UI and the outputs to be obtained. Thus, this is where we find the R code for loading data, transforming data, running models or creating plots.

The interactivity of shiny apps lets the user interact with the data without manipulating the R code. It is based on reactive programming: any change made by the user in the interface makes the application update itself instantly.
The models that can currently be chosen in deaR-shiny are listed in Table 1. Detailed technical information about the DEA models and built-in datasets available in deaR-shiny can be consulted at: https://rdrr.io/cran/deaR/man/ (accessed on 12 June 2021). Other models have not been implemented yet (for example, bootstrapping [40,56] or the RDM model [75]).

Table 2 shows the average computation time of an input-oriented BCC model under different scenarios. Note that deaR-shiny has no limitation on the number of DMUs or inputs/outputs.

As we said in the introduction, the increasing number of published articles on DEA applications in sustainability gives an idea of the growing interest of practitioners and researchers in this area. The availability of powerful software, such as deaR-shiny, that allows a wide variety of DEA models to run can help advance sustainability performance analysis research. Now, we will explain how deaR-shiny works.
The deaR-shiny interface is structured in three parts, which roughly correspond to the logic for performing an efficiency analysis using DEA (see Figure 2): (1) load data, (2) select and run a DEA model, and (3) show the results (numerically and graphically). Each of these parts is discussed below with two case studies. In both cases, there are missing values in the dataset. However, their treatment is different. In the first case, the DMU with missing values will be omitted from the analysis and a conventional output-oriented DEA model will be applied. In the second case, the missing values will be transformed into triangular fuzzy numbers and, consequently, a fuzzy DEA model will be used. Specifically, we are going to reproduce the main results given in Carlucci, Cirà and Coccorese [76] (case study 1), and Kao and Liu [77,78] (case study 2).

Case Study 1. Carlucci, Cirà and Coccorese (2018)
In this case study, we are going to illustrate how to use deaR-shiny by reproducing some of the results given in Carlucci, Cirà and Coccorese (2018) [76]. These authors analyze the impact of several external factors, such as the size of the airport and the presence of low-cost carriers, on the efficiency and environmental sustainability of regional airports in Italy. They estimate the efficiency of 34 airports by means of an output-oriented DEA model (both under constant and variable returns to scale). To do that, they consider six outputs (APM = Passenger movements (number); CAR = Cargo; AAM = Aircraft movements; AR = Revenues from aeronautical activities; HR = Revenues from handling activities; CR = Revenues from commercial activities) and three inputs (LC = Labor costs; IC = Invested capital; OC = Other expenses).
Data used for the DEA analysis are shown in Table 3 and supplied as Supplementary Material with this article in order to reproduce the results. The dataset can also be downloaded from: https://go.uv.es/dearshiny/case_study1 (accessed on 12 June 2021).

Once on the web app (https://rbensua.shinyapps.io/deaR/ (accessed on 12 June 2021)), the first step is to load the data. As Figure 2 shows, this step is done in two phases:

• Step 1a: Loading data. Firstly, we can upload our own dataset (the app identifies the most common file extensions: txt, csv, tsv, xls, xlsx, sav, dta, xpt, etc.) or load a built-in dataset (deaR-shiny is not only oriented to research but also to teaching, so the application provides 24 built-in datasets from published articles). Once the data file is selected, we have to indicate the type of data we will work with, which depends on the analysis model to be applied. Thus, we will select the Normal Data option if a conventional DEA model is going to be executed, the Malmquist Data option to compute the Malmquist productivity index, or the Fuzzy Data option to run an FDEA model with uncertain data. To reproduce the results given in [76], we need to click on the Upload file tab and then on the button Browse to select the Excel file (case_study1.xlsx). As the option Normal Data is selected by default, we only have to click on the button Load Data to load the dataset. We should see something similar to Figure 3. To take a quick glimpse at the uploaded data, we click on the Data Table tab. In doing so, we can see that Comiso Airport has three missing values.

• Step 1b: Selecting variables. Once the dataset is loaded, deaR-shiny automatically reads it and takes the names in the header as the names of the variables. By default, the application considers that the DMUs are in the first column (if they are not, we can select the correct DMU column from the pull-down menu) and that the rest of the columns are the inputs/outputs. At this point, all the DMUs are selected for evaluation. Alternatively, we can select the DMUs that will constitute the evaluation reference set (DMU ref) and the DMUs to be evaluated (DMU eval). As Comiso Airport has missing values and we do not know how [76] treated them, we will omit this DMU. Therefore, we deselect Comiso from both DMU eval and DMU ref.
In addition, as Figure 4 shows, we must also identify which variables are inputs and which are outputs. This is done in the tab Inputs/Outputs. To measure the efficiency of regional Italian airports, Carlucci, Cirà and Coccorese [76] use three inputs (LC, IC, and OC) and six outputs (APM, CAR, AAM, AR, HR, and CR). To select them, we click on the corresponding check boxes. Note that we could also indicate whether there are uncontrollable, non-discretionary or undesirable inputs/outputs.

Once we have selected the DMUs to be evaluated and have identified the inputs and outputs, the second step (see Figure 2) is to choose the model that we want to execute (see Figure 5). All DEA models available in deaR-shiny are written in R and are called from the deaR package [60]. Carlucci, Cirà and Coccorese [76] run two output-oriented conventional DEA models: the CCR and the BCC DEA models. So, to run the CCR model, we pull down the Select model menu, which is located in the left-hand column, and choose the Basic radial model (see Figure 5). New options appear immediately. From the Orientation menu we select the option output, and from the Returns to scale menu the option Constant. Selecting the option Variable instead would run the BCC DEA model.
At this point, we are ready to run the selected model. To do so, we just have to click on the button: Go! Immediately, a dialog box appears with the following message: "Model run finished! Check the Results Tab!". Note that if deaR-shiny finds some error in the data (for example, negative values or blank cells), the message "Disconnected from server. Reload" will be shown. This message can also be shown if the system detects inactivity.
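Since negative values or blank cells can interrupt the session, it may be worth screening a dataset before uploading it. The following is a minimal sketch in plain Python, independent of deaR-shiny; the function name `find_invalid_cells` and the sample records are hypothetical and are not taken from Table 3.

```python
def find_invalid_cells(rows, numeric_columns):
    """Return (row_index, column, reason) for every blank or negative cell."""
    problems = []
    for i, row in enumerate(rows):
        for col in numeric_columns:
            value = row.get(col)
            # A missing key, None, or an empty string all count as a blank cell.
            if value is None or str(value).strip() == "":
                problems.append((i, col, "blank"))
            elif float(value) < 0:
                problems.append((i, col, "negative"))
    return problems


# Hypothetical records: the values are illustrative only.
rows = [
    {"DMU": "A", "LC": 10.0, "APM": 120.0},
    {"DMU": "B", "LC": "", "APM": 80.0},
    {"DMU": "C", "LC": 7.5, "APM": -3.0},
]
problems = find_invalid_cells(rows, ["LC", "APM"])
for row_index, column, reason in problems:
    print(rows[row_index]["DMU"], column, reason)
```

DMUs flagged in this way can then be deselected in the web app, as was done with Comiso Airport, or corrected in the source file before loading it.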
The main numerical results obtained from the executed model are shown in the Results tab. The graphical results are shown in the Plots tab.
The Results tab is structured in two panels:

• In the left panel, the different types of results, organized in tabs, are shown in a table. In general, the numeric results offered by deaR-shiny refer to efficiency scores, slacks, target values, intensities (lambdas) or multipliers, and the reference set for inefficient DMUs. However, the results shown depend on the model executed. For example, if we run the Malmquist index, the results provided by deaR-shiny will be related to total factor productivity change, technical efficiency change (under CRS), pure technical efficiency change (under VRS), scale efficiency change, and technological change. Three action buttons are also provided in the left panel: Copy, Print and Download (to download the results in csv, pdf or excel format).
• In the right panel are some options for saving the results. Thus, we can select the results to be exported as well as give a name and extension format to the file. deaR-shiny gives the name ResultsDEAyear-month-day_hour:minute:second.xlsx by default. To save the results, we click on the button Create Excel file and then on Download excel file.
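The Malmquist components mentioned above are linked multiplicatively. The sketch below illustrates one common decomposition (in the style of Färe et al.) in plain Python. It is not the deaR implementation: the function names and the numeric scores are hypothetical, and orientation and reciprocal conventions vary between papers and software. Here `d_a_b` denotes the efficiency score of the period-b observation measured against the period-a CRS frontier.

```python
import math


def malmquist_decomposition(d_t_t, d_t_t1, d_t1_t, d_t1_t1):
    """Return (efficiency change, technological change, Malmquist index)."""
    # Technical efficiency change under CRS: own-period score in t+1 over t.
    ec = d_t1_t1 / d_t_t
    # Technological change: geometric mean of the frontier shift measured
    # at the period t+1 observation and at the period t observation.
    tc = math.sqrt((d_t_t1 / d_t1_t1) * (d_t_t / d_t1_t))
    return ec, tc, ec * tc


def split_efficiency_change(ec, vrs_t, vrs_t1):
    """Split CRS efficiency change into pure (VRS) and scale components."""
    ptec = vrs_t1 / vrs_t
    sec = ec / ptec
    return ptec, sec


# Hypothetical scores for a single DMU observed in periods t and t+1.
ec, tc, mi = malmquist_decomposition(d_t_t=0.8, d_t_t1=1.0, d_t1_t=0.72, d_t1_t1=0.9)
ptec, sec = split_efficiency_change(ec, vrs_t=0.9, vrs_t1=0.95)
print(f"EC={ec:.4f} TC={tc:.4f} MI={mi:.4f} PTEC={ptec:.4f} SEC={sec:.4f}")
```

Under this convention, the index is the product of efficiency change and technological change, and efficiency change is in turn the product of pure technical efficiency change and scale efficiency change.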
To get the results of our example, we click on the Results tab. The results are shown in Figure 6. Table 4 shows the efficiency scores obtained by applying the output-oriented DEA model under constant and variable returns to scale. Airports with efficiency scores equal to 1 are efficient, while those with scores greater than 1 are inefficient. Under constant returns to scale, 12 out of 33 airports are inefficient: Aosta, Bologna, Bolzano, Florence/Pisa, Olbia, Palermo, Parma, Pescara, Reggio/Calabria, Salerno, Turin, and Verona/Brescia. Under the assumption of variable returns to scale, only 9 airports are inefficient: Aosta, Bolzano, Palermo, Parma, Pescara, Reggio/Calabria, Salerno, Turin, and Verona/Brescia.

Table 4. Efficiency scores of the 34 Italian regional airports.

The graphical results also depend on the selected model. As we executed a basic output-oriented CCR DEA model (or alternatively an output-oriented BCC DEA model), the plots obtained are: (1) a bar plot of the number of efficient and inefficient DMUs, (2) the distribution of the efficiency scores of the inefficient DMUs, (3) a horizontal bar plot of the number of times that efficient DMUs appear in the peer set of the inefficient DMUs, and (4) the references chart. This last plot, which is a novelty in the exploitation of DEA results, is a network graph in which the green circles represent the efficient DMUs and the red circles the inefficient ones. As is well known, an inefficient DMU has a reference set made up of the efficient DMUs with which it is directly compared. These efficient DMUs can be used as benchmarks for the inefficient one, since they define its projection point on the efficient frontier, that is, the target that would eliminate the inefficiency. Efficient DMUs in a reference set can have different weights or intensities (lambdas). Reference sets obtained from the output-oriented CCR model are shown in Table 5.
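To make the link between reference sets and a network graph concrete, the following sketch in plain Python turns a set of reference sets into a list of weighted edges and into two per-peer summaries: how often each efficient DMU appears as a peer, and the sum of the intensities it receives. This is not the plotting code of deaR-shiny, and the DMU labels and lambdas are hypothetical rather than taken from Table 5.

```python
# Hypothetical reference sets: inefficient DMU -> {efficient peer: lambda}.
reference_sets = {
    "D1": {"E1": 0.6, "E2": 0.4},
    "D2": {"E2": 1.2},
    "D3": {"E1": 0.3, "E3": 0.9},
}

# One directed edge per (inefficient DMU, efficient peer) pair.
edges = [
    (dmu, peer, lam)
    for dmu, peers in reference_sets.items()
    for peer, lam in peers.items()
]

peer_count = {}   # number of reference sets in which each peer appears
node_weight = {}  # sum of the intensities received by each peer
for _, peer, lam in edges:
    peer_count[peer] = peer_count.get(peer, 0) + 1
    node_weight[peer] = node_weight.get(peer, 0.0) + lam

for peer in sorted(node_weight):
    print(peer, peer_count[peer], round(node_weight[peer], 3))
```

The counts correspond to the information summarized in the peer bar plot, while the summed intensities are a natural candidate for sizing the nodes of a references chart.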
Thus, the reference plot (see Figure 7) is a graphical depiction of the reference sets of the inefficient DMUs. In this plot, the inefficient DMUs are represented as red nodes in the inner circle, while the efficient DMUs correspond to the green nodes lying in the outer circle. From each inefficient DMU, there is an arrow joining it with each one of the efficient DMUs in its corresponding reference set. Additionally, the size of each efficient DMU (i.e., each green node) is proportional to the sum of the intensities (lambdas) obtained in the different reference sets to which it belongs.

Case Study 2. Kao and Liu (2000, 2003)
In this second case study, we reproduce the results in Kao and Liu [77,78]. In these articles, the authors calculate the efficiency scores of the 24 university libraries in Taiwan. To measure the relative efficiency of the libraries, Kao and Liu [77,78] consider one input (Patronage) and five outputs (Collections, Personnel, Expenditures, Buildings, and Services). However, there are three libraries that are unable to provide all the data. That is, there are missing values in Expenditures and Services. Kao and Liu treat these missing data as imprecise data (fuzzy numbers) that can be represented by a triangular membership function. More specifically, the smallest possible, most possible, and largest possible values of the missing data are set to the minimum, median, and maximum values of the observed data in the corresponding category. Since Kao and Liu have imprecise data, they use a fuzzy DEA model [68] to calculate the fuzzy efficiency scores.
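The construction of a triangular fuzzy number from the observed data can be sketched as follows. This is an illustrative example in plain Python, not code from Kao and Liu or from deaR; the function names and the sample values are hypothetical.

```python
from statistics import median


def triangular_from_observed(values):
    """Return (smallest, most possible, largest) = (min, median, max)
    of the observed, non-missing values."""
    observed = [v for v in values if v is not None]
    return min(observed), median(observed), max(observed)


def membership(x, tri):
    """Degree of membership of x in the triangular fuzzy number tri."""
    low, mode, high = tri
    if x < low or x > high:
        return 0.0
    if x == mode:
        return 1.0
    if x < mode:
        return (x - low) / (mode - low)
    return (high - x) / (high - mode)


# Hypothetical output column with one missing observation (None).
column = [4.0, None, 10.0, 6.0, 7.0]
tri = triangular_from_observed(column)
print(tri)                    # (4.0, 6.5, 10.0)
print(membership(5.25, tri))  # 0.5
```

Each missing cell in the column would then be replaced by this triplet, while the observed cells remain crisp values.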

The data used by Kao and Liu [77,78] are available in deaR-shiny. Therefore, to load the data we only need to pull down the menu Select built-in dataset and select Kao_Liu_2003. Then we select the option Fuzzy Data because we are working with imprecise data, and finally, we click on the button Load Data. Now, we have to identify which variables are inputs and which are outputs. As mentioned above, Patronage is a crisp (non-fuzzy) input; Collections, Personnel, and Buildings are crisp outputs. Expenditures and Services have some missing observations. So, Kao and Liu [77,78] represent these two outputs by triangular fuzzy numbers, which are described by the triplet Y = (dL, mR, dR). In the Kao_Liu_2003 dataset, dL = beta_3_l and dR = beta_3_u for Expenditures. Similarly, dL = beta_5_l and dR = beta_5_u for Services.
The selection of input/output variables should be similar to the one shown in Figure 8. The relative efficiency scores of the university libraries in Kao and Liu [77,78] are obtained by running an output-oriented fuzzy DEA model under variable returns to scale. Therefore, to replicate these results, we choose Kao-Liu fuzzy model from the DEA models tab and the Basic radial model among the options in the drop-down menu Select crisp DEA model. Finally, we select an output orientation and variable returns to scale for the model, and set the Alpha-Cuts at the value 1 because efficiency scores in [77] are estimated for an alpha-cut at α = 0. If we want to get the results in [78], we will set the Alpha-Cuts at the value 11, since in this article the authors give fuzzy efficiency scores expressed by the α-cuts at eleven α values. Figure 9 shows the different selected options.

To run the model, we click on the button Go! Now, to get the results from the fuzzy DEA model, we need to click on the Results tab. deaR-shiny also allows us to visualize the fuzzy efficiency scores graphically by means of a dumbbell plot (see Figure 10), which shows the lowest and highest efficiency scores for each library.
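An α-cut turns a triangular fuzzy number into an interval, which is why fuzzy efficiency scores are reported as a lower and an upper value per α level. The following sketch in plain Python shows the interval arithmetic only. It assumes that the eleven α values are equally spaced on [0, 1], which is an assumption made here for illustration, and the triplet used is hypothetical.

```python
def alpha_cut(tri, alpha):
    """Interval of values whose membership in tri is at least alpha."""
    low, mode, high = tri
    return (low + alpha * (mode - low), high - alpha * (high - mode))


# Assumption: eleven equally spaced alpha levels, 0, 0.1, ..., 1.
alphas = [k / 10 for k in range(11)]

# Hypothetical triangular number (smallest, most possible, largest).
tri = (4.0, 6.5, 10.0)
cuts = [alpha_cut(tri, a) for a in alphas]

for a, (lower, upper) in zip(alphas, cuts):
    print(f"alpha={a:.1f}  [{lower:.3f}, {upper:.3f}]")
```

At α = 0 the interval is the full support of the fuzzy number, and at α = 1 it collapses to the most possible value. A fuzzy DEA model of the Kao and Liu type solves a pair of crisp problems per α level to obtain the bounds of the efficiency score, which is what the two ends of each dumbbell represent.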

Conclusions
deaR-shiny is an interactive web application used to evaluate efficiency and productivity using Data Envelopment Analysis. deaR-shiny is the result of linking several R packages. On the one hand, it uses the shiny and shinydashboard packages, which are R packages for creating reactive web applications. On the other hand, it calls the deaR package to execute the selected DEA model, inheriting the versatility of this package and the large number of different models it integrates. Both practitioners and researchers will find deaR-shiny a very intuitive and user-friendly web application and, at the same time, a very powerful tool for efficiency and productivity analysis. In this paper, we illustrated how to use the web app in two case studies. In the first case study, we replicated the results given in Carlucci, Cirà and Coccorese [76], who use two traditional output-oriented DEA models: the CCR model (under constant returns to scale) and the BCC model (under variable returns to scale). In the second case study, we reproduced the results obtained by Kao and Liu [77,78], who apply a fuzzy DEA model because they are dealing with imprecise data.
We are currently working to extend the capabilities of deaR-shiny by incorporating more models such as network DEA, models dealing with negative data, stochastic DEA, or FDEA models. Moreover, we are also trying to improve the graphic results of DEA models.