Microgrid Infrastructure Compendium Analysis with a Model Creation Tool and Guideline Based on Machine Learning Techniques

Abstract: A microgrid (MG) is an electric power distribution system that may provide a suitable ecosystem for distributed generation. Detailed information about the infrastructure layer in MG projects is available, so this study aimed to propose a compendium and a model creation guideline for MGs. The aggregated information based on 1618 MGs was summarized into different tables and analyzed based on various parameters. Two MG infrastructure model creation tools were developed. First, a simple guideline was created based on the information in the tables, and then a machine learning tool based on decision trees was proposed that generates more accurate MG models using two main inputs: latitude and the segment in which they operate.


Introduction
Countries that are mostly in accord with the international climate agreement are reducing greenhouse gas emissions and shifting towards a low-carbon, sustainable economy. They are trying to achieve those goals by implementing energy-efficient strategies and producing energy from renewable sources [1]. The capacity to produce electrical power from renewable sources has steadily increased, resulting in a significant increase in renewable energy generation. This expansion has been led by solar photovoltaic (PV) systems, which represent more than half of the total renewable energy generation capacity that is currently in place. This growth in capacity is mainly due to the decrease in the costs of solar PV and wind power systems [2]. Since most renewable energy resources may be decentralized, distributed generation (DG) is an adequate integration environment for these systems. Microgrids (MGs) may emerge as a way to support the operation and control of energy systems, not just for DG but also to address the demand, storage and protection of these systems without completely redesigning the entire electric power distribution system [3]. MGs are also considered to be a power system that can provide clean and renewable energy in rural and isolated areas. In fact, in isolated rural areas, MGs represented 6% of all new electrical connections from 2012 to 2016 [2]. This paper presents a compendium showing many different types of MGs.
First, it is important to define MGs. According to [4], "Microgrids are comprised of Low Voltage (LV) distribution systems with distributed energy resources (DER) together with storage devices and flexible loads. Such a system can be operated in a non-autonomous way if interconnected to the grid or in an autonomous way if disconnected from the main grid. The operation of microsources in the network can provide distinct benefits to the overall system performance if managed and coordinated
With the aim of reviewing a vast number of MGs, this study adapted and followed a systematic literature review guideline from the software engineering research field [23]. These kinds of methodologies and guidelines allow a researcher to comb through large amounts of information in a more organized, rigorous, and consistent way. This approach is typically used in the medical research field. The analysis of a large number of different types of MGs should be done by reviewing both industrial documents and academic manuscripts. The present paper addresses the aggregated information obtained from 1618 MGs [6,17-19] from around the world in order to: (1) study how the infrastructure layer of an MG varies with location to determine if the environment has an impact on an MG project and (2) create guidelines and tools to produce models that synthesize the information from the infrastructure layer of an MG.
This paper is structured as follows. Section 2 presents the study's methodology. First, it explains the guideline used to collect and analyze the data; it notes the quantity of that data and it discusses how the data were clustered. The section ends with the methodology used in the analysis and the creation of the MG model. Section 3 presents an analysis of the MG compendium; the aggregated information is presented in tables and its significance is discussed. Section 4 discusses two main outcomes. The first outcome is the creation of an MG model based on the information presented in the tables in Section 3. The second outcome is the proposal of a machine learning (ML) tool, based on decision trees, that generates an MG model out of the information obtained from 1618 MGs and two main inputs: latitude and the segment in which the MG operates. Section 5 presents a conclusion about the information included in this paper.

Methodology
As part of the systematic review, a template was created to gather the infrastructure information and classify the MGs based on the following criteria: a unique ID for each MG; the name of the project; the continent and country where it was built; whether it works in AC, DC, or both; a Boolean matrix with its generation technologies; the power capacity of those technologies; the total amount of power capacity installed; the control strategy; the segment in which it operates; the year of implementation; the geographical coordinates; whether it is connected to the main grid; and the gross domestic product (GDP) of the country where it was built.
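The template fields listed above can be pictured as one record per MG. The following is a minimal Python sketch, not the authors' actual template: the class name, field names, types, and the example values are assumptions made for illustration; only the list of fields and the technology abbreviations come from the compendium description.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple

# Technology abbreviations used in the compendium.
TECHNOLOGIES = ("PV", "W", "FC", "FF", "M", "CHP", "STh", "H", "BioF", "St")


@dataclass
class MicrogridRecord:
    """One row of a hypothetical data-collection template (names are assumptions)."""
    mg_id: int
    project_name: str
    continent: str
    country: str
    current_type: str                       # "AC", "DC", or "AC/DC"
    technologies: Dict[str, bool]           # one Boolean flag per technology
    capacity_mw: Dict[str, Optional[float]] = field(default_factory=dict)
    control_strategy: Optional[str] = None
    segment: Optional[str] = None
    year: Optional[int] = None
    coordinates: Optional[Tuple[float, float]] = None  # (latitude, longitude)
    grid_connected: Optional[bool] = None
    country_gdp: Optional[float] = None

    def total_capacity_mw(self) -> Optional[float]:
        """Sum of capacities, or None if an installed technology has no known capacity."""
        total = 0.0
        for tech, present in self.technologies.items():
            if not present:
                continue
            value = self.capacity_mw.get(tech)
            if value is None:
                return None
            total += value
        return total

    def is_complete(self) -> bool:
        """True when every installed technology has a known power capacity."""
        return self.total_capacity_mw() is not None


# Invented example: PV capacity is known, storage capacity is not.
example = MicrogridRecord(
    mg_id=1,
    project_name="Example MG",
    continent="Europe",
    country="Spain",
    current_type="AC",
    technologies={t: t in ("PV", "St") for t in TECHNOLOGIES},
    capacity_mw={"PV": 0.5},
    segment="campus",
)
print(example.is_complete())
```

Keeping the technology flags separate from the capacities mirrors the situation described in this section, where an MG may be known to have a technology while its capacity in MW is unknown.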
The systematic review provided a reliable methodology to collect and structure the information from two industrial databases and our own research. Together, these three sources make up the entire compendium. However, it was not always possible to collect the information for every component in an MG. The information from different sources was not always complete or consistent; even some large databases of MGs, such as Navigant or the one compiled by GTM, did not have all of these basic sections completely filled out. In fact, the information about an MG sometimes differed depending on the source; less than 7% of the information may be affected by this issue. When inconsistent or missing data were found, in-depth web research was performed. If nothing was found, the MGs' managers were emailed. If the issue still could not be resolved, the information from the more reliable database was used. Moreover, the power capacity data are only 60% complete. This means that, while it might be known that an MG has a PV system, its power capacity in MW may be unknown. Therefore, of the 1618 MGs that were analyzed, the present study only has complete information for 968 MGs, and this has been contrasted with various sources when possible.
All MGs have a segment in which they are operating. The segment can be conceived as a tag that summarizes the MG's purpose. In the present study, the segments chosen to aggregate the MGs of the compendium are differentiated into seven categories: remote, community, commercial and industrial (C&I), campus, testbeds, military, and utilities. Remote MGs always work in isolation from the main grid. The community segment includes residential and rural areas, villages, private homes and national parks. The C&I segment includes a very wide variety of facilities, such as offices, hotels, mines, farms, data centers, airports, resorts, hospitals, shops, and warehouses, among others. The campus segment includes universities that have added this kind of energy generation and control. The testbed MGs are research facilities and laboratories. The military segment includes all bases in which MGs are used to generate energy. The utilities segment includes MGs that may support the main grid of a large network or a concrete location.
The MG data aggregation was developed to show the differences in the electrical power technologies in an MG based on the location or type of segment. The model creation section is divided into two parts. In the first part, a model creation guideline leads the researcher through the possible readings of the information presented in the different tables in Section 3. In the second part, the ML tool was designed for easy and fast implementation using MATLAB, where a few files with very low memory usage can create models analogous to the real MGs reviewed.

Microgrid Compendium Analysis
In this section, the infrastructure information obtained from 1618 MGs was aggregated into tables, showing the differences in their location, segment and power generation technologies, among other characteristics such as the gross domestic product. The information about each of the MGs presented in the tables is further explained in the section.
In Figure 1, the percentage and number of MGs over the total number of MGs analyzed is divided by segments. The remote segment is the largest one, representing 41% of the worldwide MGs. Even so, remote MGs are likely underreported. For example, while there are around 82 inhabited islands in Mexico with an electrical grid system that can be defined as an isolated MG, the compendium only reports on 11 MGs due to the lack of information. If this is extrapolated to the rest of the world, it may mean that the remote MGs might comprise an even greater percentage of MGs, and, by far, they would be the largest segment. The rest of the segments in decreasing order are: community, C&I, campus, utilities, military, and testbeds. In some cases, the testbeds and campus segments were difficult to disaggregate due to the use of the MGs as both power and heat supply sources for researchers at universities or in laboratories.
The information from 1618 MGs gathered in the compendium has been summarized into 14 different tables. Most of the data are from the infrastructure layer perspective, taking into account the generation technologies used in each MG and the power capacity. The generation technologies gathered in the compendium are: PV panels, wind (W) generators, fuel cell (FC) generators, fossil fuel (FF) generators, microturbines (M), combined heat and power (CHP), solar thermal (STh), hydro (H) generators, biomass and biofuels (BioF) generators, and storage systems (St). The technologies that are connected in each MG will be referred to as a combination (see Table 1). Using the information from the 1618 microgrids, 78 different and unique combinations were differentiated. Table 1 presents 16 of those combinations, covering 75% of the MGs worldwide. As seen in Table 1, the PV and St combination is the one that is most often used; no matter the segment, 12% of the MGs use this combination. This is followed by MGs that only use PV panels (9% of the MGs). The third most often used combination integrates PV, FF, and St technologies (7% of the MGs).
These same 16 combinations are divided by segment and latitude, as seen in Table 2. The composition of each cell can be identified in the table's legend; it follows the same pattern used in Figure 1. For each combination in a specific latitude, the segment with the most MGs is highlighted in bold. Latitudes are divided in increments of 30°, except in the southern hemisphere where there is less information. In that hemisphere, the information was aggregated based on latitudes ranging from 30° S to 90° S. The vast majority of MGs are built north of the equator, more precisely between 30° N and 60° N latitude.
As seen in the legend in Table 2, the number of MGs in each segment is shown based on combination and latitude, representing a worldwide map of MG usage. There are also aggregations of the number of MGs by latitude and by combination in the right and bottom sections of Table 2, respectively. Table 3 presents the percentage of the generation technologies from the total number of MGs in each latitude division. When not disaggregating by latitude, it is worth noting that 56% of the MGs use PV panels, and 42% still use FF to power their generators. For example, an MG in Europe might be located between 45° N and 60° N, where the most used component would be FF. Another reading of the same row is that MGs in that latitude range are most probably composed of one or more generation technologies, such as PV, W, FF, H, and St. Table 4 presents the same percentage; however, instead of comparing it by latitude, it is compared by segment. In this case, in terms of stability and resiliency, it should be highlighted that FF is used in almost 63% and 57% of the military and remote segments, respectively. More than 50% of the MGs in those segments incorporate non-renewable energy generation sources.
Table 2 (bottom row). Subtotal MGs per combination, in column order: 184, 138, 106, 93, 89, 82, 76, 72, 61, 59, 58, 47, 42, 38, 34, 29.
The power capacity of the different generation technologies is also a key component needed to establish an MG model. Table 5 lists the median power capacity of each generation technology, both disaggregated by segment and aggregated. In Tables A1-A8 in the appendices section, each segment is represented, and different percentiles of power capacity for each generation technology are given. The creation of different models is discussed in the following section.
For a wider perspective of the MGs, the information presented in Table 5 is sufficient to establish a model. To add more confidence to the model and identify the power capacity of a larger number of MGs, the tables in the appendices section offer more information. From the different information shown in the tables presented in this section, it can be seen how the geographical position (latitude) of an MG directly influences the infrastructure layer. The weather conditions, particular to each position on the globe, should be considered when trying to model the infrastructure layer and manage the MG. Furthermore, depending on the business model of the MG or the segment in which it operates, all of the internal layers can be affected. It should also be noted that the GDP of a country is directly connected to the number of MGs and the power capacity, as shown in Table 6.
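As a quick arithmetic check on the figures in this section, the subtotal row of Table 2 can be summed to see how many of the 1618 MGs fall within the 16 tabulated combinations. The short Python sketch below uses only the subtotals and the total reported in the text; the function and variable names are assumptions made for illustration.

```python
TOTAL_MGS = 1618

# Subtotal of MGs per combination, as listed in the bottom row of Table 2.
SUBTOTALS = [184, 138, 106, 93, 89, 82, 76, 72, 61, 59, 58, 47, 42, 38, 34, 29]


def share(count, total=TOTAL_MGS):
    """Percentage of the compendium represented by `count` MGs."""
    return 100.0 * count / total


def cumulative_coverage(subtotals, total=TOTAL_MGS):
    """Running percentage of MGs covered as combinations are added in order."""
    covered = 0
    coverage = []
    for count in subtotals:
        covered += count
        coverage.append(share(covered, total))
    return coverage


coverage = cumulative_coverage(SUBTOTALS)
print(f"MGs within the 16 combinations: {sum(SUBTOTALS)}")
print(f"Coverage of the compendium: {coverage[-1]:.1f}%")
```

The total comes to 1208 MGs, which is roughly 75% of the compendium and agrees with the coverage stated for Table 1.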

Microgrid Infrastructure Model Creation
In this section, two different methodologies are proposed. One is related to the aggregated information presented in the tables shown in the previous section and the other is based on ML classification and regression algorithms, which generate MG models based on latitude and segment as the inputs.

Guideline Based on Aggregated Information from the Tables
The information gathered in the analysis of the 1618 MGs can be used to establish different MG models for scientific and simulation purposes. After that analysis, the combinations most commonly used in each segment are identified and the power capacity of their generation technologies is determined, thus creating different example models. The aggregated information from the 16 combinations presented in Table 1 represents MGs from 1978 to 2017. Although a wide range of combinations is used, in every single combination, most of the MGs were built between 2010 and 2016, so the information extracted from these models will primarily correspond to that time period. Figure 2 shows a flowchart depicting different ways to develop the MG model based on the information presented in the tables in Section 3. The process to create the model has two parts. One part is related to the technology used in the MG and the other is related to the power of the technology that is selected. Each part refers to data presented in different tables, depending on the researcher's needs or the information available. Based on the flowchart depicted in Figure 2, this paper proposes two models. One model utilizes the most common MGs in each segment and latitude and the other model addresses rural areas in developing countries.
To create the model based on the available information, three paths may be followed within the first layer. For the model with restrictions on both the segment and the latitude of the MG, the information presented in Table 2 should be used. The examples provided in this paper will follow this path. If the only constraint is the latitude, the information in Table 2 or Table 3 can be used to select the technology of the MG. Lastly, if the only restriction is the segment, the information in Table 2 or Table 4 should be used to select the technology of the model.
To follow the path restricted by both segment and latitude, Table 2 must be used. The latitude ranges are contained in the rows of Table 2, so selecting a range restricts the choice to the information presented in that row. Information about the MG segments is listed within each cell of Table 2 (see the legend in the upper left side of Table 2). Each combination has its own column, except for the last column, which aggregates all the combinations by latitude. On the path with only the latitude restriction, the combination can be selected in Table 2 by adding up all of the MGs in each column (except the last column) within the row of the selected latitude range. The combination with the most MGs is the one most often used in that latitude range. Moreover, Table 3 shows the percentage of each technology used in the MGs of each latitude range. By establishing a representative minimum percentage, the combination can be selected. When the segment is the only restriction, the penultimate row of Table 2 aggregates the information from all the latitudes. According to the legend, the most used combination can be selected. When the segment is the restrictive parameter, the information presented in Table 4 can also be used. In this case, a representative percentage might be selected and, with that, a combination can be created. To select the power of the model, one moves on to the power layer depicted in Figure 2. Table 5 shows the median power capacity of each technology by segment and in aggregate. If the median does not fit the needs of the model creator, the appendices provide tables, by segment or aggregated, with the power capacity divided into percentiles.
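The two-layer reading of the tables described above can be sketched in a few lines of Python. This is only an illustration of the procedure, not the authors' tool: every count and capacity below is a HYPOTHETICAL placeholder standing in for Tables 2 and 5, and the function names, dictionary layout, and latitude labels are assumptions.

```python
from collections import Counter

# HYPOTHETICAL counts standing in for Table 2:
# {latitude range: {segment: {combination: number of MGs}}}
TABLE2 = {
    "30N-60N": {
        "remote": {"FF-H": 22, "W-FF": 20, "PV-FF": 14},
        "campus": {"PV-St": 8, "PV": 5, "FF-H": 1},
    },
    "0-30N": {
        "remote": {"PV-FF-St": 9, "PV-St": 7},
        "campus": {"PV-St": 2},
    },
}

# HYPOTHETICAL median capacities in MW standing in for Table 5:
# {segment: {technology: median MW}}
TABLE5 = {
    "remote": {"PV": 0.1, "W": 0.3, "FF": 0.8, "H": 0.5, "St": 0.2},
    "campus": {"PV": 0.4, "FF": 1.0, "H": 0.3, "St": 0.25},
}


def select_combination(latitude_range=None, segment=None):
    """Technology layer: most frequent combination under the given restrictions.

    Leaving an argument as None removes that restriction, so counts are
    summed over all latitude ranges or all segments, respectively.
    """
    totals = Counter()
    for lat, segments in TABLE2.items():
        if latitude_range is not None and lat != latitude_range:
            continue
        for seg, combos in segments.items():
            if segment is not None and seg != segment:
                continue
            totals.update(combos)
    if not totals:
        raise ValueError("no MGs match the given restrictions")
    return totals.most_common(1)[0][0]


def assign_power(combination, segment):
    """Power layer: median capacity of each technology in the combination."""
    medians = TABLE5[segment]
    return {tech: medians.get(tech) for tech in combination.split("-")}


combo = select_combination("30N-60N", "remote")
model = assign_power(combo, "remote")
print(combo, model)
```

Calling `select_combination` with only `latitude_range` or only `segment` corresponds to the other two paths of the first layer; a percentile table could replace `TABLE5` when the median does not suit the model.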
The United States, central Europe, China, and Japan are primarily located between the latitudes of 30° N and 60° N. According to the information presented in Table 2, most MGs also operate in this latitude range. Since the aim of the first example is to use the most common MG in each segment, the MG model will be created in this latitude range. The methodology used to develop the example models involves two core steps based on the layers depicted in Figure 2. The first step is to decide the generation technologies of the model. Then, with the information presented in Table 5, the power capacity for those generators will be determined. Although the examples are focused on the main combinations used, researchers can adapt the model to their needs following the instructions presented in the previous paragraph.
Starting with the remote segment, which is more extensive than any of the other segments, as seen in Figure 1, the three most commonly used combinations in the latitude range mentioned above are: C13 (FF-H), C7 (W-FF), and C5 (PV-FF). Now, to set the power capacity, the statistical value of the median is used, which is shown in Table 5 Table 7. Nevertheless, in both the testbed and campus segments, there is a draw between the third most often used combinations, so all of the combinations are gathered in that position. Another interesting latitude range for MG projects is between 30° S and 30° N. Rural areas in developing countries in Africa, South America, and Asia are within that latitude range. In these cases, and assuming remote rural areas, two combinations must be highlighted. From 0° to 30° N, the most often used combination, taking into account the median power capacity presented in Table 5 Table A2 in the appendices section.
These two approaches can be used and tailored to address a researcher's interests and purposes. Although the models are generated from the infrastructure layer, with the statistical information gathered, they take into account environmental conditions from the business or the climate condition layers.

MG Model Based on ML Algorithms
The previous model creation guideline covers the MG infrastructure paradigm only generically. To establish more detailed MG models based on the information from the compendium, an ML algorithm was trained. This subsection provides details about the algorithms, the data used, and how the algorithm was trained.
In 1997, Tom Mitchell provided a precise definition of ML: "A computer program is said to learn from experience E with respect to some task T and some performance measure P, if its performance on T, as measured by P, improves with experience E" [64]. The compendium data gathered in this study has already been labeled, so the input and output vectors used in the ML algorithms are known. These kinds of examples are recognized as supervised learning [65]. The desired output matrix of the model has one column with Boolean categories referring to the type of technology used and another column with continuous variables referring to the power capacity output in MW of those technologies. This task addresses a classification algorithm and regression algorithm, respectively. Since the model does not gradually learn online, it can also be defined as a batch learning system. In summary, for the model creation based on ML algorithms, a batch supervised learning system based on classification and regression algorithms was developed.
The classification and regression learner applications of the MATLAB platform were used to train the model. The training data of the algorithms is based on the MGs compendium. The algorithms were required to support both numeric and categorical data, so decision trees, logistic regression, and support vector machines were considered as possible options. Moreover, prediction speed, memory usage, interpretability, and model flexibility were additional criteria used to choose among the algorithms. Because it offered the fastest prediction speed, smallest memory usage, easiest interpretability, and highest flexibility, the decision tree algorithm was chosen, allowing a maximum of 100 splits in each tree. In order to easily share and interpret the ML model, fine decision trees were used for both the classification and regression algorithms. An example of a classification decision tree is shown in Figure 3, where the algorithm decides whether or not an MG model has storage. Classification and regression trees are similar except for the training data used and the output of each type of algorithm, which is Boolean in the first case and numerical in the second.
The purpose of this ML tool is to generate solid models based on the compendium with fast and easy implementation. Consequently, the structure with two layers followed in the model creation guideline discussed in the previous subsection was similarly used to train the algorithms. As seen in Figure 4, the data flows through the different layers and the algorithms until the output is delivered. Each circle represents an algorithm for each type of technology gathered in the compendium. To establish the algorithms from the first layer, the data from latitude, segment, and the Boolean information from each type of technology were used to train each decision tree. Depending on the user's interest, the information in the technology layer can also be used as an input for the algorithm. By doing this, the user selects a preferred combination of generation technologies within a latitude and a segment, so it is not provided by the model. The output of the model is then the power associated with the technologies selected, under the latitude and segment constraints provided. If the user did not select preferred technologies, the output of this layer is whether or not the MG has each technology based on the latitude and the segment. The second layer, regarding the power capacity, was trained with the latitude, the segment, whether or not the MG uses a specific type of technology, and the power output. The output of this layer is the power delivered by each type of technology. A third layer has been added where the CAPEX costs of each generation technology can be added by the user. This provides an estimated cost of implementing those technologies in the MG. If no costs are added by the user, default prices are used, based on the median capacity installed in MGs and according to [66-69].
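To make the first layer more concrete, the sketch below grows a small classification tree that decides whether an MG model has storage from its latitude and segment, in the spirit of the tree in Figure 3. It is a generic CART-style tree with Gini impurity written in plain Python, not the authors' MATLAB model: the training rows are invented, the function names are assumptions, and tree growth is limited by a maximum depth here rather than by the maximum number of splits used in the paper.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of Boolean labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = Counter(labels)
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def candidate_tests(rows):
    """Yield (feature, value) tests: thresholds on latitude, equality on segment."""
    latitudes = sorted({r[0] for r in rows})
    for lo, hi in zip(latitudes, latitudes[1:]):
        yield 0, (lo + hi) / 2.0
    for segment in sorted({r[1] for r in rows}):
        yield 1, segment


def passes(row, feature, value):
    """Numeric test for latitude (feature 0), categorical test for segment (feature 1)."""
    return row[0] <= value if feature == 0 else row[1] == value


def build_tree(rows, labels, max_depth=5):
    """Grow a tree; a leaf is the majority label, a node is (feature, value, left, right)."""
    majority = Counter(labels).most_common(1)[0][0]
    if max_depth == 0 or gini(labels) == 0.0:
        return majority
    best = None
    for feature, value in candidate_tests(rows):
        mask = [passes(r, feature, value) for r in rows]
        left = [l for l, m in zip(labels, mask) if m]
        right = [l for l, m in zip(labels, mask) if not m]
        if not left or not right:
            continue
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
        if best is None or score < best[0]:
            best = (score, feature, value, mask)
    if best is None:
        return majority
    _, feature, value, mask = best
    left_rows = [r for r, m in zip(rows, mask) if m]
    left_labels = [l for l, m in zip(labels, mask) if m]
    right_rows = [r for r, m in zip(rows, mask) if not m]
    right_labels = [l for l, m in zip(labels, mask) if not m]
    return (feature, value,
            build_tree(left_rows, left_labels, max_depth - 1),
            build_tree(right_rows, right_labels, max_depth - 1))


def predict(tree, row):
    """Follow the tests from the root until a leaf label is reached."""
    while isinstance(tree, tuple):
        feature, value, left, right = tree
        tree = left if passes(row, feature, value) else right
    return tree


# Invented training rows: (latitude in degrees, segment) -> has storage?
rows = [(52.0, "remote"), (48.5, "remote"), (40.4, "campus"), (35.7, "campus"),
        (10.2, "remote"), (5.6, "community"), (-12.0, "remote"), (-33.9, "community")]
has_storage = [False, False, True, True, True, True, True, False]

tree = build_tree(rows, has_storage)
print(predict(tree, (50.0, "remote")))
```

In a fuller sketch, one such tree would be trained per technology for the first layer, and a regression counterpart (leaves holding a mean power instead of a majority label) would play the role of the second layer.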
After training, the ML tool works as a black box where the inputs are latitude and segment and the output is a 10 × 4 matrix where the first column is a description of the technology listed in each row. The second column has the Boolean information of whether or not the MG model has a specific type of technology. These technologies could have been selected by the user. The third column shows the power delivered by those technologies in the model. The last column provides the estimated costs associated with integrating those generation technologies into the MG.
The ML algorithms are shared as a MATLAB script, and different MATLAB functions can be downloaded to generate specific models based on the compendium of the 1618 MGs.
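The shape of the 10 × 4 output described in this subsection can be illustrated with a short Python sketch. It is not the authors' MATLAB script: the function name and the input values are invented, and computing the cost as power multiplied by a per-MW CAPEX figure is an assumption about how the third layer could work.

```python
TECHNOLOGIES = ["PV", "W", "FC", "FF", "M", "CHP", "STh", "H", "BioF", "St"]


def build_output_matrix(has_tech, power_mw, capex_per_mw):
    """Return 10 rows of (technology, present, power in MW, estimated cost).

    The cost column is power times a per-MW CAPEX value (an assumption);
    technologies that are absent get zero power and zero cost.
    """
    rows = []
    for tech in TECHNOLOGIES:
        present = bool(has_tech.get(tech, False))
        power = float(power_mw.get(tech, 0.0)) if present else 0.0
        cost = power * float(capex_per_mw.get(tech, 0.0))
        rows.append((tech, present, power, cost))
    return rows


# Invented values for illustration only.
matrix = build_output_matrix(
    has_tech={"PV": True, "St": True},
    power_mw={"PV": 0.5, "St": 0.25},
    capex_per_mw={"PV": 1000000.0, "St": 400000.0},
)
for row in matrix:
    print(row)
```

Each row corresponds to one of the ten technologies gathered in the compendium, so the result always has ten rows and four columns regardless of which technologies are present.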

Conclusions
In the present study, information about the power generation technologies of 1618 MGs has been analyzed and condensed into an ML model, in contrast to previous studies that dealt with fewer MGs and offered no search tool. By aggregating and representing all the collected data, the authors aimed to establish a guiding principle for researchers to perceive how real MGs are being deployed around the world.
With the aggregated information of the compendium, two MG infrastructure model generators were developed: a flowchart guideline and an ML tool, shared as a MATLAB script, with classification and regression learners. The ML model is similar to a search tool: it generates MG models using the latitude and the segment as inputs and provides the generation technologies and power capacity based on the compendium data. Those generation technologies can also be provided by the user, in which case the algorithm provides the power associated with each technology.
While both models can be adapted to the researcher's needs, there is a difference in the outcome. The model creation guideline produces more generic MG infrastructure models than the ML algorithms due to the resolution in the latitude presented in the tables and the correlations made among the data by the algorithm. In fact, the latitude precision of the ML tool is close to 2°, instead of the 15-30° of the guideline.
It has been found that there is a lack of information related to the costs associated with MGs or their generation technologies. Training the ML algorithm with these data would provide very valuable insights into the state of the art in the field and enhance the contribution of the proposed model. Therefore, future studies on this topic are suggested.
Acknowledgments: The authors also thank Marta Peña Martinez and the Process and Specialized Information Department of the University of Carlos III library for their support in adapting the systematic review methodology. We would also like to thank Blanca Barrios Sanchez and Jesus Lopez Merino for their support during the data gathering process.

Conflicts of Interest:
The authors declare no conflict of interest.