Micro, Small, and Medium Enterprises’ Business Vulnerability Cluster in Indonesia: An Analysis Using Optimized Fuzzy Geodemographic Clustering

The COVID-19 pandemic has affected many sectors, including businesses and enterprises. The businesses most vulnerable to COVID-19 are micro, small, and medium enterprises (MSMEs). Therefore, this paper aims to analyze the business vulnerability of MSMEs in Indonesia using a fuzzy spatial clustering approach, which has previously been applied to analyze social vulnerability to natural hazards in Indonesia. Specifically, this study proposes the Flower Pollination Algorithm (FPA) to optimize Fuzzy Geographically Weighted Clustering (FGWC) in order to cluster business vulnerability in Indonesia. We performed the analysis with data from Indonesia's National Socioeconomic Survey (SUSENAS) and National Labor Force Survey (SAKERNAS). We first compared the performance of FPA-optimized FGWC (FPAFGWC) with traditional FGWC and with FGWC optimized by several known algorithms, namely Artificial Bee Colony, Intelligent Firefly Algorithm, Particle Swarm Optimization, and Gravitational Search Algorithm. Our results showed that FPAFGWC performed best in optimizing the FGWC clustering results in the business vulnerability context. We found that businesses are vulnerable in almost all regions of Indonesia outside Java Island. Meanwhile, in most of Java Island, particularly the JABODETABEK area that forms the national economic backbone, businesses are not vulnerable. Based on these results, we provide recommendations to address the gap in the number of MSMEs in Indonesia.


Introduction
The COVID-19 pandemic has caused a contraction in the global economy as a result of countries' efforts to curb the pandemic, such as lockdown policies and large-scale social restrictions. Indonesia was no exception: its economy contracted by 2.07% (YoY) [1][2][3][4][5][6][7]. The manufacturing sector and service operations are the main driving forces of economic growth [8][9][10][11].
In 2019, the manufacturing sector contributed 19.70 percent of the Indonesian economy. This was more than the trade and agricultural sectors, which contributed 13.01 percent and 12.72 percent, respectively. Apart from contributing value added, the manufacturing industry also provides jobs. In

Materials and Methods
This research uses machine learning methods [49,50]. The complexity of the dataset's characteristics requires careful selection of statistical methods that can handle class imbalance and outliers in the classification process [51][52][53]. We checked the data distribution assumptions, minimized errors to obtain good accuracy, and presented the assumptions in a dashboard. To answer the objectives of this study, we used data from Statistics Indonesia, namely the National Socioeconomic Survey and the National Labor Force Survey. At the stage of establishing the MSME business vulnerability clusters (BVC), this study used locations in the disadvantaged, frontier, and outermost regions of Indonesia (Table 1, SUSENAS question blocks). These regions are defined by Presidential Regulation (PERPRES) Number 63 of 2020 on the determination of frontier, outermost, and least developed regions for 2020–2024. They are commonly referred to as the "3T" (terdepan, terluar, tertinggal) regions, and there are 62 of them.

The National Socioeconomic Survey (SUSENAS)
SUSENAS is a source of socioeconomic data on households in Indonesia [54,55]. In general, SUSENAS collects data on community welfare that reflect the socioeconomic conditions of the community. Specifically, it aims to provide basic data on community welfare at the district/city level and detailed data on housing and health at the provincial level. Detailed data on household consumption expenditure, in both rupiah value and quantity, are compiled as a basis for estimating consumption patterns, population size, nutritional adequacy, distribution of expenditures, and poverty at the national level. The question blocks used in this research are presented in Table 1.

The National Labour Force Survey (SAKERNAS)
SAKERNAS is a dedicated survey for collecting employment data. It has three main objectives, namely to determine: (1) employment opportunities and their relation to education, number of hours worked, type of work, and employment status; (2) unemployment and underemployment; and (3) the population outside the labor force, namely those who attend school, take care of the household, or carry out other activities. Table 2 describes the question blocks used in this study.
Is (NAME) currently attending any training/courses (not necessarily certified)?
The main job
What was the main business field/line of work of the place where (NAME) worked during the past week?
What is the type of occupation/position of (NAME)'s main job during the past week?
How long has (NAME) been looking for a job/preparing a business for the main job?
Is there a certain party (individual/business/company) that regulates/coordinates (NAME)'s business/work?
How many paid workers/employees are there?
Did (NAME) use digital technology in their main job during the past week?
Did (NAME) use the internet in their main job during the past week?
Is the internet used for:
Were the goods/services produced by the work during the past week mainly for own use?
Number of working days, income, and wages/salary.
What is the type of agency/institution of (NAME)'s workplace/business?
Is the main location of (NAME)'s workplace/business at home?

Work experience
Has (NAME) ever had a previous occupation/main business?
Has (NAME) stopped working at the main job/business in the past year?
What was the main reason (NAME) stopped working at the main job/business during the past year? Layoffs (1); Business closed/went bankrupt (2); Unsatisfactory income (3); Not suited to the work environment (4); Work period/contract ended (5); Not in line with acquired skills (6); Pregnancy/childbirth (7); Taking care of the household (8); Cannot be classified into codes 1–8.
What was the status/position of (NAME) before leaving the last main job/business? Doing business alone (1); Doing business assisted by temporary workers (2)

Nature-Inspired Spatial Clustering: The Naspaclust Package
Naspaclust is an abbreviation of nature-inspired spatial clustering. It is an R package that optimizes the spatial clustering results of Fuzzy Geographically Weighted Clustering (FGWC) using nature-inspired metaheuristic algorithms [48]. The package provides two types of algorithms: classical and optimized. The classical algorithm implements the FGWC developed by Mason and Jacobson in 2007 [56]. The optimized type comprises seven optimization algorithms built on many previous studies (see [48] for details). This study limits the optimization algorithms to five: the artificial bee colony (ABC) [14], flower pollination algorithm (FPA), gravitational search algorithm (GSA) [36], intelligent firefly algorithm (IFA) [19], and particle swarm optimization (PSO) [57][58][59][60]. These algorithms mainly use the centroid-based approach developed by Runkler and Katz [61], which was later implemented in FGWC [19,30,31,62]. Under this approach, the cluster centroids become the controllable parameters that the metaheuristics optimize.
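The centroid-based approach can be illustrated for plain fuzzy c-means. Once the centroids are fixed, the optimal memberships follow in closed form, so the clustering objective can be written as a function of the centroids alone. A metaheuristic then only has to search over centroid positions. The Python sketch below is ours and is not code from naspaclust; all function names are hypothetical. In FGWC, the memberships would also pass through the spatial adjustment before the objective is evaluated, and that step is omitted here.

```python
def sq_dist(x, v):
    """Squared Euclidean distance between a data vector and a centroid."""
    return sum((a - b) ** 2 for a, b in zip(x, v))


def fcm_memberships(X, V, m=2.0, eps=1e-12):
    """Closed-form FCM memberships U[k][i] for fixed centroids V (rows sum to 1)."""
    U = []
    for x in X:
        # eps guards against a point sitting exactly on a centroid
        w = [max(sq_dist(x, v), eps) ** (1.0 / (1.0 - m)) for v in V]
        total = sum(w)
        U.append([wi / total for wi in w])
    return U


def objective(X, V, U, m=2.0):
    """Fuzzy clustering objective J = sum_k sum_i u_ik^m * ||x_k - v_i||^2."""
    return sum(U[k][i] ** m * sq_dist(X[k], V[i])
               for k in range(len(X)) for i in range(len(V)))


def reduced_objective(X, V, m=2.0, eps=1e-12):
    """The same objective expressed through the centroids only (memberships eliminated)."""
    total = 0.0
    for x in X:
        s = sum(max(sq_dist(x, v), eps) ** (1.0 / (1.0 - m)) for v in V)
        total += s ** (1.0 - m)
    return total


X = [[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]]
good = [[0.0, 0.5], [10.0, 10.5]]
bad = [[5.0, 5.0], [6.0, 6.0]]
print(reduced_objective(X, good), objective(X, good, fcm_memberships(X, good)))
print(reduced_objective(X, bad))
```

The two printed values on the first line agree. Well-placed centroids give a much smaller objective than poorly placed ones, which is the signal a metaheuristic follows.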

Data and Algorithms
This study mainly uses data from two sources, SUSENAS and SAKERNAS. We selected respondents who were older than 15 years and had an enterprise with paid employees. We then aggregated the data to the district level. After this filtering, only 503 of the 514 districts met the criteria. All of the algorithms mentioned above have previously been applied to FGWC, except the FPA. Thus, this study also proposes the FPA to optimize the clustering results of business vulnerability in Indonesia. The FPA is a metaheuristic algorithm inspired by the pollination of flowers. It has previously been applied in various studies, including fuzzy c-means (FCM) clustering, where it was shown to improve the clustering results [40].
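The filtering and aggregation step can be illustrated in plain Python. The field names (`district`, `age`, `paid_employees`, `uses_internet`) and the records are hypothetical. The sketch computes, for each district, the percentage of eligible respondents with a given 0/1 characteristic. It ignores survey weights, which the text does not mention. Districts without any eligible respondent simply drop out, which is one way the count can fall from 514 to 503.

```python
from collections import defaultdict


def district_percentages(records, indicator):
    """Percentage of eligible respondents per district for a 0/1 indicator field."""
    tally = defaultdict(lambda: [0, 0])  # district -> [eligible count, indicator count]
    for r in records:
        # Filtering rule from the text: older than 15 and having paid employees.
        if r["age"] > 15 and r["paid_employees"] > 0:
            t = tally[r["district"]]
            t[0] += 1
            t[1] += r[indicator]
    return {d: 100.0 * hits / n for d, (n, hits) in tally.items()}


# Hypothetical records for illustration only.
records = [
    {"district": "A", "age": 34, "paid_employees": 2, "uses_internet": 1},
    {"district": "A", "age": 51, "paid_employees": 1, "uses_internet": 0},
    {"district": "A", "age": 14, "paid_employees": 3, "uses_internet": 1},  # too young
    {"district": "B", "age": 40, "paid_employees": 0, "uses_internet": 1},  # no paid employees
]
print(district_percentages(records, "uses_internet"))
```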

Flower Pollination Algorithm
The FPA is a metaheuristic algorithm inspired by the pollination of flowering plants [37]. It mimics the "survival of the fittest" among plants by seeking the parameters that yield optimal reproduction, which here corresponds to the best objective function value [47]. In this study, the objective function is the fuzzy clustering objective under the centroid approach, and the parameters are the cluster centroids [19,30,61]. There are two types of pollination: cross-pollination (global pollination) and self-pollination (local pollination). Self-pollination is fertilization with pollen from the same plant, whether from the same flower or a different flower of that plant. It usually relies on wind or water and therefore does not travel far [46]. Cross-pollination, in contrast, uses pollen from a different plant. The pollen is usually carried by pollinators that can fly long distances, such as insects [44]. Mathematically, these long flights are modeled as Lévy flights, whose step lengths follow a Lévy distribution [37]. Whether cross- or self-pollination occurs is governed by a switch probability between 0 and 1 [43]. The pseudocode of the FPA implementation in FGWC is given in Appendix A.
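The two pollination modes can be sketched as a generic Python minimizer that follows the update rules commonly given for Yang's FPA [37]. It is not the naspaclust implementation. With probability p, a flower takes a Lévy-flight step relative to the current best (global pollination). Otherwise, it moves by a random fraction of the difference between two other flowers (local pollination). A candidate replaces the old flower only if it is better. Lévy steps are drawn with Mantegna's algorithm. The population size of 15 and p = 0.7 follow the settings reported in this paper. The values `gamma=0.1` and `lam=1.5`, the iteration count, and all function names are illustrative assumptions. In FGWC, `f` would be the centroid-based objective, and each flower would be a flattened matrix of centroids.

```python
import math
import random


def levy_step(dim, lam, rng):
    """Draw a Lévy-distributed step vector using Mantegna's algorithm."""
    sigma = (math.gamma(1 + lam) * math.sin(math.pi * lam / 2)
             / (math.gamma((1 + lam) / 2) * lam * 2 ** ((lam - 1) / 2))) ** (1 / lam)
    steps = []
    for _ in range(dim):
        u = rng.gauss(0.0, sigma)
        v = max(abs(rng.gauss(0.0, 1.0)), 1e-12)  # avoid division by zero
        steps.append(u / v ** (1 / lam))
    return steps


def fpa(f, dim, lower, upper, n=15, p=0.7, gamma=0.1, lam=1.5, max_iter=200, seed=0):
    """Minimize f over the box [lower, upper]^dim with a basic flower pollination algorithm."""
    rng = random.Random(seed)

    def clip(x):
        return [min(max(xi, lower), upper) for xi in x]

    flowers = [[rng.uniform(lower, upper) for _ in range(dim)] for _ in range(n)]
    fitness = [f(x) for x in flowers]
    b = min(range(n), key=fitness.__getitem__)
    best, best_f = flowers[b][:], fitness[b]

    for _ in range(max_iter):
        for i in range(n):
            x = flowers[i]
            if rng.random() < p:
                # Global (cross-)pollination: Lévy-flight step relative to the current best.
                step = levy_step(dim, lam, rng)
                cand = [xi + gamma * s * (g - xi) for xi, s, g in zip(x, step, best)]
            else:
                # Local (self-)pollination: random fraction of the difference of two flowers.
                j, k = rng.sample(range(n), 2)
                eps = rng.random()
                cand = [xi + eps * (a - c) for xi, a, c in zip(x, flowers[j], flowers[k])]
            cand = clip(cand)
            fc = f(cand)
            if fc < fitness[i]:  # greedy replacement
                flowers[i], fitness[i] = cand, fc
                if fc < best_f:
                    best, best_f = cand[:], fc
    return best, best_f


best, best_f = fpa(lambda x: sum(v * v for v in x), dim=2, lower=-5.0, upper=5.0)
print(best, best_f)
```

The sphere function in the usage line is only a stand-in objective for checking that the search converges.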

Research Workflow
This study started by running every combination of method and number of clusters. We then computed the objective function and the validation indices for each combination. We compared clustering performance on the validation indices as well as on the objective function, and assessed which algorithm performed best. The Evaluation Method subsection briefly explains these measures.
After selecting the best algorithm, we determined the optimum number of clusters by comparing the objective function and the validation indices. Besides taking the minimum or maximum value of each index [63], we also used the elbow method, which is widely used to find the optimum number of clusters for a given algorithm [19]. The selected number of clusters was then used to analyze business vulnerability in Indonesia.
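The text reads the elbow from plots. One common way to make that reading reproducible is sketched below; this is our choice and not necessarily the authors' procedure. Both axes are normalized to [0, 1], and the method picks the number of clusters whose point lies farthest from the straight line joining the first and last points. The function name and the example values are made up, and `ks` is assumed to be sorted in ascending order.

```python
import math


def elbow_point(ks, values):
    """Return the k whose normalized point lies farthest from the chord joining the end points."""
    k_lo, k_hi = min(ks), max(ks)
    v_lo, v_hi = min(values), max(values)
    if k_hi == k_lo or v_hi == v_lo:
        return ks[0]  # flat or single-point curve: no elbow to detect
    xs = [(k - k_lo) / (k_hi - k_lo) for k in ks]
    ys = [(v - v_lo) / (v_hi - v_lo) for v in values]
    x1, y1, x2, y2 = xs[0], ys[0], xs[-1], ys[-1]
    norm = math.hypot(x2 - x1, y2 - y1)
    best_k, best_d = ks[0], -1.0
    for k, x, y in zip(ks, xs, ys):
        # Perpendicular distance from (x, y) to the line through the two end points.
        d = abs((y2 - y1) * x - (x2 - x1) * y + x2 * y1 - y2 * x1) / norm
        if d > best_d:
            best_k, best_d = k, d
    return best_k


ks = [2, 3, 4, 5, 6]
print(elbow_point(ks, [100, 30, 20, 15, 12]))   # a decreasing curve, like an objective function
print(elbow_point(ks, [1, 5, 6, 6.5, 6.8]))     # an increasing curve, like an index to maximize
```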

Evaluation Method
The objective function alone is not enough to evaluate the performance of fuzzy clustering. Thus, the algorithms were also compared using fuzzy clustering validation indices. Since metaheuristic optimization is stochastic, this study performed 50 simulation runs with different initializations for each combination of algorithm and number of clusters. Fifty runs can be considered a large sample under the common statistical rule of thumb that a sample of at least 30 is large [64]. By comparison, Mehdizadeh [65] used only 10 simulation runs in a study of fuzzy PSO. We then averaged each evaluation measure across runs and compared the averages between algorithms. We also performed the nonparametric Kruskal-Wallis test to assess whether performance differed among the algorithms [64]. The validation indices are briefly explained as follows.
(1) Partition coefficient (PC). The PC reflects the overlap between the fuzzy subsets and depends only on the membership coefficients, so it does not take the data or the centroids into account. A higher PC indicates a crisper partition. The PC is calculated as [20]
$$PC = \frac{1}{N}\sum_{k=1}^{N}\sum_{i=1}^{c} u_{ik}^{2},$$
where u_ik is the membership of region k in cluster i, N is the number of regions, and c is the number of clusters.
(2) Classification entropy (CE). The CE represents the fuzziness of the partition and is calculated as [20]
$$CE = -\frac{1}{N}\sum_{k=1}^{N}\sum_{i=1}^{c} u_{ik}\log_{a}(u_{ik}).$$
Its value always ranges from 0 to log_a(c), where a is the base of the logarithm. A lower CE indicates a less fuzzy, and hence better, partition.
(3) Separation index (S). The S index is the ratio of the objective function value to the minimum cluster separation, where cluster separation is measured by the distances between centroids [66]. A minimum S index indicates an optimal cluster partition.
(4) Xie and Beni index (XB). Like the S index, the XB index captures both the magnitude of variation within clusters and the clarity of separation between them [66]. A lower XB value indicates a better partition.
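(5) IFV index. The IFV index is often used to validate spatial clustering because of its robustness and stability [67]. A higher IFV value reflects better spatial cluster separation. The IFV index is measured using the equation
$$IFV = \frac{1}{c}\sum_{i=1}^{c}\left\{\frac{1}{N}\sum_{k=1}^{N}u_{ik}^{2}\left[\log_{2}c-\frac{1}{N}\sum_{l=1}^{N}\log_{2}u_{il}\right]^{2}\right\}\times\frac{SD_{\max}}{\bar{\sigma}_{D}},$$
where u_ik is the membership of region k in cluster i, N and c are the numbers of regions and clusters, $SD_{\max}=\max_{i\neq j}\lVert v_i - v_j\rVert^{2}$ is the maximum squared distance between the centroids v_i and v_j, and $\bar{\sigma}_{D}=\frac{1}{c}\sum_{i=1}^{c}\frac{1}{N}\sum_{k=1}^{N}\lVert x_k - v_i\rVert^{2}$ is the mean squared distance between the data vectors x_k and the centroids.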
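The Python sketch below implements PC, CE, XB, and IFV from a membership matrix `U[k][i]` (region k, cluster i), data vectors `X`, and centroids `V`, so the indices can be checked on small examples. These are the standard forms. XB follows Xie and Beni with a general fuzzifier m, and IFV follows the form commonly reported in FGWC studies. Any differences from the exact variants in naspaclust are possible and not verified here. The S index is not coded separately because, as described above, it has the same ratio form as XB. All function names are hypothetical.

```python
import math


def _sq(a, b):
    """Squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def partition_coefficient(U):
    """PC = (1/N) * sum_k sum_i u_ik^2; higher values mean a crisper partition."""
    return sum(u * u for row in U for u in row) / len(U)


def classification_entropy(U, base=math.e):
    """CE = -(1/N) * sum_k sum_i u_ik * log_a(u_ik), taking 0 * log 0 = 0; lower is better."""
    return -sum(u * math.log(u, base) for row in U for u in row if u > 0) / len(U)


def xie_beni(X, V, U, m=2.0):
    """Membership-weighted compactness over N times the minimum squared centroid separation."""
    c = len(V)
    compact = sum(U[k][i] ** m * _sq(X[k], V[i]) for k in range(len(X)) for i in range(c))
    min_sep = min(_sq(V[i], V[j]) for i in range(c) for j in range(i + 1, c))
    return compact / (len(X) * min_sep)


def ifv(X, V, U, eps=1e-12):
    """IFV in the form commonly reported for FGWC studies; higher is better."""
    N, c = len(X), len(V)
    total = 0.0
    for i in range(c):
        mean_u2 = sum(U[k][i] ** 2 for k in range(N)) / N
        mean_log = sum(math.log2(max(U[k][i], eps)) for k in range(N)) / N  # eps avoids log(0)
        total += mean_u2 * (math.log2(c) - mean_log) ** 2
    sd_max = max(_sq(V[i], V[j]) for i in range(c) for j in range(i + 1, c))
    sigma_d = sum(_sq(X[k], V[i]) for i in range(c) for k in range(N)) / (c * N)
    return (total / c) * sd_max / sigma_d


X = [[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]]
V = [[0.0, 0.5], [10.0, 10.5]]
U_crisp = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]
U_uniform = [[0.5, 0.5]] * 4
print(partition_coefficient(U_crisp), classification_entropy(U_crisp))
print(partition_coefficient(U_uniform), classification_entropy(U_uniform))
print(xie_beni(X, V, U_crisp), ifv(X, V, U_crisp))
```

A crisp partition reaches the best possible PC and CE values. A uniform partition gives PC = 1/c and CE = ln(c) with natural logarithms.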

Parameter Setups
This study involves two kinds of parameters: those of FGWC and those of the optimization algorithms. For FGWC, we set the fuzzifier m = 2. Following the recommendation of Mason and Jacobson [56], we increased the spatial effect on the membership configuration to 70%, so that α = 0.3. For the spatial weights, we gave interregional population and distance equal influence, as in previous studies [30,56], so that a = b = 1. Finally, we set the error tolerance to ε = 1 × 10^−6. This is lower than the tolerance used by Nasution et al. [19] and was chosen to obtain a more precise solution.
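The spatial step of FGWC with the settings above (α = 0.3, a = b = 1) can be sketched in Python as follows. The weight w_ij = (p_i p_j)^b / d_ij^a follows the usual description of Mason and Jacobson's FGWC, as does the blend of α times the region's own membership with (1 − α) times a spatial term. Two further choices are our assumptions: the spatial term is normalized by the row sum of the weights, and all other regions are treated as neighbors. The exact choices in naspaclust may differ. The populations, coordinates, memberships, and function names are made up.

```python
import math


def spatial_weights(pop, coords, a=1.0, b=1.0):
    """w_ij = (p_i * p_j)^b / d_ij^a for i != j, zero on the diagonal."""
    n = len(pop)
    W = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                d = math.dist(coords[i], coords[j])
                W[i][j] = (pop[i] * pop[j]) ** b / d ** a
    return W


def spatially_adjust(U, W, alpha=0.3):
    """Blend each region's memberships with a weighted average of the other regions' memberships."""
    beta = 1.0 - alpha
    adjusted = []
    for i, row in enumerate(U):
        A = sum(W[i])  # normalizing factor; keeps each adjusted row summing to 1
        neighbour = [sum(W[i][j] * U[j][k] for j in range(len(U))) / A for k in range(len(row))]
        adjusted.append([alpha * u + beta * s for u, s in zip(row, neighbour)])
    return adjusted


pop = [100, 200, 100]
coords = [(0, 0), (1, 0), (5, 0)]
U = [[0.9, 0.1], [0.2, 0.8], [0.5, 0.5]]
W = spatial_weights(pop, coords)
adj = spatially_adjust(U, W, alpha=0.3)
print(adj)
```

With α = 0.3, region 0 keeps only 30% of its own membership profile. The rest is pulled toward the large, nearby region 1, which carries most of the weight.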
For the optimization algorithms, we set the population size to 15 candidate solutions. We also added an extra termination criterion: each algorithm stopped when the global best solution remained unchanged for 15 iterations. For PSO, we updated the inertia weight using simulated annealing, because this scheme produced optimal solutions in a previous study [68]. The remaining parameter settings are listed in Table 3.
Table 3. Optimization algorithm parameter setups.

In ABC, n_onlooker is the number of onlooker bees, and limit is the number of trials before a food source is abandoned. In FPA, γ and λ are the Lévy step-size factor and shift, while p is the switch probability. In GSA, G is the initial gravitational constant and v_max is the maximum velocity of the agents. In IFA, γ and β are the scaling factors for distance and attractiveness, while α_k is the randomization constant. PSO has more parameters than the other algorithms: v_max is the maximum velocity of the particles, c1 and c2 are the cognitive and social scaling parameters, and w_min and w_max are the minimum and maximum inertia weights, respectively.
Table 4 summarizes the clustering performance by algorithm and number of clusters, with bold values marking the best performance for each number of clusters. The table reports the averages over the simulation runs for the objective function and the PC, CE, S, XB, and IFV indices. In terms of the objective function, PC, CE, and IFV, the FPA performs best among all algorithms. In contrast, the classical algorithm performs best for most numbers of clusters in terms of the XB index. Meanwhile, the remaining index, S, shows different best algorithms depending on the number of clusters. Overall, the best performance was mainly obtained by the FPA, although the ABC and GSA performed best for some numbers of clusters. In terms of computational cost, traditional FGWC was the cheapest of all the algorithms. It needed the fewest iterations (fewer than 20) but also performed the poorest. GSA and IFA showed the same pattern: they needed few iterations but performed worse than the FPA.
This suggests that, for business vulnerability, the algorithms other than the FPA tended to converge prematurely to local optima. The FPA, by contrast, reached better solutions at the cost of more iterations. Figure 1 supports this comparison. We applied the Kruskal-Wallis test to check whether the evaluation results differed significantly among methods, and displayed the test statistics in a tile plot for easier interpretation. To make the comparison fairer, we only compared the optimized algorithms. As Figure 1 shows, the chi-square statistic in the tile plot is high for all combinations of number of clusters and evaluation method. In other words, the evaluation metrics differ among the optimization methods. Combining this result with the previous findings, we conclude that the FPA is suitable for optimizing the clustering results of business vulnerability.
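The chi-square value reported for the Kruskal-Wallis test is the H statistic. H is compared with a chi-square distribution with (number of groups − 1) degrees of freedom. The minimal Python sketch below computes H with average ranks for ties and the usual tie correction. We assume that, for one tile, the groups are the per-run results of each optimized algorithm for a given metric and number of clusters. `scipy.stats.kruskal` returns the same statistic together with a p-value. The function name here is hypothetical.

```python
def kruskal_wallis_h(*groups):
    """Kruskal-Wallis H statistic with average ranks for ties and the usual tie correction."""
    pooled = sorted((value, g) for g, grp in enumerate(groups) for value in grp)
    n_total = len(pooled)
    ranks = [0.0] * n_total
    tie_sum = 0  # sum of (t^3 - t) over blocks of tied values
    i = 0
    while i < n_total:
        j = i
        while j + 1 < n_total and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank of positions i..j
        for t in range(i, j + 1):
            ranks[t] = avg_rank
        size = j - i + 1
        tie_sum += size ** 3 - size
        i = j + 1
    rank_sums = [0.0] * len(groups)
    for r, (_, g) in zip(ranks, pooled):
        rank_sums[g] += r
    h = 12.0 / (n_total * (n_total + 1)) * sum(
        R * R / len(grp) for R, grp in zip(rank_sums, groups)
    ) - 3 * (n_total + 1)
    return h / (1 - tie_sum / (n_total ** 3 - n_total))


# Three fully separated groups (no ties) and two groups with ties.
print(kruskal_wallis_h([1, 2, 3], [4, 5, 6], [7, 8, 9]))
print(kruskal_wallis_h([1, 1, 2], [2, 3, 3]))
```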

Clustering Results
The previous subsection showed that the FPA is the best method for this study. This subsection presents the clustering results using FPAFGWC, starting with the selection of the optimum number of clusters and then interpreting the clusters. Figure 2 shows the FPAFGWC results for the objective function and the validation indices, visualized to make the choice of the optimal number of clusters more straightforward. The objective function, PC index, and XB index (Figure 2a,b,e) follow the same pattern: their values decrease as the number of clusters increases. The CE and IFV indices (Figure 2c,f) show the opposite pattern. Surprisingly, the S index (Figure 2d) fluctuates across the number of clusters.

Based on the figures, the objective function, XB, and IFV indices can serve as the basis for the elbow method. The PC and CE indices cannot, because across the number of clusters they showed the inverse of the relationship they should have shown. For the objective function and XB index, the elbow forms at four clusters, after which the values decrease only slightly. For the IFV index, the elbow forms directly at three clusters, with similar values thereafter. Based on their best values, the PC and CE indices indicate two clusters as optimal. The S index increases from two to three clusters and decreases at four and five clusters. From this analysis, we found that two and three were the optimal numbers of clusters for the business vulnerability analysis. Because this study aims to identify which regions are vulnerable, we chose two clusters for the analysis. Figure 3 shows the business vulnerability profile, and Table 5 displays the cluster means from FPAFGWC with two clusters. The variable names in Table 5 are detailed in Appendix B. Cluster 1 covers almost the whole of Indonesia. Cluster 2, by contrast, is concentrated on Java Island, including the JABODETABEK (Jakarta, Bogor, Depok, Tangerang, and Bekasi) area, together with some districts in Kalimantan and one district in Sumatera. The cluster means show that some variables differ only slightly between clusters. For example, the share of money coming from working household members is roughly the same in both clusters. Otherwise, the means in Cluster 1 tend to be smaller than those in Cluster 2, except for training, micro enterprises, and household work organization.
Thus, we conclude that Cluster 1 is the vulnerable cluster and Cluster 2 is the nonvulnerable cluster. The lower vulnerability of Cluster 2 is associated with its lower percentage of trained people, lower percentage of micro enterprises, and lower percentage of household-organized work. This is likely because Java is the most developed area in Indonesia and has many industrial areas.
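A cluster map and a table of cluster means from a fuzzy result can be produced as in the Python sketch below. We assume that each district is assigned to the cluster with its largest membership, which is how hard cluster maps are usually drawn from fuzzy results. The text does not state whether Table 5 uses hard-assignment means or membership-weighted means. The memberships, variable values, and function names are made up, and cluster indices are 0-based, so index 0 corresponds to Cluster 1.

```python
def hard_labels(U):
    """Index of the cluster with the largest membership for each region (0-based)."""
    return [max(range(len(row)), key=row.__getitem__) for row in U]


def cluster_means(X, labels, c):
    """Per-cluster mean of each variable; None for an empty cluster."""
    sums = [[0.0] * len(X[0]) for _ in range(c)]
    counts = [0] * c
    for x, lab in zip(X, labels):
        counts[lab] += 1
        for j, value in enumerate(x):
            sums[lab][j] += value
    return [[s / counts[i] for s in sums[i]] if counts[i] else None for i in range(c)]


# Made-up memberships and district-level percentages for two variables.
U = [[0.8, 0.2], [0.3, 0.7], [0.6, 0.4]]
X = [[10.0, 2.0], [30.0, 4.0], [20.0, 6.0]]
labels = hard_labels(U)
print(labels, cluster_means(X, labels, 2))
```

Comparing the rows of the resulting means, variable by variable, is the kind of comparison used above to label one cluster as vulnerable.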

Discussion
Our results showed that the optimization algorithms improved the FGWC clustering results. This is consistent with previous studies that found that modifying FGWC can lead to better clustering quality. Such modifications are not limited to metaheuristic optimization. They also include changes to the spatial interaction [32] or the distance matrix [33], and hybridization with context-based clustering and fast computing such as CUDA [34]. Among the optimization algorithms, the FPA performed best in optimizing the FGWC clustering results in the business vulnerability context. This finding is consistent with previous studies that found that the FPA outperforms other metaheuristic algorithms [37,43,44]. Furthermore, Dhal [40], Kaur et al. [47], and Agarwal and Mehta [46] also found that the FPA performs well in optimizing clustering results. This study used a switch probability of 0.7, which is close to the value of 0.8 that Yang [37] reported to work better in several studies. Future studies may improve FPA optimization through modifications such as a multiobjective FPA [69], step sizes drawn from other distributions as in [70], and CUDA implementations for greater speed [34].
Regarding the clustering results, we found that Cluster 1 tends to be vulnerable. Its regions are spread across Indonesia outside Java Island. In 2020, the government issued Presidential Regulation No. 63 concerning the "3T" regions, and, interestingly, all of these regions fall in Cluster 1. The "3T" regions are mostly located outside Java Island and are widely dispersed. Data from BPS-Statistics Indonesia showed that only 37.74 percent of SMEs are located outside Java Island [71], which should concern the government. Much of the area outside Java lacks phone signal. Moreover, people in rarely visited border areas still live traditionally, so local people are the only customers of businesses there. Internet access would help these businesses market and promote their enterprises. With internet access, their shops become more visible to customers, which increases the chance of sales.
Cluster 2, on the other hand, is not vulnerable. It is located mostly on Java Island, particularly in the JABODETABEK area. According to BPS, 62.26 percent of SMEs were located on Java Island in 2019 [71], which shows Java's dominance in SMEs. Moreover, the Cluster 2 areas outside Java Island are mostly cities, such as Lhokseumawe in Aceh, Bandar Lampung and Metro in Lampung, and Palangkaraya and Kota Baru (Baru City) in Central Kalimantan. Meanwhile, most of the Java cities in Cluster 2 are included in the Inflation City Survey by BPS-Statistics Indonesia. City areas tend to have easier access to the internet. Their dense populations stimulate economic activity, and the internet makes it easier to reach customers.
Although there are some cities in Cluster 1, they also have problems that make their businesses vulnerable. The Ministry of Cooperatives and Small and Medium Enterprises (MCSMEs) identified four main problems of SMEs [72]. The first is limited human resources: limited access to business knowledge and a lack of business mentors make it difficult for SMEs to grow. The second concerns marketing, where a lack of creativity and difficulties in distributing goods and services are the main obstacles. Online approaches and stronger personal branding can help overcome this problem. The third problem is limited access to business capital, which limits production and prevents businesses from operating optimally. Finally, SMEs without legal status still dominate the sector, making up approximately 98.68% of it.