A Normal Distributed Dwarf Mongoose Optimization Algorithm for Global Optimization and Data Clustering Applications

As data volumes have increased and the difficulty of tackling vast and complicated problems has grown, innovative and intelligent solutions for handling these difficulties have become essential. Data clustering is a data mining approach that partitions a huge amount of data into a number of clusters; in other words, it groups symmetric (similar) objects and separates asymmetric (dissimilar) ones. In this study, we developed a novel strategy that uses intelligent optimization algorithms to tackle a group of issues requiring sophisticated methods to solve. Three primary components are employed in the suggested technique, named GNDDMOA: the Dwarf Mongoose Optimization Algorithm (DMOA), Generalized Normal Distribution Optimization (GND), and an Opposition-based Learning Strategy (OBL). These parts organize the execution of the proposed method during the optimization process through a unique transition mechanism that addresses the critical limitations of the original methods. Twenty-three test functions and eight data clustering tasks were utilized to evaluate the performance of the suggested method, and its findings were compared to other well-known approaches. The suggested GNDDMOA approach produced the best results on all of the benchmark functions examined and showed promising performance in the data clustering applications.


Introduction
Meta-heuristic optimization is a sophisticated problem-based algorithmic design paradigm that creates optimization methods by combining multiple operators and search techniques [1,2]. A heuristic is a strategy that tries to find the best (optimal) solution [3]. In the cost estimating and artificial intelligence disciplines, meta-heuristics are used to solve difficult real-world issues, such as data clustering challenges and other classic optimization problems [4,5]. Every optimization issue is unique and thus necessitates a variety of meta-heuristic approaches to deal with the circumstances, constraints, and variables of the problem at hand [6,7]. Such challenges necessitate the development of a sophisticated meta-heuristic optimizer that can handle each problem and use case separately [8][9][10]. Meta-heuristic optimization is now in demand for various uses, including designing a microgrid with an energy system [11], data mining [12,13], wind power forecasting [14], structural engineering [15], biological sequences [16], parameter extraction for photovoltaic cells [17], transportation, and finance [18][19][20][21]. In many of these applications, there is also a need to reduce the number of decision variables, especially in heavily parameterized structures.
Creating a collection of clusters from supplied data items is known as data clustering, one of the most typical data analysis and statistical approaches [51,52]; in other words, it concerns how to find symmetric and asymmetric objects [53]. Classifiers, diagnostic imaging, time series, computer vision, data processing, market intelligence, pattern classification, image classification, and data mining are just a few of the clustering applications [54,55]. The clustering procedure aims to divide the provided objects into a predetermined number of clusters, with related members belonging to the same group (maximization of intra-cluster similarity) [56][57][58]; dissimilar individuals, on the other hand, belong to separate groups (minimization of inter-cluster similarity). Partitional clustering, the method employed in this research, aims to divide a large number of data items into a collection of non-overlapping clusters without using nested structures. The center of a cluster is its centroid, and each data object is initially assigned to the closest centroid [59,60]. Centroids are then adjusted based on the current assignments and by tweaking a few parameters. Some examples of data clustering applications that use optimization methods [61,62] are described below.
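As a minimal, hedged sketch (not the paper's implementation), the assign-then-recompute step of partitional clustering described above can be written as follows; the function name and NumPy encoding are illustrative choices:

```python
import numpy as np

def assign_and_update(X, centroids):
    """One partitional-clustering step: assign each object to its
    nearest centroid, then move each centroid to the mean of its
    assigned members."""
    # Distance of every object to every centroid: shape (n_objects, k)
    d = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    labels = d.argmin(axis=1)                # nearest-centroid assignment
    new_centroids = centroids.copy()
    for j in range(len(centroids)):
        members = X[labels == j]
        if len(members) > 0:                 # keep empty clusters in place
            new_centroids[j] = members.mean(axis=0)
    return labels, new_centroids
```

Repeating this step until assignments stop changing yields the familiar k-means iteration; metaheuristics instead search over the centroid positions directly.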
A thorough overview of meta-heuristic techniques for clustering purposes is presented in the literature, as in [63], highlighting their methods in particular. Due to their adequate capacity to address machine learning challenges, particularly text clustering difficulties, artificial intelligence (AI) techniques are acknowledged as excellent swarm-based technologies. For example, a unique heuristic technique based on Moth-Flame Optimization (MFO) was proposed in [64] to handle data clustering difficulties. Various tests were undertaken on benchmark datasets from the UC Irvine Machine Learning Repository to verify the effectiveness of the suggested method. Over twelve datasets, the suggested method was compared to five state-of-the-art techniques; it outperformed the competition on ten datasets and was equivalent on the other two. An examination of the experimental outcomes confirmed the efficacy of the recommended strategy. Moreover, a unique technique based on data clustering efficiency envelopes (EDCO) was presented in [65]. Regardless of whether or not the camera model was in the database, the new EDCO technique was able to recognize it. The results showed that the EDCO method effectively differentiated unidentified source photos from known image data. A query image classified as known was linked to its origin sensor. The proposed technique was able to efficiently discriminate between photos from past and present camera models, even in severe instances.
A novel data clustering technique based on the Whale Optimization Algorithm (WOA) was proposed in [66]. The effectiveness of the proposed approach was verified using 14 sample datasets from the UCI machine learning library. Experimental data and numerous statistical tests validated the efficacy of the recommended technique. A simplex technique for increasing the exploration capacity of bacterial colony optimization (BCO), called SMBCO, was described in [67]. The suggested SMBCO method was utilized to tackle the data clustering challenge, and machine learning datasets were used to examine its superiority. The outcomes of the clustering technique were evaluated using objective value and computing time. Compared with traditional methods, the SMBCO model achieved excellent accuracy and a good convergence rate, according to the findings of the trials. In [68], a beneficial approach called SIoMT was presented for regularly identifying, aggregating, evaluating, and maintaining essential data on possible patients. The SIoMT approach, in particular, is commonly utilized with dispersed nodes for data group analysis and management. The capacity and effectiveness of the suggested SIoMT technique were well established relative to equivalent techniques after assessing different aspects through the solution of various IoMT scenarios.
According to the literature [69], the existing procedures can provide good outcomes in certain circumstances but not in others. As a result, there is a pressing need for a new strategy capable of dealing with a wide range of complicated issues. The "no free lunch" theorem inspired us to look for and develop a new approach to dealing with such complex challenges. This work provides a novel optimization approach for solving optimization issues. The suggested approach, known as GNDDMOA, is based on the use of the fundamental methods of Generalized Normal Distribution Optimization (GND) and the Dwarf Mongoose Optimization Algorithm (DMOA), followed by the Opposition-based Learning (OBL) Mechanism. The proposed method follows a transition technique that defines a condition determining which component is used at each step. This design is recommended to prevent the problem of premature convergence while maintaining the diversity of potential solutions. The Opposition-based Learning (OBL) Mechanism is then activated in response to the transition condition. This phase is used to look for a new search area in order to prevent being stuck in a local search region. To validate the efficiency of the suggested strategy, two sets of experiments are used: twenty-three benchmark functions and eight data clustering challenges. The suggested method's outcomes on the studied issues are compared to those of other well-known optimization approaches, including the Aquila Optimizer (AO), Ebola Optimization Search Algorithm (EOSA), Whale Optimization Algorithm (WOA), Sine Cosine Algorithm (SCA), Dragonfly Algorithm (DA), Grey Wolf Optimizer (GWO), Particle Swarm Optimizer (PSO), Reptile Search Algorithm (RSA), Arithmetic Optimization Algorithm (AOA), Generalized Normal Distribution (GND), and Dwarf Mongoose Optimization Algorithm (DMOA). The results showed that the suggested technique can identify new optimal solutions for both sets of tested issues.
It produced good results in terms of global search capabilities and convergence speed in all of the situations studied. The main contributions of this paper are given as follows.

•	A novel hybrid method is proposed to tackle the weaknesses of the original search methods, and is applied to solve various complicated optimization problems.
•	The proposed method, called GNDDMOA, is based on the original Generalized Normal Distribution Optimization (GND) and Dwarf Mongoose Optimization Algorithm (DMOA), followed by the Opposition-based Learning Mechanism (OBL).
•	The proposed GNDDMOA method was tested on twenty-three benchmark mathematical problems. Moreover, a set of eight data clustering problems was used to validate the performance of the GNDDMOA.
The remainder of this paper is organized as follows: The background and techniques of the algorithm are provided in Section 2. The suggested Generalized Normal Distribution Dwarf Mongoose Optimization Algorithm is demonstrated in Section 3. Section 4 contains the experimental details and analysis. The conclusion and future work direction are described in Section 5.

Generalized Normal Distribution Optimization (GND)
The following is the architecture of the classic Generalized Normal Distribution Optimization (GND) algorithm [23].

Inspiration
GNDO was inspired by the normal distribution rule, a crucial mechanism for representing natural phenomena. A variable x follows a probability distribution with location parameter (µ) and scale parameter (δ), and its probability density function is given in Equation (1). Figure 1 shows possible values for the parameters (i.e., µ and δ) in Equation (1).
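The density behind Equation (1) is the standard normal probability density function; a minimal sketch (standard mathematics, not code from the paper) is:

```python
import math

def normal_pdf(x, mu, delta):
    """Probability density of the normal distribution with location
    mu and scale (standard deviation) delta:
    f(x) = exp(-(x - mu)^2 / (2 delta^2)) / (delta sqrt(2 pi))."""
    return math.exp(-(x - mu) ** 2 / (2.0 * delta ** 2)) / (delta * math.sqrt(2.0 * math.pi))
```

Varying µ shifts the bell curve along the axis, while varying δ widens or narrows it, which is what Figure 1 illustrates.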

Local Search (Exploitation)
Based on the present positions of all solutions, local search contributes promising solutions near the current search region. Equation (2) represents the generalized normal distribution model used for exploitation.
where υ_i^t is the trail vector of the ith solution at the tth iteration, µ_i is the generalized mean position of the ith solution, δ_i is the generalized standard deviation, and η is the penalty factor. The values of µ_i, δ_i, and η can be determined as follows.
where a, b, λ_1, and λ_2 are random numbers, x_best^t is the best solution obtained so far, and M is the mean position of the candidate solutions. M is determined using Equation (6).
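Since Equations (2)–(6) are not reproduced in this excerpt, the sketch below is a hedged reconstruction of the exploitation move following the symbols defined above and the GNDO literature; the branch used for η in particular is an assumption and should be checked against the original equations:

```python
import numpy as np

def gnd_exploit(X, fitness, rng):
    """Hedged sketch of the GND exploitation move for a population X
    (rows = solutions). mu_i mixes the solution, the best solution, and
    the population mean M; delta_i is the matching standard deviation;
    eta is the penalty factor built from a, b, lambda_1, lambda_2."""
    best = X[np.argmin(fitness)]
    M = X.mean(axis=0)                        # population mean (Equation (6))
    V = np.empty_like(X)
    for i, x in enumerate(X):
        mu = (x + best + M) / 3.0             # generalized mean position
        delta = np.sqrt(((x - mu) ** 2 + (best - mu) ** 2 + (M - mu) ** 2) / 3.0)
        a, b = rng.random(), rng.random()
        lam1, lam2 = rng.random(), rng.random()
        if a <= b:                            # assumed branch for eta
            eta = np.sqrt(-np.log(lam1)) * np.cos(2.0 * np.pi * lam2)
        else:
            eta = np.sqrt(-np.log(lam1)) * np.cos(2.0 * np.pi * lam2 + np.pi)
        V[i] = mu + delta * eta               # candidate trail vector v_i^t
    return V
```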

Global Search (Exploration)
Global search is a technique for exploring the search space globally in order to find promising regions, as seen below.
where λ_3 and λ_4 are random numbers drawn from the normal distribution, β is a random value, and ν_1 and ν_2 are two trail vectors determined by Equation (8).
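As with exploitation, Equations (7)–(8) are not reproduced here, so the following is a hedged sketch of the exploration move: ν_1 and ν_2 point toward better (or away from worse) randomly chosen solutions, with the exact sign rule an assumption taken from the GNDO literature:

```python
import numpy as np

def gnd_explore(X, fitness, rng):
    """Hedged sketch of the GND exploration move. v1 and v2 are trail
    vectors whose direction flips depending on relative fitness;
    beta, lambda_3, lambda_4 follow the definitions in the text."""
    n, dim = X.shape
    V = np.empty_like(X)
    for i in range(n):
        others = [j for j in range(n) if j != i]
        p, q = rng.choice(others, size=2, replace=False)
        # Trail vectors: move toward a better random solution, away from a worse one
        v1 = X[i] - X[p] if fitness[i] < fitness[p] else X[p] - X[i]
        v2 = X[q] - X[p] if fitness[q] < fitness[p] else X[p] - X[q]
        lam3 = rng.standard_normal(dim)       # normally distributed
        lam4 = rng.standard_normal(dim)
        beta = rng.random()                   # random scalar weight
        V[i] = X[i] + beta * np.abs(lam3) * v1 + (1.0 - beta) * np.abs(lam4) * v2
    return V
```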

The Updating Mechanism of GND
The following mathematical formulation depicts the GND's update process.
The GND technique is described in Algorithm 1.

Dwarf Mongoose Optimization Algorithm (DMOA)
The original Dwarf Mongoose Optimization Algorithm (DMOA) design is presented in [22]. The DMOA replicates the compensatory behavioral adaptation of the dwarf mongoose, which is modeled as follows.

Alpha Group
The fitness of each solution is calculated after the population has been initialized. Equation (11) calculates the probability value, and the alpha female is chosen based on this probability.
Here, n − bs is the number of mongooses in the alpha group, bs is the number of babysitters, and peep is the vocalization of the dominant female that keeps the family on track [22]. The solution-updating mechanism is given as follows.
where phi is a uniformly distributed random number in [−1, 1]. The sleeping mound is given in Equation (13) and is evaluated after every iteration.
Equation (14) gives the average value of the sleeping mounds discovered.
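Because Equations (11)–(14) are not reproduced in this excerpt, the sketch below is a hedged reconstruction of one alpha-group step using a stand-in (sphere) objective; the exact formulas are assumptions based on the DMOA paper:

```python
import numpy as np

def alpha_group_step(X, fitness, peep, rng):
    """Hedged sketch of one DMOA alpha-group step on a population X
    (rows = solutions). The selection probability mirrors Equation (11),
    the candidate move X + phi*peep mirrors Equation (12) with
    phi ~ U(-1, 1), and the sleeping-mound value and its average mirror
    Equations (13)-(14)."""
    prob = fitness / fitness.sum()            # Equation (11)-style probability
    phi = rng.uniform(-1.0, 1.0, size=X.shape)
    X_new = X + phi * peep                    # Equation (12)-style update
    f_new = (X_new ** 2).sum(axis=1)          # stand-in sphere fitness
    # Equation (13)-style sleeping-mound value (epsilon guards division by zero)
    denom = np.maximum(np.maximum(np.abs(f_new), np.abs(fitness)), 1e-12)
    sm = (f_new - fitness) / denom
    varphi = sm.mean()                        # Equation (14)-style average
    return prob, X_new, sm, varphi
```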
Once the babysitting exchange criterion is fulfilled, the algorithm advances to the scouting stage, in which the next food source or sleeping mound is considered.

Scout Group
In the scout group phase, if the family forages far enough, it will come across a new sleeping mound. The scout mongoose is simulated by Equation (15).
where rand is a random value in the range [0, 1], the CF value is calculated by Equation (16), and the vector M is calculated by Equation (17).
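Equation (16) is not reproduced in this excerpt; the decay factor sketched below follows the form commonly reported for DMOA's CF parameter and is therefore an assumption to be checked against the original equation:

```python
def scout_cf(t, T):
    """Hedged sketch of the scout-phase factor CF (Equation (16)):
    it decays from 1 toward 0 over the run, so scout moves shrink as
    the iteration count t approaches the maximum T."""
    return (1.0 - t / T) ** (2.0 * t / T)
```

Early in the run CF is close to 1 (large scout moves, exploration); near the end it approaches 0, concentrating the search (exploitation).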
Babysitters are generally subordinate group members that stay with the young and are rotated on a routine basis to enable the alpha female (mother) to lead the rest of the group on daily hunting expeditions.


Opposition-Based Learning (OBL) Mechanism
This section introduces the Opposition-based Learning Algorithm (OBL). It is utilized to create a new opposing solution based on the previous one [70].
In OBL, the opposite solution (X_O) of a real-valued solution X ∈ [LB, UB] is determined by Equation (18), i.e., X_O = LB + UB − X.
The fitness function evaluates the two solutions (X O and X) during the optimization process. The best solution is identified, and the other solution is disregarded.
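The opposition step can be sketched directly from Equation (18) and the selection rule above; the function names and vectorized form are illustrative choices:

```python
import numpy as np

def opposition(X, lb, ub):
    """Opposite solution of Equation (18): X_O = LB + UB - X,
    applied per dimension."""
    return lb + ub - X

def obl_select(X, lb, ub, f):
    """Evaluate a solution and its opposite under fitness f and keep
    the better one (the other is disregarded)."""
    X_opp = opposition(X, lb, ub)
    return X if f(X) <= f(X_opp) else X_opp
```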

The Proposed Method (GNDDMOA)
This section introduces the suggested GNDDMOA (Generalized Normal Distribution Dwarf Mongoose Optimization Algorithm). Three basic search processes are employed to update the candidate solutions in the suggested technique. As a result of this strategy, the method is more effective at locating new search areas and avoiding local-optimum issues, such as premature, rapid, or sluggish convergence. Generalized Normal Distribution Optimization (GND), the Dwarf Mongoose Optimization Algorithm (DMOA), and the Opposition-based Learning (OBL) Strategy are the key procedures employed. The standard Generalized Normal Distribution Optimization (GND) and Dwarf Mongoose Optimization Algorithm are used to discover the best solution and improve performance.
The Generalized Normal Distribution Optimization search methods are used in the first stage, followed by the Dwarf Mongoose Optimization Algorithm (DMOA) in the second stage, and the Opposition-based Learning (OBL) technique in the third stage of the iteration process. The Dwarf Mongoose Optimization Algorithm is employed in the second optimization stage to support the GND by regulating the diversity of solutions and the consistency of the search phases (exploration and exploitation). The Opposition-based Learning mechanism aids the GND in the third stage, avoids the local-optimum conundrum, and strengthens the suggested method's ability to uncover new search regions.
The primary techniques used in the proposed GNDDMOA method, which employs integrated search techniques, are depicted in Figure 2. The main proposed conditions are used to help manage the search process and avoid the main weaknesses of the original methods, such as being trapped in local optima and an imbalance between the optimization phases. The number of fitness evaluations matches that of the first method; therefore, one fitness evaluation is conducted per iteration. As a result, the suggested GNDDMOA performs one search per iteration using one of the employed techniques: GND, DMOA, or OBL. In this way, the suggested GNDDMOA is intended to address the core approaches' major flaws and inadequacies in order to identify plausible solutions to the presented optimization and data clustering challenges.
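The one-operator-per-iteration structure described above can be sketched as a high-level loop; the random switching threshold and the two operator stubs below are placeholders, not the paper's actual transition condition or update rules:

```python
import numpy as np

def gnddmoa_sketch(f, lb, ub, n=30, T=200, seed=0):
    """High-level sketch of the GNDDMOA transition mechanism: each
    iteration applies exactly ONE operator (a GND-style move, a
    DMOA-style move, or OBL), so a single fitness-evaluation pass is
    spent per iteration."""
    rng = np.random.default_rng(seed)
    dim = len(lb)
    X = rng.uniform(lb, ub, size=(n, dim))
    fit = np.apply_along_axis(f, 1, X)
    for _ in range(T):
        r = rng.random()
        if r < 0.45:
            # Placeholder GND-style move: small normal perturbation
            cand = X + 0.1 * rng.standard_normal(X.shape)
        elif r < 0.90:
            # Placeholder DMOA-style move: drift toward the best solution
            best = X[np.argmin(fit)]
            cand = X + rng.uniform(-1, 1, X.shape) * (best - X)
        else:
            # OBL branch (Equation (18)): opposite solutions
            cand = lb + ub - X
        cand = np.clip(cand, lb, ub)
        cand_fit = np.apply_along_axis(f, 1, cand)
        better = cand_fit < fit               # greedy replacement
        X[better], fit[better] = cand[better], cand_fit[better]
    i = np.argmin(fit)
    return X[i], fit[i]
```

Running the sketch on the sphere function illustrates the intent: the mix of operators keeps diversity early while still converging toward the optimum.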

Complexity of the Proposed GNDDMOA
The complexity of the proposed GNDDMOA depends on the complexity of the traditional GND, DMOA, and OBL components; the total complexity is the sum of these three parts, and the best-case and worst-case complexities follow from whichever component is invoked per iteration, where N is the number of solutions.

Experiments and Results
This section presents the experiments that were conducted to test the performance of the proposed method and to compare it with other methods. The experiments are divided into two main parts: benchmark functions and data clustering problems.

Experiments 1: Benchmark Functions Problems
The findings of the functions that were tested, as well as their explanations, are presented in this section. The obtained GNDDMOA findings were compared to those of well-known optimization methods, such as the Aquila Optimizer (AO) [25], Salp Swarm Algorithm (SSA) [71], Particle Swarm Optimizer (PSO) [72], Generalized Normal Distribution (GND) [23], Ebola Optimization Search Algorithm (EOSA) [29], Dragonfly Algorithm (DA) [73], Reptile Search Algorithm (RSA) [27], Whale Optimization Algorithm (WOA) [74], Grey Wolf Optimizer (GWO) [75], Arithmetic Optimization Algorithm (AOA) [24], and Dwarf Mongoose Optimization Algorithm (DMOA) [22]. The suggested method's performance was validated using the Friedman ranking test and the Wilcoxon ranking test. All tests were performed 20 times [76] with the same number of iterations (1000) using Matlab on Windows 10 with 16 GB RAM. Table 1 shows the parameter settings for the algorithms that were tested, and Table 2 shows the details of the tested benchmark functions. Table 1. Parameter values of the tested algorithms.

PSO: topology fully connected; cognitive and social constants (C1, C2) = 2, 2; inertia weight linearly reduced from 0.9 to 0.1; velocity limit 10% of the dimension range.
GWO: convergence parameter (a) linearly reduced from 2 to 0.
AOA: α = 5; µ = 0.5.
Table 2. Details of the tested benchmark functions.

Figure 3 shows the qualitative results for the test function problems (F1–F13). Each row has four key sub-figures: function topology, first-dimension trajectory, average fitness values, and convergence curves. In virtually all of the examined scenarios, the recommended strategy clearly provided the optimum result. The optimization technique is quite efficient, as evidenced by the trajectory of the selected dimension, which changes the position values substantially.

Test Function Problems
The population size is examined in Table 3 to determine the appropriate number of solutions to employ in the suggested technique. The best size was 50, since it received the highest ranking. As indicated in Table 4, the first 13 benchmark functions (F1–F13) were assessed using ten dimensions. Compared to existing comparable methodologies, the suggested GNDDMOA method yielded better results in this table, with AO, EO, AOA, GWO, PSO, WOA, GND, SCA, SSA, ALO, and DA ranked next, in that order. Almost all of the examined functions yielded promising results using the suggested technique. We examined the first 13 benchmark functions and compared them to previous approaches; the suggested GNDDMOA method yielded more accurate results, producing excellent or outstanding results in virtually all of the high-dimensional functions examined. Table 5 shows the results of the second set of ten benchmark functions (F14–F23). The suggested GNDDMOA approach also outperformed the other comparable methods in this table, with PSO, SSA, GWO, ALO, EO, AO, GND, DA, WOA, SCA, and AOA ranked next, in that order. In practically every function examined, the recommended technique yielded the best results. Furthermore, the suggested strategy outperformed the SSA, DA, and EO methods in the Wilcoxon ranking test; in the first benchmark case (F1), it also outperformed SSA, DA, EO, and GND. The final ranking is presented in Figure 4.
The convergence behavior of the comparison approaches is shown in Figure 5 to depict the performance curves clearly. Specifically, the suggested GNDDMOA approach smoothly accelerated the best solutions forward. It found the best solution in all of the challenges on which it was tested (F1–F23). Furthermore, most test scenarios indicated that the proposed GNDDMOA avoided the primary flaws previously identified, such as premature convergence. In addition, as in the previous four test instances, the convergence stability was clearly visible. As a consequence of the acquired data, we determined that the suggested approach functioned very well and produced outcomes highly comparable to those of traditional techniques and other well-established approaches.

Experiments 2: Data Clustering Problems
A second phase of experiments was carried out to tackle eight data clustering problems and is described in this section. Table 6 contains descriptions of the data clustering challenges that were evaluated. The suggested GNDDMOA's findings were compared to those of well-known optimization methods, such as the Aquila Optimizer (AO) [25], Particle Swarm Optimizer (PSO) [72], Artificial Gorilla Troops Optimizer (AGTO) [77], Ebola Optimization Search Algorithm (EOSA) [29], Reptile Search Algorithm (RSA) [27], Generalized Normal Distribution (GND) [23], and Dwarf Mongoose Optimization Algorithm (DMOA) [22]. The suggested method's performance was validated using the Friedman ranking test and the Wilcoxon ranking test. All tests were performed 20 times with the same number of iterations (1000) using Matlab software on Windows 10 with 16 GB RAM.
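To connect the optimizer to the clustering task, a common encoding (a hedged sketch; the paper's exact objective may differ) flattens the K centroids into one solution vector and scores it by the total within-cluster sum of squared distances:

```python
import numpy as np

def clustering_fitness(solution, data, k):
    """Decode a flat solution vector into k centroids and score it by
    the total within-cluster sum of squared distances to the nearest
    centroid -- a common clustering objective for metaheuristics."""
    centroids = solution.reshape(k, data.shape[1])
    # Distance of every data object to every centroid: (n_objects, k)
    d = np.linalg.norm(data[:, None, :] - centroids[None, :, :], axis=2)
    return float((d.min(axis=1) ** 2).sum())
```

Any of the optimizers above can then minimize this function over a search space of dimension k × (number of features).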

Results and Discussion
The results of the proposed GNDDMOA on data clustering issues are reported in this section. The results of the compared methods on eight data clustering tasks are shown in Table 7. The suggested technique showed promising results in solving real-world data clustering challenges, yielding the best outcomes in all of the scenarios that were examined. The suggested GNDDMOA was ranked first in the Friedman ranking test, followed by PSO, GWO, AO, AOA, AGTO, GND, WOA, and AVOA. Furthermore, the Wilcoxon ranking test revealed that the suggested technique outperformed AO, PSO, GWO, AVOA, WOA, and GND in the first dataset (Cancer). Tables 8–15 present the best centroid values achieved using the suggested approach. The convergence behavior of the comparison algorithms on the investigated data clustering issues is depicted in Figure 6. Specifically, the suggested GNDDMOA approach smoothly accelerated the best solutions forward and clearly achieved the best solution in all of the tested problems. In addition, the majority of the test scenarios showed that the proposed GNDDMOA avoided prior fundamental flaws, such as premature convergence. Convergence stability was also observed, just as in the initial test scenarios. As a consequence of the obtained findings, we determined that the proposed approach performed admirably and generated outcomes comparable to those of the original techniques and other well-established methods. The clustering plots produced by the proposed GNDDMOA are shown in Figure 7, where each dataset was examined using a different number of clusters (i.e., K = 2, 4, and 8).
We chose this original method in this study as it has demonstrated its search ability in solving many challenging optimization problems, and it is one of the most recent proposed methods not yet investigated in this domain. The main motivation behind using a new operator in the proposed method was to avoid the observed weaknesses in the original method and to make it more efficient during the optimization process.
The suggested GNDDMOA approach has a strong capacity to discover appropriate solutions to different optimization and data clustering issues, as evidenced by the previous findings and discussion. When the performance of GNDDMOA was compared to that of the classic DMOA approach, it was clear that GND and OBL had a significant impact on the capacity to balance exploration and exploitation, as seen in the excellent quality of the final solution. However, because OBL increases the processing time, the developed approach still needs considerable refinement, particularly in its time computation.

Conclusions and Potential Future Work
Recent growth in data volumes and the increasing complexity of vast and complicated problems have necessitated advanced and intelligent technologies to address these issues. These approaches are usually modified procedures that enable them to cope with complex issues. Data clustering is one of the most frequent applications in the data mining industry. It is used to split a large number of data items into numerous clusters, each with several instances. The clustering method's fundamental goal is to discover coherent clusters, with each group containing related items.
This research offers a fresh and inventive way of solving a collection of issues that require sophisticated methods, based on a set of operators from several intelligent optimization algorithms. Three primary components are employed in the suggested technique (GNDDMOA), based on a unique transition mechanism that organizes the execution of the employed methods throughout the optimization process and addresses the significant flaws of the original methods. These three strategies are the Dwarf Mongoose Optimization Algorithm (DMOA), Generalized Normal Distribution Optimization (GND), and the Opposition-based Learning Strategy (OBL). The suggested transition method is utilized to orchestrate these primary components. The suggested strategy is intended to solve the issues of premature convergence and unbalanced search strategies. The suggested method's performance was validated using twenty-three benchmark functions and eight data clustering challenges, and its results were compared to several other well-known methods. The suggested GNDDMOA approach produced the best results on the benchmark functions and data clustering challenges in all of the evaluated scenarios, performing well in comparison to the comparative methodologies.
In the future, the proposed method can be applied to other complex optimization problems, such as condition monitoring, classification tasks, parameter selection, feature extraction, design issues, text clustering problems, packet headers, repair and rehabilitation planning, and large-scale medical data scheduling. In addition, a thorough examination of the suggested approach may be carried out to determine the primary reasons for any failure to identify the best solution in all circumstances.