A New Scalable, Distributed, Fuzzy C-Means Algorithm-Based Mobile Agents Scheme for HPC: SPMD Application

The aim of this paper is to present a mobile agents model for the distributed classification of Big Data. The main challenge is to optimize the communication costs between the processing elements (PEs) in parallel and distributed computational models, so as to ensure the scalability and efficiency of the method. Additionally, the proposed distributed method integrates a new communication mechanism that provides HPC (High Performance Computing) for parallel programs executed as distributed ones, by means of a cooperative team of mobile agents that exploits its asynchronous communication ability. This mobile agents team implements the distributed method of the Fuzzy C-Means Algorithm (DFCM) and performs the Big Data classification in the distributed system. The paper presents the proposed scheme and its DFCM algorithm, together with experimental results that illustrate the scalability and efficiency of this distributed method.


Introduction
Computer science technologies have introduced several complex, data-intensive applications in different domains, such as the Internet of Things (IoT), cloud computing, data mining, and Big Data analysis, whose demands drive the improvement of HPC (High Performance Computing).
These applications must process large amounts of data and complex tasks. Their scalability and efficiency depend on their ability to manage both, and also on the processing environment in which they are deployed. For example, in the medical domain, an application that analyzes cerebral MRI (magnetic resonance imaging) images with clustering algorithms must process a large volume of data, which requires great processing power to achieve HPC.
Clustering algorithms are widely used in the medical field to analyze MRI images and to diagnose and detect abnormal regions based on MRI image classification. However, these tasks require high performance computational models that provide efficiency and flexibility for the most complex clustering algorithms, such as the Fuzzy C-Means algorithm. How, then, can these requirements be met in a distributed system built on a parallel and distributed computational model, given the major challenge of optimizing the communication cost in such models? We present a cooperative computational processing model that meets these computational requirements. This paper is organized as follows:

• We present the parallel and distributed computing model in which the DFCM method is implemented (Section 3).
The cooperative computational model in which the proposed DFCM method is implemented is a parallel and distributed virtual machine based on mobile agents. This machine is built over a distributed computing grid of (me × ne) mobile agents. In this grid (Figure 1), the mobile agents are arranged in a 2D (me × ne) matrix as agent virtual processing elements (AVPEs), following the SPMD (Single Program Multiple Data) architecture. Each AVPE(i,j) is located in row i and column j and has an agent identifier AID = me × i + j. These AVPEs emulate the PEs (processing elements) of a parallel computing grid. In this model, we consider that asynchronous communication between the AVPEs, through the exchange of ACL messages, greatly reduces the communication cost incurred by the PEs.
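The grid indexing can be sketched as follows, assuming 0-based row and column indices. For a square grid such as the 4 × 4 mesh, AID = me × i + j gives every AVPE a distinct identifier. For a rectangular grid with me rows and ne columns, the sketch numbers agents row by row as ne × i + j; this is our assumption, and it coincides with the formula above when me = ne. The function names `aid` and `grid_position` are hypothetical, and identifiers start at 0 (add 1 for the a = 1 to NA convention used later).

```python
def aid(i: int, j: int, me: int, ne: int) -> int:
    """Identifier of the AVPE in row i, column j of an me x ne grid (0-based).

    Row-major numbering (assumed): equals me * i + j whenever me == ne.
    """
    if not (0 <= i < me and 0 <= j < ne):
        raise ValueError("position outside the me x ne grid")
    return ne * i + j


def grid_position(a: int, me: int, ne: int) -> tuple:
    """Inverse mapping: (row, column) of the AVPE with identifier a."""
    if not (0 <= a < me * ne):
        raise ValueError("identifier outside the grid")
    return divmod(a, ne)


if __name__ == "__main__":
    me, ne = 4, 4  # a 4 x 4 mesh
    for i in range(me):
        print([aid(i, j, me, ne) for j in range(ne)])
    # Rectangular grid: every identifier 0..NA-1 is used exactly once.
    ids = sorted(aid(i, j, 4, 8) for i in range(4) for j in range(8))
    print(ids == list(range(32)))  # True
```

Row-major numbering keeps the mapping invertible with a single `divmod`, which is convenient when an agent must recover its tile position from its identifier.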

Cooperative Mobile Agent Virtual Element (AVPE) Model
The distributed classification is performed by implementing the proposed DFCM method on a cooperative mobile agent team model, as illustrated in Figure 2. This model is composed of the Team Leader agent and the team worker agents (AVPEs). When the DFCM program runs on the model, the mobile Team Leader agent receives the input MRI image and splits it into (me × ne) elementary images (Figure 2a). It then deploys a set of (me × ne) AVPEs and encapsulates the classification task and one elementary image in each AVPE (Figure 2b). The AVPEs perform the distributed classification on their assigned elementary images (Figure 2c) and send the results to the mobile Team Leader agent, which computes the final results and assembles all segmented elementary images to display the output image of the classification (Figure 2d).
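The splitting in Figure 2a and the reassembly in Figure 2d can be sketched in plain Python. The sketch assumes the image is a list of pixel rows and that the elementary images are near-equal rectangular tiles stored in row-major order, so tile k would go to the AVPE with identifier k; the paper does not say how uneven sizes are handled, and the helper names are hypothetical.

```python
def tile_bounds(size: int, parts: int) -> list:
    """Split [0, size) into `parts` contiguous ranges whose lengths differ by at most 1."""
    return [size * k // parts for k in range(parts + 1)]


def split_image(img: list, me: int, ne: int) -> list:
    """Cut an H x W image (list of rows) into me x ne elementary images, row-major."""
    rows = tile_bounds(len(img), me)
    cols = tile_bounds(len(img[0]), ne)
    tiles = []
    for i in range(me):
        for j in range(ne):
            tile = [row[cols[j]:cols[j + 1]] for row in img[rows[i]:rows[i + 1]]]
            tiles.append(tile)
    return tiles


def assemble_image(tiles: list, me: int, ne: int) -> list:
    """Inverse of split_image: stitch the elementary images back into one image."""
    out = []
    for i in range(me):
        band = tiles[i * ne:(i + 1) * ne]  # the ne tiles of grid row i
        for r in range(len(band[0])):      # all tiles in a band share the same height
            out.append([px for tile in band for px in tile[r]])
    return out


if __name__ == "__main__":
    image = [[r * 7 + c for c in range(7)] for r in range(5)]  # toy 5 x 7 "image"
    tiles = split_image(image, 2, 3)
    print([(len(t), len(t[0])) for t in tiles])  # [(2, 2), (2, 2), (2, 3), (3, 2), (3, 2), (3, 3)]
    print(assemble_image(tiles, 2, 3) == image)  # True
```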

Standard Fuzzy C-Means Algorithm
The well-known fuzzy c-means (FCM) clustering algorithm was proposed by Dunn [8] and extended by Bezdek [9]. It is a clustering method that allows one pixel of the segmented image to belong to two or more clusters, each with a different membership degree between 0 and 1. The main goal of the FCM algorithm is to find the c cluster centers (centroids) of the data set X = {x_1, x_2, …, x_N} that minimize the objective function:

$$J_m(U, V) = \sum_{i=1}^{c} \sum_{j=1}^{N} u_{ij}^{m} \, d^{2}(v_i, x_j) \qquad (1)$$

Computers 2016, 5, 14
The membership matrix U = [u_ij] has the properties:

$$u_{ij} \in [0, 1], \qquad \sum_{i=1}^{c} u_{ij} = 1 \;\; \forall j, \qquad 0 < \sum_{j=1}^{N} u_{ij} < N \;\; \forall i \qquad (2)$$

where v_i is the centroid of cluster i; d(v_i, x_j) is the Euclidean distance between centroid v_i and data point x_j; m ∈ [1, ∞) is the fuzzification parameter, generally equal to 2; N is the number of data points; and c is the number of clusters, 2 ≤ c < N.

To reach a minimum of the dissimilarity function, two conditions must hold:

$$v_i = \frac{\sum_{j=1}^{N} u_{ij}^{m} \, x_j}{\sum_{j=1}^{N} u_{ij}^{m}} \qquad (3)$$

$$u_{ij} = \frac{1}{\sum_{k=1}^{c} \left( \frac{d(v_i, x_j)}{d(v_k, x_j)} \right)^{2/(m-1)}} \qquad (4)$$

The standard FCM classification is achieved according to the following algorithm stages, which are summarized in Figure 3.
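A minimal sequential FCM can be sketched for scalar data such as pixel grey levels. Like the experiments reported later, it starts from given initial class centers; it then alternates the membership and centroid conditions, evaluates the objective function with the current centers, and stops when |J_t − J_(t−1)| falls below a threshold E_th. The threshold value, the iteration cap, the crisp handling of a pixel that coincides exactly with a center, and the function names are assumptions of this sketch, not the authors' code.

```python
def memberships(x: float, centers: list, m: float) -> list:
    """Membership of the scalar x in each cluster, from the FCM membership condition."""
    d = [abs(x - v) for v in centers]
    if 0.0 in d:  # x coincides with a center: give it crisp membership (assumption)
        k = d.index(0.0)
        return [1.0 if i == k else 0.0 for i in range(len(centers))]
    p = 2.0 / (m - 1.0)
    return [1.0 / sum((di / dk) ** p for dk in d) for di in d]


def fcm(data: list, centers: list, m: float = 2.0,
        e_th: float = 1e-6, max_iter: int = 200) -> tuple:
    """Sequential FCM; returns (final centers, number of iterations)."""
    if m <= 1.0:
        raise ValueError("the membership update needs m > 1")
    centers = list(centers)
    c = len(centers)
    j_prev = None
    for t in range(1, max_iter + 1):
        u = [memberships(x, centers, m) for x in data]  # u[j][i]: pixel j, class i
        # Objective function J_t, evaluated with the current centers.
        j_t = sum(u[j][i] ** m * (x - centers[i]) ** 2
                  for j, x in enumerate(data) for i in range(c))
        # Centroid condition: weighted mean of the data with weights u^m.
        centers = [sum(u[j][i] ** m * x for j, x in enumerate(data))
                   / sum(u[j][i] ** m for j in range(len(data)))
                   for i in range(c)]
        if j_prev is not None and abs(j_t - j_prev) < e_th:
            return centers, t
        j_prev = j_t
    return centers, max_iter


if __name__ == "__main__":
    grey = [10, 11, 12, 50, 51, 52, 100, 101, 102]  # toy grey levels, three groups
    final, iters = fcm(grey, [0.0, 60.0, 120.0])
    print([round(v, 2) for v in final], iters)
```

Because the groups in the toy data are well separated, the centers settle near 11, 51, and 101 within a few iterations; note that m must exceed 1 for the membership formula to be defined.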

Distributed Fuzzy C-Means Algorithm
The distributed algorithm described in Figure 4 is implemented in the proposed model based on a multi-agent system. The fuzzy c-means program is implemented according to the SPMD architecture over a 2D mesh of size (me × ne). In this model, each AVPE(a) (a = 1 to NA, where NA = me × ne is the number of AVPEs) performs the fuzzy c-means program on its assigned elementary image and returns its elementary results to the mobile Team Leader agent. The latter computes the current global class centers and redistributes them. This process is repeated until the distributed algorithm converges. The DFCM program is performed in three global steps:
Step 1. Mobile Team Leader Agent Initialization. In this step the Team Leader agent is initialized with the input MRI image and the values of me and ne, which define the AVPE grid size (me, ne).

Step 2. Grid Construction. The Team Leader agent splits the input image into (me × ne) elementary images and deploys (me × ne) AVPEs. Each AVPE(a) is then initialized and migrates to its assigned node to perform the classification task on its elementary image.
Step 3. Fuzzy C-Means Classification.
• Each AVPE(a) encapsulates the task and loads its elementary image.
• For each iteration t {
1. The Team Leader agent sends the class centers to all the AVPEs.
2. Each AVPE(a) gets the class centers from the message and executes the local class determination task.

3. Each AVPE(a) returns its elementary classification results, which consist of the terms TE1(a,i), TE2(a,i), TE3(a), and Cardinal(a,i), where j runs over the p_a pixels of the elementary image of AVPE(a):
TE1(a,i) is the sum of (u^m × data) computed for each class center i:
$$\mathrm{TE1}(a,i) = \sum_{j=1}^{p_a} u_{ij}^{m} \, x_j \qquad (5)$$
TE2(a,i) is the sum of u^m computed for each class center i:
$$\mathrm{TE2}(a,i) = \sum_{j=1}^{p_a} u_{ij}^{m} \qquad (6)$$
TE3(a) is the sum of (u^m × distance²) computed over all classes:
$$\mathrm{TE3}(a) = \sum_{i=1}^{c} \sum_{j=1}^{p_a} u_{ij}^{m} \, d^{2}(v_i, x_j) \qquad (7)$$
Cardinal(a,i) is the sum of the pixel memberships for each class center i:
$$\mathrm{Cardinal}(a,i) = \sum_{j=1}^{p_a} u_{ij} \qquad (8)$$
and p_a is the number of pixels of the elementary image of AVPE(a).

4. The Team Leader agent performs three subtasks: assembling the elementary results, computing the new class centers, and computing the objective function J_t.
Assembling the elementary results: The Team Leader agent receives the elementary results (TE1(a,i), TE2(a,i), TE3(a), Cardinal(a,i)) from each AVPE(a) and assembles them into the global values GTE1(i), GTE2(i), GTE3, and GC(i), respectively:
$$\mathrm{GTE1}(i) = \sum_{a=1}^{NA} \mathrm{TE1}(a,i) \qquad (9)$$
$$\mathrm{GTE2}(i) = \sum_{a=1}^{NA} \mathrm{TE2}(a,i) \qquad (10)$$
$$\mathrm{GTE3} = \sum_{a=1}^{NA} \mathrm{TE3}(a) \qquad (11)$$
$$\mathrm{GC}(i) = \sum_{a=1}^{NA} \mathrm{Cardinal}(a,i) \qquad (12)$$
where GTE1(i), GTE2(i), GTE3, and GC(i) are the global values of TE1(a,i), TE2(a,i), TE3(a), and Cardinal(a,i), respectively, over the AVPEs of the grid.
Computing the global class centers: The Team Leader agent uses the global values GTE1(i) and GTE2(i) to compute the new class centers:
$$v_i = \frac{\mathrm{GTE1}(i)}{\mathrm{GTE2}(i)} \qquad (13)$$
Computing the objective function J_t: The Team Leader agent uses the global value GTE3 to compute the objective function:
$$J_t = \mathrm{GTE3} \qquad (14)$$

5. The Team Leader agent tests the convergence condition of the algorithm, |J_t − J_(t−1)| < E_th.
} // End of iteration t
• The Team Leader agent requests the segmented elementary image from each AVPE(a).
• Each AVPE(a) sends its segmented elementary image to the Team Leader agent.
• The Team Leader agent assembles the segmented elementary images and displays the segmented output image.
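The bookkeeping of Step 3 can be checked with a single-process simulation in which message exchange is replaced by ordinary function calls; it is a sketch, not the authors' JADE implementation. It assumes scalar grey-level pixels, flattened elementary images, an absolute threshold E_th, and J_t equal to the sum of TE3(a) over all AVPEs. `ElementaryResults`, `avpe_task`, `leader_assemble`, and `dfcm` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class ElementaryResults:
    """Terms returned by one AVPE(a) at iteration t."""
    te1: list   # TE1(a,i): sum over its pixels of u^m * x, per class i
    te2: list   # TE2(a,i): sum of u^m, per class i
    te3: float  # TE3(a): sum of u^m * d^2 over all classes
    card: list  # Cardinal(a,i): sum of memberships, per class i


def memberships(x, centers, m):
    """FCM membership of the scalar x in each class (crisp if x equals a center)."""
    d = [abs(x - v) for v in centers]
    if 0.0 in d:
        k = d.index(0.0)
        return [1.0 if i == k else 0.0 for i in range(len(centers))]
    p = 2.0 / (m - 1.0)
    return [1.0 / sum((di / dk) ** p for dk in d) for di in d]


def avpe_task(pixels, centers, m):
    """Local class determination on one elementary image (flattened grey levels)."""
    c = len(centers)
    r = ElementaryResults([0.0] * c, [0.0] * c, 0.0, [0.0] * c)
    for x in pixels:
        u = memberships(x, centers, m)
        for i in range(c):
            w = u[i] ** m
            r.te1[i] += w * x
            r.te2[i] += w
            r.te3 += w * (x - centers[i]) ** 2
            r.card[i] += u[i]
    return r


def leader_assemble(results, c):
    """Team Leader: global sums over the AVPEs, new centers, and J_t."""
    gte1 = [sum(r.te1[i] for r in results) for i in range(c)]
    gte2 = [sum(r.te2[i] for r in results) for i in range(c)]
    gte3 = sum(r.te3 for r in results)
    gc = [sum(r.card[i] for r in results) for i in range(c)]
    centers = [gte1[i] / gte2[i] for i in range(c)]
    return centers, gte3, gc  # J_t is taken to be GTE3 (assumption)


def dfcm(elementary_images, centers, m=2.0, e_th=1e-6, max_iter=200):
    """Sequential simulation of Step 3; returns (centers, GC, iterations)."""
    centers = list(centers)
    gc = [0.0] * len(centers)
    j_prev = None
    for t in range(1, max_iter + 1):
        # Items 1-3: centers are "sent" to every AVPE, which returns its terms.
        results = [avpe_task(px, centers, m) for px in elementary_images]
        # Item 4: assemble, compute the new centers and J_t.
        centers, j_t, gc = leader_assemble(results, len(centers))
        # Item 5: convergence test |J_t - J_(t-1)| < E_th.
        if j_prev is not None and abs(j_t - j_prev) < e_th:
            return centers, gc, t
        j_prev = j_t
    return centers, gc, max_iter


if __name__ == "__main__":
    pixels = [10, 11, 12, 50, 51, 52, 100, 101, 102]   # toy grey levels
    three_avpes = [pixels[:4], pixels[4:5], pixels[5:]]  # uneven split on purpose
    dist, gc, t = dfcm(three_avpes, [0.0, 60.0, 120.0])
    single, _, _ = dfcm([pixels], [0.0, 60.0, 120.0])  # one agent: plain FCM
    print([round(v, 3) for v in dist], t)
    print(max(abs(a - b) for a, b in zip(dist, single)))
```

Because TE1, TE2, TE3, and Cardinal are plain sums over pixels, the global values do not depend on how the pixels are divided among the AVPEs (up to floating-point rounding), so the distributed run ends at the same class centers as a one-agent run, in line with the FCM/DFCM agreement reported in Table 3. The fuzzy cardinalities GC(i) also sum to the total number of pixels, since each pixel's memberships sum to 1.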

Distributed Environment Communication Mechanisms
A distributed computing environment, as illustrated in Figure 5, shows how the proposed method can execute distributed programs thanks to several mobile agent capabilities: mobility, autonomy, adaptability, and asynchronous communication. To illustrate the main idea of this contribution, Figure 6 presents an example of a 2D mesh of size (4 × 4). Each AVPE in the computing grid has an agent message queue. When the Team Leader agent sends the computing data to its AVPE team, the messages are stored in their queues. Thus, the AVPEs and the Team Leader agent can perform their assigned tasks and communicate with each other without waiting for acknowledgments. Each AVPE(a) must check the data in its queue and manage both its computing time and its communication time. In [10], the authors implemented a distributed c-means method (DCM) in a distributed environment that includes an agent communication management mechanism. This mechanism ensures asynchronous communication between the agents, which significantly reduces the communication cost and improves HPC. This mechanism is therefore integrated into the proposed DFCM model to obtain a scalable and efficient DFCM method.
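The queue-per-agent idea can be mimicked with Python's standard `queue.Queue` and threads. This is only an analogy to the per-agent message queues of the mobile agent platform, not JADE code; the function names `avpe` and `broadcast_round` and the placeholder local task are hypothetical. The leader posts the centers to every AVPE queue and proceeds immediately, then consumes the replies in whatever order they arrive, so no acknowledgment round-trip is needed.

```python
import queue
import threading


def avpe(aid, inbox, leader_inbox, pixels):
    """Hypothetical worker loop: wait on its own queue, reply without acknowledgment."""
    while True:
        centers = inbox.get()  # blocks this worker only
        if centers is None:    # shutdown message
            return
        # Placeholder local task: squared distance of each pixel to its nearest center.
        local = sum(min((x - v) ** 2 for v in centers) for x in pixels)
        leader_inbox.put((aid, local))  # asynchronous reply to the Team Leader


def broadcast_round(elementary_images, centers):
    """One round: the leader posts centers to every queue, then collects replies."""
    leader_inbox = queue.Queue()
    inboxes = [queue.Queue() for _ in elementary_images]
    workers = [threading.Thread(target=avpe, args=(a, inboxes[a], leader_inbox, px))
               for a, px in enumerate(elementary_images)]
    for w in workers:
        w.start()
    for box in inboxes:
        box.put(list(centers))  # unbounded queue: put() returns at once, no ack awaited
    replies = {}
    while len(replies) < len(elementary_images):
        aid, value = leader_inbox.get()  # replies may arrive in any order
        replies[aid] = value
    for box in inboxes:
        box.put(None)
    for w in workers:
        w.join()
    return replies


if __name__ == "__main__":
    print(broadcast_round([[1, 2], [3], [4, 5, 6]], [0.0]))
```

The replies are keyed by agent identifier, so the leader does not depend on arrival order when it later assembles the elementary results.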

Cooperative Multi-Agent Middleware
JADE (Java Agent DEvelopment Framework) [1] is a Java-based middleware for developing distributed multi-agent systems (MAS). It supports asynchronous agent communication through ACL messages that follow the FIPA-ACL message specifications [1]. The cooperative DFCM model is implemented on a parallel and distributed virtual machine based on mobile agents, according to the architecture in Figure 7.
The classification of the MRI image is performed on this platform using this middleware, which creates the main components of the model:
(1) The host container: the second container started on the platform, after the main container. The mobile Team Leader agent is deployed in it to perform its tasks in the grid.
(2) The agent containers: the containers started on the platform to which the mobile team worker agents move to perform their tasks.


Implementation and Results
The proposed DFCM algorithm is implemented in this model for MRI medical image analysis. We chose two cerebral MRI images: a brain MRI image (Img1) in Figure 8 and an abnormal brain MRI image (Img2) in Figure 9. The image in Figure 8a is encapsulated in the Team Leader agent as the input image and split into elementary images, as in Figure 8b. At the end of the classification process, these images are segmented into c output images (Figure 8c-e) that correspond, respectively, to the grey matter, the cerebrospinal fluid, and the white matter. The same algorithm is applied to the second image (Figure 9a) to detect the abnormal region.
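One way to obtain the c output images is to assign each pixel to its class of maximum membership, a common FCM defuzzification rule that we assume here because the text does not state which rule is used. For m > 1 the FCM membership of a pixel decreases as its distance to a center grows, so the maximum-membership class is simply the nearest center. The function name `segment` is hypothetical; the example applies the final Img1 class centers (1.100, 97.667, 146.569) reported in the results to a toy 2 × 2 image.

```python
def segment(img, centers):
    """Return (label image, one binary mask per class) using nearest-center labels."""
    c = len(centers)
    labels = [[min(range(c), key=lambda i: abs(p - centers[i])) for p in row]
              for row in img]
    masks = [[[1 if lab == i else 0 for lab in row] for row in labels]
             for i in range(c)]
    return labels, masks


if __name__ == "__main__":
    toy = [[0, 100], [150, 98]]
    labels, masks = segment(toy, [1.100, 97.667, 146.569])
    print(labels)    # [[0, 1], [2, 1]]
    print(masks[1])  # [[0, 1], [0, 1]]
```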

(a) For the initial class centers (c1, c2, c3) = (1.1, 2.5, 3.8), Table 1 and Figure 10 clearly show the dynamic convergence of the algorithm to the final class centers (c1, c2, c3) = (1.100, 97.667, 146.569). The convergence is achieved after 13 iterations. […] Table 2 and Figure 13. (3) The DFCM classification time as a function of the number of agents involved in the classification, with the initial class centers (c1, c2, c3, c4, c5) = (1.5, 2.2, 3.8, 5.2, 8.6) for both Img1 and Img2. Figure 14 shows that from 16 agents onward the classification time of the two images reaches its minimum values: 108 ms for Img1 and 278 ms for Img2. Sixteen agents are therefore considered the appropriate number for classifying these images.
A detailed comparison between the FCM and DFCM methods is given in Table 3 for each image. Both methods converge to the same class center values. The classification time of the DFCM method equals that of FCM when one agent is used, and it decreases as agents are added, reaching its minimum from 16 AVPEs for both images. (4) The DFCM classification time as a function of the number of nodes in the computing grid, using 16 AVPEs for the two images Img1 and Img2. Figure 15 shows that, for both images, the classification time is reduced by about 78% with eight nodes compared with a single node. The corresponding classification data size for each node is illustrated in Figure 16. (5) The speedup S(DFCM), the relative speedup S_R(DFCM), and the efficiency of the DFCM classification method, compared with the sequential FCM method, are presented in Figures 17 and 18. Using 32 AVPEs, we obtain maximum relative speedups of 86.760% for Img1, which corresponds to a speedup of 7.55, and of 82.372% for Img2, which corresponds to a speedup of 5.67. The speedup S(DFCM) and the relative speedup S_R(DFCM) are listed in Table 4 and computed, respectively, by the following equations:

$$S(\mathrm{DFCM}) = \frac{T(\mathrm{FCM})}{T(\mathrm{DFCM})} \qquad (15)$$

$$S_R(\mathrm{DFCM}) = \frac{T(\mathrm{FCM}) - T(\mathrm{DFCM})}{T(\mathrm{FCM})} \times 100\% \qquad (16)$$

where T(FCM) is the classification time of the FCM method, which corresponds to one agent, and T(DFCM) is the classification time of the DFCM method with NA agents.
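The timing metrics can be sketched as small functions, assuming S(DFCM) = T(FCM)/T(DFCM) and S_R(DFCM) = (T(FCM) − T(DFCM))/T(FCM) × 100%. With these definitions the reported pairs are consistent: a speedup of 7.55 gives S_R ≈ 86.75% and a speedup of 5.67 gives S_R ≈ 82.36%, matching the reported 86.760% and 82.372% up to the rounding of the speedups. Efficiency is taken as the conventional S/NA, which is our assumption since its formula is not given. The function names are hypothetical.

```python
def speedup(t_fcm: float, t_dfcm: float) -> float:
    """S(DFCM) = T(FCM) / T(DFCM)."""
    return t_fcm / t_dfcm


def relative_speedup(t_fcm: float, t_dfcm: float) -> float:
    """S_R(DFCM) in percent: share of the one-agent time that is saved."""
    return (t_fcm - t_dfcm) / t_fcm * 100.0


def efficiency(t_fcm: float, t_dfcm: float, na: int) -> float:
    """Conventional parallel efficiency S / NA (assumed definition)."""
    return speedup(t_fcm, t_dfcm) / na


if __name__ == "__main__":
    # S_R depends only on the time ratio, so T(FCM) = S with T(DFCM) = 1 reproduces it.
    for s in (7.55, 5.67):
        print(s, round(relative_speedup(s, 1.0), 2))
```

Since S_R = (1 − 1/S) × 100%, the two metrics carry the same information; S_R saturates toward 100% while S keeps growing, which is why large speedups translate into only modest additional relative gains.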

Related Work
Several parallel versions of clustering algorithms have been developed for massively parallel computational models and have produced interesting clustering results. In [1], the authors proposed a parallel FCM implementation on a parallel architecture that divides the computation among the processors for image segmentation analysis. Their method substantially improves the performance and efficiency of image segmentation compared with the sequential implementation. In [11], the authors proposed a parallel FCM implementation for clustering large datasets, designed to run on a parallel SPMD architecture using MPI tools. Their implementation achieves ideal speedups compared with an existing parallel c-means implementation. Classification has become a central topic for many researchers in different fields: decision tree induction [12], fuzzy rule-based classifiers [13,14], neural networks [15,16], and clustering [1,17].
The various parallel methods differ from each other in the computational models on which they are implemented. Their implementations depend on the parallel computing strategies used [18], which present some challenges: the high cost of the machines, limitations on testing and validating new algorithms, and the large number of computational resources involved, which increases the communication cost between the processors in grid computing. The distributed method proposed in this contribution provides a scalable and efficient implementation on a computational model based on mobile agents. In [10], the authors improved the parallel c-means method [19] by using a distributed one.
Building on these works, the proposed DFCM method is implemented on a scalable and efficient mobile agent model. We consider the JADE middleware-based mobile agent model to be the simplest and most suitable solution for implementing this distributed method.

Figure 5. Sequence diagram for DFCM on a cooperative multi-agent model.

Figure 8. Classification results by the elaborated DFCM model implementation for Img1.


Figure 9. Classification results by the elaborated DFCM model implementation for Img2.


Figure 16. DFCM classification data depending on the number of nodes in the grid using 16 AVPEs: (a) for Img1; (b) for Img2.


Table 1. Different states of the distributed fuzzy c-means (DFCM) algorithm for Img1 classification starting from different class center initializations.

Table 2. Different states of the distributed fuzzy c-means (DFCM) algorithm for Img2 classification starting from class center initialization.

Table 3. FCM and DFCM method comparison for the classification of two images (Img1 and Img2).

Table 4. Speedup of the distributed fuzzy c-means (DFCM) method for Img1 and Img2.
