Robustness of Cloud Manufacturing System Based on Complex Network and Multi-Agent Simulation

Cloud manufacturing systems (CMSs) are networked, distributed and loosely coupled, so they face great uncertainty and risk. This paper combines the complex network model with multi-agent simulation in a novel approach to the robustness analysis of CMSs. Different evaluation metrics are chosen for the two models, and three different robustness attack strategies are proposed. To verify the effectiveness of the proposed method, a case study is then conducted on a cloud manufacturing project of a new energy vehicle. The results show that both the structural and process-based robustness of the system are lowest under the betweenness-based failure mode, indicating that resource nodes with large betweenness are most important to the robustness of the project. Therefore, the cloud manufacturing platform should focus on monitoring and managing these resources so that they can provide stable services. Under the individual server failure mode, system robustness varies greatly depending on which service provider fails: among the five service providers (S1–S5) in the experimental group, the failure of Server 1 leads to a sharp decline in robustness, while the failure of Server 2 has little impact. This indicates that the CMS can protect its robustness by identifying key servers and strengthening its supervision of them to prevent them from exiting the platform.


Introduction
In the era of Industry 4.0, advanced information technology such as cloud computing and the Internet of Things has brought profound changes to the manufacturing industry. Li et al. [1] conceptualized a new service-oriented networked manufacturing model known as "cloud manufacturing", which aggregates manufacturing resources and capabilities into the cloud platform, and fully realizes the sharing of manufacturing resources and capabilities through service integration [2]. This concept has received much attention from academia and enterprises, and a great amount of research has already been carried out on various aspects of cloud manufacturing, including its hierarchical structure [3,4], typical features [5], key technologies [5][6][7][8][9][10][11][12], operation modes [13][14][15][16] and service portfolio scheduling [17][18][19][20]. The cloud manufacturing system (CMS) is networked, distributed and loosely coupled; this creates great uncertainty and interference [21], which is an important issue that the CMS must face and solve. Zhang et al. [22] and Zhu et al. [23] argued that the development of cloud manufacturing is restricted by a lack of trust and security, and blockchain technology provides new ideas for overcoming such restrictions due to its reliability, tamper-proof nature, traceability and high transparency. Further, Laili et al. [24] stated that orders of different tasks affect the CMS. As such, the allocation and scheduling capability of the CMS when facing multiple tasks [25][26][27] is an important component when considering the robustness of the system. Wang et al. [28] studied the impact of service anomalies on the CMS, proposing a dynamic service composition reconfiguration model when anomalies occur. Liang et al. [29] stated that the complex demands of consumers and the changes in the and manufacturing intelligent simulation. 
Various other perspectives have also been studied, such as cloud service entity packaging [37,38], selection and scheduling [39][40][41], and trust and security issues [23]. However, despite its importance, simulation-based research on the robustness of the CMS remains rare.
This study proposes a robustness analysis method that combines complex networks with multi-agent simulation to analyze the robustness of the CMS from two perspectives: a static structure and dynamic process. The remainder of the paper is organized as follows. In Section 2, a multi-agent simulation model of the CMP is constructed. Here, the behavioral characteristics and models of several key agents in the CMP are given, and QoS is proposed as a robustness measure. Section 3 explores the robustness of the static topology of the cloud manufacturing network (CMN). Here, the complex network model of cloud manufacturing resources is established through both the order-task relationship and the task-resource relationship, and network efficiency and the largest connected subgraph are proposed as robustness measures. In Section 4, the robustness attack strategies are designed, where a degree-based resource failure mode (ID), betweenness-based resource failure mode (IB) and individual server-based resource failure mode are proposed. In Section 5, a case study of a cloud manufacturing project is presented, and its robustness is studied under different failure modes by combining the multi-agent simulation software Anylogic and Python 3.0 tools. Section 6 provides the conclusions and prospects of this paper.

Multi-Agent Simulation Model Construction
The CMS includes the cloud platform, cloud task, cloud resource, cloud message, order and other types of subjects, as well as two types of user role: cloud service providers and cloud demanders [3]. As shown in Figure 1, the CMP [1] broadly includes the following: (1) Cloud service providers unify various types of manufacturing equipment resources and manufacturing capability resources into the cloud platform, depositing them into the cloud resource pool through information transformation, resource sensing, resource access, unified modeling of cloud services and other technology. This bypasses the limitations of space and distance by enabling resources that are originally distributed across the world to be centrally managed and shared. (2) Cloud demanders submit service requirements (i.e., orders) to the cloud platform through terminal devices. Orders from multiple cloud demanders are uniformly stored in the cloud demand set, waiting to be processed. (3) According to the service route of the order to be processed, the cloud platform integrates and adapts different cloud tasks to form orderly and stable cloud task sequences. (4) When the cloud demand set is not empty, the platform imports each order into the corresponding cloud task sequence, in turn, to carry out cloud manufacturing services. When the cloud tasks are being processed, the corresponding resources are requested from the resource pool according to the task type. Resources in an idle state change to a busy state after being requested. After the task is completed, the resource is released and returned to an idle state.
The CMS contains multiple entity types, and various forms of information transmission and behavior interaction occur both among entities of the same type and between entities of different types. Therefore, the CMS model can be expressed as follows:
CMS = <PA, DA, SA, TA, RA, OA, MA, E> (1)
where PA is the cloud platform agent; DA is the cloud demander agent; SA is the cloud service agent; TA is the cloud task agent; RA is the cloud resource agent; OA is the order agent issued by the DA; MA is the message agent sent to the SA when the TA requests or releases resources; and E is the external environment of information transmission and behavior interaction among entities.

Cloud Platform Modeling
The cloud platform is the center of the CMS. The cloud demander sends orders to the cloud platform, and the platform assigns these orders to the corresponding tasks. Throughout the CMP, the platform records any processing successes and failures, and at the end of the service cycle, relevant performance indicators (e.g., service time, cost, reliability, order completion rate) are calculated. Additionally, the platform carries out a variety of roles, such as model parameter initialization and experimental parameter adjustment. The cloud platform agent can be expressed as follows:
PA = <tobeProcessedOrderList, finishedOrderList, failedOrderList, attackNum, Func_ini, Func_allocationOrders, Func_settingFaultStatus, Func_calcuQos, Func_outputNetwork> (2)
where tobeProcessedOrderList stores orders sent by each cloud demander that are to be processed; finishedOrderList records successfully completed orders; failedOrderList records failed orders; attackNum is the number of failed resources; Func_ini is used to initialize model parameters; Func_allocationOrders assigns orders to their corresponding cloud tasks for processing; Func_settingFaultStatus sets the specified resource of the specified server to the failure state according to the node failure mode; Func_calcuQos counts data related to service time, cost, reliability and the order completion rate at the end of simulation, then integrates these to calculate the QoS index; and Func_outputNetwork sorts the order-task relationship and task-resource relationship of the processed orders into a node-list form and outputs it, to be used to construct the complex network model.

Cloud Resource Modeling
Cloud resources are the virtual resources formed by integrating the manufacturing equipment resources and manufacturing capability resources of service providers into the cloud resource pool through information transformation, resource access, cloud service unified modeling and other technology. The main function of the cloud resource agent is to cooperate with cloud tasks to complete the processing of cloud orders:
RA = <ID, produceLevel, busy, broken, owner, price> (3)
where ID is the unique identification number of the resource; produceLevel is the productivity level of the resource, which is defined as an integer from 1 to 10; busy indicates whether the resource is in a busy state; broken indicates whether the resource is faulty; owner specifies which cloud server the resource belongs to; and price represents the cost of the resource, which is randomly generated from a normal distribution during model initialization.

Cloud Task Modeling
The construction of the cloud task agent is key to cloud manufacturing simulation modeling. It covers not only the processing path of all order types (e.g., serial, parallel, hybrid path) but also the behavior interaction and information transfer between the cloud server agent and the cloud resource agent. In addition, the cloud task agent formulates (a) the selection mechanism of the optimal service provider, (b) various statistical data, such as service cycle and cost, and (c) cloud task and cloud resource node information. To achieve this, existing process modeling library components are adapted accordingly. The cloud task agent can be expressed as follows:
TA = <ID, ownerOrders, pretaskList, aftertaskList, requeresourceList, basicWorkingTime, currentOrder, Func_selectBestServer, Func_selectBestResource, Func_recordRouteStamp, Func_recordTaskTime, Func_recordTaskCost, Func_recordTaskReliability> (4)
where ID is the unique identification number of the task; ownerOrders specifies which type of order processing path the task belongs to; pretaskList and aftertaskList specify the pre-order task and post-order task, respectively; requeresourceList specifies the type of resource requested by the task; basicWorkingTime specifies the standard time for completing the task; currentOrder represents the order currently being processed; Func_selectBestServer determines the optimal server based on resource price, logistics, distance and other factors; Func_selectBestResource determines the optimal resource; Func_recordRouteStamp records the order-task relationship and task-resource relationship for completed orders; and Func_recordTaskTime, Func_recordTaskCost and Func_recordTaskReliability record the service time, service cost and service reliability of the current task, respectively.
Figure 2 shows the detailed CMP simulation inside the cloud task agent, which is realized by editing and adapting the existing component codes from Anylogic's process modeling library. The details of this process are as follows:
(1) The order is imported into the internal process of the cloud task through the enter component. If the current task is first in the task sequence, the order is directly assigned by the cloud platform; otherwise, the order is assigned by the preceding task after its completion (e.g., task 2 orders are assigned from task 1 once task 1 has been completed).
(2) The queue component temporarily stores the current order while the following judgments are made: (a) if the current task is first in the task sequence or there is only one task in the preceding task sequence, the hold and hold1 components are simultaneously opened and the current order is entered into queue2 for subsequent processing; or (b) if there is more than one task in the preceding task sequence, the current order must wait until the orders of all preceding tasks have completed before entering queue2 for subsequent processing.
(3) The queue1 component merges the information of several branch orders, and the hold2 component ensures that only one order is entered for subsequent processing at a time. When the current order is completed and exits through exit, hold2 opens again and continues to serve the next order.
(4) The order enters queue3, where the task agent selects the optimal server and sends "resource request" information to it. When the optimal service provider accepts the request, the busy attribute corresponding to the optimal resource changes to "true", and the hold3 component opens. The order flows through the delay component to simulate the cloud manufacturing service. After a certain delay time, the service is completed.
(5) The order enters queue4 and sends the "release resource" message to the optimal server. When the optimal server accepts the message, the busy attribute corresponding to the optimal resource changes to "false", and the hold4 component opens. The order flows through the delay1 component. After a certain delay time, the release of the resource is completed.
(6) The order flows through the exit component to complete all its service processes in this task. It is then imported into the post-order task sequence of this task: (a) if there is only one post-order task, it is directly imported into the enter component of the post-order task; (b) if there are multiple post-order tasks, the information of the current order is copied and imported into the enter component of the respective post-order tasks; and (c) if there is no post-order task, this signifies that the task is already the final task in the task sequence. As such, the order is added to the set of completed orders, and information such as the service cycle, service cost and route record are counted and output.

Figure 2. Internal process of cloud task agent.

Cloud Order Modeling
The orders are submitted to the cloud platform by the cloud demanders through terminal devices. They are then imported to the corresponding cloud tasks according to their respective task routes to complete service processing. The cloud order agent is represented as follows:
OA = <ID, owner, taskList, routeStamp, cost1Accum, cost2Accum, cost3Accum, reliabilityAccum, startTime, finishTime> (5)
where ID is the order's unique identification number; owner specifies which demander the order is issued by; taskList specifies the complete task path corresponding to the order; routeStamp records the order-task relationship and task-resource relationship corresponding to the order when the order is completed; cost1Accum, cost2Accum and cost3Accum record the request resource cost, logistics cost and release resource cost of the order, respectively; reliabilityAccum records the completion reliability of orders; and startTime and finishTime record the start processing time and completion processing time of the order, respectively.

Cloud Message Modeling
When processing orders, cloud tasks need to send "request resource" information to the cloud server. After the processing is complete, a "release resource" message is sent to the cloud server. Since Anylogic's built-in message agent cannot carry extra information, this paper encapsulates a message agent type, which can be expressed as follows:
MA = <msg, resourceList, owner> (6)
where msg is the content of the message (i.e., when a resource is requested, the value of msg is "request", and when a resource is released, the value of msg is "release"); resourceList specifies which resources are to be requested and released; and owner indicates which cloud task sent the message.

Cloud Demander Modeling
The cloud demander issues demand orders to the cloud platform to drive the operation of the model. The cloud demander agent can be represented as follows:
DA = <ID, location, orderList, Func_sendOrders> (7)
where ID is the cloud demander's unique identification number; location is the latitude and longitude coordinates of the demander, which is used to initialize the location of the demander in the GIS map; orderList is used to initialize the orders issued by the demander; and Func_sendOrders sends the orders of cloud demanders to the cloud platform, where the cloud platform schedules and allocates the orders uniformly.

Cloud Server Modeling
Cloud servers mainly transfer information and interact with cloud tasks. All types of cloud resources are stored in the resource pool of each cloud server. When receiving the "request resource" message, the server finds the corresponding resource in its resource pool and allocates it to the cloud task. When receiving the "release resource" message, the server releases the corresponding cloud resource and puts it back into the cloud resource pool. The cloud server agent can be represented as:
SA = <ID, location, resourcePool, dScore, pScore, totalScore, Func_configureResource> (8)
where ID is the unique identification number of the cloud service provider; location is the latitude and longitude coordinates of the server, which is used to initialize the location of the server in the GIS map; resourcePool is used to store the respective virtual resources of the cloud server; dScore, pScore and totalScore are the respective distance score, price score and total score when the cloud task selects the optimal cloud server; and Func_configureResource is used to execute and allocate resources when receiving messages from cloud tasks. If the message's content is "release resource", the server finds the corresponding resource and changes its busy attribute to "false". If the message's content is "request resource", the broken attribute of the corresponding resource is judged as follows: (a) if its value is "true", this indicates that the resource is faulty and the cloud task cannot be completed; as such, the order requested for processing is classified as failed; or (b) if the value is "false", this indicates that the resource is not faulty. Here, its busy attribute is judged as follows: (i) if the busy attribute is "false", the processing of the cloud order can be started, and (ii) if the busy attribute is "true", this indicates that the resource is being invoked by another task, and the order needs to wait until that task is completed.
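The request and release handling described above can be illustrated with a minimal Python sketch. It is not the Anylogic implementation used in this paper: the class names, the single resource_id field (in place of resourceList) and the returned status strings are assumptions made only for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class Resource:
    """Subset of the cloud resource agent attributes needed here."""
    id: str
    busy: bool = False
    broken: bool = False


@dataclass
class Message:
    """Simplified message agent; msg is "request" or "release"."""
    msg: str
    resource_id: str  # assumption: one resource per message
    owner: str        # cloud task that sent the message


@dataclass
class Server:
    id: str
    resource_pool: Dict[str, Resource] = field(default_factory=dict)

    def configure_resource(self, message: Message) -> str:
        """Return a hypothetical status: released, failed, waiting or started."""
        resource = self.resource_pool[message.resource_id]
        if message.msg == "release":
            resource.busy = False
            return "released"
        if message.msg == "request":
            if resource.broken:
                return "failed"    # faulty resource: the order fails
            if resource.busy:
                return "waiting"   # resource is held by another task
            resource.busy = True
            return "started"       # processing of the order can begin
        raise ValueError(f"unknown message content: {message.msg}")
```

For example, a second request for a resource that is already busy returns "waiting", and a request for a resource whose broken attribute is true returns "failed".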

Robustness Measurement Index Based on Multi-Agent Simulation
Based on the multi-agent model and order task sequence stated in Section 2.1, the dynamic simulation of the CMP can be realized. At the end of the simulation, the order completion time, logistics transportation distance, resource occupation and other data can be output to evaluate the performance. QoS is commonly used to evaluate the CMP. As such, based on the combination of relevant literature and the simulation output data, this paper comprehensively evaluates the QoS value from four aspects: service time, service cost, service reliability and order completion rate [20,42,43]. The calculation formulae of these four indicators are as follows:
(1) Service time. This is the sum of the completion times of all orders within the simulation cycle, which can be expressed as
$$T = \sum_{j=1}^{m} t_j$$
where $m$ is the total number of orders; $j = 1, 2, \ldots, m$ is the $j$th order in the order sequence; and $t_j$ is the completion time of the $j$th order, which can be obtained from the simulation results.
(2) Service cost. The total cloud service cost is calculated from three aspects: the cloud resource service fee, logistics service fee and cloud resource release fee, which can be expressed as
$$C = \sum_{j=1}^{m} \sum_{i=1}^{n} \left( t^{\mathrm{serving}}_{i,j} \, p^{\mathrm{resource}}_{i,j} + d_{i,j} \, c^{\mathrm{logistic}} + t^{\mathrm{releasing}}_{i,j} \, p^{\mathrm{release}} \right)$$
where $m$ is the total number of orders; $n$ is the number of tasks corresponding to each order; $j = 1, 2, \ldots, m$ represents the $j$th order in the order sequence; $i = 1, 2, \ldots, n$ represents the $i$th task in the task sequence; $t^{\mathrm{serving}}_{i,j}$ is the cloud service time of the $i$th task in the $j$th order; $p^{\mathrm{resource}}_{i,j}$ is the service cost per unit time of the resource corresponding to the task; $d_{i,j}$ is the logistics distance corresponding to the task; $c^{\mathrm{logistic}}$ is the logistics cost per unit distance; $t^{\mathrm{releasing}}_{i,j}$ is the release time of the cloud resources for the task; and $p^{\mathrm{release}}$ is the cost per unit time of releasing resources.
(3) Service reliability. Service reliability is a multiplicative index [42], which can be expressed as
$$R = \prod_{j=1}^{m} \prod_{i=1}^{n} r_{i,j}$$
where $r_{i,j}$ is the service reliability of the $i$th task in the $j$th order, which is given in the reliabilityAccum attribute of the order agent.
(4) Order completion rate. The order completion rate is the ratio of the number of completed orders within the simulation cycle to the total number of orders planned to be completed:
$$O = \frac{N_1}{N_1 + N_2}$$
where $N_1$ is the number of orders completed within the simulation cycle and $N_2$ is the number of orders that failed to be completed.
In addition, the index values need to be standardized because the indicators have different dimensions. A series of robustness experiments will be carried out later in this paper, so range standardization is carried out with the index values of each experiment as samples. The calculation formulae are as follows:
$$T'_k = \frac{T_{\max} - T_k}{T_{\max} - T_{\min}}, \qquad C'_k = \frac{C_{\max} - C_k}{C_{\max} - C_{\min}}$$
where $T_k$ and $C_k$ are the respective service time and service cost of the $k$th experiment; $T_{\min}$ and $T_{\max}$ are the respective minimum and maximum values of service time; and $C_{\min}$ and $C_{\max}$ are the respective minimum and maximum values of service cost. The reliability and order completion rate indicators are originally in the range of [0, 1], so there is no need for standardization.
The QoS value can be evaluated by synthesizing the above four dimensions:
$$QoS_k = \omega_1 T'_k + \omega_2 C'_k + \omega_3 R_k + \omega_4 O_k$$
where $R_k$ and $O_k$ are the service reliability and order completion rate of the $k$th experiment; $\omega_1$, $\omega_2$, $\omega_3$ and $\omega_4$ are the respective weight coefficients of the four indicators; and $\sum_{i=1}^{4} \omega_i = 1$.
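As a rough illustration of how the four indicators could be combined, the following Python sketch computes a weighted QoS value per experiment. Several points are assumptions rather than facts from the case study: time and cost are treated as cost-type indicators (a smaller raw value maps to a larger standardized value), the weights are set equal, and all function names and sample numbers are hypothetical.

```python
from math import prod
from typing import List, Sequence


def range_standardize_cost_type(values: Sequence[float]) -> List[float]:
    """Map values to [0, 1]; the smallest value gets 1 (assumed direction)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [1.0 for _ in values]  # all experiments are identical
    return [(hi - v) / (hi - lo) for v in values]


def service_reliability(task_reliabilities: Sequence[float]) -> float:
    """Multiplicative reliability index over the given tasks."""
    return prod(task_reliabilities)


def completion_rate(n_completed: int, n_failed: int) -> float:
    """Completed orders divided by all orders planned to be completed."""
    total = n_completed + n_failed
    return n_completed / total if total else 0.0


def qos(t_std: float, c_std: float, reliability: float, rate: float,
        weights: Sequence[float] = (0.25, 0.25, 0.25, 0.25)) -> float:
    """Weighted sum of the four indicators; equal weights are an assumption."""
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    w1, w2, w3, w4 = weights
    return w1 * t_std + w2 * c_std + w3 * reliability + w4 * rate


# Hypothetical service times and costs of three experiments.
times_std = range_standardize_cost_type([100.0, 150.0, 200.0])
costs_std = range_standardize_cost_type([10.0, 30.0, 20.0])
first_qos = qos(times_std[0], costs_std[0], reliability=0.9, rate=1.0)
```

With these sample numbers, the first experiment has the shortest time and lowest cost, so both standardized values equal 1 and its QoS is 0.25 × (1 + 1 + 0.9 + 1) = 0.975.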

Construction of Cloud Manufacturing Complex Network Model
The cloud manufacturing network (CMN) is composed of cloud service resources and the connections between resources. Due to the large number of resources and complex connection relationships, the network can be analyzed using the complex network model. Figure 3a shows the processing task paths of Order-A and Order-B, the resources used by each task in these paths, and the corresponding relationships between resources and servers. If two tasks are connected on a path, the respective resources used by the two tasks are also considered to be connected. Figure 3b shows how the CMN is formed by taking all the resources as network nodes and the connections between resources as connected edges.
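The construction rule above (two resources are connected when the tasks that use them are adjacent on an order path) can be sketched in Python as follows. The example paths and the task-to-resource mapping are hypothetical and are not taken from Figure 3; the sketch also assumes that each task uses exactly one resource.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple


def build_cmn(task_edges: Iterable[Tuple[str, str]],
              task_resource: Dict[str, str]) -> Dict[str, Set[str]]:
    """Return an undirected adjacency map over resource nodes."""
    adjacency: Dict[str, Set[str]] = defaultdict(set)
    for resource in task_resource.values():
        adjacency[resource]  # register every resource, even if isolated
    for pre_task, post_task in task_edges:
        a, b = task_resource[pre_task], task_resource[post_task]
        if a != b:  # ignore self-loops when both tasks share a resource
            adjacency[a].add(b)
            adjacency[b].add(a)
    return dict(adjacency)


# Hypothetical data: Order-A follows t1 -> t2 -> t3, Order-B follows t1 -> t4.
edges = [("t1", "t2"), ("t2", "t3"), ("t1", "t4")]
used = {"t1": "r1-S1", "t2": "r2-S2", "t3": "r3-S1", "t4": "r2-S2"}
cmn = build_cmn(edges, used)
```

In this small example, r2-S2 is linked to both r1-S1 and r3-S1 because it serves tasks on both order paths.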

Robustness Measurement Index Based on Static Network Topology
It is generally considered that network robustness refers to the degree of network performance retention after the failure of network nodes or edges [44], and the change of the maximum connected subgraph after network node failure can reflect the degree of retention of the network's structural integrity. As such, the change rate of the maximum connected subgraph's node number is selected as one of the robustness evaluation indexes in this study:
$$S = \frac{N'}{N}$$
where $N'$ is the number of nodes in the maximum connected subgraph after the network is attacked, and $N$ is the total number of nodes in the original network. In particular, $S = 0$ indicates that the network is in an unconnected state, and $S = 1$ indicates that the network is fully connected and there is no isolated node. Additionally, the connections between the nodes change when a network node fails. This, in turn, affects the efficiency of information dissemination in the network. Therefore, network efficiency is used to evaluate the robustness of the network transfer efficiency when nodes are lost. The shorter the distance between two nodes in a network, the faster information can be transferred from one node to another. Based on this, the formula of network efficiency can be defined as:
$$G = \frac{1}{N(N-1)} \sum_{i \neq j} \frac{1}{d_{ij}}$$
where $N$ represents the total number of nodes in the network and $d_{ij}$ represents the length of the shortest path between node $i$ and node $j$. In particular, $G = 0$ indicates that the efficiency of the network is the worst, where the whole network consists of isolated nodes, and $G = 1$ indicates that the efficiency of the network is the best, where the information exchange between nodes is smooth.
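Both structural indexes can be computed with breadth-first search on an unweighted network. The following Python sketch is one possible implementation under stated assumptions: the network is an adjacency map, unreachable node pairs contribute zero to the efficiency, and the efficiency is normalized by the number of nodes in the network passed in. The function names are hypothetical.

```python
from collections import deque
from typing import Dict, Hashable, Set

Graph = Dict[Hashable, Set[Hashable]]


def bfs_distances(adj: Graph, source: Hashable) -> Dict[Hashable, int]:
    """Shortest hop counts from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        v = queue.popleft()
        for w in adj[v]:
            if w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
    return dist


def relative_largest_component(adj: Graph, n_original: int) -> float:
    """Size of the maximum connected subgraph divided by the original N."""
    best, seen = 0, set()
    for v in adj:
        if v not in seen:
            component = bfs_distances(adj, v)
            seen.update(component)
            best = max(best, len(component))
    return best / n_original


def network_efficiency(adj: Graph) -> float:
    """Mean of 1/d over ordered node pairs; unreachable pairs add 0."""
    n = len(adj)
    if n < 2:
        return 0.0
    total = sum(1.0 / d
                for s in adj
                for t, d in bfs_distances(adj, s).items()
                if t != s)
    return total / (n * (n - 1))
```

For a three-node path a-b-c, the index S is 1 and the efficiency is 5/6; after node b is removed, the two remaining nodes are isolated, so S drops to 1/3 and the efficiency to 0.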

Failure Mode Design for Robustness Analysis
The design of failure modes is key to robustness analysis. Based on the characteristics of cloud manufacturing, this paper proposes a topology-based resource failure mode and a server-based resource failure mode.
Topology-based resource failures are further divided into degree-based and betweenness-based resource failures, where (a) node degree (i.e., how closely a resource node is connected to other resource nodes in the CMN) is commonly used to measure a node's importance, and (b) node betweenness reflects the structural importance of the node [44,45], with a node with high betweenness having greater control over logistics and information flow in the network. The specific topology-based resource failure mode designs are shown in Table 1.

Table 1. Design of topology-based resource failure modes.
Failure Mode | Description | Failure Mode Calculation Process
Topology-based resource failure modes | Initial node degree loss (ID) | Sort the resource nodes in the initial network by degree, from largest to smallest. Remove one node at a time, and repeat n times until all nodes in the network are removed.
Topology-based resource failure modes | Initial node betweenness loss (IB) | Sort the resource nodes in the initial network by betweenness, from largest to smallest. Remove one node at a time, and repeat n times until all nodes in the network are removed.
Note: The removal of nodes is performed differently in the complex network model and the multi-agent model: (a) in the complex network model, it is performed by deleting the corresponding resource nodes and all edges connected to them, and (b) in the multi-agent model, it is represented by setting the corresponding resource agent to a "fault" state, that is, a state in which the resource cannot provide services.
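The ID and IB modes on the complex network model can be sketched in Python as follows. This is an illustrative sketch and not the code used in the case study: betweenness is computed with Brandes' algorithm for unweighted graphs, ties are broken by node name (an assumption), and the example network of two triangles joined by a bridge node is hypothetical.

```python
from collections import deque
from typing import Dict, List, Set

Graph = Dict[str, Set[str]]


def betweenness(adj: Graph) -> Dict[str, float]:
    """Brandes' algorithm; unnormalized, counted over ordered node pairs."""
    score = {v: 0.0 for v in adj}
    for s in adj:
        stack: List[str] = []
        preds = {v: [] for v in adj}
        sigma = {v: 0 for v in adj}
        sigma[s] = 1
        dist = {s: 0}
        queue = deque([s])
        while queue:
            v = queue.popleft()
            stack.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:  # v precedes w on a shortest path
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = {v: 0.0 for v in adj}
        while stack:
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                score[w] += delta[w]
    return score


def largest_component_size(adj: Graph) -> int:
    seen, best = set(), 0
    for start in adj:
        if start in seen:
            continue
        seen.add(start)
        queue, size = deque([start]), 0
        while queue:
            v = queue.popleft()
            size += 1
            for w in adj[v]:
                if w not in seen:
                    seen.add(w)
                    queue.append(w)
        best = max(best, size)
    return best


def initial_degree_order(adj: Graph) -> List[str]:
    return sorted(adj, key=lambda v: (-len(adj[v]), v))


def initial_betweenness_order(adj: Graph) -> List[str]:
    b = betweenness(adj)
    return sorted(adj, key=lambda v: (-b[v], v))


def attack_curve(adj: Graph, order: List[str]) -> List[float]:
    """Index S after each successive removal, starting with the intact network."""
    n, current = len(adj), adj
    curve = [largest_component_size(current) / n]
    for node in order:
        current = {v: nbrs - {node} for v, nbrs in current.items() if v != node}
        curve.append(largest_component_size(current) / n)
    return curve


# Hypothetical network: triangles a-b-c and e-f-g joined through node d.
example = {"a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b", "d"},
           "d": {"c", "e"}, "e": {"d", "f", "g"}, "f": {"e", "g"},
           "g": {"e", "f"}}
```

In this example, the degree ranking removes node c first, whereas the betweenness ranking removes the bridge node d first, which splits the network into two triangles after a single removal.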
Server-based resource failures fully consider the realistic scenario of cloud manufacturing, where resource nodes involved in cloud manufacturing belong to different cloud servers. The successive failure of resource nodes of the same cloud server can simulate the scenario where the cloud server gradually exits the platform and no longer provides resources. Key cloud servers can be identified by comparing the robustness indexes of different cloud servers after the loss of resources, and focused monitoring and management of these key servers can effectively ensure the robustness of the CMN. The specific server-based resource failure mode designs are shown in Table 2.

Table 2. Design of server-based resource failure modes.
Failure Mode | Description | Failure Mode Calculation Process
Server-based resource failure modes | Successive failure of Server-1's nodes (group S1) | Select the resource nodes belonging to server S1 in the CMN. Remove one node at a time, and repeat n times until all resource nodes belonging to server S1 in the network are removed.
Server-based resource failure modes | Successive failure of Server-2's nodes (group S2) | Select the resource nodes belonging to server S2 in the CMN. Remove one node at a time, and repeat n times until all resource nodes belonging to server S2 are removed.
··· | ··· | ···
Server-based resource failure modes | Successive failure of Server-n's nodes (group Sn) | Select the resource nodes belonging to server Sn in the CMN. Remove one node at a time, and repeat n times until all resource nodes belonging to server Sn are removed.
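On the multi-agent side, a server-based failure mode amounts to setting the resources of one server to the fault state one after another. The following Python sketch illustrates this under assumptions: the Resource class keeps only the attributes needed here, the failure order follows the list order of the resources, and the generator name is hypothetical. The simulation run that would take place between two successive failures is left out.

```python
from dataclasses import dataclass
from typing import Iterator, List


@dataclass
class Resource:
    id: str              # e.g., "r1-S1"
    owner: str           # cloud server that owns the resource, e.g., "S1"
    broken: bool = False


def successive_server_failure(resources: List[Resource],
                              server: str) -> Iterator[int]:
    """Break one resource of the given server per step; yield the attack count."""
    attack_num = 0
    for resource in resources:
        if resource.owner == server and not resource.broken:
            resource.broken = True
            attack_num += 1
            yield attack_num  # a simulation run could be triggered here
```

Iterating over successive_server_failure(resources, "S1") leaves the resources of all other servers untouched, so the robustness indexes recorded at each step reflect only the gradual exit of server S1.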

Description of Model Parameters
A case study is carried out using the cloud manufacturing project of a new energy vehicle. This project provides life-cycle cloud manufacturing services for new energy vehicles, where the vehicles served are equipped with technology such as electrification and autonomous driving.
The cloud manufacturing project includes 24 order types, 95 cloud tasks (t1-t95) and 72 resource types (r1-r72). The corresponding resource types for each cloud task are shown in Table 3, and the cloud task routes of each order type are shown in Table 4.
A total of 5 cloud servers (S1-S5) participate in the project, with each server providing 72 types of cloud resources. The cloud servers price their resources differently and are at different distances from the cloud demanders, so they compete for different orders. To distinguish between them, resource r1 of servers S1-S5 is labeled r1-S1, r1-S2, r1-S3, r1-S4 and r1-S5, respectively, and so on for the other resources.
In addition, there are 14 cloud demanders (d1-d14). Each cloud demander submits 24 orders, one of each of the 24 order types. The basic information of each cloud service provider and cloud demander is imported from Excel, as shown in Table 5. In this paper, the QoS weight coefficients are set to ω1 = 0.35, ω2 = 0.35, ω3 = 0.1 and ω4 = 0.2.
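As a rough illustration of how these weights enter the calculation, the sketch below assumes that the QoS index is a weighted sum of four indicators, each already normalized to [0, 1]. The QoS formula itself is defined earlier in the paper and is not restated in this section, so the function name `qos` and the positional treatment of the indicators are assumptions.

```python
# Weight coefficients from the case study (ω1..ω4); which indicator each
# weight belongs to is not stated in this section, so they are positional.
WEIGHTS = (0.35, 0.35, 0.1, 0.2)


def qos(indicators, weights=WEIGHTS):
    """Weighted sum of indicators, each assumed to be normalized to [0, 1]."""
    if len(indicators) != len(weights):
        raise ValueError("one indicator value is needed per weight")
    return sum(w * x for w, x in zip(weights, indicators))


# With every indicator at its best value, the index equals the sum of the
# weights, which is 1 up to floating-point rounding.
best_case = qos((1.0, 1.0, 1.0, 1.0))
```

Because the four weights sum to 1, an index built this way stays within [0, 1] whenever the indicators do.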

Structural Robustness Analysis
The cloud resource network is obtained according to the network model construction method stated in Section 3.1, as shown in Figure 4. Matlab-2020a software is used to perform data statistics on the network, and both the relevant network topology parameters and the degree distribution are obtained, as shown in Table 6 and Figure 5, respectively.

There are 231 resource nodes in the network, and the distribution of node degrees is seriously uneven. A few nodes occupy the majority of connected edges, indicating that the network has the typical characteristics of a scale-free network. The small density of the network indicates that it is a sparse network, with the nodes with higher degree values tending to connect the nodes with lower degree values.
Next, based on the failure modes designed in Section 4, Python 3.0 is used to simulate and calculate the changes of structural robustness indexes under the different failure modes.
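The kind of calculation described here can be sketched in plain Python. This is not the authors' script: it assumes an unweighted, undirected network stored as neighbor sets, uses the common definition of network efficiency as the mean of 1/d(i, j) over ordered node pairs, and computes betweenness with Brandes' algorithm. All function names and the toy graph are hypothetical. Following the failure mode design, nodes are ranked once on the initial network and then removed in that order.

```python
from collections import deque


def bfs_distances(adjacency, source):
    """Hop distances from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in adjacency[node]:
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist


def network_efficiency(adjacency):
    """Mean of 1/d(i, j) over ordered pairs; unreachable pairs contribute 0."""
    n = len(adjacency)
    if n < 2:
        return 0.0
    total = sum(1.0 / d for s in adjacency
                for d in bfs_distances(adjacency, s).values() if d > 0)
    return total / (n * (n - 1))


def betweenness(adjacency):
    """Unnormalized betweenness centrality (Brandes' algorithm)."""
    score = dict.fromkeys(adjacency, 0.0)
    for s in adjacency:
        order, preds = [], {v: [] for v in adjacency}
        sigma = dict.fromkeys(adjacency, 0.0)  # number of shortest paths from s
        sigma[s] = 1.0
        dist = {s: 0}
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adjacency[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = dict.fromkeys(adjacency, 0.0)
        for w in reversed(order):  # accumulate dependencies, farthest first
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1.0 + delta[w])
            if w != s:
                score[w] += delta[w]
    return {v: c / 2.0 for v, c in score.items()}  # undirected: halve the sum


def attack_curve(adjacency, ranking):
    """Remove nodes in the given order; record efficiency after each removal."""
    graph = {v: set(nb) for v, nb in adjacency.items()}
    curve = [network_efficiency(graph)]
    for node in ranking:
        for neighbor in graph.pop(node):
            graph[neighbor].discard(node)
        curve.append(network_efficiency(graph))
    return curve


# Toy graph (hypothetical): triangles a-b-c and e-f-g joined through node x.
adjacency = {
    "a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b", "x"},
    "x": {"c", "e"},
    "e": {"x", "f", "g"}, "f": {"e", "g"}, "g": {"e", "f"},
}
bc = betweenness(adjacency)
ib_ranking = sorted(adjacency, key=bc.get, reverse=True)  # betweenness-based order
id_ranking = sorted(adjacency, key=lambda v: len(adjacency[v]), reverse=True)  # degree-based order
ib_curve = attack_curve(adjacency, ib_ranking)
id_curve = attack_curve(adjacency, id_ranking)
```

In the toy graph the bridge node x has only two edges but lies on every path between the two triangles, so the two rankings start with different nodes. Plotting each curve against the fraction of removed nodes gives the kind of comparison reported for the two topology-based modes.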
The results for the topology-based failure modes are shown in Figure 6. They show that (a) both network efficiency and the maximum connected subgraph gradually decrease as the node failure ratio increases, and (b) the index values in the IB mode are always lower than those in the ID mode, which indicates that the CMN is more fragile in the IB mode. Further, in the IB mode, robustness declines precipitously when the failure ratio is around 0.05, whereas the decline in the ID mode is relatively steady. When the failure ratio reaches 0.4, the maximum connected subgraph in the IB mode drops below 50 and the network efficiency drops below 0.05, which indicates that the network is in a state of collapse. Thus, from the perspective of complex networks, resource nodes with large betweenness are more important for maintaining the structural robustness of the CMN. As such, protection should focus first on nodes with larger betweenness, followed by nodes with a larger degree.
The robustness analysis results under the cloud server failure mode are shown in Figure 7. They show that (a) network efficiency fluctuates during the failure of cloud server nodes without declining significantly, whereas the maximum connected subgraph decreases rapidly as the node failure ratio increases, which indicates that the maximum connected subgraph is more sensitive to cloud server failure; and (b) the maximum connected subgraph under the failure of S1 not only decreases the most among the 5 servers (from 231 to 161), but also declines significantly when the node failure ratio is in the [0.05, 0.1] interval. Further, the maximum connected subgraph under the failure of S3 decreases from 231 to 175, while those under the failure of S2, S4 and S5 all remain above 195. From the perspective of the complex network, the key cloud servers are therefore S1, followed by S3. These servers should be monitored and managed to protect the structural robustness of the CMN.

Process Robustness Analysis
Based on the process robustness measure and the failure modes, this paper uses the multi-agent simulation software AnyLogic together with Python 3.0 to explore how QoS varies under the different failure modes, as shown in Figure 8.

As shown in Figure 8a, QoS rapidly decreases with the increase of the node failure ratio in both the IB and ID modes. For both modes, the QoS values drop below 0.25 when the node failure ratio is about 0.4. However, in the IB mode, all cloud orders fail to be processed when the node failure ratio is about 0.65, whereas in the ID mode, this occurs only when the node failure ratio is about 0.95. This shows that the robustness of the CMP is more fragile in the IB mode. From the perspective of multi-agent simulation, nodes with large betweenness are more important for maintaining the robustness of the dynamic processes of cloud manufacturing.
As shown in Figure 8b, the QoS index of S1 decreases the most (from 1 to less than 0.3), followed by S3 (from 1 to 0.44). Further, the QoS indexes of S2, S4 and S5 are all above 0.5 after being attacked, which indicates that the resource failure of these servers has little impact on the robustness of the CMP. From the perspective of multi-agent simulation, S1 and S3 are the key cloud servers, which is consistent with the evaluation results from the complex network perspective stated in Section 5.2.

Conclusions
This paper proposes a novel approach for the robustness analysis of the CMS, combining complex network analysis with multi-agent simulation, which extends the robustness analysis object from the CMN to the CMP. First, a multi-agent simulation model is constructed. The behavioral characteristics and models of several key agents in the CMP are detailed, and QoS is proposed as a robustness measure. Second, a complex network model of cloud manufacturing resources is established through both the order-task relationship and the task-resource relationship to investigate the robustness of the static topology of the CMN. For this, network efficiency and the maximum connected subgraph are proposed as robustness measures. Three attack strategies are designed, namely resource failure modes based on degree, betweenness and individual server. To verify the feasibility and effectiveness of the proposed method, a case study is then conducted on a cloud manufacturing project of a new energy vehicle. The results show that the robustness of the system (for both the CMN and the CMP) is lowest under the betweenness-based resource failure mode. This indicates that resource nodes with large betweenness are most important to the structural robustness and process robustness of the project. As such, the cloud manufacturing platform should focus on monitoring and managing these cloud manufacturing resources so that they can provide stable services. Under the server-based failure mode, the robustness of the system varies greatly depending on which service provider fails (e.g., the failure of S1 leads to a sharp decline in robustness, but the failure of S2 has little impact). This indicates that the CMS can protect its robustness by identifying key servers and strengthening its supervision of them to prevent them from exiting the platform.
This paper primarily focuses on how various failure modes affect the performance of the CMS, and it proposes related robustness analysis methods and protection measures. Future research based on this established complex network and multi-agent simulation model will involve the design of corresponding recovery strategies and elasticity measures of the CMS. Simulation research will also be carried out to provide a quantitative and dynamic decision basis for the improvement of the robustness of the cloud manufacturing platform.

Data Availability Statement: The used and analyzed datasets during the present study are available from the corresponding author on reasonable request.