Article

A Multi-Task Service Composition Method Considering Inter-Task Fairness in Cloud Manufacturing

1 College of Traffic & Transportation, Chongqing Jiaotong University, Chongqing 400074, China
2 College of Computer and Artificial Intelligence, Chaohu University, Hefei 238024, China
* Author to whom correspondence should be addressed.
Symmetry 2026, 18(2), 238; https://doi.org/10.3390/sym18020238
Submission received: 10 December 2025 / Revised: 22 January 2026 / Accepted: 26 January 2026 / Published: 29 January 2026
(This article belongs to the Section Computer)

Abstract

Within the cloud manufacturing paradigm, Cloud Manufacturing Service Composition (CMSC) is a core technology for intelligent resource orchestration in Cloud Manufacturing Platforms (CMP). However, existing research faces critical limitations in real-world CMP operations: single-task-centric optimization ignores resource sharing/competition among coexisting manufacturing tasks (MTs), causing performance degradation and resource “starvation”; traditional heuristics require full re-execution for new scenarios, failing to support real-time online decision-making; single-agent reinforcement learning (RL) lacks mechanisms to balance global efficiency and inter-task fairness, suffering from inherent fairness defects. To address these challenges, this paper proposes a fairness-aware multi-task CMSC method based on Multi-Agent Reinforcement Learning (MARL) under the Centralized Training with Decentralized Execution (CTDE) framework, targeting the symmetry-breaking issue of uneven resource allocation among MTs and aiming to achieve symmetry restoration by restoring relative balance in resource acquisition. The method constructs a multi-task CMSC model that captures real-world resource sharing/competition among concurrent MTs, and integrates a centralized global coordination agent into the MARL framework (with independent task agents per MT) to dynamically regulate resource selection probabilities, overcoming single-agent fairness defects while preserving distributed autonomy. Additionally, a two-layer attention mechanism is introduced—task-level self-attention for intra-task subtask correlations and global state self-attention for critical resource features—enabling precise synergy between local task characteristics and global resource states. 
Experiments verify that the proposed method significantly enhances inter-task fairness while maintaining superior global Quality of Service (QoS), demonstrating its effectiveness in balancing efficiency and fairness for dynamic multi-task CMSC.

1. Introduction

Digital transformation in the manufacturing industry is driving its evolution towards a new service-centric, on-demand cloud manufacturing paradigm [1]. CMP leverages virtualization and servitization technologies to encapsulate geographically dispersed manufacturing resources into manufacturing services (MSs), enabling unified management and intelligent composition. This effectively addresses the challenges posed by rapid market changes and personalized customization [2]. Within this paradigm, the CMSC problem in CMP serves as a core technology: its goal is to select and integrate an optimal-performing MS chain from a massive set of candidate MSs, in accordance with the logical requirements of complex MTs. This technology directly determines the resource utilization efficiency and user satisfaction of CMP [3].
Despite remarkable progress in CMSC research, existing works exhibit notable limitations when confronted with the actual operational environments of CMP. Firstly, most studies focus on finding optimal service composition solutions (SCSs) for individual MTs [4,5]. However, in real-world CMP operations, multiple MTs of the same type often coexist within a specific period, sharing the CMP’s limited resources. This multi-task coexistence scenario triggers resource competition, which in turn leads to suboptimal overall performance or even resource “starvation” for some MTs when applying SCSs optimized for single MTs—an issue insufficiently addressed in existing research. Secondly, traditional heuristic optimization methods for CMSC face a critical practicality bottleneck in dynamic CMP scenarios [6,7]: such methods require full re-execution from scratch for each new scheduling scenario (e.g., new MT arrival, resource status update), rendering them incapable of supporting real-time online decision-making in on-demand manufacturing scenarios. Finally, existing studies generally prioritize the overall QoS of CMP while overlooking the fairness of QoS distribution among different MTs [8]. In long-term CMP operations, the continuous skewing of high-quality resources towards a small number of MTs will impair the experience and fairness perception of most users, thereby undermining the health and sustainability of the CMP ecosystem.
RL has been applied to task composition, task scheduling in CMP, and related fields [9,10,11]. Its unique advantage of “one-time training for reusable online deployment” effectively overcomes the poor reusability defect of traditional heuristic algorithms. However, to the best of our knowledge, existing RL-based studies still suffer from critical limitations that hinder their practical implementation: traditional single-agent RL adopts a sequential decision-making logic, which leads to inherent fairness flaws in inter-task resource allocation and makes it difficult to balance global efficiency and inter-task fairness. Collectively, these issues constitute the core bottlenecks in current research within this field.
To address the aforementioned issues, this paper proposes a multi-objective optimization and fairness-aware method for multi-task CMSC. The main contributions of this paper are as follows:
(1) It breaks through the limitations of single-task optimization frameworks by comprehensively considering the resource sharing and competition among multiple simultaneously arriving MTs. This design enables the constructed CMSC model to be more aligned with the actual operational scenarios of CMP.
(2) A centralized control agent is integrated into the MARL framework—where each task allocation process is mapped to an independent task agent—to monitor the global state in real time and dynamically regulate the resource selection probability of each task agent. The application of MARL under the CTDE framework facilitates collaborative decision-making among multiple task agents. This not only breaks through the fairness defects arising from sequential decision-making in single-agent-based inter-task resource allocation, thereby enhancing the equity of inter-task resource allocation, but also achieves the balanced optimization of global efficiency and inter-task fairness without compromising the autonomy of task agents in distributed decision-making.
(3) A two-layer attention mechanism is incorporated into the MARL framework: task-level self-attention is used to capture the correlation between subtasks within an individual task, while global state self-attention focuses on key global resource features. This mechanism enables precise synergy between local task features and global resource states and effectively improves the system’s decision-making accuracy in high-dimensional complex scenarios.
The remainder of this paper is organized as follows: Section 2 reviews relevant literature on CMSC, highlighting the research gaps addressed by this work. Section 3 presents the formalization of the problem, defining mathematical models for MT, MS, QoS, and fairness evaluation. Section 4 details the proposed improved MARL-based algorithm. Section 5 conducts experimental simulations to verify the proposed modeling framework’s effectiveness. Finally, Section 6 summarizes the full text, discusses the limitations of the current research, and proposes future research directions.

2. Related Work

As a core technology for the efficient allocation of MSs, the CMSC problem has emerged as a research hotspot in academia in recent years. Existing studies have explored the CMSC problem in depth from multiple dimensions, focusing primarily on model construction and optimization algorithm design.
Fundamentally, CMSC constitutes a complex multi-objective optimization problem. In their review, Li et al. [12] systematically delineated the complete CMSC process, identifying 11 high-frequency optimization objectives categorized by the three key stakeholders—service demanders, service providers, and CMP operators—and emphasized the significance of balancing interests across these multiple parties. As manufacturing environments grow increasingly complex, optimization objectives have expanded beyond the traditional metrics of cost and time [5] to encompass energy consumption [13], reliability [14,15,16], sustainability [17,18], and inter-service synergy [19,20]. Yin et al. [14] innovatively introduced the concepts of service matching degree and collaboration degree, establishing a dual-constraint multi-objective optimization model that better aligns with practical manufacturing requirements.
In practical operations, CMPs must handle multiple MTs simultaneously, rendering resource competition a critical issue. Wang et al. [21] developed a multi-objective optimization model (MTSC-MPC) that accounts for multi-task resource competition in a fog-manufacturing environment. For scenarios involving the concurrent arrival of multiple tasks, Zhu et al. [22] designed an optimization method based on bi-layer encoding, balancing exploration and exploitation through a dual-population differential evolution algorithm. Liu et al. [23] and Ping et al. [24] applied deep reinforcement learning (DRL) to multi-task scheduling and distributed robot service coordination, respectively. Hu et al. [25] modeled dynamic CMSC as an online packing problem and proposed several corresponding online strategies. While these studies address the coexistence of multiple tasks, they focus primarily on improving overall system efficiency or aggregated benefits, without explicitly modeling or optimizing the fairness of QoS distribution among individual MTs.
To tackle this NP-hard problem, various meta-heuristic algorithms have been widely adopted and refined. Early work by Amato et al. [4] on multi-cloud service aggregation systematically compared the performance of several multi-objective genetic algorithms (e.g., NSGA-II, SPEA2), laying a foundational framework for subsequent algorithmic research. Classic algorithms such as NSGA-II [26], MOPSO [5,27], and ant colony optimization [20,28] remain extensively utilized. To enhance performance, improved variants have been proposed: Yang et al. [13] developed EMOGWO, Zhang et al. [29] proposed EMOAHA, Li et al. [12] adopted an improved chaotic sparrow search algorithm, and Deng et al. [30] designed a multi-strategy improved Artificial Rabbit algorithm. These optimized algorithms effectively boost global search capability and convergence speed by integrating strategies such as chaotic mapping, Cauchy mutation, and Lévy flight. Hybrid algorithms have also demonstrated considerable potential: Jing et al. [31] proposed a multi-population co-evolutionary algorithm, Xiong et al. [26] integrated NSGA-II with the mayfly algorithm, and Jin et al. [15] combined teaching-learning-based optimization with the flower pollination algorithm. Chen et al. [32] introduced the Whale-Goshawk Optimization (WGO) algorithm to address CMSC under uncertain environments. Despite these advancements, meta-heuristic algorithms face a critical practical bottleneck in dynamic on-demand manufacturing: they typically require complete re-execution from scratch for each new scheduling event (e.g., the arrival of a new MT or real-time resource updates). This inherent limitation precludes real-time online decision-making, severely restricting their practical application in dynamic CMP environments.
In contrast, RL has exhibited substantial potential in CMSC and related fields [9,10,11,33,34,35], owing to its adaptive decision-making and experience-learning capabilities. Its “offline training, online deployment” paradigm overcomes the need for repeated computation for each scheduling event, providing a more flexible and real-time solution for dynamically evolving environments.
In the context of single-objective optimization and QoS-aware service composition, research has focused on dimensionality reduction and unbiased learning. Atmani et al. [9] proposed the ML-MCSC scheme, which integrates machine learning-based dimensionality reduction (Random Forest and t-SNE) with the Monte Carlo method to filter low-quality service candidates, thereby reducing the dimensionality of the service space. An unbiased on-policy Monte Carlo method is then employed to construct a Markov Decision Process (MDP) model, enhancing cumulative QoS rewards and execution efficiency. Wang et al. [10] introduced a Sparse-Reward Deep Reinforcement Learning (SDRL) method for the hybrid scheduling of manufacturing and computational tasks in cloud manufacturing. This approach mitigates sparse reward issues through an objective hindsight experience replay (HER) mechanism, employs a continuous action space to characterize hybrid decisions, and combines a twin deep deterministic policy gradient (twin DDPG) to improve decision accuracy. Zou et al. [33] focused on resource scheduling in microservice architectures, proposing an RL-based method with a multi-dimensional reward function (incorporating response time, throughput, and resource utilization) to enable real-time allocation adjustments under dynamic loads.
In multi-objective optimization and dynamic scheduling, RL research has emphasized balancing conflicting objectives and environmental adaptability. Boudour et al. [34] proposed a Multi-Objective Reinforcement Learning (MORL) framework for service selection in multi-cloud IoT environments, comparing three algorithms: MPMOQL, PCN, and Envelope. Using a reward vector encompassing energy consumption, cost, response time, and availability, they demonstrated the superiority of MPMOQL in terms of Pareto-front convergence, diversity, and efficiency. Xiao et al. [11] constructed a DRL framework for multi-task scheduling, systematically comparing Proximal Policy Optimization (PPO), Deep Deterministic Policy Gradient (DDPG), and Normalized Advantage Function (NAF) algorithms, and confirming PPO’s superiority in QoS balance, convergence speed, and robustness under dynamic service availability. Chen et al. [35] proposed a Gated Transformer Reinforcement Learning algorithm (GTrPO) for online simulation task scheduling, utilizing a multi-channel discrete-event framework to capture temporal task features and an attention memory module to model multi-step state correlations.
Despite these breakthroughs, key limitations in existing RL research impede its practical application. Traditional single-agent RL adheres to sequential decision-making logic and lacks global coordination mechanisms. This not only results in inherent fairness deficiencies in inter-task resource allocation—making it challenging to balance global efficiency with per-task fairness—but also fails to effectively integrate local task features (e.g., subtask correlations) with global resource states, leading to suboptimal decision accuracy in high-dimensional complex scenarios. Additional challenges include inefficient training due to sparse rewards [10], subjective weighting of multiple objectives [34], and insufficient adaptability to dynamic environments [35]. Collectively, these issues represent a core bottleneck, indicating the need for future work to optimize multi-agent collaboration, strengthen feature fusion, improve multi-objective optimization strategies, and enhance robustness in uncertain environments.
In summary, significant gaps persist in both the modeling and solution of the CMSC problem. At the modeling level, resource competition among concurrent MTs and the fairness of QoS distribution across individual tasks have not been integrated into a unified optimization framework. At the algorithmic level, meta-heuristic algorithms require full re-execution for dynamic events, precluding real-time decision-making, while single-agent RL suffers from inherent fairness defects. To bridge these gaps, this study aims to construct an integrated "resource competition—fair distribution" optimization model, achieving a synergistic balance among CMP operational efficiency, per-task QoS fairness, and high-dimensional decision accuracy. By addressing the combined challenges of model integration and algorithm adaptability, this research seeks to enhance the long-term competitiveness and user retention of CMPs, ultimately advancing the sustainable development of the cloud manufacturing ecosystem.

3. Problem Description and Formal Modeling

CMPs are consumer-centric and demand-driven digital platforms for information sharing and collaborative manufacturing, involving three types of stakeholders: service consumers, service providers, and the CMP itself. Service consumers can log in to the CMP to publish MTs based on their actual needs. Upon receiving an MT, the CMP first determines whether it is a simple MT that can be independently completed by a single service. If so, it matches a suitable MS based on the consumer’s specific preferences. For complex MTs, the CMP provides optimal decision-making solutions following the process of “task decomposition, service discovery and matching, service selection, and composition.” After confirmation by both the consumer and the service provider, the MT is initiated. During execution, the CMP is responsible for scheduling and monitoring the entire process until the MT is completed in a closed loop.
This paper aims to design a CMSC model for complex manufacturing environments: targeting the non-functional requirements of a given complex MT, comprehensively considering various constraints, selecting suitable MSs from the candidate MS set of each subtask, and constructing an optimal, controllable, and monitorable composite MS chain. The specific process is illustrated in Figure 1.
To formalize the CMSC problem with multiple parallel MTs on the CMP, we first introduce the following parameters:
(1) n: the number of parallel complex MTs.
(2) m: the number of subtasks decomposed from each MT.
(3) $C_j$ ($1 \le j \le m$): the candidate MS set that can execute the j-th subtask.
(4) $M_j = |C_j|$: the number of MSs in $C_j$.
(5) $Q(i,j)$: the QoS values of MS i for the j-th subtask, covering multiple indicators: the cost indicator $\mathrm{cost}(i,j)$, time indicator $\mathrm{time}(i,j)$, quality indicator $\mathrm{quality}(i,j)$, reliability indicator $\mathrm{reliability}(i,j)$, and availability indicator $\mathrm{availability}(i,j)$.
(6) $La(i)$: the maximum number of subtask instances that MS i can process simultaneously.
Further, we define the decision variable $x_{r,j,i} \in \{0, 1\}$ (where $1 \le r \le n$, $1 \le j \le m$, $i \in C_j$):
(1) $x_{r,j,i} = 1$ indicates that the j-th subtask of MT r is assigned to MS i.
(2) $x_{r,j,i} = 0$ indicates that the j-th subtask of MT r is not assigned to MS i.
Next, we specify the constraint conditions for the problem:
(1) Each subtask must be assigned exactly one MS:
$$\sum_{i \in C_j} x_{r,j,i} = 1, \quad \forall r \in [1, n],\ \forall j \in [1, m].$$
(2) The number of MTs assigned to each MS cannot exceed its load capacity:
$$\sum_{r=1}^{n} \sum_{j=1}^{m} x_{r,j,i} \le La(i), \quad \forall i \in \bigcup_{j=1}^{m} C_j.$$
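As an illustration, the two constraints can be checked programmatically for a candidate assignment. The sketch below assumes, for simplicity, that every subtask draws from one shared pool of M candidate MSs, so the assignment is a binary tensor of shape (n, m, M); the function and variable names are ours, not the paper's.

```python
import numpy as np

def check_constraints(x, load_capacity):
    """Verify the two feasibility constraints for an assignment tensor.

    x[r, j, i] is 1 if subtask j of MT r is assigned to MS i, else 0.
    load_capacity[i] is La(i), the maximum concurrent load of MS i.
    (Here every subtask shares one global MS index set for simplicity.)
    """
    # Constraint (1): each subtask of each MT gets exactly one MS.
    one_ms_each = np.all(x.sum(axis=2) == 1)
    # Constraint (2): no MS exceeds its load capacity La(i).
    within_load = np.all(x.sum(axis=(0, 1)) <= load_capacity)
    return bool(one_ms_each and within_load)

# Two MTs, two subtasks each, three candidate MSs, each with capacity 2.
x = np.zeros((2, 2, 3), dtype=int)
x[0, 0, 0] = x[0, 1, 1] = x[1, 0, 2] = x[1, 1, 1] = 1
print(check_constraints(x, np.array([2, 2, 2])))  # True
```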
Subsequently, we formulate the objective functions for the problem:
(1) Efficiency objectives:
$$\text{Minimize } f_1(X) = \frac{1}{n \cdot m} \sum_{r=1}^{n} \sum_{j=1}^{m} \sum_{i \in C_j} x_{r,j,i} \cdot \mathrm{cost}(i,j),$$
$$\text{Minimize } f_2(X) = \frac{1}{n \cdot m} \sum_{r=1}^{n} \sum_{j=1}^{m} \sum_{i \in C_j} x_{r,j,i} \cdot \mathrm{time}(i,j),$$
$$\text{Maximize } f_3(X) = \frac{1}{n \cdot m} \sum_{r=1}^{n} \sum_{j=1}^{m} \sum_{i \in C_j} x_{r,j,i} \cdot \mathrm{quality}(i,j),$$
$$\text{Maximize } f_4(X) = \left( \prod_{r=1}^{n} \prod_{j=1}^{m} \prod_{i \in C_j} \mathrm{reliability}(i,j)^{x_{r,j,i}} \right)^{\frac{1}{n \cdot m}},$$
$$\text{Maximize } f_5(X) = \left( \prod_{r=1}^{n} \prod_{j=1}^{m} \prod_{i \in C_j} \mathrm{availability}(i,j)^{x_{r,j,i}} \right)^{\frac{1}{n \cdot m}},$$
where $f_4$ and $f_5$ take the geometric mean (exponent $\frac{1}{n \cdot m}$), consistent with the geometric averaging applied to reliability and availability in the reward design.
(2) Minimize the QoS Discrepancy Among MTs.
Let the QoS vector of MT r be $q_r = (q_r^1, q_r^2, q_r^3, q_r^4, q_r^5)$, where each component corresponds to a normalized QoS indicator: cost and time are converted to benefit-type indicators via transformation, while quality, reliability, and availability are inherently benefit-type. After normalization, all indicators are unified to the same direction (larger values are better).
To measure the overall discrepancy between MT QoS vectors, we adopt a fairness function based on the average Euclidean distance. The fairness objective is defined as minimizing the average pairwise Euclidean distance among all MTs:
$$\text{Minimize } f_6(X) = \frac{1}{n(n-1)} \sum_{r=1}^{n} \sum_{s=r+1}^{n} \| q_r - q_s \|,$$
where $\|\cdot\|$ denotes the Euclidean norm, and $q_r$ and $q_s$ are the QoS vectors of MTs r and s, respectively.
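A minimal sketch of this fairness measure, assuming the normalized QoS vectors are stacked into an (n, 5) array and keeping the paper's $\frac{1}{n(n-1)}$ coefficient; the helper name is ours:

```python
import numpy as np
from itertools import combinations

def fairness_objective(q):
    """Average pairwise Euclidean distance between the normalized QoS
    vectors of n MTs (smaller = more even QoS distribution across MTs).

    q is an (n, 5) array; the 1/(n(n-1)) coefficient follows the paper's f6.
    """
    n = q.shape[0]
    total = sum(np.linalg.norm(q[r] - q[s])
                for r, s in combinations(range(n), 2))
    return total / (n * (n - 1))

# Three MTs: two get identical QoS, one is starved of good resources.
q = np.array([[0.9, 0.8, 0.7, 0.9, 0.8],
              [0.5, 0.4, 0.6, 0.5, 0.4],
              [0.9, 0.8, 0.7, 0.9, 0.8]])
print(round(fairness_objective(q), 4))
```

A perfectly even allocation (identical QoS vectors for all MTs) drives this objective to zero.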

4. MARL-Based Solution Methodology

For the above-mentioned multi-task CMSC problem, the solution space is $\prod_{j=1}^{m} (M_j)^n$, which expands exponentially as the number of MTs n and the number of subtasks m increase, making it a typical NP-hard problem. To ensure the existence of feasible solutions, the following resource sufficiency conditions must be satisfied: the size of the candidate MS set for each subtask satisfies $M_j \ge n$; the load capacity of each MS i satisfies $La(i) \ge 1$; and the total load capacity of all MSs satisfies $\sum_{j=1}^{m} \sum_{i \in C_j} La(i) \ge n \times m$.
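The exponential growth of the solution space is easy to see numerically (the helper name is ours):

```python
import math

def solution_space_size(candidate_counts, n):
    """|S| = (prod_j M_j)^n for n parallel MTs, where candidate_counts[j]
    is M_j, the number of candidate MSs for subtask j."""
    return math.prod(candidate_counts) ** n

# 3 subtasks with 10 candidates each and 5 parallel MTs already yield
# 1000^5 = 10^15 possible composite solutions.
print(solution_space_size([10, 10, 10], 5))  # 1000000000000000
```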
To model the multi-task CMSC problem, we define it as a Multi-Agent POMDP (MAPOMDP) with the tuple $\langle N_{task}, N_{global}, O, A, P(\cdot \mid \cdot), s_t, a_t, R, \Pi, \gamma \rangle$.
Here:
  • $N_{task} = n$ denotes the number of task-specific agents (one agent is assigned to each MT, where n is the total number of MTs in the CMSC scenario);
  • $N_{global} = 1$ is a single global coordinator agent (responsible for global fairness regulation and inter-agent coordination across all task-specific agents);
  • $O$ represents the observation space of the multi-agent system, encompassing the local observation spaces of all task-specific agents and the global observation space of the coordinator agent;
  • $o_{task,t}^r$ denotes the local observation of the r-th task-specific agent at time t, which covers the subtask assignment progress of the corresponding MT, the QoS performance of selected MSs, and the resource occupancy ratio of the task;
  • $o_{global,t}$ denotes the global observation of the coordinator agent at time t, which integrates the QoS and load states of all candidate MSs, the reward and resource occupancy of all MTs, and the overall resource utilization rate of the CMP;
  • $A$ represents the action space of the multi-agent system, where the global coordinator agent does not directly output discrete actions but generates coordination signals to modulate the policy of each task-specific agent;
  • $A_{task}^r = \{0, 1, \ldots, M_j - 1\}$ denotes the discrete action space of the r-th task-specific agent, where each action corresponds to the selection of a candidate MS for an unassigned subtask j of the r-th MT, and $M_j$ is the number of candidate MSs for subtask j;
  • $s_t \in S$ denotes the global state of the system at time t, which contains the complete information of all candidate MSs and all MTs;
  • $a_t = \{a_{task}^1, a_{task}^2, \ldots, a_{task}^n\}$ denotes the joint action of all task-specific agents at time t, where each element $a_{task}^r$ corresponds to the MS selection decision for the unassigned subtasks of the r-th MT;
  • $R$ represents the immediate reward function of the multi-agent system, which outputs a local reward for each task-specific agent and a global reward for the coordinator agent;
  • $\Pi = \{\pi_{task}^1, \pi_{task}^2, \ldots, \pi_{task}^n, \pi_{coord}\}$ denotes the policy set of the multi-agent system;
  • $\pi_{task}^r$ denotes the policy of the r-th task-specific agent, which takes the local observation and the coordination signal from the global coordinator as inputs to predict the probability distribution of MS selection actions;
  • $\pi_{coord}$ denotes the policy of the global coordinator agent, which generates the coordination signal based on the global observation to regulate the action distribution of task-specific agents;
  • $\gamma \in [0, 1]$ is the discount factor of the immediate reward, which balances the importance of immediate rewards and future cumulative rewards;
  • $P(o \mid s)$ is the observation distribution function, which generates the local observation of each task-specific agent and the global observation of the coordinator agent based on the global state.
The optimization objective is to maximize the cumulative discounted reward:
$$J(\pi) = \mathbb{E}_{\tau \sim \pi}\left[ \sum_{t=0}^{T} \gamma^t \left( \frac{1}{n} \sum_{r=1}^{n} R_{task}^r(s_t, a_t) + \lambda R_{global}(s_t, a_t) \right) \right],$$
where $\tau$ denotes the trajectory of the multi-agent system during an episode, T is the maximum number of steps to complete the assignment of all subtasks across all MTs, $R_{task}^r$ is the local reward of the r-th task-specific agent, $R_{global}$ is the global reward of the coordinator agent, and $\lambda \in [0, 1]$ is the weight coefficient that balances the local QoS utility of individual MTs against the global fairness and resource efficiency of the CMP.
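For a concrete sense of this objective, the sketch below evaluates the bracketed return for one recorded trajectory, assuming per-step lists of local agent rewards and coordinator rewards are available; the function name and sample values are illustrative only:

```python
def discounted_objective(task_rewards, global_rewards, gamma=0.99, lam=0.5):
    """Cumulative discounted reward J for a single trajectory.

    task_rewards[t] is the list of per-agent local rewards R_task^r at step t;
    global_rewards[t] is the coordinator reward R_global at step t;
    lam is the local/global balance coefficient (lambda in the paper).
    """
    total = 0.0
    for t, (locals_t, glob_t) in enumerate(zip(task_rewards, global_rewards)):
        mean_local = sum(locals_t) / len(locals_t)   # (1/n) * sum_r R_task^r
        total += gamma ** t * (mean_local + lam * glob_t)
    return total

# Two steps, two task agents (values are illustrative only).
print(discounted_objective([[0.6, 0.4], [0.8, 0.7]], [0.9, 0.95]))
```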
Considering the efficiency, stability, and adaptability of the MAPPO algorithm [9,36] in multi-agent environments, we use the MAPPO algorithm under the CTDE framework to solve the above composition model. This framework includes three types of networks:

4.1. Task Actor Network

Each task agent has an independent policy network $\pi_{\theta_r}(a_{task}^r \mid o_{task}^r, Z_t)$, which is responsible for generating the resource selection distribution based on local observations and coordination signals from the centralized control agent. The policy network incorporates task-level self-attention to perceive subtask correlations within a single MT:
$$\pi_{\theta_r}(a_{task}^r \mid o_{task}^r, Z_t) = \mathrm{Softmax}\big(\mathrm{MLP}\big(\mathrm{Attn}_{task}(H^r) \oplus Z_t\big)\big),$$
where $H^r = \mathrm{Feat}(o_{task}^r)$ denotes the feature extraction function for local observations, $\mathrm{Attn}_{task}(\cdot)$ denotes task-level self-attention, and $\oplus$ denotes feature concatenation. The task-level self-attention is defined as:
$$\mathrm{Head}_i(H^r) = \mathrm{Softmax}\left( \frac{H^r W_i^Q (H^r W_i^K)^T}{\sqrt{d_k}} \right) H^r W_i^V,$$
$$\mathrm{Attn}_{task}(H^r) = \mathrm{Concat}\big(\mathrm{Head}_1(H^r), \ldots, \mathrm{Head}_k(H^r)\big) W^O,$$
where $H^r = [h_r^1, h_r^2, \ldots, h_r^m]$ denotes the feature matrix of subtasks for the r-th MT, $W_i^Q$, $W_i^K$, and $W_i^V$ are the projection matrices for query, key, and value, k is the number of attention heads, $d_k$ is the dimension of each attention head, and $W^O$ is the output projection matrix.
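The two equations above are standard multi-head scaled dot-product self-attention; a NumPy sketch (with our own variable names and random weights standing in for learned projections) makes the shapes concrete:

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)   # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def multi_head_self_attention(H, Wq, Wk, Wv, Wo, k):
    """Task-level self-attention over the subtask feature matrix H (m x d).

    Wq, Wk, Wv are lists of k per-head projection matrices (d x d_k);
    Wo is the (k*d_k x d) output projection. Shapes follow the equations above.
    """
    heads = []
    for i in range(k):
        Q, K, V = H @ Wq[i], H @ Wk[i], H @ Wv[i]
        d_k = Q.shape[-1]
        scores = softmax(Q @ K.T / np.sqrt(d_k))   # (m x m) subtask correlations
        heads.append(scores @ V)                   # (m x d_k) per-head output
    return np.concatenate(heads, axis=-1) @ Wo     # (m x d) attended features

rng = np.random.default_rng(0)
m, d, k, d_k = 4, 8, 2, 4
H = rng.normal(size=(m, d))
Wq = [rng.normal(size=(d, d_k)) for _ in range(k)]
Wk = [rng.normal(size=(d, d_k)) for _ in range(k)]
Wv = [rng.normal(size=(d, d_k)) for _ in range(k)]
Wo = rng.normal(size=(k * d_k, d))
print(multi_head_self_attention(H, Wq, Wk, Wv, Wo, k).shape)  # (4, 8)
```

The same routine applied to the global feature matrix of all candidate MSs yields the global state self-attention of Section 4.2.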

4.2. Centralized Control Network

The centralized control agent uses a dedicated network $\pi_\phi(Z_t \mid o_{global,t})$ that monitors the global state in real time and generates coordination signals to dynamically regulate the resource selection probability of each task agent. This network incorporates global state self-attention to focus on critical resource features:
$$Z_t = \mathrm{MLP}\big(\mathrm{Attn}_{global}(H_{global})\big),$$
where $\mathrm{Attn}_{global}(\cdot)$ denotes global state self-attention (formulated identically to the task-level self-attention but applied to the global feature matrix $H_{global} = [h_1, h_2, \ldots, h_M]$ of all candidate MSs), enabling the centralized control agent to capture key resource-state correlations and avoid inter-task fairness defects caused by single-agent sequential selection.

4.3. Critic Network

A shared value network $V_\phi(s_t)$, which evaluates the value of joint actions based on the global state (consistent with CTDE’s centralized evaluation mechanism).
The objective functions of the Task Actor network, Centralized Control network, and Critic network are as follows:
  • Task Actor Loss:
$$L_{Actor}(\theta_r) = \mathbb{E}_\tau\left[ \min\left( \frac{\pi_{\theta_r}(a_{task}^r \mid o_{task}^r, Z_t)}{\pi_{\theta_r}^{old}(a_{task}^r \mid o_{task}^r, Z_t)} \hat{A}_t^r,\ \mathrm{clip}\left( \frac{\pi_{\theta_r}}{\pi_{\theta_r}^{old}},\ 1 - \epsilon,\ 1 + \epsilon \right) \hat{A}_t^r \right) \right] - \beta H\big(\pi_{\theta_r}(\cdot \mid o_{task}^r, Z_t)\big),$$
  • Centralized Control Loss:
$$L_{Control}(\phi) = \mathbb{E}_\tau\left[ \| Z_t - Z_t^* \|_2^2 \right],$$
  • Critic Loss:
$$L_{Critic}(\phi) = \mathbb{E}_\tau\left[ \big( V_\phi(s_t) - \hat{R}_t \big)^2 \right],$$
where $\theta_r$ and $\phi$ are the parameters of the r-th Task Actor network and the shared networks (Centralized Control + Critic), respectively; $H(\pi_{\theta_r}(\cdot \mid o_{task}^r, Z_t))$ denotes the policy entropy (measuring the randomness of the resource selection distribution); $\beta$ is the regularization coefficient of the policy entropy (an inherent MAPPO hyperparameter for training stability, unrelated to multi-objective weighting); $\epsilon$ is the PPO clipping parameter; and $\hat{A}_t^r$ is the advantage function for the r-th task agent, calculated by Generalized Advantage Estimation (GAE):
$$\hat{A}_t^r = \sum_{l=0}^{\infty} (\gamma \lambda_{GAE})^l \delta_{t+l}^r,$$
where $\delta_{t+l}^r$ is the temporal-difference residual of the r-th agent, and $\lambda_{GAE}$ is the GAE parameter (set to 0.95) [36], which balances the bias-variance trade-off in advantage estimation: a higher $\lambda_{GAE}$ reduces bias but increases variance, while a lower $\lambda_{GAE}$ has the opposite effect. $Z_t^*$ is the optimal coordination signal derived from global fairness constraints. Future work will systematically analyze the sensitivity of $\lambda_{GAE}$ across the range $[0.8, 0.99]$ to optimize performance for different scenario scales.
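In practice, the GAE sum above is computed with a backward recursion, since $\hat{A}_t = \delta_t + \gamma \lambda_{GAE} \hat{A}_{t+1}$. A minimal sketch, assuming a recorded reward sequence and value estimates with one bootstrap value appended (the TD residual is $\delta_t = r_t + \gamma V(s_{t+1}) - V(s_t)$); names are ours:

```python
def gae_advantages(rewards, values, gamma=0.99, lam_gae=0.95):
    """Generalized Advantage Estimation for one agent's trajectory.

    rewards[t] is the reward at step t; values has length T+1 (a bootstrap
    value is appended). Returns the advantage estimate A_hat_t per step.
    """
    T = len(rewards)
    advantages = [0.0] * T
    gae = 0.0
    for t in reversed(range(T)):
        # TD residual: delta_t = r_t + gamma * V(s_{t+1}) - V(s_t)
        delta = rewards[t] + gamma * values[t + 1] - values[t]
        gae = delta + gamma * lam_gae * gae   # recursive form of the GAE sum
        advantages[t] = gae
    return advantages

# With gamma = lam_gae = 1 this reduces to Monte Carlo returns minus V(s_t).
print(gae_advantages([1.0, 1.0], [0.5, 0.5, 0.0], gamma=1.0, lam_gae=1.0))
```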
After clarifying the MAPPO algorithm architecture based on the CTDE framework, we further design the global state, observation space, action space, reward function, state transition, and termination conditions to adapt it to solving the multi-task CMSC problem:

4.4. Global State

The global state $s_t \in \mathbb{R}^{m \times M_j \times 5}$ needs to cover the complete information of all candidate MSs (captured by the centralized control agent for real-time monitoring):
$$s_t = \big[ qos_1; l_1;\ qos_2; l_2;\ \ldots;\ qos_M; l_M \big]^T,$$
where $qos_i = [c_i, t_i, q_i, r_i, a_i]$ represents the five-dimensional QoS vector of MS i, and $l_i = \frac{current\_load_i}{max\_load_i}$ denotes the normalized load state of MS i.

4.5. Observation Space

The observation space of each task agent and the centralized control agent is designed to integrate the two-layer attention mechanism. For clarity, we first give the following definitions:
Definition 1 (Available service set). For the j-th subtask, its available MS set is
$$C_j^{available}(t) = \big\{ i \in C_j : current\_load_i < max\_load_i \big\}.$$
Definition 2 (Unassigned subtask set). At time t, the set of unassigned subtasks for the r-th MT is
$$U_t^r = \big\{ j : x_{r,j,i} = 0,\ \forall i \in C_j \big\},$$
where $x_{r,j,i}$ is the decision variable indicating whether subtask j of task r is assigned to MS i.
  • For the r-th task agent,
$$o_{task}^r = \big\{ f_{(r,j),i}^k \mid j \in U_t^r,\ i \in C_j^{available}(t) \big\} \in \mathbb{R}^{m \times M_j \times 5},$$
    where $f_{(r,j),i}^k = qos_i^k$ (the QoS value of MS i in the k-th objective dimension for available MSs, and 0 for unavailable MSs), and the feature matrix of $o_{task}^r$ is processed by task-level self-attention to capture intra-task subtask correlations.
  • For the centralized control agent,
$$o_{global} = [o_{task}^1, o_{task}^2, \ldots, o_{task}^n] \in \mathbb{R}^{n \times m \times M_j \times 5},$$
    where the feature matrix of $o_{global}$ is processed by global state self-attention to focus on critical resource features and generate coordination signals.

4.6. Action Space

The action space of each task agent r is a discrete selection (regulated by coordination signals from the centralized control agent):
$$A_{task}^r = \big\{ 0, 1, \ldots, |C_j^{available}(t)| - 1 \big\}, \quad r = 1, \ldots, n,$$
where $|C_j^{available}(t)|$ is the number of available candidate MSs for the current unassigned subtask j of MT r. The action $a_{task}^r = k$ means that task agent r selects the k-th available candidate MS for the current subtask, with the final selection given by the coordinated policy:
$$MS_{index} = \arg\max_{i \in C_j^{available}(t)} \pi_{\theta_r}(i \mid o_{task}^r, Z_t),$$
which breaks through fairness defects caused by single-agent sequential selection and ensures inter-task equity in resource allocation.
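A sketch of how such availability-masked selection might look, with hypothetical names; for illustration, raw logits stand in for the coordinated policy network's scores:

```python
import numpy as np

def select_ms(logits, available_mask):
    """Pick the MS with the highest policy probability among available MSs.

    logits: raw policy scores over all candidate MSs for the current subtask;
    available_mask: boolean array, True where current_load < max_load.
    """
    masked = np.where(available_mask, logits, -np.inf)  # exclude full MSs
    probs = np.exp(masked - masked.max())               # stable softmax
    probs /= probs.sum()
    return int(np.argmax(probs))

logits = np.array([2.0, 3.5, 1.0, 3.0])
mask = np.array([True, False, True, True])   # MS 1 is at full capacity
print(select_ms(logits, mask))  # 3 — best-scoring MS that is still available
```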

4.7. Reward Function

The reward function abandons subjective fairness penalty coefficients and instead achieves fairness regulation via the centralized control agent and its coordination signals:
  • For the r-th task agent: $R_{\mathrm{task}}^r(s_t, a_t) = \bar{q}_r^{\mathrm{norm}}$, the normalized average QoS of all currently assigned subtasks of MT r (arithmetic averaging for cost/time/quality, geometric averaging for reliability/availability), reflecting local task utility without subjective weighting;
  • For the centralized control agent: $R_{\mathrm{global}}(s_t, a_t) = 1 - F(s_t)$ (global fairness reward), where $F(s_t)$ is the fairness metric (the Euclidean distance of normalized QoS across MTs); $R_{\mathrm{global}}$ guides the centralized control agent to generate optimal coordination signals;
  • The total reward of the multi-agent system is
    $R_t = \frac{1}{n} \sum_{r=1}^{n} R_{\mathrm{task}}^r(s_t, a_t) + \lambda R_{\mathrm{global}}(s_t, a_t),$
    where $\lambda$ is a coefficient balancing local utility and global fairness.
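The averaging rules above can be sketched in a few lines. This is a hedged illustration, not the authors' code: the value lam = 0.5 is an arbitrary assumption (the text does not fix lambda), and the QoS tuples are toy normalized values.

```python
import math

# Sketch of the reward terms: arithmetic averaging for normalized
# cost/time/quality, geometric averaging for reliability/availability,
# plus the combined system reward.

def task_reward(assigned_qos):
    """assigned_qos: normalized 5-tuples (cost, time, quality, reliability, availability)."""
    n = len(assigned_qos)
    arith = [sum(q[k] for q in assigned_qos) / n for k in (0, 1, 2)]
    geo = [math.prod(q[k] for q in assigned_qos) ** (1.0 / n) for k in (3, 4)]
    vals = arith + geo
    return sum(vals) / len(vals)

def total_reward(task_rewards, r_global, lam=0.5):
    """R_t = mean of per-task rewards + lam * global fairness reward."""
    return sum(task_rewards) / len(task_rewards) + lam * r_global

# Toy usage: one task with a single perfectly-normalized assignment.
r = total_reward([task_reward([(1.0, 1.0, 1.0, 1.0, 1.0)])], r_global=0.5)
```

The geometric mean penalizes any single low reliability/availability value more strongly than an arithmetic mean would, which matches the intent of treating those two attributes multiplicatively.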

4.8. State Transition

$s_{t+1} = T(s_t, a_t)$, where the update rules are as follows:
  • Increment the $\mathrm{current\_load}$ of the selected MS by 1;
  • Update the task-service allocation matrix $X = [x_{r,j,i}]$;
  • Recompute the set of unassigned subtasks $U_{t+1}^r$ for each MT r;
  • Update the global observation $o_{\mathrm{global}}$ and generate new coordination signals via the centralized control agent.

4.9. Termination Conditions

The episode terminates if either:
  • All subtasks have been assigned: $\sum_{r=1}^{n} \sum_{j=1}^{m} \sum_{i \in C_j} x_{r,j,i} = n \times m$;
  • The maximum number of steps T is reached.
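The state transition and termination rules of Sections 4.8 and 4.9 can be combined into one environment step. The sketch below assumes a simplified data layout (a flat load list, a dict-backed allocation matrix, and per-task unassigned sets) and is not the authors' implementation.

```python
# Minimal environment-step sketch: increment the chosen service's load,
# record the allocation x[r, j, i] = 1, update the unassigned set, and
# check the two termination conditions (all assigned, or max steps).

def step(state, r, j, i, max_steps=500):
    state["current_load"][i] += 1
    state["x"][(r, j, i)] = 1
    state["unassigned"][r].discard(j)
    state["t"] += 1
    all_assigned = all(len(u) == 0 for u in state["unassigned"].values())
    done = all_assigned or state["t"] >= max_steps
    return state, done

# Toy instance: one task with a single remaining subtask and two services.
state = {"current_load": [0, 0], "x": {}, "unassigned": {0: {0}}, "t": 0}
state, done = step(state, r=0, j=0, i=1)  # assigning it ends the episode
```

In the full method, this step would additionally rebuild $o_{\mathrm{global}}$ and query the centralized control agent for new coordination signals, which the sketch omits.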

5. Experimental Analysis

5.1. Experimental Setup and Baseline Comparison

5.1.1. Experimental Environment Configuration

To fully verify the effectiveness of the proposed multi-task CMSC model and the superiority of the MARL-based solving algorithm, this study built a systematic comparative experimental platform. The hardware environment uses an Intel Core i7-12700H processor (with six performance cores + eight efficiency cores, up to 4.7 GHz turbo frequency) and 32 GB DDR4 3200 MHz memory; the software environment is built on Python 3.9 with the PyTorch 1.12.0 deep learning framework. All comparative experiments were conducted under the same hardware conditions, and each experiment was repeated five times to ensure the statistical reliability of the results.
Based on this standardized experimental setup, the MARL framework adopted in this study follows the MAPPO paradigm. Its core network architecture comprises three modular components with distinct structural designs:
  • The task-specific policy network is structured with a two-layer fully connected feature extraction module and independent service selection sub-networks for each subtask. It integrates a two-head multi-head attention mechanism (equipped with residual connections and layer normalization) to model subtask correlations.
  • The global coordination network consists of a two-layer fully connected feature extraction layer, a four-head self-attention module (paired with layer normalization and residual connections), and a feed-forward network for further feature processing.
  • The centralized value estimation network is a three-layer fully connected network, with hidden layer dimensions scaled to twice the base hidden dimension and the base hidden dimension in sequence, outputting scalar global value estimates.
The core hyperparameters are configured as follows: the discount factor γ is set to 0.99, the PPO clipping coefficient ϵ to 0.2, the entropy regularization coefficient β to 0.01, and the GAE-Lambda coefficient λ to 0.95. The learning rate is set to $1 \times 10^{-4}$ for the policy network and $1 \times 10^{-3}$ for the value estimation network. A total of 2000 training episodes are configured with a maximum of 500 steps per episode to balance training efficiency and convergence stability. For training optimization, gradient clipping is enabled to prevent gradient explosion; the QoS cache is configured with a maximum capacity of 1000 to reduce redundant computations; a fixed random seed is used to ensure experimental reproducibility.
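As a concrete illustration of the GAE-Lambda coefficient listed above, the advantage recursion can be sketched as follows with the stated coefficients (γ = 0.99, λ = 0.95). The rewards and value estimates below are toy numbers, not training output.

```python
# Sketch of Generalized Advantage Estimation (GAE) as used in PPO/MAPPO:
# delta_t = r_t + gamma * V(s_{t+1}) - V(s_t), accumulated backwards with
# decay gamma * lam.

def gae(rewards, values, gamma=0.99, lam=0.95):
    """values carries one extra bootstrap entry (len(rewards) + 1)."""
    adv, last = [0.0] * len(rewards), 0.0
    for t in reversed(range(len(rewards))):
        delta = rewards[t] + gamma * values[t + 1] - values[t]
        last = delta + gamma * lam * last
        adv[t] = last
    return adv

advs = gae([1.0, 1.0], [0.5, 0.5, 0.0])  # two-step toy trajectory
```

Setting lam closer to 1 makes the estimate lower-bias but higher-variance; 0.95 is the common compromise adopted here.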

5.1.2. Dataset Design

To comprehensively evaluate the algorithm’s performance under different scenarios, this study designs a dataset generation mechanism covering two dimensions, “scale scalability” and “resource constraint”, and constructs six test datasets (see Table 1 for details). All datasets are generated with explicit QoS attribute distributions and strict repeatability guarantees, addressing the concerns of transparency and replicability in experimental research.
All datasets include five core QoS attributes of MSs with clear optimization directions. The generation of these attributes follows a tiered uniform distribution to simulate the inherent heterogeneity of candidate services, and the specific settings are as follows:
(1) QoS Attribute Definition
  • Cost: Monetary cost of service execution (unit: CNY), optimization direction: minimization (smaller is better);
  • Time: Execution duration of service (unit: hours), optimization direction: minimization (smaller is better);
  • Quality: Qualification rate of service output (dimensionless), optimization direction: maximization (larger is better);
  • Reliability: Probability of fault-free service operation (dimensionless), optimization direction: maximization (larger is better);
  • Availability: Probability of service being online and accessible (dimensionless), optimization direction: maximization (larger is better).
(2) Tiered Generation Distribution
To simulate the quality differences among candidate MS, the candidate services of each subtask type are divided into three tiers: high-quality, medium-quality, and general-quality. QoS values of services in different tiers are generated from uniform distributions within fixed ranges (see Table 2). The number of services in each tier is proportional to the total number of candidate services per subtask type ( M j ). For example, when M j = 20 , there are seven high-quality services, seven medium-quality services, and six general-quality services; the remainder (if any) is allocated to the first two tiers. All values are rounded to two decimal places (for cost and time) or four decimal places (for quality, reliability, and availability) to ensure data precision.
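The tier-count rule described above (thirds, with the remainder allocated to the first two tiers) can be sketched directly; this is an illustration of the stated rule, not the authors' generator.

```python
# Sketch of the tier-count allocation: candidate services split into
# [high, medium, general] quality tiers, remainder to the first two tiers.

def tier_counts(m_j):
    base = m_j // 3
    counts = [base, base, base]        # [high, medium, general]
    for k in range(m_j - 3 * base):    # remainder of 0, 1, or 2
        counts[k] += 1
    return counts

counts = tier_counts(20)  # matches the example in the text: [7, 7, 6]
```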
(3) Repeatability and Normalization
A fixed random seed is used for all data generation to ensure consistent results across replications. A unified global min–max normalization strategy is adopted for standardized processing: QoS values are normalized using precomputed global minimum and maximum values (across the entire dataset). For scenarios where the minimum value equals the maximum value, boundary handling is performed by returning 0.5 to avoid extreme values. This approach ensures data consistency while preserving the relative differences between services.
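The normalization rule above, including the stated 0.5 boundary case, amounts to the following one-liner (a sketch of the stated rule, not the authors' code):

```python
# Global min-max normalization with the stated boundary rule: if the global
# minimum equals the global maximum, return 0.5 to avoid extreme values.

def normalize(value, g_min, g_max):
    if g_min == g_max:
        return 0.5
    return (value - g_min) / (g_max - g_min)

normalize(5.0, 0.0, 10.0)  # a mid-range value maps to 0.5
```

Using precomputed global extrema (rather than per-batch extrema) is what keeps relative differences between services comparable across the whole dataset.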
(4) Scale Expansion Datasets (S1, S2, S3)
By sequentially increasing the number of parallel MTs (n), subtask types (m), and candidate MSs per subtask type ( M j ), the magnitude of the solution space is gradually expanded to simulate the evolution of CMP from small-scale to large-scale scenarios. These datasets are mainly used to evaluate the convergence, computational efficiency, and scalability of the algorithm under different problem scales, verifying its adaptability to complex CMSC problems. The QoS generation distribution remains consistent across all scale datasets to ensure that performance variations are solely attributed to problem scale.
(5) Resource Constraint Datasets (D1, D2, D3)
By fixing the number of parallel MTs ( n = 10 ) and subtask types ( m = 6 ) and adjusting the number of candidate MSs per subtask type ( M j ) from 15 to 65, differentiated competition environments ranging from “extremely tight resources” to “sufficient resources” are constructed. This set of datasets is used to verify the core performance of the proposed model in multi-task resource sharing scenarios, including fairness guarantee, equilibrium solution exploration, and high-quality scheme generation under different resource competition intensities. It thereby fully tests the effectiveness of the fairness objective modeling and multi-task collaborative decision-making mechanism. Consistent QoS generation rules are maintained to isolate the impact of resource constraint levels on experimental results.

5.1.3. Baseline Algorithms

To systematically verify the comprehensive performance of the proposed algorithm, this study selected six representative and novel algorithms as baselines for comparison: Particle Swarm Optimizer (PSO) [37], FATA [38], Snake Optimizer (SO) [39], Differential Evolution (DE) [40], Single-Agent Reinforcement Learning (RL, represented by PPO [36]), and MOEA/D [41]. The characteristics of each algorithm are detailed as follows:
  • PSO: A classic swarm intelligence algorithm proposed in 1995, inspired by the collective foraging behavior of bird flocks and fish schools. Each solution is modeled as a “particle” that updates its position and velocity by tracking its own historical best and the global best of the swarm, thus effectively balancing exploration and exploitation. Due to its simple structure, PSO can be easily integrated with weight parameters, making it well-suited for multi-objective weighted optimization tasks. Parameter settings: population size = 100, inertia weight = 0.7298, cognitive coefficient = 1.49618, social coefficient = 1.49618, maximum iterations = 2000, velocity limit = [−1, 1].
  • FATA: A state-of-the-art physics-inspired swarm intelligence algorithm proposed in 2024, mimicking the mirage formation process. It integrates the mirage light filtering (MLF) principle (with definite integration) and light propagation strategy (LPS) (with trigonometric principles) to balance global exploration and local exploitation, exhibiting excellent performance in continuous multi-objective optimization and engineering tasks. Parameter settings: population size = 100, maximum iterations = 2000, MLF integration step = 0.01, LPS angle range = [0, 2 π ], exploration factor = 0.8.
  • Snake Optimizer (SO): A novel nature-inspired metaheuristic algorithm proposed in 2022, inspired by the unique mating and foraging behaviors of snakes. It divides the search process into exploration and exploitation phases, simulating fight and mating modes to update solutions, and thus demonstrates strong competitiveness in multi-objective optimization scenarios. Parameter settings: population size = 100, maximum iterations = 2000, fight probability = 0.5, mating probability = 0.3, exploration-exploitation balance factor = 0.6.
  • Differential Evolution (DE): A classic yet continuously improved evolutionary algorithm. It optimizes solutions through mutation, crossover, and selection operations, and its recent variants have been widely applied in multi-objective weighted optimization. Owing to its simplicity and effectiveness, DE is regarded as a representative baseline for evolutionary algorithms. Parameter settings: population size = 100, maximum iterations = 2000, mutation factor = 0.5, crossover probability = 0.8, selection strategy = greedy selection.
  • Single-Agent RL (represented by PPO): A cutting-edge RL algorithm for optimization tasks. It optimizes the policy via proximal policy optimization to adapt to complex multi-objective search spaces, showing remarkable effectiveness in single-agent and cooperative optimization problems. Parameter settings: learning rate = $3 \times 10^{-4}$, batch size = 64, discount factor γ = 0.99, clip range ϵ = 0.2, total training steps = 5000, hidden layer dimension = 128.
  • MOEA/D: A landmark decomposition-based multi-objective evolutionary algorithm. It decomposes multi-objective optimization problems into scalar subproblems and optimizes them simultaneously. This algorithm has advantages in handling high-dimensional objectives and generating evenly distributed solutions, thus serving as a core baseline for decomposition-based methods. Parameter settings: population size = 100, maximum iterations = 2000, neighborhood size = 20, crossover probability = 0.9, mutation probability = 0.01, penalty factor = 5.

5.2. Convergence Analysis

Convergence, a core metric for evaluating optimization algorithms, directly determines an algorithm’s ability to escape local optima and approach the global optimal solution stably during iteration. Its performance is closely tied to the algorithm’s practical value in industrial-scale complex scenarios. The CMSC problem faces challenges such as an exponentially expanding solution space (with task complexity) and dynamically changing resource competition. Differences in how algorithms balance global exploration and local exploitation lead to distinct convergence characteristics across problem scales. This study transforms the multi-objective optimization problem into a single-objective one via weighted summation and conducts convergence comparison experiments on the scale-expanded datasets (S1, S2, S3). By integrating fitness curves, statistical tables, and distribution plots, we comprehensively evaluate the convergence performance of the proposed MARL algorithm across different problem complexities from the perspective of synergistic convergence efficiency and optimization accuracy. Note that MARL adopts an “offline training-online deployment” mode: its convergence iteration count in offline training is not directly comparable to the real-time iteration of traditional algorithms, and its core advantages lie in the efficiency of online application and the high accuracy of optimization results after training. The following analysis focuses on this characteristic.

5.2.1. Synergistic Analysis of Convergence Efficiency and Optimization Accuracy

Convergence efficiency is measured by the number of iterations required to reach stable convergence, while optimization accuracy is determined by the final fitness value and its proximity to the theoretical optimal solution; together, they form the core dimensions of an algorithm’s practical value. From Table 3 and Figure 2 (fitness curves), MARL exhibits a unique “more iterations but significantly leading accuracy” characteristic, which aligns well with its practical deployment mode:
  • Small-scale (S1) scenario: MARL has an average convergence iteration count of 741 (the highest among all comparison algorithms), while traditional algorithms like PSO (36), DE (45.6), and FATA (63.2) have much lower counts. However, the fast convergence of traditional algorithms is essentially premature convergence to local optima: their fitness values increase slowly in the early iteration stage, and the stable average fitness is generally below 0.57 (FATA: 0.541261). By contrast, MARL, relying on multi-agent collaborative decision-making, shows a steep fitness increase in the early stage (first 250 iterations), with a final average fitness of 0.58766 (significantly higher than all competitors). For the single-agent RL (PPO), its average convergence iteration count is 569.4 (lower than MARL), but its final average fitness (0.58166) is 0.60% lower than MARL; MOEA/D achieves 0.584352 with 153.4 iterations, still slightly lower than MARL. This indicates that MARL avoids local optima through sufficient offline iteration exploration, achieving higher-precision optimization and laying a high-quality decision-making foundation for online application.
  • Medium-scale (S2) scenario: As the solution space dimension increases, MARL’s average convergence iteration count rises to 772 (only 31 more than S1, a <5% increase), showing good scalability; its final average fitness further improves to 0.59282 (the only algorithm with increased accuracy in the medium-scale scenario). In contrast, other algorithms either see a sharp increase in iterations (e.g., RL (PPO) rises from 569.4 to 912.4, a 60.2% increase) or a significant drop in accuracy (e.g., DE drops from 0.567525 to 0.511148, a 9.93% decrease), failing to balance efficiency and accuracy. MOEA/D achieves 0.568239 with 106.8 iterations (4.15% lower than MARL); SO maintains 626.8 iterations but only achieves 0.531644 (a large gap from MARL). MARL’s ability to improve accuracy against the trend stems from multi-agent division of labor, which reduces the search complexity of high-dimensional solution spaces and continuously explores high-quality solutions via offline collaborative exploration—avoiding the “efficiency-accuracy trade-off” of traditional algorithms in expanded solution spaces.
  • Large-scale (S3) scenario: The exponentially expanding solution space challenges global exploration capabilities. MARL’s average convergence iteration count is 1025 (a 32.8% increase from S2, still reasonable), and its final average fitness is 0.58508 (only a 1.32% decrease from S2, the smallest attenuation). Traditional algorithms generally suffer from “sharp accuracy drop + inefficient iteration”: DE, PSO, and FATA maintain low iterations (32–44.2) but have average fitness below 0.51 (DE: 0.499896); RL (PPO) reaches 1999 iterations (close to the maximum limit) with fitness dropping to 0.54674 (a 6.00% attenuation), and its convergence process fluctuates frequently (Figure 2c), failing to stabilize. From boxplots and state plots: MARL’s fitness is highly concentrated in 0.58–0.59 (max: 0.5906, min: 0.5807), with a narrow boxplot and short whiskers; RL (PPO) has a wide boxplot, long whiskers, and fitness scattered in 0.4813–0.5942 (standard deviation: 0.039822), with severely degraded stability and accuracy. This confirms MARL’s advantage in complex solution spaces: its multi-agent distributed search and collaborative optimization integrate local search information via reasonable offline iterations, forming a force to approach the global optimal, maintaining high accuracy even in large-scale scenarios and ensuring reliable online response.

5.2.2. Convergence Stability Analysis

Convergence stability is evaluated via indicators such as fitness standard deviation, convergence fluctuation, and value range span across multiple runs, reflecting the consistency of algorithm performance across different instances (supported by Table 4, Table 5 and Table 6 and fitness distribution Figure 3, Figure 4 and Figure 5):
  • S1 scenario: MARL has a fitness standard deviation of 0.006082 (the smallest among all algorithms), with a compact boxplot (no outliers) and a max–min gap of 0.0168—indicating strong consistency in offline training. Competitors like SO (0.011633), FATA (0.015717), and RL (PPO) (0.018997) have higher standard deviations; DE has a standard deviation of 0.027078, a wide whisker range, and a max–min gap of 0.080882—showing large volatility. This demonstrates that MARL stably outputs high-quality solutions in offline training, with little sensitivity to initial condition fluctuations.
  • S2 scenario: As complexity increases, most competitors’ stability decreases, but MARL remains consistent: its standard deviation is 0.011101 (slightly higher than S1), with a narrow boxplot and a max–min gap of 0.0296. Most competitors cannot balance stability and accuracy: DE has a low standard deviation (0.009807) but a boxplot concentrated in the low-fitness range (average: 0.511148); RL (PPO) has a standard deviation of 0.012367, longer whiskers, and 2.77% lower accuracy than MARL. MARL’s high stability comes from the robustness of multi-agent collaboration: information interaction and complementarity offset the uncertainty of single search paths, reducing offline training volatility.
  • S3 scenario: Stability differences are more significant: MARL has a standard deviation of 0.003375, a very narrow boxplot (no outliers), and a max–min gap of 0.0099—showing excellent consistency. Competitors’ stability degrades sharply: RL (PPO) has a standard deviation of 0.039822, a wide boxplot, and a range of 0.1129 (highly uncertain); DE, FATA, and PSO have low standard deviations (0.002058–0.003622) but boxplots concentrated in the low-fitness range (“invalid stability” with low accuracy). This indicates that MARL maintains stable optimization in complex solution spaces: dynamic agent collaboration and adaptive strategy adjustment resist the uncertainty of large-scale solution spaces, ensuring reliable offline training results.
In summary, based on the “offline training-online deployment” mode, MARL exhibits excellent comprehensive convergence performance across all scales: although its offline training iterations are relatively large, it achieves significantly leading optimization accuracy (with minimal attenuation) via sufficient collaborative exploration; meanwhile, its convergence stability is extremely strong (boxplots show highly concentrated values with no significant fluctuations), unaffected by increasing problem complexity. This advantage stems from the unique role of multi-agent collaboration in balancing exploration/exploitation, reducing solution space complexity, and resisting uncertainty—meeting the core needs of offline training (accuracy and stability) and laying a solid foundation for efficient online response, fully verifying MARL’s applicability and superiority in industrial-scale cloud manufacturing multi-task parallel service composition optimization.

5.2.3. Time Complexity Analysis

Building on the aforementioned synergistic analysis and verification of the algorithm’s convergence efficiency, optimization accuracy, and convergence stability, this section further conducts a quantitative analysis of the time complexity of the proposed method. The method is designed and implemented based on the MAPPO framework, with a theoretical asymptotic time complexity of $O\left(T \cdot (m \cdot M_j \cdot n + B \cdot H_a \cdot H)\right)$, where T denotes the total number of training iteration steps, m is the number of subtask types, $M_j$ represents the number of candidate services per subtask, n is the number of tasks, B stands for the batch size, $H_a$ denotes the number of attention heads, and H is the dimension of the neural network hidden layer. By enabling engineering optimization strategies such as QoS caching and vectorized fairness calculation, the order of the algorithm’s time complexity remains unchanged, but the constant terms in the computation process are effectively reduced, resulting in a noticeable reduction in the actual effective complexity.
To objectively evaluate the runtime stability of the algorithm across scenarios with different complexity levels (eliminating the interference of data magnitude differences), this study introduces the Coefficient of Variation (CV) as the core evaluation metric, calculated as $CV = \frac{\sigma}{\mu} \times 100\%$, where σ is the standard deviation of runtime and μ is the mean runtime. This metric accurately reflects the relative volatility of runtime, with a lower CV value indicating stronger relative stability of the algorithm’s runtime. The actual runtime data from five repeated runs of the algorithm under scenarios with different complexity levels validate the aforementioned complexity characteristics:
(1) S1 scenario: The average runtime is 51.33 s (with a standard deviation of 7.77 s and a CV of ≈15.1%), ranging from a minimum of 42.43 s to a maximum of 63.37 s. In this scenario, the algorithm’s runtime exhibits high relative volatility due to factors such as QoS cache hit rate and instantaneous system resource occupancy, yet it generally aligns with the complexity characteristics dominated by the scale of subtasks and services.
(2) S2 scenario: The average runtime reaches 127.34 s (with a standard deviation of 9.03 s and a CV of ≈7.1%), with a minimum of 116.00 s and a maximum of 135.58 s. As the scale of subtasks and services and the number of training iterations increase, the absolute runtime of the algorithm shows a linear growth trend, and the CV of runtime decreases by approximately 53% compared with the S1 scenario, indicating a significant improvement in relative stability.
(3) S3 scenario: The average runtime rises to 442.38 s (with a standard deviation of 21.09 s and a CV of ≈4.8%), varying from 409.17 s (minimum) to 470.81 s (maximum). With further increases in complexity, although the absolute standard deviation of runtime increases due to the larger base value, the CV continues to drop below 5%, demonstrating that the algorithm achieves significantly better relative runtime stability in large-scale scenarios.
Overall, the absolute runtime of the algorithm increases significantly with the expansion of subtask-service scale and training iteration volume, and the absolute standard deviation of runtime also shows an upward trend. However, the CV decreases from 15.1% to 4.8%, indicating that the relative volatility of the algorithm’s runtime is substantially reduced as complexity increases, and the relative stability is significantly enhanced.
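The CV metric used above is straightforward to reproduce; the sketch below uses the population standard deviation (the text does not state whether sample or population std was used) and toy runtimes rather than the reported data.

```python
import statistics

# Sketch of the Coefficient of Variation: CV = sigma / mu * 100%.
# Note: statistics.pstdev is the population std; statistics.stdev (sample
# std, n - 1 denominator) is an equally plausible reading of the formula.

def coefficient_of_variation(runtimes):
    mu = statistics.mean(runtimes)
    sigma = statistics.pstdev(runtimes)
    return sigma / mu * 100.0

# Toy runtimes (seconds) for five hypothetical runs, not the reported data.
cv = coefficient_of_variation([42.4, 48.0, 51.3, 55.5, 63.4])
```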

5.3. Robustness Analysis

This study systematically evaluates the robustness of seven algorithms in typical disturbance scenarios of CMP from three core dimensions: fitness level, performance degradation rate, and output stability (standard deviation). The disturbance scenarios are divided into two categories:
(1) Service Failure Scenario: A small-scale dataset (S1) is used, with the number of candidate services set to 10. The probability of service failure increases stepwise from 10% to 50%, simulating service node failures under high-concurrency tasks.
(2) QoS Degradation Scenario: A medium-scale dataset (S2) is used, with the probability of QoS decline similarly increasing from 10% to 50%, simulating the impact of service performance fluctuations on the overall platform performance.

5.3.1. Robustness Analysis Under Service Failure

In the service failure scenario, the robustness performance of each algorithm shows significant differences as the probability of service failure gradually increases from 0% (no perturbation) to 50% (high perturbation). The specific data comparison is shown in Table 7, and the fitness trend is illustrated in Figure 6 and Figure 7.
Regarding fitness level, as can be seen from Table 7, under no perturbation (0% failure probability), the Multi-Agent Reinforcement Learning (MARL) algorithm achieves the highest average fitness at 0.7396. As the failure probability increases, the fitness of all algorithms declines, but MARL maintains the highest fitness value at each failure probability level. When the failure probability reaches 50%, the fitness of MARL is 0.484629, still leading the other compared algorithms.
As the failure probability increases, the performance degradation rates of the algorithms exhibit different characteristics. The degradation rate of MARL grows relatively quickly with increasing disturbance. At a 10% failure probability, its degradation rate is 5.13%; when the failure probability rises to 50%, its degradation rate increases to 34.47%, the highest among all algorithms. In contrast, algorithms like FATA show a more gradual increase in degradation rate under the same conditions.
In terms of output stability, MARL shows a standard deviation of 0.003056 under the initial no-perturbation state, indicating relatively good stability. However, as the failure probability increases, its output fluctuation also rises, with the standard deviation reaching 0.063072 at a 50% failure probability. This places it at a similar level to algorithms like DE and SO, while FATA demonstrates better stability throughout the process.
In summary, in the service unavailability scenario, MARL demonstrates excellent fitness maintenance capabilities. However, its performance is relatively sensitive to failure disturbances, exhibiting a higher degradation rate and moderate output stability.

5.3.2. Robustness Analysis Under Service Degradation

In the QoS degradation scenario, the robustness performance of each algorithm, as the probability of QoS decline increases from 0% to 50%, is shown in Table 8, and the fitness changes are illustrated in Figure 8 and Figure 9.
In terms of fitness level, MARL shows a significant advantage. As seen from Table 8, its initial fitness under no perturbation is as high as 0.889861, far exceeding other algorithms. Even when the QoS degradation probability reaches 50%, its fitness remains at a high level of 0.794802. A noteworthy phenomenon is that at QoS degradation probabilities of 10% and 20%, MARL’s fitness is slightly higher than the initial value (0.897658 and 0.893154, respectively), demonstrating a certain adaptive optimization capability.
In terms of performance degradation rate, MARL’s performance is particularly outstanding. As shown in Table 8, at a 10% QoS degradation probability, its degradation rate is negative (−0.88%), meaning performance improves rather than declines. Even at a 50% degradation probability, its degradation rate is only 10.68%, significantly lower than that of other algorithms, such as DE (25.65%), indicating strong resistance to performance attenuation.
Regarding output stability, MARL also performs excellently. Its standard deviation under the no-perturbation state is extremely low (0.001336), indicating a very stable output. As the QoS degradation probability increases to 50%, its standard deviation grows to 0.019268, a relatively controlled increase. Its stability performance is better than that of algorithms like RL (whose standard deviation increases from 0.011314 to 0.035791).
In conclusion, in the service degradation scenario, MARL demonstrates comprehensive and significant robustness advantages across all three dimensions—fitness maintenance, resistance to degradation, and output stability—making it an effective algorithm choice for handling QoS fluctuations.
Overall, MARL shows significant and comprehensive robustness advantages in coping with QoS fluctuations, making it the preferred recommended algorithm for CMP in dynamic and uncertain environments. However, in scenarios with frequent service node failures, although its absolute performance remains leading, the higher degradation rate suggests that its performance maintenance mechanism still has room for optimization under such extreme disturbances. Future research could focus on enhancing MARL’s adaptability to service failures to further improve its comprehensive robustness in all-around disturbance environments.

5.4. Fairness Analysis

Fairness stands as a core ethical and technical criterion for resource allocation schemes in CMP. The modeling logic of its measurement methods directly dictates the optimization direction and ultimate allocation performance of algorithms. This paper first clarifies the mathematical modeling formulas of four mainstream fairness measurement methods, namely Euclidean distance, Gini coefficient, max–min fairness, and Jain’s index. Subsequently, integrating experimental data from three resource scenarios (constrained, moderate, and sufficient), it systematically compares the performance of each method, conducts an in-depth analysis of their advantages, disadvantages, and applicable scenarios, and provides comprehensive theoretical and practical references for fairness optimization.

5.4.1. Mathematical Modeling of Fairness Metrics

Mathematical modeling of fairness metrics revolves around the equilibrium of QoS allocation among MTs, forming four quantitative methods with clear logical orientations:
  • Euclidean distance measures the global average difference by calculating the mean Euclidean distance of five-dimensional normalized QoS vectors between all MT pairs, with its modeling formula shown in Equation (8) and a smaller value indicating better fairness.
  • The Gini coefficient characterizes the degree of inequality based on the cumulative distribution difference of comprehensive normalized QoS scores, with the formula
    $G = \frac{1}{2 n^2 \mu} \sum_{i=1}^{n} \sum_{j=1}^{n} | q_i - q_j |,$
    where $\mu = \frac{1}{n} \sum_{i=1}^{n} q_i$ is the mean of the comprehensive scores; a smaller value implies more balanced allocation.
  • Max–min fairness focuses on the performance gap between extreme tasks by computing the mean difference between the maximum and minimum values of each QoS dimension, with the formula
    $F = \frac{1}{K} \sum_{i=1}^{K} \left( \max_{j=1,\ldots,n} q_{i,j} - \min_{j=1,\ldots,n} q_{i,j} \right),$
    where $K = 5$ is the number of QoS evaluation dimensions; a smaller value indicates a narrower gap between extreme tasks.
  • Jain's index is calculated from the equilibrium of task scores within each QoS dimension. Under the revised formulation adopted here, a smaller value represents better fairness:
    $J = 1 - \frac{1}{K} \sum_{k=1}^{K} \frac{\left( \sum_{i=1}^{n} q_{i,k} \right)^{2}}{n \cdot \sum_{i=1}^{n} q_{i,k}^{2}}$,
    which reflects overall fairness by depicting the distribution equilibrium of scores across dimensions.
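As a concrete illustration, the four metrics above can be sketched in a few lines of NumPy. This is a minimal sketch under two assumptions not fixed by the text: the QoS matrix is normalized with n tasks as rows and K = 5 dimensions as columns, and each task's comprehensive score is taken as its row mean (the paper's exact aggregation may differ):

```python
import numpy as np

def fairness_metrics(q):
    """Sketch of the four fairness metrics on a normalized QoS matrix q of
    shape (n, K): n manufacturing tasks, K = 5 QoS dimensions in [0, 1]."""
    n, K = q.shape

    # Euclidean distance: mean pairwise distance of task QoS vectors (smaller = fairer).
    diffs = q[:, None, :] - q[None, :, :]            # (n, n, K) pairwise differences
    pair_d = np.linalg.norm(diffs, axis=-1)          # (n, n) distance matrix
    euclid = pair_d[np.triu_indices(n, k=1)].mean()  # average over distinct pairs

    # Gini coefficient on comprehensive scores (row means assumed here):
    # G = (1 / (2 n^2 mu)) * sum_i sum_j |q_i - q_j|.
    s = q.mean(axis=1)
    mu = s.mean()
    gini = np.abs(s[:, None] - s[None, :]).sum() / (2 * n**2 * mu)

    # Max-min fairness: mean (max - min) gap over the K dimensions.
    maxmin = (q.max(axis=0) - q.min(axis=0)).mean()

    # Revised Jain's index: 1 minus the mean per-dimension Jain index,
    # so that smaller values indicate better fairness.
    jain_per_dim = q.sum(axis=0) ** 2 / (n * (q**2).sum(axis=0))
    jain = 1 - jain_per_dim.mean()

    return euclid, gini, maxmin, jain
```

On a perfectly uniform allocation all four values are 0, and each grows as the allocation among tasks becomes more uneven.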

5.4.2. Analysis of Fairness Metrics Across Different Resource Scenarios

(1) Performance in Resource-Constrained Scenario
In resource-constrained scenarios, competition for limited service resources among MTs is most intense, leading to notable performance differences among various fairness metrics. All fairness-aware methods significantly outperform the non-optimized benchmark, as quantitatively verified in Table 9 and visually supported by Figure 10 and Figure 11.
As shown in Table 9, the non-optimized benchmark exhibits the poorest performance across all indicators: Euclidean distance reaches 0.358283, the Gini coefficient is as high as 0.962103, and sum of standard deviations reaches 0.545988. Visually, this is reflected in Figure 11 by the tallest bar (representing the largest total sum of standard deviations), and in Figure 10 (where Euclidean, Gini, max–min, and Jain are used to represent Euclidean distance, Gini coefficient, max–min fairness, and Jain’s index, respectively) by the highest bars for the variance of most individual targets (QoS), clearly indicating severe resource allocation imbalance without fairness optimization.
It is noteworthy that while all fairness-aware metrics surpass the non-optimized benchmark, considerable performance variations exist among them, with each excelling in its targeted evaluation metric. According to Table 9, the Euclidean metric achieves the smallest Euclidean distance (0.186091), the lowest sum of standard deviations (0.301243), and the minimal max–min fairness value (0.197342); the Gini metric attains the lowest Gini coefficient (0.57001); and the Jain metric records a moderate Jain's index (0.019652), higher than the Euclidean metric's 0.012722.
These performance differences stem from the distinct optimization focus of each fairness metric, with their strengths and limitations closely aligned with their specific fairness objectives.
(2) Performance in Resource-Moderate Scenario
Under resource-moderate conditions, where resource supply and MT demand are relatively balanced, the performance advantages of various fairness metrics become more pronounced, with the Gini metric demonstrating particularly strong adaptability in this scenario. All fairness-aware methods substantially outperform the non-optimized benchmark, as evidenced by Table 10 and visually confirmed in Figure 12 and Figure 13.
Table 10 shows that the non-optimized benchmark still performs the worst across all metrics: the Euclidean distance reaches 0.387697, the Gini coefficient peaks at 0.999067, and the sum of standard deviations is 0.5987. Visually, Figure 13 shows the tallest bar for the non-optimized benchmark, while Figure 12 shows that across multiple QoS dimensions, the standard deviation bars for the non-optimized benchmark are significantly higher than those of other methods, indicating a significant imbalance in the absence of fairness constraints.
Although all fairness-aware metrics exceed the benchmark, notable performance variations exist among them, with each excelling in its designated evaluation metric, consistent with its optimization orientation. As indicated in Table 10, the Gini metric achieves the smallest Gini coefficient (0.378402), the lowest sum of standard deviations (0.197099), and the minimal max–min fairness value (0.132504); the Euclidean metric records a relatively small Euclidean distance (0.192357); and the Jain metric obtains the smallest Jain's index (0.010251), slightly better than the Euclidean metric's 0.010431.
These performance discrepancies arise from the distinct optimization objectives of each fairness metric, with their effectiveness closely tied to scenario-specific resource supply-demand characteristics.
(3) Performance in Resource-Sufficient Scenario
In the resource-sufficient scenario, service resources are redundantly supplied—a context where fairness metrics can fully leverage ample resources to refine allocation equilibrium, with the Euclidean metric emerging as the most robust performer. All fairness-aware schemes outpace the non-optimized benchmark by a significant margin, as validated quantitatively in Table 11 and visually in Figure 14 and Figure 15. The non-optimized benchmark remains the poorest performer across all indicators.
While all fairness-aware metrics benefit from ample resources to outperform the benchmark, their performance varies drastically, as each metric's design aligns differently with the scenario's low-competition, resource-rich traits. Per Table 11, the Euclidean metric claims the smallest Euclidean distance (0.149901), the lowest sum of standard deviations (0.234444), and the minimal values for both the Gini coefficient (0.41709) and Jain's index (0.005961); the max–min metric holds a relatively small max–min fairness value (0.175697), though it lags behind the Euclidean metric's 0.157468; the Gini metric, by contrast, shows diminished advantages here, with its Gini coefficient (0.553687) the highest among fairness-aware schemes.
These disparities are further reinforced visually: as shown in Figure 14, the Euclidean metric exhibits relatively lower bar heights across most Quality of Service (QoS) dimensions, indicating reduced fluctuations; meanwhile, in Figure 15, its bar for the sum of standard deviations is also the shortest—collectively illustrating how the optimization focus of each fairness metric interacts with resource availability to shape the results.
Synthesizing experimental data and visualization results across the three resource scenarios, the performance of the four fairness metrics can be summarized as follows:
  • Euclidean distance: demonstrates optimal global fairness in both resource-constrained and resource-sufficient scenarios. Its core advantage lies in comprehensively capturing QoS disparities between tasks, making it suitable for these two scenarios with imbalanced resource supply and demand.
  • Gini coefficient: achieves the best performance in the resource-moderate scenario, with a notable effect in suppressing allocation inequality. Applicable to scenarios where resource supply and demand are relatively balanced.
  • Max–min fairness: excels at controlling QoS gaps between extreme tasks, effectively managing fluctuations in extreme dimensions but lacking balance across global dimensions.
  • Jain's index: exhibits good inter-task equilibrium in single dimensions, yet its ability to control global QoS disparities is inferior to that of the Euclidean distance metric.

5.4.3. Quantitative Analysis of Fairness-QoS Trade-Off

To explicitly address the trade-off between fairness gains and global QoS performance, we quantitatively compared the core QoS of the fairness-aware MARL method (Euclidean metric as the representative optimal fairness scheme) and the non-fairness-optimized benchmark across three resource scenarios (constrained/moderate/sufficient), with results visualized in Figure 16 and quantified in Table 12.
The trade-off analysis reveals that the proposed fairness-aware MARL method effectively enhances system fairness while preserving the stability of core QoS dimensions. Across all datasets, the fairness metric drops markedly, with improvements of 15.14%, 21.37%, and 4.86%, respectively, validating the efficacy of the proposed optimization approach. Regarding the core QoS dimensions, the time and cost metrics, which are most sensitive to resource availability, show notable optimization in the D2 and D3 scenarios, with time reduced by up to 7.92% and cost by up to 4.85%. A slight increase in both metrics in the D1 scenario reflects the method's adaptive trade-off behavior under tight resource constraints. For the service performance dimensions of quality, availability, and reliability, consistent positive improvements of 0.13% to 3.36% are achieved after optimization, indicating that fairness optimization does not come at the expense of core QoS.
This conclusion is further corroborated by the visualization results. As illustrated in Figure 16, under the three resource scenarios (i.e., tight, moderate, and sufficient), the blue radar polygon representing the fairness-optimized method largely overlaps with the red polygon representing the non-optimized benchmark. Although minor fluctuations are observed in the time and cost coordinates in certain scenarios, the concurrent improvements in the quality, availability, and reliability coordinates ensure the stability of the overall service profile. This visually verifies that the introduction of fairness objectives does not compromise the global QoS. In summary, both the experimental data and graphical analysis demonstrate that the proposed fairness-aware multi-agent reinforcement learning method can effectively enhance system fairness without degrading global QoS. These two objectives are not mutually exclusive; instead, they are synergistically optimized via the hierarchical agent architecture and attention mechanisms embedded in the model.
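For clarity, the improvement percentages quoted in this subsection follow the standard relative-change convention for smaller-is-better metrics; a minimal sketch (the sample values below are hypothetical illustrations, not values from Table 12):

```python
def pct_improvement(baseline, optimized):
    """Relative improvement of a smaller-is-better metric (e.g., a fairness
    value, time, or cost), expressed as a percentage of the baseline.
    Negative results indicate degradation relative to the benchmark."""
    return 100.0 * (baseline - optimized) / baseline
```

For example, a hypothetical fairness value falling from 0.20 to 0.17 is about a 15% improvement over the non-optimized benchmark, while a time metric that rises slightly (as in the D1 scenario) yields a negative value.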

5.4.4. Ablation Experiment

To further investigate the contributions of key components within the Multi-Agent Reinforcement Learning (MARL) framework to system performance—particularly in balancing global efficiency and inter-task fairness—this section conducts an ablation experiment, focusing on analyzing the roles of the attention mechanism and the global coordination mechanism. By systematically deactivating each mechanism, this study quantifies their individual and combined effects on fairness metrics and QoS.
(1) Experimental Setup
To evaluate the individual and combined effects of the attention mechanism and the global coordination mechanism on fairness optimization, four configuration combinations were designed for the experiment. Here, “A” denotes the attention mechanism, “C” denotes the global coordinator agent network, “+” indicates enabled, and “−” indicates disabled. The first configuration enables both mechanisms (A+C+), achieving dual-dimensional synergistic optimization. The second configuration enables only the attention mechanism (A+C−). The third configuration enables only the global coordinator network (A−C+). The fourth configuration disables both mechanisms simultaneously (A−C−). It is important to note that even in the fourth configuration, the system still performs fairness optimization through the fairness strength coefficient in the coordination parameters, integrating fairness metrics into the reward function with a specific weight and performing dynamic adjustment via a reward correction mechanism. Therefore, this configuration does not serve as a baseline control group that completely ignores fairness; rather, it retains fairness optimization at the reward level, thus still reflecting the impact of fairness constraints on system performance in the ablation experiment.
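As a hedged sketch of this setup (the configuration names follow the text above; the builder function and the default fairness weight are illustrative assumptions, not the paper's actual hyperparameters), the four ablation variants can be enumerated as:

```python
from itertools import product

# Enumerate the four ablation configurations: "A" = attention mechanism,
# "C" = global coordinator network, "+" enabled, "-" disabled.
CONFIGS = {
    f"A{'+' if a else '-'}C{'+' if c else '-'}": {"attention": a, "coordinator": c}
    for a, c in product([True, False], repeat=2)
}

def build_trainer(config, fairness_strength=0.5):
    """Toggle the two mechanisms under test. Note that the fairness strength
    coefficient stays in the reward in every configuration, including A-C-,
    so even the fully ablated variant retains reward-level fairness
    optimization (the 0.5 default is an assumed illustrative value)."""
    return {
        "use_attention": config["attention"],
        "use_global_coordinator": config["coordinator"],
        "fairness_reward_weight": fairness_strength,
    }
```

This makes explicit why A-C- is not a fairness-free baseline: disabling both mechanisms removes the architectural components but leaves the fairness term in the reward.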
(2) Results Analysis and Discussion
The experimental results are shown in Figure 17 and Table 13. Under tight resource constraints (D1), fairness values are relatively low (0.0900–0.1058), indicating a good level of fairness, while fitness values remain stable (0.6722–0.6814). The impact of different mechanism combinations is notable in this scenario: the A+C+ configuration achieves the best performance, with the lowest fairness mean (0.0900) and the highest fitness mean (0.6814). When only a single mechanism is enabled (A+C− or A−C+), fairness slightly worsens and fitness slightly drops. Disabling both mechanisms (A−C−) yields the poorest fairness (0.1058) and fitness (0.6722), demonstrating a clear synergistic effect between the two mechanisms under strict resource limitations.
Under moderate resource constraints (D2), performance differences among mechanism configurations are relatively pronounced. The A+C+ configuration achieves the highest fitness (0.8431) while maintaining good fairness (0.0938). The configuration with only the attention mechanism enabled (A+C−) attains the best fairness (0.0884) and near-optimal fitness (0.8414), indicating the critical role of the attention mechanism in improving fairness under this scenario. When only the global coordinator is enabled (A−C+), fairness is 0.0923, but fitness is the lowest (0.8389), accompanied by significant fairness fluctuation (std 0.0243), suggesting insufficient stability when used alone. Disabling both mechanisms (A−C−) yields the poorest performance in both fairness (0.1194) and fitness (0.8159), further underscoring their importance even under moderate resource conditions.
Under abundant resource constraints (D3), performance differences across configurations are more subdued, especially in fitness. The A−C+ configuration achieves the best fairness (0.0962) and high fitness (0.8515), indicating that the global coordinator remains effective in promoting fairness even with ample resources. The A+C+ configuration attains the highest fitness (0.8523) with acceptable fairness (0.1014) and exhibits the most stable outputs (fairness std 0.0080, fitness std 0.0051). The A+C− configuration shows moderate and consistent results in both metrics. The configuration with both mechanisms disabled (A−C−) again performs the worst, with fairness at 0.1268 and fitness at 0.8208, highlighting the foundational role of these mechanisms across varying resource conditions.
Synthesizing the experimental results across the three resource scenarios reveals that the performance of mechanism configurations is highly sensitive to the level of resource constraints. Under tight constraints, the combined use of both mechanisms (A+C+) is essential for achieving optimal fairness and fitness. In moderate conditions, the attention mechanism alone (A+C−) plays a dominant role in fairness optimization, while the global coordinator introduces instability when used in isolation. When resources are abundant, the system performance becomes less dependent on either mechanism, allowing for flexible or simplified configurations. These findings underscore that the severity of resource constraints should guide the selection and deployment of coordination mechanisms in practice.

6. Conclusions

This study addresses the multi-task CMSC problem in CMPs, targeting the core limitations of existing research: inadequate adaptation to multi-task resource competition and the lack of fairness in QoS allocation among MTs. A dual-objective optimization model integrating global MS QoS and inter-task fairness is constructed, and a fairness-aware MARL algorithm is proposed. Leveraging the CTDE framework and the MAPPO solution, the algorithm reformulates the problem as a POMDP. Architecturally, a centralized coordination agent is introduced into the MARL framework, with each task allocation process mapped to an independent task agent; the coordination agent monitors the global state in real time and dynamically adjusts each task agent's resource selection probabilities. MARL under the CTDE framework enables collaborative decision-making among the task agents, overcoming the fairness defects caused by sequential selection in single-agent inter-task allocation; it improves the fairness of inter-task resource allocation without sacrificing the autonomy of the task agents' distributed decision-making, achieving balanced optimization of global efficiency and inter-task fairness. At the mechanism level, a two-layer attention mechanism is applied: task-level self-attention perceives the correlations among subtasks within a single task, while global state self-attention focuses on key global features. This mechanism achieves precise synergy between local task features and global resource states, effectively enhancing decision-making accuracy in high-dimensional, complex scenarios and ultimately optimizing both fairness and global QoS efficiency in multi-task CMSC.
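To make the coordination principle concrete, the following is a minimal, hypothetical sketch of one centralized-adjustment step followed by decentralized selection. The deficit-based adjustment rule and the beta coefficient are illustrative assumptions, not the paper's actual update:

```python
import numpy as np

rng = np.random.default_rng(0)

def coordinator_adjust(logits, task_qos, beta=1.0):
    """Hypothetical centralized step: raise the service-selection logits of
    task agents whose accumulated QoS lags the mean. `beta` plays the role
    of an assumed fairness-strength coefficient."""
    deficit = task_qos.mean() - task_qos     # positive for under-served tasks
    return logits + beta * deficit[:, None]  # broadcast over candidate services

def decentralized_select(logits):
    """Decentralized execution: each task agent independently samples its
    service from a softmax over its own (coordinator-adjusted) logits."""
    z = np.exp(logits - logits.max(axis=1, keepdims=True))
    p = z / z.sum(axis=1, keepdims=True)
    return np.array([rng.choice(p.shape[1], p=row) for row in p])
```

Under-served tasks receive uniformly boosted logits, nudging their agents toward stronger candidate services, while each agent still samples its own action, which is the balance between coordination and distributed autonomy described above.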
Nonetheless, there are limitations to the present research: the existing framework only prioritizes CMP resource efficiency and the QoS fairness of MT demanders, while neglecting to incorporate the core demands of MS providers (e.g., operational costs, profit distribution, load balancing, and long-term stability). A decline in QoS or the withdrawal of providers will undermine the sustainability of the CMP ecosystem. Going forward, a multi-objective optimization model balancing the interests of three parties (CMP operators, MT demanders, and MS providers) will be developed, incorporating hard constraints such as providers’ load upper limits and profit baselines. Dynamic game and incentive-compatible scheduling mechanisms will be introduced to coordinate the interest conflicts among the three parties. Combined with provider heterogeneity and dynamic scenarios, quantitative methods for multi-dimensional fairness (QoS fairness of demanders, profit/load fairness of providers) will be explored, ultimately achieving a dynamic balance of interests among the three parties and enhancing the adaptability and robustness of the model and algorithm in the actual CMP ecosystem.

Author Contributions

Conceptualization, Z.F.; Methodology, Z.F.; Software, Z.F. and Y.Y.; Validation, Z.F., Y.Y., D.F. and D.L.; Formal analysis, Z.F.; Investigation, Z.F.; Resources, Z.F.; Data curation, Z.F.; Writing—original draft, Z.F.; Writing—review & editing, Z.F. and Q.C.; Visualization, Z.F.; Supervision, Z.F.; Project administration, Z.F.; Funding acquisition, Z.F. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the Anhui Provincial University Key Research Project (grant number: 2022AH051719), the Anhui Provincial University Outstanding Research and Innovation Team Program (grant number: 2024AH010022), and the National Natural Science Foundation of China (grant number: 62301087).

Data Availability Statement

The dataset is available on request from the authors.

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:
CMP: Cloud Manufacturing Platform
CMSC: Cloud Manufacturing Service Composition
QoS: Quality of Service
MT: Manufacturing Task
MS: Manufacturing Service
SCS: Service Composition Solutions
MARL: Multi-Agent Reinforcement Learning
CTDE: Centralized Training with Decentralized Execution
MOPSO: Pareto-based Particle Swarm Optimization
NSGA-II: Non-dominated Sorting Genetic Algorithm II
W-PSO: Weighted-based Particle Swarm Optimization
W-GA: Weighted-based Genetic Algorithm

Figure 1. Process of CMSC.
Figure 2. Fitness curves of MARL and comparative algorithms under different data scales (S1, S2, S3). (a) Fitness curve under S1 scenario. (b) Fitness curve under S2 scenario. (c) Fitness curve under S3 scenario.
Figure 3. Fitness distribution of MARL and comparative algorithms under S1 scenario. (a) Comparison of final fitness distributions. (b) Mean final fitness with standard deviation.
Figure 4. Fitness distribution of MARL and comparative algorithms under S2 scenario. (a) Comparison of final fitness distributions. (b) Mean final fitness with standard deviation.
Figure 5. Fitness distribution of MARL and comparative algorithms under S3 scenario. (a) Comparison of final fitness distributions. (b) Mean final fitness with standard deviation.
Figure 6. Average fitness, standard deviation, and degradation rate under different failure probabilities.
Figure 7. Fitness Distribution under different failure probabilities.
Figure 8. Average fitness, standard deviation, and degradation rate when MSs degrade with different probabilities.
Figure 9. Fitness distribution when services degrade with different probabilities.
Figure 10. QoS dimension change diagram in resource-constrained scenario.
Figure 11. Comparison diagram of sum of standard deviations in resource-constrained scenario.
Figure 12. QoS dimension change diagram in resource-moderate scenario.
Figure 13. Comparison diagram of sum of standard deviations in resource-moderate scenario.
Figure 14. QoS dimension change diagram in resource-sufficient scenario.
Figure 15. Comparison diagram of sum of standard deviations in resource-sufficient scenario.
Figure 16. Comparison of target values with and without fairness considerations.
Figure 17. Impact of attention mechanisms and global coordination under different resource scenarios (D1, D2, D3). (a) Under D1 scenario. (b) Under D2 scenario. (c) Under D3 scenario.
Table 1. Experimental dataset configuration.

| Dataset | n | m | M_j | Resource Tightness |
|---|---|---|---|---|
| S1 (Small-scale) | 5 | 3 | 20 | Medium |
| S2 (Medium-scale) | 10 | 6 | 40 | Medium |
| S3 (Large-scale) | 20 | 12 | 80 | Medium |
| D1 (Tight resources) | 10 | 6 | 15 | High |
| D2 (Moderate resources) | 10 | 6 | 40 | Medium |
| D3 (Sufficient resources) | 10 | 6 | 65 | Low |
Table 2. QoS generation distributions by service tier.

| Service Tier | Cost | Time | Quality | Reliability | Availability |
|---|---|---|---|---|---|
| High-quality | [10.00, 40.00] | [1.00, 8.00] | [0.9000, 0.9900] | [0.9200, 0.9900] | [0.9200, 0.9900] |
| Medium-quality | [40.00, 70.00] | [8.00, 14.00] | [0.8000, 0.9000] | [0.8500, 0.9200] | [0.8800, 0.9200] |
| General-quality | [70.00, 100.00] | [14.00, 20.00] | [0.7000, 0.8000] | [0.8000, 0.8500] | [0.8500, 0.8800] |
Table 3. Convergence speed and average fitness value under different data scales.

| Algorithm | Avg. Convergence Iterations (S1) | (S2) | (S3) | Avg. Fitness Value (S1) | (S2) | (S3) |
|---|---|---|---|---|---|---|
| DE | 45.6 | 19.6 | 44.2 | 0.567525 | 0.511148 | 0.499896 |
| FATA | 63.2 | 40.6 | 24.6 | 0.541261 | 0.5111 | 0.500367 |
| MOEA/D | 153.4 | 106.8 | 59.4 | 0.584352 | 0.568239 | 0.529619 |
| PSO | 36 | 51.4 | 32 | 0.572738 | 0.528579 | 0.50227 |
| SO | 646.2 | 626.8 | 642.8 | 0.554428 | 0.531644 | 0.515741 |
| RL | 569.4 | 912.4 | 1999 | 0.58166 | 0.5764 | 0.54674 |
| MARL | 741 | 772 | 1025 | 0.58766 | 0.59282 | 0.58508 |
Table 4. Fitness distribution under S1 scenario.

| Algorithm | Max | Min | Mean | Std |
|---|---|---|---|---|
| DE | 0.602953 | 0.522071 | 0.567525 | 0.027078 |
| FATA | 0.555804 | 0.517271 | 0.541261 | 0.015717 |
| MOEA/D | 0.608323 | 0.546388 | 0.584352 | 0.020631 |
| PSO | 0.595354 | 0.551467 | 0.572738 | 0.017077 |
| SO | 0.575156 | 0.541215 | 0.554428 | 0.011633 |
| RL | 0.6055 | 0.5517 | 0.58166 | 0.018997 |
| MARL | 0.5931 | 0.5763 | 0.58766 | 0.006082 |
Table 5. Fitness distribution under S2 scenario.

| Algorithm | Max | Min | Mean | Std |
|---|---|---|---|---|
| DE | 0.522089 | 0.497438 | 0.511148 | 0.009807 |
| FATA | 0.526326 | 0.497355 | 0.5111 | 0.011054 |
| MOEA/D | 0.579287 | 0.554372 | 0.568239 | 0.010653 |
| PSO | 0.540498 | 0.513161 | 0.528579 | 0.009883 |
| SO | 0.547409 | 0.522917 | 0.531644 | 0.009098 |
| RL | 0.5958 | 0.5613 | 0.5764 | 0.012367 |
| MARL | 0.6092 | 0.5796 | 0.59282 | 0.011101 |
Table 6. Fitness distribution under S3 scenario.

| Algorithm | Max | Min | Mean | Std |
|---|---|---|---|---|
| DE | 0.502695 | 0.497195 | 0.499896 | 0.002058 |
| FATA | 0.506168 | 0.496466 | 0.500367 | 0.003523 |
| MOEA/D | 0.532927 | 0.526012 | 0.529619 | 0.002771 |
| PSO | 0.507643 | 0.499585 | 0.50227 | 0.003089 |
| SO | 0.519652 | 0.510019 | 0.515741 | 0.003622 |
| RL | 0.5942 | 0.4813 | 0.54674 | 0.039822 |
| MARL | 0.5906 | 0.5807 | 0.58508 | 0.003375 |
Table 7. Numerical results under different service failure probabilities.

| Algorithm | Failure Probability (%) | Average Fitness | Standard Deviation | Degradation Rate (%) |
|---|---|---|---|---|
| DE | 0 | 0.674929 | 0.002005 | 0 |
| | 10 | 0.646183 | 0.029337 | 4.26 |
| | 20 | 0.613015 | 0.038275 | 9.17 |
| | 30 | 0.589876 | 0.054352 | 12.6 |
| | 40 | 0.515124 | 0.051032 | 23.68 |
| | 50 | 0.475376 | 0.064342 | 29.57 |
| FATA | 0 | 0.653184 | 0 | 0 |
| | 10 | 0.63011 | 0.028489 | 3.53 |
| | 20 | 0.605858 | 0.035673 | 7.25 |
| | 30 | 0.586531 | 0.052444 | 10.2 |
| | 40 | 0.514931 | 0.051046 | 21.17 |
| | 50 | 0.476535 | 0.064119 | 27.04 |
| MARL | 0 | 0.7396 | 0.003056 | 0 |
| | 10 | 0.70166 | 0.034474 | 5.13 |
| | 20 | 0.66456 | 0.043542 | 10.15 |
| | 30 | 0.63478 | 0.064808 | 14.17 |
| | 40 | 0.5431 | 0.057204 | 26.57 |
| | 50 | 0.484629 | 0.063072 | 34.47 |
| SO | 0 | 0.679119 | 0 | 0 |
| | 10 | 0.649485 | 0.02833 | 4.36 |
| | 20 | 0.61634 | 0.036841 | 9.24 |
| | 30 | 0.592028 | 0.053664 | 12.82 |
| | 40 | 0.516444 | 0.050826 | 23.95 |
| | 50 | 0.476366 | 0.064411 | 29.86 |
| PSO | 0 | 0.678141 | 0.001151 | 0 |
| | 10 | 0.648975 | 0.02838 | 4.3 |
| | 20 | 0.615956 | 0.036877 | 9.17 |
| | 30 | 0.590922 | 0.053603 | 12.86 |
| | 40 | 0.515628 | 0.05121 | 23.96 |
| | 50 | 0.475376 | 0.064342 | 29.9 |
| RL | 0 | 0.662702 | 0.005941 | 0 |
| | 10 | 0.635052 | 0.025602 | 4.17 |
| | 20 | 0.604681 | 0.032164 | 8.76 |
| | 30 | 0.584966 | 0.055489 | 11.73 |
| | 40 | 0.507174 | 0.049287 | 23.47 |
| | 50 | 0.471388 | 0.064807 | 28.87 |
| MOEA/D | 0 | 0.678716 | 0 | 0 |
| | 10 | 0.648061 | 0.028726 | 4.52 |
| | 20 | 0.615718 | 0.036816 | 9.28 |
| | 30 | 0.588132 | 0.053373 | 13.35 |
| | 40 | 0.514427 | 0.051021 | 24.21 |
| | 50 | 0.475376 | 0.064342 | 29.96 |
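The degradation rates in Table 7 are consistent with the relative drop from each algorithm's failure-free baseline. A minimal sketch of this assumed formula, checked against the DE and MARL rows (the function name is ours, not the paper's):

```python
def degradation_rate(baseline_fitness: float, degraded_fitness: float) -> float:
    """Relative fitness loss (%) versus the failure-free (0%) baseline.

    Assumed formula: (f0 - fp) / f0 * 100, which reproduces the table's values.
    """
    return round((baseline_fitness - degraded_fitness) / baseline_fitness * 100, 2)

# DE at 10% failure probability: 0.674929 -> 0.646183
print(degradation_rate(0.674929, 0.646183))  # 4.26, as reported in Table 7

# MARL at 10% failure probability: 0.7396 -> 0.70166
print(degradation_rate(0.7396, 0.70166))  # 5.13
```

The same convention reproduces the Table 8 column as well, with negative rates where MARL's average fitness rises above its baseline.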
Table 8. Numerical results under different service QoS degradation probabilities.

| Algorithm | QoS Degradation Probability (%) | Average Fitness | Standard Deviation | Degradation Rate (%) |
|---|---|---|---|---|
| DE | 0 | 0.801523 | 0.003266 | 0 |
| | 10 | 0.761319 | 0.007963 | 5.02 |
| | 20 | 0.732816 | 0.006361 | 8.57 |
| | 30 | 0.691475 | 0.011226 | 13.73 |
| | 40 | 0.639034 | 0.011738 | 20.27 |
| | 50 | 0.595909 | 0.017367 | 25.65 |
| FATA | 0 | 0.773475 | 0.002275 | 0 |
| | 10 | 0.75767 | 0.004437 | 2.04 |
| | 20 | 0.727833 | 0.005058 | 5.9 |
| | 30 | 0.689408 | 0.010112 | 10.87 |
| | 40 | 0.636416 | 0.011675 | 17.72 |
| | 50 | 0.60289 | 0.010686 | 22.05 |
| MARL | 0 | 0.889861 | 0.001336 | 0 |
| | 10 | 0.897658 | 0.002507 | −0.88 |
| | 20 | 0.893154 | 0.007904 | −0.37 |
| | 30 | 0.869151 | 0.012937 | 2.33 |
| | 40 | 0.841483 | 0.023292 | 5.44 |
| | 50 | 0.794802 | 0.019268 | 10.68 |
| SO | 0 | 0.82026 | 0 | 0 |
| | 10 | 0.800545 | 0 | 2.4 |
| | 20 | 0.790664 | 0 | 3.61 |
| | 30 | 0.747332 | 0 | 8.89 |
| | 40 | 0.703659 | 0 | 14.22 |
| | 50 | 0.67788 | 0 | 17.36 |
| PSO | 0 | 0.853343 | 0.000602 | 0 |
| | 10 | 0.847633 | 0.005646 | 0.67 |
| | 20 | 0.771914 | 0.007433 | 9.54 |
| | 30 | 0.72458 | 0.005537 | 15.09 |
| | 40 | 0.695511 | 0.013906 | 18.5 |
| | 50 | 0.643296 | 0.015605 | 24.61 |
| RL | 0 | 0.820898 | 0.011314 | 0 |
| | 10 | 0.810784 | 0.005584 | 1.23 |
| | 20 | 0.789717 | 0.010973 | 3.8 |
| | 30 | 0.756402 | 0.024118 | 7.86 |
| | 40 | 0.753309 | 0.018832 | 8.23 |
| | 50 | 0.73919 | 0.035791 | 9.95 |
| MOEA/D | 0 | 0.825332 | 0.004436 | 0 |
| | 10 | 0.797909 | 0.009616 | 3.32 |
| | 20 | 0.778049 | 0.00544 | 5.73 |
| | 30 | 0.756034 | 0.002279 | 8.4 |
| | 40 | 0.734368 | 0.008253 | 11.02 |
| | 50 | 0.702599 | 0.010395 | 14.87 |
Table 9. Fairness indicators of each algorithm in resource-constrained scenario.

| Algorithm | Euclidean Distance | Gini Coefficient | Max–Min Fairness | Jain's Index | Sum of Standard Deviations |
|---|---|---|---|---|---|
| Non-optimized | 0.358283 | 0.962103 | 0.368144 | 0.042485 | 0.545988 |
| Gini | 0.270488 | 0.57001 | 0.267623 | 0.02915 | 0.360559 |
| Max–Min | 0.231458 | 0.684946 | 0.209265 | 0.018438 | 0.333134 |
| Euclidean | 0.186091 | 0.612362 | 0.197342 | 0.012722 | 0.301243 |
| Jain's Index | 0.231183 | 0.693917 | 0.258627 | 0.019652 | 0.363327 |
Table 10. Fairness indicators of each algorithm in resource-moderate scenario.

| Algorithm | Euclidean Distance | Gini Coefficient | Max–Min Fairness | Jain's Index | Sum of Standard Deviations |
|---|---|---|---|---|---|
| Non-optimized | 0.387697 | 0.999067 | 0.402376 | 0.057269 | 0.5987 |
| Gini | 0.139351 | 0.378402 | 0.132504 | 0.005644 | 0.197099 |
| Max–Min | 0.21152 | 0.601607 | 0.18964 | 0.012164 | 0.313852 |
| Euclidean | 0.192357 | 0.559323 | 0.200443 | 0.010431 | 0.298049 |
| Jain's Index | 0.202608 | 0.554714 | 0.204325 | 0.010251 | 0.314178 |
Table 11. Fairness indicators of each algorithm in resource-sufficient scenario.

| Algorithm | Euclidean Distance | Gini Coefficient | Max–Min Fairness | Jain's Index | Sum of Standard Deviations |
|---|---|---|---|---|---|
| Non-optimized | 0.337799 | 0.954476 | 0.331097 | 0.039624 | 0.512319 |
| Gini | 0.206221 | 0.553687 | 0.205272 | 0.011417 | 0.30907 |
| Max–Min | 0.186136 | 0.476774 | 0.175697 | 0.009858 | 0.261468 |
| Euclidean | 0.149901 | 0.41709 | 0.157468 | 0.005961 | 0.234444 |
| Jain's Index | 0.179077 | 0.477396 | 0.165643 | 0.007257 | 0.271693 |
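For reference, the indicators in Tables 9–11 build on standard fairness measures. The sketch below shows textbook forms of the Gini coefficient and Jain's index over per-task QoS utilities; the `utilities` inputs are hypothetical, and note that the paper reports all five indicators so that lower values mean fairer allocations, so its Jain column is presumably a deviation from the ideal raw index of 1 rather than the raw index itself.

```python
def gini_coefficient(utilities):
    """Textbook Gini coefficient: mean absolute pairwise difference
    normalized by twice the mean. 0 = perfectly equal allocation."""
    n = len(utilities)
    mean = sum(utilities) / n
    diff_sum = sum(abs(a - b) for a in utilities for b in utilities)
    return diff_sum / (2 * n * n * mean)

def jains_index(utilities):
    """Textbook Jain's fairness index: (sum x)^2 / (n * sum x^2).
    Ranges from 1/n (one task takes everything) to 1 (perfect fairness)."""
    n = len(utilities)
    return sum(utilities) ** 2 / (n * sum(x * x for x in utilities))

equal = [1.0, 1.0, 1.0, 1.0]   # hypothetical per-task utilities
skewed = [0.9, 0.7, 0.4, 0.2]
print(gini_coefficient(equal), jains_index(equal))         # 0.0 1.0
print(gini_coefficient(skewed) > gini_coefficient(equal))  # True
```

Under this orientation a fairness-aware composer would drive Gini toward 0 and the raw Jain index toward 1, matching the direction of the improvements the tables report.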
Table 12. Global QoS performance comparison (fairness-aware vs. non-fairness-optimized).

| Dataset | QoS Dimension | Non-Optimized | Optimized | Improvement (%) |
|---|---|---|---|---|
| D1 | Time | 10.1803 | 10.9408 | −7.47 |
| | Cost | 50.6423 | 51.8315 | −2.35 |
| | Quality | 0.8445 | 0.8466 | 0.25 |
| | Availability | 0.9051 | 0.9074 | 0.25 |
| | Reliability | 0.9152 | 0.9164 | 0.13 |
| | Fairness | 0.3583 | 0.304 | 15.14 |
| D2 | Time | 11.1442 | 10.2618 | 7.92 |
| | Cost | 53.249 | 51.1273 | 3.98 |
| | Quality | 0.8446 | 0.8605 | 1.88 |
| | Availability | 0.8964 | 0.9158 | 2.16 |
| | Reliability | 0.9136 | 0.9189 | 0.58 |
| | Fairness | 0.3801 | 0.2989 | 21.37 |
| D3 | Time | 10.0868 | 9.5062 | 5.76 |
| | Cost | 54.4543 | 51.811 | 4.85 |
| | Quality | 0.8422 | 0.8703 | 3.34 |
| | Availability | 0.8979 | 0.9281 | 3.36 |
| | Reliability | 0.9201 | 0.9246 | 0.49 |
| | Fairness | 0.331 | 0.3149 | 4.86 |
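The Improvement (%) column in Table 12 is sign-consistent with each dimension's optimization direction: for cost and time (lower is better) a reduction counts as positive improvement, while for quality, availability, and reliability (higher is better) an increase does. A minimal check of this assumed convention against the table's rows (function name ours):

```python
def improvement_pct(non_opt: float, opt: float, lower_is_better: bool) -> float:
    """Relative improvement (%) of the fairness-aware result over the
    non-optimized one; assumed convention: positive = better."""
    if lower_is_better:
        return round((non_opt - opt) / non_opt * 100, 2)
    return round((opt - non_opt) / non_opt * 100, 2)

# D2 Time (lower is better): 11.1442 -> 10.2618
print(improvement_pct(11.1442, 10.2618, lower_is_better=True))   # 7.92

# D2 Quality (higher is better): 0.8446 -> 0.8605
print(improvement_pct(0.8446, 0.8605, lower_is_better=False))    # 1.88

# D1 Time: time increases, so improvement is negative
print(improvement_pct(10.1803, 10.9408, lower_is_better=True))   # -7.47
```

This also explains why D1's time and cost rows are negative: under tight resources, the fairness-aware composer trades a little global time and cost for the 15.14% gain in the fairness indicator.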
Table 13. Results of ablation experiments.

| Constraint | Variant | Fairness Mean | Fairness Std | Fitness Mean | Fitness Std |
|---|---|---|---|---|---|
| Tight | A+C+ | 0.0900 | 0.0070 | 0.6814 | 0.0056 |
| | A+C− | 0.0959 | 0.0122 | 0.6800 | 0.0044 |
| | A−C+ | 0.1025 | 0.0130 | 0.6774 | 0.0067 |
| | A−C− | 0.1058 | 0.0099 | 0.6722 | 0.0101 |
| Moderate | A+C+ | 0.0938 | 0.0113 | 0.8431 | 0.0022 |
| | A+C− | 0.0884 | 0.0143 | 0.8414 | 0.0066 |
| | A−C+ | 0.0923 | 0.0243 | 0.8389 | 0.0076 |
| | A−C− | 0.1194 | 0.0064 | 0.8159 | 0.0059 |
| Adequate | A+C+ | 0.1014 | 0.0080 | 0.8523 | 0.0051 |
| | A+C− | 0.1051 | 0.0162 | 0.8495 | 0.0072 |
| | A−C+ | 0.0962 | 0.0152 | 0.8515 | 0.0080 |
| | A−C− | 0.1268 | 0.0114 | 0.8208 | 0.0069 |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Fang, Z.; Ying, Y.; Cao, Q.; Fang, D.; Lu, D. A Multi-Task Service Composition Method Considering Inter-Task Fairness in Cloud Manufacturing. Symmetry 2026, 18, 238. https://doi.org/10.3390/sym18020238


