Next Article in Journal
Research on Energy Consumption Optimization Strategies of Robot Joints Based on NSGA-II and Energy Consumption Mapping
Previous Article in Journal
Investigating Learning Assistance by Demonstration for Robotic Wheelchairs: A Simulation Approach
 
 
Font Type:
Arial Georgia Verdana
Font Size:
Aa Aa Aa
Line Spacing:
Column Width:
Background:
Article

Federated Learning for Distributed Multi-Robotic Arm Trajectory Optimization

College of Mechanical Engineering, Donghua University, Songjiang District, Shanghai 201620, China
*
Author to whom correspondence should be addressed.
Robotics 2025, 14(10), 137; https://doi.org/10.3390/robotics14100137
Submission received: 14 August 2025 / Revised: 19 September 2025 / Accepted: 25 September 2025 / Published: 29 September 2025
(This article belongs to the Section Sensors and Control in Robotics)

Abstract

The optimization of trajectories for multiple robotic arms in a shared workspace is critical for industrial automation but presents significant challenges, including data sharing, communication overhead, and adaptability in dynamic environments. Traditional centralized control methods require sharing raw sensor data, raising concerns and creating computational bottlenecks. This paper proposes a novel Federated Learning (FL) framework for distributed multi-robotic arm trajectory optimization. Our method enables collaborative learning where robots train a shared model locally and only exchange gradient updates, preserving data privacy. The framework integrates an adaptive Rapidly exploring Random Tree (RRT) algorithm enhanced with a dynamic pruning strategy to reduce computational overhead and ensure collision-free paths. Real-time synchronization is achieved via EtherCAT, ensuring precise coordination. Experimental results demonstrate that our approach achieves a 17% reduction in average path length, a 22% decrease in collision rate, and a 31% improvement in planning speed compared to a centralized RRT baseline, while reducing inter-robot communication overhead by 45%. This work provides a scalable and efficient solution for collaborative manipulation in applications ranging from assembly lines to warehouse automation.

1. Introduction

Robotic manipulators have become indispensable in modern industrial automation, performing complex tasks such as assembly, welding, and material handling with high precision. However, as industries increasingly adopt multi-robotic systems for collaborative operations, optimizing their trajectories in shared workspaces presents significant challenges.
A critical limitation in multi-robotic systems is the need for centralized control, which often requires sharing raw sensor data among manipulators, raising concerns about data privacy and communication overhead. Federated Learning (FL) offers a promising solution by enabling decentralized model training, where manipulators collaboratively optimize trajectories without exchanging raw data [1]. Instead, local updates from reinforcement learning (RL) agents are aggregated into a global model, preserving privacy while improving adaptability [2].
Traditional trajectory planning methods, such as Rapidly exploring Random Trees (RRT) and Probabilistic Roadmaps (PRM), focus on single-robot systems and often struggle with dynamic environments, real-time coordination, and computational efficiency when scaled to multiple robots [3]. Moreover, ensuring collision-free motion while maintaining synchronization and minimizing latency factors remains an open research problem.
However, implementing FL in multi-robotic trajectory optimization introduces challenges such as handling heterogeneous manipulator configurations, maintaining synchronization in dynamic environments, and balancing communication efficiency with real-time performance.
This paper proposes a Federated Learning-based framework for optimizing trajectories for multiple robotic arms in a shared workspace while mitigating the critical limitations of centralized control, namely data privacy risks, high communication overhead, and lack of adaptability in dynamic environments. We distributed multi-robotic arm trajectory optimization, addressing these challenges through a hybrid approach combining FL with sampling-based motion planning. Our contributions include the following:
  • A privacy-preserving FL architecture where robots train shared trajectory models locally and contribute only gradient updates, eliminating the need for raw data exchange.
  • An adaptive RRT algorithm enhanced with pruning optimization to reduce computational overhead while ensuring collision-free paths in dynamic environments.
  • Real-time synchronization via EtherCAT, ensuring precise coordination among manipulators with minimal latency [4].
  • Integration of multi-sensor fusion such as vision, encoders, force feedback for robust object localization and trajectory adaptation.
Our experiments demonstrate that the proposed framework achieves superior trajectory efficiency, collision avoidance, and scalability compared to centralized and standalone RL approaches. Applications span industrial assembly lines, surgical robotics, and warehouse automation, where collaborative manipulation in constrained spaces is critical. The simulations have undergone significant revisions to improve clarity, technical depth, and organization. The algorithms were restructured to concisely present the core goal of optimizing multi-robotic arm trajectories using Federated Learning (FL) while preserving data privacy, along with key challenges like synchronization in dynamic environments and handling heterogeneous robot configurations. The introduction was substantially expanded to provide stronger motivation for the work, clearly outlining the limitations of existing centralized approaches and explaining how FL enables collaborative learning without raw data exchange [5]. A dedicated contributions subsection was added to highlight the paper’s novel elements, including the FL-based optimization framework, adaptive RRT algorithm with pruning optimization, and real-time synchronization through EtherCAT.
Technical sections were enhanced with deeper explanations of the federated learning architecture, particularly how local reinforcement learning updates are securely aggregated into a global model [6,7]. The trajectory optimization methodology now includes more detailed discussions of sampling-based techniques like RRT and PRM, with emphasis on the improved adaptive RRT algorithm that incorporates dynamic pruning to reduce computational overhead.

2. Synchronization of Multiple Manipulators

Synchronization of multiple manipulators is challenging task in bin picking. a perfect synchronization can assure accuracy. It is quite difficult, but precise coordination can enhance the working speed and work ability. This process involves a central controller to control and manage manipulator simultaneously through Ethercat [8,9,10]. Where precision also required to make sure exact positioning and multiple tasks can perform at same time by distributing approaches where individual controller can communicate in real time by Ethercat.
Adding an encoder for the feedback to controller ensures close-loop communication. Multiple manipulators connected together symmetrically and performing action on same time required path planning and strong RRT algorithm. The central unit is responsible for decision making and precise control to avoid collision. As shown in Figure 1, when it is a shared space, it also required the communication of multiple manipulators at the same time and advanced algorithm compute synchronization of RRT trajectories.
t slave i = t master + Δ t offset i + Δ t propagation i
The equation describes how EtherCAT synchronizes the local clock of the i-slave device with the master’s reference clock. Here, t slave represents the synchronized time of the slave, while t master is the master’s reference time. The term Δ t offset corrects any initial time difference offset between the slave and master clocks, ensuring they start from a common baseline. Meanwhile, Δ t propagation accounts for the signal transmission delay as data travels from the master to the slave over the network.
Figure 1. Coordinated control synchronizing multiple manipulators using (a) controller (PID, MPC), (b) feedback controller for stability, and (c) visualization for real-time monitoring. The system integrates (a) controllers (PID for precision, MPC for predictive optimization), (b) feedback controllers to correct deviations via encoder data, and (c) visualization tools, RRT path plots, and state trajectories to enhance operational transparency and debugging.
Figure 1. Coordinated control synchronizing multiple manipulators using (a) controller (PID, MPC), (b) feedback controller for stability, and (c) visualization for real-time monitoring. The system integrates (a) controllers (PID for precision, MPC for predictive optimization), (b) feedback controllers to correct deviations via encoder data, and (c) visualization tools, RRT path plots, and state trajectories to enhance operational transparency and debugging.
Robotics 14 00137 g001

2.1. Non-Repetitive Trajectories

Because the object location and size is different, we need more calculations to pick and place. When they must follow a non-repeatable trajectory.its more complex trajectory path, each manipulator requires a different sequence to perform the task. Unlike repeatable trajectories, non-repeatable trajectories required continuous recalculations to perform the collision-free task [11]. Since the trajectory is not a predefined task, the motion planning RRT algorithm helps the control system to automatically choose minimum path to perform task.
x ( t ) = k = 0 n a k ( t ) · t k
The expression x ( t ) defines a vector-valued function as a polynomial in time tm where each term a k ( t ) · t k consists of a time-dependent coefficient vector a k ( t ) multiplied by t k . This allows x ( t ) to model dynamic, time-varying behaviors, such as trajectories or curves, where the coefficients themselves can change over time for greater flexibility.
x ( t ) = a 1 t + a 2 t 2 + a 3 sin ( ω t ) , y ( t ) = b 1 t + b 2 cos ( ω t ) + b 3 t 3 , z ( t ) = c 1 t + c 2 e λ t + c 3 ln ( 1 + t ) .
The equations describe the motion of a point in 3D space over time t, combine linear motion a 1 t and oscillation a 3 sin ( ω t ) , include linear motion b 1 t , oscillation b 2 cos ( ω t ) , and cubic growth b 3 t 3 , and feature a square root term c 1 t , exponential decay c 2 e λ t , and logarithmic growth c 3 ln ( 1 + t ) . x(t) represents a full state vector (position, velocity, acceleration) of the robotic arm at time t. y(t) and z(t) are the y-coordinate and z-coordinate, respectively, of the robot’s end-effector in 3D space at time t. As shown in Figure 2, together, they model complex motion with linear, oscillatory, and non-linear effects [12,13].

2.2. Repeatable Trajectories

Probabilistic Roadmap: Probabilistic Roadmap refers to a network of dots within occupied environment. This technique is used by manipulator where algorithm construct a graph in robotic configuration space. It consists two phases learning phase and query phase.
Learning phase: The learning phase creates random nodes in graph for collision free movement. Query phase: This phase creates nearby paths for robot by connecting closest dots to each other. These techniques will provide initial information about the starting point to the final point. As PRM relies on these nodes created randomly by the algorithm and then connected together. The algorithm relies on sampling-based motion planner to construct a graph and the connected random nodes. This can also include the analyzed obstacle in the environment. As shown in Figure 3, apart from usable space, there are hurdles and obstacles in the path. Multiple planner nodes connecting can be specified according to roots [14].
Once path building is complete and the graph dots have been created, the next process continues and can connect multiple dots together, and the same planner can be used multiple times. This can save effort and time. Rapidly exploring Random Trees: This technique is one time query planner technique that uses the sampling-based motion planning algorithm to explore high dimensional spaces. RRT advances a tree from the starting configuration when the graph had been created, start to end, to make it particularly effective; RRT make it effective for the real dynamic world. Sometimes, it is bidirectional, and, if required to reuse need to make query plan again and if two different graphs were created, at some point, they connect together.
q new = q near + η · q rand q near q rand q near , where q rand RandomSample ( 0 ) , ( random config ) q near = arg min q T q q rand , ( nearest node ) CollisionFree ( q near , q new ) , ( feasibility ) q new q goal < ϵ terminate . ( goal check )
The RRT algorithm expands a tree by generating new nodes q new from the nearest existing node q near moving a step size η toward a random sample q rand . Key conditions q rand are randomly sampled, q near is the tree’s closest node, the path must be collision-free, and the process stops when q new nears the goal within tolerance ϵ . This balances exploration and goal-directed growth for efficient path finding.

3. Actuation and Redundancy

Hardware very much relies on actuation where it converts energy to motions and often exploits redundancy, meaning more degrees of freedom that actually required this, enhancing the flexibility and robustness robotic system used, which has 2–4 robotic arms with 6DOF each.
Fully actuated: Within the working range or a fully actuated condition manipulator, all joints are fully actuated and have independent control over all the degrees of freedom (DOF) in the working process. Under actuated: In this condition, robots do not have degrees of freedom, but rather redundancy achieved by dynamic coupling of fewer actuators; then, the degree of freedom. conditions also applies.
For collision-free movement, encoders and stop switches are used to stop over-moving the manipulator. This way, the end effector can be positioned to the maximum reachable points. In shared space, each manipulator moves in different positions, and there is a high chance to collide with or hit obstacles; thus, the system requires high-end safety to avoid collision. As shown in Figure 4, to avoid collision, feedback is required by master control to access full control to each manipulator.
M ( q ) q ¨ + C ( q , q ˙ ) q ˙ + G ( q ) = τ M ( q ) q ¨ + C ( q , q ˙ ) q ˙ + G ( q ) = B ( q ) τ
The equations represent the dynamics of robotic manipulators, capturing the relationship between joint motion and applied torques. The first equation describes a fully actuated robotic arm, where the inertial forces M ( q ) q ¨ , Coriolis/centripetal forces C ( q , q ˙ ) q ˙ , and gravitational forces G ( q ) are balanced by direct joint torques τ . Here, M ( q ) is the inertia matrix encoding mass distribution, C ( q , q ˙ ) accounts for velocity-dependent effects, and G ( q ) represents gravity’s influence. Underactuated redundancy generalizes this to an underactuated system via the actuation matrix B τ which maps available control inputs τ to effective joint torques B τ .

3.1. Manipulator Rendering

Dynamic environment is rendered with different sizes and shapes of geometries such as boxes or cylindrical shapes; the process starts by importing the manipulator kinematics system using a technique called frugal that provides initial information about the shape and size of geometry [15,16]. The system sends signals and calculates the position of the arm to pick the geometry and perform the next task by using the Kalman filter State equation:
x k = F k x k 1 + B k u k + w k .
The equation describes a discrete-time state-space model that describes how a system evolves from one time step k 1 to the next k. Here, x k represents the current state vector (position, velocity) which depends on system dynamics F k x k 1 . The state transition matrix F k linearly transforms the previous state x k 1 to predict the new state. The Control Input is B k u k . The matrix B k maps external control actions u k (motor commands) to their effect on the state. The Process Noise is w k . This term accounts for uncertainties or disturbances (assumed Gaussian, with zero mean and covariance) modeling unpredictable effects like friction or sensor errors. The equation observation model is measured as follows:
z k = H k x k + v k
The equation describes the measurement model in state estimation and filtering problems (Kalman filters). z k : The measurement vector at time step k representing observed data (sensor readings like position, velocity, or camera detections). H k : The observation matrix, which maps the true state to the expected measurement. It defines how the state variables relate to the sensor outputs. x k : The true state vector (manipulator poses, object position) being estimated. v k : The measurement noise, accounting for sensor inaccuracies. w k N ( 0 , Q k ) = process noise (Gaussian with covariance Q k ).

3.2. Multi-Modal Initialization

The control system aligns information from the fusion of different sensors and gathers data to achieve a robust estimation shape of object size. The vision system provides the orientation of the object input that can be given by pictures of different sizes; after that, the system can analyze the size of the object by itself and adjust the gripper according to the shape of the object. Additionally, pressure sensors are also added to the gripper for smooth gripping.
Q * = J ( Q k ) + λ · 1 collision ( Q k , ϵ ) Q k = I ( q k 0 , q k T , t ) q k 0 , q k T p ( q | M )
This equation defines multi-modal trajectory optimization for robotic arms. The optimal trajectory Q * is chosen from candidates Q k , each generated by interpolating between sampled start q k 0 and goal q k T configurations. The selection minimizes a cost J ( Q k ) (path length) plus a collision penalty λ · 1 collision . This ensures safe, efficient motion in cluttered environments by evaluating multiple path options.
  • Trajectory generation: Q k = I ( q k 0 , q k T , t ) .
  • Configuration sampling: q k 0 , q k T p ( q | M ) .
  • J ( Q k ) + λ · 1 collision : Cost + safety trade-off.
λ is applied to penalize trajectories that collide with obstacles in the environment ϵ , and λ is a weighting factor. Figure 5 depicts the control system integrates data from multiple sensors, fusing inputs from vision systems, LiDAR, and pressure sensors to create a robust estimation of an object’s size and shape.

3.3. Collision-Free Motion Planning

Multiple manipulators are connected practically, providing continuous feedback to master control to achieve exact positioning to improve coordination in complex dynamic environment; visual information can be processed individually by each system, and trajectory paths can be determined by the connectivity of multiple systems [17].

3.4. Constrained Optimization for Each Manipulator

By using multiple constraints such as environmental constraints, these constraints execute complex task while staying in manipulators’ physical limits. Task constraints and coordination constraints can achieve possible results by formulating motion planning and mathematically optimizing trajectories with explicit constraints, as manipulators perform different tasks by using diffident path trajectories.

4. Motion Variation

When multiple manipulators work in a shared space, object manipulation requires the shortest time and requires collision-free time to complete tasks in the shortest possible time, where the placing position is predefined, but the picking place is not predefined. The system is required to recognize the connection between each manipulator to perform a task, and adding an encoder helps the manipulators for exact positioning. The system can achieve trajectory planning by collaborating with each manipulator, and each manipulator control system is synchronized together and connected to master control.
Operating manipulators in shared space require careful synchronization in dynamic environments. The path trajectories are created separately to each manipulator; this condition applies for multiple manipulators at the same space, and each manipulator required to work closely.
τ i ( 0 ) = q i , start , τ i ( T ) = q i , goal i , A i ( q i ( t ) ) j i A j ( q j ( t ) ) = t , i
Equations describe the constraints for a multiple manipulator path planning problem. The boundary conditions are specified for each agent i: the initial position τ i ( 0 ) is set to the starting configuration q i , start , and the final position τ i ( T ) must reach the goal configuration q i , goal by time T. This ensures that each agent begins and ends its trajectory at predefined locations.
For every agent i, the area or volume A i ( q i ( t ) ) occupied by the agent at any time t must not intersect with the combined areas or volumes occupied by all other agents j i at the same time. In other words, no two agents can occupy the same space simultaneously, ensuring safe and conflict-free paths.

5. System Structure and Updates in Design

Modern manipulator considering several factors sophisticated hierarchical new architecture design have been added for both performance and adaptability. It consists of three layers: Physical layer: The physical layer comprises modular interfaces to unified control connect such as Ethercat. Coordination layer: The coordination layer is responsible for handling real-time task distributions GPU-accelerated trajectory planning and collision-free motion planning. Cognitive layer: Its human–robot interactions are through machine learning models. Manipulators enhance work efficiency. Adopting a sudden new environment and making new trajectories is a challenging task for the control system. Essential details can be provided by sensor fusion and strategy-based real-time decision can be solved by removing errors in trajectories.

5.1. Action Space

Action space is defined by the robot structure which allows each arm to work under working space and distance from object to gripper. Data integration can be the best strategy to avoid collision between multiple manipulators. This technique makes the pick and place process different in trajectories as well as in time, so it will only work one by one. Gripper designs varying sizes of the object, and the opening/closing time can be adjusted by command. Position and pointing can be adjusted according to the position of the object.

5.2. Motion Planning Failure

Motion planning failure occurs if the RRT algorithm fails to generate a viable path for motion. Different path trajectories required different path trajectory planning; at the same time, some RRT trajectories can recover from predicting previously used trajectories in real time. Training data can be analyzed multiple times to determine whether a mission can be successful or not. In the real-world environment scenario, mission failure cannot be predicted in advance, yet it performs the task and analyzes previous trajectories. These data can be used in variations in training data.

5.3. Rapidly Exploring Random Tree

RPT expansion is used in the parabolicity compute sampling-based RRT method. It builds a search tree from a static beginning point and then gradually expands it using random sampling nodes to explore the whole environmental space until it reaches a target point. A continuous trajectory route between the starting and destination point is created with this technique. Adding the probabilistic computing and quick search capabilities of the RRT method are its main advantages. The term “completeness” describes the algorithm’s thorough coverage of the search space and connecting nodes through a process, where it allows for path planning between any two places by conducting an exhaustive and unhindered study of the full map region given enough time and iterations. Previous recent sampling point, New Node, grows with radius r under collision detection circumstances. Cost sums are computed sequentially from the starting coordinate point to the closest node and to the new node, and every node in this dynamic environmental space is a subset of the new node. The new node’s parent is selected to be the node with the closest distance. If there are no barriers separating it from the closest node, the new node is added to the tree structure. This iterative process continues on until criteria are satisfied and a workable solution is discovered.

5.4. Adaptive Pruning Optimization Strategy

With the addition of an adaptive pruning threshold and elimination of suboptimal nodes in motion, the enhanced RRT algorithm integrates an adaptive pruning optimization technique that intelligently maintains computational efficiency without decreasing solution quality. As shown in Figure 6, depending on real conditions, the algorithm dynamically modifies this threshold encountered throughout the search process rather of relying just on the cost of the current best path to set pruning criteria [18]. In order to dynamically modify the lopping threshold, it also takes into account the features of obstacle avoidance in diverse contexts, including the numbers of obstacles, the path’s complexity, and varied dimensions lowering the pruning threshold.
The optimization method in the algorithm also incorporates a local path re-evaluation in the mechanism. The optimization method can efficiently narrow the search space by directly altering the tree structure, which boosts the algorithm’s performance even further. The ongoing pruning and path optimization process can enhance the RRT method, reduce the path length, and increase path trajectory smoothness.
min i e i 2 + A i w i 2 + γ a i 2 d t
This equation optimizes robotic arm movements by minimizing tracking errors e ( t ) P , control effort u ( t ) W , and sudden joint changes q ˙ ( t ) Q over time. It ensures precise, energy-efficient, and smooth pruning motions. Table 1 summarizes the Iteration results. The integral balances these factors across the task duration, while missing constraints (like arm dynamics) would refine the system further.
In order to preserve and maintain more possible routes, they could be required to be delayed in pruning places with a lot of impediments to determine whether to keep or change the pathways in the current search tree.

5.5. Trajectory Optimization

We evaluated our proposed FL-based framework in a simulated environment with two 6-DOF robotic arms operating in a shared workspace with static and dynamic obstacles. We compared our method against three baselines: Standard RRT*, a Centralized RL controller, and a Standalone RL controller where each robot learns independently. RRT* adds an “optimization” step. It rewires the tree to find shorter paths. Table 2 summarizes the performance comparison.

6. Connectivity

Connecting refers to multiple-manipulator collision-free trajectory in a controlled dynamic environment. It required multiple data for trajectories and path planning. Multi-layer trajectories required multiple RRT algorithms at the back end, optimizing global planner and localization, which is individually planned for each manipulator. Multiple systems are connected to the master control at the same time, and continuous feedback is provided to create a collision-free dynamic environment. A manipulator control system can connect to other system by physical interfaces which are provided by the system in use.

6.1. Motion Planning in Joint Space

In shared space, the manipulator system can connect to another manipulator system by the physical interface. Trajectory planning and motion constraints in 3D visualization helps to avoid obstacles and is also beneficial for the system to locate the exact position of the object and update the trajectory after every step. As shown in Figure 7, path trajectory in a working space can be divided according to the available working space. RRT algorithm and motion constraints assist the motion planner to reduce complexity.

6.2. System-to-System Communication

Multiple-system manipulation is a challenging task for the control system in a shared working space. The mathematical algorithm for motion planning employs a layered communication architecture for communication, and self-decision making enables possibilities to communicate multiple-manipulator systems with each other. Communication assists in coordination for planning, control, and execution. The system is connected with master control, and the system exchange is related with information about other systems which can be processed and analyzed or presented, and the previous position of the collective manipulator information exchange creates close-loop communication between one system to another by sending request information and exchanging information. The master control system inquires about recent activities to other systems.

6.3. Statistic Multi-System Control

This is a modern statistical technique that centralized the control system, and it involves the global command to control each manipulator connected with master control. However, for clarification, this refers to the multi-control system that provides a central command structure that can exceed several numbers of the control system connected with each other by global command [19]. Control depends on the communication range in which many systems can connect with each other.

6.4. Computational Limits

Manipulator motion planning confronts the fundamental computational limits that need to carefully scale with the manipulator degree of freedom. When a number of systems are connected with each other in dynamic control, the computational limit becomes the bottleneck for the system to proceed to the next step. The system required a complete structure in which dots are connected with each other. Path trajectory and planning will be analyzed individually, but each system is connected to the master control to obtain equal feedback to each manipulator. Adding self-decision-making possibilities required data provided by the integration of sensor fusion data from each manipulator, and localization required central control to be in action.

6.5. Translation Control

Translation control refers to the end factor of a manipulator, more precisely, transition control in cartesian moment that maintains orientation stability. Let us consider we have a distance target point μ  −  μ d as object functionality, so each time lap τ , the mean position incremental translation 1 μ n( τ ) of the complying subset (p n < N) is calculated as root mean square · 2 as given data:
Δ μ n ( τ ) = ϵ n i = 1 n ν i ( τ )
This equation calculates the average incremental change Δ μ n ( τ ) by scaling the sum of individual terms ν i ( τ ) by ϵ n . It is used in adaptive systems like robotics or machine learning to adjust parameters trajectories or weights based on collective feedback, where ϵ controls the adjustment rate, and n normalizes the response.

6.6. Spread Control

Spread control is responsible for managing multiple-manipulator degrees of freedom. This technique optimizes coverage while preventing collision. The trajectory algorithm maximizes uniform spacing and minimizes concentration. This technique helps in collision-free movement, optimal coverage, and load balancing.
u spread = K s q q nom + D s q ˙
This equation calculates control forces u spread to maintain a robotic arm’s desired joint configuration q nom . The stiffness term K s ( q q nom ) pushes joints toward their target positions, while the damping term D s q ˙ ensures smooth, stable motion. Together, they enable the arm to preserve an optimal posture, keeping joints spread for stability during tasks without abrupt movements. The gain matrices K s and D s determine the system’s responsiveness.

7. Distributed Optimization Algorithms

DOA refer to enabling multiple systems to connect to each other and breakdown the algorithm to subparts. Using multiple optimization algorithm methods and achieving multiple results, we distributed optimization algorithms into three general categories in this section. ADMM techniques include DFO methods and sequential convex programming. These groups are categorized according to their suitability for multi-manipulator environmental scenarios and distributed via common junction procedures. Here, we provide a concise overview of each category by examine a representative distributed algorithm optimization. Following an analysis of each separable optimization problem, the following is discussed. Communication and computation: Multiple-manipulator trajectory planning systems require careful balancing of computational algorithm demands. In order to achieve best performance, solution, quality, and latency constraints, delay should be no more than 1 μs for low-level control to 10 ms for high-level computational [20,21]. This interaction between communication and computation is a essential design consideration. Using adaptive resolution mechanisms that dynamically transition between high-fidelity cloud plans and efficient edge fallbacks depending on network dynamics, robust architectures edge-cloud use hybridization to distribute processing across nodes. Master control handles raw sensor data and immediate control (sub-millisecond), edge nodes manage local planning (10–100 ms), and cloud resources handle large-scale computations such as policy learning (>100 ms).
x i k + 1 = x i k α f i ( x i k ) Local Objective β j N i ( x i k x j k ) + γ Consensus u spread k Physical Control
  • Local Optimization f i ( x i k ) : The term represents the gradient of robot i’s local objective function minimizing tracking error or energy consumption. The weight α scales how aggressively the robot pursues this individual goal.
  • Consensus Coordination j N i ( x i k x j k ) : This term ensures alignment with neighboring robots j N i by reducing differences between their states x i k x j k . The weight β controls the strength of coordination, critical for tasks like collaborative lifting or formation control.
  • Physical Control u spread k : It is implied but not fully shown; the placeholder u spread suggests additional physical constraints (joint limits, stability) are applied, scaled by γ . In practice, this might include terms like K s ( q q nom ) to maintain safe joint configurations.

7.1. DFO Method

We point out that when extending gradient descent to limited optimization, a popular projection on gradient descent, a common technique is used, which entails projecting the iterates to the feasible set [22,23]. We discuss into more depth on the extensions of gradient descent techniques to limited optimization. These are known as first-order techniques because they typically only need to compute the gradient or the first derivative of the goal and constraint functions.When applying the unconstrained optimization, variables are updated in the following way. As shown in Figure 8, DFO algorithms avoid this basic issue by permitting each manipulator to use only its local gradients and collaborating with its neighbor to reach a consent on a common solution.

7.2. Sequential Convex Programming

Sequential Convex Programming (SCP) solving provides an iterative framework for solving challenging non-convex motion planning and optimization issue by computing a series of iterates that indicate the resolution of a number of approximations of the original problem is known as sequential convex programming.

7.3. Multiple-Pose Distributed Optimization

A revolutionary method, multiple-pose distributed optimization, which is a dynamically paradigm-shifting approach to selects the most efficient distributed formulation based on real-time system conditions for robotic coordination. This framework adaptively transition between three basic optimization poses. ADMM decomposition for loosely coupled separatable issues and consensus approaches for tightly coupled problems. Federated averaging is carried out in dense networks, for star topology systems with central coordination [24]. In order to automatically trigger pose transitions, the system continuously monitors key metrics like network connectivity, problem separability, and residual convergence.
q i k + 1 = arg min q i J i ( q i ) + j N i λ i j k q i + ρ 2 q i q i k + q j k 2 2
This equation describes the distributed optimization process for a robotic arm or multiple arms at each iteration k + 1 . The goal is to find the optimal joint configuration q i k + 1 that minimizes a local cost function J i ( q i ) (tracking error or energy consumption) while ensuring coordination with neighboring robots. The second term enforces consensus among robots by penalizing deviations from the average of their previous configurations q i k + q j k 2 , weighted by a penalty parameter ρ . The Lagrange multipliers λ i j k help balance the trade-off between local optimization and global agreement.

8. Multi-Manipulator Task Assessment

Multi-manipulator task assessment provides a comprehensive framework for estimating and optimizing collaborative manipulator operations through real-time performance and minimize a trajectory cost and avoid colliding from starting point to desired state configuration. It also included the monitoring and adaptive control inputs. The system continuously tracks key metrics such as desired reference trajectory including task completion time by using synchronized ROS2 clocks.

8.1. Flexible and Scalable Advances in Dynamic Task Allocation

Leverage adaptive algorithms to efficiently distribute workloads in real-time, ensuring optimal resource utilization. These modernizations enable seamless task reassignment and scalability, adapting to changing demands while maintaining system performance and reliability. At its core, a classified architecture processes energy efficiency and workload balance, creating a multidimensional performance profile [25], through joint state data feature extraction pipelines that quantify trajectory smoothness, torque profiles, and completion ratios. For LSTM temporal pattern analysis, we use it to feed into machine learning models like random forests for failure prediction.
T = i = 1 N w i x i x i * 2 Tracking Error + α j i x i x j min 2 Collision Avoidance + β τ i 2 Effort Minimization
The variables x i and x j represent the positions of the i and j agents in the system, respectively. The tracking error term x i x j * 2 measures how far each agent is from its desired target position x i * . The collision avoidance term x i x j min 2 ensures agents maintain a minimum safe distance by heavily penalizing proximity and enforces a lower bound on separation. The effort term τ j 2 quantifies the control input force or torque applied to the i agent, minimizing energy use. Together, these terms ensure precise, safe, and efficient multi-agent coordination.

8.2. Multi-Manipulator Collaborative Work

Multi-manipulator collaborative work, including object identification, visual place recognition, depth estimation, 3D mapping, reinforcement learning, and multi-manipulator learning, involves applying deep learning techniques to estimate functions from data. In shared work spaces, consent grasping dynamically balances applied forces across end-effectors, and collision-aware path trajectory motion planning integrates swarm potential fields for safe motion [26].
At the algorithmic level, these systems leverage distributed task allocation through optimized assignment methods such as RRT algorithm, and modern implementations utilize quality-of-service-controlled communication architecture to synchronize state information at 100 ms intervals while maintaining μs-level hardware synchronization through time-sensitive networking, with performance continuously monitored via metrics like force balance error (<0.15 target) and collision probability (<0.1%).
M 0 x ¨ 0 + C 0 x ˙ 0 + G 0 = i = 1 n J i T F i
M 0 represents the object’s inertial properties in task space. x ¨ 0 acceleration vector of the system’s generalized coordinates. C 0 accounts for Coriolis and centrifugal effects due to rotational motion, and G 0 handles gravitational forces. The right side of the equation sums the contributions from each robot, where J i T transforms end-effector forces F i into equivalent joint torques.

9. Conclusions

This study presents a comprehensive framework for path planning and coordinated control of single and multiple robotic manipulators in dynamic, shared environments. By integrating sampling-based algorithms such as Rapidly exploring Random Trees (RRT) and Probabilistic Roadmaps (PRM), along with real-time synchronization via EtherCAT, the system achieves collision-free, adaptive motion trajectories for both repetitive and non-repetitive tasks. Advanced control architectures—including PID, MPC, and adaptive pruning optimization—enhance the efficiency and safety of manipulation, particularly in complex scenarios requiring multi-arm cooperation. The implementation of feedback loops through encoders, Kalman filters, and multi-sensor fusion (vision, depth, force) significantly improves object recognition, pose estimation and trajectory adaptation. Moreover, the proposed multi-layered system architecture (physical, coordination, and cognitive layers) ensures robustness, scalability, and real-time responsiveness. Distributed optimization algorithms like ADMM and DFO, coupled with cloud-edge hybrid computing, enable high-performance multi-agent collaboration with minimized latency. Ultimately, this work provides a scalable and intelligent robotic control framework capable of dynamic task adaptation, efficient motion planning, and real-time coordination, laying the foundation for future industrial and autonomous robotic systems that require precision, safety, and autonomy in cluttered or unpredictable environments.

Author Contributions

F.K.; conceived the study, developed the federated learning framework, implemented the software, conducted experiments on multiple robotic arms, and wrote the initial draft of the manuscript. Z.M.; supervised the research, provided critical feedback, secured funding, and contributed to revising and editing the manuscript. All authors have read and agreed to the published version of the manuscript.

Funding

The equipment for this research was primarily provided by the university laboratory. No major external funding was received, with the exception of the specialized sensors required for the experimental setup, which were funded by Z.M.

Institutional Review Board Statement

This study was conducted in accordance with the research guidelines of Donghua University. Since the work involved simulated and laboratory-based experiments on robotic arms without human participation or sensitive data, formal ethics approval was not required. All procedures followed standard safety and operational protocols for robotic systems. The research adhered to the university’s policies on experimental integrity and data reproducibility. No conflicts of interest or risks were identified. Funding and resources were provided by Donghua University under its institutional research framework.

Data Availability Statement

Data is contained within the article.

Acknowledgments

This research was supported by Donghua University. The authors gratefully acknowledge the technical and infrastructural support provided by the Mechatronics Lab at Donghua University, which was essential for conducting experimental work on multi-robot coordination and control. Special thanks to Zhuo Meng for their expert guidance and for providing access to the advanced robotic testbeds, which were critical in validating the proposed framework. The lab’s high-precision motion-capture systems and collaborative robotics platforms enabled rigorous experimental testing and accurate data acquisition, significantly improving the reliability of this study. During the preparation of this manuscript, the authors used OpenAI tool deepseek for the refining experimental prompts and generating initial code structures for data analysis. The authors have reviewed and edited the output and take full responsibility for the content of this publication.

Conflicts of Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

References

  1. Zhang, W.; Yang, D.; Wu, W.; Peng, H.; Zhang, N.; Zhang, H.; Shen, X. Optimizing federated learning in distributed industrial IoT: A multi-agent approach. IEEE J. Sel. Areas Commun. 2021, 39, 3688–3703. [Google Scholar] [CrossRef]
  2. Zhang, S.Q.; Lin, J.; Zhang, Q. A Multi-Agent Reinforcement Learning Approach for Efficient Client Selection in Federated Learning. Proc. AAAI Conf. Artif. Intell. 2022, 36, 9091–9099. [Google Scholar] [CrossRef]
  3. Wu, N.; Jia, D.; Li, Z.; He, Z. Trajectory Planning of Robotic Arm Based on Particle Swarm Optimization Algorithm. Appl. Sci. 2024, 14, 8234. [Google Scholar] [CrossRef]
  4. Liu, W.; Huang, Z.; Qu, Y.; Chen, L. Path Planning for Robotic Arms Based on an Improved RRT Algorithm. Open J. Appl. Sci. 2024, 14, 1214–1236. [Google Scholar] [CrossRef]
  5. Imakura, A.; Sakurai, T. FedDCL: A Federated Data Collaboration Learning as a Hybrid-Type Privacy-Preserving Framework Based on Federated Learning and Data Collaboration. arXiv 2024, arXiv:2409.18356. [Google Scholar] [CrossRef]
  6. Wu, P.; Su, H.; Dong, H.; Liu, T.; Li, M.; Chen, Z. An obstacle avoidance method for robotic arm based on reinforcement learning. Ind. Robot 2025, 52, 9–17. [Google Scholar] [CrossRef]
  7. Cao, M.; Mao, H.; Tang, X.; Sun, Y.; Chen, T. A novel RRT*-Connect algorithm for path planning on robotic arm collision avoidance. Sci. Rep. 2025, 15, 2836. [Google Scholar] [CrossRef]
  8. Le, Q.D.; Yang, E. Adaptive Fault-Tolerant Tracking Control for Multi-Joint Robot Manipulators via Neural Network-Based Synchronization. Sensors 2024, 24, 6837. [Google Scholar] [CrossRef]
  9. Yu, P.; Tan, N.; Gu, X. A Hybrid Neurodynamic Scheme for Bimanual Synchronized Tracking Control of Robotic Manipulators with Uncertain Kinematics. IEEE Trans. Ind. Inform. 2025, 21, 2967–2976. [Google Scholar] [CrossRef]
  10. Wang, D.; Zhang, G.; Zhang, T.; Zhang, J.; Chen, R. Time-Synchronized Convergence Control for n-DOF Robotic Manipulators with System Uncertainties. Sensors 2024, 24, 5986. [Google Scholar] [CrossRef]
  11. Yu, C. Adaptive Iterative Learning Constrained Control for Linear Motor-Driven Gantry Stage with Fault-Tolerant Non-Repetitive Trajectory Tracking. Mathematics 2024, 12, 1673. [Google Scholar] [CrossRef]
  12. Widanage, K.N.D.; Xia, J.; Parween, R.; Godaba, H.; Herzig, N.; Glovnea, R.; Huang, D.; Li, Y. Non-repetitive-path Iterative Learning and Control for Human-guided Robotic Operations on Unknown Surfaces. IEEE Trans. Robot. 2025, 41, 4922–4940. [Google Scholar] [CrossRef]
  13. Wang, L.; Zhu, S.; Wei, M.; Wang, X.; Huangfu, Z.; Chen, Y. Iterative Learning Control with Adaptive Kalman Filtering for Trajectory Tracking in Non-Repetitive Time-Varying Systems. Axioms 2025, 14, 324. [Google Scholar] [CrossRef]
  14. Xie, S.; Zhong, H.; Li, Y.; Xu, S.A.; Liu, W.; Bian, S.; Zhang, S. Predictive Control for an Ankle Rehabilitation Robot Using Differential Evolution Optimization Algorithm-Based Fuzzy NARX Model. IEEE Trans. Neural Syst. Rehabil. Eng. 2025, 33, 1886–1895. [Google Scholar] [CrossRef] [PubMed]
  15. Tao, S.; Xiang, F.; Shukla, A.; Qin, Y.; Hinrichsen, X.; Yuan, X.; Bao, C.; Lin, X.; Liu, Y.; Chan, T.; et al. Maniskill3: GPU Parallelized Robotics Simulation and Rendering for Generalizable Embodied AI. arXiv 2024, arXiv:2410.00425. [Google Scholar] [CrossRef]
  16. Wang, Y.; James, S.; Stathopoulou, E.K.; Beltrán-González, C.; Konishi, Y.; Del Bue, A. Autonomous 3-D Reconstruction, Mapping, and Exploration of Indoor Environments with a Robotic Arm. IEEE Robot. Autom. Lett. 2019, 4, 3340–3347. [Google Scholar] [CrossRef]
  17. Yang, F.; Yao, Q.; Song, Z.; Gu, H.; Ma, J.; Huo, Y.; Liu, Y. A Collaborative Drone System with Robotic Arm Based on Lightweight Structure and Multimodal Control. In Proceedings of the 4th International Conference on Robotics, Automation and Intelligent Control (ICRAIC), Changsha, China, 6–9 December 2024; pp. 375–381. [Google Scholar] [CrossRef]
  18. Bi, A.; Zhang, C. Robot Arm Grasping based on Multi-threaded PPO Reinforcement Learning Algorithm. In Proceedings of the 3rd International Conference on Artificial Intelligence, Human-Computer Interaction and Robotics (AIHCIR), Hong Kong, China, 15–17 November 2024; pp. 369–373. [Google Scholar] [CrossRef]
  19. Shvalb, N.; Hacohen, S.; Medina, O. Statistical Robotics: Controlling Multi Robot Systems Using Statistical-Physics. IEEE Access 2024, 12, 134739–134753. [Google Scholar] [CrossRef]
  20. Soleimani Amiri, M.; Ramli, R. Intelligent Trajectory Tracking Behavior of a Multi-Joint Robotic Arm via Genetic-Swarm Optimization for the Inverse Kinematic Solution. Sensors 2021, 21, 3171. [Google Scholar] [CrossRef]
  21. Zhang, H.; Li, X.; Wang, L.; Liu, D.; Wang, S. Construction and Optimization of a Collaborative Harvesting System for Multiple Robotic Arms and an End-Picker in a Trellised Pear Orchard Environment. Agronomy 2024, 14, 80. [Google Scholar] [CrossRef]
  22. Desai, N.; Behara, G.; Deb, S.; Sen, D. A Genetic Algorithm Based Adaptive and Hybrid Pathfinding Algorithm to Enhance the Performance of Robotic Bin-Picking. In Proceedings of the IEEE 21st India Council International Conference (INDICON), Kharagpur, India, 19–21 December 2024; pp. 1–7. [Google Scholar] [CrossRef]
  23. Guo, P.; Luo, D.; Wu, Y.; He, S.; Deng, J.; Yao, H.; Sun, W.; Zhang, J. Coverage Planning for UVC Irradiation: Robot Surface Disinfection Based on Swarm Intelligence Algorithm. Sensors 2024, 24, 3418. [Google Scholar] [CrossRef]
  24. Zhang, L.; Zhou, X.; Liu, J.; Wang, C.; Wu, X. Instance-level 6D pose estimation based on multi-task parameter sharing for robotic grasping. Sci. Rep. 2024, 14, 7801. [Google Scholar] [CrossRef] [PubMed]
  25. Duan, S.; Jiang, Z.; Ren, Y.; Wei, H. Coordination of Multi-Robot Systems. In Proceedings of the 2022 International Conference on Electronics and Devices, Computational Science (ICEDCS), Marseille, France, 20–22 September 2022; pp. 361–367. [Google Scholar] [CrossRef]
  26. Ma, H.; Wei, X.; Wang, P.; Zhang, Y.; Cao, X.; Zhou, W. Multi-Arm Global Cooperative Coal Gangue Sorting Method Based on Improved Hungarian Algorithm. Sensors 2022, 22, 7987. [Google Scholar] [CrossRef] [PubMed]
Figure 2. Real-time optimization, error compensation, and collision prediction for motion systems—combining (a) intermittent/periodic control (irregular, cycle-skipping) for efficiency and (b) continuous control (smooth, unbroken cycles) for stability.
Figure 2. Real-time optimization, error compensation, and collision prediction for motion systems—combining (a) intermittent/periodic control (irregular, cycle-skipping) for efficiency and (b) continuous control (smooth, unbroken cycles) for stability.
Robotics 14 00137 g002
Figure 3. Probabilistic Roadmaps (PRM) and Rapidly exploring Random Tree (RRT) for Repeatable Path Planning. (a) represents a unique one-time path without repetition.
Figure 3. Probabilistic Roadmaps (PRM) and Rapidly exploring Random Tree (RRT) for Repeatable Path Planning. (a) represents a unique one-time path without repetition.
Robotics 14 00137 g003
Figure 4. Actuation and redundancy in robotic systems enhancing performance and control strategies. (a) Heterogeneous sensor fusion setup LiDAR/IMU/RGB-D. (b) Multi-sensor fusion system integrating LiDAR, IMU, and RGB-D data for unified perception and reporting.
Figure 4. Actuation and redundancy in robotic systems enhancing performance and control strategies. (a) Heterogeneous sensor fusion setup LiDAR/IMU/RGB-D. (b) Multi-sensor fusion system integrating LiDAR, IMU, and RGB-D data for unified perception and reporting.
Robotics 14 00137 g004
Figure 5. Multi-modal initialization and multi-sensor fusion architecture for adaptive trajectory diversity.
Figure 5. Multi-modal initialization and multi-sensor fusion architecture for adaptive trajectory diversity.
Robotics 14 00137 g005
Figure 6. Adaptive pruning metric error reduction and node elimination. (a) Data points or measurements in a process iteration count 10, an error value 0.035 rad. (b) Error decreasing from 0.14 rad to 0.02 rad over 10 iterations, indicating improving system performance.
Figure 6. Adaptive pruning metric error reduction and node elimination. (a) Data points or measurements in a process iteration count 10, an error value 0.035 rad. (b) Error decreasing from 0.14 rad to 0.02 rad over 10 iterations, indicating improving system performance.
Robotics 14 00137 g006
Figure 7. Multi-agent robotic coordination: real-time edge processing and cloud analytics.
Figure 7. Multi-agent robotic coordination: real-time edge processing and cloud analytics.
Robotics 14 00137 g007
Figure 8. Dynamic task allocation and performance metrics for multi-task assessment, environment scalability, and adaptability metrics for multi-manipulator task execution. (a) Joint angle evolution tracks angular displacement rad over time s, (A: 0.5 rad, B: 1.0 rad at distinct timesteps). (b) Cost function convergence plots optimization progress cost: 1.5 → 0.5 → 0.0 indicating error reduction. (c) Position error per target measures positional deviation during tasks 0.8 m → 0.1 m, reflecting improving accuracy in phases like “Pick” or “Inspect”. (d) Orientation error per target orientation error per target improved precision by 0.5 s. (e) UR5 end-effector trajectory robotic arm’s end-effector ranges from 0 m to 0.6 m.
Figure 8. Dynamic task allocation and performance metrics for multi-task assessment, environment scalability, and adaptability metrics for multi-manipulator task execution. (a) Joint angle evolution tracks angular displacement rad over time s, (A: 0.5 rad, B: 1.0 rad at distinct timesteps). (b) Cost function convergence plots optimization progress cost: 1.5 → 0.5 → 0.0 indicating error reduction. (c) Position error per target measures positional deviation during tasks 0.8 m → 0.1 m, reflecting improving accuracy in phases like “Pick” or “Inspect”. (d) Orientation error per target orientation error per target improved precision by 0.5 s. (e) UR5 end-effector trajectory robotic arm’s end-effector ranges from 0 m to 0.6 m.
Robotics 14 00137 g008
Table 1. Iteration results: nodes, error (rad), and error reduction.
Table 1. Iteration results: nodes, error (rad), and error reduction.
IterationNodesError (Rad)Error Reduction
1250.1700
2180.16473.1%
3150.138216.1%
4130.085138.4%
5110.07917.0%
690.093718.5%
780.033264.6%
880.065998.5%
980.038042.3%
1080.065672.6%
Table 2. Performance comparison of trajectory optimization methods.
Table 2. Performance comparison of trajectory optimization methods.
StandardCentralizedStandaloneProposed FL
MetricRRT*RLRLFramework
Avg. Path Length (m)4.213.954.503.49
Collision Rate (%)15.28.522.76.3
Avg. Planning Time (ms)340220180150
Comm. Overhead (MB/task)10.5105.02.15.8
Success Rate (%)84.891.577.393.7
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Khan, F.; Meng, Z. Federated Learning for Distributed Multi-Robotic Arm Trajectory Optimization. Robotics 2025, 14, 137. https://doi.org/10.3390/robotics14100137

AMA Style

Khan F, Meng Z. Federated Learning for Distributed Multi-Robotic Arm Trajectory Optimization. Robotics. 2025; 14(10):137. https://doi.org/10.3390/robotics14100137

Chicago/Turabian Style

Khan, Fazal, and Zhuo Meng. 2025. "Federated Learning for Distributed Multi-Robotic Arm Trajectory Optimization" Robotics 14, no. 10: 137. https://doi.org/10.3390/robotics14100137

APA Style

Khan, F., & Meng, Z. (2025). Federated Learning for Distributed Multi-Robotic Arm Trajectory Optimization. Robotics, 14(10), 137. https://doi.org/10.3390/robotics14100137

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers. See further details here.

Article Metrics

Back to TopTop