Budget-Aware Closed-Loop Incentive Allocation for Federated Learning with DDPG
Abstract
1. Introduction
- We propose a DRL-driven adaptive incentive framework for FL that enables global coordination at the server and autonomous decision-making at clients. Specifically, we design a federated DRL decision architecture that integrates model aggregation, state management, and incentive distribution into a unified DRL loop. By modeling the state space, the system continuously perceives client characteristics such as data quality, computational capability, and participation frequency. The server dynamically adjusts incentive strategies through a policy network, jointly optimizing client participation and training efficiency. This framework provides intelligent decision support for multi-party collaboration in heterogeneous federated environments.
- We design an actor–critic incentive allocation model based on the deep deterministic policy gradient (DDPG) algorithm, establishing a joint optimization mechanism for policy generation and value evaluation. To mitigate the instability of traditional incentive mechanisms in continuous decision spaces, we formulate incentive allocation as a continuous control problem: the actor network generates incentive strategies, while the critic network evaluates the corresponding value functions, enabling coordinated iterative optimization of policy and value. This design significantly improves convergence stability in high-dimensional, dynamic environments and ensures that incentive allocation more accurately reflects clients' actual contributions.
- We construct an adaptive optimization loop consisting of state awareness, policy evaluation, and benefit feedback, and develop an end-to-end dynamic incentive update process that forms a positive feedback cycle between the server and clients. Clients adjust their data contribution ratios and local training strategies based on the incentive signals, while the server re-evaluates reward allocation according to the updated global state. Through multi-round interactions, the mechanism gradually converges to a dynamic equilibrium of incentive policies, effectively suppressing low-effort participation and fostering long-term collaboration from high-quality clients. Experimental results demonstrate that, in high-dimensional dynamic environments, the proposed mechanism increases participants' data-sharing rate, improves the global model's generalization capability, and confirms the adaptive allocation capability of the incentive model.
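As a minimal illustration of the budget-aware allocation step in the loop described above (variable names, network shape, and the softmax normalization are our assumptions for the sketch, not the paper's implementation), a deterministic policy can map per-client state features to a non-negative incentive vector that exhausts the server's round budget:

```python
import numpy as np

rng = np.random.default_rng(42)
num_clients, feat_dim = 5, 4

# Hypothetical per-client state: data quality, computational
# capability, participation frequency, etc. (one row per client).
state = rng.normal(size=(num_clients, feat_dim))
budget = 10.0  # server's available budget for this round

# Stand-in deterministic actor: a single linear scoring layer whose
# per-client scores are softmax-normalized, so incentives are
# non-negative and sum exactly to the budget.
W = rng.normal(scale=0.1, size=(feat_dim,))

def allocate(state, budget, W):
    scores = state @ W                       # one scalar score per client
    weights = np.exp(scores - scores.max())  # numerically stable softmax
    weights /= weights.sum()
    return budget * weights                  # incentive per client

incentives = allocate(state, budget, W)
print(incentives.sum())  # equals the budget, up to float error
```

In the full closed loop, clients would respond to `incentives`, the server would observe the updated global state, and the actor's weights would be updated by the DDPG critic rather than stay fixed as here.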
2. Related Work
3. Incentive Mechanism Model
3.1. Model Architecture
3.2. Model Design
3.2.1. Global Model Aggregation Module
3.2.2. Model Evaluation Module
3.2.3. State Management Module
3.3. Incentive Allocation Module
4. Experiments
4.1. Data Preparation
4.2. Permutation Mechanism
- Random Permutation Generation: At the beginning of each training step, a random permutation of client indices is generated. This permutation defines a new order of client features for the current step.
- Observation Reordering: When constructing the observation provided to the incentive allocation model, the client feature vectors are reordered and concatenated according to the generated permutation. This ensures that the model receives a randomly ordered input in each step, breaking its dependence on a fixed input order.
- Inverse Reordering of Actions: The action vector computed by the incentive allocation model corresponds to the permuted order of clients. Before applying this action vector, it must be restored to the original client order through an inverse permutation, allowing correct allocation of incentives to the corresponding clients.
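The three steps above can be sketched with NumPy (a minimal illustration; the variable names and the random stand-in for the model's output are ours, not the paper's implementation):

```python
import numpy as np

rng = np.random.default_rng(0)
num_clients, feat_dim = 4, 3

# Per-client feature vectors (one row per client), as seen by the server.
client_features = rng.normal(size=(num_clients, feat_dim))

# Step 1: draw a fresh random permutation of client indices.
perm = rng.permutation(num_clients)

# Step 2: build the observation in permuted order, then flatten.
observation = client_features[perm].reshape(-1)

# The incentive allocation model would map this observation to one
# action per client, still in permuted order (random stand-in here).
action_permuted = rng.random(num_clients)

# Step 3: apply the inverse permutation so action[k] targets client k.
inv_perm = np.argsort(perm)
action = action_permuted[inv_perm]

# Sanity check: re-permuting the restored actions recovers the
# model's output order.
assert np.allclose(action[perm], action_permuted)
```

Because `perm` is redrawn every step, the model cannot memorize a fixed client-to-position mapping; `np.argsort(perm)` recovers the inverse permutation since `perm[inv_perm[k]] = k` for every client index `k`.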
4.3. Exploration Enhancement
4.4. Training of the Incentive Allocation Model
4.5. Experimental Results
4.5.1. Impact of Permutation
4.5.2. Incentive Allocation Model Performance
4.5.3. Aggregation Method Comparison
4.5.4. Incentive Allocation Strategy Comparison in Multiple Rounds of Federated Learning
4.5.5. Comparative Analysis and Ablation Study
4.5.6. Alignment Analysis of Policy and Theory
4.5.7. Stability and Resilience in Non-Stationary Environments
4.6. Scalability and Computational Complexity
5. Conclusions and Future Work
Author Contributions
Funding
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
- Nguyen, D.C.; Ding, M.; Pathirana, P.N.; Seneviratne, A.; Li, J.; Vincent Poor, H. Federated Learning for Internet of Things: A Comprehensive Survey. IEEE Commun. Surv. Tutor. 2021, 23, 1622–1658. [Google Scholar] [CrossRef]
- Nair, A.K.; Coleri, S.; Sahoo, J.; Cenkeramaddi, L.R.; Raj, E.D. Incentivized Federated Learning: A Survey. IEEE Trans. Emerg. Top. Comput. Intell. 2025, 9, 3190–3209. [Google Scholar] [CrossRef]
- Ding, N.; Sun, Z.; Wei, E.; Berry, R. Incentive Mechanism Design for Federated Learning and Unlearning. In Proceedings of the Twenty-Fourth International Symposium on Theory, Algorithmic Foundations, and Protocol Design for Mobile Networks and Mobile Computing (MobiHoc ’23); Association for Computing Machinery: New York, NY, USA, 2023; pp. 11–20. [Google Scholar]
- Guo, X.; Zhang, X.; Zhang, X. Incentive-oriented power-carbon emissions trading-tradable green certificate integrated market mechanisms using multi-agent deep reinforcement learning. Appl. Energy 2024, 357, 122458. [Google Scholar] [CrossRef]
- Wu, H.; Tang, X.; Zhang, Y.J.A.; Gao, L. Incentive Mechanism for Federated Learning with Random Client Selection. IEEE Trans. Netw. Sci. Eng. 2024, 11, 1922–1933. [Google Scholar] [CrossRef]
- Wang, S.; Luo, B.; Tang, M. Tackling System-Induced Bias in Federated Learning: A Pricing-Based Incentive Mechanism. In Proceedings of the 2024 IEEE 44th International Conference on Distributed Computing Systems (ICDCS); IEEE: Piscataway, NJ, USA, 2024; pp. 902–912. [Google Scholar] [CrossRef]
- Li, G.; Cai, J.; He, C.; Zhang, X.; Chen, H. Online Incentive Mechanism Designs for Asynchronous Federated Learning in Edge Computing. IEEE Internet Things J. 2024, 11, 7787–7804. [Google Scholar] [CrossRef]
- Huang, J.; Ma, B.; Wu, Y.; Chen, Y.; Shen, X. A Hierarchical Incentive Mechanism for Federated Learning. IEEE Trans. Mob. Comput. 2024, 23, 12731–12747. [Google Scholar] [CrossRef]
- Chen, Y.; Zhou, H.; Li, T.; Li, J.; Zhou, H. Multifactor Incentive Mechanism for Federated Learning in IoT: A Stackelberg Game Approach. IEEE Internet Things J. 2023, 10, 21595–21606. [Google Scholar] [CrossRef]
- Dai, Y.; Yang, H.; Yang, H. Deep Reinforcement Learning for Resource Allocation in Blockchain-Based Federated Learning. In Proceedings of the ICC 2023—IEEE International Conference on Communications; IEEE: Piscataway, NJ, USA, 2023; pp. 179–184. [Google Scholar] [CrossRef]
- Chen, J.; Cui, Y.; Wei, C.; Polat, K.; Alenezi, F. Advances in EEG-based emotion recognition: Challenges, methodologies, and future directions. Appl. Soft Comput. 2025, 180, 113478. [Google Scholar] [CrossRef]
- Tang, W.; Liu, E.; Ni, W.; Qu, X.; Huang, B.; Li, K.; Niyato, D.; Jamalipour, A. Game-Theoretic Incentive Mechanism for Blockchain-Based Federated Learning. IEEE Trans. Mob. Comput. 2025, 24, 10363–10376. [Google Scholar] [CrossRef]
- Wang, C.; Peeta, S. Incentive Mechanism for Privacy-Preserving Collaborative Routing Using Secure Multi-Party Computation and Blockchain. Sensors 2024, 24, 542. [Google Scholar] [CrossRef]
- Zhang, R.; Zhou, R.; Wang, Y.; Tan, H.; He, K. Incentive Mechanisms for Online Task Offloading with Privacy-Preserving in UAV-Assisted Mobile Edge Computing. IEEE/ACM Trans. Netw. 2024, 32, 2646–2661. [Google Scholar] [CrossRef]
- Sun, K.; Wu, J.; Li, J. Reputation-Aware Incentive Mechanism of Federated Learning: A Mean Field Game Approach. In Proceedings of the 2024 9th IEEE International Conference on Smart Cloud (SmartCloud); IEEE: Piscataway, NJ, USA, 2024; pp. 48–53. [Google Scholar] [CrossRef]
- Chen, T.; Wang, F.; Hou, W.; Tang, S.; Zheng, Z. Dynamic Incentive Model for Federated Learning Model Trading via Evolutionary Game Theory. In Proceedings of the ICASSP 2025—2025 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP); IEEE: Piscataway, NJ, USA, 2025; pp. 1–5. [Google Scholar] [CrossRef]
- Han, B.; Li, B.; Wolter, K.; Jurdak, R.; Zhang, H.; Hu, Y.; Li, Y. Dynamic Incentive Design for Federated Learning Based on Consortium Blockchain Using a Stackelberg Game. IEEE Access 2024, 12, 160267–160283. [Google Scholar] [CrossRef]
- Zhao, H.; Zhou, M.; Xia, W.; Ni, Y.; Gui, G.; Zhu, H. Economic and Energy-Efficient Wireless Federated Learning Based on Stackelberg Game. IEEE Trans. Veh. Technol. 2024, 73, 2995–2999. [Google Scholar] [CrossRef]
- Hou, Y.; Liu, L.; Wei, Q.; Xu, X.; Chen, C. A novel DDPG method with prioritized experience replay. In Proceedings of the 2017 IEEE International Conference on Systems, Man, and Cybernetics (SMC); IEEE: Piscataway, NJ, USA, 2017; pp. 316–321. [Google Scholar]
- Allahham, M.S.; Choudhury, S.; Hassanein, H.S. Reliable Federated Learning with Auction-Based Incentives at the Extreme Edge. In Proceedings of the GLOBECOM 2024—2024 IEEE Global Communications Conference; IEEE: Piscataway, NJ, USA, 2024; pp. 3134–3139. [Google Scholar] [CrossRef]
- Li, G.; Cai, J.; Lu, J.; Chen, H. Incentive Mechanism Design for Cross-Device Federated Learning: A Reinforcement Auction Approach. IEEE Trans. Mob. Comput. 2025, 24, 3059–3075. [Google Scholar] [CrossRef]
- Yuan, S.; Dong, B.; Lv, H.; Liu, H.; Chen, H.; Wu, C.; Guo, S.; Ding, Y.; Li, J. Adaptive Incentive for Cross-Silo Federated Learning in IIoT: A Multiagent Reinforcement Learning Approach. IEEE Internet Things J. 2024, 11, 15048–15058. [Google Scholar] [CrossRef]
- Ma, B.; Feng, Z.; Gao, Y.; Chen, Y.; Huang, J. Secure Service-Oriented Contract Based Incentive Mechanism Design in Federated Learning via Deep Reinforcement Learning. In Proceedings of the 2024 IEEE International Conference on Web Services (ICWS); IEEE: Piscataway, NJ, USA, 2024; pp. 535–544. [Google Scholar] [CrossRef]
- Yang, C.; Liu, J.; Sun, H.; Li, T.; Li, Z. WTDP-Shapley: Efficient and Effective Incentive Mechanism in Federated Learning for Intelligent Safety Inspection. IEEE Trans. Big Data 2024, 10, 1028–1037. [Google Scholar] [CrossRef]
- Wang, S.; Luo, B.; Tang, M. An Incentive Mechanism for Federated Learning with Time-Varying Client Availability. IEEE Trans. Mob. Comput. 2025, 25, 284–299. [Google Scholar] [CrossRef]
- Eslamnejad, M.; Taheri, R.; Shojafar, M.; Bader-El-Den, M. Federated learning-based robust android malware detection: Label-flipping attacks and defenses. Neural Comput. Appl. 2025, 37, 27057–27082. [Google Scholar] [CrossRef]
- Silver, D.; Lever, G.; Heess, N.; Degris, T.; Wierstra, D.; Riedmiller, M. Deterministic policy gradient algorithms. In Proceedings of the 31st International Conference on Machine Learning (ICML ’14); JMLR.org: Beijing, China, 2014; Volume 32, pp. 387–395. [Google Scholar]
- Lillicrap, T.P.; Hunt, J.J.; Pritzel, A.; Heess, N.M.O.; Erez, T.; Tassa, Y.; Silver, D.; Wierstra, D. Continuous control with deep reinforcement learning. arXiv 2015, arXiv:1509.02971. [Google Scholar]
- Zaheer, M.; Kottur, S.; Ravanbakhsh, S.; Poczos, B.; Salakhutdinov, R.R.; Smola, A.J. Deep sets. Adv. Neural Inf. Process. Syst. 2017, 30, 3391–3401. [Google Scholar]
- Kimura, M.; Shimizu, R.; Hirakawa, Y.; Goto, R.; Saito, Y. On permutation-invariant neural networks. arXiv 2024, arXiv:2403.17410. [Google Scholar]
- Deng, Y.; Lyu, F.; Ren, J.; Chen, Y.C.; Yang, P.; Zhou, Y.; Zhang, Y. FAIR: Quality-Aware Federated Learning with Precise User Incentive and Model Aggregation. In Proceedings of the IEEE INFOCOM 2021—IEEE Conference on Computer Communications; IEEE: Piscataway, NJ, USA, 2021; pp. 1–10. [Google Scholar] [CrossRef]
| Symbol | Description |
|---|---|
| | Objective loss of client k in iteration t |
| | Minimum loss of client k across iterations |
| | Training data size of client k in iteration t |
| | Expense of client k in iteration t |
| | Macro-averaged accuracy of client k in iteration t |
| | Maximum macro-averaged accuracy of client k across iterations |
| | Server’s available budget in iteration t |
| | Contribution of client k to global loss, based on loss values |
| | Contribution of client k to global macro-accuracy, based on accuracy values |
| | Improvement in objective loss of client k in iteration t |
| | Relative improvement in objective loss of client k in iteration t |
| | Improvement in objective loss per unit of data of client k |
| | Improvement in objective loss per unit of expense of client k |
| | Improvement in macro-accuracy of client k in iteration t |
| | Improvement in macro-accuracy per unit of data of client k |
| | Improvement in macro-accuracy per unit of expense of client k |
| | Relative contribution of client k to global loss |
| | Contribution of client k to global loss per unit of data |
| | Contribution of client k to global loss per unit of expense |
| | Relative contribution of client k to global macro-accuracy |
| | Contribution of client k to macro-accuracy per unit of data |
| | Contribution of client k to macro-accuracy per unit of expense |
| | Relative change in server budget in iteration t |
| | Data contribution rate of client k per iteration |
| | Proportion of total training data contributed by client k |
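The per-unit quantities in the table reduce to simple ratios. A sketch under assumed definitions (improvement taken as the decrease in loss relative to the previous round; all numbers and variable names are illustrative, not from the paper):

```python
# Hypothetical per-round records for one client k.
loss_prev, loss_curr = 0.82, 0.67   # objective loss in rounds t-1 and t
data_size = 1200                    # training samples contributed in round t
expense = 3.5                       # client's expense in round t

delta_loss = loss_prev - loss_curr            # improvement in objective loss
rel_improvement = delta_loss / loss_prev      # relative improvement
per_unit_data = delta_loss / data_size        # improvement per unit of data
per_unit_expense = delta_loss / expense       # improvement per unit of expense
```

The macro-accuracy counterparts follow the same pattern with the sign flipped (accuracy improves by increasing), and the budget-relative term normalizes by the server's available budget in that round.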
| Method | Accuracy (%) | Conv. Round | Avg. Reward | Efficiency |
|---|---|---|---|---|
| *External Baselines* | | | | |
| FedAvg (No Incentive) | 84.12 | 52 | - | - |
| Static Shapley [24] | 88.45 | 45 | 12.4 | 0.72 |
| Standard DRL-FL [23] | 89.10 | 42 | 14.8 | 0.78 |
| *Ablation Study (Ours)* | | | | |
| w/o Permutation | 90.35 | 38 | 16.2 | 0.81 |
| w/o OU Noise | 89.55 | 40 | 15.5 | 0.79 |
| w/o Incentive Weights | 91.20 | 35 | 17.1 | 0.84 |
| Ours (Full Framework) | 93.58 | 28 | 19.5 | 0.91 |
© 2026 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license.
Share and Cite
Cao, Y.; Cai, H.; Zhu, H.; Zhang, S.; Hu, J. Budget-Aware Closed-Loop Incentive Allocation for Federated Learning with DDPG. Electronics 2026, 15, 1481. https://doi.org/10.3390/electronics15071481