A Lightweight Double-Deep Q-Network for Energy Efficiency Optimization of Industrial IoT Devices in Thermal Power Plants
Abstract
1. Introduction
1.1. Background
1.2. Key Challenges in Energy Efficiency Optimization
- Environmental Complexity and Dynamics: Thermal power plant environments are inherently non-stationary. IIoT devices must contend with pronounced wireless channel variations stemming from electromagnetic interference (EMI) generated by power equipment, signal reflections from metallic structures, and fluctuating operational loads that may vary, for instance, from 50% to 100%. These factors can cause the signal-to-noise ratio (SNR) to drop from a stable 25 dB to as low as 5 dB within brief intervals. Concurrently, critical parameters such as boiler vibrations or turbine speeds can change erratically: for example, the vibration frequency might rise from a baseline of 20 Hz to 60 Hz during an anomaly, demanding real-time adaptation from sensor nodes. Such dynamism renders static optimization strategies largely ineffective in maintaining peak performance [27,28,29].
- Multi-Objective Trade-offs: Critical monitoring applications in thermal power plants necessitate a delicate balance between energy efficiency, system reliability exemplified by data delivery success rates, and responsiveness characterized by low latency. An overemphasis on energy conservation can inadvertently degrade data transmission success rates or protract response times. In high-stakes scenarios, such as the early detection of equipment faults, these compromises can lead to severe consequences; for instance, the delayed detection of a turbine bearing failure could result in hourly losses amounting to hundreds of thousands of dollars. Optimization algorithms must therefore navigate these competing objectives to achieve a holistically optimal operational balance [30,31,32,33].
- Resource-Constrained Edge Devices: IIoT sensors in power plants are typically embedded systems with stringent resource limitations, often built on microcontrollers (MCUs) such as the STM32 series with clock speeds around 48 MHz and limited memory of approximately 32 KB of RAM and 256 KB of flash. In stark contrast, conventional deep reinforcement learning (DRL) algorithms often demand considerable computational and memory resources, rendering them unsuitable for direct deployment on such edge devices. Consequently, a pivotal challenge lies in designing energy efficiency optimization algorithms that are not only effective but also sufficiently lightweight to operate within these constraints without significant performance degradation [34,35].
- Perceptual and Decisional Uncertainty: Sensor-derived perceptions of the operating environment, including channel quality and equipment vibration levels, are often subject to noise and inherent latencies. Decisions predicated on such imperfect information can, in turn, influence future state observations, creating intricate feedback loops. Furthermore, the unpredictable nature of power plant load variations and dynamic interference patterns introduces a high degree of uncertainty into the decision-making process, demanding robust algorithmic solutions [36,37].
1.3. Our Contributions
- A Novel Memory-Efficient Learning Mechanism: Gradient Memory. Our primary contribution is the proposal and formalization of the gradient memory mechanism, a conceptual alternative to experience replay designed specifically for memory-constrained edge learning. By storing and reusing a compact history of loss gradients, it fundamentally addresses the memory bottleneck of on-device DRL, paving the way for enhanced on-device intelligence in IIoT networks.
- A Holistic Lightweight DRL Framework for Edge Deployment. Building upon our core innovation, we design and implement the complete GM-DDQN framework. It integrates the gradient memory mechanism with complementary techniques, including a streamlined neural network architecture, parameter quantization, and efficient parameter update strategies. This results in a system that is not only theoretically novel but also practically deployable, drastically cutting computational overhead and inference time while preserving robust learning and decision-making capabilities.
- Joint Adaptive Wireless Communication and Sleep Scheduling. We formulate transmission power control and sleep mode management as a single joint decision-making problem. The GM-DDQN agent learns to coordinate these actions dynamically based on real-time environmental states, including SNR, equipment vibration intensity, and network traffic. In our experiments, this joint management yields energy savings between 22% and 42%, directly extending device lifetime.
- Multi-Objective Optimization Tailored for Industrial Needs. To meet the multifaceted operational requirements of thermal power plants, we designed a novel multi-objective reward function. This function intrinsically balances three critical performance indicators: energy efficiency, data transmission reliability (success rate), and communication latency. Through carefully assigned weighting factors, the GM-DDQN agent is trained to maximize energy efficiency while stringently maintaining data success rates above 95% and communication delays below a 400 ms threshold.
- Rigorous Validation in Realistic Simulated Scenarios. The efficacy of the proposed GM-DDQN framework is demonstrated through extensive and rigorous experimental evaluations. We systematically benchmark GM-DDQN against standard DRL algorithms (DDQN, DDPG, and PPO) and conventional strategies across diverse and realistic simulated thermal power plant scenarios. The results substantiate that GM-DDQN achieves performance comparable to state-of-the-art DRL methods in terms of energy efficiency and data reliability while operating with only a fraction of their computational resource footprint. This comprehensive validation underscores the practical viability of GM-DDQN for large-scale IIoT deployments.
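The gradient memory idea described in the first contribution can be illustrated with a minimal sketch. It assumes that the memory keeps the last m gradient vectors and that the update uses their element-wise mean; this aggregation rule, the class name `GradientMemory`, and the helper `sgd_step` are hypothetical choices for illustration and are not taken from the paper. The defaults m = 8 and learning rate 0.001 follow the parameter table.

```python
from collections import deque


class GradientMemory:
    """Fixed-size buffer of recent gradient vectors (hypothetical helper)."""

    def __init__(self, size=8):
        # size = 8 mirrors the gradient memory size m in the parameter table.
        self.buffer = deque(maxlen=size)

    def push(self, grad):
        # Store a copy; the oldest gradient is evicted automatically when full.
        self.buffer.append(list(grad))

    def smoothed(self):
        # Element-wise mean over stored gradients (assumed aggregation rule).
        n = len(self.buffer)
        if n == 0:
            raise ValueError("gradient memory is empty")
        dim = len(self.buffer[0])
        return [sum(g[i] for g in self.buffer) / n for i in range(dim)]


def sgd_step(params, grad, memory, lr=0.001):
    """One parameter update using the smoothed gradient, not the raw one."""
    memory.push(grad)
    smoothed = memory.smoothed()
    return [p - lr * g for p, g in zip(params, smoothed)]
```

Because only m gradient vectors are kept, the storage cost grows with the number of network parameters rather than with the number of stored transitions, which is the property the contribution relies on.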
2. Related Work
2.1. Energy Efficiency Optimization in IIoT
2.2. Reinforcement Learning-Based Energy and Resource Management
2.3. Lightweight DRL Methods and Edge Intelligence
2.4. Research Gaps and Contributions
3. Problem Modeling and Environment Definition
3.1. System Model
- Maximum battery capacity (in mAh) and current battery level.
- A discrete set of available transmission power levels (in dBm).
- A discrete set of available sleep modes, each mode having a specific power consumption rate (in mW) and wake-up delay (in ms).
- An average data generation rate (in packets/s) and processing capability (in MIPS).
3.2. Thermal Power Plant Environment Characteristics
3.2.1. Electromagnetic Interference Patterns
3.2.2. Equipment Vibration and Monitoring Needs
3.2.3. Temperature and Path Loss Variations
3.3. Energy Consumption Model
3.4. MDP Formulation
3.4.1. State Space
3.4.2. Action Space
3.4.3. Transition Probability
3.4.4. Reward Function
- Energy: a term penalizing normalized energy consumption.
- Reliability: a term rewarding successful transmissions.
- Latency: is defined as
3.4.5. Discount Factor
3.5. Optimization Objective
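For an MDP of this kind, the quantity being maximized is typically the expected discounted return of a multi-objective reward. The sketch below is only an illustration under stated assumptions: the weights, the energy normalizer `e_max_mj`, and the linear latency penalty are hypothetical and are not the paper's definitions. The 400 ms latency limit comes from the contributions list and the discount factor 0.95 from the parameter table.

```python
def reward(energy_mj, delivered, latency_ms,
           w_energy=0.4, w_rel=0.4, w_lat=0.2,
           e_max_mj=1.0, latency_limit_ms=400.0):
    """Weighted multi-objective reward (weights and shaping are hypothetical)."""
    # Penalize energy, normalized to [0, 1] by an assumed per-step maximum.
    r_energy = -min(energy_mj / e_max_mj, 1.0)
    # Reward a successful transmission.
    r_rel = 1.0 if delivered else 0.0
    # Assumed latency shaping: no penalty below the limit, linear above it.
    r_lat = -max(0.0, latency_ms / latency_limit_ms - 1.0)
    return w_energy * r_energy + w_rel * r_rel + w_lat * r_lat


def discounted_return(rewards, gamma=0.95):
    """Sum of gamma**t * r_t, accumulated from the last step backwards."""
    total = 0.0
    for r in reversed(rewards):
        total = r + gamma * total
    return total
```

With a discount factor of 0.95, rewards about 20 steps ahead still carry roughly a third of their weight, so sleep decisions with delayed consequences remain visible to the agent.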
3.6. Problem Complexity Analysis
- High-Dimensional and Continuous States: The state space is vast, making tabular RL methods impractical.
- Non-Stationary Environment: Plant dynamics change, requiring adaptive policies.
- Delayed Rewards and Temporal Credit Assignment: The long-term effects of actions are hard to attribute.
- Complex Objective Trade-offs: Balancing energy, reliability, and latency is non-trivial.
- IIoT Device Resource Constraints: Algorithms must be lightweight and computationally efficient.
4. Lightweight Double-Deep Q-Network (GM-DDQN) Method
4.1. GM-DDQN Architecture Overview
4.2. The Core Innovation: The Gradient Memory Mechanism
4.3. Supporting Lightweighting Techniques
4.3.1. Compact Neural Network Architecture and Quantization
4.3.2. Efficient Target Network Updates
4.4. State and Action Representation
4.4.1. State Normalization
4.4.2. Action Discretization
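A minimal sketch of how state normalization and joint action discretization could look is given here. The value ranges are taken from figures quoted elsewhere in this article (SNR 5 to 25 dB, vibration 20 to 80 Hz, temperature 40 to 120 °C); the transmission power levels in `TX_POWER_DBM` are hypothetical placeholders, and the choice of four power levels and three sleep modes is an assumption made for illustration.

```python
# Ranges quoted in the article; which features the agent observes is assumed.
STATE_RANGES = {
    "snr_db": (5.0, 25.0),
    "vibration_hz": (20.0, 80.0),
    "temperature_c": (40.0, 120.0),
}

TX_POWER_DBM = [-5, 0, 5, 10]               # hypothetical power levels
SLEEP_MODES = ["active", "light", "deep"]   # modes named in the scenario analysis
NUM_ACTIONS = len(TX_POWER_DBM) * len(SLEEP_MODES)


def normalize(value, low, high):
    """Min-max scale to [0, 1], clamping out-of-range readings."""
    x = (value - low) / (high - low)
    return max(0.0, min(1.0, x))


def encode_action(power_idx, sleep_idx):
    """Map a (power level, sleep mode) pair to one discrete action index."""
    return power_idx * len(SLEEP_MODES) + sleep_idx


def decode_action(action):
    """Inverse of encode_action: recover the power level and sleep mode."""
    power_idx, sleep_idx = divmod(action, len(SLEEP_MODES))
    return TX_POWER_DBM[power_idx], SLEEP_MODES[sleep_idx]
```

Flattening the two decisions into one index lets a single Q-network output score every joint choice, at the cost of an output layer whose width is the product of the two set sizes.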
4.5. Training Procedure
4.6. Implementation Considerations for IIoT Devices
- Memory Footprint Optimization: Quantized weights are stored in flash memory (often more plentiful than RAM). In-place computations for activations and direct matrix operations on quantized values further reduce RAM usage by avoiding intermediate dequantized copies.
- Computational Efficiency: Activation functions are implemented via lookup tables to bypass costly floating-point operations on MCUs lacking hardware FPUs. “Batch-free” inference (one sample at a time) minimizes peak RAM. Early stopping heuristics during inference can prune unnecessary computations if an action can be determined with partial evaluation.
- Energy-Aware Execution: The inference frequency itself is adapted based on battery level and environmental stability. Inference is coordinated with data transmission to utilize active processor states. Between decisions, the device enters optimized low-power modes, selected based on the predicted time to the next decision.
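The quantized-weight storage and lookup-table activations mentioned above can be sketched as follows. This is a host-side Python illustration, not firmware: symmetric per-tensor int8 quantization and a tanh activation are assumptions, since the source does not state which scheme or activation is used, and all names here are hypothetical.

```python
import math


def quantize_int8(weights):
    """Symmetric per-tensor quantization: w is approximated by scale * q."""
    max_abs = max(abs(w) for w in weights) or 1.0   # avoid a zero scale
    scale = max_abs / 127.0
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale


def dequantize(q, scale):
    """Recover approximate float weights from int8 codes."""
    return [scale * v for v in q]


# 256-entry table indexed by an int8 input code (assumed tanh activation).
LUT_INPUT_SCALE = 4.0 / 127.0   # codes -127..127 cover inputs in [-4, 4]
TANH_LUT = [math.tanh(code * LUT_INPUT_SCALE) for code in range(-128, 128)]


def tanh_lut(x):
    """Approximate tanh(x) with one table lookup instead of a float routine."""
    code = max(-128, min(127, round(x / LUT_INPUT_SCALE)))
    return TANH_LUT[code + 128]
```

On a device the table would normally live in flash as a constant array, so the activation costs one indexed read, and the rounding error per weight stays within half a quantization step.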
Algorithm 1. GM-DDQN Training Procedure
4.7. Theoretical Analysis
5. Experiments
5.1. Experimental Setup
5.1.1. Simulation Environment
- Thermal Power Plant Environment Model: Incorporating electromagnetic interference patterns (baseline, variable, and burst), equipment vibration dynamics (normal, fluctuating, and anomalous), temperature variations, and their impact on path loss characteristics.
- Wireless Network Simulator: Modeling packet transmission success/failure based on the signal-to-interference-plus-noise ratio (SINR), channel dynamics (path loss influenced by temperature, as per Section 3.2.3), and interference effects.
- IIoT Device Energy Consumption Model: Accounting for energy consumed during transmission, reception, data processing, and different sleep modes, as detailed in Section 3.3.
- Hardware Resource Simulator: Tracking the computational (e.g., CPU cycles or equivalent time) and memory (ROM/RAM) footprint during algorithm execution on a target IIoT node.
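As a rough illustration of the quantity the wireless network simulator works with, the sketch below computes SINR from a received power, an interference level, and a noise floor, all in dBm. The defaults use the baseline interference (−90 dBm) and noise floor (−100 dBm) from the parameter table; the received power values in any call are hypothetical, and the temperature-dependent path loss of Section 3.2.3 is deliberately left out because its form is not given here.

```python
import math


def dbm_to_mw(p_dbm):
    """Convert a power level from dBm to milliwatts."""
    return 10.0 ** (p_dbm / 10.0)


def sinr_db(rx_dbm, interference_dbm=-90.0, noise_dbm=-100.0):
    """SINR in dB; interference and noise are summed in linear units."""
    denom_mw = dbm_to_mw(interference_dbm) + dbm_to_mw(noise_dbm)
    return 10.0 * math.log10(dbm_to_mw(rx_dbm) / denom_mw)
```

Calling `sinr_db(-75.0)` with the baseline defaults and then again with `interference_dbm=-70.0` shows how a single interference burst inside the −80 to −60 dBm range can push a comfortable link below 0 dB.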
5.1.2. Test Scenarios
- Scenario 1: High-Interference Environment. This scenario simulates operations near high-power electrical equipment, characterized by strong electromagnetic interference. The SINR fluctuates significantly, typically between 5 and 15 dB, due to frequent and intense interference bursts, demanding robust adaptive transmission strategies from the algorithm.
- Scenario 2: Variable Vibration Monitoring. This scenario focuses on applications like turbine health monitoring, where equipment vibration patterns frequently switch between normal operational levels and abnormal states (indicative of potential faults). This requires the algorithm to dynamically adjust its data sampling (implicitly linked to data generation for transmission) and transmission policies to capture critical events while conserving energy during quiescent periods.
- Scenario 3: Temperature-Constrained Deployment. This scenario models the dual impact of high ambient temperatures on IIoT nodes: (a) temperature-dependent variations in wireless channel path loss (as described in Section 3.2.3), and (b) accelerated battery discharge rates at elevated temperatures.
5.1.3. Implementation Details
- Processor: ARM Cortex-M4F operating at 48 MHz.
- Memory: 32 KB RAM and 256 KB flash.
- Radio: IEEE 802.15.4-compliant [59].
5.2. Baseline Methods for Comparison
- Fixed Policy (FP): A non-adaptive strategy using a fixed transmission power (10 dBm) and a predetermined periodic sleep schedule. This represents a basic IIoT deployment.
- Threshold-Based Adaptive (TBA): Adjusts transmission power based on the observed packet success rate (PSR) and sleep intervals based on vibration thresholds. This reflects common adaptive practices in current industrial IoT.
- Standard DDQN: A full-scale DDQN with 64 neurons per hidden layer and an experience replay buffer of 10,000 transitions, serving as a performance benchmark without strict resource constraints.
- Deep Deterministic Policy Gradient (DDPG): An advanced actor–critic RL method for continuous control problems.
- Proximal Policy Optimization (PPO): A stable policy gradient RL method widely used in various tasks.
- Q-Learning with Function Approximation (Q-FA): A lightweight RL method using linear function approximation instead of deep networks, offering low computational cost but limited representational power.
5.3. Evaluation Metrics
- Energy Efficiency:
- Energy consumption per successfully transmitted packet (mJ/packet).
- Average power consumption (mW).
- Estimated battery lifetime (months).
- Distribution of energy consumption across components (e.g., transmission, processing, and sleep), which will be detailed with the results where applicable.
- Communication Performance:
- PSR (%).
- End-to-end latency (ms).
- Throughput (kbps).
- Adaptability to interference bursts (e.g., time for PSR to stabilize after a burst).
- Resource Utilization (for on-device deployment profile):
- Memory footprint (ROM and RAM in KB).
- Computation time per decision-making step (ms).
- Energy overhead of algorithm execution per decision (mJ/decision).
- Initialization time (s).
- Learning Performance (for RL methods):
- Convergence speed (number of training episodes required).
- Cumulative reward achieved during training and evaluation.
- Policy stability (e.g., consistency of actions in similar states, assessed qualitatively and by observing reward variance).
- Adaptation time to significant environmental changes (recovery duration).
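Several of the metrics above reduce to simple ratios, sketched here for concreteness. The battery lifetime estimate assumes an ideal battery, a nominal supply voltage of 3.0 V, and a 730-hour month; these values and all function names are hypothetical, not taken from the paper.

```python
def energy_per_packet_mj(total_energy_mj, delivered_packets):
    """Energy per successfully transmitted packet (mJ/packet)."""
    if delivered_packets == 0:
        return float("inf")   # nothing delivered: the cost is unbounded
    return total_energy_mj / delivered_packets


def psr_percent(delivered_packets, sent_packets):
    """Packet success rate as a percentage of packets sent."""
    if sent_packets == 0:
        return 0.0
    return 100.0 * delivered_packets / sent_packets


def battery_lifetime_months(capacity_mah, avg_power_mw, voltage_v=3.0):
    """Ideal-battery lifetime; voltage_v and the 730 h month are assumptions."""
    hours = capacity_mah * voltage_v / avg_power_mw   # mAh * V = mWh
    return hours / 730.0
```

Dividing by delivered rather than sent packets means that retransmissions and losses raise the energy-per-packet figure, which ties the energy metric to reliability.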
5.4. Experimental Results and Analysis
5.4.1. Energy Efficiency Performance
5.4.2. Communication Performance
5.4.3. Resource Utilization
5.4.4. Learning Performance
5.4.5. Performance Analysis in Specific Scenarios
- High-Interference Environment (Scenario 1): GM-DDQN effectively modulated its transmission power in response to dynamic interference levels. It increased power during interference peaks to maintain high PSR and reduced power during lulls to conserve energy. This outperformed the fixed policy, which wasted energy, and the Threshold-Based Adaptive strategy, which responded more slowly to sudden interference bursts.
- Variable Vibration Monitoring (Scenario 2): GM-DDQN adeptly managed sleep modes according to vibration intensity. It remained active during high vibration, transitioned to light sleep for moderate vibration, and entered deep sleep during normal (low) vibration periods. This strategy ensured the capture of critical data while maximizing energy savings, proving superior to the sub-optimal switching of the TBA strategy and the non-adaptive fixed policy.
- Temperature-Constrained Deployment (Scenario 3): GM-DDQN demonstrated the best resilience to temperature variations. Its energy consumption increased by only 32% as temperatures rose from 40 °C to 120 °C. This increase, the smallest among the compared algorithms, highlights its ability to learn and adapt to the interplay between environmental factors (such as temperature-induced path loss changes) and operational strategy.
5.4.6. Ablation Study and Component Analysis
- Without Gradient Memory: Removing the gradient memory and reverting to a simple single-sample update leads to a noticeable increase in energy consumption (0.30 versus 0.28 mJ/packet). This result supports the efficacy of our core innovation. As theorized in Section 4.7, the gradient memory provides a smoothed, history-aware gradient estimate. This approach mitigates the high variance inherent in updates based on single, potentially noisy, state transitions, leading to a more stable learning process and a more consistently energy-efficient policy.
- Without Quantization: This variant reveals a crucial performance–resource trade-off. While forgoing quantization yields a marginal improvement in energy efficiency (a 3.7% reduction in mJ/packet), it causes a prohibitive 217% increase in computation time (from 12 ms to 38 ms). The performance impact of quantization is minimal because the learned Q-function is robust; the optimal actions in this problem space are determined by significant differences in Q-values, not by fine-grained distinctions that would be lost to 8-bit precision. This finding is consistent with the established literature, demonstrating that robust neural networks are highly amenable to quantization. Therefore, the immense computational gain, which is essential for on-device deployment, far outweighs the negligible loss in policy optimality.
- Single Hidden Layer: Using only a single hidden layer results in a significant 14% degradation in energy efficiency. This is attributed to insufficient representational capacity. A single-layer network struggles to model the complex non-linear relationships between the diverse state variables (e.g., SINR, vibration, and temperature) and the optimal action. The second hidden layer is crucial for creating hierarchical feature representations, allowing the model to capture the intricate trade-offs demanded by our multi-objective reward function, a task for which a flatter architecture is ill-equipped.
- Without Target Network: The removal of the target network causes the most substantial performance degradation, with a 25% decrease in energy efficiency. This instability arises because, without a stable separate target network, the learning target becomes highly non-stationary. The same network parameters are used both to estimate the current Q-value and to calculate the target value, creating a destructive feedback loop that leads to severe learning instability and oscillations. The use of a slowly updated target network is therefore indispensable for decoupling the updates and ensuring stable convergence to a high-quality policy, a finding consistent with the foundational principles of deep Q-learning.
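The role of the target network in the last ablation can be made concrete with a short sketch of the standard Double DQN target and a soft target update. It assumes that the "target network update rate" of 0.05 in the parameter table is a Polyak averaging coefficient; that reading, and the function names, are assumptions rather than details confirmed by the source.

```python
def ddqn_target(reward, next_q_online, next_q_target, gamma=0.95, done=False):
    """Double DQN target: the online net picks the action, the target net scores it."""
    if done:
        return reward
    best = max(range(len(next_q_online)), key=next_q_online.__getitem__)
    return reward + gamma * next_q_target[best]


def soft_update(target_params, online_params, tau=0.05):
    """Polyak averaging: target <- tau * online + (1 - tau) * target."""
    return [tau * w + (1.0 - tau) * t
            for t, w in zip(target_params, online_params)]
```

Removing the target network amounts to passing the online values for both arguments of `ddqn_target`, so every gradient step also shifts the value it is regressing toward.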
6. Discussion
6.1. Summary of Experimental Results and Their Significance
6.2. Research Contributions
6.3. Limitations and Future Work
7. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
- Sarjan, H.; Ameli, A.; Ghafouri, M. Cyber-security of industrial internet of things in electric power systems. IEEE Access 2022, 10, 92390–92409.
- Khan, W.Z.; Rehman, M.; Zangoti, H.M.; Afzal, M.K.; Armi, N.; Salah, K. Industrial internet of things: Recent advances, enabling technologies and open challenges. Comput. Electr. Eng. 2020, 81, 106522.
- Erhueh, O.V.; Elete, T.; Akano, O.A.; Nwakile, C.; Hanson, E. Application of Internet of Things (IoT) in energy infrastructure: Lessons for the future of operations and maintenance. Compr. Res. Rev. Sci. Technol. 2024, 2, 28–54.
- Qiu, F.; Kumar, A.; Hu, J.; Sharma, P.; Tang, Y.B.; Xiang, Y.X.; Hong, J. A Review on Integrating IoT, IIoT, and Industry 4.0: A Pathway to Smart Manufacturing and Digital Transformation. IET Inf. Secur. 2025, 2025, 9275962.
- Majhi, A.A.K.; Mohanty, S. A Comprehensive Review on Internet of Things Applications in Power Systems. IEEE Internet Things J. 2024, 11, 34896–34923.
- Hu, Y.; Jia, Q.; Yao, Y.; Lee, Y.; Lee, M.; Wang, C.; Zhou, X.; Xie, R.; Yu, F.R. Industrial internet of things intelligence empowering smart manufacturing: A literature review. IEEE Internet Things J. 2024, 11, 19143–19167.
- Tabaa, M.; Monteiro, F.; Bensag, H.; Dandache, A. Green Industrial Internet of Things from a smart industry perspectives. Energy Rep. 2020, 6, 430–446.
- Abdullahi, I.; Longo, S.; Samie, M. Towards a distributed digital twin framework for predictive maintenance in industrial internet of things (IIoT). Sensors 2024, 24, 2663.
- Mao, W.; Zhao, Z.; Chang, Z.; Min, G.; Gao, W. Energy-efficient industrial internet of things: Overview and open issues. IEEE Trans. Ind. Inform. 2021, 17, 7225–7237.
- Zhang, C.; Li, W.; Zhang, H.; Zhan, T. Recent Advances in Intelligent Data Analysis and Its Applications. Electronics 2024, 13, 226.
- Liu, X.; Xu, F.; Ning, L.; Lv, Y.; Zhao, C. A Novel Sensor Deployment Strategy Based on Probabilistic Perception for Industrial Wireless Sensor Network. Electronics 2024, 13, 4952.
- D’Agostino, P.; Violante, M.; Macario, G. A Scalable Fog Computing Solution for Industrial Predictive Maintenance and Customization. Electronics 2024, 14, 24.
- Dong, Z.; Cao, Y.; Xiong, N.; Dong, P. EE-MPTCP: An Energy-Efficient Multipath TCP Scheduler for IoT-based power grid monitoring systems. Electronics 2022, 11, 3104.
- Majid, M.; Habib, S.; Javed, A.R.; Rizwan, M.; Srivastava, G.; Gadekallu, T.R.; Lin, J.C.W. Applications of wireless sensor networks and internet of things frameworks in the industry revolution 4.0: A systematic literature review. Sensors 2022, 22, 2087.
- Foukalas, F.; Pop, P.; Theoleyre, F.; Boano, C.A.; Buratti, C. Dependable wireless industrial IoT networks: Recent advances and open challenges. In Proceedings of the 2019 IEEE European Test Symposium (ETS), Baden-Baden, Germany, 27–31 May 2019; IEEE: Piscataway, NJ, USA, 2019; pp. 1–10.
- Hudda, S.; Haribabu, K. A review on WSN based resource constrained smart IoT systems. Discov. Internet Things 2025, 5, 56.
- Mohapatra, A.G.; Mohanty, A.; Pradhan, N.R.; Mohanty, S.N.; Gupta, D.; Alharbi, M.; Alkhayyat, A.; Khanna, A. An Industry 4.0 implementation of a condition monitoring system and IoT-enabled predictive maintenance scheme for diesel generators. Alex. Eng. J. 2023, 76, 525–541.
- Aragonés, R.; Oliver, J.; Malet, R.; Oliver-Parera, M.; Ferrer, C. Model and Implementation of a Novel Heat-Powered Battery-Less IIoT Architecture for Predictive Industrial Maintenance. Information 2024, 15, 330.
- Chaudhari, S.S.; Bhole, K.S.; Rane, S.B. Industrial Automation and Data Processing Techniques in IoT-Based Digital Twin Design for Thermal Equipment: A case study. J. Inst. Eng. (India) Ser. C 2025, 106, 553–569.
- Aragonés, R.; Oliver, J.; Ferrer, C. Enhanced Heat-Powered Batteryless IIoT Architecture with NB-IoT for Predictive Maintenance in the Oil and Gas Industry. Sensors 2025, 25, 2590.
- Zhang, J.; Wang, Y.; Yang, Y.; Ma, Y.; Dai, Z. Fault diagnosis and intelligent maintenance of industry 4.0 power system based on internet of things technology and thermal energy optimization. Therm. Sci. Eng. Prog. 2024, 55, 102902.
- Prosper, J. Development of Wireless Temperature Sensing Systems for Rotating Equipment in Harsh Environments. 2023.
- Dagnino, A. Data Analytics in the Era of the Industrial Internet of Things; Springer: Berlin/Heidelberg, Germany, 2021.
- Balali, F.; Nouri, J.; Nasiri, A.; Zhao, T. Data Intensive Industrial Asset Management; Springer: Berlin/Heidelberg, Germany, 2020.
- Yılmaz, M.Y.; Üstüner, B.; Gül, Ö.M.; Çırpan, H.A. Sustainable Communication in 5G/6G Wireless Sensor Networks: A Survey on Energy-Efficient Collaborative Routing. ITU J. Wirel. Commun. Cybersecur. 2025, 2, 11–26.
- Dvir, E.; Shifrin, M.; Gurewitz, O. Cooperative Multi-Agent Reinforcement Learning for Data Gathering in Energy-Harvesting Wireless Sensor Networks. Mathematics 2024, 12, 2102.
- O’Reilly, C.; Gluhak, A.; Imran, M.A.; Rajasegarar, S. Anomaly detection in wireless sensor networks in a non-stationary environment. IEEE Commun. Surv. Tutor. 2014, 16, 1413–1432.
- Hicheri, R.; Abdelgawwad, A.; Pätzold, M. A non-stationary relay-based 3D MIMO channel model with time-variant path gains for human activity recognition in indoor environments. Ann. Telecommun. 2021, 76, 827–837.
- Careem, M.A.A.; Dutta, A. Real-time prediction of non-stationary wireless channels. IEEE Trans. Wirel. Commun. 2020, 19, 7836–7850.
- Singh, S.P.; Kumar, N.; Kumar, G.; Balusamy, B.; Bashir, A.K.; Al-Otaibi, Y.D. A hybrid multi-objective optimisation for 6G-enabled Internet of Things (IoT). IEEE Trans. Consum. Electron. 2024, 71, 1307–1318.
- Vijayalakshmi, K.; Maheshwari, A.; Saravanan, K.; Vidyasagar, S.; Kalyanasundaram, V.; Sattianadan, D.; Bereznychenko, V.; Narayanamoorthi, R. A novel network lifetime maximization technique in WSN using energy efficient algorithms. Sci. Rep. 2025, 15, 10644.
- Hamzei, M.; Khandagh, S.; Jafari Navimipour, N. A quality-of-service-aware service composition method in the internet of things using a multi-objective fuzzy-based hybrid algorithm. Sensors 2023, 23, 7233.
- Singh, S.P.; Kumar, N.; Kumar, G.; Balusamy, B.; Bashir, A.K.; Al Dabel, M.M. Enhancing Quality of Service in IoT-WSN through Edge-Enabled Multi-Objective Optimization. IEEE Trans. Consum. Electron. 2025.
- Hazra, A.; Tummala, V.M.R.; Mazumdar, N.; Sah, D.K.; Adhikari, M. Deep reinforcement learning in edge networks: Challenges and future directions. Phys. Commun. 2024, 66, 102460.
- Kornaros, G. Hardware-assisted machine learning in resource-constrained IoT environments for security: Review and future prospective. IEEE Access 2022, 10, 58603–58622.
- Chen, W.; Qiu, X.; Cai, T.; Dai, H.N.; Zheng, Z.; Zhang, Y. Deep reinforcement learning for Internet of Things: A comprehensive survey. IEEE Commun. Surv. Tutor. 2021, 23, 1659–1692.
- Sagar, A.S.; Islam, M.Z.; Haider, A.; Kim, H.S. Uncertainty-aware federated reinforcement learning for optimizing accuracy and energy in heterogeneous industrial IoT. Appl. Sci. 2024, 14, 8299.
- Yadav, R.K.; Malavika, V.; Rajendran, P.S. A Novel Approach to Optimize Energy Consumption in Industries Using IIoT and Machine Learning. In Proceedings of the 2024 International Conference on Advances in Data Engineering and Intelligent Computing Systems (ADICS), Chennai, India, 18–19 April 2024; IEEE: Piscataway, NJ, USA, 2024; pp. 1–6.
- Hou, L.; Tan, S.; Zhang, Z.; Bergmann, N.W. Thermal energy harvesting WSNs node for temperature monitoring in IIoT. IEEE Access 2018, 6, 35243–35249.
- Farné, S.; Bassi, E.; Benzi, F.; Compagnoni, F. IIoT based efficiency monitoring of a Gantry robot. In Proceedings of the 2016 IEEE 14th International Conference on Industrial Informatics (INDIN), Poitiers, France, 19–21 July 2016; IEEE: Piscataway, NJ, USA, 2016; pp. 714–719.
- Pradhan, A.; Das, S.; Piran, M.J. Blocklength optimization and power allocation for energy-efficient and secure URLLC in industrial IoT. IEEE Internet Things J. 2023, 11, 9420–9431.
- Solati, A.; Moghaddam, J.Z.; Ardebilipour, M. An Energy Efficiency Method in UAV-Assisted IIoT Network. In Proceedings of the 2023 7th International Conference on Internet of Things and Applications (IoT), Isfahan, Iran, 25–26 October 2023; IEEE: Piscataway, NJ, USA, 2023; pp. 1–6.
- Zhou, Z.; Zhang, C.; Xu, C.; Xiong, F.; Zhang, Y.; Umer, T. Energy-efficient industrial internet of UAVs for power line inspection in smart grid. IEEE Trans. Ind. Inform. 2018, 14, 2705–2714.
- Jiang, D.; Wang, Y.; Lv, Z.; Wang, W.; Wang, H. An energy-efficient networking approach in cloud services for IIoT networks. IEEE J. Sel. Areas Commun. 2020, 38, 928–941.
- Mamaghani, A.H.; Najafi, B.; Casalegno, A.; Rinaldi, F. Optimization of an HT-PEM fuel cell based residential micro combined heat and power system: A multi-objective approach. J. Clean. Prod. 2018, 180, 126–138.
- Ma, Y.; Liu, J.; Zhu, L.; Li, Q.; Guo, Y.; Liu, H.; Yu, D. Multi-objective performance optimization and control for gas turbine Part-load operation Energy-saving and NOx emission reduction. Appl. Energy 2022, 320, 119296.
- Kumar, R. A critical review on energy, exergy, exergoeconomic and economic (4-E) analysis of thermal power plants. Eng. Sci. Technol. Int. J. 2017, 20, 283–292.
- Qu, M.; Pan, L.; Lu, L.; Wang, J.; Tang, Y.; Chen, X. Study on thermal cycle efficiency improvement of secondary-loop in nuclear power plants based on dual-region topology optimization. Int. Commun. Heat Mass Transf. 2024, 159, 108183.
- Liu, Z.; Zhang, H.; Jin, X.; Zheng, S.; Li, R.; Guan, H.; Shao, J. Thermal economy analysis and multi-objective optimization of a small CO2 transcritical pumped thermal electricity storage system. Energy Convers. Manag. 2023, 293, 117451.
- Cacciali, L.; Battisti, L.; Benini, E. Maximizing Efficiency in Compressed Air Energy Storage: Insights from Thermal Energy Integration and Optimization. Energies 2024, 17, 1552.
- Zhang, W.; He, Y.; Zhang, T.; Ying, C.; Kang, J. Intelligent resource adaptation for diversified service requirements in industrial IoT. IEEE Trans. Cogn. Commun. Netw. 2024.
- Dridi, A.; Afifi, H.; Moungla, H.; Badosa, J. A novel deep reinforcement approach for IIoT microgrid energy management systems. IEEE Trans. Green Commun. Netw. 2021, 6, 148–159.
- Dolatabadi, A.; Abdeltawab, H.; Mohamed, Y.A.R.I. A novel model-free deep reinforcement learning framework for energy management of a PV integrated energy hub. IEEE Trans. Power Syst. 2022, 38, 4840–4852.
- Chen, J.; Mi, J.; Guo, C.; Fu, Q.; Tang, W.; Luo, W.; Zhu, Q. Research on Offloading and Resource Allocation for MEC with Energy Harvesting Based on Deep Reinforcement Learning. Electronics 2025, 14, 1911.
- Yi, M.; Lin, M.; Chen, W. Network Function Placement in Virtualized Radio Access Network with Reinforcement Learning Based on Graph Neural Network. Electronics 2025, 14, 1686.
- Cicek, D.; Simsek, M.; Kantarci, B. Machine Learning-Driven Truck–Drone Collaborative Delivery for Time-and Energy-Efficient Last-Mile Deliveries. Electronics 2025, 14, 2026.
- Rahman, S.; Akter, S.; Yoon, S. A Deep Q-Learning Based UAV Detouring Algorithm in a Constrained Wireless Sensor Network Environment. Electronics 2024, 14, 1.
- Singh, S.; Ganorkar, A.M.; Anujhna, B. Enhancing Remote Oversight of Plants with a Compact IIoT System. In Proceedings of the 2023 IEEE International Conference on ICT in Business Industry & Government (ICTBIG), Indore, India, 8–9 December 2023; IEEE: Piscataway, NJ, USA, 2023; pp. 1–7.
- IEEE Std 802.15.4-2020; IEEE Standard for Low-Rate Wireless Networks. IEEE: New York, NY, USA, 2020.
Memory Component | Std. DDQN (10k Samples) | GM-DDQN |
---|---|---|
Experience/Gradient Storage | ∼480 KB (for transitions) | ∼4.5 KB (for gradients) |
Network Parameters (Quantized) | ∼21.3 KB (float32) | ∼0.56 KB (int8) |
Approx. Total On-Device | ∼501.3 KB | ∼5.06 KB |
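The storage figures in the table above can be cross-checked with a few lines of arithmetic. The sketch assumes a 5-dimensional state stored as float32 (state, next state, action, reward per transition) and int8 gradient entries; these assumptions are inferred because they reproduce the reported figures, not because the source states them, and the function names are hypothetical.

```python
def replay_buffer_bytes(n_transitions, state_dim, bytes_per_value=4):
    """Bytes for a replay buffer of (state, action, reward, next state) tuples."""
    values_per_transition = 2 * state_dim + 2
    return n_transitions * values_per_transition * bytes_per_value


def gradient_memory_bytes(m, n_params, bytes_per_value=1):
    """Bytes for m stored gradient vectors, one value per network parameter."""
    return m * n_params * bytes_per_value


replay = replay_buffer_bytes(10_000, state_dim=5)   # 480000 bytes
grad_mem = gradient_memory_bytes(8, 572)            # 4576 bytes
std_weights = 5324 * 4                               # float32 weights: 21296 bytes
gm_weights = 572 * 1                                 # int8 weights: 572 bytes
```

Under these assumptions the totals come to roughly 501 KB against 5 KB, a reduction of about two orders of magnitude, and only the second figure fits in the 32 KB of RAM available on the target MCU.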
Network Layer | Std. DDQN | GM-DDQN | Reduction |
---|---|---|---|
Input to Hidden 1 | 384 | 96 | 75.0% |
Hidden 1 to Hidden 2 | 4160 | 272 | 93.5% |
Hidden 2 to Output | 780 | 204 | 73.8% |
Total Parameters | 5324 | 572 | 89.3% |
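The totals and reduction percentages in the table above are consistent with fully connected layers that include bias terms. The sketch below reproduces them under the assumption of 5 inputs, 12 outputs, and hidden widths of 64 (standard DDQN, as stated for the baseline) and 16 (GM-DDQN); the input, output, and GM-DDQN hidden sizes are inferred from the reported totals rather than stated in the source.

```python
def mlp_param_counts(layer_sizes):
    """Weights plus biases for each fully connected layer."""
    return [n_in * n_out + n_out
            for n_in, n_out in zip(layer_sizes, layer_sizes[1:])]


def reduction_percent(before, after):
    """Relative reduction from `before` to `after`, in percent."""
    return 100.0 * (1.0 - after / before)


std = mlp_param_counts([5, 64, 64, 12])   # [384, 4160, 780]
gm = mlp_param_counts([5, 16, 16, 12])    # [96, 272, 204]
per_layer = [reduction_percent(s, g) for s, g in zip(std, gm)]
total = reduction_percent(sum(std), sum(gm))
```

The hidden-to-hidden layer dominates the standard network because its size grows with the square of the hidden width, which is why narrowing the hidden layers gives the largest single saving.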
Category | Parameter | Value |
---|---|---|
Environment | Baseline interference | −90 dBm |
Environment | Interference burst amplitude | −80 to −60 dBm |
Environment | Normal vibration level | 20 Hz |
Environment | Abnormal vibration level | 60–80 Hz |
Environment | Temperature range | 40 °C to 120 °C |
Network | Path loss exponent | 3.5 |
Network | Path loss at reference distance | −40 dBm |
Network | Temperature coefficient | 0.002 |
Network | Noise floor | −100 dBm |
Network | Data packet size (L) | 128 bytes |
IIoT Device | Battery capacity | 1000 mAh |
IIoT Device | Tx power levels (dBm) | |
IIoT Device | Sleep mode power (mW) | |
IIoT Device | Wake-up delay (ms) | |
IIoT Device | Processor frequency | 48 MHz |
Algorithm (GM-DDQN) | Learning rate | 0.001 |
Algorithm (GM-DDQN) | Discount factor | 0.95 |
Algorithm (GM-DDQN) | Target network update rate | 0.05 |
Algorithm (GM-DDQN) | Gradient memory size (m) | 8 |
Method | Scenario 1 | Scenario 2 | Scenario 3 |
---|---|---|---|
Fixed Policy (FP) | 8.2 | 9.1 | 7.5 |
Threshold-Based Adaptation (TBA) | 10.3 | 11.7 | 9.6 |
Q-FA | 11.8 | 12.5 | 10.2 |
GM-DDQN (Proposed) | 14.2 | 15.1 | 13.6 |
Standard DDQN | 14.8 | 15.7 | 14.1 |
DDPG | 15.0 | 15.9 | 14.3 |
PPO | 14.7 | 15.6 | 14.0 |
Method | ROM (KB) | RAM (KB) | Comp. Time (ms) | Energy Overhead (mJ/decision) |
---|---|---|---|---|
Fixed Policy (FP) | 0.5 | 0.1 | 0.02 | 0.001 |
Threshold-Based Adaptation (TBA) | 1.2 | 0.3 | 0.05 | 0.002 |
Q-FA | 2.3 | 1.2 | 0.8 | 0.04 |
GM-DDQN (Proposed) | 4.8 | 2.4 | 12 | 0.6 |
Standard DDQN | 42 | 520 | 85 | 4.2 |
DDPG | 78 | 620 | 120 | 6.0 |
PPO | 65 | 580 | 95 | 4.7 |
Method Variant | Energy Cons. (mJ/packet) | PSR (%) | Comp. Time (ms) |
---|---|---|---|
Full GM-DDQN | 0.28 | 96.5 | 12 |
Without Gradient Memory | 0.30 | 95.8 | 15 |
Without Quantization | 0.27 | 96.7 | 38 |
Single Hidden Layer | 0.32 | 94.2 | 8 |
Without Target Network | 0.35 | 92.3 | 10 |
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Gao, S.; Zou, Y.; Feng, L. A Lightweight Double-Deep Q-Network for Energy Efficiency Optimization of Industrial IoT Devices in Thermal Power Plants. Electronics 2025, 14, 2569. https://doi.org/10.3390/electronics14132569