Review

Reinforcement Learning for Industrial Automation: A Comprehensive Review of Adaptive Control and Decision-Making in Smart Factories

1 Department of Computer Science, Adrian College, Adrian, MI 49221, USA
2 Zekelman School of Business & IT, St. Clair College, Windsor, ON N9A 6S4, Canada
3 Computer Science Department, Faculty of Computers and Informatics, Zagazig University, Zagazig 44511, Egypt
4 Computer Science Department, College of Computer Science and Engineering, Taibah University, Madinah 42353, Saudi Arabia
* Author to whom correspondence should be addressed.
Machines 2025, 13(12), 1140; https://doi.org/10.3390/machines13121140
Submission received: 12 November 2025 / Revised: 9 December 2025 / Accepted: 10 December 2025 / Published: 15 December 2025

Abstract

The accelerating integration of Artificial Intelligence (AI) in Industrial Automation has established Reinforcement Learning (RL) as a transformative paradigm for adaptive control, intelligent optimization, and autonomous decision-making in smart factories. Despite the growing literature, existing reviews often emphasize algorithmic performance or domain-specific applications, neglecting broader links between methodological evolution, technological maturity, and industrial readiness. To address this gap, this study presents a bibliometric review mapping the development of RL and Deep Reinforcement Learning (DRL) research in Industrial Automation and robotics. Following the PRISMA 2020 protocol to guide the data collection procedures and inclusion criteria, 672 peer-reviewed journal articles published between 2017 and 2026 were retrieved from Scopus, ensuring high-quality, interdisciplinary coverage. Quantitative bibliometric analyses were conducted in R using Bibliometrix and Biblioshiny, including co-authorship, co-citation, keyword co-occurrence, and thematic network analyses, to reveal collaboration patterns, influential works, and emerging research trends. Results indicate that 42% of studies employed DRL, 27% focused on Multi-Agent RL (MARL), and 31% relied on classical RL, with applications concentrated in robotic control (33%), process optimization (28%), and predictive maintenance (19%). However, only 22% of the studies reported real-world or pilot implementations, highlighting persistent challenges in scalability, safety validation, interpretability, and deployment readiness. By integrating a review with bibliometric mapping, this study provides a comprehensive taxonomy and a strategic roadmap linking theoretical RL research with practical industrial applications. This roadmap is structured across four critical dimensions: (1) Algorithmic Development (e.g., safe, explainable, and data-efficient RL), (2) Integration Technologies (e.g., digital twins and IoT), (3) Validation Maturity (from simulation to real-world pilots), and (4) Human-Centricity (addressing trust, collaboration, and workforce transition). These insights can guide researchers, engineers, and policymakers in developing scalable, safe, and human-centric RL solutions, prioritizing research directions, and informing the implementation of Industry 5.0–aligned intelligent automation systems emphasizing transparency, sustainability, and operational resilience.

1. Introduction

Reinforcement Learning (RL), a subfield of Machine Learning (ML), has emerged as a transformative methodology for developing adaptive and autonomous decision-making systems within complex industrial environments. Unlike traditional supervised or unsupervised learning techniques that depend on static, pre-labeled datasets, RL enables an agent to interact dynamically with its environment, learning optimal strategies through trial and error to maximize long-term rewards [1]. This unique capability allows RL systems to operate effectively under conditions of uncertainty, nonlinearity, and dynamic variability, i.e., features that define modern industrial systems [2]. In the era of Industry 4.0 and the transition toward Industry 5.0, RL has become a cornerstone technology in the development of intelligent manufacturing systems, offering the potential to enhance productivity, flexibility, and resilience through autonomous control and real-time optimization [3,4].
While Industry 4.0 established the infrastructure for data-driven automation, the emerging paradigm of Industry 5.0 complements this by emphasizing human-centricity, sustainability, and operational resilience [5,6]. This shift places new demands on RL systems, moving beyond pure performance optimization. For RL to be truly industrially ready, it must now also address interpretability to foster human trust, ensure safe collaboration in human–robot teams, and optimize for energy efficiency and long-term system resilience. Consequently, challenges such as scalability and safety are not merely technical bottlenecks but fundamental barriers to achieving this human–AI collaborative vision.
Industrial Automation is undergoing a profound transformation driven by the integration of cyber-physical systems (CPS), the Internet of Things (IoT), cloud computing, and AI. In this context, RL serves as a vital bridge between perception and control, enabling machines to make context-aware decisions, adapt to changing environments, and optimize operational efficiency. Smart factories, relying on distributed sensor networks and interconnected robotic systems, increasingly adopt RL-based frameworks to support adaptive process control, energy management, quality inspection, and predictive maintenance. The ability of RL to learn from continuous interaction makes it particularly valuable in applications where explicit modeling is infeasible or cost-prohibitive due to complexity and stochastic system dynamics.
Classical RL algorithms, such as Q-Learning and SARSA (State–Action–Reward–State–Action, named for the sequence of events used to update the Q-value), represent the foundation of this paradigm and remain relevant for industrial tasks characterized by discrete state and action spaces. Their simplicity, interpretability, and low computational overhead make them well suited for process control, material transport, and rule-based decision systems [7,8]. However, classical approaches struggle with scalability in high-dimensional environments where the state-action space becomes exponentially large. To overcome these limitations, DRL integrates Neural Networks (NNs) with RL principles, enabling systems to approximate complex value functions and learn directly from raw, high-dimensional sensory data such as images, trajectories, or sensor streams [6,9,10].
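To make the classical update rules concrete, the following minimal Python sketch implements the Q-Learning and SARSA updates for a small discrete task; the grid size, hyperparameters, and exploration scheme are illustrative assumptions rather than values drawn from the reviewed studies.

import numpy as np

# Illustrative tabular Q-Learning/SARSA for a small discrete task
# (e.g., routing a material-transport vehicle on a 5x5 grid).
n_states, n_actions = 25, 4
Q = np.zeros((n_states, n_actions))
alpha, gamma, epsilon = 0.1, 0.95, 0.1  # learning rate, discount, exploration

def q_learning_step(s, a, r, s_next):
    """Off-policy TD update: Q(s,a) += alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))."""
    td_target = r + gamma * np.max(Q[s_next])
    Q[s, a] += alpha * (td_target - Q[s, a])

def sarsa_step(s, a, r, s_next, a_next):
    """On-policy SARSA update bootstraps from the action actually taken in s_next."""
    td_target = r + gamma * Q[s_next, a_next]
    Q[s, a] += alpha * (td_target - Q[s, a])

def epsilon_greedy(s):
    """Simple exploration policy used by both variants."""
    if np.random.rand() < epsilon:
        return np.random.randint(n_actions)
    return int(np.argmax(Q[s]))

The only difference between the two updates is the bootstrap term: Q-Learning bootstraps from the greedy action, whereas SARSA uses the action actually taken, which makes it more conservative under exploration.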
Notable DRL algorithms, such as the Deep Q-Network (DQN), Proximal Policy Optimization (PPO), and Soft Actor–Critic (SAC), have demonstrated superior performance in complex control environments. These algorithms enable autonomous robots to execute tasks like high-speed pick-and-place operations, robotic assembly, and adaptive motion planning, often outperforming conventional Proportional–Integral–Derivative (PID) or rule-based controllers [11,12,13,14]. Their robustness, stability, and capacity for continuous learning make them suitable for process optimization in manufacturing lines, energy systems, and logistics. In particular, PPO and SAC have been widely applied for energy-efficient process control and dynamic resource allocation, where continuous action spaces and environmental uncertainty demand adaptability and robustness.
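As an illustration of how DRL replaces the tabular value function with a learned approximator, the sketch below shows a single DQN-style temporal-difference update in PyTorch; the observation size, network widths, and optimizer settings are assumptions for illustration, and PPO and SAC differ in that they optimize a policy objective rather than this value-based loss.

import torch
import torch.nn as nn

# Minimal DQN update, assuming a vector observation (e.g., sensor readings)
# and a discrete action space; all sizes are illustrative assumptions.
obs_dim, n_actions, gamma = 12, 6, 0.99

q_net = nn.Sequential(nn.Linear(obs_dim, 64), nn.ReLU(), nn.Linear(64, n_actions))
target_net = nn.Sequential(nn.Linear(obs_dim, 64), nn.ReLU(), nn.Linear(64, n_actions))
target_net.load_state_dict(q_net.state_dict())  # periodically synced copy
optimizer = torch.optim.Adam(q_net.parameters(), lr=1e-3)

def dqn_update(obs, actions, rewards, next_obs, dones):
    """One gradient step on the TD loss for a sampled minibatch.
    obs/next_obs: float tensors (batch, obs_dim); actions: long tensor (batch,);
    rewards/dones: float tensors (batch,)."""
    q_sa = q_net(obs).gather(1, actions.unsqueeze(1)).squeeze(1)
    with torch.no_grad():
        # Target: r + gamma * max_a' Q_target(s', a'), zeroed at episode ends.
        q_next = target_net(next_obs).max(dim=1).values
        target = rewards + gamma * (1.0 - dones) * q_next
    loss = nn.functional.smooth_l1_loss(q_sa, target)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()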
MARL, which constitutes 27% of the reviewed studies, extends these principles to collaborative industrial systems. While enabling decentralized control in production lines and warehouses, MARL introduces specific challenges distinct from single-agent settings. A primary issue is environment non-stationarity, where the environment becomes unpredictable from any single agent’s perspective as all other agents are simultaneously learning, complicating convergence. Secondly, the credit assignment problem, determining each agent’s contribution to a shared global reward, is non-trivial and critical for effective coordination. Finally, designing scalable communication architectures that prevent bottlenecks is essential for large-scale deployments [15,16,17]. Promising solutions emerging in the literature include Centralized Training with Decentralized Execution (CTDE) paradigms and attention mechanisms to manage complex multi-agent interactions.
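The CTDE paradigm referenced above can be summarized structurally: actors act on local observations at execution time, while a centralized critic consumes joint information during training, which restores stationarity from the critic's perspective and eases credit assignment. The skeleton below is a minimal sketch of this split; the agent count and layer sizes are illustrative assumptions.

import torch
import torch.nn as nn

# Skeleton of Centralized Training with Decentralized Execution (CTDE).
n_agents, obs_dim, act_dim = 3, 8, 2

# Decentralized actors: one policy per agent, each seeing only its local observation.
actors = [
    nn.Sequential(nn.Linear(obs_dim, 32), nn.ReLU(), nn.Linear(32, act_dim), nn.Tanh())
    for _ in range(n_agents)
]

# Centralized critic: scores the joint observation-action, used only during training.
critic = nn.Sequential(
    nn.Linear(n_agents * (obs_dim + act_dim), 64), nn.ReLU(), nn.Linear(64, 1)
)

def decentralized_act(local_obs):
    """Execution phase: each agent acts from its own observation stream."""
    return [actor(obs) for actor, obs in zip(actors, local_obs)]

def centralized_value(all_obs, all_actions):
    """Training phase: the critic sees joint information, mitigating the
    non-stationarity each agent would otherwise face."""
    joint = torch.cat(all_obs + all_actions, dim=-1)
    return critic(joint)

# Example: a batch of one timestep for three agents.
obs = [torch.randn(1, obs_dim) for _ in range(n_agents)]
acts = decentralized_act(obs)
print(centralized_value(obs, acts).shape)  # torch.Size([1, 1])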
Beyond these mainstream approaches, recent developments have introduced specialized frameworks such as hierarchical and modular RL, safe RL, and digital twin–integrated RL, which further expand the operational scope of RL in Industrial Automation [18,19]. Hierarchical RL decomposes complex manufacturing tasks into manageable subtasks, thereby improving sample efficiency and interpretability. Modular RL architectures facilitate knowledge reuse across tasks, accelerating learning in environments where similar subtasks recur. Safe RL incorporates formal safety constraints into the learning process, ensuring that exploratory actions do not cause instability or physical damage, an essential requirement for high-risk applications such as chemical processing or autonomous machining. Meanwhile, digital twin integration enables real-time simulation and virtual experimentation, allowing policies to be tested and refined in digital replicas of industrial systems before physical deployment.
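One common safe RL mechanism is action masking (sometimes called shielding), in which exploratory actions predicted to violate a known constraint are filtered out before execution; in digital twin–integrated setups, the constraint model can be the twin itself. The sketch below illustrates the idea with a hypothetical spindle-speed limit; the constraint model and all numbers are invented for illustration.

import numpy as np

SPEED_LIMIT = 3000.0  # rpm, an assumed plant constraint

def predicted_speed(state, action):
    """Placeholder one-step constraint model; in practice this could come
    from a digital twin or a learned dynamics model."""
    return state["speed"] + 500.0 * action  # assumed linear response

def safe_greedy_action(q_values, state, actions=(-1, 0, 1)):
    """Pick the highest-value action whose predicted outcome stays feasible."""
    order = np.argsort(q_values)[::-1]  # best action first
    for idx in order:
        if predicted_speed(state, actions[idx]) <= SPEED_LIMIT:
            return actions[idx]
    return 0  # fall back to a known-safe 'hold' action

q_values = np.array([0.2, 0.5, 0.9])  # illustrative critic outputs
print(safe_greedy_action(q_values, {"speed": 2800.0}))  # masks the unsafe speed-up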
Aligning with the goals of Industry 5.0, a growing body of research focuses on Human-Centric RL, which prioritizes the integration of the human operator into the learning and operational loop. A primary challenge is the ‘black-box’ nature of DRL policies, which undermines operator trust and compliance [20]. To address this, Explainable Reinforcement Learning (XRL) methods are being developed to provide post hoc explanations or create intrinsically interpretable models, making the agent’s decision-making process transparent to human supervisors [17].
Furthermore, Human-in-the-Loop RL (HITL-RL) frameworks allow real-time human feedback to guide the learning process, improving sample efficiency and aligning outcomes with human intuition and ergonomic constraints [17]. This is particularly critical in applications like collaborative assembly, where the robot must adapt not only to the task but also to the human partner’s behavior, pace, and physical comfort, thereby preventing strain and injury. These approaches mark a significant shift from RL systems that operate autonomously to those that operate collaboratively, fostering the trust and transparency required for widespread adoption in human-populated industrial environments.
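A minimal way to express the HITL-RL idea is reward shaping, where intermittent operator feedback is blended into the learning signal; the blending weight below is an illustrative assumption, not a value from the cited frameworks.

HUMAN_WEIGHT = 0.3  # assumed trade-off between task reward and human feedback

def shaped_reward(task_reward, human_feedback=None):
    """human_feedback in [-1, 1] when the operator intervenes, else None."""
    if human_feedback is None:
        return task_reward
    return (1 - HUMAN_WEIGHT) * task_reward + HUMAN_WEIGHT * human_feedback

# Example: the task reward says the motion was efficient (+1.0), but the
# operator signals discomfort (-1.0); the shaped reward tempers the incentive.
print(shaped_reward(1.0, human_feedback=-1.0))  # 0.4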
Despite these promising advancements, significant barriers continue to hinder the widespread adoption of RL in industrial settings. Sample inefficiency remains one of the most critical challenges: many RL algorithms require millions of interactions to achieve convergence, which is impractical for physical industrial systems due to time, cost, and wear considerations. Safety and reliability are also paramount, as untested policies may cause accidents or equipment damage. Interpretability and transparency further limit adoption, as most DRL models act as ‘black boxes,’ offering limited insight into decision-making logic, a major concern for compliance and regulatory approval [8,21]. Moreover, the integration of RL into legacy infrastructures poses additional obstacles, as many factories operate with rigid control architectures and outdated communication protocols. Achieving seamless integration requires standardized interfaces, hybrid architectures, and interoperable software frameworks that can connect RL controllers to existing industrial systems.
While numerous studies have explored specific RL algorithms or applications within manufacturing and robotics, there remains a lack of comprehensive synthesis that connects algorithmic evolution with industrial readiness, integration technologies, and deployment challenges. Existing reviews often focus narrowly on algorithmic performance or domain-specific use cases without providing a holistic perspective that integrates methodological trends, industrial applicability, and future opportunities [5,9,22,23]. Accordingly, this study is guided by the following overarching research question:
“How has RL evolved as an enabling technology for Industrial Automation, and what strategies, technologies, and research trends are shaping its real-world deployment in smart factories?”
This question guides the analysis toward understanding both the theoretical and applied dimensions of RL in Industrial Automation, emphasizing methodological advancements, integration challenges, and emerging paradigms that bridge research and practice. Building upon this motivation, the objectives of this study are fourfold:
  • To review RL research in Industrial Automation, encompassing its algorithms, frameworks, and application domains.
  • To identify and categorize the major challenges hindering industrial adoption, including issues of safety, data efficiency, interpretability, and system integration.
  • To synthesize emerging strategies and frameworks such as digital twin-assisted RL, hierarchical and modular architectures, safe RL, and transfer learning techniques that address these limitations.
  • To highlight future research opportunities and industrial directions for RL within the broader context of Industry 4.0 and the transition toward Industry 5.0, focusing on human-centered, resilient, and sustainable automation systems [5,24,25].
The scope of this research encompasses peer-reviewed publications from major scientific databases that address RL and its applications within Industrial Automation contexts. These include process control, robotic manipulation, supply chain optimization, energy management, and predictive maintenance. The review excludes purely theoretical or non-industrial studies to maintain focus on practical relevance and deployment feasibility. Furthermore, the review examines both classical and advanced RL frameworks, ranging from value-based and policy-gradient methods to hybrid, hierarchical, and multi-agent architectures. This broad scope ensures that the analysis captures not only algorithmic developments but also their integration with enabling technologies such as digital twins, IoT systems, and edge computing platforms.
Taken together, this study contributes to the growing body of literature on intelligent industrial systems by offering an integrated perspective that connects RL theory, empirical application, and technological trend analysis. By emphasizing both methodological evolution and practical implications, it aims to inform future research and guide stakeholders—engineers, system designers, and policymakers—toward more efficient, interpretable, and human-centered AI-driven automation solutions. Ultimately, RL represents not only a computational paradigm but also a transformative force shaping the future of smart manufacturing, enabling industries to achieve higher levels of autonomy, adaptability, and operational excellence.
The remainder of this paper is structured as follows. The next section presents the literature review (Section 2), outlining the evolution of RL methodologies and their applications across diverse industrial domains. The methodology is explained in Section 3. Although the study is not a systematic review, the PRISMA framework is applied to structure the data collection and inclusion steps, ensuring clarity, transparency, and replicability. This section also details the modified Joanna Briggs Institute (JBI) methodological quality assessment, which was used to evaluate the rigor, transparency, and risk of bias of the included studies.
Next, Section 4 presents the results and discussion, focusing on key findings and emerging patterns derived from the analysis, with an emphasis on technological trends, industrial readiness, and strategic frameworks. Following this, Section 5 highlights future research directions and potential opportunities for advancing RL adoption in Industrial Automation, concentrating on scalability, safety, interpretability, and human–machine collaboration. Finally, the paper concludes by summarizing the main findings, implications, and contributions, underscoring the importance of RL as a cornerstone of intelligent industrial systems.

2. Literature Review

RL has emerged as a transformative paradigm in Industrial Automation, enabling systems to learn optimal control and decision policies through interactions with dynamic and often stochastic environments. Unlike conventional control strategies that depend on pre-defined rules or mathematical models, RL facilitates adaptive and intelligent decision-making, allowing systems to respond effectively to uncertainty, variability, and evolving production conditions.
As highlighted by Eckardt et al. [26], RL’s capability to model complex, stochastic systems makes it particularly suitable for modern smart factories. Likewise, Farooq and Iqbal [2] emphasize that RL promotes self-optimization and operational efficiency in manufacturing processes characterized by high complexity and interdependent tasks. In this context, RL supports the broader vision of Industry 4.0, where intelligent, data-driven autonomy forms the cornerstone of next-generation production systems.
The literature demonstrates that RL has been applied successfully across a wide range of industrial domains, including adaptive process control, robotics, and supply chain management. In adaptive process control, RL techniques have proven effective in managing nonlinear and uncertain systems such as press hardening and chemical production. Studies by Hou et al. [12], Paranjape et al. [27], and Rajasekhar et al. [14] show that both offline and model-based RL approaches can learn control policies from historical or simulated data, thereby improving process efficiency, product quality, and robustness. By continuously adjusting control parameters based on environmental feedback, RL achieves levels of precision and adaptability that traditional controllers, such as PID or Model Predictive Control (MPC), often cannot sustain under dynamic operating conditions.
In industrial robotics, RL has significantly advanced autonomous task learning and adaptation. Kober et al. [10] highlight RL’s ability to enable robots to perform complex operations such as assembly, material handling, and pick-and-place tasks. The integration of DRL architectures allows robots to process high-dimensional sensory inputs and develop sophisticated perception–action strategies. Transitioning from pre-programmed automation to RL-driven control has introduced more resilient and scalable robotic systems capable of error recovery and self-improvement without extensive human intervention, ultimately reducing programming effort and enhancing flexibility and efficiency [10].
A critical factor in deploying robotic systems is the economic and performance trade-off between different hardware configurations, which can be evaluated effectively through simulation prior to physical deployment. As demonstrated by Peta et al. [28] in a precision assembly case study, a direct comparison between single-arm and dual-arm collaborative robots revealed a significant performance trade-off: the dual-arm robot completed the assembly task 33% faster (14 s versus 21 s), while the single-arm variant was more energy-efficient. Crucially, an economic analysis showed that the higher initial cost of the dual-arm system was offset by its speed, achieving a return on investment within eight months. This study underscores the value of simulation tools for conducting multi-criteria evaluations, highlighting that optimal robot selection depends on a balance of speed, energy consumption, and long-term economic viability rather than a single performance metric.
Expanding to supply chain applications, RL has also been increasingly adopted in supply chain management and logistics. According to Esteso et al. [29] and Rolf et al. [22], RL algorithms enhance inventory control, production scheduling, and logistics routing by dynamically adapting to fluctuations in real-time demand and supply. Unlike traditional optimization approaches that rely on static assumptions, RL introduces continuous learning mechanisms that improve adaptability and responsiveness. This adaptability leads to better resource utilization, reduced operational costs, and higher service levels. Furthermore, RL’s strength in handling multi-objective optimization problems makes it particularly relevant for smart factories, where trade-offs among efficiency, sustainability, and cost are common.
Beyond large-scale industrial and supply chain settings, RL has also shown significant promise in optimizing resource-constrained IoT networks, which form a critical data-gathering layer for smart factories. For instance, Khan et al. [30] proposed a Cognitive IoT–based Dynamic Power Management (CIoT-DPM) scheme to enhance energy efficiency in consumer electronics by mitigating energy waste from redundant data; their model achieved a 20–25% improvement in learning strategy and reduced computational complexity by 79.33%. Building on this focus on network efficiency, a subsequent study by Khan et al. [31] applied RL to cognitive IoT sensors through an Adaptive Scheduling Control Reinforcement Learning mechanism (ASC-RL). This complementary approach used a multi-parameter state-changing policy to dynamically manage sensor activity, which consequently boosted energy efficiency and reliability by 35% and increased the transmission success rate by 6.25%. Together, these studies underscore RL’s significant potential in simultaneously addressing energy conservation and network performance challenges in the intelligent sensor networks that underpin industrial IoT systems.

2.1. Algorithmic Approaches and Comparative Analysis

To provide a consolidated overview, Table 1 summarizes key RL models, their industrial applications, advantages, and challenges as reported in the literature. Approaches such as Q-learning, DQN, PPO, and SAC have been widely explored, alongside specialized frameworks such as MARL, Offline RL, and Digital Twin–Integrated RL [1,2,8]. While each model offers unique benefits, ranging from scalability and robustness to safe exploration, common limitations include sample inefficiency, computational intensity, and the need for extensive tuning.
A cross-analysis of the studies in Table 1 reveals several key insights. First, there is a clear progression from discrete, low-dimensional algorithms such as Q-learning toward more advanced continuous-control and policy-gradient methods like PPO and SAC, reflecting the growing complexity of industrial applications. Second, integration-oriented approaches, particularly MARL, hierarchical RL, and Digital Twin–Integrated RL, highlight a shift toward decentralized, simulation-driven, and safety-aware frameworks suited to smart factories. Comparative studies suggest that PPO and SAC outperform DQN in continuous control tasks [10,27], particularly when environmental dynamics are highly stochastic, whereas DQN remains competitive for discrete action spaces with lower computational overhead. Finally, the persistence of challenges such as scalability, data scarcity, and hyperparameter sensitivity indicates that no single RL paradigm yet provides a universally practical solution for Industrial Automation. These comparative observations reinforce the need for hybrid, modular, and XRL approaches that can meet the operational demands of Industry 4.0 environments.
While Q-Learning and DQN have been validated in laboratory and small-scale industrial settings, approaches such as hierarchical RL and digital twin-integrated RL are predominantly explored in simulation, with limited deployment in operational factories. This highlights the gap between theoretical development and real-world applicability, emphasizing the need for scalable and safe RL implementations.
Despite advances in robotic automation, applications of RL in industrial energy management, adaptive supply chain optimization, and other complex, high-dimensional industrial processes remain underexplored, highlighting clear avenues for future research. The literature consistently demonstrates that RL has the potential to transform Industrial Automation by enabling adaptive control, intelligent decision-making, and continuous self-optimization. However, fully realizing this potential requires addressing several critical barriers, including data inefficiency, operational safety, model interpretability, and seamless integration with existing industrial infrastructure.
These challenges are particularly pronounced in real-world deployments. Sample inefficiency remains a core limitation, as most RL algorithms demand extensive interaction data to converge on optimal policies, making training in high-cost or high-risk industrial environments resource-intensive and time-consuming [8]. In addition, the black-box nature of DRL models constrains interpretability, which is essential for operator trust, regulatory compliance, and accountability in safety-critical settings [48]. Finally, the integration of RL into legacy systems poses significant technical hurdles, necessitating the development of middleware solutions, modular architectures, and hybrid deployment strategies to enable gradual adoption without disrupting established control frameworks [49]. Addressing these limitations has motivated emerging strategies such as digital twin integration, modular and hierarchical RL, and transfer learning, which will be explored in the subsequent section on Future Directions and Research Opportunities.
These observations underscore the need for targeted strategies that combine algorithmic innovation with system-level solutions. Approaches such as digital twin integration, hierarchical and modular RL, and safe RL frameworks offer promising pathways to overcome these barriers. The subsequent sections as well as the Future Directions and Research Opportunities section build upon these insights, outlining strategies for bridging the gap between theoretical RL advancements and practical, large-scale industrial implementation.

2.2. Research Dimensions and Synthesis

Building on the algorithmic overview in Table 1, Table 2 presents a comprehensive synthesis of key research dimensions in RL for Industrial Automation. It organizes findings from the reviewed literature by categorizing RL applications, core algorithms, integration technologies, and emerging strategies in smart manufacturing. The table illustrates the evolution of RL from classical, model-free approaches such as Q-Learning to advanced frameworks including DQN, PPO, SAC, and MARL. It also highlights the growing integration of RL with enabling technologies such as digital twins, CPS, and IoT, which collectively enhance autonomy, real-time optimization, and system resilience. In addition, the Key Trends & Insights column emphasizes the most significant developments, emerging approaches, and practical considerations, providing a clear view of the maturity, applicability, and future directions of RL in smart industrial systems.
As summarized in Table 2, RL in Industrial Automation has evolved from early single-agent, simulation-driven approaches to multi-agent and hierarchical frameworks capable of handling complex, distributed manufacturing systems. A major trend is the integration of RL with digital twins, CPS, and IoT infrastructures, enabling safe, efficient, and real-time decision-making while facilitating simulation-to-real transfer. Emerging research emphasizes safe, explainable, and hybrid RL methods to address challenges related to trust, interpretability, and regulatory compliance. Although most studies remain at the simulation or pilot scale, there is growing attention to sustainable manufacturing, energy efficiency, and modular system design. Collectively, these insights highlight a trajectory toward human-centered, data-efficient, and standardized RL frameworks that bridge theoretical advances with industrial-scale deployment, paving the way for Industry 5.0.
To provide a comprehensive overview of RL deployment in Industrial Automation, this research developed Table 3, which maps RL categories to representative algorithms, typical applications, integration technologies, advantages, limitations, deployment status, and emerging trends. The table summarizes key insights from the literature and bibliometric analyses, showing how classical RL continues to serve as a foundational benchmark, while DRL, multi-agent, modular, and safe/explainable frameworks are increasingly applied in pilot-scale or experimental industrial settings. This synthesis highlights that selecting an RL approach requires careful consideration of task complexity, operational constraints, agent coordination, and deployment objectives. By consolidating these elements, Table 3 serves as a practical guide for researchers and industrial practitioners seeking to implement scalable, safe, and efficient RL strategies in smart manufacturing environments.
Collectively, the literature indicates that RL has matured beyond algorithmic innovation toward industrial-ready, human-centered, and data-efficient frameworks. While foundational models remain critical for benchmarking, modern research emphasizes safe, explainable, modular, and hybrid approaches integrated with digital twins, CPS, and IoT. This evolution sets the stage for Industry 5.0, where adaptive, scalable, and ethically compliant RL frameworks bridge the gap between theoretical advances and real-world industrial deployment.

2.3. Positioning RL Against Classical Control and Optimization Methods

While this review focuses on RL’s role in Industrial Automation, it is crucial to contextualize its value proposition against established control paradigms. Traditional methods like PID control and MPC excel in well-defined deterministic environments but face limitations in complex, stochastic settings. Similarly, optimization techniques like genetic algorithms and particle swarm optimization offer global search capabilities but lack RL’s sequential decision-making and adaptive learning features.
The primary advantage of RL lies in its model-free adaptability and capacity for handling uncertainty and nonlinearity [1,2]. Unlike MPC, which requires accurate system models, or PID controllers, which need precise tuning for specific operating points, RL agents learn optimal policies through environmental interaction, enabling them to operate effectively under conditions of dynamic variability and partial observability. This capability is particularly valuable in complex manufacturing environments where explicit modeling is infeasible due to system complexity and stochastic dynamics [2,7].
However, RL exhibits significant disadvantages compared to classical approaches. Sample inefficiency remains a critical limitation, as most RL algorithms require extensive interaction data, often millions of steps, to achieve convergence, making training impractical in high-cost physical systems [7,21]. This contrasts sharply with PID or MPC, which can be deployed rapidly with minimal data. Furthermore, safety and reliability concerns persist, as untested RL policies may cause equipment damage or operational failures, whereas classical controllers offer proven stability and safety records [43,44]. The black-box nature of DRL models also limits interpretability, a major concern for regulatory compliance and operator trust compared to the transparent logic of traditional control systems [20,48].
As summarized in Table 4, RL is not a universal replacement but rather a complementary approach. Classical methods remain preferable for well-understood, deterministic processes requiring guaranteed stability and rapid deployment. RL excels in complex, dynamic environments where adaptability and autonomous optimization provide significant value, particularly when safe simulation or digital twin environments are available for training [46,47]. Future hybrid architectures that combine RL’s adaptability with the stability guarantees of classical control represent a promising direction for Industrial Automation [19,45].
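A minimal sketch of such a hybrid architecture, under the assumption that the RL agent issues bounded setpoint corrections while an inner PID loop retains its proven stability properties, is given below; the gains, bounds, and setpoint interface are illustrative.

class PID:
    """Textbook discrete PID loop providing the stable inner control layer."""
    def __init__(self, kp, ki, kd, setpoint):
        self.kp, self.ki, self.kd = kp, ki, kd
        self.setpoint, self.integral, self.prev_err = setpoint, 0.0, 0.0

    def control(self, measurement, dt=0.01):
        err = self.setpoint - measurement
        self.integral += err * dt
        deriv = (err - self.prev_err) / dt  # note: derivative kick on first step
        self.prev_err = err
        return self.kp * err + self.ki * self.integral + self.kd * deriv

pid = PID(kp=1.2, ki=0.5, kd=0.05, setpoint=100.0)

def rl_supervisory_step(pid_loop, rl_setpoint_delta):
    """The RL policy outputs a bounded setpoint correction at a slower
    timescale; the PID loop continues to guarantee baseline stability."""
    pid_loop.setpoint += max(-5.0, min(5.0, rl_setpoint_delta))

rl_supervisory_step(pid, 2.3)      # RL nudges the operating point
print(pid.control(97.0))           # inner loop tracks the adjusted setpoint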
The literature review highlighted the growing interest in applying RL and DRL within Industrial Automation and robotics, while also revealing the need for a structured and quantitative assessment of research developments in this area. To address this gap, the present study employs a bibliometric review methodology to identify, screen, and analyze the relevant literature in a systematic manner. The following section presents the methodology used in this study.

3. Methodology

To ensure both comprehensive coverage and methodological rigor, this review adopted a two-stage analytical process. First, the PRISMA 2020 protocol was adapted and applied to systematically identify, screen, and select relevant studies, ensuring a transparent, reproducible, and unbiased construction of the final dataset [50]. Second, a modified JBI critical appraisal checklist was applied to evaluate the methodological quality and risk of bias of the included studies, ensuring that the subsequent synthesis and interpretations were grounded in robust methodological evidence [51].
Complementing these systematic review procedures, the study also employed a bibliometric analysis using the open-source R package bibliometrix (version 4.5.1) and its biblioshiny web interface to quantitatively examine the RL and DRL literature within Industrial Automation and robotics. Bibliometric techniques provide insights into publication trends, influential authors and institutions, collaboration networks, and thematic evolution, thereby enriching and contextualizing the qualitative findings from the systematic review [52,53,54]. By integrating bibliometric mapping with qualitative synthesis, this review offers a comprehensive and multidimensional understanding of research developments, knowledge gaps, and emerging trajectories in the field.

3.1. Data Sources and Search Strategy

The bibliographic data were retrieved primarily from Scopus, one of the most comprehensive databases for engineering and computer science research [54]. Scopus was selected because it provides extensive coverage of peer-reviewed journals and offers detailed citation information, ensuring both the breadth and quality of the retrieved literature. Furthermore, a significant portion of the high-quality journals indexed by Scopus are also covered by other prestigious indexes like ISI Web of Science, reinforcing its standing as a source for established, high-impact literature. Compared with other databases such as Web of Science or IEEE Xplore, Scopus encompasses a wider interdisciplinary scope, which is particularly valuable for identifying RL research that intersects computer science, engineering, and industrial applications.
To capture relevant literature, a structured search strategy was implemented using controlled keywords related to RL applications in Industrial Automation, robotics, and human–robot collaboration. The query was designed to balance precision and comprehensiveness by combining the core terms “reinforcement learning” and “deep reinforcement learning” with application-specific keywords such as “robotics,” “autonomous systems,” “manufacturing automation,” “path planning,” “assembly,” and “human–robot collaboration.” This approach ensured the inclusion of studies that not only focused on the theoretical development of RL but also its applied dimensions in industrial and collaborative robotic contexts [55,56]. The advanced Scopus query employed in this study was:
( (TITLE-ABS-KEY("reinforcement learning") OR TITLE-ABS-KEY("deep reinforcement learning"))
  AND (TITLE-ABS-KEY("robotics") OR TITLE-ABS-KEY("autonomous systems") OR TITLE-ABS-KEY("manufacturing automation"))
  AND (TITLE-ABS-KEY("path planning") OR TITLE-ABS-KEY("assembly") OR TITLE-ABS-KEY("human-robot collaboration")) )
AND PUBYEAR > 2016 AND PUBYEAR < 2026
AND (LIMIT-TO(SUBJAREA,"COMP") OR LIMIT-TO(SUBJAREA,"ENGI") OR LIMIT-TO(SUBJAREA,"MATH") OR LIMIT-TO(SUBJAREA,"MATE") OR LIMIT-TO(SUBJAREA,"DECI"))
AND (LIMIT-TO(DOCTYPE,"ar") OR LIMIT-TO(DOCTYPE,"re"))
AND (LIMIT-TO(SRCTYPE,"j")) AND (LIMIT-TO(LANGUAGE,"English"))
The search was limited to English-language journal articles and reviews published between 2017 and 2026 within subject areas such as computer science, engineering, mathematics, materials science, and decision sciences. These parameters were chosen to ensure that the dataset focused on high-quality, peer-reviewed, and recent research outputs, capturing the most up-to-date developments in RL applications. The decision to exclude non-journal sources, such as conference proceedings or preprints, was made to maintain methodological rigor and ensure reliability of findings [57]. This search strategy enhanced the relevance, validity, and reproducibility of the review by systematically narrowing the scope to studies that reflect the state-of-the-art in RL-driven Industrial Automation and robotics while excluding less pertinent or non-scholarly materials.

3.2. Inclusion and Exclusion Criteria

The following criteria guided the selection process:
  • Inclusion criteria: (i) peer-reviewed journal articles and reviews; (ii) published between 2017 and 2026; (iii) explicitly addressing RL/DRL in robotics, autonomous systems, or manufacturing automation; (iv) containing at least one of the target keywords in title, abstract, or keywords;
  • Exclusion criteria: (i) conference proceedings, book chapters, editorials, or non-peer-reviewed sources; (ii) papers outside the scope of Industrial Automation or robotics; (iii) duplicates across databases.

3.3. Selection Process

The foundational step of this review drew on key principles from systematic literature review (SLR) methodology to ensure a transparent and well-structured dataset, as recommended in evidence-based research practices [58,59]. The initial Scopus query retrieved 1226 records. The selection process followed an adapted PRISMA framework, commonly used to enhance transparency and reproducibility in review studies [50,60], without constituting a full systematic review. The PRISMA-based flow diagram (Figure 1) outlines the identification, screening, and eligibility stages, ensuring that each inclusion and exclusion step is clearly documented (see Supplementary File and Table A1 in Appendix A).
After the initial search, 412 records (books, editorials, conference papers, and proceedings) were excluded, leaving 814 documents for screening. A second round of screening, based on titles, abstracts, and keywords, eliminated 65 additional records. To maintain focus and accessibility, 77 non-English articles were removed, resulting in 672 final publications retained for bibliometric and qualitative analysis.
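The screening funnel reduces to simple arithmetic, reproduced below as a quick consistency check of the counts reported above.

# Screening funnel, using the counts reported in the text.
identified = 1226
after_doc_type_filter = identified - 412   # books, editorials, conference items
after_title_abstract = after_doc_type_filter - 65
final_corpus = after_title_abstract - 77   # non-English articles removed
assert final_corpus == 672
print(after_doc_type_filter, after_title_abstract, final_corpus)  # 814 749 672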
The application of PRISMA enhances this study by:
  • Ensuring methodological rigor—all decisions for inclusion/exclusion are systematic rather than arbitrary [61];
  • Enhancing transparency—readers can trace the exact filtering path from 1226 initial results to 672 final articles [50];
  • Improving reproducibility—future researchers can replicate or extend the review following the same protocol [62].

3.4. Data Extraction and Preparation

Bibliographic metadata, including titles, abstracts, authors, affiliations, keywords, citations, and references, were exported from Scopus in CSV and plain text/BibTeX formats. The datasets were merged, and duplicates were removed primarily via DOI-based matching, with fuzzy matching on titles, first author names, and year applied when DOIs were missing. Author names, institutional affiliations, and keywords were standardized to ensure consistency across the dataset.
Given the Scopus download limit of 200 records per batch, the full dataset of 672 records was obtained in four separate batches, merged into a single BibTeX file, and imported into R Biblioshiny for bibliometric analysis.
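The following Python sketch illustrates the batch-merge and deduplication workflow just described, under the assumption that the four exports were saved as CSV files; the file pattern and column names ("DOI", "Title", "Authors", "Year") follow typical Scopus CSV exports but should be adapted to the actual download format.

import glob
import pandas as pd

# Merge the four Scopus export batches into one table.
batches = [pd.read_csv(path) for path in sorted(glob.glob("scopus_batch_*.csv"))]
records = pd.concat(batches, ignore_index=True)
records["doi_norm"] = records["DOI"].str.lower().str.strip()

# Primary deduplication: exact DOI match among records that have a DOI.
has_doi = records["doi_norm"].notna()
by_doi = records[has_doi].drop_duplicates(subset="doi_norm", keep="first")

# Fallback for missing DOIs: approximate key from title, first author, and year.
no_doi = records[~has_doi].copy()
no_doi["fuzzy_key"] = (
    no_doi["Title"].str.lower().str.replace(r"\W+", " ", regex=True).str.strip()
    + "|" + no_doi["Authors"].str.split(",").str[0].str.lower().str.strip()
    + "|" + no_doi["Year"].astype(str)
)
no_doi = no_doi.drop_duplicates(subset="fuzzy_key", keep="first")

merged = pd.concat([by_doi, no_doi.drop(columns="fuzzy_key")], ignore_index=True)
print(len(merged), "unique records retained")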

3.5. Bibliometric Analysis Tools

To systematically evaluate the literature on RL in Industrial Automation, bibliometric methods were applied [58]. Data analysis was conducted using the R package Bibliometrix (version 4.5.1) and its web interface Biblioshiny, which provided descriptive statistics (e.g., annual scientific production, leading journals, authors, and countries), citation analysis (e.g., most cited documents, h-index), and network analyses (e.g., co-authorship, co-citation, bibliographic coupling). Several studies have successfully applied Bibliometrix in related domains, such as del Real Torres et al. [5], Samal [63], and Ouhadi et al. [64], demonstrating its reliability and versatility.
All bibliometric analyses and visualizations were conducted within the Biblioshiny environment. High-resolution bibliographic maps were generated directly within it to visualize keyword co-occurrence networks and thematic clusters, which were instrumental in delineating the conceptual structure of the research field. To ensure the robustness of the findings, multiple keyword occurrence thresholds were tested during the analysis.
Using bibliometric analysis in this research is especially useful for three reasons. First, RL in Industrial Automation is highly interdisciplinary, spanning computer science, control engineering, and operations research; bibliometric mapping helps reveal overarching patterns that narrative reviews might overlook. Second, it provides an objective, evidence-based way to track research communities, influential works, and emerging paradigms such as safe RL, digital twin integration, and multi-agent systems. Third, by combining quantitative mapping with qualitative synthesis, this approach improves transparency and reproducibility, ensuring insights are grounded in systematically processed data.
The bibliometric results were combined with qualitative content analysis of highly cited and thematically central papers to provide interpretive depth. Clusters derived from co-occurrence and co-citation networks guided the structuring of the literature review into thematic areas, including RL for motion planning, RL for assembly, and RL for human–robot collaboration. Research gaps and opportunities were identified by examining low-density areas in thematic maps and emerging keywords, providing a comprehensive understanding of trends, challenges, and future directions in RL applications for Industrial Automation.
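Although the analyses in this study were performed with Bibliometrix and Biblioshiny in R, the core co-occurrence computation is straightforward to replicate; the Python sketch below shows how pairwise keyword counts become a weighted network whose communities correspond to thematic clusters. The example records are invented for illustration.

from collections import Counter
from itertools import combinations

import networkx as nx
from networkx.algorithms.community import greedy_modularity_communities

# Keyword co-occurrence: every pair of keywords on the same article adds
# weight to one edge. These records are invented examples.
article_keywords = [
    {"reinforcement learning", "motion planning", "industrial robots"},
    {"deep reinforcement learning", "motion planning", "path planning"},
    {"reinforcement learning", "multi-agent systems", "industrial robots"},
]

edge_weights = Counter()
for keywords in article_keywords:
    for a, b in combinations(sorted(keywords), 2):
        edge_weights[(a, b)] += 1

G = nx.Graph()
for (a, b), weight in edge_weights.items():
    if weight >= 1:  # occurrence threshold; several thresholds were tested
        G.add_edge(a, b, weight=weight)

# Modularity-based communities play the role of the thematic clusters.
clusters = greedy_modularity_communities(G, weight="weight")
print([sorted(c) for c in clusters])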

3.6. Methodological Quality Assessment

To complement the bibliometric analysis and ensure the analytical rigor of our review, a methodological quality assessment was performed on the final corpus of publications. We employed a modified version of the JBI critical appraisal checklist [51]. Each study was evaluated based on three key criteria:
  • Clarity of Research Aims: Were the objectives and research questions explicitly defined?
  • Methodological Rigor: Was the RL methodology (algorithm, state/action space, reward function) described with sufficient detail for replication?
  • Transparency of Results: Were the outcomes, including limitations and deployment challenges, clearly reported?
Each criterion was scored as ‘1’ if fully met, ‘0.5’ if partially met, and ‘0’ if not met or not reported. Refer to Table A1, Appendix A, where: ✓ = Criterion fully met; △ = Criterion partially met; X = Criterion not met. The studies achieving a total score of 3 were classified as ‘high quality.’ The overall quality index (Q) of the corpus was calculated as the percentage of high-quality studies using the following equation:
Q (%) = (N_high-quality / N_total) × 100
where N_high-quality is the number of studies scoring 3, and N_total is the total number of included studies.
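A small Python sketch of this computation, using invented placeholder scores rather than values from Table A1, is given below.

# Corpus quality index Q from per-study criterion scores, following the
# scoring rule above (1 / 0.5 / 0 per criterion, maximum total of 3).
study_scores = {
    "Study A": [1, 1, 1],    # fully met all three criteria -> high quality
    "Study B": [1, 0.5, 1],
    "Study C": [1, 1, 0.5],
}

n_high = sum(1 for scores in study_scores.values() if sum(scores) == 3)
q_index = 100.0 * n_high / len(study_scores)
print(f"Q = {q_index:.1f}%")  # 33.3% for this toy corpus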
The results of this assessment are synthesized in Section 4, providing critical context for the bibliometric findings and ensuring that our review is grounded in a qualitatively evaluated evidence base.

4. Results and Discussion

The methodological quality assessment of the 90 included studies revealed a high overall standard, with 75.6% (68/90) of studies meeting all three quality criteria. This can be observed from Table A1, Appendix A, where none of the studies scored below 0.5 on any of the three criteria. This provides a robust and reliable foundation for the subsequent bibliometric and thematic analyses conducted in this review.
RL has emerged as a pivotal tool in Industrial Automation, enabling adaptive decision-making across robotics, process control, supply chain management, and predictive maintenance. Its defining characteristic—learning optimal policies through interaction with dynamic environments—allows industrial systems to respond effectively to uncertainty and evolving operational conditions [2,23].
Comparative analyses of RL models reveal distinct patterns of industrial applicability. Classical RL, particularly Q-Learning, remains widely used for discrete control tasks due to its simplicity and low computational requirements [6]. However, in high-dimensional or continuous environments, its scalability and performance are limited.
DRL models, such as DQN, PPO, and SAC, extend classical RL by incorporating NNs, enabling robots to process high-dimensional sensory data for tasks like pick-and-place operations, assembly, and process optimization. These algorithms demonstrate robustness, stability, and suitability for complex, continuous action spaces [8,33].
MARL supports decentralized decision-making for collaborative robotics and distributed production lines, addressing coordination and scalability challenges [17]. Hierarchical and modular architectures improve task decomposition and knowledge transfer, enhancing learning efficiency in complex multi-stage processes [64]. Safe and XRL frameworks ensure operational reliability in safety-critical contexts, while digital twin integration allows virtual experimentation to reduce deployment risks and enable real-time optimization [8,33,36].
Table 2 and Table 3 provide complementary insights into RL evolution and deployment. Table 2 tracks the trajectory from early single-agent, simulation-driven methods to hybrid, human-centered frameworks, illustrating the historical and conceptual development of RL in Industrial Automation. Table 3, developed by the researchers, maps RL categories to algorithms, typical applications, integration technologies, advantages, limitations, deployment status, and emerging trends. This table serves as a practical guide for industrial adoption, showing how classical RL remains foundational for benchmarking, while DRL, multi-agent, modular, safe, and hybrid frameworks are increasingly deployed in pilot-scale or experimental industrial settings. Together, these tables show that algorithm selection must align with task complexity, operational constraints, and deployment objectives, and they map the trajectory of RL research from single-agent methods toward the human-centric, multi-agent frameworks required by the complex, collaborative environments envisioned by Industry 5.0.
A comparative assessment of RL approaches highlights a clear clustering based on application suitability, complexity, and deployment readiness (distilled into the illustrative selection sketch after this list):
  • Discrete vs. Continuous Control: Q-Learning excels in low-dimensional, discrete processes, whereas DQN, PPO, and SAC are preferable for high-dimensional, continuous tasks due to their NN-based approximations [8];
  • Single-Agent vs. Multi-Agent Systems: Traditional RL and DQN are suitable for single-agent tasks; MARL is essential when multiple agents must coordinate in a shared environment, such as collaborative robotics or multi-stage production lines [17];
  • Safety-Critical Environments: Safe RL and digital twin-integrated approaches are indispensable in contexts where operational failures could have severe financial or safety consequences, offering a balance between exploration and constrained policy execution [36];
  • Transfer and Modular Learning: Hierarchical and modular RL architectures outperform classical RL in multi-task or variable production settings by enabling knowledge transfer and task decomposition, thus improving scalability and efficiency [65,66].
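The following Python sketch condenses this clustering into a heuristic decision rule; the mapping reflects our reading of the reviewed literature and is illustrative rather than prescriptive.

# A condensed decision rule distilled from the clustering above.
def suggest_rl_family(continuous_actions, multi_agent, safety_critical, multi_task):
    """Map coarse task properties to the RL family most often favored in the
    reviewed literature; a heuristic guide, not a prescriptive tool."""
    if safety_critical:
        return "Safe RL (constrained policies, ideally trained in a digital twin)"
    if multi_agent:
        return "MARL with CTDE (centralized critic, decentralized actors)"
    if multi_task:
        return "Hierarchical / modular RL (task decomposition, knowledge transfer)"
    if continuous_actions:
        return "Policy-gradient DRL such as PPO or SAC"
    return "Q-Learning or DQN (discrete, low-dimensional control)"

# Example: a collaborative, safety-critical assembly cell.
print(suggest_rl_family(continuous_actions=True, multi_agent=True,
                        safety_critical=True, multi_task=False))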
Overall, DRL algorithms dominate robotic automation research due to their capacity to handle high-dimensional inputs, while model-based, offline, or modular approaches are favored in process control and predictive maintenance for safety and sample efficiency [2,27]. Algorithm selection must align with task complexity, agent coordination, and operational safety. Modular and hierarchical architectures facilitate knowledge transfer across multi-stage processes, and safe RL frameworks, often integrated with digital twins, ensure reliable operation in critical contexts.
A deeper analysis of the literature reveals the specific dimensions of key challenges like scalability and interpretability. The scalability challenge manifests not just in computational cost, but in the “curse of dimensionality” for high-dimensional state-action spaces and the complexity of multi-agent coordination. In response, emerging trends focus on hierarchical RL to decompose problems, model-based RL to improve sample efficiency, and modular architectures that enable knowledge transfer across tasks, thereby reducing the learning burden for new scenarios.
This focus on scalable coordination is powerfully exemplified in the MARL sector, which constitutes 27% of the reviewed studies. A deeper analysis of this sector reveals distinct thematic trends and a focus on specific industrial problems. The bibliometric data and literature synthesis indicate that a significant portion of MARL research is dedicated to robotic swarm coordination in warehouse logistics and dynamic task allocation in flexible assembly lines. The primary “issue” dominating this subfield, as identified in the literature, remains the non-stationarity problem and the need for robust credit assignment mechanisms. In response, a clear “trend” has emerged: the CTDE paradigm is the predominant architectural framework, cited in over 60% of the MARL studies reviewed, as it effectively mitigates the non-stationarity problem during training while allowing for scalable decentralized operation. This is often combined with attention mechanisms to manage communication in large-scale agent systems, pointing to a consistent research direction focused on scalability and decentralized coordination.
Similarly, the interpretability challenge, a major barrier to trust and regulatory approval, stems from the inherent “black-box” nature of deep NNs used in DRL. The bibliometric analysis and literature synthesis confirm a rising trend in XRL, which aims to make agent decisions transparent. This includes methods for post hoc explanation of policy decisions and the development of intrinsically interpretable models that prioritize simplicity and clarity, ensuring that human operators can understand, trust, and effectively collaborate with RL systems in safety-critical industrial settings.

Research Trends and Insights from Bibliometric Analyses

Emerging strategies address core challenges in industrial RL. Digital twin-integrated RL enables safe virtual experimentation and pilot deployment, while transfer learning accelerates policy adaptation across similar tasks, enhancing sample efficiency [67]. Hierarchical and modular architectures improve knowledge reuse, task decomposition, and interpretability [68]. Safe and XRL frameworks ensure compliance with operational and regulatory constraints [20,69,70].
A synthesis of the literature highlights gaps in real-world validation, standardization protocols, and ethical frameworks. Many studies remain confined to simulation or laboratory settings, emphasizing the need for pilot deployments, cross-industry benchmarking, and regulatory alignment. Future research should focus on developing scalable, interpretable, and transferable RL frameworks that balance operational efficiency, safety, and adaptability in complex industrial contexts.
Bibliometric analyses provide further insights. The co-occurrence network in Figure 2 reveals three distinct yet interconnected thematic clusters: a methodological core (green), robotics applications (blue), and multi-agent systems (red), reflecting RL’s evolution in Industry 4.0. This structure visually confirms that algorithmic research is tightly coupled with applied domains like industrial robotics and collaborative systems.
Figure 3, illustrating the temporal evolution of article production in RL for Industrial Automation by country, strongly validates the urgent need for this review by demonstrating the field’s explosive growth and its pronounced geographic stratification.
These findings align with [71,72], which highlight China as the unequivocal Tier 1 global leader in RL research. The data clearly establishes China’s dominant position, with an article output projected to reach approximately 700 by 2026, underscoring the country’s central role in driving methodological innovations and applied developments in the field [73].
This dramatic, near-exponential acceleration after 2019 suggests an enormous, centrally driven national investment, which is consistent with external reports indicating China’s surge in industrial robotics adoption and its increasing robot density (e.g., reaching 470 robots per 10,000 employees in 2023, surpassing Germany and Japan). This high level of physical automation provides the ideal real-world platform and data-rich environment necessary for developing and testing complex RL policies in smart factories. The dominance of China in application-focused research, as seen in Figure 3, suggests that real-world deployment scales with the availability of physical industrial infrastructure, a key factor in bridging the sim-to-real gap.
Conversely, the USA (Tier 2), while a robust second contributor (projected around 140–150 articles), shows a less aggressive, more linear growth pattern. This structure implies that while the theoretical and core algorithmic advances may still be heavily influenced by Western research (consistent with North America’s lead in the broader AI market), the scale and volume of practical industrial application research is being overwhelmingly driven by Asia-Pacific economies, particularly China. This disparity underscores a central challenge of RL deployment: the sim-to-real gap and the struggle for sample efficiency; the countries with the highest deployment of physical systems (China, Republic of Korea, which has the world’s highest robot density) are logically generating the most application-focused research [71,74].
Therefore, the comprehensive synthesis proposed in this review must prioritize examining the methodologies emanating from these high-volume research regions, particularly focusing on how they address safety, robustness, and integration challenges within highly automated, large-scale industrial systems. The structure of the research landscape shown in Figure 3 dictates that the global future of industrial RL will be shaped by the success of these massive, national-level automation initiatives.
Figure 2 and Figure 3 visually reinforce the findings from Table 2 and Table 3. Within the co-occurrence network (Figure 2), the methodological core (green) encompasses RL, DRL, motion planning, and path planning; the robotics cluster (blue) centers on industrial robots and manipulators; and the multi-agent/autonomous-systems cluster (red) emphasizes UAVs and distributed applications. The temporal evolution of publications by country (Figure 3) confirms the geographic stratification discussed above: practical, application-focused research is largely concentrated in Asia-Pacific regions, reflecting their extensive automation infrastructures and data-rich environments.
The quantitative trends revealed by the bibliometric analysis are substantiated by concrete case studies that demonstrate RL’s practical impact across different layers of the Industrial Automation stack. For instance, at the physical robotics layer, the work of Peta et al. [28] provides a critical performance and economic validation, showing how RL-driven systems enable sophisticated trade-off analyses, such as a 33% speed increase with a dual-arm robot versus the energy efficiency of a single-arm system, directly informing high-value deployment decisions. Simultaneously, at the foundational data acquisition layer, studies by Khan et al. [30,31] demonstrate RL’s role in optimizing the underlying IoT infrastructure, achieving metrics like a 35% boost in energy efficiency and a 6.25% higher transmission success rate for sensor networks. These case-based results confirm that the heavy research focus on “robotics,” “motion planning,” and “deep learning” identified in the bibliometric maps (Figure 2) is yielding tangible advancements. They illustrate a dual focus: on high-level, physically embodied intelligence for complex tasks, and on low-level, resource-aware intelligence for sustaining the pervasive sensing required in smart factories.
The thematic flow in Figure 4 quantitatively substantiates the strategic branching of research, showing a strong transition from core RL theory towards high-level decision-making and robot programming, confirming the field’s shift toward autonomous intelligence. The internal flow from RL (2017–2024) to itself (RL, 2025–2026) is the most frequent connection, accounting for 428 occurrences with a high Weighted Inclusion Index of 0.59. These findings align with the results reported in [71,74].
This indicates a persistent, high-volume effort dedicated to refining the core theory, particularly concerning “reinforcement learnings” and “deep learning,” which are essential for addressing the sample inefficiency challenges noted in the review. Crucially, the strongest practical transitions are seen in two primary application domains: the flow into decision-making (90 occurrences) and robot programming (159 occurrences), both exhibiting high Weighted Inclusion Indices of 0.71. This metric confirms that the existing RL knowledge is highly relevant and directly transferable to these high-level robotic tasks (keywords: “robotics, machine learning, decision-making, intelligent robots”), solidifying the review’s argument that RL is enabling the shift from human-coded control to autonomous, adaptive intelligence. A parallel specialization is confirmed in the motion planning research stream, which shows its most substantial flow toward autonomous vehicles (62 occurrences) with the highest Weighted Inclusion Index of 0.70. This quantitative split confirms a strategic focus: while RL refines its own algorithms, the established control expertise (motion planning) is being systematically applied to mobile platforms (keywords: “autonomous vehicles, autonomous driving”), highlighting the industry’s strategic investment in flexible, logistics-based automation.
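To make the flow statistics concrete, the helper below sketches one common weighted formulation of the inclusion index, in which keywords shared between two time slices are weighted by their occurrence counts. The function name and toy data are illustrative, and the exact normalization used by Biblioshiny’s thematic-evolution module may differ.

```python
def weighted_inclusion_index(freq_a: dict, freq_b: dict) -> float:
    """One common weighted inclusion index between two keyword sets.

    freq_a and freq_b map keywords to occurrence counts in two time slices.
    """
    shared = set(freq_a) & set(freq_b)
    overlap = sum(min(freq_a[k], freq_b[k]) for k in shared)
    return overlap / min(sum(freq_a.values()), sum(freq_b.values()))

# Toy example: a later theme that largely absorbs an earlier one scores near 1.
earlier = {"reinforcement learning": 50, "q-learning": 10}
later = {"reinforcement learning": 45, "deep learning": 30, "q-learning": 5}
print(round(weighted_inclusion_index(earlier, later), 2))  # 0.83
```

An index near 0.7, as reported for the decision-making and robot-programming flows, therefore signals that most of the earlier theme’s keyword mass carries over into the later one.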
Table 5 provides a critical mapping of the global distribution of academic efforts and resources, revealing that the ten largest research areas account for over 55% of the entire represented knowledge space. This concentration indicates that the current research agenda is predominantly centered on DRL as the core technology for achieving physical autonomy in industrial robotics, consistent with the findings of Rajamallaiah et al. [75] and Gautam [76], who both emphasized the pivotal role of DRL in advancing control, resilience, and autonomy within power and energy systems. The primary research investment is clearly concentrated in the RL Core, with RL (594 occurrences, 14%) and its variants (358 occurrences, 8%) dominating the landscape. This dominance underscores the community’s focus on policy optimization and foundational RL theory. The research effort cascades into areas concerned with managing complexity and physical movement, with motion planning (339 occurrences, 8%) and DL (233 occurrences, 5%) reflecting the strategic shift toward DRL architectures capable of handling continuous state and action spaces. This trajectory effectively reduces human-intensive efforts in robot programming (219 occurrences, 5%) by replacing static control laws with adaptive, self-learned policies for tasks such as path planning (179 occurrences, 4%). The remaining high-frequency terms—learning algorithms, robotics, learning systems, and intelligent robots—confirm that the collective research focus is on developing adaptable, intelligent robotic entities, where the human role transitions from programmer to system architect and safety supervisor, ensuring the successful orchestration of fully autonomous systems.
Figure 5, which tracks the cumulative frequency of key research terms from 2017 to 2026, precisely delineates the algorithmic and application focus driving the explosive publication growth observed in this domain. The data underscores that the primary scientific commitment in RL for Industrial Automation is not merely theoretical, but centers on the deep integration of advanced RL techniques with core robotic operations.
According to Li et al. [77], RL has rapidly emerged as a key approach for smart transportation, enabling autonomous decision-making in complex, dynamic systems; their bibliometric and literature analysis traces the field’s development, leading topics, application domains, and future research directions over the past decade. Similarly, in the current study on smart factories, the term RL emerges as the overwhelmingly foundational keyword, reaching nearly 600 cumulative occurrences by 2026. This parallel dominance confirms that RL serves as an indispensable methodological framework for adaptive decision-making, whether in transportation systems or Industrial Automation, highlighting its central role across diverse complex, dynamic domains.
Immediately succeeding RL is DRL, which reaches approximately 350 occurrences, demonstrating that researchers are fundamentally concerned with applying deep learning architectures (the keyword “deep learning” follows closely in frequency) to handle the complexity and high dimensionality of real-world industrial state-action spaces. The high prominence of DRL validates the review’s emphasis on algorithms such as DQN, PPO, and SAC, which are essential for moving beyond simple discrete control tasks and into continuous, nonlinear processes.
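To ground this algorithmic emphasis, the snippet below sketches the clipped surrogate objective that defines PPO, as introduced by Schulman et al. [11], the mechanism that keeps policy updates stable in continuous control tasks. It is a minimal PyTorch illustration with an invented toy batch, not code from any reviewed study.

```python
import torch

def ppo_clip_loss(logp_new: torch.Tensor, logp_old: torch.Tensor,
                  advantages: torch.Tensor, clip_eps: float = 0.2) -> torch.Tensor:
    """Clipped surrogate loss of PPO (negated for gradient descent)."""
    ratio = torch.exp(logp_new - logp_old)            # pi_theta / pi_theta_old
    unclipped = ratio * advantages
    clipped = torch.clamp(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantages
    return -torch.min(unclipped, clipped).mean()      # maximize the pessimistic bound

# Toy batch: positive advantages favor raising the new policy's log-probability.
loss = ppo_clip_loss(torch.tensor([-0.9, -1.2]), torch.tensor([-1.0, -1.0]),
                     torch.tensor([0.5, -0.3]))
print(loss)
```

The clipping term is precisely what makes PPO robust to hyperparameter changes, the property credited to it in Table 1.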
Crucially, the next tier of highly frequent terms defines the critical application sector: robotics and its functional components, motion planning and path planning. The accelerating co-occurrence of ‘robotics’ with ‘motion planning’ and ‘deep learning’ in Figure 5 demonstrates that the research surge is channeled toward enabling autonomous, complex manipulation, directly connecting abstract algorithms to tangible industrial hardware.
In contrast, the slower growth of foundational terms such as Learning Systems and Learning Algorithms suggests that the community’s attention has matured beyond generic ML principles. The current focus is specialized and applied, targeting the specific DRL mechanisms needed to optimize the physical operations (robotics and planning) of smart factory infrastructure. This convergence of DRL and high-level robotic control establishes the core research front that this review is obligated to synthesize.
The chronological analysis in Table 6 further delineates the evolution of RL research toward autonomy and its human implications. The field initially focused on low-volume, foundational work, exemplified by Markov Decision Processes (Frequency 14; Median 2020). Research then rapidly pivoted to scale this complexity, with DL in robotics and Automation (Frequency 8; Median 2020) reflecting early efforts to reduce reliance on human-defined features and expert engineering, transferring the burden of optimization to learning algorithms.
The most recent, high-volume research phase (Median 2024–2025) targets the most complex, human-like abilities in physical systems, including Robot Learning (Median 2025) and motion planning (Median 2025). This trajectory confirms a transition beyond simple automation toward adaptive autonomy, where robots adjust to dynamic environments with minimal or no human intervention.
Crucially, the emergence of newer, lower-frequency terms such as Adversarial ML (Frequency 35; Median 2025) underscores the final, critical step in the human–machine collaboration equation. As noted by Khurram et al. [78], the adoption of autonomous RL in high-risk manufacturing environments introduces significant concerns regarding safety, robustness, and interpretability—factors that inherently influence human trust and regulatory approval. Similarly, the growing emphasis on adversarial ML, as highlighted by Hagendorff [79] and Ilahi et al. [80], demonstrates that researchers are actively confronting these ethical and safety constraints. They recognize that for RL to gain widespread acceptance and fully integrate into industrial operations, the opaque and complex policies learned by these algorithms must be demonstrably safe, interpretable, and reliable enough to operate alongside, or even replace, human workers.
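To illustrate why adversarial ML matters for deployed policies, the sketch below applies a fast-gradient-sign (FGSM-style) perturbation to an observation fed to a toy Q-network: a sensor-level change that is tiny in magnitude can nonetheless lower the value estimate of the greedy action. The network, its random weights, and the magnitudes here are illustrative assumptions only, in the spirit of the attacks surveyed by Ilahi et al. [80].

```python
import torch
import torch.nn as nn

# Toy value network standing in for a trained DQN (weights are random here).
q_net = nn.Sequential(nn.Linear(4, 32), nn.ReLU(), nn.Linear(32, 2))

def fgsm_perturb(obs: torch.Tensor, epsilon: float = 0.01) -> torch.Tensor:
    """Perturb an observation to decrease the greedy action's Q-value."""
    obs = obs.clone().requires_grad_(True)
    greedy_value = q_net(obs).max()
    greedy_value.backward()
    # Step against the gradient of the greedy value: a worst-case sensor nudge.
    return (obs - epsilon * obs.grad.sign()).detach()

obs = torch.tensor([0.1, -0.2, 0.05, 0.3])
adv_obs = fgsm_perturb(obs)
print(q_net(obs).max().item(), q_net(adv_obs).max().item())
```

Robustness research in this stream asks how policies can be trained or verified so that such perturbations cannot flip safety-relevant decisions.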
The three-field Sankey diagram (Figure 6) visually encapsulates the global research ecosystem, reinforcing the dominance of China and clarifying the disciplinary channels through which academic output flows into thematic applications. China emerges as the single largest contributor, with a massive volume of research flowing into the core methodologies of reinforcement learning and reinforcement learnings, underscoring its broad national commitment to both fundamental theory and general application, as confirmed by [3,81,82]. The flow from key sources, such as IEEE Transactions on Automation Science and Engineering and IEEE Robotics and Automation Letters, confirms that top-tier disciplinary journals are the primary conduits for this research. Critically, regardless of the country of origin (China, USA, India, etc.), the vast majority of research effort converges on a handful of destination keywords: reinforcement learning, DRL, and motion planning [83,84,85]. This convergence validates that the global community’s focus remains fixed on the essential pillars of adaptive control in smart factories, namely, using advanced RL to enable highly flexible and intelligent physical movements.
Figure 7 illustrates the distribution of the top 10 directional flow frequencies, revealing a stark, asymmetric pattern of connectivity that is heavily skewed toward Australia as a destination for high-volume flows originating from the major research-producing economies. The three highest frequencies, China → Australia (≈334.48), United Kingdom → Australia (≈334.33), and USA → Australia (≈334.15), are virtually identical and overwhelmingly dominate the dataset. This tight clustering suggests that the values are normalized (likely log-transformed), with inflows from these three contributors converging toward a common ceiling; it also positions Australia as a distinctive hub in the network, a role that mirrors its broader economic, migration, and investment ties with China, the United Kingdom, and the United States.
Beyond this dominant triad, Figure 7 highlights enduring regional linkages. The next tier of flows, USA → Korea (≈55.45), USA → Japan (≈54.91), and USA → Hong Kong (≈54.89), underscores the United States’ persistent connectivity with East Asia, while the India → UAE (≈54.30) pathway traces a distinct South Asia–Gulf corridor. Taken together, these flows portray global connectivity as a blend of (i) linkages among established Western and Asian research powers and developed markets and (ii) emerging corridors connecting South Asia with the Gulf region.
Crucially, Figure 7 also shows that the most intense relationships are overwhelmingly unidirectional: the absence of significant reverse flows among the top 10 indicates that high-frequency exchanges tend to move in one direction, reflecting asymmetries in how research capacity, collaboration, and knowledge transfer are distributed globally.
Figure 8 illustrates journal publication trends for RL in Industrial Automation, revealing a strong academic focus concentrated in key technical and applied science outlets, dominated by the Institute of Electrical and Electronics Engineers (IEEE), as confirmed by recent studies [63,77,80,85]. The field’s epicenter lies at the intersection of practical control and intelligent decision-making, with IEEE Robotics and Automation Letters leading with 55 publications, highlighting robot control, manipulation, and autonomous operation as the most active areas of RL research. The high volume in IEEE Access (48 documents) emphasizes the importance of rapid, open-access dissemination across engineering domains. Contributions from Sensors (33 documents) and IEEE Internet of Things Journal (15 documents) underscore the critical role of high-quality, real-time sensing and IoT infrastructure in defining the “state” for RL agents in smart factories. Overall, these publication trends confirm that successful industrial RL research integrates algorithmic innovation, control theory, sensing technology, and intelligent automation into a coherent, system-level framework.
In summary, bibliometric analyses reveal that RL research in Industrial Automation is increasingly focused on DRL applications in robotics, motion planning, and autonomous systems, with China leading in publication volume and applied experimentation, as supported by documented industrial testbeds and nationwide digital twin initiatives [86,87]. Thematic clusters indicate a clear progression from foundational algorithmic development to high-impact industrial applications, while emerging areas such as safe RL, hierarchical architectures, and digital twin integration are gaining traction. These insights underscore the growing maturity and specialization of the field, while also highlighting persisting gaps in real-world validation, standardization, and safety-aware deployment, which are addressed in the future challenges and research opportunities section.

5. Future Directions and Research Opportunities

Although RL has demonstrated considerable promise in enabling adaptive control and intelligent decision-making in Industrial Automation, several critical challenges and underexplored areas continue to limit its large-scale industrial adoption. Recent research has begun to address these limitations through emerging trends that combine simulation, transfer learning, modular design, and ethical frameworks for autonomous operation.

5.1. Technological Advancements

5.1.1. Digital Twin Integration

The convergence of RL with digital twin technology is a major trend reshaping Industrial Automation. Digital twins provide high-fidelity virtual replicas of physical assets, allowing RL agents to train, validate, and optimize control policies in simulated environments before deployment. This approach enhances safety, efficiency, and scalability while reducing the risks associated with experimentation on real systems. Studies by Abo-Khalil et al. [88] and Xia et al. [3] demonstrate that digital twin–integrated RL systems enable real-time monitoring, predictive maintenance, and adaptive optimization in manufacturing and energy applications, thus bridging the gap between simulation and real-world deployment.
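As a schematic of this train-in-the-twin workflow, the toy example below learns a maintenance policy entirely inside a simulated asset model before any real equipment is touched. The wear dynamics, rewards, and state space are invented for illustration and stand in for a calibrated digital twin; they are not drawn from the cited studies.

```python
import random

# Toy "digital twin" of a machine: state = wear level; actions = run or maintain.
# All dynamics and costs below are illustrative assumptions.
N_WEAR, ACTIONS = 5, ("run", "maintain")

def twin_step(wear, action):
    if action == "maintain":
        return 0, -2.0                    # planned downtime cost, wear reset
    if wear == N_WEAR - 1:
        return 0, -10.0                   # breakdown at full wear: large penalty
    return wear + 1, 1.0                  # production reward, wear accumulates

Q = {(s, a): 0.0 for s in range(N_WEAR) for a in ACTIONS}
alpha, gamma, eps, state = 0.1, 0.95, 0.1, 0
for _ in range(50_000):                   # safe, cheap trial-and-error in simulation
    a = (random.choice(ACTIONS) if random.random() < eps
         else max(ACTIONS, key=lambda x: Q[(state, x)]))
    nxt, r = twin_step(state, a)
    Q[(state, a)] += alpha * (r + gamma * max(Q[(nxt, x)] for x in ACTIONS) - Q[(state, a)])
    state = nxt

# Greedy policy learned in the twin; only this policy moves toward the real asset.
print({s: max(ACTIONS, key=lambda x: Q[(s, x)]) for s in range(N_WEAR)})
```

The essential point is that all exploratory failures (here, simulated breakdowns) are absorbed by the virtual replica, which is exactly the risk-reduction argument made for digital twin–integrated RL.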

5.1.2. Transfer Learning for Industrial Adaptability

Another promising direction is the use of transfer learning to improve learning efficiency and generalization. RL models often struggle when deployed in new or dynamically changing environments, requiring retraining from scratch. Transfer learning mitigates this challenge by allowing RL agents to leverage knowledge from previously learned tasks. As shown by Hua et al. [89], transferring representations, policies, or value functions across tasks significantly reduces the data and time required for training, thereby accelerating deployment and improving scalability in complex industrial systems.
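A common, lightweight way to realize such transfer is to reuse a policy network’s learned representation and retrain only its task-specific head. The PyTorch fragment below sketches this pattern under assumed layer sizes and action-space dimensions; it is illustrative and not the specific procedure of Hua et al. [89].

```python
import torch.nn as nn

# Policy network pretrained on a source task (weights assumed already loaded).
policy = nn.Sequential(
    nn.Linear(8, 64), nn.ReLU(),   # shared feature layers: transferred as-is
    nn.Linear(64, 64), nn.ReLU(),
    nn.Linear(64, 4),              # output head: specific to the source task
)

for param in policy[:4].parameters():
    param.requires_grad = False    # freeze the transferred representation

policy[4] = nn.Linear(64, 6)       # fresh head for a target task with 6 actions
# Fine-tuning now updates only the new head, cutting data and training time.
trainable = [p for p in policy.parameters() if p.requires_grad]
print(sum(p.numel() for p in trainable), "trainable parameters")
```

Freezing the shared layers is what converts a data-hungry retraining problem into a small fine-tuning problem, which is the efficiency gain the transfer-learning literature emphasizes.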

5.1.3. Modular and Hierarchical RL Architectures

To enhance adaptability across heterogeneous industrial tasks, modular RL frameworks are gaining traction. Raziei and Moghaddam [90] introduce a Hyper-Actor Soft Actor–Critic (HASAC) framework that combines task modularization and transfer learning, allowing agents to adapt to varying conditions while maintaining stable performance. Similarly, hierarchical RL architectures facilitate task decomposition, enabling scalable and interpretable learning across multi-stage production processes; a generic sketch of this two-level pattern follows below. Such modularity is particularly valuable in factories with multiple interdependent subsystems requiring coordinated decision-making.
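The hierarchical decomposition described above can be pictured as a two-level controller: a high-level policy selects which pretrained skill (module) to run, and the chosen low-level policy issues primitive actions. The sketch below is a generic illustration of that pattern with invented skills and toy dynamics, not the HASAC architecture itself.

```python
# Generic two-level (hierarchical) control loop; skills and dynamics are toy stand-ins.
def skill_grasp(obs):   return "close_gripper"
def skill_move(obs):    return "step_toward_goal"

SKILLS = {"grasp": skill_grasp, "move": skill_move}

def high_level_policy(obs: dict) -> str:
    # Option selection: a learned policy in practice; a simple rule here.
    return "move" if obs["dist_to_part"] > 0.05 else "grasp"

obs = {"dist_to_part": 0.15}
for _ in range(4):
    option = high_level_policy(obs)          # decide which skill to execute
    action = SKILLS[option](obs)             # low-level policy issues the action
    print(option, "->", action)
    obs["dist_to_part"] = max(0.0, obs["dist_to_part"] - 0.07)  # toy dynamics
```

Because each module can be trained, verified, and replaced independently, this structure directly supports the interpretability and coordination benefits claimed for hierarchical RL.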

5.2. Implementation Considerations

5.2.1. Standardization and Interoperability

Despite progress in algorithmic development, the absence of standardized protocols for integrating RL within industrial systems remains a substantial barrier. As highlighted by Farooq and Iqbal [2] and Semeraro et al. [55], a lack of common frameworks complicates interoperability across different platforms and manufacturing environments. Establishing industry-wide standards and best practices would facilitate consistent deployment, enhance safety verification, and promote trust among industrial stakeholders.

5.2.2. Ethical and Regulatory Considerations

The deployment of autonomous RL systems in industrial environments raises significant ethical and regulatory concerns that extend beyond technical performance metrics. As identified in our bibliometric analysis, the growing emphasis on human–AI collaboration and Industry 5.0 principles necessitates addressing fundamental ethical challenges that impact trust, safety, and societal acceptance.
A primary concern is the black-box nature of DRL models, which constrains interpretability and complicates accountability in safety-critical industrial settings [48]. This opacity creates challenges for regulatory compliance and operator trust, particularly in applications where human workers collaborate directly with RL-driven systems. The lack of transparent decision-making processes in most DRL architectures undermines the ability to conduct meaningful safety audits or explain operational decisions to regulatory bodies.
Safety and liability concerns represent another critical dimension. As noted by Rahanu et al. [43] and Martinho et al. [44], the absence of formal verification methods for RL policies creates uncertainty about system behavior in edge cases or unexpected scenarios. When RL systems cause operational failures or safety incidents, determining liability becomes complex—spanning algorithm developers, system integrators, and end-users. This ambiguity is particularly problematic in high-risk manufacturing environments where equipment damage or worker safety may be compromised during the exploration phase of RL training.
The integration of RL into human–robot collaborative environments introduces additional ethical considerations related to workforce impact and algorithmic bias. RL policies trained on operational data may inadvertently inherit and amplify existing biases in production systems, potentially leading to unfair resource allocation or discriminatory operational patterns. Furthermore, the transition toward autonomous RL systems raises questions about workforce displacement and the ethical responsibility of organizations to retrain and redeploy human workers.
Our analysis reveals that current regulatory frameworks are inadequately prepared to address these challenges. Existing industrial safety standards and machinery directives lack specific provisions for self-learning autonomous systems, creating a regulatory gap that hinders widespread adoption. The development of comprehensive guidelines for responsible AI deployment in industrial settings requires collaboration between researchers, industry stakeholders, and regulatory bodies to establish standards for testing, validation, and continuous monitoring of RL systems.
Emerging approaches such as Safe RL and XRL offer promising pathways to address these concerns. Safe RL frameworks incorporate formal safety constraints directly into the learning process, reducing risks during both training and deployment [43,44]. XRL methods enhance transparency by providing post hoc explanations of policy decisions or developing intrinsically interpretable models, thereby building operator trust and facilitating regulatory approval [20,48].
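One widely used Safe RL pattern that fits naturally into such frameworks is a runtime safety shield: a verified rule layer that intercepts the learned policy’s proposals and substitutes a certified fallback whenever a constraint would be violated. The sketch below illustrates the idea with an invented temperature constraint and one-step model; real deployments would derive the predicate from formal process limits.

```python
# Minimal safety-shield pattern (illustrative constraint, model, and fallback).
TEMP_LIMIT = 90.0   # assumed process constraint, e.g., max allowable temperature

def is_safe(state: dict, action: float) -> bool:
    # Predict next temperature under a simple assumed model and check the limit.
    return state["temp"] + 5.0 * action <= TEMP_LIMIT

def fallback_action(state: dict) -> float:
    return 0.0       # certified safe default: hold / no heating

def shielded(state: dict, proposed: float) -> float:
    """Pass the RL action through; override it only when it violates the constraint."""
    return proposed if is_safe(state, proposed) else fallback_action(state)

print(shielded({"temp": 70.0}, 3.0))   # 70 + 15 <= 90: action allowed -> 3.0
print(shielded({"temp": 80.0}, 3.0))   # 80 + 15 > 90: overridden      -> 0.0
```

Because the shield is a small, auditable component separate from the learned policy, it also supports the transparency and safety-verification requirements listed below.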
Future regulatory frameworks must balance innovation with protection, establishing clear requirements for:
  • Transparency and explainability in RL decision-making processes
  • Safety verification protocols for RL systems before and during deployment
  • Liability allocation frameworks for autonomous system failures
  • Continuous monitoring and update procedures for evolving RL policies
  • Human oversight mechanisms ensuring meaningful human control in critical decisions.
Addressing these ethical and regulatory considerations is essential for building stakeholder trust and enabling the responsible integration of RL technologies into industrial ecosystems aligned with Industry 5.0’s human-centric vision.

5.2.3. Real-World Validation and Scalability

Finally, the literature reveals a notable gap between simulation-based studies and real-world implementation. Most existing research validates RL algorithms within controlled or synthetic environments, which often fail to capture the full complexity of industrial operations [6,8]. To advance the field, large-scale pilot deployments and cross-sector collaborations are needed to assess RL performance under real-world conditions, including variable process dynamics, human–machine interactions, and hardware constraints.
Future research should focus on bridging the gap between theoretical development and industrial implementation through standardization, safety assurance, and practical validation. As manufacturing moves toward Industry 5.0, advancing human–machine collaboration becomes increasingly important, shifting the focus from full automation to human-centric, resilient, and sustainable production systems. Ensuring that autonomous technologies complement human expertise will be essential for creating adaptable, trustworthy, and ergonomically aligned work environments.
Digital twin integration, transfer learning, and modular architectures represent key technological enablers, while ethical frameworks and interoperability standards will be critical for responsible and sustainable deployment. Strengthening human–AI teaming, particularly in decision-making, supervisory control, and cognitive assistance, will further support the transition toward Industry 5.0’s vision of synergistic collaboration between humans and intelligent systems.
Addressing these opportunities will accelerate progress from the autonomous, data-driven factories of Industry 4.0 toward the human-centric, value-driven, and resilient manufacturing ecosystems envisioned under the Industry 5.0 paradigm.

6. Conclusions

This review has systematically examined the applications, challenges, and emerging trends of RL in Industrial Automation, covering domains such as adaptive process control, robotics, supply chain management, and predictive maintenance. RL distinguishes itself from traditional control and optimization methods by enabling systems to learn optimal policies through continuous interaction with dynamic environments, thereby facilitating adaptive, intelligent, and autonomous industrial operations. The literature reveals that classical RL methods like Q-Learning remain effective for low-dimensional, discrete tasks, while DRL algorithms, including DQN, PPO, and SAC, are particularly suited for high-dimensional, continuous environments. Multi-Agent RL and hierarchical or modular architectures further support complex, decentralized, and multi-stage industrial operations, highlighting RL’s versatility across diverse industrial contexts.
Emerging approaches such as safe RL, digital twin integration, and transfer learning are increasingly addressing critical challenges associated with RL deployment, including operational safety, sample inefficiency, and adaptability across heterogeneous tasks. Bibliometric analyses indicate that China currently leads in application-focused RL research, reflecting a strong nexus between Industrial Automation infrastructure and real-world experimentation, while the USA, Europe, and other regions contribute predominantly in foundational algorithmic and theoretical development. Despite these advances, several persistent challenges hinder large-scale adoption, including sample inefficiency, limited interpretability, safety and robustness concerns, and the complexity of integrating RL systems with legacy industrial infrastructures. These limitations underscore the need for modular, explainable, and rigorously validated RL frameworks that can operate safely alongside human workers.
To systematically guide future efforts and bridge the identified gaps, this review proposes a strategic roadmap structured around four interconnected dimensions: (1) Algorithmic Readiness, focusing on the development of safe, explainable, and data-efficient RL methods; (2) Technological Integration, leveraging digital twins and IoT for seamless simulation-to-reality transfer and deployment; (3) Validation Maturity, establishing standardized benchmarks and promoting rigorous pilot-scale testing in real-world settings; and (4) Human-Centricity, ensuring operational transparency, safe human–robot collaboration, and supportive policies for workforce transition. This roadmap provides a clear framework for stakeholders to prioritize research and development, accelerating the path toward industrially viable RL systems.
Looking forward, the convergence of digital twin technology, transfer learning, and modular RL architectures represents a promising pathway toward scalable, safe, and interpretable autonomous industrial systems. At the same time, the rise of Industry 5.0 places renewed emphasis on human–machine collaboration, where RL-driven automation should augment rather than replace human expertise. Ensuring seamless cooperation between human operators and intelligent systems will require new interaction models, intuitive interfaces, and training mechanisms that keep humans in meaningful supervisory roles.
Equally important is the establishment of standardized protocols, ethical guidelines, and regulatory frameworks to support responsible deployment and build trust among industrial stakeholders. Bridging the gap between simulation-based research and real-world implementation remains a critical priority, as rigorous validation under realistic industrial conditions will ultimately determine RL’s practical value.
In summary, RL has the potential to reshape modern Industrial Automation by enabling adaptive, resilient, and intelligent operations. Achieving this vision demands coordinated progress in algorithmic innovation, human–machine teamwork, system-level integration, safety assurance, and ethical governance, paving the way for fully autonomous, human-centered, and sustainable smart factories aligned with both Industry 4.0 and Industry 5.0 paradigms.

Supplementary Materials

The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/machines13121140/s1. This supplementary table provides a comprehensive summary of 88 studies examining reinforcement learning (RL) applications in Industrial Automation, spanning sectors from manufacturing and robotics to energy systems, supply chain management, and ethical AI considerations across multiple countries and methodologies.

Author Contributions

Conceptualization, Y.M.A. and O.S.; methodology, O.S.; software (R v 4.5.1), O.S.; validation, Y.M.A., O.S. and W.S.; formal analysis, W.S.; investigation, W.S.; resources, Y.M.A.; data curation, W.S.; writing—original draft preparation, O.S.; writing—review and editing, Y.M.A. and W.S.; visualization, O.S.; supervision, Y.M.A.; project administration, W.S. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

The original contributions presented in this study are included in the article and its Supplementary Materials. Further inquiries can be directed to the corresponding author.

Acknowledgments

During the preparation of this manuscript, the authors used ChatGPT v5.1 and DeepSeek (latest model) to assist with rephrasing and improving the clarity of some text in the Introduction and Literature Review, and to review and correct language, grammar, and formatting throughout the manuscript. Google Gemini Flash 2.5 was used to enhance the resolution of figures. All AI-assisted outputs were reviewed, edited, and validated by the authors, who take full responsibility for the content of this publication.

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:
AI: Artificial Intelligence
ASC-RL: Adaptive Scheduling Control using Reinforcement Learning
CIoT-DPM: Cognitive IoT-based Dynamic Power Management
CPS: Cyber-Physical Systems
CTDE: Centralized Training with Decentralized Execution
DL: Deep Learning
DQN: Deep Q-Network
DRL: Deep Reinforcement Learning
HASAC: Hyper-Actor Soft Actor–Critic
HITL-RL: Human-in-the-Loop Reinforcement Learning
IEEE: Institute of Electrical and Electronics Engineers
IoT: Internet of Things
JBI: Joanna Briggs Institute
ML: Machine Learning
MARL: Multi-Agent Reinforcement Learning
MPC: Model Predictive Control
NNs: Neural Networks
PID: Proportional-Integral-Derivative
PPO: Proximal Policy Optimization
PRISMA: Preferred Reporting Items for Systematic Reviews and Meta-Analyses
RL: Reinforcement Learning
SAC: Soft Actor–Critic
XRL: Explainable Reinforcement Learning

Appendix A

Table A1. Methodological quality assessment of included studies using a modified JBI checklist (criteria: clear research aims; detailed methodology description; comprehensive results reporting; appropriate data analysis; relevant conclusions).
No. | Study | Overall Quality Score
[1] | De Asis et al. (2018) | 5/5
[2] | Farooq & Iqbal (2024) | 4/5
[3] | Xia et al. (2021) | 5/5
[4] | Ryalat et al. (2023) | 4/5
[5] | del Real Torres et al. (2022) | 5/5
[6] | Langås et al. (2025) | 5/5
[7] | Moos et al. (2022) | 5/5
[8] | Arulkumaran et al. (2017) | 4/5
[9] | Han et al. (2023) | 5/5
[10] | Kober et al. (2013) | 5/5
[11] | Schulman et al. (2017) | 5/5
[12] | Hou et al. (2020) | 4/5
[13] | Sathya et al. (2024) | 4/5
[14] | Rajasekhar et al. (2025) | 5/5
[15] | Hady et al. (2025) | 5/5
[16] | Jia & Pei (2025) | 5/5
[17] | Abbas et al. (2025) | 5/5
[18] | Schwung et al. (2018) | 3/5
[19] | Chang et al. (2023) | 5/5
[20] | Emami et al. (2024) | 3/5
[21] | Hernandez-Leal et al. (2019) | 5/5
[22] | Rolf et al. (2023) | 5/5
[23] | Martins et al. (2025) | 5/5
[24] | Xu & Peng (2025) | 3/5
[25] | Chen et al. (2021) | 5/5
[26] | Eckardt et al. (2021) | 4/5
[27] | Paranjape et al. (2025) | 5/5
[28] | Peta et al. (2025) | 5/5
[29] | Esteso et al. (2023) | 5/5
[30] | Khan et al. (2025), CIoT-DPM | 5/5
[31] | Khan et al. (2025) | 5/5
[32] | Phan & Ngo (2023) | 5/5
[33] | Mayer et al. (2021) | 5/5
[34] | Zhang et al. (2022) | 4/5
[35] | Mohammadi et al. (2025) | 5/5
[36] | Terven (2025) | 3/5
[37] | Sola et al. (2022) | 5/5
[38] | Jin et al. (2025) | 3/5
[39] | Li (2022) | 3/5
[40] | Yuan et al. (2024) | 5/5
[41] | Wang et al. (2025) | 5/5
[42] | Chen et al. (2024) | 3/5
[43] | Rahanu et al. (2021) | 4/5
[44] | Martinho et al. (2021) | 5/5
[45] | Leikas et al. (2019) | 4/5
[46] | Hashmi et al. (2024) | 5/5
[47] | Clark et al. (2025) | 5/5
[48] | Rădulescu et al. (2020) | 5/5
[49] | Kallenberg et al. (2025) | 5/5
[50] | Page et al. (2021) | 5/5
[51] | Moola et al. (2015) | 5/5
[52] | Yu et al. (2020) | 5/5
[53] | Derviş (2019) | 5/5
[54] | Singh et al. (2021) | 5/5
[55] | Semeraro et al. (2023) | 5/5
[56] | Keshvarparast et al. (2024) | 5/5
[57] | Sizo et al. (2025) | 4/5
[58] | Marzi et al. (2025) | 5/5
[59] | Mancin et al. (2024) | 5/5
[60] | Nezameslami et al. (2025) | 5/5
[61] | Sarkis-Onofre et al. (2021) | 5/5
[62] | Moher et al. (2009) | 5/5
[63] | Samal (2025) | 5/5
[64] | Ouhadi et al. (2024) | 5/5
[65] | Aromataris & Pearson (2014) | 5/5
[66] | Ahn & Kang (2018) | 5/5
[67] | Liu et al. (2022) | 5/5
[68] | Khdoudi et al. (2024) | 5/5
[69] | Wells & Bednarz (2021) | 5/5
[70] | Su et al. (2025) | 5/5
[71] | Liu (2025) | 5/5
[72] | Liengpunsakul (2021) | 4/5
[73] | Jung & Cho (2025) | 5/5
[74] | Cheng & Zeng (2023) | 4/5
[75] | Rajamallaiah et al. (2025) | 5/5
[76] | Gautam (2023) | 5/5
[77] | Li et al. (2023) | 5/5
[78] | Khurram et al. (2025) | 5/5
[79] | Hagendorff (2021) | 4/5
[80] | Ilahi et al. (2021) | 5/5
[81] | Zhang et al. (2022) | 5/5
[82] | Sun et al. (2021) | 5/5
[83] | Bouali et al. (2025) | 5/5
[84] | Noriega et al. (2025) | 5/5
[85] | Xiao et al. (2022) | 5/5
[86] | Chen et al. (2020) | 5/5
[87] | Wei et al. (2023) | 5/5
[88] | Abo-Khalil (2023) | 5/5
[89] | Hua et al. (2021) | 5/5
[90] | Raziei & Moghaddam (2021) | 4/5
Scoring: ✓ = Criterion fully met; △ = Criterion partially met; X = Criterion not met; Overall Quality Score: Sum of fully met criteria (5 = highest quality). Note. This table uses a modified version of the JBI Critical Appraisal Checklist with five simplified methodological criteria adapted for this study.
Overall Summary (References 1–90):
  • 75.6% of all studies met all quality criteria (68/90 studies)
  • 16.7% met 4 out of 5 criteria (15/90 studies)
  • 7.8% scored 3 or below (7/90 studies)
  • The overall high-quality scores validate the robustness of the literature corpus selected for this systematic review.

References

  1. De Asis, K.; Hernandez-Garcia, J.F.; Holland, G.Z.; Sutton, R.S. Multi-step reinforcement learning: A unifying algorithm. In Proceedings of the Thirty-Second AAAI Conference on Artificial Intelligence and Thirtieth Innovative Applications of Artificial Intelligence Conference and Eighth AAAI Symposium on Educational Advances in Artificial Intelligence, New Orleans, LA, USA, 2–7 February 2018; p. 354. [Google Scholar]
  2. Farooq, A.; Iqbal, K. A Survey of Reinforcement Learning for Optimization in Automation. In Proceedings of the 2024 IEEE 20th International Conference on Automation Science and Engineering (CASE), Bari, Italy, 28 August–1 September 2024; pp. 2487–2494. [Google Scholar] [CrossRef]
  3. Xia, K.; Sacco, C.; Kirkpatrick, M.; Saidy, C.; Nguyen, L.; Kircaliali, A.; Harik, R. A digital twin to train deep reinforcement learning agent for smart manufacturing plants: Environment, interfaces and intelligence. J. Manuf. Syst. 2021, 58, 210–230. [Google Scholar] [CrossRef]
  4. Ryalat, M.; ElMoaqet, H.; AlFaouri, M. Design of a Smart Factory Based on Cyber-Physical Systems and Internet of Things towards Industry 4.0. Appl. Sci. 2023, 13, 2156. [Google Scholar] [CrossRef]
  5. del Real Torres, A.; Andreiana, D.S.; Ojeda Roldán, Á.; Hernández Bustos, A.; Acevedo Galicia, L.E. A Review of Deep Reinforcement Learning Approaches for Smart Manufacturing in Industry 4.0 and 5.0 Framework. Appl. Sci. 2022, 12, 12377. [Google Scholar] [CrossRef]
  6. Langås, E.F.; Zafar, M.H.; Sanfilippo, F. Exploring the synergy of human-robot teaming, digital twins, and machine learning in Industry 5.0: A step towards sustainable manufacturing. J. Intell. Manuf. 2025, 89, 102769. [Google Scholar] [CrossRef]
  7. Moos, J.; Hansel, K.; Abdulsamad, H.; Stark, S.; Clever, D.; Peters, J. Robust Reinforcement Learning: A Review of Foundations and Recent Advances. Mach. Learn. Knowl. Extr. 2022, 4, 276–315. [Google Scholar] [CrossRef]
  8. Arulkumaran, K.; Deisenroth, M.P.; Brundage, M.; Bharath, A.A. Deep Reinforcement Learning: A Brief Survey. IEEE Signal Process. Mag. 2017, 34, 26–38. [Google Scholar] [CrossRef]
  9. Han, D.; Mulyana, B.; Stankovic, V.; Cheng, S. A Survey on Deep Reinforcement Learning Algorithms for Robotic Manipulation. Sensors 2023, 23, 3762. [Google Scholar] [CrossRef]
  10. Kober, J.; Bagnell, J.A.; Peters, J. Reinforcement learning in robotics: A survey. Int. J. Robot. Res. 2013, 32, 1238–1274. [Google Scholar] [CrossRef]
  11. Schulman, J.; Wolski, F.; Dhariwal, P.; Radford, A.; Klimov, O. Proximal policy optimization algorithms. arXiv 2017, arXiv:1707.06347. [Google Scholar] [CrossRef]
  12. Hou, Z.; Zhang, K.; Wan, Y.; Li, D.; Fu, C.; Yu, H. Off-policy maximum entropy reinforcement learning: Soft actor-critic with advantage weighted mixture policy (sac-awmp). arXiv 2020, arXiv:2002.02829. [Google Scholar] [CrossRef]
  13. Sathya, D.; Saravanan, G.; Thangamani, R. Reinforcement Learning for Adaptive Mechatronics Systems. In Computational Intelligent Techniques in Mechatronics; Wiley: Hoboken, NJ, USA, 2024; pp. 135–184. [Google Scholar]
  14. Rajasekhar, N.; Radhakrishnan, T.K.; Samsudeen, N. Exploring reinforcement learning in process control: A comprehensive survey. Int. J. Syst. Sci. 2025, 56, 3528–3557. [Google Scholar] [CrossRef]
  15. Hady, M.A.; Hu, S.; Pratama, M.; Cao, Z.; Kowalczyk, R. Multi-agent reinforcement learning for resources allocation optimization: A survey. Artif. Intell. Rev. 2025, 58, 354. [Google Scholar] [CrossRef]
  16. Jia, L.; Pei, Y. Recent Advances in Multi-Agent Reinforcement Learning for Intelligent Automation and Control of Water Environment Systems. Machines 2025, 13, 503. [Google Scholar] [CrossRef]
  17. Abbas, A.N.; Amazu, C.W.; Mietkiewicz, J.; Briwa, H.; Perez, A.A.; Baldissone, G.; Demichela, M.; Chasparis, G.C.; Kelleher, J.D.; Leva, M.C. Analyzing Operator States and the Impact of AI-Enhanced Decision Support in Control Rooms: A Human-in-the-Loop Specialized Reinforcement Learning Framework for Intervention Strategies. Int. J. Hum. Comput. Interact. 2025, 41, 7218–7252. [Google Scholar] [CrossRef]
  18. Schwung, D.; Reimann, J.N.; Schwung, A.; Ding, S.X. Self Learning in Flexible Manufacturing Units: A Reinforcement Learning Approach. In Proceedings of the 2018 International Conference on Intelligent Systems (IS), Madeira, Portugal, 25–27 September 2018; pp. 31–38. [Google Scholar] [CrossRef]
  19. Chang, X.; Jia, X.; Fu, S.; Hu, H.; Liu, K. Digital twin and deep reinforcement learning enabled real-time scheduling for complex product flexible shop-floor. Proc. Inst. Mech. Eng. Part B J. Eng. Manuf. 2022, 237, 1254–1268. [Google Scholar] [CrossRef]
  20. Emami, Y.; Almeida, L.; Li, K.; Ni, W.; Han, Z. Human-in-the-loop machine learning for safe and ethical autonomous vehicles: Principles, challenges, and opportunities. arXiv 2024, arXiv:2408.12548. [Google Scholar] [CrossRef]
  21. Hernandez-Leal, P.; Kartal, B.; Taylor, M.E. A survey and critique of multiagent deep reinforcement learning. Auton. Agents Multi-Agent Syst. 2019, 33, 750–797. [Google Scholar] [CrossRef]
  22. Rolf, B.; Jackson, I.; Müller, M.; Lang, S.; Reggelin, T.; Ivanov, D. A review on reinforcement learning algorithms and applications in supply chain management. Int. J. Prod. Res. 2023, 61, 7151–7179. [Google Scholar] [CrossRef]
  23. Martins, M.S.E.; Sousa, J.M.C.; Vieira, S. A Systematic Review on Reinforcement Learning for Industrial Combinatorial Optimization Problems. Appl. Sci. 2025, 15, 1211. [Google Scholar] [CrossRef]
  24. Xu, R.; Peng, J. A Comprehensive Survey of Deep Research: Systems, Methodologies, and Applications. arXiv 2025, arXiv:2506.12594. [Google Scholar] [CrossRef]
  25. Chen, W.; Qiu, X.; Cai, T.; Dai, H.N.; Zheng, Z.; Zhang, Y. Deep Reinforcement Learning for Internet of Things: A Comprehensive Survey. IEEE Commun. Surv. Tutor. 2021, 23, 1659–1692. [Google Scholar] [CrossRef]
  26. Eckardt, J.N.; Wendt, K.; Bornhäuser, M.; Middeke, J.M. Reinforcement Learning for Precision Oncology. Cancers 2021, 13, 4624. [Google Scholar] [CrossRef] [PubMed]
  27. Paranjape, A.; Quader, N.; Uhlmann, L.; Berkels, B.; Wolfschläger, D.; Schmitt, R.H.; Bergs, T. Reinforcement Learning Agent for Multi-Objective Online Process Parameter Optimization of Manufacturing Processes. Appl. Sci. 2025, 15, 7279. [Google Scholar] [CrossRef]
  28. Peta, K.; Wiśniewski, M.; Kotarski, M.; Ciszak, O. Comparison of Single-Arm and Dual-Arm Collaborative Robots in Precision Assembly. Appl. Sci. 2025, 15, 2976. [Google Scholar] [CrossRef]
  29. Esteso, A.; Peidro, D.; Mula, J.; Díaz-Madroñero, M. Reinforcement learning applied to production planning and control. Int. J. Prod. Res. 2023, 61, 5772–5789. [Google Scholar] [CrossRef]
  30. Khan, M.N.; Ullah, I.; Bashir, A.K.; Al-Khasawneh, M.A.; Arishi, A.; Alghamdi, N.S.; Lee, S. Reinforcement Learning-Based Dynamic Power Management for Energy Optimization in IoT-Enabled Consumer Electronics. IEEE Trans. Consum. Electron. 2025, 71, 8103–8114. [Google Scholar] [CrossRef]
  31. Khan, M.N.; Lee, S.; Shah, M. Adaptive Scheduling in Cognitive IoT Sensors for Optimizing Network Performance Using Reinforcement Learning. Appl. Sci. 2025, 15, 5573. [Google Scholar] [CrossRef]
  32. Phan, L.A.D.; Ngo, H.Q.T. Systematic Review of Smart Robotic Manufacturing in the Context of Industry 4.0. In Context-Aware Systems and Applications, Proceedings of the 12th EAI International Conference, ICCASA 2023, Ho Chi Minh City, Vietnam, 26–27 October 2023; Vinh, P.C., Tung, N.T., Eds.; Springer Nature: Cham, Switzerland, 2024; pp. 19–42. [Google Scholar] [CrossRef]
  33. Mayer, S.; Classen, T.; Endisch, C. Modular production control using deep reinforcement learning: Proximal policy optimization. J. Intell. Manuf. 2021, 32, 2335–2351. [Google Scholar] [CrossRef]
  34. Zhang, M.; Lu, Y.; Hu, Y.; Amaitik, N.; Xu, Y. Dynamic Scheduling Method for Job-Shop Manufacturing Systems by Deep Reinforcement Learning with Proximal Policy Optimization. Sustainability 2022, 14, 5177. [Google Scholar] [CrossRef]
  35. Mohammadi, E.; Ortiz-Arroyo, D.; Hansen, A.A.; Stokholm-Bjerregaard, M.; Gros, S.; Anand, A.S.; Durdevic, P. Application of Soft Actor-Critic algorithms in optimizing wastewater treatment with time delays integration. Expert Syst. Appl. 2025, 277, 127180. [Google Scholar] [CrossRef]
  36. Terven, J. Deep Reinforcement Learning: A Chronological Overview and Methods. AI 2025, 6, 46. [Google Scholar] [CrossRef]
  37. Sola, Y.; Le Chenadec, G.; Clement, B. Simultaneous Control and Guidance of an AUV Based on Soft Actor–Critic. Sensors 2022, 22, 6072. [Google Scholar] [CrossRef]
  38. Jin, W.; Du, H.; Zhao, B.; Tian, X.; Shi, B.; Yang, G. A comprehensive survey on multi-agent cooperative decision-making: Scenarios, approaches, challenges and perspectives. arXiv 2025, arXiv:2503.13415. [Google Scholar] [CrossRef]
  39. Li, Y. Reinforcement learning in practice: Opportunities and challenges. arXiv 2022, arXiv:2202.11296. [Google Scholar] [CrossRef]
  40. Yuan, Z.; Zhang, Z.; Li, X.; Cui, Y.; Li, M.; Ban, X. Controlling Partially Observed Industrial System Based on Offline Reinforcement Learning—A Case Study of Paste Thickener. IEEE Trans. Ind. Inform. 2025, 21, 49–59. [Google Scholar] [CrossRef]
  41. Wang, T.; Ruan, Z.; Wang, Y.; Chen, C. Control strategy of robotic manipulator based on multi-task reinforcement learning. Complex Intell. Syst. 2025, 11, 175. [Google Scholar] [CrossRef]
  42. Chen, J.; He, J.; Chen, F.; Lv, Z.; Tang, J.; Li, W.; Liu, Z.; Yang, H.H.; Han, G. Towards General Industrial Intelligence: A Survey of Continual Large Models in Industrial IoT. arXiv 2024, arXiv:2409.01207. [Google Scholar] [CrossRef]
  43. Rahanu, H.; Georgiadou, E.; Siakas, K.; Ross, M.; Berki, E. Ethical Issues Invoked by Industry 4.0. In Systems, Software and Services Process Improvement, Proceedings of the 28th European Conference, EuroSPI 2021, Krems, Austria, 1–3 September 2021; Yilmaz, M., Clarke, P., Messnarz, R., Reiner, M., Eds.; Springer International Publishing: Cham, Switzerland, 2021; pp. 589–606. [Google Scholar] [CrossRef]
  44. Martinho, A.; Herber, N.; Kroesen, M.; Chorus, C. Ethical issues in focus by the autonomous vehicles industry. Transp. Rev. 2021, 41, 556–577. [Google Scholar] [CrossRef]
  45. Leikas, J.; Koivisto, R.; Gotcheva, N. Ethical Framework for Designing Autonomous Intelligent Systems. J. Open Innov. Technol. Mark. Complex. 2019, 5, 18. [Google Scholar] [CrossRef]
  46. Hashmi, R.; Liu, H.; Yavari, A. Digital Twins for Enhancing Efficiency and Assuring Safety in Renewable Energy Systems: A Systematic Literature Review. Energies 2024, 17, 2456. [Google Scholar] [CrossRef]
  47. Clark, L.; Garcia, B.; Harris, S. Digital Twin-Driven Operational Management Framework for Real-Time Decision-Making in Smart Factories. J. Innov. Gov. Bus. Pract. 2025, 1, 59–87. [Google Scholar]
  48. Rădulescu, R.; Mannion, P.; Roijers, D.M.; Nowé, A. Multi-objective multi-agent decision making: A utility-based analysis and survey. Auton. Agents Multi-Agent Syst. 2019, 34, 10. [Google Scholar] [CrossRef]
  49. Kallenberg, M.; Baja, H.; Ilić, M.; Tomčić, A.; Tošić, M.; Athanasiadis, I. Interoperable agricultural digital twins with reinforcement learning intelligence. Smart Agric. Technol. 2025, 12, 101412. [Google Scholar] [CrossRef]
  50. Page, M.J.; McKenzie, J.E.; Bossuyt, P.M.; Boutron, I.; Hoffmann, T.C.; Mulrow, C.D.; Shamseer, L.; Tetzlaff, J.M.; Akl, E.A.; Brennan, S.E.; et al. The PRISMA 2020 statement: An updated guideline for reporting systematic reviews. BMJ 2021, 372, n71. [Google Scholar] [CrossRef] [PubMed]
  51. Moola, S.; Munn, Z.; Sears, K.; Sfetcu, R.; Currie, M.; Lisy, K.; Tufanaru, C.; Qureshi, R.; Mattis, P.; Mu, P. Conducting systematic reviews of association (etiology): The Joanna Briggs Institute’s approach. JBI Evid. Implement. 2015, 13, 163–169. [Google Scholar] [CrossRef]
  52. Yu, D.; Xu, Z.; Wang, X. Bibliometric analysis of support vector machines research trend: A case study in China. Int. J. Mach. Learn. Cybern. 2020, 11, 715–728. [Google Scholar] [CrossRef]
  53. Derviş, H. Bibliometric analysis using bibliometrix an R package. J. Scientometr. Res. 2019, 8, 156–160. [Google Scholar] [CrossRef]
  54. Singh, V.K.; Singh, P.; Karmakar, M.; Leta, J.; Mayr, P. The journal coverage of Web of Science, Scopus and Dimensions: A comparative analysis. Scientometrics 2021, 126, 5113–5142. [Google Scholar] [CrossRef]
  55. Semeraro, F.; Griffiths, A.; Cangelosi, A. Human–robot collaboration and machine learning: A systematic review of recent research. Robot. Comput.-Integr. Manuf. 2023, 79, 102432. [Google Scholar] [CrossRef]
  56. Keshvarparast, A.; Battini, D.; Battaia, O.; Pirayesh, A. Collaborative robots in manufacturing and assembly systems: Literature review and future research agenda. J. Intell. Manuf. 2024, 35, 2065–2118. [Google Scholar] [CrossRef]
  57. Sizo, A.; Lino, A.; Rocha, Á.; Reis, L.P. Defining quality in peer review reports: A scoping review. Knowl. Inf. Syst. 2025, 67, 6413–6460. [Google Scholar] [CrossRef]
  58. Marzi, G.; Balzano, M.; Caputo, A.; Pellegrini, M.M. Guidelines for Bibliometric-Systematic Literature Reviews: 10 steps to combine analysis, synthesis and theory development. Int. J. Manag. Rev. 2025, 27, 81–103. [Google Scholar] [CrossRef]
  59. Mancin, S.; Sguanci, M.; Andreoli, D.; Soekeland, F.; Anastasi, G.; Piredda, M.; De Marinis, M.G. Systematic review of clinical practice guidelines and systematic reviews: A method for conducting comprehensive analysis. MethodsX 2024, 12, 102532. [Google Scholar] [CrossRef] [PubMed]
  60. Nezameslami, R.; Nezameslami, A.; Mehdikhani, B.; Mosavi-Jarrahi, A.; Shahbazi, A.; Rahmani, A.; Masoudi, A.; Yeganegi, M.; Akhondzardaini, R.; Bahrami, M.; et al. Adapting PRISMA Guidelines to Enhance Reporting Quality in Genetic Association Studies: A Framework Proposal. Asian Pac. J. Cancer Prev. 2025, 26, 1641–1651. [Google Scholar] [CrossRef] [PubMed]
  61. Sarkis-Onofre, R.; Catalá-López, F.; Aromataris, E.; Lockwood, C. How to properly use the PRISMA Statement. Syst. Rev. 2021, 10, 117. [Google Scholar] [CrossRef]
  62. Moher, D.; Liberati, A.; Tetzlaff, J.; Altman, D.G.; The, P.G. Preferred Reporting Items for Systematic Reviews and Meta-Analyses: The PRISMA Statement. PLoS Med. 2009, 6, e1000097. [Google Scholar] [CrossRef]
  63. Samal, U. Evolution of machine learning and deep learning in intelligent manufacturing: A bibliometric study. Int. J. Syst. Assur. Eng. Manag. 2025, 16, 3134–3150. [Google Scholar] [CrossRef]
  64. Ouhadi, A.; Yahouni, Z.; Di Mascolo, M. Integrating machine learning and operations research methods for scheduling problems: A bibliometric analysis and literature review. IFAC-PapersOnLine 2024, 58, 946–951. [Google Scholar] [CrossRef]
  65. Aromataris, E.; Pearson, A. The Systematic Review: An Overview. Am. J. Nurs. 2014, 114, 53–58. [Google Scholar] [CrossRef]
  66. Ahn, E.; Kang, H. Introduction to systematic review and meta-analysis. Korean J. Anesth. 2018, 71, 103–112. [Google Scholar] [CrossRef]
  67. Liu, Y.; Xu, H.; Liu, D.; Wang, L. A digital twin-based sim-to-real transfer for deep reinforcement learning-enabled industrial robot grasping. Robot. Comput.-Integr. Manuf. 2022, 78, 102365. [Google Scholar] [CrossRef]
  68. Khdoudi, A.; Masrour, T.; El Hassani, I.; El Mazgualdi, C. A Deep-Reinforcement-Learning-Based Digital Twin for Manufacturing Process Optimization. Systems 2024, 12, 38. [Google Scholar] [CrossRef]
  69. Wells, L.; Bednarz, T. Explainable AI and Reinforcement Learning—A Systematic Review of Current Approaches and Trends. Front. Artif. Intell. 2021, 4, 550030. [Google Scholar] [CrossRef] [PubMed]
  70. Su, T.; Wu, T.; Zhao, J.; Scaglione, A.; Xie, L. A Review of Safe Reinforcement Learning Methods for Modern Power Systems. Proc. IEEE 2025, 113, 213–255. [Google Scholar] [CrossRef]
  71. Liu, Y. Bridging research and policy in China’s energy sector: A semantic and reinforcement learning framework. Energy Strategy Rev. 2025, 59, 101770. [Google Scholar] [CrossRef]
  72. Liengpunsakul, S. Artificial Intelligence and Sustainable Development in China. Chin. Econ. 2021, 54, 235–248. [Google Scholar] [CrossRef]
  73. Jung, W.; Cho, K. Deep Learning and NLP-Based Trend Analysis in Actuators and Power Electronics. Actuators 2025, 14, 379. [Google Scholar] [CrossRef]
  74. Cheng, J.; Zeng, J. Shaping AI’s Future? China in Global AI Governance. J. Contemp. China 2023, 32, 794–810. [Google Scholar] [CrossRef]
  75. Rajamallaiah, A.; Naresh, S.V.K.; Raghuvamsi, Y.; Manmadharao, S.; Bingi, K.; Anand, R.; Guerrero, J.M. Deep Reinforcement Learning for Power Converter Control: A Comprehensive Review of Applications and Challenges. IEEE Open J. Power Electron. 2025, 6, 1769–1802. [Google Scholar] [CrossRef]
  76. Gautam, M. Deep Reinforcement Learning for Resilient Power and Energy Systems: Progress, Prospects, and Future Avenues. Electricity 2023, 4, 336–380. [Google Scholar] [CrossRef]
  77. Li, C.; Bai, L.; Yao, L.; Waller, S.T.; Liu, W. A bibliometric analysis and review on reinforcement learning for transportation applications. Transp. B Transp. Dyn. 2023, 11, 2179461. [Google Scholar] [CrossRef]
  78. Khurram, M.; Zhang, C.; Muhammad, S.; Kishnani, H.; An, K.; Abeywardena, K.; Chadha, U.; Behdinan, K. Artificial Intelligence in Manufacturing Industry Worker Safety: A New Paradigm for Hazard Prevention and Mitigation. Processes 2025, 13, 1312. [Google Scholar] [CrossRef]
  79. Hagendorff, T. Forbidden knowledge in machine learning reflections on the limits of research and publication. AI Soc. 2021, 36, 767–781. [Google Scholar] [CrossRef]
  80. Ilahi, I.; Usama, M.; Qadir, J.; Janjua, M.U.; Al-Fuqaha, A.; Hoang, D.T.; Niyato, D. Challenges and Countermeasures for Adversarial Attacks on Deep Reinforcement Learning. IEEE Trans. Artif. Intell. 2022, 3, 90–109. [Google Scholar] [CrossRef]
  81. Zhang, W.; Gu, X.; Tang, L.; Yin, Y.; Liu, D.; Zhang, Y. Application of machine learning, deep learning and optimization algorithms in geoengineering and geoscience: Comprehensive review and future challenge. Gondwana Res. 2022, 109, 1–17. [Google Scholar] [CrossRef]
  82. Sun, H.; Zhang, W.; Yu, R.; Zhang, Y. Motion Planning for Mobile Robots—Focusing on Deep Reinforcement Learning: A Systematic Review. IEEE Access 2021, 9, 69061–69081. [Google Scholar] [CrossRef]
  83. Bouali, M.; Sebaa, A.; Farhi, N. A Comprehensive Review on Reinforcement Learning Methods for Autonomous Lane Changing. Int. J. Intell. Transp. Syst. Res. 2025. [Google Scholar] [CrossRef]
  84. Noriega, R.; Pourrahimian, Y.; Askari-Nasab, H. Deep Reinforcement Learning based real-time open-pit mining truck dispatching system. Comput. Oper. Res. 2025, 173, 106815. [Google Scholar] [CrossRef]
  85. Xiao, X.; Liu, B.; Warnell, G.; Stone, P. Motion planning and control for mobile robot navigation using machine learning: A survey. Auton. Robot. 2022, 46, 569–597. [Google Scholar] [CrossRef]
  86. Chen, J.; Lin, C.; Peng, D.; Ge, H. Fault Diagnosis of Rotating Machinery: A Review and Bibliometric Analysis. IEEE Access 2020, 8, 224985–225003. [Google Scholar] [CrossRef]
  87. Wei, Z.; Liu, H.; Tao, X.; Pan, K.; Huang, R.; Ji, W.; Wang, J. Insights into the Application of Machine Learning in Industrial Risk Assessment: A Bibliometric Mapping Analysis. Sustainability 2023, 15, 6965. [Google Scholar] [CrossRef]
  88. Abo-Khalil, A.G. Digital twin real-time hybrid simulation platform for power system stability. Case Stud. Therm. Eng. 2023, 49, 103237. [Google Scholar] [CrossRef]
  89. Hua, J.; Zeng, L.; Li, G.; Ju, Z. Learning for a Robot: Deep Reinforcement Learning, Imitation Learning, Transfer Learning. Sensors 2021, 21, 1278. [Google Scholar] [CrossRef]
  90. Raziei, Z.; Moghaddam, M. Enabling adaptable Industry 4.0 automation with a modular deep reinforcement learning framework. IFAC-PapersOnLine 2021, 54, 546–551. [Google Scholar] [CrossRef]
Figure 1. PRISMA Screening Flow Diagram.
Figure 2. Co-occurrence Network.
Figure 3. Countries’ Production over Time.
Figure 4. Thematic Research Flow (2017–2026).
Figure 5. Cumulative Frequency of Key Research Terminology Over Time (2017–2026).
Figure 6. Three-Field Plot: Source, Country, and Thematic Research Flow.
Figure 7. Global High-Volume Flows and Connectivity.
Figure 8. Journal Trends in Industrial RL Research.
Table 1. Overview of RL Models in Industrial Automation.

| RL Model/Approach | Industrial Application | Key Advantages | Key Challenges | Representative Studies | Real-World Deployment Status |
|---|---|---|---|---|---|
| Q-Learning | Adaptive process control, simple robotic tasks | Easy to implement; effective in discrete action spaces | Limited scalability; poor performance in high-dimensional continuous spaces | [2,8] | Mostly simulation and lab-scale deployment; limited industrial adoption |
| Collaborative Robotics (Single- vs. Dual-Arm RL-Assisted Control) | Precision assembly, dual-arm and single-arm robot evaluation, production station design | Improves assembly speed and path efficiency; dual-arm configuration approximately 20% faster; supports energy/performance trade-off analysis; simulation-driven optimization | Energy-speed trade-offs; requires high-fidelity simulation; profitability varies by task complexity | [28] | Simulation-based evaluation; applicable to industrial workstation design and robot deployment planning |
| Deep Q-Networks (DQN) | Robotic automation, assembly, pick-and-place | Handles high-dimensional sensory inputs; scalable to complex tasks | Requires large datasets; computationally intensive | [9,10,32] | Some pilot-scale deployments; emerging industrial adoption in robotics |
| Proximal Policy Optimization (PPO) | Supply chain management, production scheduling | Stable learning; effective for continuous action spaces; robust to hyperparameter changes | Sensitive to environment variability; may require extensive tuning | [11,33,34] | Primarily simulation; limited real-world application in dynamic supply chains |
| Soft Actor-Critic (SAC) | Energy-efficient process control, continuous robotics tasks | Sample-efficient; handles continuous actions; robust exploration | Complex implementation; sensitive to hyperparameter tuning | [35,36,37] | Mostly lab-scale; experimental deployments in process control and robotics |
| MARL | Multi-robot coordination, collaborative production | Decentralized decision-making; scalable for multi-agent environments | Communication overhead; coordination complexity; convergence issues | [1,15,38] | Early-stage pilot tests; limited industrial adoption |
| Offline/Batch RL | Processes with limited real-time data | Learns from historical datasets; safer for industrial deployment | Poor adaptability to unobserved states; performance depends on dataset quality | [2,39,40] | Some real-world trials in chemical and manufacturing processes |
| Hierarchical/Modular RL | Complex industrial systems with multiple sub-tasks | Enables task decomposition; improves scalability and transfer learning | Requires careful task structuring; may increase computational complexity | [41,42] | Mostly experimental and simulation-based; few industrial pilots |
| Constrained/Safe RL | Safety-critical industrial processes | Incorporates safety constraints; reduces risk of operational failures | Can limit exploration and performance; complex constraint design | [43,44,45] | Limited industrial deployment in high-risk manufacturing |
| Digital Twin-Integrated RL | Real-time optimization and predictive maintenance | Allows virtual simulation before deployment; reduces risk; enhances safety | Requires accurate digital twin models; high computational demand | [46,47] | Pilot projects in manufacturing, energy, and logistics; increasing adoption |
| IoT-Optimized RL | Energy management, sensor networks, consumer electronics | Reduces network congestion, improves energy efficiency, handles redundant data patterns | Limited to specific IoT architectures; requires sensor state management | [30,31] | Early industrial adoption in consumer electronics; emerging in industrial IoT |
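To make the tabular Q-learning entry in Table 1 concrete, the following minimal sketch shows the temporal-difference update that underlies it. The toy environment, state and action counts, and hyperparameters are illustrative assumptions rather than values drawn from any of the reviewed studies.

```python
import numpy as np

# Toy discrete process-control task: placeholder sizes and hyperparameters.
N_STATES, N_ACTIONS = 10, 4
ALPHA, GAMMA, EPSILON = 0.1, 0.95, 0.1

rng = np.random.default_rng(0)
Q = np.zeros((N_STATES, N_ACTIONS))

def step(state, action):
    """Hypothetical environment: random next state, reward peaks at state 0."""
    next_state = int(rng.integers(N_STATES))
    reward = 1.0 if next_state == 0 else -0.01
    return next_state, reward

state = int(rng.integers(N_STATES))
for _ in range(5000):
    # Epsilon-greedy exploration over the discrete action space.
    if rng.random() < EPSILON:
        action = int(rng.integers(N_ACTIONS))
    else:
        action = int(np.argmax(Q[state]))
    next_state, reward = step(state, action)
    # Core Q-learning temporal-difference update.
    td_target = reward + GAMMA * np.max(Q[next_state])
    Q[state, action] += ALPHA * (td_target - Q[state, action])
    state = next_state
```

The single-line update rule is what keeps tabular Q-learning easy to implement, and the explicit Q-table is also what limits it to low-dimensional, discrete problems, as noted in the table.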
Table 2. Summary of Key Dimensions in RL for Industrial Automation.

| Aspect | Description/Coverage | Representative Studies | Key Insights/Contributions | Identified Gaps & Future Research Needs | Key Trends & Insights |
|---|---|---|---|---|---|
| Research Focus | Application of RL for adaptive control and intelligent decision-making in smart manufacturing environments | [2,6,9] | RL enables autonomous optimization of processes, robotics, and supply chain systems through continuous learning | Limited real-world deployment and integration with legacy automation systems | Shift from classical RL to multi-agent and hierarchical RL for complex systems |
| Core RL Algorithms | Q-Learning, DQN, PPO, SAC, MARL, Offline RL, Hierarchical RL, Safe RL, Digital Twin–Integrated RL | [1,6,9,10,11,13,14,27,29] | Progressive evolution from discrete control to continuous, multi-agent, and simulation-driven control | Sample inefficiency, hyperparameter tuning, and limited interpretability | Emergence of safe, explainable, and hybrid RL methods |
| Industrial Application Areas | Process control, robotic automation, logistics and supply chain, energy management, predictive maintenance | [9,24,25,28,29,30,31] | RL improves precision, adaptability, and energy efficiency across domains | Underexplored applications in energy management and large-scale coordination | Growing focus on sustainable, energy-efficient, and smart manufacturing |
| Integration Technologies | CPS, IoT, Digital Twins, Edge AI, Cloud-based RL frameworks | [3,4,16,30,40,41] | Enhances simulation-based optimization, safety, and real-time decision-making | High computational cost and need for real-time synchronization between physical and virtual layers | Digital twins and CPS integration facilitate sim-to-real transfer and pilot deployments |
| Advantages of RL in Automation | Adaptive decision-making, real-time optimization, autonomous control, self-learning, reduced programming effort | [2,8,25,30,31] | Significant improvement in operational flexibility and fault tolerance | Scalability and explainability remain key constraints | Supports modular, flexible, and self-optimizing production processes |
| Major Challenges | Sample inefficiency, safety and robustness, interpretability, integration with legacy systems | [6,18,34] | Recognized barriers to industrial deployment | Need for hybrid RL methods combining model-based, safe, and explainable learning | Safety, trust, and interpretability are critical bottlenecks |
| Emerging Strategies | Digital Twin–Integrated RL, Hierarchical and Modular RL, Safe and XRL, Transfer Learning | [16,17,35,36,40] | Improve scalability, safety, and transferability | Lack of standardized evaluation frameworks and benchmarks | Hybrid, modular, and transfer learning approaches are accelerating real-world adoption |
| Validation and Deployment | Predominantly simulation or pilot-scale; limited industrial field tests | [9,14,33,41] | Promising pilot results in robotics and predictive maintenance | Requires large-scale case studies and longitudinal validation | Simulation remains the primary testing environment; limited full-scale deployment |
| Ethical and Regulatory Aspects | Safety compliance, human–AI collaboration, decision transparency, accountability | [18,37,42,43] | Growing attention to AI ethics and trust | Need for policy-aligned frameworks for safe AI integration | Explainable and safe RL is essential for human–AI collaboration and regulatory compliance |
| Future Research Directions | Hybrid RL frameworks, digital twin co-simulation, data-efficient learning, explainable and safe RL for Industry 5.0 | [19,20,21,30,31,40,43] | Pathway toward self-optimizing, human-centered, and sustainable smart factories | Bridging theoretical RL advances with industrial-scale implementation | Standardized benchmarks, data-efficient methods, and human-centered designs are needed for widespread adoption |
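The digital twin co-simulation pattern highlighted in Table 2 can be summarized as a train-validate-promote loop: a controller is optimized against a virtual plant model and is released to the physical line only after clearing a validation gate. The sketch below illustrates this workflow; the TwinModel dynamics, the crude gain search standing in for a full RL algorithm, and the acceptance threshold are all hypothetical assumptions, not a reviewed implementation.

```python
import random

class TwinModel:
    """Stand-in digital twin: a 1-D noisy setpoint-tracking process."""
    def __init__(self):
        self.x = 0.0
    def reset(self):
        self.x = random.uniform(-1, 1)
        return self.x
    def step(self, u):
        self.x += 0.5 * u + random.gauss(0, 0.02)  # simple noisy dynamics
        return self.x, -abs(self.x)                # reward: closeness to 0

def train_on_twin(episodes=200):
    """Crude policy search on the twin: pick the best proportional gain."""
    best_gain, best_ret = None, float("-inf")
    for gain in [0.2, 0.5, 1.0, 1.5]:
        twin, ret = TwinModel(), 0.0
        for _ in range(episodes):
            x = twin.reset()
            for _ in range(20):
                x, r = twin.step(-gain * x)
                ret += r
        if ret > best_ret:
            best_gain, best_ret = gain, ret
    return best_gain, best_ret / episodes

gain, avg_return = train_on_twin()
if avg_return > -0.9:  # hypothetical validation gate before deployment
    print(f"promote gain={gain} to pilot deployment")
else:
    print("keep iterating in simulation")
```

The key design point is that all exploratory (and potentially unsafe) interaction happens against the twin; the physical system only ever sees a policy that has already passed the gate, which is why the table links digital twins to safer pilot deployments.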
Table 3. Research Dimensions, Applications, and Emerging Trends in RL for Industrial Automation.

| RL Category | Representative Algorithms | Typical Applications | Integration Technologies | Advantages | Limitations | Deployment Status | Key Trends & Insights |
|---|---|---|---|---|---|---|---|
| Classical RL | Q-Learning | Discrete control tasks, simple process automation | None/basic simulation | Simple, low computational cost | Poor scalability, limited to low-dimensional problems | Simulation/pilot-scale | Still foundational for benchmarking; used for small-scale, low-risk tasks |
| DRL | DQN, PPO, SAC | High-dimensional robotic tasks, pick-and-place, assembly, process control, adaptive IoT sensor scheduling, collaborative assembly analysis | Digital Twins, IoT, CPS | Handles high-dimensional inputs, continuous actions; robust and stable | Sample inefficiency, black-box nature | Pilot-scale/industrial experiments | Core of DRL-driven automation; increasing use in multi-stage industrial systems |
| MARL | MADDPG, QMIX | Collaborative robotics, decentralized production lines, dual-arm cooperative assembly | CPS, IoT, networked environments | Supports coordination among multiple agents; scalable for distributed systems | Coordination complexity, communication overhead | Pilot-scale/research testbeds | Essential for multi-agent and distributed industrial applications |
| Hierarchical & Modular RL | Options framework, Feudal RL | Multi-stage production, task decomposition | Digital Twins, IoT | Knowledge transfer, improved learning efficiency, interpretable | Complex design, integration overhead | Pilot-scale | Emerging for scalability, modularity, and energy efficiency |
| Safe & XRL | Constrained RL, XRL | Safety-critical processes, predictive maintenance | Digital Twins, CPS | Ensures operational compliance, builds trust, improves reliability | Slower learning, limited generalization | Pilot-scale/small deployment | Key for regulatory compliance and human-centered design |
| Hybrid RL | Model-based + model-free approaches | Adaptive control, optimization under uncertainty, IoT power management | Digital Twins, CPS, IoT | Combines sample efficiency with robustness; reduces sim-to-real gap | Design complexity, computational cost | Simulation/early industrial deployment | Trend toward bridging theory and practical deployment; supports Industry 5.0 objectives |
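For the Safe & XRL category in Table 3, one widely used mechanism is action masking, in which actions predicted to violate a safety constraint are filtered out before the agent's greedy selection. The sketch below illustrates the idea only; the safety predicate, candidate actions, and Q-value estimates are hypothetical placeholders, not the constrained-RL formulations of the cited studies.

```python
import numpy as np

ACTIONS = np.array([-2.0, -1.0, 0.0, 1.0, 2.0])  # candidate control moves

def safe_mask(state, actions, limit=1.5):
    """Allow only actions that keep the (toy) next state within bounds."""
    next_states = state + actions
    return np.abs(next_states) <= limit

def masked_greedy(state, q_values):
    """Greedy action selection restricted to the safe subset."""
    mask = safe_mask(state, ACTIONS)
    if not mask.any():  # fall back to the least-unsafe action
        return int(np.argmin(np.abs(state + ACTIONS)))
    q = np.where(mask, q_values, -np.inf)  # unsafe actions can never win
    return int(np.argmax(q))

q_estimates = np.array([0.3, 0.1, 0.0, 0.4, 0.9])  # e.g., from a trained critic
state = 0.8
a = masked_greedy(state, q_estimates)
print(f"chosen action: {ACTIONS[a]}")  # +2.0 is masked out (0.8 + 2.0 > 1.5)
```

Because the filter sits outside the learned policy, it also illustrates the table's noted trade-off: the constraint guarantees compliance but shrinks the exploration space, which can slow learning.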
Table 4. Comparative Analysis: RL vs. Established Industrial Control Methods.

| Aspect | Classical Control (PID/MPC) | Optimization Algorithms | Reinforcement Learning |
|---|---|---|---|
| Model Requirements | Requires accurate system models (MPC) or transfer functions | Model-free or model-based variants | Model-free; learns from interaction |
| Adaptability | Limited; requires retuning for new conditions | Good for static optimization | High adaptability to dynamic environments |
| Handling Uncertainty | Limited performance under high uncertainty | Variable performance | Excels in stochastic, nonlinear environments |
| Sample Efficiency | High; minimal data needed for tuning | Medium to high | Very low; millions of interactions needed |
| Safety & Reliability | Proven stability guarantees | Depends on implementation | Safety concerns during exploration |
| Interpretability | Transparent and interpretable | Medium interpretability | Black-box nature limits trust |
| Computational Demand | Low to medium | Medium to high | High computational intensity |
| Integration Complexity | Mature integration with legacy systems | Moderate | Challenging integration with existing infrastructure |
| Real-World Deployment | Extensive industrial deployment | Widespread use | Limited to pilot scale (22% of studies) |
| Best Suited For | Well-defined, deterministic processes | Static optimization problems | Complex, dynamic environments with simulation capability |
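The contrast drawn in Table 4 between classical control and RL is easiest to see in code: a PID controller needs only three tuned gains and no training data, whereas an RL policy must be learned from many interactions. The following minimal sketch of a discrete-time PID loop on a toy first-order plant uses illustrative gains and dynamics, not parameters from any reviewed system.

```python
class PID:
    """Textbook discrete-time PID controller."""
    def __init__(self, kp, ki, kd, dt=0.1):
        self.kp, self.ki, self.kd, self.dt = kp, ki, kd, dt
        self.integral, self.prev_error = 0.0, 0.0

    def control(self, setpoint, measurement):
        error = setpoint - measurement
        self.integral += error * self.dt
        derivative = (error - self.prev_error) / self.dt
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative

# Toy first-order plant x' = -x + u, discretized with Euler steps.
pid, x = PID(kp=2.0, ki=0.5, kd=0.1), 0.0
for _ in range(100):
    u = pid.control(setpoint=1.0, measurement=x)
    x += 0.1 * (-x + u)
print(f"steady-state output approx. {x:.3f}")  # converges near the setpoint 1.0
```

The flip side of this sample efficiency, per the table, is rigidity: the gains above encode no adaptation, so a change in plant dynamics requires manual retuning, which is exactly the regime where RL's interaction-driven learning becomes attractive.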
Table 5. Tree Map Analysis.

| No. | Term | Frequency | Share (%) |
|---|---|---|---|
| 1 | reinforcement learning | 594 | 14.01 |
| 2 | reinforcement learnings | 358 | 8.44 |
| 3 | motion planning | 339 | 7.99 |
| 4 | deep learning | 233 | 5.49 |
| 5 | robot programming | 219 | 5.16 |
| 6 | deep reinforcement learning | 205 | 4.83 |
| 7 | path planning | 179 | 4.22 |
| 8 | learning algorithms | 119 | 2.81 |
| 9 | robotics | 105 | 2.48 |
| 10 | learning systems | 95 | 2.24 |
| … | … | … | … |
| 47 | collisions avoidance | 28 | 0.66 |
| 48 | manipulators | 27 | 0.64 |
| 49 | performance | 27 | 0.64 |
| 50 | reinforcement learning (RL) | 27 | 0.64 |
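The share column in Table 5 is simply each term's frequency divided by the total number of keyword occurrences in the corpus. A minimal sketch of that computation follows; it uses only a subset of the counts copied from the table, so the resulting shares differ from the full-corpus values reported above.

```python
from collections import Counter

# Subset of keyword counts taken from Table 5 (illustrative only).
term_counts = Counter({
    "reinforcement learning": 594,
    "reinforcement learnings": 358,
    "motion planning": 339,
    "deep learning": 233,
})

# Shares in the article are computed against the full keyword corpus; here
# we normalize by the subset total, so the percentages will not match.
total = sum(term_counts.values())
for rank, (term, count) in enumerate(term_counts.most_common(), start=1):
    print(f"{rank}. {term}: {count} ({100 * count / total:.2f}%)")
```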
Table 6. Research Terminology Trends and Chronology.

| Term | Frequency | Year (Q1) | Year (Median) | Year (Q3) |
|---|---|---|---|---|
| reinforcement learning | 593 | 2022 | 2024 | 2025 |
| reinforcement learnings | 358 | 2023 | 2024 | 2025 |
| motion planning | 339 | 2022 | 2024 | 2025 |
| learning algorithms | 119 | 2022 | 2023 | 2024 |
| learning systems | 95 | 2021 | 2023 | 2024 |
| robot learning | 87 | 2024 | 2025 | 2025 |
| machine learning | 72 | 2020 | 2023 | 2024 |
| robots | 56 | 2020 | 2021 | 2022 |
| robotics | 105 | 2021 | 2022 | 2024 |
| adversarial machine learning | 35 | 2024 | 2025 | 2025 |
| markov processes | 30 | 2020 | 2022 | 2024 |
| agricultural robots | 29 | 2020 | 2021 | 2023 |
| reinforcement learning method | 26 | 2020 | 2022 | 2025 |
| multipurpose robots | 26 | 2024 | 2025 | 2025 |
| reinforcement learning approach | 17 | 2021 | 2021 | 2023 |
| markov decision processes | 14 | 2019 | 2020 | 2024 |
| deep learning in robotics and automation | 8 | 2019 | 2020 | 2020 |
| path planning problems | 8 | 2019 | 2020 | 2022 |
| robotic manipulators | 5 | 2018 | 2018 | 2024 |
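The quartile columns in Table 6 summarize the distribution of publication years in which each keyword occurs. A minimal sketch of that computation follows, using a fabricated toy list of years rather than data from the reviewed corpus.

```python
from statistics import median, quantiles

# Hypothetical publication years for one keyword (illustrative input only).
years = [2019, 2020, 2020, 2021, 2022, 2022, 2023, 2024, 2024, 2025]

q1, _, q3 = quantiles(years, n=4)  # the three quartile cut points
print(f"Q1={q1:.0f}, median={median(years):.0f}, Q3={q3:.0f}")
```

Reading Q1 against Q3 in this way shows whether a term is emerging (both quartiles recent, e.g., "robot learning") or established and fading (early Q1, early median, e.g., "markov decision processes").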