Study on Energy Efficiency and Maintenance Optimization of Run-Out Table in Hot Rolling Mills Using Long Short-Term Memory-Autoencoders

Yun, Ju-Woong; Choi, So-Won; Lee, Eul-Bum

doi:10.3390/en18092295



by Ju-Woong Yun ¹,², So-Won Choi ¹ and Eul-Bum Lee ¹,³,*

¹ Graduate Institute of Ferrous and Eco Materials Technology, Pohang University of Science and Technology (POSTECH), Pohang 37673, Republic of Korea
² Electrical Steel Maintenance Section, Rolling Facilities Department Ⅱ, Pohang Iron and Steel Company (POSCO), Pohang 37754, Republic of Korea
³ Department of Industrial and Management Engineering, Pohang University of Science and Technology (POSTECH), 77 Cheongam-Ro, Nam-Ku, Pohang 37673, Republic of Korea
* Author to whom correspondence should be addressed.

Energies 2025, 18(9), 2295; https://doi.org/10.3390/en18092295

Submission received: 31 March 2025 / Revised: 25 April 2025 / Accepted: 28 April 2025 / Published: 30 April 2025

(This article belongs to the Special Issue Artificial Intelligence for a Sustainable Oil and Gas Industry and Energy Transition)


Abstract

The steel industry, as a large-scale equipment-intensive sector, emphasizes the importance of maintaining and managing equipment without failure. In line with the recent Fourth Industrial Revolution, there is a growing shift from preventive to predictive maintenance (PdM) strategies for cost-effective equipment management. This study aims to develop a PdM model for the Run-Out Table (ROT) equipment in hot rolling mills of steel plants, utilizing artificial intelligence (AI) technology, and to propose methods for contributing to energy efficiency through this model. Considering the operational data characteristics of the ROT equipment, an autoencoder (AE), capable of detecting anomalies using only normal data, was selected as the base model. Furthermore, Long Short-Term Memory (LSTM) networks were chosen to address the time-series nature of the data. By integrating the technical advantages of these two algorithms, a predictive maintenance model based on the LSTM-AE algorithm, named the Run-Out Table Predictive Maintenance Model (ROT-PMM), was developed. Additionally, the concept of an anomaly ratio was applied to identify equipment anomalies for each coil production. The performance evaluation of the ROT-PMM demonstrated an F1-score of 91%. This study differentiates itself by developing an optimized model that considers the specific environment and large-scale equipment operation of steel plants, and by enhancing its applicability through performance verification using actual failure data. Furthermore, it emphasizes the importance of PdM strategies in contributing to energy efficiency. It is expected that this research will contribute to increased energy efficiency and productivity in industrial settings, including the steel industry.

Keywords:

predictive maintenance (PdM); steel industry; hot rolling mill; Run-Out Table (ROT); anomaly detection; LSTM-AE; Run-Out Table Predictive Maintenance Model (ROT-PMM)

1. Introduction

1.1. Background of Study

1.1.1. Introduction to Maintenance and Energy Efficiency in Steel Manufacturing

The steel industry is a process-intensive sector that relies on large-scale equipment. Malfunctions or failures in production equipment can halt operations, causing lost production opportunities and quality defects. As a result, facility management is imperative for the steel industry. Facility management refers to all activities designed to maintain, repair, and oversee equipment effectively, as well as to prevent failures, thereby enhancing product quality and productivity [1]. Facility management practices have evolved alongside advances in industrial technology and changing site requirements; among them, predictive maintenance (PdM) has gained increasing significance in the era of the Fourth Industrial Revolution [2].

PdM refers to a maintenance approach that employs predictive tools, such as artificial intelligence (AI), statistical inference, and engineering techniques, to detect incipient faults from historical data and service equipment at the optimal moment [3]. By detecting abnormal signs of equipment faults in advance and taking appropriate action, PdM establishes a more stable operating environment while eliminating safety risks associated with unplanned maintenance. PdM is closely linked to energy efficiency [4].

Equipment operates most efficiently when it remains in optimal condition. PdM ensures equipment functions within its design parameters, such as temperature, speed, and pressure. For example, a motor with worn bearings requires more energy to deliver the same output as one with healthy bearings. PdM reduces energy waste by identifying issues, such as friction, leaks, overheating, or component wear, before they lead to inefficiencies. Well-maintained equipment demands less energy during normal operation and minimizes unexpected failures that could result in inefficient emergency interventions. In addition, PdM simultaneously prevents under-maintenance (leading to inefficiencies) and over-maintenance (wasting resources on unnecessary tasks). By scheduling maintenance only when necessary, PdM optimizes the utilization of energy and other resources required for maintenance activities.

The adoption of PdM in steelmaking and hot rolling processes at steel plants is emerging as an essential strategy for improving energy efficiency and optimizing operations. Unexpected equipment failures cause production downtime and can lead to additional energy consumption during the recommissioning of equipment. Therefore, implementing PdM plays a crucial role in optimizing energy consumption in manufacturing processes and enhancing operational efficiency.

In hot rolling processes, reheating furnaces account for approximately 60–70% of total energy consumption, making them one of the primary energy-consuming facilities. Zanoli et al. reported that the implementation of an integrated advanced process control (APC) system capable of managing all operating conditions across various types of reheating furnaces (e.g., walking beam, pusher type) reduced fuel consumption by 2–6% and enabled the real-time control of multiple furnace systems [5]. This indicates that better control and management of major energy-consuming equipment can directly reduce unnecessary energy losses.

The steelmaking process often suffers from energy inefficiencies due to harsh operating environments and unexpected equipment downtime. Implementing the InsiteAI PdM solution, which enables the real-time monitoring of critical systems such as rolling mills and cooling lines, has been reported to extend equipment lifespan, reduce maintenance costs, and support sustainable and efficient production [6]. This suggests that PdM can ultimately reduce energy costs and enhance operational efficiency.

Research on the development of PdM systems incorporating AI has also been actively conducted. In the hot rolling process of stainless-steel manufacturing, the SiMoDiM project aimed to develop a PdM system by analyzing a vast amount of historical and real-time data collected from sensors installed in the Steckel mill. The project also examined the correlation between the deployment of intelligent systems capable of implementing PdM and the optimization of energy efficiency [7]. Such advanced predictive techniques can be used to enhance energy and operational efficiency in industrial environments.

PdM also contributes to extending machinery service life, thereby saving the energy embodied in manufacturing replacement parts. Consequently, PdM can play a crucial role in enhancing sustainability by reducing energy costs, optimizing resources, and mitigating environmental impact. In this context, there is a pressing need to develop diagnostic technologies capable of preempting failures in complex systems characterized by numerous repetitive faults and extensive equipment portfolios that are challenging to manage.

1.1.2. Hot Rolling Plant Process and Run-Out Table (ROT) Failure

The manufacturing process of steel products is broadly divided into ironmaking, steelmaking, continuous casting, and rolling [8]. Ironmaking involves charging iron ore and coking coal into a blast furnace approximately 100 m high, then blowing hot air of about 1200 °C to generate heat from the combustion of coal, which melts the ore into molten iron. Steelmaking involves transferring molten iron from the blast furnace into a converter, where pure oxygen is blown in to remove impurities, such as carbon, phosphorus, and sulfur, producing molten steel. Continuous casting is the process of pouring liquid steel into a mold, then cooling and solidifying it as it passes through a continuous casting machine to produce intermediate products, such as slabs, blooms, and billets. Rolling is the process of passing material between multiple rolls to apply continuous force that elongates or thins it. It is classified into hot rolling, cold rolling, and plating. Hot rolling is the process of reheating slabs produced from continuous casting above the recrystallization temperature and rolling them to produce products with appropriate width and thickness. Cold rolling and plating are processes performed at room temperature. Cold rolling involves re-rolling and annealing hot-rolled steel sheets to produce cold-rolled products for use in automobiles, furniture, and other applications. Plating is a process that coats the surface of steel sheets with zinc to improve corrosion and contamination resistance [9].

In particular, the hot strip mill is a process in which slabs from the preceding continuous casting process are heated above the recrystallization temperature, rolled while still above that temperature, and then cooled below it to produce coil-shaped products that comply with client specifications in terms of dimensions, shape, and mechanical properties. The high-temperature material is rolled under a heavy load (5000 tons), transported at high speed (approximately 1500 m/min), and directly cooled with water, creating hostile operating conditions for equipment owing to impact during material transfer and the corrosive effects of water and moisture vapor. The hot rolling process consists of several main facilities, including the reheating furnace, roughing mill (RM), finishing mill (FM), cooling table (Run-Out Table), and down coiler (DC). The reheating furnace heats slabs produced by continuous casting to a temperature range of 1100–1300 °C to ensure rollability for hot rolling. The roughing mill is an intermediate stage where slabs from the reheating furnace are processed into bars suitable for finishing. During this process, descaling and width reduction are performed to remove surface scale and adjust the width and thickness according to order requirements. The finishing mill reduces the material to the final thickness specified by the client while also controlling metallurgical properties, surface quality, and shape through appropriate rolling temperatures for each application. After finishing, the strip is cooled on the cooling table to a target coiling temperature that ensures optimal mechanical performance. Coiling is the process of winding the hot-rolled strip into a coil [10]. Company P (POSCO, Pohang, Korea) operates six hot rolling mills with a combined production capacity of approximately 28 M tons/year.

The Run-Out Table (ROT) in hot rolling is a facility that cools the strip after it has been rolled in the FM and conveys it to the DC. In the upper part of the ROT system, a laminar bank sprays cold water onto the strip for cooling, while in the lower part, rollers convey the strip to the DC. During this conveyance, the sprayed water cools the strip to the coiling temperature and imparts the required mechanical properties. Over 300 rollers are arranged along a length of about 130 m within the ROT system. Each roller is driven by its own motor through a coupling that transmits the motor torque, and its rotation is supported by two bearings. Figure 1 illustrates the configuration of the ROT facility.

When production stops due to a failure in the ROT system, fixed processing costs continue to accrue, resulting in economic losses. Therefore, equipment faults should be minimized to prevent direct financial damage. However, ROT equipment is vulnerable to faults because it is directly exposed to material impact, high temperatures, and cooling water. Company P reported 74 ROT system failures in its four hot rolling plants from January 2014 to December 2021, resulting in a total of 94.7 h of production downtime for corrective maintenance. This corresponds to approximately nine breakdowns and 11.8 h of downtime per year.
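The annual figures follow from averaging the reported totals over the eight-year reporting window. The short Python check below reproduces them; the per-failure average it also prints is a derived quantity that the text does not state.

```python
# Totals reported for Company P's four hot rolling plants, Jan 2014 to Dec 2021
TOTAL_FAILURES = 74
TOTAL_DOWNTIME_H = 94.7
YEARS = 2021 - 2014 + 1  # inclusive reporting window: 8 calendar years

failures_per_year = TOTAL_FAILURES / YEARS  # 9.25, reported as about nine
downtime_per_year = TOTAL_DOWNTIME_H / YEARS  # about 11.84 h, reported as 11.8 h
# Derived figure, not stated in the text: average corrective-maintenance time per failure
downtime_per_failure = TOTAL_DOWNTIME_H / TOTAL_FAILURES  # about 1.28 h

print(f"{failures_per_year:.2f} failures per year")
print(f"{downtime_per_year:.1f} h of downtime per year")
print(f"{downtime_per_failure:.2f} h of downtime per failure")
```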

The most frequent causes of ROT system failures were roller bearing fracture, motor insulation failure, and coupling failure, in that order. Roller bearing fracture occurred four times per year, causing about six hours of downtime. Motor insulation failures occurred three times per year, leading to about three hours of downtime. Each year, one case of motor-to-roller coupling failure was recorded, resulting in about one hour of downtime. Additionally, failures in indoor-installed power and control systems, which are not field-mounted, occurred once annually and caused about two hours of downtime [11]. Table 1 summarizes the status of ROT equipment faults at Company P.
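As a consistency check, the per-cause annual figures can be tallied and expressed as shares of downtime. The sketch below uses only the rounded values quoted in the text; the dictionary layout is an illustrative choice.

```python
# Approximate annual ROT failure statistics by cause, as quoted in the text
annual_stats = {
    "roller bearing fracture": (4, 6.0),  # (cases per year, downtime h per year)
    "motor insulation failure": (3, 3.0),
    "motor-to-roller coupling failure": (1, 1.0),
    "indoor power/control system failure": (1, 2.0),
}

total_cases = sum(cases for cases, _ in annual_stats.values())
total_hours = sum(hours for _, hours in annual_stats.values())

for cause, (cases, hours) in annual_stats.items():
    share = hours / total_hours  # fraction of annual downtime caused by this failure mode
    print(f"{cause:40s} {cases} cases {hours:4.1f} h {share:5.1%} of downtime")

print(f"total: {total_cases} cases, {total_hours:.1f} h")
```

The rounded figures add up to nine cases and about 12 h per year, consistent with the 74 failures and 94.7 h reported for 2014 to 2021 (about 9.25 cases and 11.8 h per year). Bearing fractures alone account for roughly half of the annual downtime.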

Each equipment failure results in a loss of production opportunity, and the time required to restore the facility to normal operation causes additional production loss. Although the equipment has a relatively simple configuration and the causes of failure are generally clear, the main reason failures continue appears to be the lack of a facility management method capable of effectively diagnosing and evaluating equipment condition across the large number of rollers and part-level monitoring points. Various technologies have been applied in the field to detect equipment anomalies in advance, but ROT PdM has not yet been achieved. The authors examine this in detail in the following section.

1.2. Problem Statement and Research Objectives

Because the ROT system is directly related to production and quality, all of its more than 300 rollers must be kept in normal condition at all times, as a failure in even a single roller can affect both production and product quality. Therefore, PdM is essential for this system. Vibration measurement, acoustic diagnosis, and motor current monitoring have been applied to diagnose equipment condition and implement PdM, but none of these efforts has yet been successfully realized. First, vibration sensors attached to motors and roll bearings measure velocity and acceleration, representing a typical condition monitoring system (CMS) method [12]. Vibration measurement offers broad applicability, as it allows monitoring at all points and is codified in international standards, such as ISO 10816 [13]. However, owing to the characteristics of the ROT environment, such as strip impact and coolant exposure, data acquisition is unstable, leading to low sensor reliability and difficulty in maintaining the monitoring equipment. Next, acoustic sensors are being tested at selected points, but they are more costly than vibration sensors and lack clearly defined criteria for distinguishing between normal and abnormal conditions in the ROT system. Lastly, motor current data can be reliably collected from the drive panel located in the electrical room rather than from the equipment operation area, where vibration and moisture vapor are present. However, because identical equipment is installed in parallel, current patterns are similar across units, and frequent alarms caused by strip impact make it difficult to detect anomalies accurately in field applications. Although various technologies have been applied in the field to minimize equipment failures, none has yet reached the level of predicting abnormal signs early enough to prevent failures.

Accordingly, Company P has adopted a preventive maintenance (PM) method to prevent failures in ROT equipment and is currently implementing both time-based maintenance (TBM) and condition-based maintenance (CBM). Table 2 presents the replacement cycles and input costs for each major component of ROT equipment.

New rollers are used for five years, after which only the bearings are replaced, allowing the rollers to be reused for two additional cycles. Motors are operated for five years and then, after internal coil rewinding, reused for two additional cycles. Couplings are used for five years and then discarded without reuse. Although TBM has been used to set replacement cycles for major components, these cycles were based more on the experience of maintenance personnel than on statistical analysis. As a result, concerns remain regarding excessive maintenance costs and inadequate preparation for unexpected failures. CBM relies primarily on sensory inspections by operators because a properly functioning CMS has not been implemented, making it difficult for operators to manage all inspection tasks. In addition, inspections during equipment operation pose safety risks.
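The time-based replacement rules above can be expressed as a small schedule generator. The Python sketch below is illustrative only. It assumes that each reuse cycle also lasts five years, and it reads the motor rule as two additional cycles after rewinding; the text does not state either point explicitly. The names `TbmPolicy` and `replacement_schedule` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TbmPolicy:
    """Hypothetical time-based replacement rule for one ROT component type."""
    cycle_years: int    # service length of one cycle (assumed equal for every cycle)
    reuse_cycles: int   # additional cycles after refurbishment (0 = discard after first cycle)
    refurbishment: str  # action taken between cycles


# Policies as described in the text; equal five-year reuse cycles are an assumption.
POLICIES = {
    "roller": TbmPolicy(5, 2, "replace bearings"),
    "motor": TbmPolicy(5, 2, "rewind internal coil"),
    "coupling": TbmPolicy(5, 0, "none"),
}


def replacement_schedule(policy: TbmPolicy, install_year: int) -> list[tuple[int, str]]:
    """Return the (year, action) events from installation until final discard."""
    events = []
    year = install_year
    for _ in range(policy.reuse_cycles):
        year += policy.cycle_years
        events.append((year, policy.refurbishment))
    year += policy.cycle_years
    events.append((year, "discard and install new unit"))
    return events


for name, policy in POLICIES.items():
    print(name, replacement_schedule(policy, 2025))
```

Under these assumptions, a roller installed in 2025 has its bearings replaced in 2030 and 2035 and is discarded in 2040, which gives a 15-year total life, while a coupling lasts a single five-year cycle.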

In the context of industrial manufacturing, the paradigm shift from preventive maintenance (PM) to predictive maintenance (PdM) represents a fundamental transition toward more intelligent, cost-efficient, and performance-driven asset management. PM is typically conducted at fixed intervals regardless of the actual condition of the equipment, often resulting in unnecessary downtime and component replacements, which increase maintenance costs and reduce resource efficiency [14]. It also fails to respond flexibly to real-time operational conditions, negatively affecting production continuity and equipment reliability. In contrast, PdM utilizes real-time data, machine learning algorithms, and anomaly detection models to predict failures in advance and enable condition-based maintenance, significantly reducing maintenance costs and operational downtime [15]. PdM not only extends equipment lifespan through timely diagnostics but also enhances energy efficiency by ensuring that machinery operates within optimal performance ranges. Reported industrial applications indicate that PdM can substantially reduce failure rates, downtime, and spare parts consumption. Therefore, considering the limitations of PM and the strategic advantages of PdM, adopting PdM as a core maintenance strategy in AI-driven smart manufacturing environments is justified.

Therefore, this study was conducted to explore a method for implementing PdM for ROT systems using advanced intelligent technologies. To this end, the study proposes an AI-based model that more accurately predicts anomalies in ROT equipment using operational data from Company P. Considering the absence of abnormal labels and the time-series nature of the ROT operational data, this study employed a long short-term memory autoencoder (LSTM-AE), a reliable deep anomaly detection technique. Based on this approach, the ROT predictive maintenance model (ROT-PMM) was developed as a fault prediction model for ROT equipment.
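The decision logic that follows an LSTM-AE can be sketched without the network itself. An autoencoder trained only on normal data reconstructs each window of motor-current data, and large reconstruction errors indicate behavior unlike anything seen in training. The Python sketch below assumes the reconstructions are already available and applies the anomaly-ratio idea from the abstract to a single coil. The per-step mean squared error, the function names, and both threshold values are illustrative assumptions, not the ROT-PMM settings.

```python
from typing import Sequence


def reconstruction_errors(x: Sequence[Sequence[float]],
                          x_hat: Sequence[Sequence[float]]) -> list[float]:
    """Per-time-step mean squared error between input features and their reconstruction."""
    if len(x) != len(x_hat):
        raise ValueError("input and reconstruction must have the same number of steps")
    errors = []
    for step, step_hat in zip(x, x_hat):
        if len(step) != len(step_hat) or not step:
            raise ValueError("each step needs the same, non-zero number of features")
        errors.append(sum((a - b) ** 2 for a, b in zip(step, step_hat)) / len(step))
    return errors


def anomaly_ratio(errors: Sequence[float], point_threshold: float) -> float:
    """Share of time steps whose reconstruction error exceeds the point threshold."""
    if not errors:
        return 0.0
    return sum(e > point_threshold for e in errors) / len(errors)


def coil_is_anomalous(errors: Sequence[float], point_threshold: float,
                      ratio_threshold: float) -> bool:
    """Flag a coil when its anomaly ratio exceeds the ratio threshold (hypothetical rule)."""
    return anomaly_ratio(errors, point_threshold) > ratio_threshold


# Toy example: one coil, four time steps, one feature (e.g., normalized motor current).
# The reconstructions stand in for LSTM-AE output; all numbers are hypothetical.
x = [[1.0], [1.0], [1.0], [1.0]]
x_hat = [[1.0], [0.9], [1.5], [0.2]]
errs = reconstruction_errors(x, x_hat)
print(anomaly_ratio(errs, point_threshold=0.1))
print(coil_is_anomalous(errs, point_threshold=0.1, ratio_threshold=0.3))
```

Scoring each coil by the share of anomalous time steps, rather than by a single error spike, is one way to damp isolated alarms such as those caused by strip impact.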

The ROT-PMM developed in this study was applied directly to the field rather than being limited to a proof-of-concept (PoC). For this purpose, field data were used to optimize hyper-parameters, validate prediction accuracy, and refine threshold criteria through an in-depth analysis. A dedicated server for the AI model was also installed and networked with the existing data acquisition system, enabling the on-site operation of ROT-PMM and the delivery of anomaly prediction results to the engineers. By considering production conditions and current maintenance practices, this study demonstrates the economic benefits of improving ROT productivity and extending replacement cycles, distinguishing it from previous research.
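When only normal data are available, one common way to set a reconstruction-error threshold is to take a high percentile of the errors observed on normal validation data, which trades sensitivity against false alarms. The sketch below illustrates that generic approach. The nearest-rank percentile rule, the 99th-percentile default, and the validation values are assumptions; they are not the study's threshold criteria, which were refined from field data.

```python
import math
from typing import Sequence


def percentile_threshold(normal_errors: Sequence[float], pct: float = 99.0) -> float:
    """Nearest-rank percentile of reconstruction errors measured on normal-only data."""
    if not normal_errors:
        raise ValueError("need at least one error value")
    if not 0.0 < pct <= 100.0:
        raise ValueError("pct must lie in (0, 100]")
    ordered = sorted(normal_errors)
    rank = math.ceil(pct * len(ordered) / 100.0)  # 1-based nearest rank
    return ordered[rank - 1]


def false_alarm_rate(normal_errors: Sequence[float], threshold: float) -> float:
    """Share of normal samples that the threshold would still flag as anomalous."""
    return sum(e > threshold for e in normal_errors) / len(normal_errors)


# Hypothetical validation errors from normal operation
validation_errors = [i / 100 for i in range(1, 101)]
for pct in (95.0, 99.0):
    thr = percentile_threshold(validation_errors, pct)
    print(pct, thr, false_alarm_rate(validation_errors, thr))
```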

1.3. Overall Research Process

The overall workflow of this study is outlined as follows. Section 3 describes the data collection and preprocessing procedures used for model development. The analyzed data are current measurements from the motors that transmit torque to the ROT drive equipment, and preprocessing was conducted to facilitate model development: irrelevant features were eliminated, and outliers and missing values were identified and addressed to shape the dataset for effective model training. In Section 4, the authors review the techniques required for model development and adopt LSTM-AE, which integrates the advantages of LSTM and AE. Based on this, ROT-PMM, a failure prediction model for the ROT drive facility, is developed. For model training, the dataset was divided into training data and validation data, and the hyper-parameters were then tuned to identify the optimal fault prediction model. In Section 5, the authors evaluate the performance of the fault prediction model selected in Section 4. Test data from actual fault cases in ROT equipment were used, and model performance was evaluated by calculating precision, recall, accuracy, and F1-score from a confusion matrix. In Section 6, the authors discuss whether the ROT-PMM developed in this study can predict actual anomalies in advance and elaborate on the system implementation methodology. Finally, Section 7 assesses the quantitative impacts achieved through the implementation of ROT-PMM, incorporating the opportunity loss due to downtime and the cost structure of TBM. In other words, by predicting potential failures in the ROT drive system and responding proactively, the model demonstrated economic benefits through reduced production loss and lower maintenance costs from extended periodic replacement cycles. Figure 2 illustrates the overall workflow of this study.
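The evaluation metrics named above are all computed from the four cells of a binary confusion matrix, with "anomalous" as the positive class. The Python sketch below shows those standard definitions. The counts in the example are hypothetical and are not the study's results.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionMatrix:
    """Binary confusion matrix with "anomalous" as the positive class."""
    tp: int  # anomalous cases correctly flagged
    fp: int  # normal cases wrongly flagged
    tn: int  # normal cases correctly passed
    fn: int  # anomalous cases missed

    def precision(self) -> float:
        flagged = self.tp + self.fp
        return self.tp / flagged if flagged else 0.0

    def recall(self) -> float:
        actual = self.tp + self.fn
        return self.tp / actual if actual else 0.0

    def accuracy(self) -> float:
        total = self.tp + self.fp + self.tn + self.fn
        return (self.tp + self.tn) / total if total else 0.0

    def f1(self) -> float:
        # Harmonic mean of precision and recall
        p, r = self.precision(), self.recall()
        return 2 * p * r / (p + r) if p + r else 0.0


# Hypothetical counts for illustration only; not the study's results
cm = ConfusionMatrix(tp=45, fp=5, tn=40, fn=10)
print(f"precision={cm.precision():.3f} recall={cm.recall():.3f} "
      f"accuracy={cm.accuracy():.3f} f1={cm.f1():.3f}")
```

F1 is often preferred over accuracy for fault data because normal samples usually far outnumber faulty ones. In that situation, a model that never raises an alarm can still score a high accuracy.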

2. Literature Review

Facility management can be classified into post-maintenance, preventive maintenance (PM), and PdM [16]. Post-maintenance is conducted after a failure has occurred. PM aims to prevent failures before they occur and comprises two methods: TBM, based on time, and CBM, based on facility condition. Enabled by technological advances such as smart manufacturing, the Internet of Things (IoT), data mining, and AI, PdM has been proposed as an innovative maintenance paradigm in which maintenance is performed only when analytical models predict equipment faults or degradation.

Although PM and PdM share the goal of preventing equipment faults, they differ in approach. This section therefore analyzes the limitations of PM and reviews research trends in PdM and their relevance to this study. Furthermore, the authors examine trends in ROT equipment research and, through a review of prior studies on ROT equipment, establish the need for this study.

2.1. Studies on Preventive Maintenance Optimization

PM is classified into TBM and CBM. TBM performs maintenance according to a set time schedule, and research on TBM has mainly focused on methods for determining maintenance timing with respect to faults and economics. Wang et al. proposed a replacement scheduling methodology that builds and evaluates a hierarchical structure to prevent equipment performance degradation and maximize profit [17]. To accomplish this, they identified key components by considering their replaceability, failure consequences, and life span, and prioritized their replacement based on reliability and economic evaluation. To prevent equipment faults, Satow et al. suggested a mathematical cumulative damage model that tracks the cumulative impact applied to the equipment over time and replaces it when a threshold level is exceeded [18]. Chan and Asgarpoor proposed a method, based on a Markov decision process, to calculate the probabilities of equipment states and determine the optimal maintenance interval that maximizes equipment utilization [19]. Crowder and Lawless studied the general principles and methodology of economic maintenance under the assumption that equipment can no longer operate normally once its wear reaches a critical level [20]. They performed complex statistical modeling to evaluate the economic replacement point according to the wear life distribution of each component, but also stated that sufficient data, such as service life and wear rate, must first be accumulated. Panagiotidou and Tagaras presented an economic model for optimizing preventive maintenance in a production process [21], together with a mathematical methodology that determines the optimal time to perform maintenance by considering the failure rate, profit, and failure probability.
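The cumulative damage idea attributed to Satow et al. can be illustrated with a minimal threshold rule. Damage from each impact accumulates, and the component is replaced once the running total first exceeds a critical level. The sketch below is a deterministic simplification for illustration only and does not reproduce the cited model. The damage values, units, and critical level are hypothetical.

```python
from typing import Iterable, Optional


def replacement_index(damage_per_shock: Iterable[float], critical_level: float) -> Optional[int]:
    """Return the 0-based index of the shock at which cumulative damage first
    exceeds the critical level (the replacement point), or None if it never does."""
    cumulative = 0.0
    for i, damage in enumerate(damage_per_shock):
        if damage < 0:
            raise ValueError("damage increments must be non-negative")
        cumulative += damage
        if cumulative > critical_level:
            return i
    return None


# Hypothetical damage increments from successive strip impacts (arbitrary units)
shocks = [2.0, 1.0, 4.0, 3.0, 5.0]
print(replacement_index(shocks, critical_level=10.0))
```

Here the cumulative damage runs 2, 3, 7, 10, 15, so the critical level of 10 is first exceeded at the fifth shock (index 4).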

Next, CBM is a maintenance method that supports maintenance decisions based on information collected through equipment condition monitoring. Representative CBM techniques include vibration, acoustic, temperature, and lubricant condition monitoring. Jayaswal et al. presented a brief overview of the technological advances in machine fault detection, and discussed machine fault diagnosis techniques, including rolling bearing fault diagnosis through vibration analysis [22]. Márquez et al. introduced studies on vibration analysis, acoustic analysis, lubrication analysis, temperature analysis, ultrasound, and stress as CMS techniques for monitoring the condition of wind turbines [23]. Bagavathiappan et al. analyzed the principle and application of non-contact infrared thermal imaging camera technology for detecting an abnormal temperature of devices [24]. Zhu et al. provided information on the latest online sensor technologies (e.g., optical, acoustic) for measuring lubricant properties (e.g., wear, viscosity, corrosion, moisture) as an effective approach to determine the health of a machine and provide early warning of the progression of machine faults [25].
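In rolling bearing fault diagnosis through vibration analysis, spectral peaks are typically compared with the classical kinematic defect frequencies of the bearing. This is relevant here because roller bearing fracture is the most frequent ROT failure cause. The Python sketch below implements the standard textbook formulas. It assumes no slip, a stationary outer race, and an inner race that rotates with the shaft. The example geometry and shaft speed are hypothetical, not ROT bearing data.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class BearingGeometry:
    n_elements: int          # number of rolling elements
    element_diameter: float  # d, same length unit as pitch_diameter
    pitch_diameter: float    # D
    contact_angle_deg: float = 0.0


def defect_frequencies(g: BearingGeometry, shaft_hz: float) -> dict[str, float]:
    """Kinematic bearing defect frequencies in Hz (no slip, stationary outer race)."""
    ratio = g.element_diameter / g.pitch_diameter * math.cos(math.radians(g.contact_angle_deg))
    return {
        "FTF": shaft_hz / 2 * (1 - ratio),                      # cage (fundamental train)
        "BPFO": g.n_elements * shaft_hz / 2 * (1 - ratio),      # outer-race defect
        "BPFI": g.n_elements * shaft_hz / 2 * (1 + ratio),      # inner-race defect
        "BSF": g.pitch_diameter / (2 * g.element_diameter)
        * shaft_hz * (1 - ratio ** 2),                          # rolling-element spin
    }


# Hypothetical bearing: 8 elements, d = 10 mm, D = 40 mm, shaft at 10 Hz
print(defect_frequencies(BearingGeometry(8, 10.0, 40.0), shaft_hz=10.0))
```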

Wang et al. proposed a state-based PM policy and an optimization approach using the semi-Markov decision process (SMDP) to optimize PM in balanced systems [26]. Akl et al. developed a simulation–optimization framework that integrates a discrete event simulation model, a differential evolution (DE) algorithm, and k-means clustering to optimize PM scheduling [27]. Lolli et al. introduced a maintenance policy and decision support system (DSS) by comparing additive manufacturing (AM) and conventional manufacturing (CM) components for optimizing PM and spare parts management [28]. Shi et al. proposed a PM strategy optimization (PMSO) model to enhance structural reliability and minimize maintenance costs [29]. Su et al. developed an adaptive learning framework based on multi-agent reinforcement learning (MARL) to establish a cost-efficient PM policy for large-scale manufacturing systems [30]. Dui et al. proposed a PM strategy incorporating importance measures to optimize the reliability and maintenance cost of industrial robotic systems [31]. Li et al. introduced a multiple degradation-driven PM (MDPM) policy to address machine reliability degradation and production rate (PR) reduction in serial-parallel and multi-station manufacturing systems (SMMS) [32]. An et al. developed a hybrid multi-objective evolutionary algorithm (HMOEA) by integrating real-time order acceptance (ROA) and condition-based PM (CBPM) to solve the adaptive flexible job-shop rescheduling problem [33]. Wu et al. proposed an availability assessment method for performance sharing systems subject to random shocks, utilizing a Markov process and universal generating function (UGF) to optimize PM [34]. Zhang et al. established a maintenance optimization strategy for the air conditioning temperature regulation subsystem, implementing medium repair, major repair, and replacement based on real-time risk level variations to assess system reliability and risk [35].

The review of prior PM research shows that, in terms of TBM, studies have developed economical PM methodologies that consider the prevention of equipment deterioration and faults and the resulting maintenance costs. From the CBM perspective, many commercial sensor technologies that can measure specific equipment conditions have already been developed, and numerous studies have addressed the principles and application cases of CBM. However, TBM requires regular inspections and maintenance even when equipment has not failed, and it may be economically inefficient because sudden faults between scheduled interventions still incur costs. Furthermore, CBM mainly measures specific conditions, which limits its ability to predict the overall condition and lifetime of equipment. Because of these limitations, it is difficult for PM to achieve an optimal maintenance approach that balances reliability and economics.

2.2. Predictive Maintenance Using AI

The PdM strategy has emerged as a prominent equipment management strategy, drawing considerable industry attention in the Industry 4.0 era [36]. PdM is characterized by the application of predictive tools to determine the appropriate timing for maintenance [37]. AI algorithms have become powerful tools for PdM owing to their capability to handle multivariate data and extract hidden relationships within data in complex and dynamic environments [37]. Orru et al. developed a predictive model based on support vector machine (SVM) and multi-layer perceptron (MLP) algorithms for the early fault detection of centrifugal pumps in the oil and gas industry [38]. Hsu et al. developed a method for early fault prediction in wind turbines by detecting anomalies using statistical process control (SPC) and predicting failures with decision tree and random forest (RF) models [39]. Cheliotis et al. developed a regression-based fault detection model for ship engines, employing power, speed, and scavenging air pressure as inputs to predict exhaust gas temperature (EGT) [40]. Serradilla et al. implemented an anomaly detection system employing a 2D convolutional neural network-autoencoder (2D-CNN-AE) model to detect faults in press machines [41]. Khalid et al. proposed a machine learning (ML)-based optimal sensor selection approach to detect boiler tube leaks and turbine electric motor failures in thermal power plants [42]. Shi et al. proposed a fault diagnosis method integrating natural language processing (NLP) and ML to analyze fault records of signal equipment stored as unstructured text, thereby contributing to the automation of railway maintenance systems and supporting stable railway operations [43]. Oh et al. developed supervised and unsupervised learning models based on sensor data from hydraulic systems for fault detection in manufacturing equipment [44]. Ghazali et al. developed an RF-based automated twisted-pair cable fault diagnosis system to replace conventional manual fault detection methods in communication network maintenance [45]. Choi et al. developed a tap temperature prediction model (TTPM) employing a support vector regression algorithm to predict tap temperature in real time and automatically optimize power input, aiming to enhance efficiency in the electric arc furnace (EAF) process [46]. However, this study was limited in that it relied solely on information regarding the type and weight of the scrap. Choi et al. developed a laser welder PdM model (LW-PMM) based on a long short-term memory-autoencoder (LSTM-AE) to automatically detect equipment faults in laser welders used in continuous galvanizing lines (CGL) [47]. Hadi et al. developed an AutoML model using PyCaret to automatically identify various types of faults in order to improve the maintenance process for rolling-element bearings (REBs) [48]. Chandu proposed a PdM framework employing ANN, RF, CART, and LR algorithms, utilizing 944 IoT sensor data points and 10 features, to provide PdM solutions for industrial manufacturers [49]. Bampoula et al. proposed a PdM approach combining LSTM-Autoencoders and a Transformer encoder to predict asset failures and estimate remaining useful life (RUL) for the metal processing industry [50]. Faizanbasha and Rizwan developed an advanced semi-Markov decision process (SMDP) model that, for the first time, integrates the burn-in process and PdM in a two-unit series manufacturing system (TUMS), aiming to enhance system reliability and maintenance efficiency in manufacturing industries [51]. Chapelin et al. proposed a data-driven drift detection and diagnostics framework employing novelty detection, ensemble learning, and continuous learning to support PdM for heterogeneous processes [52]. Zhang et al. developed a multi-channel MLP model without convolutional architectures to perform accurate PdM using high-dimensional key performance indicator (KPI) sequences in cloud-edge service systems, demonstrating that effective PdM is achievable without CNNs [53].

2.3. Recent Studies on Fault Prognosis

Guo et al. developed a hybrid fault-prognosis method that combines a non-linear Wiener process with a constructed health indicator (HI) to predict the RUL of rolling bearings, thereby enhancing the reliability of rotating systems [54]. Yang et al. proposed a mission reliability-centered opportunistic-maintenance optimization model for multi-state manufacturing systems, optimizing maintenance scheduling and resource allocation [55]. Huai et al. introduced a weighted moving average-based intermittent fault (IF) detection algorithm for fault prognosis in linear stochastic systems subject to IFs and strong noise; the method extracts HIs and predicts RUL from them [56]. Liu et al. proposed a task-free continual learning-based online fault prognosis framework for real-time RUL prediction in dynamic industrial environments [57]. Kim et al. developed a deep learning-based prognostics and health management (PHM) model to enhance the reliability of safety valves operating in extreme environments in oil and gas plants, enabling real-time anomaly detection and contributing to the prevention of system failures [58]. Wang et al. proposed a deep learning-based framework for bearing RUL prediction in intelligent manufacturing that efficiently represents degradation states and resolves temporal dependency issues, ensuring high reliability and optimizing rotating machinery performance [59].

2.4. Research Trends for Run-Out Tables

Several studies have investigated the ROT of the hot rolling process, which is the subject of this study. Li et al. developed a real-time cooling temperature monitoring system by combining a finite-difference model based on Fourier’s heat equation with a soft-sensing technique to enhance the control accuracy of ROT cooling temperature [60]. Sugihara et al. proposed a laminar stability index to quantitatively evaluate the stability of laminar flow in the ROT cooling process, ensuring stable water flow from the nozzles [61]; this contributed to uniform cooling and improved steel plate quality. Woo et al. proposed an air jet impingement system to reduce strip wave, which occurs on the ROT before the upper part of the steel strip reaches the coiler mandrel [62]. Aoe et al. proposed a method to determine the critical stable threading velocity of the ROT by applying a numerical analysis technique based on multi-body dynamics (MBD) to address the running instability of steel strips on the ROT in the hot rolling process [63]. Luk’yanov et al. proposed an optimization strategy for controlling the ROT drive system by developing an electric drive torque calculation model that minimizes the velocity difference between the strip and the rollers, thereby reducing roller wear and maintenance costs [64]. Tatebe et al. applied the inverse solution of the heat conduction equation to evaluate the surface heat flux of moving high-temperature steel strips during ROT cooling, enabling precise temperature control [65]; they numerically calculated the amount of heat removed in the water jet impingement zone. Wu et al. developed mathematical models based on water flow central crown and center wave compensation control to control the residual stress arising during ROT cooling, improving flatness after rolling [66]. Yazdani et al. developed a 2D finite element model (FEM) to perform inverse heat conduction (IHC) analysis in a limited computational domain to optimize the ROT cooling pattern [67]. Jena et al. developed a strip buckling model using the Hamiltonian system in symplectic space to mathematically analyze how residual stresses arising in the ROT cooling stages of hot-rolled strips affect flatness, thereby estimating edge wave defects [68].

Existing studies on ROT have primarily focused on process improvements, while cases that directly apply PdM to fault prediction and preventive maintenance remain uncommon. Implementing PdM in ROT systems requires an ML-based predictive model that can detect anomalies, such as cooling performance degradation, roller wear, and strip wave formation, in advance and optimize maintenance schedules. Future research should expand toward using AI technology to enhance anomaly detection and fault prediction for ROT equipment. Table 3 presents an overview of previous studies by category.

2.5. Limitations of Previous Studies

Based on the results of the previous studies, the authors found that, as AI technology has advanced, research aimed at economical PdM has been actively conducted in the field of facility management. The key characteristic observed in previous studies is that most research focuses on developing AI models for the early detection of equipment anomalies for PdM. These studies typically use operational data, select appropriate AI algorithms to develop models, and evaluate their performance. Additionally, the authors reviewed studies on rotating machinery, such as pumps, motors, and turbines, which share similarities with ROT-driven equipment. However, research related to hot rolling ROT equipment has primarily focused on production and quality rather than on integrating AI techniques for fault prediction. Furthermore, the application of AI-based PdM technologies to hot rolling ROT equipment in real industrial settings remains limited. Another notable observation is that AI-based technologies in existing studies are predominantly applied to single equipment units or to fault detection of specific components. There is a lack of research on comprehensive monitoring and fault prediction for large-scale, interconnected systems, such as hot rolling ROT equipment, in which more than 300 complex machines operate simultaneously in an integrated manner.

Consequently, it is necessary to develop a PdM technology that reflects the characteristics of hot rolling ROT equipment and can be practically applied in industrial settings. Based on the findings from the literature review, this study developed a PdM model for ROT equipment using an LSTM-AE algorithm. Given that the operational data of ROT equipment is time-series data and lacks labeled abnormal data for supervised learning, the study employed an unsupervised learning approach, leveraging the strong performance of LSTM-AE for anomaly detection.

3. Data Preparation

In this section, the authors describe how operational data were acquired and processed to develop the ROT fault prediction model. Preprocessing, such as selecting fault-related data and removing outliers and missing values, is a crucial step for improving model training and performance.

3.1. Data Collection

Initial data collection was performed on the ROT equipment of Company P’s C hot rolling facility to conduct this study. To implement PdM, real-time sensor data from the ROT equipment were collected and centrally managed through the Iba platform developed by Company P. Iba is a system that integrates and stores production and equipment operation data, supporting both macro and micro-level analysis and the development of AI models [69]. Figure 3 illustrates the overall process from data collection to ROT-PMM model development using the Iba platform.

Process data collected from the production site were stored in the Programmable Logic Controller (PLC), while sensor data were saved in the data acquisition (DAQ) server and then transferred to the manufacturing execution system (MES) of the Iba platform for long-term storage. The data stored in the MES undergo preprocessing, such as handling missing values and removing outliers, after which the model is developed, trained, and deployed through the Iba platform. Sensor and process data from the ROT equipment at the production site are transmitted to the DAQ system and stored in a centralized database. The DAQ system collected data at 100 ms intervals, generating one data point every 0.1 s. For the development of the PdM system, equipment operation data were collected over 31 days, from 20 March to 19 April 2023. Because each of the 366 motors was sampled 10 times per second, 3660 data points were acquired per second. Based on a daily duration of 86,400 s, the total daily data volume was 3660 × 86,400 = 316,224,000 data points, and over 31 days of continuous collection a total of 9,802,944,000 data points were accumulated. The data volume amounted to 70 MB per hour, 1.7 GB per day, and approximately 40.3 GB over one month, indicating a substantial dataset. Table 4 presents the data acquisition configuration and the resultant data volumes for the ROT equipment.
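The sample counts above follow directly from the acquisition settings. The following arithmetic check is a minimal sketch. It assumes, as the 100 ms interval and the 366 motors imply, that the 3660 points per second correspond to 366 motor-current channels each sampled 10 times per second.

```python
# Reproduces the motor-current data-volume arithmetic from the text.
MOTORS = 366              # ROT roller drive motors
SAMPLES_PER_SECOND = 10   # one sample every 100 ms per motor (assumed per-channel rate)
SECONDS_PER_DAY = 86_400
DAYS = 31                 # 20 March to 19 April 2023

points_per_second = MOTORS * SAMPLES_PER_SECOND
points_per_day = points_per_second * SECONDS_PER_DAY
points_total = points_per_day * DAYS

print(f"{points_per_second:,} points/s")    # 3,660 points/s
print(f"{points_per_day:,} points/day")     # 316,224,000 points/day
print(f"{points_total:,} points in total")  # 9,802,944,000 points in total
```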

Table 5 outlines the structure, count, and data types (set or process values) associated with ROT time, production, and operation data. The ROT operation data are classified into three main categories and consist of 16 variables encompassing 431 items.

As shown in Table 5, the ROT operation data can be categorized into time, production, and equipment operation information. The time information includes two types: date and time. The production information consists of five types: coil number, steel grade, thickness, width, and weight. The operation information contains nine types, including motor current, ROT speed, and FM speed. A total of 431 data items across 16 types were collected, resulting in 9,802,944,000 data points.

Chronologically ordered data are referred to as time-series data, and this ordering plays a key role in data analysis and algorithm selection. The production information includes detailed attributes, such as coil number, steel grade, and thickness, providing a comprehensive view of production conditions. In addition to operation parameters, such as FM and ROT speeds, the equipment operation dataset contains current data from 366 motors, which provides real-time load information for each motor.

This study sampled the current of the 366 ROT roller drive motors at 100 ms intervals (10 samples per second per motor, or 3660 samples per second in total) for real-time analysis. Because the anomaly status of each of the 366 motors must be assessed during each coil winding process, a large volume of time-series data is generated over the 120–150 s duration of each winding cycle. To minimize latency in processing high-frequency data, preprocessing and CSV file generation were implemented in C to improve storage and access efficiency, while the AI-based anomaly detection and prediction model was executed in a Python 3.7 environment.

Figure 4 shows the CSV format of the ROT operation data collected through Company P’s DAQ system. This CSV file contains the same data items as those listed in Table 4.

The ROT system in the hot rolling process of steel mills operates under harsh industrial conditions, such as high temperatures, moisture, and mechanical impact. Therefore, selecting an algorithm that carefully accounts for data characteristics is essential when constructing a PdM model. The ROT system studied in this research consists of 366 motors and operates as a high-frequency parallel system that collects current data from each motor at 100 ms intervals, producing a large-scale time-series dataset. Moreover, because labeled data for abnormal conditions are absent, anomaly detection methods based on unsupervised learning are required rather than supervised approaches. The time-series structure of the data, the lack of labels, and the need for robustness against real-world noise impose technical constraints on the selection of applicable algorithms. Table 6 summarizes how the structural characteristics of the ROT data from the hot rolling process map onto the requirements for algorithm selection.

Given the characteristics of the operational data collected from the ROT, the selected algorithm must be capable of learning time-series structures, supporting parallel scalability, ensuring robustness to noise, and performing label-free anomaly detection.

This study developed a predictive model based on motor current data, but single-variable analysis is insufficient to comprehensively capture diverse fault patterns. Therefore, future studies should incorporate multivariate data analysis to improve fault prediction accuracy by considering the correlations among multiple signals.

However, the harsh conditions of the hot rolling process, including high temperatures, mechanical impact, and moisture exposure, made it difficult to install vibration and temperature sensors on the ROT system investigated in this study. Future studies should collect additional variables closely related to equipment failures, such as vibration, temperature, rotational speed, and bearing temperature, together with current data, to develop a model that more precisely reflects the characteristics of each fault type.

3.2. Data Preprocessing

Model performance is significantly influenced by the quality of the data used in analysis, and excessively large datasets can degrade analytical efficiency; preprocessing is therefore a critical step [70]. For fault prediction model development, the current data from 366 motors, collected as time-series data, were selected for analysis. Motor current is known to correlate strongly with faults, such as bearing defects and motor anomalies [71]. When anomalies, such as bearing failures or coupling detachment, occur during ROT roller operation, they directly affect the motor current load. Because overcurrent or undercurrent may occur under such conditions, detecting early anomalies in motor current data is expected to enable the prediction of ROT system failures. Table 7 summarizes the information that motors can provide during ROT anomalies, as identified by experts with domain knowledge in motor systems. Mechanically, overcurrent occurs when a roller bearing fractures, and undercurrent arises when a coupling fails and disconnects the load. Additionally, abnormalities in the base bolt can cause vibration, bearing faults in motors lead to overcurrent, and insulation issues in the motor or cable reduce insulation resistance.

The real-time operational data of the ROT equipment stored in the MES require preprocessing before they can be used for model training. This study used large-scale raw data, which were preprocessed using the Iba system developed by Company P. A total of 9,802,944,000 motor current data points were collected from the ROT equipment over 31 days at an aggregate rate of 3660 samples per second (10 Hz per motor), and four preprocessing steps were applied before model training.

First, missing and negative values were eliminated from the dataset. The elimination of missing values is a crucial step in handling incomplete data, as it enhances data integrity and improves the performance of machine learning models [72]. Null and negative values caused by communication errors unrelated to equipment faults were entirely excluded. As a result, 1,960,589 data points, accounting for 0.02% of the total, were eliminated.
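The NumPy sketch below illustrates this first step under one assumption: the motor currents sit in a (time steps × motors) array, with missing readings stored as NaN. The function name `valid_mask` and the toy values are illustrative. The function returns a boolean mask instead of deleting entries, which keeps the time axis aligned across motors.

```python
import numpy as np


def valid_mask(current: np.ndarray) -> np.ndarray:
    """True where a motor-current reading is usable (finite and non-negative).

    `current` is assumed to be a (time_steps, motors) array with missing readings stored as NaN.
    """
    finite = np.isfinite(current)
    # Replace non-finite entries before comparing so NaN never reaches the >= test.
    return finite & (np.where(finite, current, 0.0) >= 0.0)


# Toy example: 4 time steps x 3 motors (illustrative values, not plant data).
current = np.array([
    [12.1, 11.8, np.nan],
    [12.3, -0.4, 10.9],
    [12.0, 11.9, 11.1],
    [np.nan, 12.0, 11.0],
])
mask = valid_mask(current)
print(int((~mask).sum()))  # 3 readings would be removed
```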

Second, failure buffer intervals were excluded. Failure buffer interval removal refers to the process of excluding data collected within a specific time window before and after true fault events, as these data may be distorted by early symptoms or maintenance activities yet still labeled as normal [73]. In this study, a ±1 h interval surrounding the coupling failure event on 6 January 2023 was excluded from the training data, resulting in the removal of an additional 26,352,000 data points (0.27%).
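Buffer exclusion can be expressed as a time mask. This sketch assumes timestamps in seconds on a common clock, and `outside_failure_buffer` is a hypothetical helper. The final lines confirm that a ±1 h window at 3660 points per second contains the 26,352,000 points reported above.

```python
import numpy as np

POINTS_PER_SECOND = 3660  # aggregate motor-current sampling rate from the text


def outside_failure_buffer(t_seconds, failure_times_s, buffer_s=3600.0):
    """Boolean keep-mask: False for samples within +/- buffer_s of any failure time."""
    t = np.asarray(t_seconds, dtype=float)
    keep = np.ones(t.shape, dtype=bool)
    for t_fail in failure_times_s:
        keep &= np.abs(t - t_fail) > buffer_s
    return keep


# Toy timeline: one sample every 1000 s, failure at t = 5000 s.
t = np.arange(0, 10_000, 1000)
keep = outside_failure_buffer(t, [5000.0])
print(keep.astype(int))  # [1 1 0 0 0 0 0 0 0 1]

# Points inside a +/-1 h buffer at 3660 points per second.
buffer_points = 2 * 3600 * POINTS_PER_SECOND
print(f"{buffer_points:,}")  # 26,352,000
```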

Third, the head/tail shock intervals were eliminated. Elimination of head/tail shock intervals refers to excluding transient signals occurring immediately after equipment startup (head) or shutdown (tail), thereby ensuring that the model learns only from stable operating conditions [74]. In this study, since the collection of motor current data was deemed stable, all data were used to train the model on normal operating conditions. However, during untensioned segments of operation where the strip is not engaged with either the finishing mill or the down coiler, the head and tail ends can cause abrupt fluctuations in motor load. These fluctuations may degrade model performance, so the corresponding data were removed. In this study, a 10 s interval for each coil corresponding to the strip’s head and tail sections between the FM and DC was excluded. Assuming an average of 700 coils per day over 31 days, 79,422,000 data points (0.81%) corresponding to these intervals were removed.
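The following sketch shows one way to trim the head and tail of each coil. It assumes that every time step carries the number of the coil being produced and that the samples of one coil are contiguous. `trim_head_tail` and its `head_s`/`tail_s` parameters are hypothetical. The text specifies a 10 s interval per coil but does not say how it is divided between head and tail, so both durations are left as inputs.

```python
import numpy as np

SAMPLE_PERIOD_S = 0.1  # 100 ms DAQ interval


def trim_head_tail(coil_ids, head_s, tail_s, period_s=SAMPLE_PERIOD_S):
    """Keep-mask that drops the first head_s and last tail_s seconds of each coil.

    coil_ids gives the coil number at every time step; samples of one coil are
    assumed to be contiguous.
    """
    coil_ids = np.asarray(coil_ids)
    keep = np.ones(coil_ids.shape, dtype=bool)
    n_head = int(round(head_s / period_s))
    n_tail = int(round(tail_s / period_s))
    # Indices where a new coil begins, and the matching end indices.
    starts = np.flatnonzero(np.r_[True, coil_ids[1:] != coil_ids[:-1]])
    ends = np.r_[starts[1:], coil_ids.size]
    for start, end in zip(starts, ends):
        keep[start:min(start + n_head, end)] = False
        keep[max(end - n_tail, start):end] = False
    return keep


# Toy example: two coils of 10 samples each; drop 0.2 s of head and 0.1 s of tail.
coil_ids = np.array([101] * 10 + [102] * 10)
keep = trim_head_tail(coil_ids, head_s=0.2, tail_s=0.1)
print(int(keep.sum()))  # 14
```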

Fourth, outliers in the motor current data were eliminated. Outliers are values that deviate significantly from the typical range of the collected data, being either excessively high or low. Such data often result from process instability, measurement errors, or data contamination, and must be addressed or removed before analysis to prevent model distortion [75]. Depending on the statistical characteristics of the data, outliers are typically detected using the z-score method for normally distributed data and the interquartile range (IQR) method for non-normally distributed data [76]. During the IQR-based removal stage, values below Q1 − 1.5 × IQR or above Q3 + 1.5 × IQR for each variable were treated as statistical outliers and excluded, where Q1 and Q3 are the first and third quartiles and IQR = Q3 − Q1. As a result of statistical outlier removal, an additional 68,620,608 data points (0.70%) were excluded.
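The IQR rule can be applied column by column, with one column per motor. The NumPy sketch below uses an illustrative helper name. It relies on NumPy's default linear interpolation for the quartiles, which is an assumption because the text does not specify a quantile method.

```python
import numpy as np


def iqr_keep_mask(x: np.ndarray, k: float = 1.5) -> np.ndarray:
    """Keep values inside [Q1 - k*IQR, Q3 + k*IQR], computed separately for each column."""
    q1, q3 = np.percentile(x, [25, 75], axis=0)
    iqr = q3 - q1
    return (x >= q1 - k * iqr) & (x <= q3 + k * iqr)


x = np.array([[10.0], [11.0], [12.0], [13.0], [50.0]])  # toy currents for one motor
mask = iqr_keep_mask(x)
print(mask.ravel().astype(int))  # [1 1 1 1 0]
```

Here Q1 = 11 and Q3 = 13, so the bounds are 8 and 16, and only the reading of 50 is flagged.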

Combining all four preprocessing steps, a total of 176,355,197 data points (1.80%) were eliminated, leaving 9,626,588,803 refined data points (98.20%) for model training.
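The step-wise counts can be cross-checked against the totals above. This snippet only re-adds the figures reported in the text; it does not recompute them from data.

```python
TOTAL_POINTS = 9_802_944_000

removed_by_step = {
    "missing and negative values": 1_960_589,
    "failure buffer intervals": 26_352_000,
    "head/tail shock intervals": 79_422_000,
    "IQR outliers": 68_620_608,
}

total_removed = sum(removed_by_step.values())
remaining = TOTAL_POINTS - total_removed

for step, count in removed_by_step.items():
    print(f"{step}: {count:,} ({100 * count / TOTAL_POINTS:.2f}%)")
print(f"total removed: {total_removed:,} ({100 * total_removed / TOTAL_POINTS:.2f}%)")
print(f"remaining: {remaining:,} ({100 * remaining / TOTAL_POINTS:.2f}%)")
```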

The refined data obtained were subsequently scaled. Data scaling is essential because differences in value ranges across variables can cause the model to converge to zero or diverge to infinity during training. Data scaling therefore adjusts and normalizes all data distributions and ranges, ultimately improving model accuracy [77]. In this study, StandardScaler was applied to normalize all variables to a mean of 0 and a variance of 1, ensuring numerical stability. The dataset was randomly split in an 80:20 ratio, with 7,701,271,042 data points used for training and 1,925,317,761 for validation. Table 8 presents the number and proportion of motor current data points eliminated at each preprocessing step.
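The NumPy sketch below performs standardization followed by a random 80:20 split. Two of its choices are assumptions rather than details from the text. First, the mean and standard deviation are fitted on the training portion only, a common precaution against information leakage; the text does not say which portion the authors used. Second, the split operates on the first axis of whatever array is passed in, for example individual samples or pre-built windows. The helper name is hypothetical. The final lines show that taking 80% of the 9,626,588,803 refined points gives the training count stated above.

```python
import numpy as np


def standardize_and_split(x, train_frac=0.8, seed=0):
    """Randomly split x along axis 0, then z-score both parts with training statistics."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(x))
    n_train = int(len(x) * train_frac)
    train, val = x[idx[:n_train]], x[idx[n_train:]]
    mean = train.mean(axis=0)
    std = train.std(axis=0)              # population std (ddof=0), as StandardScaler uses
    std = np.where(std == 0, 1.0, std)   # guard against constant columns
    return (train - mean) / std, (val - mean) / std


x = np.random.default_rng(1).normal(loc=12.0, scale=2.0, size=(100, 3))  # toy data
train, val = standardize_and_split(x)
print(train.shape, val.shape)  # (80, 3) (20, 3)

n_refined = 9_626_588_803
n_train = n_refined * 8 // 10
print(f"{n_train:,} train / {n_refined - n_train:,} validation")
```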

4. Modeling and Training for Run-Out Table

To develop a fault prediction model for ROT equipment, it was necessary to select a reliable AI algorithm suited to the characteristics of the equipment operation data to be analyzed. Once the algorithm is selected, the performance of the predictive model can be optimized through hyper-parameter tuning and model training. This section describes this process in detail.

4.1. Model Selection

Before modeling, the authors reviewed the characteristics of the collected ROT operation data and algorithms aligned with the study’s goal of early fault prediction. Accordingly, the authors adopted an AI-based PdM strategy for early anomaly detection in equipment operations [78]. In the anomaly detection method, the availability of labels for normal and abnormal data is critical in selecting an appropriate AI technique for model development. Therefore, anomaly detection methods are generally categorized into supervised, semi-supervised, and unsupervised approaches based on the presence or absence of labels [79].

Supervised anomaly detection refers to methods that apply supervised learning, as both normal and abnormal data with corresponding labels are available in the training dataset. Supervised learning is known for its high accuracy, and its performance improves as more labeled abnormal data becomes available. However, a major drawback is that abnormal data are extremely scarce in typical industrial environments, making their acquisition resource intensive.

Semi-supervised anomaly detection is designed to address the data imbalance issue in supervised methods, particularly when there are insufficient abnormal data. This approach establishes a boundary for normal data and classifies data outside that boundary as anomalous. While it has the advantage of being trainable using only normal data, it suffers from reduced accuracy in distinguishing between normal and abnormal instances, which limits its practical applicability.

Finally, unsupervised anomaly detection is useful in situations where abnormal labels are unavailable, as it enables model training using only normal data without the need to acquire abnormal data. In stabilized industrial environments, where most of the available data represent normal conditions, the demand for unsupervised methods is increasing, and related research is actively ongoing. Table 9 presents algorithms associated with the anomaly detection methods based on the availability of labels described above.

Since no labels are available for model training in the case of ROT operation data, the authors focused on evaluating the applicability of unsupervised anomaly detection techniques. Unsupervised anomaly detection encompasses a variety of techniques, including AE, LSTM, RNN, GAN, and adversarial autoencoder (AAE). Among these, the authors examined the characteristics of representative techniques. First, an autoencoder (AE) is a type of artificial neural network that uses unlabeled training data to learn an encoding process, which compresses input data into latent representations, and a decoding process, which reconstructs the compressed data to match or approximate the original input [80].
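In standard notation, an encoder f_φ maps an input x of dimension d to a latent code z, and a decoder g_θ maps z back to a reconstruction. Both are trained on normal data to minimize reconstruction error, written here as the MSE that this study later adopts. The symbols are introduced for illustration and are not taken from the paper.

```latex
\mathbf{z} = f_{\phi}(\mathbf{x}), \qquad
\hat{\mathbf{x}} = g_{\theta}(\mathbf{z}), \qquad
\min_{\phi,\theta}\ \frac{1}{N}\sum_{i=1}^{N}\frac{1}{d}\,
\bigl\lVert \mathbf{x}_i - g_{\theta}\bigl(f_{\phi}(\mathbf{x}_i)\bigr)\bigr\rVert_2^{2}
```

Because the model only ever learns to reconstruct normal inputs, a large reconstruction error on a new input signals that it differs from the training distribution.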

A recurrent neural network (RNN) is a type of neural network used to analyze time-series data. A model trained on past sequence data predicts subsequent values for new input sequences, and anomalies are detected from the discrepancy between these predictions and the observed values.

LSTM is an improved form of RNN that addresses the limitations of learning long-term dependencies by introducing gate and cell structures [81].
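The gate and cell structure mentioned above is usually written as follows. This is the standard LSTM formulation with a forget gate, in generic notation rather than the paper's. Here σ is the logistic sigmoid, ⊙ is element-wise multiplication, x_t is the input at time t, h_t is the hidden state, c_t is the cell state, and f_t, i_t, and o_t are the forget, input, and output gates.

```latex
\begin{aligned}
f_t &= \sigma\left(W_f x_t + U_f h_{t-1} + b_f\right) \\
i_t &= \sigma\left(W_i x_t + U_i h_{t-1} + b_i\right) \\
o_t &= \sigma\left(W_o x_t + U_o h_{t-1} + b_o\right) \\
\tilde{c}_t &= \tanh\left(W_c x_t + U_c h_{t-1} + b_c\right) \\
c_t &= f_t \odot c_{t-1} + i_t \odot \tilde{c}_t \\
h_t &= o_t \odot \tanh\left(c_t\right)
\end{aligned}
```

The cell state c_t is updated additively through the forget gate rather than only by repeated matrix multiplication. As a result, gradients can propagate across many time steps, which is what lets an LSTM learn longer dependencies than a plain RNN.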

These individual algorithms have been further developed into hybrid forms by combining their respective strengths to enhance model performance. Figure 5 illustrates various types of AE-based anomaly detection algorithms used for fault detection.

The high-frequency (100 ms) time-series data collected from the ROT system are composed of over 99% normal operating states, making fault labels virtually unavailable. In addition, the simultaneous operation of 366 motors necessitates large-scale parallel computation, and the data are acquired in a harsh industrial environment characterized by frequent exposure to moisture, mechanical impact, and electrical noise. These characteristics demand algorithms capable of learning long-term temporal dependencies, performing unsupervised anomaly detection, exhibiting robustness to noise, and supporting real-time inference. Table 10 provides a comparative evaluation of major algorithms based on the structural characteristics of ROT data and the practical constraints of industrial environments.

Conventional autoencoders (AEs) support unsupervised learning but are limited in their ability to capture temporal dependencies in time-series data. RNNs can learn sequential information, but vanishing gradients limit their effectiveness in learning long-term dependencies. Gated recurrent units (GRUs) have a simpler gating structure and train faster than LSTMs, but they exhibit lower prediction accuracy when applied to large-scale and high-dimensional data [82]. Generative models, such as variational autoencoders (VAEs) and generative adversarial networks (GANs), may be useful for probabilistic anomaly detection, but their unstable convergence and limited real-time applicability make them unsuitable for industrial environments [83,84]. Supervised learning models, such as support vector machines (SVMs), random forests, and XGBoost, offer strong predictive performance when labeled data are available but are difficult to apply in cases like this study, where fault labels are unavailable [85].

In contrast, the long short-term memory autoencoder (LSTM-AE) is a hybrid architecture that integrates the time-series learning capability of LSTM with the reconstruction-based anomaly detection framework of autoencoders. It is capable of learning patterns from normal operational data and identifying anomalies based on deviations from these learned patterns. The LSTM-AE preserves long-term dependencies through the gated recurrent structure of LSTM and quantifies anomalies in unlabeled data using reconstruction error generated by the autoencoder. This architecture offers higher anomaly detection sensitivity and lower false positive rates compared to simple sequence prediction models, such as RNN-AE or GRU-AE, and demonstrates superior training stability and field applicability relative to generative models, like GAN-AE, or probabilistic models, like VAE. Therefore, LSTM-AE is considered the most suitable algorithm for developing predictive maintenance models that utilize high-frequency, unlabeled, time-series equipment data. It meets the structural requirements of ROT system data and the practical constraints of industrial settings, while offering notable performance in both anomaly detection accuracy and real-world applicability.

4.2. ROT-PMM

Based on the LSTM-AE algorithm selected for the fault prediction model, the authors developed the ROT-PMM to predict anomalies in the ROT equipment. The LSTM-AE architecture used to develop the ROT-PMM works as follows: (1) the input data at each time step are received and compressed through an encoding process, in which the time-series data pass through LSTM layers and are compressed into a vector that can be projected into a latent space; (2) the compressed data are reconstructed through decoding with LSTM layers to generate output data; and (3) the deviation between the input data and the reconstructed output is computed as the reconstruction error (R.E.). The loss function is computed from the R.E. and measures the difference between the actual input and the data reconstructed by the decoder; the model is trained to minimize this loss. This study adopted mean squared error (MSE) as the loss metric. MSE is the average of the squared differences between actual and predicted values; squaring the deviations increases sensitivity to large deviations, making it effective for anomaly detection [86]. Figure 6 presents the basic structure of the LSTM-AE described above.
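The three steps above can be sketched in PyTorch. This is a minimal illustration under stated assumptions, not the ROT-PMM implementation. It uses the common "repeat the latent vector" decoder design. It also maps the paper's "hidden layers" and "units" to the `num_layers` and `hidden` arguments of `nn.LSTM`, which is an interpretation because the exact layer stacking is not described here. Under that mapping, the configuration reported later in this section (eight hidden layers, 512 units) would be `num_layers=8, hidden=512`. To keep the run fast, the toy run below uses small sizes and random windows. `LSTMAutoencoder` and `reconstruction_mse` are hypothetical names.

```python
import torch
from torch import nn


class LSTMAutoencoder(nn.Module):
    """Minimal LSTM autoencoder: LSTM encoder -> latent vector -> LSTM decoder -> linear output."""

    def __init__(self, n_features: int, hidden: int = 64, num_layers: int = 1):
        super().__init__()
        self.encoder = nn.LSTM(n_features, hidden, num_layers=num_layers, batch_first=True)
        self.decoder = nn.LSTM(hidden, hidden, num_layers=num_layers, batch_first=True)
        self.output = nn.Linear(hidden, n_features)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, window, n_features)
        _, (h_n, _) = self.encoder(x)
        z = h_n[-1]                                     # (1) latent vector from the top encoder layer
        z_seq = z.unsqueeze(1).repeat(1, x.size(1), 1)  # repeat it once per time step
        decoded, _ = self.decoder(z_seq)                # (2) LSTM decoding
        return self.output(decoded)                     # reconstruction with the same shape as x


def reconstruction_mse(model: nn.Module, x: torch.Tensor) -> torch.Tensor:
    """(3) Per-window reconstruction error as the mean squared error."""
    model.eval()
    with torch.no_grad():
        x_hat = model(x)
    return ((x - x_hat) ** 2).mean(dim=(1, 2))


# Toy training run on random windows (5 time steps, 1 feature); not plant data.
torch.manual_seed(0)
windows = torch.randn(32, 5, 1)
model = LSTMAutoencoder(n_features=1, hidden=16, num_layers=2)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

model.train()
for epoch in range(5):
    optimizer.zero_grad()
    loss = loss_fn(model(windows), windows)
    loss.backward()
    optimizer.step()

scores = reconstruction_mse(model, windows)
print(scores.shape)  # torch.Size([32])
```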

The ROT-PMM was developed using the LSTM-AE architecture, and its operation consists of two stages: model training and model testing. During the training stage, the LSTM-AE model is trained using only preprocessed normal data, with its hyper-parameters selected accordingly. The model receives actual facility operation data as input and learns to reconstruct normal operating patterns, so that abnormal equipment conditions produce larger reconstruction errors. In the testing stage, the model calculates the MSE for each coil production (test data) using new facility operation data and determines the proportion of time points that exceed the maximum MSE observed in the training data. If this proportion exceeds a predefined threshold, the instance is identified as an equipment anomaly. Figure 7 illustrates the overall operational mechanism of the ROT-PMM described above.
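The test-stage decision rule can be sketched as follows, assuming that per-time-point MSE values are already available for one coil. The helper names, the toy MSE values, and the 30% limit are hypothetical; the text does not state the predefined threshold at this point.

```python
import numpy as np


def coil_anomaly_ratio(coil_mse, train_max_mse):
    """Percentage of time points in one coil whose MSE exceeds the training maximum."""
    coil_mse = np.asarray(coil_mse, dtype=float)
    return 100.0 * float(np.mean(coil_mse > train_max_mse))


def is_abnormal_coil(coil_mse, train_max_mse, ratio_limit_pct):
    """Flag the coil when its anomaly ratio exceeds the predefined limit."""
    return coil_anomaly_ratio(coil_mse, train_max_mse) > ratio_limit_pct


train_mse = np.array([0.05, 0.12, 0.40, 0.80, 0.33])  # toy MSE values on normal training data
train_max = train_mse.max()                           # 0.80
coil_mse = [0.10, 0.20, 0.90, 1.20, 0.30]             # toy MSE values for one new coil
print(coil_anomaly_ratio(coil_mse, train_max))         # 40.0
print(is_abnormal_coil(coil_mse, train_max, 30.0))     # True
```

Aggregating over a whole coil means that a single spurious spike above the training maximum does not by itself raise an alarm.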

4.3. Model Training and Fine Tuning

For model training, the authors used equipment operation data collected from 20 March to 19 April 2023. Through preprocessing, the current data from 366 motors were selected for training. This dataset, which consists of normal ROT operation data, was then separated into training and validation sets in an 8:2 ratio. For model testing, fault data from a single motor, MOT0901 (Youshin Electric Industry, Siheung-si, Republic of Korea), recorded on 6 January 2023, were collected and preprocessed using the same procedure to generate test data for hyper-parameter fine tuning.

To optimize the performance of the ROT-PMM, it is necessary to analyze and select appropriate hyper-parameters for model training. A hyper-parameter is a parameter manually set by the user during modeling, and it is distinguished from parameters determined by the model or data. Hyper-parameters directly affect the model’s performance [87]. Table 11 lists the hyper-parameters configured for the final implementation of the ROT-PMM.

The main hyper-parameters include window size, hidden layer, unit, epoch, optimizer, loss function, batch size, and train/validation split. Window size, hidden layer, and unit are hyper-parameters associated with the characteristics of the LSTM-AE. The window size, also referred to as the time step, is a parameter that incorporates the temporal characteristics of time series data into the model and is crucial as it significantly affects prediction performance [88]. The hidden layer, positioned between the input and output layers, serves as an intermediate computation layer that reconstructs or compresses the input data. The unit refers to an LSTM cell, which is the fundamental building block of each layer [89]. The optimizer is an algorithm designed to find model parameters that minimize the value of the loss function [90]. The loss function quantifies the difference between the model’s predicted output and the actual target value [85]. The train/validation split separates the dataset to prevent overfitting and validate model performance [91]. An epoch refers to one complete iteration over the entire training dataset, while batch size denotes the number of samples in each mini batch used during training [92]. In this study, tests were conducted to determine the optimal values for window size, hidden layers, and units, while the other hyper-parameters were set to commonly accepted defaults.

First, to determine the window size, the model was trained using window sizes of 5, 10, 20, and 30, and the corresponding training and validation losses were examined. The model is considered to perform best when both training and validation losses are minimized, because this indicates that it fits the training data while generalizing well and is therefore expected to make accurate predictions on new input data. Table 12 presents the data shape and the minimum training and validation losses for each window size of 5, 10, 20, and 30. A window size of 5 yielded the best error metrics under consistent learning conditions.
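The window size determines how the continuous current signal is cut into LSTM-AE inputs. The NumPy sketch below assumes overlapping windows with a stride of one sample, since the stride is not stated in the text. `make_windows` is a hypothetical helper, and the resulting shapes are illustrative rather than those in Table 12. At the 100 ms sampling interval, a window of 5 covers 0.5 s of signal and a window of 30 covers 3 s.

```python
import numpy as np


def make_windows(series: np.ndarray, window: int, stride: int = 1) -> np.ndarray:
    """Cut a (time_steps, features) array into (n_windows, window, features) overlapping windows."""
    if series.ndim == 1:
        series = series[:, None]  # treat a single signal as one feature
    starts = range(0, series.shape[0] - window + 1, stride)
    windows = [series[s:s + window] for s in starts]
    if not windows:
        raise ValueError("series is shorter than the window")
    return np.stack(windows)


signal = np.arange(1000 * 3, dtype=float).reshape(1000, 3)  # toy: 1000 steps x 3 motors
for w in (5, 10, 20, 30):
    print(w, make_windows(signal, w).shape)
```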

While achieving low loss is essential, rapid convergence is equally critical in assessing the efficiency of model training. Figure 8 illustrates the loss curves used to evaluate whether the model converges toward minimal loss during training. With (a) a window size of 5, both training and validation losses decreased rapidly and converged toward minimal values. With (b) a window size of 10, the loss curve also converged rapidly but exhibited minor oscillations, indicating the presence of hunting. As the window size increased further, as with (c) a window size of 20, the hunting became more pronounced. With (d) a window size of 30, neither the training nor the validation loss converged, and severe oscillations were observed.

Next, the authors analyzed the optimal values for the hidden layers and units. With the window size set to five as previously validated, the authors examined detection performance by modifying hidden layer and unit values while analyzing production data involving faulty motors. Three motors were identified as abnormal: MOT0901, which experienced an actual coupling fracture, and MOT1814 and MOT2111, which had no physical failure but exhibited fixed current values due to data transmission anomalies. A total of eight models were created with varying configurations of hidden layers and units. For each model trained solely on normal data, the training loss, validation loss, and R.E. values for the selected motors were evaluated. Additionally, the R.E. values were obtained by inputting test data into the models trained only on normal data, and the differences in R.E. values between the normal and abnormal states of the faulty motors were compared. Table 13 presents the results of the tests conducted with varying hidden layers and units.

As a result, the training loss values were uniformly low and varied little across models. However, the models differed significantly in their ability to identify abnormal motors in test data containing actual faults. Models #1–#7 all made errors, either misclassifying normal motors as abnormal or failing to detect actual faults. For instance, model #7 flagged three motors as abnormal, but it failed to identify one motor with an actual failure and incorrectly flagged a normal motor as faulty.

Finally, model #8, configured with a window size of five, eight hidden layers, and 512 units, was the only model that accurately identified both normal and abnormal motors. Figure 9 graphically presents the results of each model in Table 13 regarding the classification of normal and abnormal motors. It can be observed that (d) model #8 accurately identified all three abnormal motors: MOT0901, MOT1814, and MOT2111.
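The selection rule described above (keep only a model whose flagged motors match the known abnormal motors exactly) can be sketched as a set comparison. The three motor IDs come from the text; the other motor IDs, the R.E. values, and the threshold below are hypothetical, chosen to mimic the model #7 behaviour described above.

```python
KNOWN_ABNORMAL = {"MOT0901", "MOT1814", "MOT2111"}


def compare_to_known(re_by_motor, threshold, known=KNOWN_ABNORMAL):
    """Return (flagged, missed, false_alarms) for one model's per-motor R.E. values."""
    flagged = {motor for motor, re in re_by_motor.items() if re > threshold}
    return flagged, known - flagged, flagged - known


def select_models(results, known=KNOWN_ABNORMAL):
    """Names of models whose flagged set matches the known abnormal motors exactly."""
    selected = []
    for name, (re_by_motor, threshold) in results.items():
        _, missed, false_alarms = compare_to_known(re_by_motor, threshold, known)
        if not missed and not false_alarms:
            selected.append(name)
    return selected


# Hypothetical R.E. values (MOT0100 and MOT0200 are made-up normal motors).
results = {
    "#7": ({"MOT0901": 0.8, "MOT1814": 2.5, "MOT2111": 2.1, "MOT0100": 1.6, "MOT0200": 0.4}, 1.0),
    "#8": ({"MOT0901": 1.9, "MOT1814": 3.0, "MOT2111": 2.7, "MOT0100": 0.6, "MOT0200": 0.3}, 1.0),
}
print(select_models(results))  # ['#8']
```

Here model #7 flags three motors but misses MOT0901 and falsely flags MOT0100, so only model #8 passes.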

5. Test and Validation

5.1. Test Setup

The test dataset was used to evaluate the performance of the proposed ROT-PMM. The ROT operation data from the 300 coils produced before the motor failure were labeled abnormal, and the data from the 300 coils produced after the repair were labeled normal. Performance was evaluated with accuracy, precision, recall, and F1 score, computed by comparing predicted and actual outcomes. In addition, the anomaly detection criterion was refined for practical deployment in real operating environments. During hyper-parameter selection, a single instance was classified as abnormal if its R.E. exceeded 20% of the maximum observed in normal data. In field operation, however, such a threshold may trigger repeated alarms from spurious fluctuations, and the resulting false positives could undermine engineers' trust in the system. The authors therefore introduced an algorithm that calculates the proportion of data points exceeding the threshold during the production of a single coil; if this proportion exceeds a predefined limit, the coil is classified as abnormal. This proportion is defined as the anomaly ratio (%) and is expressed in Equation (1).

\text{Anomaly ratio (\%)} = \frac{\sum \text{Number of MSE above max value}}{\sum \text{Number of MSE}} \times 100\%

(1)

To detect anomalies, a threshold is used to identify outliers that deviate from the range of normal data. In this study, the threshold was set to the maximum MSE value observed on the normal data used to train the model.
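Equation (1) and the threshold rule can be sketched in a few lines, assuming the per-window MSE values of one coil are already available (e.g. from an LSTM-AE). "Above" is read as strictly greater than the threshold, and a coil is flagged when its ratio exceeds the limit; the function names and sample values are hypothetical.

```python
import numpy as np


def fit_threshold(normal_mse):
    """Threshold = maximum MSE observed on normal (training) data."""
    return float(np.max(normal_mse))


def anomaly_ratio(coil_mse, threshold):
    """Eq. (1): percentage of a coil's MSE values that lie above the threshold."""
    coil_mse = np.asarray(coil_mse, dtype=float)
    return 100.0 * np.count_nonzero(coil_mse > threshold) / coil_mse.size


def is_abnormal_coil(coil_mse, threshold, limit_pct):
    """Flag a coil when its anomaly ratio exceeds the chosen limit (in %)."""
    return anomaly_ratio(coil_mse, threshold) > limit_pct


# Hypothetical values: threshold from normal data, then one coil's MSE sequence.
threshold = fit_threshold([0.010, 0.012, 0.015])
coil = [0.011, 0.020, 0.030, 0.014]
print(anomaly_ratio(coil, threshold))           # 50.0
print(is_abnormal_coil(coil, threshold, 27.0))  # True
```

Aggregating over a whole coil means that a single transient spike raises the ratio only slightly, which is the motivation given above for this criterion.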

The optimal anomaly ratio and the corresponding model performance were derived using a confusion matrix [93]. Table 14 summarizes the four variables of the confusion matrix.

The four variables comprising the confusion matrix are defined as follows.

True Positive (TP): An actual anomaly occurred, and the model correctly predicted it as abnormal.
False Negative (FN): An actual anomaly occurred, but the model incorrectly predicted it as normal.
False Positive (FP): No actual anomaly occurred, but the model incorrectly predicted it as abnormal.
True Negative (TN): No actual anomaly occurred, and the model correctly predicted it as normal.
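The four counts can be tallied from per-coil labels as sketched below, treating "abnormal" (`True`) as the positive class. The function name is hypothetical; the usage reproduces the 300 abnormal plus 300 normal coil layout of the test set with predictions arranged to match the counts reported in Section 5.2.

```python
def confusion_counts(actual, predicted):
    """Return (TP, FN, FP, TN) with abnormal (True) as the positive class."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a and p)
    fn = sum(1 for a, p in pairs if a and not p)
    fp = sum(1 for a, p in pairs if not a and p)
    tn = sum(1 for a, p in pairs if not a and not p)
    return tp, fn, fp, tn


# 300 coils before the failure (abnormal) followed by 300 after the repair (normal).
actual = [True] * 300 + [False] * 300
predicted = [True] * 259 + [False] * 41 + [True] * 12 + [False] * 288
print(confusion_counts(actual, predicted))  # (259, 41, 12, 288)
```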

These four variables are used to calculate accuracy, precision, recall, and F1 score [93]. Equation (2) provides the formula for accuracy, which represents the proportion of total predictions where the model correctly identified true (abnormal) and false (normal) cases. Accuracy is considered the most intuitive performance metric.

\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \times 100\%

(2)

Equation (3) represents precision, which is the ratio of instances the model predicted as true (abnormal) that were actually abnormal.

\text{Precision} = \frac{TP}{TP + FP} \times 100\%

(3)

Equation (4) represents recall, which is the ratio of actual abnormal (true) instances that the model correctly predicted as abnormal. Precision and recall typically trade off against each other, and a well-performing model achieves high values for both.

\text{Recall} = \frac{TP}{TP + FN} \times 100\%

(4)

Equation (5) defines the F1 score, which is calculated as the harmonic mean of precision and recall.

\text{F1 score} = 2 \times \frac{\text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}}

(5)
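Equations (2)–(5) translate directly into code. The sketch below returns each metric in percent and applies it to the confusion-matrix counts reported in Section 5.2 (TP = 259, FN = 41, FP = 12, TN = 288); the function name and the zero-denominator handling are assumptions.

```python
def classification_metrics(tp, fn, fp, tn):
    """Eqs. (2)-(5), returned in percent. Zero denominators yield 0.0."""
    total = tp + fn + fp + tn
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "accuracy": 100.0 * (tp + tn) / total,
        "precision": 100.0 * precision,
        "recall": 100.0 * recall,
        "f1": 100.0 * f1,
    }


m = classification_metrics(tp=259, fn=41, fp=12, tn=288)
print({k: round(v, 2) for k, v in m.items()})
# {'accuracy': 91.17, 'precision': 95.57, 'recall': 86.33, 'f1': 90.72}
```

Rounded to whole percentages, these give the 91%, 96%, 86%, and 91% reported for ROT-PMM. The F1 score also equals 2TP / (2TP + FP + FN) = 518/571.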

5.2. Test Results

When the model is deployed in the field, a fixed threshold applied to individual data points may trigger repeated alarms from transient anomalies that occur as the strip progresses. Excessive false alarms erode user confidence and cause alarm fatigue, which can lead to the system being abandoned on-site. The anomaly ratio addresses this by classifying the equipment as abnormal only when the percentage of threshold exceedances during the production of a single coil exceeds a set limit. Using a Python program, the anomaly ratio was varied from 1% to 100%, and the model's ability to correctly classify all 600 cases, both normal and abnormal, was evaluated at each setting; the results are illustrated in Figure 10. Because precision and recall tend to move in opposite directions as the anomaly ratio changes, the model is best evaluated with the F1 score, their harmonic mean. Even if recall (the proportion of actual failures correctly predicted) is high, the model loses its practical value if precision is low, because false alarms impose unnecessary inspection burdens on operators. ROT-PMM achieved its highest F1 score, 91%, at an anomaly ratio of 27%.
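The sweep over anomaly ratio limits can be sketched as follows: for each limit from 1% to 100%, coils whose ratio exceeds the limit are predicted abnormal and the F1 score is computed; the limit with the highest F1 is kept, with ties resolved in favour of the smallest limit. The authors' Python program is not published, so this is a hypothetical reconstruction, and the six-coil data set is made up.

```python
import numpy as np


def f1_at_limit(ratios, actual, limit):
    """F1 score when coils with an anomaly ratio above `limit` are predicted abnormal."""
    predicted = ratios > limit
    tp = np.sum(predicted & actual)
    fp = np.sum(predicted & ~actual)
    fn = np.sum(~predicted & actual)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def sweep_limits(ratios, actual, limits=range(1, 101)):
    """Return (best_limit, best_f1, all_scores); max() keeps the first (smallest) best limit."""
    ratios = np.asarray(ratios, dtype=float)
    actual = np.asarray(actual, dtype=bool)
    scores = {limit: f1_at_limit(ratios, actual, limit) for limit in limits}
    best = max(scores, key=scores.get)
    return best, scores[best], scores


# Hypothetical per-coil anomaly ratios (%) and true labels.
ratios = [60, 45, 30, 10, 20, 5]
actual = [True, True, True, False, False, False]
best, score, scores = sweep_limits(ratios, actual)
print(best, score)  # 20 1.0
```

Low limits flag normal coils (lower precision), while high limits miss abnormal ones (lower recall), so F1 peaks in between, as in the 27% optimum reported above.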

Table 15 presents the confusion matrix values and overall performance metrics corresponding to the best-performing configuration of the ROT-PMM.

At this best-performing setting, 259 coils were TP, in which an actual anomaly was present and the model predicted it. FN occurred in 41 coils, in which an anomaly was present but was not predicted. There were 12 FP, in which no anomaly occurred but the model predicted one. Finally, 288 coils were TN, in which no anomaly occurred and the model correctly predicted normal conditions. Accordingly, the LSTM-AE-based ROT-PMM achieved 91% accuracy, 96% precision, 86% recall, and a 91% F1 score. The precision of 96% indicates that the large majority of instances the model predicted as faults were actual faults, which justifies field operators responding when the deployed model raises an anomaly alarm. The recall of 86%, the proportion of actual faults the model detected, is comparatively low. Because the anomaly ratio was tuned to maximize overall performance (F1 score), recall cannot be raised by adjusting the threshold without sacrificing precision. To improve recall, additional failure-correlated data could be acquired, or derived features could be generated from the motor current to make anomalies more visible.

6. Case Study and System Application

6.1. Case Study

To validate the operational performance of the ROT-PMM, the authors assessed its PdM capability using data from a tire coupling fracture in MOT0901 that occurred on 6 January 2023. The anomaly ratio was calculated with ROT-PMM from equipment data collected during the production of 3500 coils starting 1 January 2023, and the point at which the anomaly ratio first exceeded the threshold, indicating an anomaly, was identified. The equipment operated normally until 5 January, when the anomaly ratio rose sharply starting at 2:00. Forty hours elapsed between this initial detection and the operator's recognition of and response to the equipment fault; that is, ROT-PMM raised an alarm 40 h before the fault was recognized. This result demonstrates the potential of ROT-PMM to enable cost-effective PdM by predicting equipment failures in advance. In addition, the model continued to generate alarms after the initial detection, which supports the reliability of the detection rather than suggesting a transient false alarm. Figure 11 illustrates the actual failure situation and the ROT-PMM's prior detection of the equipment anomaly.
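The lead-time figure can be computed by locating the first coil whose anomaly ratio exceeds the limit and measuring the time to the operator's recognition; the share of subsequent coils that stay above the limit gives a simple measure of alarm persistence. The hourly timestamps, ratio values, and recognition time below are hypothetical illustrations (chosen so that an alarm at 02:00 on 5 January and recognition 40 h later line up with the narrative above), and the 27% limit is taken from Section 5.2.

```python
from datetime import datetime, timedelta


def first_alarm(timestamps, ratios, limit_pct):
    """Return the first timestamp whose anomaly ratio exceeds the limit, else None."""
    for ts, ratio in zip(timestamps, ratios):
        if ratio > limit_pct:
            return ts
    return None


def lead_time_hours(alarm_time, recognition_time):
    """Hours between the model's first alarm and the operator's recognition of the fault."""
    return (recognition_time - alarm_time).total_seconds() / 3600.0


def alarm_persistence(ratios, limit_pct, start_index):
    """Share (%) of coils from start_index onward whose ratio stays above the limit."""
    tail = ratios[start_index:]
    return 100.0 * sum(r > limit_pct for r in tail) / len(tail)


# Hypothetical hourly anomaly ratios (%) starting at 00:00 on 5 January 2023.
start = datetime(2023, 1, 5, 0, 0)
timestamps = [start + timedelta(hours=h) for h in range(6)]
ratios = [4.0, 9.0, 35.0, 41.0, 38.0, 52.0]

alarm = first_alarm(timestamps, ratios, limit_pct=27.0)  # 2023-01-05 02:00
recognition = datetime(2023, 1, 6, 18, 0)                # hypothetical recognition time
print(lead_time_hours(alarm, recognition))               # 40.0
print(alarm_persistence(ratios, 27.0, start_index=2))    # 100.0
```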

Previous studies on PM and PdM utilizing AI are discussed in Sections 2.1 and 2.2. Most of these studies applied AI primarily to defect detection rather than failure prediction. This study differs from them in several respects.

Although this study also uses the LSTM-AE algorithm for the ROT-PMM, its approach to anomaly prediction, hyper-parameter tuning, and threshold setting differs from prior research. Within the steel industry, Choi et al. distinguished between normal and abnormal states of a centering device's shaft using data from multiple servo motors that keep the strip centered [47]. In contrast, this study uses motor current data directly to assess anomalies in 366 motors, yielding a model that evaluates each motor's condition independently. For hyper-parameter selection, the authors used three motors, one with an actual failure and two with data communication anomalies, to compare various model configurations and select the most suitable hyper-parameters. For threshold setting, the anomaly ratio was calculated for each coil and varied from 1% to 100% to find the value that yielded the highest model performance. Unlike previous studies that simply raised an alarm when an anomaly signal exceeded a set threshold, this research used a Python analysis program to evaluate model performance across anomaly ratios and select the best-performing setting. Furthermore, validation with actual failure data showed that the model detected a failure 40 h before operator recognition. This demonstrates the model's efficacy as a predictive tool that can be implemented directly in the field, rather than merely a proof of concept (PoC).

This study demonstrated that PdM enables the real-time monitoring of equipment conditions and early failure prediction in steelmaking and hot rolling processes, thereby minimizing unexpected downtime, reducing maintenance costs, and extending equipment lifespan. This approach contributes to optimized energy consumption, reduced energy costs, and improved operational efficiency, ultimately leading to lower carbon emissions and the realization of sustainable manufacturing.

A well-implemented PdM strategy can significantly reduce energy consumption in industrial and manufacturing settings. Although the primary focus of PdM is on ensuring reliability and reducing downtime, its impact on energy efficiency is significant. Conventional failure prediction systems for PdM are widely used in general industries. However, the harsh conditions of steelmaking facilities, including dust, splashing coolant, and high temperatures, limit the applicability of such systems. Therefore, developing a failure prediction model tailored to the environment and operational conditions of steelmaking equipment is crucial for optimizing maintenance schedules, minimizing the consumption of parts and materials, and reducing costs. It is anticipated that the ROT-PMM developed in this study will contribute to more sustainable and energy-conscious production operations by ensuring optimal machine performance, reducing waste, and minimizing downtime.

6.2. System Application

The authors developed a system to apply the ROT-PMM in practice. Sensor data from the field are collected by a DAQ device called Iba, and process data are also transmitted to the DAQ system from the PLC [69]. The analog data collected on the DAQ server are converted into engineering units and stored in the database. The stored data are transmitted to the AI development server via TCP/IP, where internal preprocessing (e.g., classifying data by coil and converting data types) turns them into a form suitable for training, and the AI models are trained. The trained model is shared with the AI model operation server, which predicts faults of the ROT drive for each coil from real-time equipment operation data and sends the results to the DAQ server. Based on the received results, monitoring data are output through the DAQ so that operators and mechanics can view them on the Q-panel. Figure 12 shows the hierarchical structure of the ROT-PMM.
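The preprocessing step on the AI development server (classifying data by coil and converting data types) might look like the sketch below. The record layout (parallel lists of coil IDs and current samples), the float32 conversion, and min-max scaling fitted on normal data are all assumptions for illustration; the text says only that data are classified by coil, type-converted, and scaled.

```python
from collections import defaultdict

import numpy as np


def group_by_coil(coil_ids, currents):
    """Group raw current samples by coil ID and convert them to float32 arrays."""
    groups = defaultdict(list)
    for coil, value in zip(coil_ids, currents):
        groups[coil].append(value)
    return {coil: np.asarray(values, dtype=np.float32) for coil, values in groups.items()}


class MinMaxScaler1D:
    """Min-max scaling fitted on normal data only (assumed scaling method)."""

    def fit(self, x):
        self.lo = float(np.min(x))
        self.hi = float(np.max(x))
        return self

    def transform(self, x):
        span = self.hi - self.lo
        # Guard against a constant signal, which would otherwise divide by zero.
        return (np.asarray(x, dtype=np.float32) - self.lo) / (span if span else 1.0)


# Hypothetical records: coil IDs and motor-current samples in engineering units.
coils = group_by_coil(["C1", "C1", "C2", "C2"], [98.0, 101.5, 99.0, 140.0])
scaler = MinMaxScaler1D().fit(coils["C1"])  # fit on a coil assumed to be normal
print(scaler.transform(coils["C2"]))
```

Fitting the scaler on normal data only keeps abnormal coils from shifting the scale, so their deviations remain visible to the autoencoder.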

The ROT-PMM development and execution server ran Python 3.8.1 on the Linux operating system. The PLC for process control installed at the site was a TMEIC nV PLC, and the DAQ system used the Windows 10 operating system.

7. Economic Benefits Analysis

7.1. Economic Benefits Analysis of ROT-PMM

ROT-PMM is a method for predicting and preventing equipment faults in advance, and it offers two major economic effects. First, it can reduce unplanned downtime due to equipment faults and increase production, leading to a reduction in fixed costs. Second, by predicting equipment conditions in real-time and replacing parts at optimal times, it is possible to extend the replacement cycle of TBM components and reduce maintenance costs. Table 16 provides foundational data from the C hot rolling mill for conducting the benefit analysis. The cost of lost production opportunity was calculated based on failure data from the C hot rolling mill (2014–2021 average), and the production downtime caused by failures was 2.3 h per year. The TBM cost for each major ROT component was calculated based on the current replacement cycle to estimate the annual maintenance expenses.

Table 17 summarizes the result of the economic analysis of the ROT-PMM. The cost of lost production opportunity is estimated at USD 26 K/year, the reduction in facility management cost is USD 51 K/year, and the economic effect of realizing PdM of ROT equipment is USD 77 K/year. Since the economic benefit was calculated based on a single C hot rolling mill, the projected economic effect could reach approximately USD 308 K/year if the system is expanded to all three hot rolling mills of Company P.

7.2. Comparative Analysis of Key Operational Indicators Between PM and PdM

This study conducted a comparative analysis of key operational metrics between PM and PdM in steel manufacturing processes. A common approach to evaluating the robustness and effectiveness of a proposed model is to conduct ablation studies and comparative experiments. An ablation study is an experimental method that removes certain components of a system to quantify their impact and identify the contribution of each element [94]. It is commonly used in AI and software domains to validate the effectiveness of model components. In heavy industries such as steel plants, however, field equipment is tightly integrated through complex process interlocks and operates continuously 24 h a day. Isolating individual components for ablation studies is therefore impractical, as it would inevitably cause production line downtime [95]. Most of the existing PdM literature consists of model-centric studies that compare performance on public benchmarks or laboratory data, and practical system-level validation cases remain rare [96]. Instead, this study validated the model through statistical verification derived from operational data and expert evaluations by field engineers, which provide sufficient grounds for validation without conducting ablation experiments.

The data used in the analysis reflect the actual outcomes of the case study in which the ROT-PMM was implemented. The ROT-PMM demonstrates that, compared to conventional PM, PdM can significantly improve energy efficiency by optimizing motor performance through early anomaly detection, reducing unplanned downtime by 80%, and lowering annual maintenance costs by 14%. Table 18 presents the results of Company P’s case analysis on the implementation of ROT-PMM, quantitatively comparing the effects of PdM with those of PM.

PdM has demonstrated superior performance over PM across all key maintenance indicators, including failure rate, cost, and downtime. In particular, PM may result in the unnecessary consumption of labor and spare parts, as it schedules maintenance regardless of the actual condition of the equipment. In contrast, PdM, driven by real-time data analysis, performs maintenance only when needed, thereby reducing redundant tasks and minimizing total resource expenditure. Furthermore, PdM’s ability to detect abnormal signals in real time allows for prefailure interventions, a level of responsiveness that conventional maintenance strategies cannot achieve. This not only highlights PdM’s superiority as a maintenance strategy but also justifies its adoption as an AI-driven operational paradigm.

In addition to the quantitatively estimated economic effects, steel plants contain numerous roll-driven facilities that use motors, suggesting a high potential for applying AI technologies, such as ROT-PMM, to similar systems. This can contribute to more scientific facility management in steel plants and facilitate the implementation of predictive maintenance. Furthermore, the real-time monitoring of equipment faults is expected to eliminate the need for inspections during operation and reduce on-site interventions caused by equipment failures, thereby minimizing safety risks for engineers.

8. Conclusions

8.1. Summary and Contribution

This study was conducted to develop a PdM technique for ROT drive equipment, which transports strips through a laminar flow cooling section before coiling in the hot strip mills of steel plants. To predict anomalies in ROT drive equipment, the ROT-PMM was developed using an LSTM-AE algorithm.

For this purpose, approximately one month of ROT operation data from Company P's hot rolling mill were collected and preprocessed, including removing missing values and outliers and scaling the data. Considering the time-series characteristics of ROT equipment data, the authors selected an unsupervised learning algorithm based on LSTM-AE, which combines the advantages of autoencoders and LSTM, as the operational framework of ROT-PMM. To optimize the model's performance, the authors trained it with normal data, tested it with abnormal data, and tuned its hyper-parameters. For practical application, the anomaly ratio was then defined as the percentage of MSE values exceeding a threshold during the production of a coil, and the anomaly ratio yielding the highest F1 score in the confusion matrix analysis was adopted as the criterion for anomaly detection. Test results showed an accuracy of 91%, precision of 96%, recall of 86%, and F1 score of 91%, and in a case study using actual failure data, the model detected a fault approximately 40 h before operator recognition.

The ROT-PMM was developed as a fully operational model suitable for practical deployment in actual steel mill processes, moving beyond a mere PoC. Additionally, by inputting current data from 366 ROT motors into a single AI model for real-time individual deviation analysis, the ROT-PMM was designed to enable predictive maintenance across multiple units without incurring additional server costs.

The ROT-PMM developed in this study can enhance productivity and equipment reliability in industries requiring large-scale facilities, such as the steel industry, by proactively detecting faults and reducing unnecessary maintenance through PdM. Moreover, the application of advanced technologies, such as AI, is expected to contribute to preventing repetitive and similar failures in steel production equipment, as well as broadening technological advancements and practical applicability across other industrial sectors. Consequently, this approach can establish a stable operational environment, minimize disruptions caused by unexpected maintenance, enhance corporate competitiveness, and ultimately promote sustainable development across the industry.

8.2. Limitations and Further Study

In this study, based on the characteristics of hot rolling ROT operation data (predominance of normal data and time-series structure), the authors developed the ROT-PMM to predict failures of ROT equipment using the reliable LSTM-AE algorithm, an unsupervised learning method. The authors investigated the performance and effectiveness of the fault prediction model using actual fault data; the main limitations of the model and directions for further study are as follows.

Second, although the ROT-PMM model developed in this study is effective in detecting anomalies, it has limitations in quantitatively explaining or classifying the root causes of the anomalies. To address this, future research should apply explainable AI techniques, such as Shapley additive explanations (SHAP) and local interpretable model-agnostic explanations (LIME), to clarify the relationships between input features and prediction results, thereby enhancing model reliability. Additionally, a dual-structured prediction model that integrates LSTM-AE with decision tree or random forest classifiers should be developed to accurately identify the underlying causes of anomalies.

Third, while the current model is based on single-modality current data, multivariate analysis is required to interpret a wider range of fault patterns. Future studies should incorporate additional sensor signals related to equipment failure, such as vibration, temperature, rotational speed, and bearing temperature, and optimize input features through correlation analysis and principal component analysis (PCA) to better reflect fault-specific characteristics and improve prediction accuracy and model generalization.

Fourth, implementing a PdM system within existing steel manufacturing infrastructure faces several challenges, including ensuring data quality, integrating sensors, training personnel, and securing initial investment. To address these obstacles, strategic approaches, such as phased PdM deployment, pilot project execution, and strengthened stakeholder collaboration, are required.

Author Contributions

Conceptualization, J.-W.Y., S.-W.C., and E.-B.L.; methodology, J.-W.Y. and S.-W.C.; validation, J.-W.Y. and S.-W.C.; formal analysis, J.-W.Y. and S.-W.C.; investigation, J.-W.Y. and S.-W.C.; resources, J.-W.Y.; data curation, J.-W.Y. and S.-W.C.; writing—original draft preparation, J.-W.Y.; writing—review and editing, J.-W.Y. and S.-W.C.; visualization, J.-W.Y. and S.-W.C.; supervision, S.-W.C. and E.-B.L.; project administration, E.-B.L.; funding acquisition, E.-B.L. All authors have read and agreed to the published version of the manuscript.

Funding

This research was sponsored by the Korea Ministry of Trade Industry and Energy (MOTIE) and the Korea Evaluation Institute of Industrial Technology (KEIT) through the Technology Innovation Program funding for “Development of optimization technology for pipe-cable auto-routing design linked to carbon reduction model” project (Grant No. RS-2022-00143619).

Data Availability Statement

The original contributions presented in the study are included in the article; further inquiries can be directed to the corresponding author.

Acknowledgments

The authors would like to give special thanks to Geon-Woo Kim (a Master course student at Pohang University of Science and Technology) for his technical support of this study. The views expressed in this paper are solely those of the authors and do not represent those of any official organization or research sponsor.

Conflicts of Interest

Author Ju-Woong Yun was employed by the company Pohang Iron and Steel Company (POSCO). The remaining authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as potential conflicts of interest.

Abbreviations

The following abbreviations and parameters are used in this paper:

AE	autoencoder
AI	artificial intelligence
CBM	condition-based maintenance
CMS	condition monitoring system
CNN	convolutional neural network
GAN	generative adversarial network
GRU	gated recurrent unit
LSTM	Long Short-Term Memory
LSTM-AE	Long Short-Term Memory Autoencoder
ML	machine learning
MLP	multi-layer perceptron
MSE	mean squared error
PC	Process Computer
PdM	predictive maintenance
PLC	Programmable Logic Controller
PM	preventive maintenance
RNN	recurrent neural network
ROT	Run-Out Table
ROT-PMM	Run-Out Table Predictive Maintenance Model
SDAE	stacked denoising autoencoder
SVM	support vector machine
TBM	time-based maintenance

References

Ahmad, R.; Kamaruddin, S. An overview of time-based and condition-based maintenance in industrial application. Comput. Ind. Eng. 2012, 63, 135–149. [Google Scholar] [CrossRef]
Pech, M.; Vrchota, J.; Bednář, J. Predictive maintenance and intelligent sensors in smart factory. Sensors 2021, 21, 1470. [Google Scholar] [CrossRef]
Achouch, M.; Dimitrova, M.; Ziane, K.; Sattarpanah Karganroudi, S.; Dhouib, R.; Ibrahim, H.; Adda, M. On predictive maintenance in industry 4.0: Overview, models, and challenges. Appl. Sci. 2022, 12, 8081. [Google Scholar] [CrossRef]
Artesis. How Can Predictive Maintenance Contribute To Energy Efficiency? Available online: https://artesis.com/how-can-predictive-maintenance-contribute-to-energy-efficiency/ (accessed on 14 October 2024).
Zanoli, S.M.; Pepe, C.; Orlietti, L. Multi-Mode Model Predictive Control Approach for Steel Billets Reheating Furnaces. Sensors 2023, 23, 3966. [Google Scholar] [CrossRef] [PubMed]
Volta Insite. Steel & Metal Products—Predictive Maintenance. Available online: https://voltainsite.com/steel-metal-products.html (accessed on 20 March 2025).
Ruiz-Sarmiento, J.-R.; Monroy, J.; Moreno, F.-A.; Galindo, C.; Bonelo, J.-M.; Gonzalez-Jimenez, J. A predictive model for the maintenance of industrial machinery in the context of industry 4.0. Eng. Appl. Artif. Intell. 2020, 87, 103289. [Google Scholar] [CrossRef]
DAEJI STEEL. Steel Product Production Process. Available online: http://www.daejisteel.com/html/material/sub01.htm (accessed on 20 October 2024).
POSCO. Introduction to the Steel Manufacturing Process. Available online: http://swpecm.posco.net:7091/ECM/swp_interface.jsp?ACTID=viewlink&OBJECTIO=30393030626634626132623833623136&DOCID=646f6330393030626634623963633039303062&SYSID=45434d (accessed on 11 November 2024).
Choi, I.; Rossiter, J.; Fleming, P. Looper and tension control in hot rolling mills: A survey. J. Process Control 2007, 17, 509–521. [Google Scholar] [CrossRef]
POSCO. Analysis of ROT Failure. Available online: http://swpecm.posco.net:7091/ECM/swp_interface.jsp?ACTID=viewlink&OBJECTIO=30393030626634626134303834366132&DOCID=646f6330393030626634626133633039303062&SYSID=45434d (accessed on 14 November 2024).
Mehamud, I.; Marklund, P.; Björling, M.; Shi, Y. Machine condition monitoring enabled by broad range vibration frequency detecting triboelectric nano-generator (TENG)-based vibration sensors. Nano Energy 2022, 98, 107292. [Google Scholar] [CrossRef]
ISO 10836; Iron Ores—Method of Sampling and Sample Preparation for Physical Testing. International Organization for Standardization: Geneva, Switzerland, 1994.
Tran Anh, D.; Dąbrowski, K.; Skrzypek, K. The Predictive Maintenance Concept in the Maintenance Department of the “Industry 4.0” Production Enterprise. Found. Manag. 2018, 10, 283–292. [Google Scholar] [CrossRef]
Farooq, U.; Ademola, M.; Shaalan, A. Comparative Analysis of Machine Learning Models for Predictive Maintenance of Ball Bearing Systems. Electronics 2024, 13, 438. [Google Scholar] [CrossRef]
Krupitzer, C.; Wagenhals, T.; Züfle, M.; Lesch, V.; Schäfer, D.; Mozaffarin, A.; Edinger, J.; Becker, C.; Kounev, S. A survey on predictive maintenance for industry 4.0. arXiv 2020, arXiv:2002.08224. [Google Scholar] [CrossRef]
Wang, K.-S.; Tsai, Y.-T.; Lin, C.-H. A study of replacement policy for components in a mechanical system. Reliab. Eng. Syst. Saf. 1997, 58, 191–199. [Google Scholar] [CrossRef]
Satow, T.; Teramoto, K.; Nakagawa, T. Optimal replacement policy for a cumulative damage model with time deterioration. Math. Comput. Model. Dyn. Syst. 2000, 31, 313–319. [Google Scholar] [CrossRef]
Chan, G.; Asgarpoor, S. Optimum maintenance policy with Markov processes. Electr. Power Syst. Res. 2006, 76, 452–456. [Google Scholar] [CrossRef]
Crowder, M.; Lawless, J. On a scheme for predictive maintenance. Eur. J. Oper. Res. 2007, 176, 1713–1722. [Google Scholar] [CrossRef]
Panagiotidou, S.; Tagaras, G. Optimal preventive maintenance for equipment with two quality states and general failure time distributions. Eur. J. Oper. Res. 2007, 180, 329–353. [Google Scholar] [CrossRef]
Jayaswal, P.; Wadhwani, A.; Mulchandani, K. Machine fault signature analysis. Int. J. Rotating Mach. 2008, 2008, 583982. [Google Scholar] [CrossRef]
Márquez, F.P.G.; Tobias, A.M.; Pérez, J.M.P.; Papaelias, M. Condition monitoring of wind turbines: Techniques and methods. Renew. Energy 2012, 46, 169–178. [Google Scholar] [CrossRef]
Bagavathiappan, S.; Lahiri, B.B.; Saravanan, T.; Philip, J.; Jayakumar, T. Infrared thermography for condition monitoring—A review. Infrared Phys. Technol. 2013, 60, 35–55. [Google Scholar] [CrossRef]
Zhu, X.; Zhong, C.; Zhe, J. Lubricating oil conditioning sensors for online machine health monitoring–A review. Tribol. Int. 2017, 109, 473–484. [Google Scholar] [CrossRef]
Wang, J.; Qiu, Q.; Wang, H.; Lin, C. Optimal condition-based preventive maintenance policy for balanced systems. Reliab. Eng. Syst. Saf. 2021, 211, 107606. [Google Scholar] [CrossRef]
Akl, A.M.; El Sawah, S.; Chakrabortty, R.K.; Turan, H.H. A joint optimization of strategic workforce planning and preventive maintenance scheduling: A simulation–Optimization approach. Reliab. Eng. Syst. Saf. 2022, 219, 108175.
Lolli, F.; Coruzzolo, A.M.; Peron, M.; Sgarbossa, F. Age-based preventive maintenance with multiple printing options. Int. J. Prod. Econ. 2022, 243, 108339.
Shi, Y.; Lu, Z.; Huang, H.; Liu, Y.; Li, Y.; Zio, E.; Zhou, Y. A new preventive maintenance strategy optimization model considering lifecycle safety. Reliab. Eng. Syst. Saf. 2022, 221, 108325.
Su, J.; Huang, J.; Adams, S.; Chang, Q.; Beling, P.A. Deep multi-agent reinforcement learning for multi-level preventive maintenance in manufacturing systems. Expert Syst. Appl. 2022, 192, 116323.
Dui, H.; Xu, H.; Zhang, L.; Wang, J. Cost-based preventive maintenance of industrial robot system. Reliab. Eng. Syst. Saf. 2023, 240, 109595.
Li, Y.; Xia, T.; Chen, Z.; Pan, E. Multiple degradation-driven preventive maintenance policy for serial-parallel multi-station manufacturing systems. Reliab. Eng. Syst. Saf. 2023, 230, 108905.
An, Y.; Chen, X.; Gao, K.; Zhang, L.; Li, Y.; Zhao, Z. A hybrid multi-objective evolutionary algorithm for solving an adaptive flexible job-shop rescheduling problem with real-time order acceptance and condition-based preventive maintenance. Expert Syst. Appl. 2023, 212, 118711.
Wu, C.; Pan, R.; Zhao, X.; Wang, X. Designing preventive maintenance for multi-state systems with performance sharing. Reliab. Eng. Syst. Saf. 2024, 241, 109661.
Zhang, C.; Fang, Z.; Dong, W. Preventive maintenance strategy for multi-component systems in dynamic risk assessment. Reliab. Eng. Syst. Saf. 2025, 254, 110611.
Çınar, Z.M.; Abdussalam Nuhu, A.; Zeeshan, Q.; Korhan, O.; Asmael, M.; Safaei, B. Machine learning in predictive maintenance towards sustainable smart manufacturing in industry 4.0. Sustainability 2020, 12, 8211.
Carvalho, T.P.; Soares, F.A.; Vita, R.; Francisco, R.d.P.; Basto, J.P.; Alcalá, S.G. A systematic literature review of machine learning methods applied to predictive maintenance. Comput. Ind. Eng. 2019, 137, 106024.
Orru, P.F.; Zoccheddu, A.; Sassu, L.; Mattia, C.; Cozza, R.; Arena, S. Machine Learning Approach Using MLP and SVM Algorithms for the Fault Prediction of a Centrifugal Pump in the Oil and Gas Industry. Sustainability 2020, 12, 4776.
Hsu, J.-Y.; Wang, Y.-F.; Lin, K.-C.; Chen, M.-Y.; Hsu, J.H.-Y. Wind turbine fault diagnosis and predictive maintenance through statistical process control and machine learning. IEEE Access 2020, 8, 23427–23439.
Cheliotis, M.; Lazakis, I.; Theotokatos, G. Machine learning and data-driven fault detection for ship systems operations. Ocean Eng. 2020, 216, 107968.
Serradilla, O.; Zugasti, E.; Ramirez de Okariz, J.; Rodriguez, J.; Zurutuza, U. Adaptable and explainable predictive maintenance: Semi-supervised deep learning for anomaly detection and diagnosis in press machine data. Appl. Sci. 2021, 11, 7376.
Khalid, S.; Hwang, H.; Kim, H.S. Real-world data-driven machine-learning-based optimal sensor selection approach for equipment fault detection in a thermal power plant. Mathematics 2021, 9, 2814.
Shi, L.; Zhu, Y.; Zhang, Y.; Su, Z. Fault diagnosis of signal equipment on the Lanzhou-Xinjiang high-speed railway using machine learning for natural language processing. Complexity 2021, 2021, 9126745.
Oh, M.-J.; Choi, E.-S.; Roh, K.-W.; Kim, J.-S.; Cho, W.-S. A Study on the design of supervised and unsupervised learning models for fault and anomaly detection in manufacturing facilities. J. Big Data 2021, 6, 23–35.
Ghazali, N.; Seman, F.; Isa, K.; Ramli, K.; Abidin, Z.; Mustam, S.; Haek, M.; Abidin, A.; Asrokin, A. Twisted pair cable fault diagnosis via random forest machine learning. Comput. Mater. Contin. 2022, 71, 5427–5440.
Choi, S.-W.; Seo, B.-G.; Lee, E.-B. Machine Learning-Based Tap Temperature Prediction and Control for Optimized Power Consumption in Stainless Electric Arc Furnaces (EAF) of Steel Plants. Sustainability 2023, 15, 6393.
Choi, J.-S.; Choi, S.-W.; Lee, E.-B. Modeling of Predictive Maintenance Systems for Laser-Welders in Continuous Galvanizing Lines Based on Machine Learning with Welder Control Data. Sustainability 2023, 15, 7676.
Hadi, R.H.; Hady, H.N.; Hasan, A.M.; Al-Jodah, A.; Humaidi, A.J. Improved Fault Classification for Predictive Maintenance in Industrial IoT Based on AutoML: A Case Study of Ball-Bearing Faults. Processes 2023, 11, 1507.
Chandu, H.S. Enhancing Manufacturing Efficiency: Predictive Maintenance Models Utilizing IoT Sensor Data. Int. J. Sci. Res. Technol. (IJSART) 2024, 10, 58–66.
Bampoula, X.; Nikolakis, N.; Alexopoulos, K. Condition Monitoring and Predictive Maintenance of Assets in Manufacturing Using LSTM-Autoencoders and Transformer Encoders. Sensors 2024, 24, 3215.
Faizanbasha, A.; Rizwan, U. Optimizing burn-in and predictive maintenance for enhanced reliability in manufacturing systems: A two-unit series system approach. J. Manuf. Syst. 2025, 78, 244–270.
Chapelin, J.; Voisin, A.; Rose, B.; Iung, B.; Steck, L.; Chaves, L.; Lauer, M.; Jotz, O. Data-driven drift detection and diagnosis framework for predictive maintenance of heterogeneous production processes: Application to a multiple tapping process. Eng. Appl. Artif. Intell. 2025, 139, 109552.
Zhang, L.; Shi, Y.; Wang, D. A Real-Time Lightweight Perceptron for Cloud–Edge Collaborative Predictive Maintenance of Online Service Systems. IEEE Internet Things J. 2025, 1, 12640–12657.
Guo, J.; Wang, Z.; Li, H.; Yang, Y.; Huang, C.-G.; Yazdi, M.; Kang, H.S. A hybrid prognosis scheme for rolling bearings based on a novel health indicator and nonlinear Wiener process. Reliab. Eng. Syst. Saf. 2024, 245, 110014.
Yang, X.; He, Y.; Liao, R.; Cai, Y.; Dai, W. Mission reliability-centered opportunistic maintenance approach for multistate manufacturing systems. Reliab. Eng. Syst. Saf. 2024, 241, 109693.
Huai, W.; Gao, M.; Liu, S.; Sheng, L. Fault prognosis for linear stochastic systems with intermittent fault and strong noise via health indicator extraction approach. Qual. Reliab. Eng. Int. 2024, 40, 2456–2472.
Liu, C.; Zhang, L.; Zheng, Y.; Jiang, Z.; Zheng, J.; Wu, C. Online industrial fault prognosis in dynamic environments via task-free continual learning. Neurocomputing 2024, 598, 127930.
Kim, M.; Seong, H.; Kim, D. Deep Learning-Based Prognostics and Health Management Model for Pilot-Operated Cryogenic Safety Valves. Sensors 2024, 24, 1814.
Wang, C.; Jiang, W.; Shi, L.; Zhang, L. Rolling bearing remaining useful life prediction using deep learning based on high-quality representation. Sci. Rep. 2025, 15, 8228.
Li, H.J.; Li, L.G.; Li, Y.L.; Wang, G.D. Online Monitor and Control of Cooling Temperature on Run-out Table of Hot Strip Mill. Steel Res. Int. 2015, 86, 1225–1233.
Sugihara, H.; Ueoka, S.; Hino, Y.; Kijima, H.; Nakata, N. Quantitative Evaluation of Stability of Water Flow Injected from Pipe Laminar Nozzle. ISIJ Int. 2015, 55, 235–240.
Woo, Y.Y.; Han, S.W.; Cho, J.R.; Moon, Y.H. Air jet impingement to reduce hot strip wave on a run-out table. Mech. Ind. 2018, 19, 601.
Aoe, S.; Ohara, Y.; Miyake, M.; Kabeya, K. Simulation of Unstable Strip Running on Hot Run-Out-Table. ISIJ Int. 2019, 59, 496–503.
Luk’yanov, S.I.; Shvidchenko, N.V.; Krasilnikov, S.S.; Pishnograev, R.S.; Shvidchenko, D.V.; Konovalov, M.V. Optimizing speed of a run-out table of the hot strip mill. Int. J. Adv. Manuf. Technol. 2019, 105, 1675–1684.
Tatebe, K.; Shioiri, Y.; Fujita, S.; Fujimoto, H. Development of a Method for Evaluating Heat Transfer Characteristics of a Circular Water Jet Impinging on a Moving Flat Plate. Tetsu-To-Hagane/J. Iron Steel Inst. Jpn. 2022, 108, 823–834.
Wu, H.; Sun, J.; Peng, W.; Zhang, D. Residual stress control of hot-rolled strips during run-out table cooling. Int. J. Adv. Manuf. Technol. 2023, 125, 3205–3227.
Yazdani, S.; Tavakoli, M.R.; Niroomand, M.R.; Forouzan, M.R. Cooling pattern on the run-out table of a hot rolling mill for an HSLA steel: A finite element analysis. Int. J. Adv. Manuf. Technol. 2024, 132, 2381–2393.
Jena, M.; Mishra, P.C.; Sahoo, S.S. Comparative performance of jet and spray impingement cooling in steel strip run-out table: Experimental results. Aust. J. Mech. Eng. 2024, 22, 109–122.
Iba. Iba Software & Hardware. Available online: https://www.iba-ag.com/en/iba-system (accessed on 18 December 2024).
García, S.; Ramírez-Gallego, S.; Luengo, J.; Benítez, J.M.; Herrera, F. Big data preprocessing: Methods and prospects. Big Data Anal. 2016, 1, 9.
Mehala, N.; Dahiya, R. Motor current signature analysis and its applications in induction motor fault diagnosis. Int. J. Syst. Appl. Eng. Dev. 2007, 2, 29–35.
Lee, H.; Yun, S. Strategies for imputing missing values and removing outliers in the dataset for machine learning-based construction cost prediction. Buildings 2024, 14, 933.
Hermansa, M.; Kozielski, M.; Michalak, M.; Szczyrba, K.; Wróbel, Ł.; Sikora, M. Sensor-Based Predictive Maintenance with Reduction of False Alarms—A Case Study in Heavy Industry. Sensors 2022, 22, 226.
Cofre-Martel, S.; Lopez Droguett, E.; Modarres, M. Big Machinery Data Preprocessing Methodology for Data-Driven Models in Prognostics and Health Management. Sensors 2021, 21, 6841.
Dash, C.S.K.; Behera, A.K.; Dehuri, S.; Ghosh, A. An outliers detection and elimination framework in classification task of data mining. Decis. Anal. J. 2023, 6, 100164.
Yaro, A.S.; Maly, F.; Prazak, P. Outlier Detection in Time-Series Receive Signal Strength Observation Using Z-Score Method with Sn Scale Estimator for Indoor Localization. Appl. Sci. 2023, 13, 3900.
Ahsan, M.M.; Mahmud, M.P.; Saha, P.K.; Gupta, K.D.; Siddique, Z. Effect of data scaling methods on machine learning algorithms and model performance. Technologies 2021, 9, 52.
Maleki, S.; Maleki, S.; Jennings, N.R. Unsupervised anomaly detection with LSTM autoencoders using statistical data-filtering. Appl. Soft Comput. 2021, 108, 107443.
Al-amri, R.; Murugesan, R.K.; Man, M.; Abdulateef, A.F.; Al-Sharafi, M.A.; Alkahtani, A.A. A review of machine learning and deep learning techniques for anomaly detection in IoT data. Appl. Sci. 2021, 11, 5320.
Li, P.; Pei, Y.; Li, J. A comprehensive survey on design and application of autoencoder in deep learning. Appl. Soft Comput. 2023, 138, 110176.
Yu, Y.; Si, X.; Hu, C.; Zhang, J. A review of recurrent neural networks: LSTM cells and network architectures. Neural Comput. 2019, 31, 1235–1270.
Wen, L.; Su, S.; Li, X.; Ding, W.; Feng, K. GRU-AE-wiener: A generative adversarial network assisted hybrid gated recurrent unit with Wiener model for bearing remaining useful life estimation. Mech. Syst. Signal Process. 2024, 220, 111663.
Mak, H.W.L.; Han, R.; Yin, H.H. Application of variational autoEncoder (VAE) model and image processing approaches in game design. Sensors 2023, 23, 3457.
Boppana, T.K.; Bagade, P. GAN-AE: An unsupervised intrusion detection system for MQTT networks. Eng. Appl. Artif. Intell. 2023, 119, 105805.
Vos, K.; Peng, Z.; Jenkins, C.; Shahriar, M.R.; Borghesani, P.; Wang, W. Vibration-based anomaly detection using LSTM/SVM approaches. Mech. Syst. Signal Process. 2022, 169, 108752.
Priya Varshini, A.; Anitha Kumari, K. Predictive analytics approaches for software effort estimation: A review. Indian J. Sci. Technol. 2020, 13, 2094–2103.
Yang, L.; Shami, A. On hyperparameter optimization of machine learning algorithms: Theory and practice. Neurocomputing 2020, 415, 295–316.
Kil, R.M.; Park, S.H.; Kim, S. Optimum window size for time series prediction. In Proceedings of the 19th Annual International Conference of the IEEE Engineering in Medicine and Biology Society. ‘Magnificent Milestones and Emerging Opportunities in Medical Engineering’ (Cat. No. 97CH36136), Chicago, IL, USA, 30 October–2 November 1997; pp. 1421–1424.
Rafique, F.; Fu, L.; Mai, R. LSTM autoencoders based unsupervised machine learning for transmission line protection. Electr. Power Syst. Res. 2023, 221, 109432.
Choi, D.; Shallue, C.J.; Nado, Z.; Lee, J.; Maddison, C.J.; Dahl, G.E. On empirical comparisons of optimizers for deep learning. arXiv 2019, arXiv:1910.05446.
Zuo, X.; Chen, Z.; Yao, H.; Cao, Y.; Gu, Q. Understanding Train-Validation Split in Meta-Learning with Neural Networks. In Proceedings of the Eleventh International Conference on Learning Representations (ICLR 2023), Kigali, Rwanda, 1–5 May 2023; pp. 1–56.
AI Wiki. Epochs, Batch Size, & Iterations. Available online: https://machine-learning.paperspace.com/wiki/epoch (accessed on 6 December 2024).
Choi, S.-W.; Lee, E.-B.; Kim, J.-H. The Engineering Machine-Learning Automation Platform (EMAP): A Big-Data-Driven AI Tool for Contractors’ Sustainable Management Solutions for Plant Projects. Sustainability 2021, 13, 10384.
Orka, N.A.; Awal, M.A.; Liò, P.; Pogrebna, G.; Ross, A.G.; Moni, M.A. Quantum deep learning in neuroinformatics: A systematic review. Artif. Intell. Rev. 2025, 58, 134.
Chen, X.; Van Hillegersberg, J.; Topan, E.; Smith, S.; Roberts, M. Application of data-driven models to predictive maintenance: Bearing wear prediction at TATA steel. Expert Syst. Appl. 2021, 186, 115699.
Nunes, P.; Santos, J.; Rocha, E. Challenges in predictive maintenance—A review. CIRP J. Manuf. Sci. Technol. 2023, 40, 53–67.

Figure 1. Configuration of the Run-Out Table in a hot rolling plant.

Figure 2. Research process for this study.

Figure 3. Development of an AI-based ROT-PMM model using large-scale sensor data in the Iba platform. ¹ DAQ: Data Acquisition, ² PLC: Programmable Logic Controller, ³ MES: Manufacturing Execution System.

Figure 4. Collected raw data (csv file) from DAQ.

Figure 5. Autoencoder architecture for anomaly detection.

Figure 6. LSTM-AE architecture for ROT-PMM.

Figure 7. ROT-PMM operation mechanism.

Figure 8. Train and validation loss curves by window size: (a) window size 5; (b) window size 10; (c) window size 20; (d) window size 30.

Figure 9. Detection results for abnormal motors by model: (a) model #1; (b) model #3; (c) model #7; (d) model #8.

Figure 10. Test results based on the anomaly ratio.

Figure 11. Actual anomaly detection performance with failure data.

Figure 12. The hierarchy of the ROT-PMM system.

Table 1. Failure frequency and downtime by the main components of ROT.

Failure Type	Failure Frequency	Downtime
Roller bearing damage	4 cases/year	6 h/year
Motor insulation failure	3 cases/year	3 h/year
Coupling breakage	1 case/year	1 h/year
Other (power, drive)	1 case/year	2 h/year

Table 2. Replacement cycle by the main components of ROT.

Component of ROT	State	Replacement Cycle	Note
Roller	New	5 years	-
	Reused once	5 years	Bearing replacement repair
	Reused twice	5 years	Bearing replacement repair
Motor	New	5 years	-
	Reused once	3 years	Motor rewinding
	Reused twice	2 years	Motor rewinding
Coupling	New	5 years	-

Table 3. Overview of previous studies.

Category	Methods and Development	Year	Authors
Preventive maintenance	Methodology for Establishing Replacement Schedule through Hierarchical Structure Construction	1997	Wang et al. [17]
	Development of a Mathematical Cumulative Damage Model Based on Impact Values on Facilities	2000	Satow et al. [18]
	Optimal Maintenance Interval Determination Method based on Markov Decision Process	2006	Chan and Asgarpoor [19]
	Economic Evaluation of Replacement Timing based on Deterioration Life Distribution	2007	Crowder and Lawless [20]
	Methodology for Deriving Optimal Solutions Considering Failure Rate, Profit, and Failure Probability	2007	Panagiotidou and Tagaras [21]
	Analysis of Rolling Bearing Fault Diagnosis Techniques through Vibration Analysis	2008	Jayaswal et al. [22]
	Condition Monitoring of Wind Turbines through Vibration, Acoustic, Lubrication, and Temperature Analysis	2012	Márquez et al. [23]
	Analysis of Principles and Application Cases of Non-contact Infrared Thermal Camera Technology for Detecting Abnormal Temperatures in Devices	2013	Bagavathiappan et al. [24]
	Analysis of Latest Online Sensor Technologies for Measuring Lubricant Characteristics	2017	Zhu et al. [25]
	Optimal Condition-based Preventive Maintenance Policy for Balanced Systems	2021	Wang et al. [26]
	A Joint Optimization of Strategic Workforce Planning and Preventive Maintenance Scheduling: A Simulation–Optimization Approach	2022	Akl et al. [27]
	Age-based Preventive Maintenance with Multiple Printing Options	2022	Lolli et al. [28]
	A New Preventive Maintenance Strategy Optimization Model considering Lifecycle Safety	2022	Shi et al. [29]
	Deep Multi-agent Reinforcement Learning for Multi-level Preventive Maintenance in Manufacturing Systems	2022	Su et al. [30]
	Cost-based Preventive Maintenance of Industrial Robot System	2023	Dui et al. [31]
	Multiple Degradation-driven Preventive Maintenance Policy for Serial-parallel Multi-station Manufacturing Systems	2023	Li et al. [32]
	A Hybrid Multi-Objective Evolutionary Algorithm for Solving an Adaptive Flexible Job-Shop Rescheduling Problem with Real-time Order Acceptance and Condition-Based Preventive Maintenance	2023	An et al. [33]
	Designing Preventive Maintenance for Multi-state Systems with Performance Sharing	2024	Wu et al. [34]
	Preventive Maintenance Strategy for Multi-component Systems in Dynamic Risk Assessment	2025	Zhang et al. [35]
Predictive maintenance	Fault Prediction Model for Pumps based on SVM and MLP Algorithms using Temperature, Pressure, and Vibration Data	2020	Orru et al. [38]
	Decision Tree and Random Forest Models for Predicting Anomalous Conditions in Wind Turbines	2020	Hsu et al. [39]
	Regression Model for Detecting Defects in Ship Engines	2020	Cheliotis et al. [40]
	Anomaly Detection Model for Press Machines using CNN and Autoencoder Algorithms	2021	Serradilla et al. [41]
	Bearing Fault Detection in Pump Motor using SVM, k-NN, and Naive Bayes	2021	Khalid et al. [42]
	Fault Detection in Railway Signaling Equipment using NLP and SVM	2021	Shi et al. [43]
	A Study on the Design of Supervised and Unsupervised Learning Models for Fault and Anomaly Detection in Manufacturing Facilities	2021	Oh et al. [44]
	Fault Detection of Copper Cable using Random Forest and Digital Twin	2022	Ghazali et al. [45]
	Prediction Model for Tap Temperature using Support Vector Regression Algorithm	2023	Choi et al. [46]
	Failure Prediction Model for Laser Welder using LSTM-AE Algorithm	2023	Choi et al. [47]
	Improved Fault Classification for PdM in Industrial IoT Based on AutoML	2023	Hadi et al. [48]
	Predictive Maintenance Models Utilizing IoT Sensor Data	2024	Chandu [49]
	Condition Monitoring and Predictive Maintenance of Assets in Manufacturing Using LSTM-Autoencoders and Transformer Encoders	2024	Bampoula et al. [50]
	Optimizing Burn-in and Predictive Maintenance for Enhanced Reliability in Manufacturing Systems	2025	Faizanbasha and Rizwan [51]
	Data-driven Drift Detection and Diagnosis Framework for Predictive Maintenance of Heterogeneous Production Processes	2025	Chapelin et al. [52]
	A Real-Time Lightweight Perceptron for Cloud–Edge Collaborative Predictive Maintenance of Online Service Systems	2025	Zhang et al. [53]
Fault prognosis	A Hybrid Prognosis Scheme for Rolling Bearings Based on a Novel Health Indicator and Nonlinear Wiener Process	2024	Guo et al. [54]
	Mission Reliability-Centered Opportunistic Maintenance Approach for Multistate Manufacturing Systems	2024	Yang et al. [55]
	Fault Prognosis for Linear Stochastic Systems with Intermittent Fault and Strong Noise via Health Indicator Extraction Approach	2024	Huai et al. [56]
	Online Industrial Fault Prognosis in Dynamic Environments via Task-Free Continual Learning	2024	Liu et al. [57]
	Deep Learning-Based Prognostics and Health Management Model for Pilot-Operated Cryogenic Safety Valves	2024	Kim et al. [58]
	Rolling Bearing Remaining Useful Life Prediction Using Deep Learning Based on High-Quality Representation	2025	Wang et al. [59]
Research topics for ROT	Temperature Monitoring Technology for Cooling based on Fourier’s Heat Equation	2015	Li et al. [60]
	Proposal of an Index for Quantitative Evaluation of Laminar Water Flow Stability under Various Conditions	2015	Sugihara et al. [61]
	Air Jet Impingement System for Reducing Strip Waves	2018	Woo et al. [62]
	Numerical Simulation based on Multi-Body Dynamics	2019	Aoe et al. [63]
	Proposal of a Method for Adjusting Drive Process Requirements to Minimize Maintenance Costs for Equipment	2019	Luk’yanov et al. [64]
	Thermal Flux Evaluation Method based on Heat Conduction Equation and Experimental Measurements	2022	Tatebe et al. [65]
	Residual Stress Control of Hot-Rolled Strips during Run-Out Table Cooling	2023	Wu et al. [66]
	Cooling Pattern on the Run-Out Table of a Hot Rolling Mill for an HSLA Steel	2024	Yazdani et al. [67]
	Comparative Performance of Jet and Spray Impingement Cooling in Steel Strip Run-Out Table	2024	Jena et al. [68]

Table 4. Data acquisition configuration and data volume for ROT equipment.

Item	Description
Data acquisition environment	Sensor data from the on-site ROT equipment and process data from the PLC were transmitted to and stored on a DAQ (Data Acquisition) system.
Sampling interval	100 ms (0.1 s)
Sampling rate	3660 data points per second (366 motor-current channels × 10 samples/s)
Number of variables collected	431 variables
Collection period	20 March 2023–19 April 2023 (31 consecutive days)
Daily data volume	3660 × 86,400 s = 316,224,000 data points
Total data volume	316,224,000 × 31 days = 9,802,944,000 data points
Estimated data size	≈70 MB per hour
	≈1.7 GB per day
	≈40.3 GB per month
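The point counts in Table 4 follow from simple multiplication. The sketch below reproduces them, assuming (as the motor-current totals in Table 8 suggest) that the 3660 points per second come from the 366 motor-current channels sampled every 100 ms; the constant and function names are illustrative, not taken from the study.

```python
# Illustrative check of the Table 4 data-volume arithmetic.
# Assumption: 3660 points/s = 366 motor-current channels x 10 samples/s (100 ms interval).

MOTOR_CHANNELS = 366      # motor-current variables listed in Table 5
SAMPLES_PER_SECOND = 10   # one sample every 100 ms
SECONDS_PER_DAY = 86_400
COLLECTION_DAYS = 31      # 20 March to 19 April 2023


def data_volume(channels, samples_per_second, days):
    """Return (points per second, points per day, total points) for the collection window."""
    per_second = channels * samples_per_second
    per_day = per_second * SECONDS_PER_DAY
    total = per_day * days
    return per_second, per_day, total


per_second, per_day, total = data_volume(MOTOR_CHANNELS, SAMPLES_PER_SECOND, COLLECTION_DAYS)
print(f"{per_second:,} points/s, {per_day:,} points/day, {total:,} points in total")
# prints: 3,660 points/s, 316,224,000 points/day, 9,802,944,000 points in total
```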

Table 5. Classification of collected ROT equipment operation data.

Category	Type	No. of Items	Remark
Time information	Date	1	-
	Time	2	-
Production information	Coil no.	1	set value
	Steel grade	1	set value
	Thickness	1	set value
	Width	1	set value
	Weight	1	set value
Equipment operation information	Motor current	366	process value
	ROT speed	1	set value
	Dcctl_mode	1	process value
	Dcpolisher	1	process value
	Inverter	24	set value
	Rotspd	25	set value
	Ai_flag	1	process value
	F7 state	1	process value
	DC state	3	process value
Total	16	431

Table 6. Alignment between ROT data characteristics and recommended algorithmic features.

Aspect	ROT Data Characteristics	Recommended Algorithm/Property
Availability of Labels	Absent	Unsupervised learning required
Data Structure	Time-series, high-frequency	LSTM-based architecture
Operating Environment	High temperature, moisture, mechanical impacts	Algorithmic robustness to harsh industrial conditions
Operational Requirement	Concurrent analysis of 366 motors	Scalability for large-scale parallel processing
Real-time Deployment	Mandatory	Reconstruction-error-based anomaly detection

Table 7. Abnormal motor symptoms by the ROT drive equipment.

Cause of Failure	Component	Abnormal Phenomenon
Mechanical	roller bearing	overload current
	coupling	low current, fluctuation
	base bolt	vibration
Electrical	motor bearing	overload current, fluctuation
	motor insulation failure	insulation resistance degradation
	motor cable insulation failure	insulation resistance degradation

Table 8. Number and percentage of motor current data points eliminated at each preprocessing stage.

Stage	Target for Elimination/Explanation	Eliminated Data Points	Total Data Points	Ratio (%)
0. Raw Data	3660 points per second collected over 31 days	0	9,802,944,000	0.00
1. Missing/Negative Values	Null and negative values due to communication errors	1,960,589	9,800,983,411	0.02
2. Failure Buffer Interval	±1 h window before and after coupling failure on 6 January 2023	26,352,000	9,774,631,411	0.27
3. Head/Tail Shock	Head/tail intervals during Finishing Mill ↔ Down Coiler transitions (10 s per coil)	79,422,000	9,695,209,411	0.81
4. IQR Outliers	Values below Q1 − 1.5 × IQR or above Q3 + 1.5 × IQR	68,620,608	9,626,588,803	0.70
Total Eliminated Data Points	(Sum of stages 1–4)	176,355,197	9,626,588,803	1.80
Remaining Normal Data	–		9,626,588,803	98.20
Normalization (Standard Scaler)	Standard Scaler applied (mean = 0, variance = 1)		9,626,588,803
Train/Test Split	80:20 split of the remaining data		9,626,588,803
Train Set	80% allocated for training		7,701,271,042	80.00
Test Set	20% allocated for testing		1,925,317,761	20.00
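Stages 1 and 4 of Table 8, together with standard scaling and the 80:20 split, can be sketched for a single motor-current channel as follows. This is an illustrative reimplementation with the Python standard library, not the authors' code. It assumes that the IQR fences are computed per channel with the inclusive quantile method, that scaling uses the population standard deviation, and that the split is chronological (the source does not say whether the data were shuffled). Stages 2 and 3 need timestamps and coil boundaries and are omitted; the toy readings are hypothetical.

```python
import math
import statistics


def drop_invalid(values):
    """Stage 1: remove missing (None or NaN) and negative readings."""
    return [v for v in values if v is not None and not math.isnan(v) and v >= 0]


def iqr_filter(values, k=1.5):
    """Stage 4: keep readings inside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if low <= v <= high]


def standard_scale(values):
    """Rescale to mean 0 and variance 1 using the population standard deviation."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    if std == 0:
        raise ValueError("a constant channel cannot be standardized")
    return [(v - mean) / std for v in values]


def chronological_split(values, train_ratio=0.8):
    """Split without shuffling so the test part follows the training part in time."""
    cut = int(len(values) * train_ratio)
    return values[:cut], values[cut:]


# Hypothetical motor-current readings (arbitrary units).
raw = [12.1, 12.4, None, 11.9, -1.0, 12.2, 12.0, 35.0, 12.3, 11.8, 12.5]
clean = iqr_filter(drop_invalid(raw))      # drops None, -1.0 and the 35.0 spike
scaled = standard_scale(clean)
train, test = chronological_split(scaled)
print(len(clean), len(train), len(test))   # prints: 8 6 2
```

Table 8 lists normalization before the train/test split, and the sketch follows that order; fitting the scaler on the training portion only is a common alternative that keeps test-set statistics out of the scaling.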

Table 9. Anomaly detection classification based on the availability of labels.

Category	Label	Data	Techniques	Remark
Supervised anomaly detection	Used	Normal/Abnormal	CNN, KNN, Naive Bayes, SVM	High accuracy; limited field applicability because labeling is required
Semi-supervised anomaly detection	Used	Normal	AE, CNN, KNN, RNN, GAN, CNN-SVM	Moderate accuracy; requires judging what counts as normal versus abnormal, which can be ambiguous
Unsupervised anomaly detection	Unused	Normal	AE, RNN, LSTM, GAN, AAE	Moderate accuracy; high field applicability because no labeling is required

Table 10. Comparative evaluation of anomaly detection algorithms for time-series-based PdM in industrial environments.

Category	Algorithm	Temporal Learning Capability	Label Dependency	Anomaly Detection Accuracy	Noise Robustness	Computational Complexity	Remarks
Unsupervised	LSTM-AE	Very High	None	Very High	High	High	Effective for long-term dependencies; reconstruction-based anomaly detection
Unsupervised	GRU-AE	High	None	Moderate	Moderate	Moderate	Faster training than LSTM, but lower detection accuracy
Unsupervised	RNN-AE	Moderate	None	Low	Low	Moderate	Struggles with long-sequence memory (vanishing gradient issue)
Unsupervised	VAE	Low	None	Moderate	Low	High	Probabilistic; limited for temporal anomaly detection
Generative Learning	GAN-AE	Moderate	None	Moderate to High	Low	Very High	Unstable training dynamics
Supervised	SVM	None	Required	High (with labels)	Low	Low	Not suitable for unlabeled or streaming industrial data

Table 11. Hyper-parameters of ROT-PMM.

Hyper-Parameters	Values
Window size	5
Hidden layers	8
Units	512/256/128/64
Optimizer	Adam
Loss Function	Mean Squared Error
Train/Validation Split	8:2 ratio
Epochs	1,000,000
Batch Size	1000

Table 12. Train and validation loss data by window size.

Window Size	Data Shape (samples, time steps, features)	Train Loss	Validation Loss
5	(52,088, 5, 366)	0.0910	0.0946
10	(52,084, 10, 366)	0.1024	0.1073
20	(52,076, 20, 366)	0.1148	0.1152
30	(52,068, 30, 366)	0.9792	0.9675
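Table 12 stores the motor-current data as three-dimensional arrays of shape (samples, time steps, features). A minimal sliding-window sketch that produces this layout is shown below, using plain Python lists and hypothetical toy data. It assumes a stride of one time step, which the source does not state. As a side observation, the sample counts in Table 12 fall by 4, 8, and 8 as the window grows by 5, 10, and 10 steps, i.e., by 80% of each increase; this would be consistent with stride-1 windows of which an 80% training share is reported, but that remains an inference.

```python
def sliding_windows(series, window_size, stride=1):
    """Cut a (time steps x features) sequence into overlapping windows.

    Each window holds `window_size` consecutive rows; with stride 1 the
    result has len(series) - window_size + 1 windows.
    """
    if not 1 <= window_size <= len(series):
        raise ValueError("window_size must be between 1 and len(series)")
    return [series[i:i + window_size]
            for i in range(0, len(series) - window_size + 1, stride)]


# Hypothetical toy data: 12 time steps x 3 features (the ROT data have 366 features).
series = [[t, 0.1 * t, -t] for t in range(12)]
for w in (5, 10):
    windows = sliding_windows(series, w)
    print(w, (len(windows), len(windows[0]), len(windows[0][0])))
# prints:
# 5 (8, 5, 3)
# 10 (3, 10, 3)
```

Smaller windows give more, shorter samples per motor; the loss values in Table 12 are lowest for a window size of 5, which is the value listed in Table 11.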

Table 13. Evaluation of abnormal motor detection performance for different models.

Model	Window Size	Hidden Layers	Units	Train Loss	Valid Loss	Train R.E. M0901	Train R.E. M1814	Train R.E. M2111	Test R.E. M0901	Test R.E. M1814	Test R.E. M2111	Motors Flagged	Verification
1	5	4	64/32	0.0976	0.0867	0.1359	0.1009	0.0754	1.7292	74.9149	10.2821	2	1 missed
2	5	4	128/64	0.0822	0.0733	0.1165	0.0807	0.070	1.7442	62.2548	9.7480	2	1 missed
3	5	4	256/128	0.0719	0.0642	0.0839	0.0703	0.0674	0.4982	51.5812	9.058	2	1 missed
4	5	4	512/128	0.0688	0.0647	0.8143	0.0657	0.0655	0.4233	51.6788	8.8823	5	1 missed, 3 false detections
5	5	6	128/64/32	0.0875	0.0826	0.1233	0.0880	0.0711	1.7001	74.9955	11.5548	2	1 missed
6	5	6	256/128/64	0.0770	0.0722	0.1037	0.0736	0.0680	1.2695	66.3028	9.9898	2	1 missed
7	5	6	512/256/128	0.0677	0.0664	0.0821	0.0645	0.0647	0.7577	55.6140	9.7044	3	1 missed, 1 false detection
8	5	8	512/256/128/64	0.0724	0.0787	0.1001	0.0666	0.0656	0.6403	72.9670	10.6373	3	Correct
Train data: all motors normal. Test data: three abnormal motors (M0901, M1814, M2111). R.E.: reconstruction error.
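Table 13 compares each motor's reconstruction error (R.E.) on normal training data with its error on the abnormal test data. The sketch below illustrates reconstruction-error scoring with the mean squared error (the loss listed in Table 11) and flags a motor when its test error exceeds a multiple of its training error. The `factor=3.0` rule and the helper names are hypothetical choices for illustration; the source does not specify how the detection threshold was set.

```python
def reconstruction_error(original, reconstructed):
    """Mean squared error between an input sequence and the autoencoder output."""
    if len(original) != len(reconstructed):
        raise ValueError("sequences must have the same length")
    return sum((x - y) ** 2 for x, y in zip(original, reconstructed)) / len(original)


def flag_motors(train_re, test_re, factor=3.0):
    """Return motors whose test R.E. exceeds `factor` times their normal-data R.E.

    The multiplicative threshold is a hypothetical rule, not the study's criterion.
    """
    return [motor for motor, err in test_re.items() if err > factor * train_re[motor]]


# R.E. values for model #8 taken from Table 13.
train_re = {"M0901": 0.1001, "M1814": 0.0666, "M2111": 0.0656}
test_re = {"M0901": 0.6403, "M1814": 72.9670, "M2111": 10.6373}
print(flag_motors(train_re, test_re))  # prints: ['M0901', 'M1814', 'M2111']
```

Comparing each motor against its own normal-data error, rather than one global cut-off, keeps motors with naturally noisier currents from dominating the alarms; whether the study did this is not stated.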

Table 14. Confusion matrix for ROT-PMM.

		Predicted
		True	False
Actual	True	True Positive (TP)	False Negative (FN)
Actual	False	False Positive (FP)	True Negative (TN)

Table 15. The results of ROT-PMM performance.

Anomaly Ratio (%)	TP	FN	FP	TN	Accuracy (%)	Precision (%)	Recall (%)	F1 Score (%)
27	259	41	12	288	91	96	86	91
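Table 15 reports a coil-level evaluation at an anomaly ratio of 27%. The sketch below shows one plausible reading of that rule, in which a coil is classified as abnormal when at least 27% of its windows exceed a reconstruction-error threshold; this interpretation, the threshold value, and the helper names are assumptions. It then recomputes the metrics from the confusion-matrix counts in Table 15 with the standard formulas.

```python
def coil_is_abnormal(window_errors, threshold, anomaly_ratio=0.27):
    """Assumed rule: flag a coil when the share of windows whose reconstruction
    error exceeds `threshold` is at least `anomaly_ratio`."""
    exceed = sum(1 for e in window_errors if e > threshold)
    return exceed / len(window_errors) >= anomaly_ratio


def classification_metrics(tp, fn, fp, tn):
    """Standard accuracy, precision, recall and F1 from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + fn + fp + tn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return accuracy, precision, recall, f1


# Hypothetical coil: 3 of 10 windows exceed the threshold (ratio 0.30 >= 0.27).
errors = [0.1, 0.2, 0.9, 1.2, 0.1, 0.3, 0.8, 0.2, 0.1, 0.1]
print(coil_is_abnormal(errors, threshold=0.5))  # prints: True

# Counts from Table 15.
acc, prec, rec, f1 = classification_metrics(tp=259, fn=41, fp=12, tn=288)
print(f"accuracy={acc:.0%} precision={prec:.0%} recall={rec:.0%} F1={f1:.0%}")
# prints: accuracy=91% precision=96% recall=86% F1=91%
```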

Table 16. Data for profitability analysis.

Category	Data	Remark
ROT Failure	2.4 cases/year; 2.3 h/year	Number of failures and downtime per mill
TBM Cost	USD 551 K/year	Assumes parts are replaced on the TBM cycle

Table 17. The result of benefit analysis.

Category	Economic Effect	Remark
Opportunity Cost	USD 26 K/year ¹	2.3 h/year × 762 ton/h × USD 15/ton
Maintenance Cost	USD 51 K/year	USD 551 K/year − USD 500 K/year (about 10% saving)

¹ Exchange rate: USD 1 = KRW 1357 (2024 average; source: Exchange Rates UK).
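Both entries in Table 17 can be recomputed from the quantities in the Remark column. The sketch below does so; the function names are illustrative, and the USD 15/ton factor is used exactly as stated in the table.

```python
def opportunity_cost_usd(downtime_h, throughput_t_per_h, usd_per_ton):
    """Lost production value: downtime x throughput x value per ton."""
    return downtime_h * throughput_t_per_h * usd_per_ton


def maintenance_saving_usd(tbm_cost, pdm_cost):
    """Return the absolute saving and the saving as a share of the TBM cost."""
    saving = tbm_cost - pdm_cost
    return saving, saving / tbm_cost


opportunity = opportunity_cost_usd(2.3, 762, 15)
saving, share = maintenance_saving_usd(551_000, 500_000)
print(f"opportunity cost ~ USD {opportunity / 1000:.0f} K/year")
print(f"maintenance saving = USD {saving / 1000:.0f} K/year ({share:.1%})")
# prints:
# opportunity cost ~ USD 26 K/year
# maintenance saving = USD 51 K/year (9.3%)
```

The exact maintenance saving is about 9.3% of the TBM cost, which Table 17 rounds to 10%.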

Table 18. Comparative analysis of key operational metrics between preventive maintenance (PM) and predictive maintenance (PdM) based on the ROT-PMM case study at Company P.

Metric	Preventive Maintenance (PM)	Predictive Maintenance (PdM)
Failure Rate (per year)	9 failures/year	2–3 failures/year
Maintenance Cost	USD 551 K/year	USD 474 K/year (↓14%)
Downtime Hours	11.8 h/year	2.3 h/year (↓80%)
Spare Part Usage	Full cycle, every 5 years	Extended by ~20% via condition monitoring
Energy Efficiency Impact	Overloads frequent due to wear	Reduced by early anomaly detection
Resource Utilization Efficiency	Moderate	High (maintenance on demand)
Adaptability to Real-Time Conditions	Absent	Full integration via LSTM-AE
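The percentage reductions in Table 18 can be checked the same way. The PdM maintenance cost of USD 474 K/year equals 551 − 51 − 26, so it appears to combine the maintenance saving and the opportunity cost from Table 17; the source does not state this decomposition explicitly, so treat it as an observation. The sketch below computes the relative reductions; the helper name is illustrative.

```python
def relative_reduction(before, after):
    """Fractional reduction from `before` to `after`."""
    return (before - after) / before


metrics = {
    "maintenance cost (USD K/year)": (551, 474),
    "downtime (h/year)": (11.8, 2.3),
}
for name, (pm, pdm) in metrics.items():
    print(f"{name}: {relative_reduction(pm, pdm):.1%} lower")
# prints:
# maintenance cost (USD K/year): 14.0% lower
# downtime (h/year): 80.5% lower
```

The downtime reduction of about 80.5% is reported in Table 18 as ↓80%.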

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

MDPI and ACS Style

Yun, J.-W.; Choi, S.-W.; Lee, E.-B. Study on Energy Efficiency and Maintenance Optimization of Run-Out Table in Hot Rolling Mills Using Long Short-Term Memory-Autoencoders. Energies 2025, 18, 2295. https://doi.org/10.3390/en18092295

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers.

