Article

Artificial Intelligence-Based Anomaly Detection Technology for Equipment Condition Monitoring in Smart Farms

1 Low-Carbon Agriculture-Based Smart Distribution Research Center, Sunchon National University, Suncheon 57922, Republic of Korea
2 Department of Convergence Biosystems Mechanical Engineering, Sunchon National University, Suncheon 57922, Republic of Korea
* Author to whom correspondence should be addressed.
Appl. Sci. 2025, 15(23), 12843; https://doi.org/10.3390/app152312843
Submission received: 29 October 2025 / Revised: 2 December 2025 / Accepted: 2 December 2025 / Published: 4 December 2025
(This article belongs to the Special Issue AI-Based Machinery Health Monitoring)

Abstract

In Korea, agricultural policy increasingly promotes high-efficiency digital agriculture; however, insufficient sensor reliability and data accuracy continue to limit the practical adoption of smart farm technologies. To address these limitations, this study aims to develop and field-validate an artificial intelligence (AI)-based Prognostics and Health Management (PHM) framework for anomaly detection and remaining useful life (RUL) estimation of sensors and actuators in commercial smart farms. To collect smart farm data, we developed a switch voltage and current data acquisition system and selected problematic switches and environmental sensors in operating greenhouses as PHM targets. Using PHM techniques, we implemented mathematical and AI-based anomaly detection and failure prediction algorithms. In experiments, sensor behavior was predicted with mathematical and AI models, achieving over 90% predictive accuracy compared with observations. Based on these predictions, thresholds were estimated and the RUL of sensors was predicted up to 80 h in advance. For switches, vibration, noise, and voltage data were collected to detect anomalies. Actuator anomaly detection employed thresholds derived from statistical indicators and machine learning; a hybrid approach combining interquartile range, Z-score, and Isolation Forest leveraged the strengths of both paradigms to provide robust and adaptive detection. Deviation features were then combined with environmental factors to construct an RUL model, and the remaining life of devices in operation was estimated using a k-nearest neighbors approach. In field validation, the lifetime of four switches was predicted, yielding a mean RUL of 1655 days. Finally, we implemented a web-based platform that enables farms to monitor and manage equipment health.
Compared with prior studies, the key novelty of this work lies in integrating sensor-and-actuator PHM, providing real-field validation in operating greenhouses, and delivering an operational web platform that supports practical smart farm maintenance. By integrating these methods, the study aims to improve system efficiency, reduce energy consumption, and extend the operating life of smart farm components. We anticipate substantial benefits as the proposed approach is applied to smart farm equipment, enabling more reliable data acquisition and stable maintenance in practice.

1. Introduction

In recent years, the Republic of Korea has promoted high-efficiency smart agriculture by integrating information and communication technology (ICT) into horticulture, fruit production, and livestock systems [1]. Smart farming is increasingly recognized as a key framework for digital transformation in agriculture, supporting data-driven production and precision management [2]. National policies have therefore emphasized the development and dissemination of big-data-based smart farm technologies to enhance productivity and operational efficiency [3].
As shown in Table 1, smart farms in Korea have evolved from first-generation systems that provided simple remote control to second-generation systems offering automated environmental management through integrated ICT platforms [4]. Future third-generation systems aim to incorporate artificial intelligence and robotics for fully autonomous cultivation. As these systems become more complex, ensuring data reliability and preventing failures in sensors and ICT equipment have emerged as critical challenges for stable smart farm operation [5]. To fully leverage smart farm technologies, it is essential to ensure data validity and to establish measures that minimize losses caused by electrical or mechanical malfunctions of agricultural ICT equipment [6,7].
In early smart farm systems, failures occurred frequently because data collection methods were not standardized and equipment was not validated. Typical problems included (i) sensor failures, where sensors acquired incorrect or no data and caused serious control issues, (ii) communication disruptions, in which errors or server failures prevented data from being transmitted or analyzed correctly, (iii) physical failures of actuators operating in harsh environments that resulted in a loss of controllability, and (iv) software errors, where malfunctions in data collection, analysis, or control processes led to significant losses [8,9,10].
Therefore, to build a prognostics and health management (PHM) system for smart farm equipment, we first collect sensor and actuator control data, followed by preprocessing for analysis. During preprocessing, the collected data are integrated, cleansed, and transformed into a structured format for anomaly detection and predictive analysis. In the analysis stage, rather than examining a single data stream, we conduct comparative multivariate analysis that jointly considers external and internal environments, control information, and sensor data to derive correlations. Based on these correlations, we compute key parameter values that determine the timing of predictive maintenance for smart farm equipment. We then apply AI models and compare the proposed system, which implements anomaly detection for smart farm equipment, with an existing smart farm system to evaluate the effectiveness of the PHM framework [11,12,13,14].
Despite the rapid adoption of smart farm technology in Korea, current research lacks an integrated prognostics and health management (PHM) framework that simultaneously addresses sensor reliability and actuator performance degradation under real-world operating conditions. Existing studies primarily focus on sensor anomaly detection or individual device diagnosis, with few validating PHM models in commercial greenhouses using multi-device data.
To address this research gap, this study proposes a comprehensive, field-validated PHM approach tailored to commercial smart farm environments. Specifically, this study aims to achieve three objectives for anomaly detection and remaining useful life (RUL) estimation of key smart farm sensors and actuators:
(1) Design an integrated PHM framework for environmental sensors and switching devices;
(2) Implement mathematical and machine learning-based models for anomaly detection and RUL estimation;
(3) Validate the proposed framework using real-world data collected from an operating smart farm greenhouse.
The remainder of this paper is organized as follows. Section 2 reviews the architecture of smart farm systems and the current state of fault detection technologies relevant to this study. It also compares PHM with anomaly detection approaches in other domains to highlight the distinctive contributions of this work. Section 3 describes the data types and collection methods for anomaly detection, presents the preprocessing steps performed for analysis, derives correlations through data analysis, and applies various AI techniques to identify the optimal anomaly detection and prediction model. The model is then tested and evaluated through field trials. Finally, Section 4 concludes the paper.

2. Related Research

2.1. Smart Farm Systems and Failure Prediction Cases

2.1.1. Smart Farm Overview

In addition to increasing agricultural output and reducing labor requirements, smart farms improve working conditions through the integration of big data technologies that enable optimized decision-making processes for both production and management. By providing a highly controlled growth environment, smart farms make it possible not only to predict harvest times and yields but also to improve product quality and overall productivity.
Environmental conditions are maintained through smart farm software (BANDIBURRI V2 1.0.8), which regulates variables such as temperature, humidity, and CO2 levels in greenhouses and livestock facilities. Monitoring systems automatically collect key data, including temperature, humidity, solar radiation, and CO2 concentrations, to characterize the growth environment. Based on these data, automatic and remote environmental control allows users to operate heating and cooling systems, open and close windows, adjust CO2 enrichment, and manage the supply of nutrients or feed [15,16].

2.1.2. Smart Farm Configuration

As shown in Figure 1 and Table 2, the architecture of smart farms for protected horticulture consists of several critical components: sensor nodes that collect indoor and outdoor environmental data from facilities and crops, controller nodes that regulate facilities and equipment, smart imaging devices that provide video-based observation and monitoring of greenhouses, a smart link that connects greenhouse-level standalone controllers and supervisory control signals to the internet, a farm-level information management system that monitors and configures multiple greenhouse controllers, and various auxiliary devices [17,18].

2.1.3. Smart Farm Failure Prediction Case

Figure 2 shows the Telecommunications Technology Association (TTA) standard TTAK.KO-10.1090 [19], titled “Service Interface for Responding to Equipment Malfunctions in Cloud based Smart Farm Greenhouses.” This standard defines an interface developed to support services required for addressing malfunctions of equipment, such as sensors and actuators, installed in cloud-operated smart farm greenhouses. The interface enables functions including information collection, analysis, decision-making, and alarm notification. By providing such capabilities, the diversity of services available for cloud-based smart farms is expanded, and agricultural information can be delivered efficiently through smart devices.
The scope of this standard is limited to deployments in which the cloud provides error detection functions such as rule-based and AI-based malfunction detection. It operates in conjunction with device configuration and data management services, smart farm monitoring services, smart farm control services, and smart farm operation services, as well as devices installed in greenhouses, including sensor nodes, actuator nodes, hybrid nodes, and gateways.
Notably, this standard does not provide methodologies for anomaly quantification, RUL estimation, or equipment-level condition prediction; rather, it focuses on defining service interfaces and communication structures for malfunction response.
The present study extends beyond the TTA standard in three key aspects. First, it simultaneously addresses sensors and actuators as interconnected components rather than isolated devices. Second, it incorporates data-driven anomaly detection and RUL estimation, enabling proactive rather than reactive maintenance. Third, it validates the framework using operational greenhouse data and implements it in a web-based platform, demonstrating practical applicability beyond interface specifications.
This approach complements the TTA standard by adding predictive maintenance capabilities essential for commercial smart farm reliability.

2.2. Predictive Health Management and Anomaly Detection Methods

2.2.1. Experience-Based Approach

The experience-based method shown in Figure 3 uses failure (lifetime) data obtained during testing or operation, applies statistical processing, and then uses the results to predict system lifetime. After collecting failure data, the information is statistically processed and fitted to an appropriate probability distribution. Among the distributions commonly used for failure prediction, the Weibull distribution is the most widely applied. In the Weibull model, the shape parameter β (beta) characterizes failure behavior, with different values of β corresponding to different failure modes. Once the data are fitted to a Weibull distribution, metrics such as B10 life (the time by which 10% of the population is expected to fail) and the mean time between failures (MTBF) can be derived. These metrics allow for the quantification of system reliability and the planning of maintenance intervals and operating strategies [20,21].
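As a rough illustration of this workflow, the sketch below fits a two-parameter Weibull distribution to hypothetical failure times and derives the B10 life and MTBF; the synthetic data and parameter values are assumptions for demonstration only, not measurements from this study:

```python
from math import gamma

from scipy import stats

# Hypothetical failure times (hours) drawn from a Weibull with shape beta = 2 (wear-out)
failures = stats.weibull_min.rvs(c=2.0, scale=5000.0, size=200, random_state=42)

# Fit a two-parameter Weibull (location fixed at 0)
beta, loc, eta = stats.weibull_min.fit(failures, floc=0)

# B10 life: the time by which 10% of the population is expected to fail
b10 = stats.weibull_min.ppf(0.10, c=beta, loc=0, scale=eta)
# MTBF of a Weibull distribution: eta * Gamma(1 + 1/beta)
mtbf = eta * gamma(1 + 1 / beta)
print(f"beta={beta:.2f}, eta={eta:.0f} h, B10={b10:.0f} h, MTBF={mtbf:.0f} h")
```

A fitted shape parameter above 1 indicates wear-out failures, which is when B10-based maintenance intervals are most informative.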
Such methods are valuable because they allow reliability to be evaluated through statistical processing alone. However, they are limited by reduced confidence when data are scarce or of poor quality. To address these limitations, preprocessing and appropriate distribution fitting are essential.

2.2.2. Data-Driven Approach

Data-driven fault prediction methods use machine learning techniques (such as neural networks, Gaussian process models, and relevance vector machines) to learn the relationship between loads (inputs) and damage, and then apply the trained models to monitoring systems to predict future failures.
Machine learning and deep learning approaches for fault prediction and anomaly detection typically begin by collecting sensor data (e.g., temperature, vibration, pressure) along with historical records of faults and normal operation (e.g., failure cases, maintenance history, and operational states). The data are then preprocessed through steps such as handling missing values, normalization, and labeling. From these preprocessed data, mathematical and statistical features, including moving averages, maxima, and minima, are computed to capture correlations between fault data and normal operation data. After feature extraction, an appropriate model is selected and trained.
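The preprocessing and feature-extraction steps described above can be sketched with pandas; the sensor trace, window length, and column names here are illustrative assumptions, not the study's actual data:

```python
import numpy as np
import pandas as pd

# Hypothetical 1-minute temperature log with two missing readings
idx = pd.date_range("2024-01-01", periods=120, freq="min")
temp = pd.Series(20 + np.sin(np.arange(120) / 10), index=idx)
temp.iloc[[5, 50]] = np.nan

df = pd.DataFrame({"temp": temp})
df["temp"] = df["temp"].interpolate()  # handle missing values
df["temp_norm"] = (df["temp"] - df["temp"].mean()) / df["temp"].std()  # normalization

# Statistical features over a 15-minute window: moving average, maximum, minimum
roll = df["temp"].rolling(window=15, min_periods=1)
df["ma15"], df["max15"], df["min15"] = roll.mean(), roll.max(), roll.min()
print(df.tail(3))
```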
Common machine learning methods include random forest, support vector machine, and gradient boosting, which can effectively learn nonlinear relationships and are relatively interpretable. For deep learning, recurrent neural networks, long short-term memory networks, and gated recurrent units are widely used for time-series data, making them particularly suitable for fault prediction and anomaly detection [22].
The advantages of data-driven fault prognosis are twofold. First, even without relying on physics-of-failure (PoF) models, the approach can be applied whenever sufficient data are available, making it widely applicable across industries. Second, because the predictive model learns directly from data, it does not significantly alter the structure or operation of the existing system, providing flexibility. However, the approach also has limitations: large amounts of data are required for training; in the early stages, collecting fault and normal data may be time-consuming; and model performance can degrade depending on the quality and diversity of the data. Furthermore, when usage conditions differ, indirect prediction may be feasible, but accurate prediction requires rebuilding and retraining the model [23,24].

2.2.3. Model-Based Approach

As shown in Figure 4, the model-based approach uses a degradation model based on PoF principles, updated in real time with health data, to predict future failures. This approach is built on the physical principles governing how sensors and systems degrade under specific conditions such as stress, temperature, and vibration. By continuously updating the damage model with real-time system state data, it enables accurate predictions of future faults.
The model-based approach offers significant advantages. Because it incorporates physical information, it achieves high accuracy and requires only limited historical failure data; thus, even when such data are scarce, it can outperform experience-based and data-driven methods. Moreover, because it relies on physical characteristics rather than purely statistical trends, it can support long-term prediction of failures. However, this approach requires substantial domain expertise, and its applicability is restricted to fields where relevant PoF models are available [25].
The anomaly detection algorithms used in this study—Local Outlier Factor (LOF), Isolation Forest (IF), One-Class SVM, and a variance-based statistical detector—were selected because they represent three complementary paradigms widely recognized in recent PHM literature: density-based, tree-based, and boundary-based anomaly detection. LOF is effective for capturing subtle local density changes frequently observed in drifting environmental sensors. Isolation Forest isolates abnormal patterns efficiently in high-dimensional greenhouse datasets while maintaining low computational cost. One-Class SVM provides a nonlinear decision boundary for distinguishing normal operational states from abnormal behaviors. Recent research also demonstrates that the proposed combination of anomaly detection methods is robust to noise and drift in smart farm environments [26,27,28,29].
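A minimal sketch of how these detectors can be combined by majority vote; the synthetic humidity/CO2 readings, hyperparameters, and vote threshold are illustrative assumptions, not the configuration used in this study:

```python
import numpy as np
from sklearn.ensemble import IsolationForest
from sklearn.neighbors import LocalOutlierFactor
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(0)
# Synthetic normal operation: humidity (%) and CO2 (ppm) readings
normal = rng.normal(loc=[60.0, 400.0], scale=[3.0, 20.0], size=(500, 2))
outliers = np.array([[95.0, 900.0], [5.0, 100.0]])  # injected faults
X = np.vstack([normal, outliers])

# Each detector returns +1 (normal) / -1 (anomaly); count anomaly votes
votes = np.zeros(len(X))
for det in (IsolationForest(contamination=0.01, random_state=0),
            LocalOutlierFactor(n_neighbors=20, contamination=0.01),
            OneClassSVM(nu=0.01, gamma="scale")):
    votes += (det.fit_predict(X) == -1)

# Flag a point when at least 2 of the 3 detectors agree
flags = votes >= 2
print(flags[-2:])  # flags for the two injected outliers
```

Requiring agreement between paradigms is one way to keep the false-alarm rate low while still catching both local density anomalies and boundary violations.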
We reviewed three approaches in related research to determine the most appropriate method for this study. First, the experience-based approach is unsuitable because empirical models for smart farm sensors are unavailable and sufficient failure data are lacking. Second, the data-driven approach is deemed suitable, as various machine learning techniques can be applied to sensor data, and a substantial amount of historical sensor data is already available. Third, although the model-based approach is accurate from the perspective of failure mechanisms, it is unsuitable here because physical health data are insufficient and established PoF models do not align well with the present case. Therefore, we selected the data-driven approach and conducted data analysis and experiments accordingly [30,31].

2.2.4. Comparative Discussion with Existing PHM Approaches

Conventional PHM and anomaly detection approaches in smart farming environments typically rely on either single-model classifiers or rule-based threshold methods. These methods can detect simple deviations but often struggle with nonlinear sensor behavior, transient spikes, seasonal variations, and gradual drift caused by environmental fluctuations [32,33,34]. Furthermore, most existing studies evaluate models under laboratory or small testbed conditions, providing limited evidence of robustness in real greenhouse operations [35].
In contrast, the proposed hybrid anomaly detection framework integrates density-based (LOF) and tree-based (Isolation Forest) models with a variance-based detector, enabling the system to capture both local density anomalies and abrupt short-term fluctuations. This complementary design provides stronger resilience against noise and irregular sensor behavior. Additionally, unlike prior work, our framework was validated using long-term field data from commercial smart farms, demonstrating superior performance in detecting early degradation signals and identifying subtle anomalies that traditional PHM techniques often miss. These advantages highlight the practical applicability of the proposed method in real operational environments.

3. Data Collection and Analysis

This section describes the data collection methods used for PHM and for anomaly detection of smart farm equipment. It also presents the preprocessing steps, the anomaly detection model, and the evaluation of model performance through field validation.

3.1. Smart Farm Equipment Health Prediction Management: Data Collection

Prior to data collection, we identified the equipment to be used for prognostics, health management, and anomaly detection. Using statistical analysis of information from smart farm members registered in FarmNote by Narae Trend Co., Ltd. (Bucheon City, Gyeonggi-do, Republic of Korea) [36], we determined the devices for this study. The selected devices included the humidity sensor and CO2 sensor in the sensor category, and the switch in the actuator category.
As shown in Table 3, the humidity sensor and the CO2 sensor exhibited a higher proportion of invalid data compared with other sensors. This reduces sensor reliability and limits the availability of usable data. Consequently, these two sensors were selected as target devices for prognostics, health management, and anomaly detection.
As shown in Table 4, the switch was selected from the actuator category because it is the most widely used device in greenhouses and plays a central role in smart farm control operations.

3.1.1. Smart Farm Data Collection Methods

Methods for collecting smart farm data can be divided into two categories: sensor data collection and actuator data collection. Sensor data are typically integrated into basic smart farm systems and are relatively easy to obtain, whereas actuator data, particularly control data and switch state data, are more difficult to capture. To address this, we developed a system that measures switch states and collects the corresponding data.
The sensor data used in this study were collected using the Firefly product of Narae Trend Co., Ltd. (Bucheon City, Gyeonggi-do, Republic of Korea). Figure 5 shows the configuration of an integrated environmental control smart farm based on the Firefly system [36].
The system collects indoor and outdoor environmental data in the greenhouse through a variety of sensors. Data from environmental sensors connected to sensor nodes are transmitted through the main controller terminal and can be monitored on a central server database (DB) or via the web. Users can also access real-time data through smartphones and check greenhouse conditions via closed-circuit television. The collected sensor data are summarized in Table 5, Table 6 and Table 7.
The detailed specifications of the humidity sensor and the CO2 sensor used for prognostics, health management, and anomaly detection in this study are provided in Table 8 and Table 9.
Because actuator data collection vendors for existing smart farm systems are difficult to identify, we developed a proprietary actuator data collection system through joint research with Narae Trend Co., Ltd. We derived datasets required to diagnose switch states among actuator devices; these datasets are listed in Table 10, Table 11, Table 12 and Table 13.
In addition, we developed software to collect measurement data and conducted voltage and current measurement tests. Figure 6, Figure 7 and Figure 8 show the switch voltage and current measurement test program. The actuator under test is selected and connected to the developed board at the hardware level. After configuring communication between the connected board and the main server, the actuator is operated. Current values vary depending on actuator operation, and when the actuator is subjected to excessive load compared with normal conditions, the current exhibits a pronounced increase.
For testing, switch data were collected on the Narae Trend testbed. An installation photograph is provided in Figure 9.
Finally, to ensure complete reproducibility of the experiments, the detailed data acquisition setup used in this study is summarized in Table 14. Environmental sensor data (humidity, CO2) and actuator operational data (switch voltage and current) were collected from multiple operating greenhouses over a two-year period. All measurements were recorded uniformly at one-minute intervals. Based on their high failure frequency and operational criticality, three humidity sensors, three CO2 sensors, and two switching actuators were selected for PHM testing.
The two-year dataset provides sufficient temporal depth to analyze both long-term degradation patterns and short-term anomalies. With a one-minute sampling interval, approximately 1.05 million observations were collected for each sensor and actuator, enabling robust modeling of both gradual drift and sudden failure events. The selected sensors and actuators were distributed across three commercial greenhouse sites, ensuring environmental diversity in temperature, humidity, crop load, and equipment aging conditions.
To ensure that the experimental design accurately reflects real greenhouse operating conditions, we incorporated data from multiple device types (temperature–humidity sensors, CO2 sensors, and switches) and diverse environmental scenarios, including rapid humidity changes, CO2 fluctuations, ventilation cycles, and day–night transitions. These variations were intentionally included to capture the heterogeneous behavior of sensors under different operational loads and environmental conditions. The dataset therefore contains a wide range of normal and abnormal patterns caused by seasonal effects, mechanical wear, and environmental disturbances, allowing the proposed models to learn realistic degradation and anomaly characteristics.

3.1.2. Smart Farm Data Preprocessing

Based on the data collection methods described in the previous section, we performed preprocessing for prognostics, health management, and anomaly detection of smart farm equipment. Short-term malfunctions such as sudden spikes, communication dropout, delayed response, and abrupt sensor drift were explicitly addressed through preprocessing and model-level strategies. At the preprocessing stage, we applied rolling-window smoothing, outlier re-checking, and missing-value interpolation to manage transient abnormal points while preserving meaningful failure signatures. At the model level, LOF and Isolation Forest were used to detect sudden density deviations and isolated extreme patterns, while the variance-based detector reacted to abrupt short-term fluctuations. These mechanisms allow the proposed framework to identify sudden failures without being overly sensitive to random noise. Sensor data and actuator data were preprocessed separately, and correlations among the datasets were derived.
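The spike-handling and interpolation strategy described above might be sketched as follows, using a synthetic humidity trace with an injected glitch and a communication dropout; the window size and deviation threshold are illustrative assumptions:

```python
import numpy as np
import pandas as pd

idx = pd.date_range("2024-06-01", periods=60, freq="min")
hum = pd.Series(65 + np.random.default_rng(1).normal(0, 0.5, 60), index=idx)
hum.iloc[30] = 120.0          # transient spike (sensor glitch)
hum.iloc[[10, 11]] = np.nan   # communication dropout

# Rolling median is robust to the spike, so the deviation from it exposes the glitch
med = hum.rolling(window=7, center=True, min_periods=1).median()
is_spike = (hum - med).abs() > 5           # flag points far from the local median
clean = hum.mask(is_spike).interpolate()   # drop spikes, then fill all gaps
print(clean.iloc[30], clean.isna().sum())
```

A rolling median rather than a rolling mean is the key design choice here: a single 120% reading barely moves the median of its window, so the deviation test isolates the glitch without distorting neighboring points.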
The sensor data comprised external weather data, internal environmental data, and soil environmental data, which were organized for analysis. Preprocessing steps included removing unnecessary header rows, deleting missing values, correcting duplicate labels, and converting mixed-type fields to ensure correct identification.
To enhance dataset informativeness, we generated derived features. For example, to highlight environmental differences between internal and external conditions, we calculated the differences between indoor and outdoor temperature and humidity. To extract temporal characteristics, time of day was encoded as morning, afternoon, or evening. To capture interactions, we created a combined feature from soil moisture and soil temperature.
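The derived features listed above can be illustrated with a small pandas sketch; the column names and sample values are hypothetical:

```python
import pandas as pd

df = pd.DataFrame({
    "time": pd.to_datetime(["2024-06-01 08:00", "2024-06-01 14:00", "2024-06-01 20:00"]),
    "in_temp": [24.0, 28.5, 22.0], "out_temp": [18.0, 30.0, 15.0],
    "in_hum": [70.0, 55.0, 75.0], "out_hum": [60.0, 40.0, 80.0],
    "soil_moist": [0.31, 0.28, 0.30], "soil_temp": [19.0, 21.5, 18.5],
})

# Indoor/outdoor differences highlight the greenhouse's buffering of the environment
df["temp_diff"] = df["in_temp"] - df["out_temp"]
df["hum_diff"] = df["in_hum"] - df["out_hum"]
# Encode time of day as morning / afternoon / evening
df["period"] = pd.cut(df["time"].dt.hour, bins=[0, 12, 18, 24],
                      labels=["morning", "afternoon", "evening"], right=False)
# Interaction feature combining soil moisture and soil temperature
df["soil_interaction"] = df["soil_moist"] * df["soil_temp"]
print(df[["temp_diff", "hum_diff", "period", "soil_interaction"]])
```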
Correlation analysis was then performed, with the Pearson correlation coefficient applied to the humidity sensor data. Figure 10 presents the correlation analysis results for the indoor humidity sensor.
The results show that outdoor humidity has a strong positive correlation with indoor humidity (r = 0.60); the difference between indoor and outdoor humidity shows a moderate positive correlation (r = 0.55); outdoor temperature has a moderate positive correlation (r = 0.53); and soil temperature exhibits a low to moderate positive correlation (r = 0.31).
For the CO2 sensor, missing values in numerical data were imputed using the mean, median, or mode, while time-series data were processed using forward fill or backward fill. Because the data were time series, Equation (1) was applied:
f(T_missing) = Σ f(T) / (count of non-missing f(T)).
To capture diurnal cycles, we incorporated seasonal and monthly variables and added temporal dependencies through lag features such as f_CO2(t − 1). Based on these preprocessed data, a correlation analysis was performed. Figure 11 presents the correlation analysis results for CO2 concentration.
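The imputation choices and the lag feature f_CO2(t − 1) can be sketched as follows; the sample CO2 values are hypothetical:

```python
import numpy as np
import pandas as pd

idx = pd.date_range("2024-06-01", periods=6, freq="h")
co2 = pd.Series([410.0, np.nan, 415.0, np.nan, 420.0, 418.0], index=idx)

df = pd.DataFrame({"co2": co2})
df["co2_mean_fill"] = df["co2"].fillna(df["co2"].mean())  # Equation (1): mean of non-missing values
df["co2_ffill"] = df["co2"].ffill()                       # forward fill for time series
df["co2_bfill"] = df["co2"].bfill()                       # backward fill
df["month"] = df.index.month                              # seasonal/monthly variable
df["co2_lag1"] = df["co2_ffill"].shift(1)                 # lag feature f_CO2(t - 1)
print(df)
```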
The figure visualizes correlations among variables with respect to CO2 concentration. The major findings are as follows: (1) the CO2 sensor data show a high correlation with the target variable; (2) temperature, humidity, and solar radiation exhibit moderate correlations; and (3) leaf traits are strongly intercorrelated, indicating dependence between leaf size and leaf count, which were used to predict CO2 concentration.
For actuator data, vibration, noise, voltage, and thermal temperature values were collected for both normal and abnormal switches. Preprocessing included removing redundant headers, deleting missing values, and merging thermal, vibration, noise, and voltage data into a single timestamped dataset.
To facilitate time-based analysis, the date column was converted into date–time format. Basic descriptive statistics (mean, standard deviation, minimum, maximum) were computed for numeric columns and are summarized in Table 15 and Table 16.
We used histograms to compare the distributions of vibration, noise, voltage, and thermal temperature under normal and abnormal conditions.
Figure 12 shows that normal observations were concentrated in lower ranges, while abnormal observations displayed a broader spread, suggesting irregular operation and higher variability.
Figure 13 reveals that noise in normal data was clustered at lower levels, whereas abnormal data showed wider dispersion, indicating noise as a potential anomaly indicator.
Figure 14 shows that voltage readings for normal data were concentrated at lower levels, while abnormal data exhibited greater dispersion and occasional spikes, reflecting instability.
Figure 15 shows that normal data were marginally concentrated near the mean, while abnormal data were more dispersed toward higher temperatures. Because switch operation can be influenced by ambient temperature, feature-wise correlations were further analyzed.
Figure 16 shows correlations among diagnostic sensor features. Strong positive correlations were observed between vibration and noise (r = 0.96) and between noise and voltage (r = 0.99). In contrast, thermal temperature did not exhibit strong correlations with other features. These strong inter-feature correlations suggest potential redundancy in predictive modeling. Additionally, both normal and abnormal thermal temperature trends often paralleled outdoor temperature, indicating that thermal temperature may be an inefficient anomaly detection indicator.
Table 17 presents the mean and standard deviation for the normal and abnormal datasets of each sensor.

3.2. Smart Farm Equipment Health Prediction Management Data Analysis

In this section, we describe the process of detecting anomalies in smart farm sensors by estimating sensor data thresholds that distinguish normal from abnormal observations. For the humidity and CO2 sensors, expected normal values were predicted using mathematical and AI models. Based on these predictions, we defined a normal operating range and treated values outside this range as threshold exceedances, enabling sensor health prediction and estimation of the remaining useful life (RUL).

3.2.1. Calculating the Humidity Sensor Abnormal Data Threshold

Using the data collected and preprocessed from the selected humidity and CO2 sensors, we predicted humidity in the smart farm environment based on physical properties and correlations with surrounding variables. For the humidity sensor, we first computed the dew point temperature (the temperature at which saturated moisture condenses for a given absolute humidity) and then estimated relative humidity as a function of temperature, taking into account changes in absolute water vapor content associated with plant evapotranspiration. Sensor readings that fell within the error tolerance of these estimates were considered valid. The mathematical expression for humidity prediction is given in Equation (2):
γ(T, RH) = ln(RH/100) + bT/(c + T);   T_dp = c · γ(T, RH) / (b − γ(T, RH)).
Here, T is the current greenhouse air temperature (°C); RH is the current greenhouse relative humidity (%); and b and c are empirical constants chosen to match psychrometric charts, as follows:
  • b = 17.62 and c = 243.12 °C for −45 °C ≤ T ≤ 60 °C (error rate ±0.35 °C);
  • b = 17.27 and c = 247.7 °C for 0 °C ≤ T ≤ 60 °C (error rate ±0.40 °C);
  • b = 17.368 and c = 238.88 °C for 0 °C ≤ T ≤ 50 °C (error rate ±0.05 °C);
  • b = 17.966 and c = 247.15 °C for −40 °C ≤ T ≤ 0 °C (error rate ±0.06 °C).
For convenience, the simplified dew point formula in Equation (3) was used:
dewPoint = 243.12 · [ln(RH/100) + 17.62T/(243.12 + T)] / [17.62 − ln(RH/100) − 17.62T/(243.12 + T)].
We then estimated humidity variation with respect to temperature and classified sensor data within the error tolerance as valid. The relative humidity estimation algorithm is shown in Equation (4):
VRH = (100 · MRH · VDP) / (MT · dT).
Here, VRH denotes the variation in relative humidity, MRH denotes the measured relative humidity, VDP denotes the variation in dew point, MT denotes the measured temperature, and dT denotes the change in temperature.
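As a minimal sketch, Equations (2) and (3) translate directly into code. The constants below are the Magnus coefficients quoted for the −45 °C to 60 °C range; the `is_reading_valid` helper and its 5 %RH tolerance are hypothetical illustrations, not values taken from the study.

```python
import math

B, C = 17.62, 243.12  # Magnus constants, valid for -45 °C <= T <= 60 °C

def gamma(temp_c: float, rh: float) -> float:
    """gamma(T, RH) = ln(RH/100) + b*T/(c + T), per Equation (2)."""
    return math.log(rh / 100.0) + B * temp_c / (C + temp_c)

def dew_point(temp_c: float, rh: float) -> float:
    """Simplified Magnus dew point of Equation (3), in °C."""
    g = gamma(temp_c, rh)
    return C * g / (B - g)

def is_reading_valid(measured_rh: float, expected_rh: float,
                     tol: float = 5.0) -> bool:
    """Hypothetical check: reading is valid if within tol %RH of estimate."""
    return abs(measured_rh - expected_rh) <= tol

# At saturation the dew point equals the air temperature.
print(dew_point(20.0, 100.0))   # approximately 20 °C
print(dew_point(25.0, 50.0))    # roughly 13.9 °C
```

A quick sanity check of the formula: at RH = 100%, γ reduces to bT/(c + T), so T_dp = T, as expected physically.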
A predictive model was then developed by combining this mathematical formulation with a machine learning algorithm. We adopted the random forest method, which does not assume a predefined relationship between features and the target variable and can capture complex nonlinear interactions. Because greenhouse humidity depends on multiple nonlinear factors (e.g., temperature, CO2 concentration, soil moisture, external weather), random forest provided robust and stable predictions. The ensemble approach further mitigated overfitting and improved generalization to unseen data. Its scalability and adaptability make it well-suited to this study, which involves large datasets and complex environmental interactions.
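The random forest step can be sketched as follows. The feature set, value ranges, and the synthetic humidity response used here are placeholders for illustration only, not the study's greenhouse data.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import r2_score

rng = np.random.default_rng(0)

# Synthetic stand-in for greenhouse features: temperature, CO2, soil moisture.
X = rng.uniform([10, 300, 0.1], [35, 900, 0.6], size=(500, 3))
# Hypothetical humidity response plus sensor noise (illustration only).
y = 90 - 1.2 * X[:, 0] + 0.01 * X[:, 1] + 20 * X[:, 2] + rng.normal(0, 1, 500)

# Chronological-style split: first 400 rows train, last 100 test.
X_train, X_test = X[:400], X[400:]
y_train, y_test = y[:400], y[400:]

model = RandomForestRegressor(n_estimators=200, random_state=0)
model.fit(X_train, y_train)
print("R2:", r2_score(y_test, model.predict(X_test)))
```

Because the forest averages many decorrelated trees, no functional form for the humidity response has to be assumed, which matches the motivation given above.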
Figure 17 compares actual humidity observations with predicted values generated by the combined mathematical–random forest model. Predictions closely track observed data. Figure 18 shows a scatterplot of predicted versus observed values and a residual plot. The scatterplot demonstrates alignment with the ideal fit line (y = x), while the residual plot indicates that errors are distributed randomly around zero, confirming the absence of systematic bias.
Model performance is summarized in Table 18.
Based on these predictions, anomaly thresholds were derived by analyzing the residuals (differences between actual and predicted values), as defined in Equation (5):
Residual = Actual Humidity − Predicted Humidity.
Residuals close to zero indicate normal operation, while large deviations suggest anomalies. A statistical summary of residuals is provided in Table 19.
The mean residual (~0.48) suggests that, on average, actual humidity readings are marginally higher than predictions. The standard deviation (~1.75) reflects variability. For anomaly detection, thresholds were determined using the three-sigma rule, as given in Equation (6):
Thresholds = μ_residual ± 3σ_residual.
Here, μ_residual is the mean residual (~0.48) and σ_residual is the standard deviation of the residuals (~1.75).
This yields a lower threshold of −4.76 and an upper threshold of 5.72. Residuals outside this range are classified as anomalies.
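The residual thresholding of Equations (5) and (6) reduces to a few lines. The simulated residuals below simply reuse the statistics reported in Table 19 (mean ≈ 0.48, standard deviation ≈ 1.75); the three-sigma bounds recovered from them land near the paper's −4.76 and 5.72.

```python
import numpy as np

def sigma_thresholds(residuals, k=3.0):
    """Equation (6): mu ± k·sigma anomaly thresholds from residuals."""
    mu, sigma = float(np.mean(residuals)), float(np.std(residuals))
    return mu - k * sigma, mu + k * sigma

def flag_anomalies(residuals, lower, upper):
    """Residuals outside [lower, upper] are classified as anomalies."""
    residuals = np.asarray(residuals)
    return (residuals < lower) | (residuals > upper)

# Simulated residuals matching the Table 19 statistics.
rng = np.random.default_rng(1)
res = rng.normal(0.48, 1.75, 10_000)

lo, hi = sigma_thresholds(res)
print(lo, hi)   # close to the reported -4.76 / 5.72
```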
Figure 19 plots residuals over time. The red dashed line indicates the upper threshold (5.72), and the blue dashed line indicates the lower threshold (−4.76). Data points exceeding these thresholds are flagged as anomalies (shown in orange).

3.2.2. Calculating CO2 Sensor Abnormal Data Thresholds

For the CO2 sensor, which exhibits complex correlations that can be expressed as functions of solar radiation, crop photosynthesis, temperature, and relative humidity, we derived mathematically defined variables for use in the AI model. Equation (7) predicts the CO2 concentration within the smart farm greenhouse:
Ŷ = f_CO2 = β0 + β1·f_T + β2·f_RH + β3·f_CO2(t−1) + β4·f_NL + β5·f_LL + β6·f_WL + ε.
Here, Ŷ is the predicted CO2 concentration, β0 is the intercept, β_i are the coefficients for each predictor, ε is the error term, f_RH represents relative humidity, f_T represents temperature, f_CO2(t−1) is the CO2 concentration at the previous time step, f_NL is the number of leaves, f_LL is the leaf length, and f_WL is the leaf width.
To train the model, we quantified the relationships among the features and estimated the coefficients ( β 0 , β 1 , β 2 , , β n ) by minimizing the prediction error using the mean square error (MSE) objective function. Equation (8) defines MSE:
MSE = (1/N) Σ_{i=1}^{N} (y_i − ŷ_i)².
Here, N is the total number of data points, y_i is the actual CO2 concentration for the ith observation, and ŷ_i is the predicted CO2 concentration for the ith observation.
Expressing the model in vector form yields Equation (9):
ŷ = Xβ + ε.
Here, ŷ represents the predicted CO2 values (an N × 1 vector), X is the feature matrix (N × (n + 1), where n is the number of features), and ε is the error vector (N × 1).
X = [ 1   X_{1,1} ⋯ X_{1,n}
      1   X_{2,1} ⋯ X_{2,n}
      ⋮     ⋮    ⋱    ⋮
      1   X_{N,1} ⋯ X_{N,n} ].
β represents the coefficient vector ((n + 1) × 1):
β = [β0, β1, …, βn]ᵀ.
The optimal coefficients β are obtained by minimizing the residual sum of squares, as shown in Equation (12):
β = (XᵀX)⁻¹XᵀY.
Here, Xᵀ is the transpose of the feature matrix, (XᵀX)⁻¹ is the inverse of the Gram matrix XᵀX appearing in the normal equations, and Y represents the actual CO2 values (an N × 1 vector).
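Equations (9)-(12) can be verified numerically with a small ordinary-least-squares sketch; the six predictors mirror Equation (7), but the coefficient values and data below are synthetic.

```python
import numpy as np

rng = np.random.default_rng(0)
n_obs, n_feat = 200, 6   # six predictors, as in Equation (7)

X_raw = rng.normal(size=(n_obs, n_feat))
true_beta = np.array([400.0, 3.0, -2.0, 0.8, 1.5, -0.5, 0.2])  # beta_0..beta_6

# Equation (10): prepend a column of ones for the intercept.
X = np.hstack([np.ones((n_obs, 1)), X_raw])
y = X @ true_beta + rng.normal(0, 0.1, n_obs)

# Equation (12): closed-form normal-equations solution.
beta_hat = np.linalg.inv(X.T @ X) @ X.T @ y
print(beta_hat)
```

In practice `np.linalg.lstsq` is preferred over forming the explicit inverse, for numerical stability; the closed form is shown here only to mirror Equation (12).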
This formulation enabled us to predict CO2 levels in the greenhouse. Model performance is summarized in Table 20.
The RMSE of 1.87 corresponds to an error of approximately 14.65 ppm. Given that greenhouse CO2 concentrations typically range from 300 to 1000 ppm, this deviation is relatively small and demonstrates the model’s predictive accuracy. The R2 score of 0.815 indicates that the model explains 81.5% of the variance in CO2 concentration, with the remaining 18.5% likely resulting from noise or factors not included in the model, such as ventilation rate or external CO2 inflow.
Figure 20 compares predicted CO2 concentrations with sensor measurements. Points closely aligned with the red dashed line (the ideal fit, y = x) indicate strong predictive accuracy. The residual plot shows errors distributed near zero, confirming the model is unbiased and performs consistently across the dataset (residuals are centered around zero with most errors small).
Based on the results shown in Figure 21, the abnormal threshold for the CO2 sensor was derived in the same manner as for the humidity sensor (Equation (5)). The average residual was close to zero, confirming the agreement between the predicted and observed values. A statistical summary of the residuals is presented in Table 21.
Anomaly thresholds were then defined using the three-sigma rule, as shown previously in Equation (6).
Figure 22 plots the residuals of predicted versus actual CO2 values. The red dashed line indicates the upper threshold (43.99), while the blue dashed line indicates the lower threshold (−44.01). Orange points represent anomalies where the residual exceeded these thresholds.

3.2.3. Calculating Thresholds for Abnormal Data in Smart Farm Drivers

To detect anomalies in smart farm actuators, we first estimated thresholds that separate normal from abnormal observations. Using switch diagnostic data (vibration, noise, voltage, and thermal temperature), we predicted composite abnormal behavior by combining a mathematical formulation with an AI model. Based on these predictions, thresholds were established to identify anomalies, assess actuator health, and estimate the RUL.
For actuator anomaly detection, thresholds were determined using both statistical indicators and machine learning techniques. This hybrid approach combined statistical methods—interquartile range (IQR) and Z-score—with an AI-based anomaly detection model (Isolation Forest), leveraging the strengths of both methodologies for robust and adaptive performance. Table 22 summarizes the advantages and disadvantages of the statistical methods.
Using a statistical approach, we adopted the IQR method. For each feature in the normal dataset, Q1 and Q3 were computed, and the IQR was obtained as IQR = Q3 − Q1. The upper and lower bounds were defined as Q3 + 1.5 × IQR and Q1 − 1.5 × IQR, respectively, to identify values that deviate substantially from normal distributions.
For comparison, we also applied the Z-score method. As shown in Equation (13), the Z-score measures how far a value deviates from the mean in units of standard deviation:
z = (x − μ)/σ.
Here, x is the observation value, μ is the mean of the dataset, and σ is the standard deviation of the dataset.
Data points falling outside the threshold range derived from Equation (13) were flagged as outliers. Table 23 shows the thresholds derived from statistical techniques applied to the normal dataset.
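A compact sketch of both statistical detectors follows. The vibration feature, its distribution, and the injected faults are hypothetical stand-ins for the normal-dataset features of Table 23.

```python
import numpy as np

def iqr_bounds(x):
    """Tukey fences: Q1 - 1.5*IQR and Q3 + 1.5*IQR."""
    q1, q3 = np.percentile(x, [25, 75])
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr

def zscore_outliers(x, k=3.0):
    """Equation (13): flag observations with |z| > k."""
    x = np.asarray(x, dtype=float)
    z = (x - x.mean()) / x.std()
    return np.abs(z) > k

rng = np.random.default_rng(2)
vibration = rng.normal(0.5, 0.05, 1000)   # hypothetical normal-state feature
vibration[::250] += 0.5                   # four injected fault readings

lo, hi = iqr_bounds(vibration)
print("IQR fences:", lo, hi)
print("Z-score outliers:", zscore_outliers(vibration).sum())
```

Both detectors flag the injected faults; as Table 22 suggests, the IQR fences are more robust to the outliers themselves, while the Z-score depends on the (inflated) sample standard deviation.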
Building on the statistical thresholds, we performed a combined analysis using an AI model. We applied the Isolation Forest algorithm, an ensemble machine learning method designed for anomaly detection. This algorithm recursively partitions the data space to isolate outliers, assigning higher anomaly scores to points that are easier to isolate. The model was trained on normalized normal data (non-anomalous samples). Figure 23 shows the code used to perform normalization, train the model, and inspect the anomaly score distribution.
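The exact code of Figure 23 is not reproduced here; the following is a plausible minimal version of that pipeline, with illustrative diagnostic features (vibration, noise, voltage) and values that are assumptions, not measurements from the study.

```python
import numpy as np
from sklearn.ensemble import IsolationForest
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(3)
# Hypothetical normal-state diagnostics: vibration, noise, voltage.
X_normal = rng.normal([0.5, 60.0, 49.0], [0.05, 2.0, 1.0], size=(2000, 3))

scaler = StandardScaler().fit(X_normal)          # normalization step
model = IsolationForest(contamination=0.01, random_state=0)
model.fit(scaler.transform(X_normal))            # trained on normal data only

# Inspect the anomaly score distribution on the training data.
scores = model.decision_function(scaler.transform(X_normal))
print("score range:", scores.min(), scores.max())

# A clearly faulty observation should score far lower than normal data.
faulty = scaler.transform([[1.2, 75.0, 63.0]])
print("faulty score:", model.decision_function(faulty)[0])
```

Lower `decision_function` values mean the point is easier to isolate, i.e., more anomalous; the `contamination` parameter sets the score offset used for the binary normal/anomaly decision.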
Using the hybrid approach, thresholds were derived for normal and faulty switches. Results are shown in Table 24.
During switch operation, STOP values were generally close to zero and did not differ significantly between normal and faulty switches. For this reason, STOP values were excluded, and thresholds were derived for OPEN and CLOSE states only.
These thresholds are data-dependent; applying the hybrid approach to other datasets may yield different values. For vibration and noise, the thresholds for OPEN and CLOSE were identical, whereas voltage thresholds differed (63 for OPEN and 49 for CLOSE). This difference is attributed to increased mechanical load during switch actuation, which alters the required torque and results in higher voltage draw.
Figure 24, Figure 25, Figure 26, Figure 27, Figure 28, Figure 29, Figure 30, Figure 31, Figure 32, Figure 33, Figure 34 and Figure 35 plot a subsample of 1000 observations after applying the derived thresholds.
As shown in the figures, normal switches did not show observations exceeding the thresholds, whereas faulty switches exhibited multiple exceedances. These results demonstrate that the hybrid thresholding approach can effectively distinguish between normal and faulty switch states. The derived thresholds will be used to estimate the RUL of normal switches.

3.2.4. Model Training and Validation

To ensure reproducibility and transparency, the training and validation procedures for all machine learning models used in this study are summarized in Table 25. The dataset was divided into training, validation, and test subsets using a temporal split to preserve chronological order and prevent data leakage. Hyperparameters were selected using grid search or default recommended values depending on the algorithm. All experiments were conducted in Python 3.10 using scikit-learn 1.3, XGBoost 1.7, and TensorFlow 2.12.
The temporal split ensures that the models are trained only on past information and evaluated on future data, reflecting real operational conditions in smart farms. Hyperparameter tuning was performed using 5-fold cross-validation for non-temporal anomaly detection models, while time-series RUL models were evaluated without random shuffling to maintain chronological consistency. The stacking model integrates the strengths of gradient boosting-based and bagging-based learners for improved predictive accuracy. These settings collectively ensure fair evaluation, reduce overfitting, and support full reproducibility of the modeling process.
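The temporal split described above can be sketched as follows; Table 25 holds the actual settings, so the 70/15/15 fractions used here are placeholders.

```python
import numpy as np

def temporal_split(X, y, train_frac=0.7, val_frac=0.15):
    """Chronological split with no shuffling, so models are trained only
    on past data and evaluated on future data (no leakage)."""
    n = len(X)
    i_train = int(n * train_frac)
    i_val = int(n * (train_frac + val_frac))
    return ((X[:i_train], y[:i_train]),
            (X[i_train:i_val], y[i_train:i_val]),
            (X[i_val:], y[i_val:]))

# Toy time-indexed data: row order encodes chronology.
X = np.arange(100).reshape(-1, 1)
y = np.arange(100)
train, val, test = temporal_split(X, y)
print(len(train[0]), len(val[0]), len(test[0]))   # 70 15 15
```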

3.3. RUL Based on Anomaly Data for Smart Farm Equipment

3.3.1. Prediction of Sensor RUL Using Anomaly Data

In this section, based on the thresholds derived for the humidity and CO2 sensors, we predict the RUL of the sensors. We first conduct a deviation analysis of the constructed dataset to identify trends in performance degradation. We then generate a modeling dataset by combining the deviations with relevant environmental factors. After conducting a comprehensive environmental analysis, we build a prediction model and estimate the sensor RUL.
As shown in Figure 36, the mean deviation for the humidity sensor (0.016%) is close to zero, indicating that, on average, the predictions are not biased relative to the observations. The standard deviation is 1.87%, reflecting the spread of deviations around the mean; a larger standard deviation implies lower reliability of predictions. The deviation range (−19.95% to 20.24%) indicates that observed and predicted values may differ by up to approximately 20%.
The deviation trend fluctuates around zero, with spikes up to ±20%. The IQR (25th–75th percentiles) is tightly clustered between −0.35% and 0.37%, suggesting that most deviations are negligible. Outliers may indicate special operating conditions or early signs of performance degradation.
Figure 37 presents the 30 d moving average and corresponding standard deviation. The blue line shows the daily mean deviation (actual humidity − predicted humidity), while the orange line reflects the long-term trend of prediction accuracy. Short error bars indicate consistent and reliable predictions, whereas long error bars suggest increased variability due to environmental factors, sensor degradation, or model inaccuracy.
Overall, the moving average remains close to zero, suggesting no long-term prediction bias. Occasional deviations likely reflect environmental effects or sensor drift. Periods of larger variability may indicate extreme conditions or early stages of sensor wear or failure.
Based on this analysis, we developed an RUL prediction model for the humidity sensor. Models tested included random forest, XGBoost, gradient boosting, and an ensemble. The results are summarized in Table 26.
The MAE represents the prediction error for sensor lifetime. For random forest, the error is about 281 h (~12 d), while the tuned stacking model reduces the error to ~80 h (~3 d). The stacking model combines the strengths of random forest, XGBoost, and gradient boosting, providing superior performance for estimating RUL. By incorporating rolling statistics, the model accounts for short-term variability and volatility, enabling more accurate degradation prediction.
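A stacking ensemble of this kind can be sketched as below. XGBoost is omitted here to keep dependencies minimal, and the degradation features (deviation plus rolling statistics) and the RUL response are synthetic placeholders.

```python
import numpy as np
from sklearn.ensemble import (GradientBoostingRegressor,
                              RandomForestRegressor, StackingRegressor)
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_absolute_error

rng = np.random.default_rng(4)
# Hypothetical features: deviation, rolling mean, rolling std.
X = rng.normal(size=(600, 3))
# Synthetic remaining-life response in hours, with noise.
rul_hours = 2000 - 150 * X[:, 0] + 40 * X[:, 1] ** 2 + rng.normal(0, 20, 600)

X_train, X_test = X[:480], X[480:]
y_train, y_test = rul_hours[:480], rul_hours[480:]

# Base learners feed out-of-fold predictions to a linear meta-learner.
stack = StackingRegressor(
    estimators=[("rf", RandomForestRegressor(n_estimators=100, random_state=0)),
                ("gb", GradientBoostingRegressor(random_state=0))],
    final_estimator=Ridge())
stack.fit(X_train, y_train)
print("MAE (h):", mean_absolute_error(y_test, stack.predict(X_test)))
```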
As shown in Figure 38, normal and abnormal observations were evaluated to infer the equipment state as a function of the number of anomalies. The orange line shows actual humidity readings, the blue line represents predicted values, and red points indicate anomalies. Background shading indicates normal, caution, and critical states. The graph shows that before 23 May 2023, observations remained below thresholds. From May 23, anomalies increased, progressing from caution to warning and finally to a critical state by May 27, where observed readings remained fixed at 100. This indicated the need for sensor replacement after 27 May 2023.
For the CO2 sensor (Figure 39), the mean deviation was −0.011, indicating close agreement between predicted and observed values. However, the deviation range (−46 to 70) was relatively wide, suggesting periods of abnormal operation. The standard deviation was 14.67, and deviations beyond this were considered abnormal.
As shown in Figure 40, the moving average (blue line, rolling window of 20) identifies systematic drift over time, while the variability (red line) reflects deviation fluctuations. Peaks in the moving average suggest periods of sensor instability.
For CO2 sensor degradation modeling, we predicted the time at which deviations exceeded the failure threshold, estimating RUL using the same models as for humidity. Results are summarized in Table 27.
For random forest, the MAE was 251 h (~10 d), whereas the tuned stacking model reduced it to 84 h (~3.5 d).
As shown in Figure 41, the orange line indicates actual CO2 sensor readings, the blue line shows predictions, and red points mark anomalies. Background shading again represents normal, caution, and critical states. Symptoms of abnormality emerged after day 200, followed by anomalies exceeding thresholds. Unlike the humidity sensor, the CO2 sensor exhibited a minor upward drift, with anomalies persisting into the critical state, indicating immediate replacement was required.
Finally, using sensor anomaly data, we predicted RUL and compiled the health index in Table 28.
Traditional Remaining Useful Life (RUL) estimation typically relies on statistical and physics-based degradation models such as linear trend extrapolation, exponential decay, Mean Time Between Failures (MTBF), and Weibull reliability analysis. These approaches assume that degradation progresses in a monotonic, smooth, and predictable manner, allowing model parameters to be derived from failure distributions or predefined life-cycle curves. In many industrial applications, such methods function by fitting a degradation trajectory (e.g., a linear or exponential trend) to historical sensor values and forecasting the point at which the measurement will cross a predefined failure threshold.
However, greenhouse sensors and switch actuators rarely follow a monotonic degradation pattern. Their performance is strongly affected by non-stationary environmental factors such as sudden humidity spikes, CO2 fluctuations, daytime–nighttime thermal cycles, mechanical switching loads, and communication dropouts. These disturbances generate irregular drift, intermittent anomalies, and short-term fluctuations that violate the core assumptions of traditional RUL modeling. As a result, classical statistical methods often fail to capture early signs of degradation or predict abrupt failures accurately.
In contrast, the AI-based RUL estimation framework proposed in this study uses ensemble learning models (Random Forest, Gradient Boosting, XGBoost, and a stacking ensemble) to learn nonlinear deterioration patterns directly from multivariate sensor data. These models can represent complex interactions among environmental variables, handle noise and missing values more robustly, and capture subtle changes that precede failure. The field validation results demonstrated that the proposed AI models achieved lower prediction error and earlier detection of degradation trends compared to traditional approaches. Therefore, the machine-learning–based method is more suitable for smart farm environments characterized by heterogeneous sensor behaviors and dynamic environmental variability.

3.3.2. Prediction of Switch RUL Using Anomaly Data in Smart Farm Equipment

In this section, based on the actuator data thresholds, we predict the RUL of the switch. We first analyze deviations in the constructed dataset to identify trends in performance degradation. We then derive the modeling dataset by combining these deviations with relevant environmental factors. After a comprehensive analysis of condition monitoring data, we build a prediction model and estimate the switch RUL.
For the actuator, we applied a k-nearest neighbors (KNN) approach to estimate the RUL under current operating conditions. The device exhibits a degradation trend along its trajectory (amplitude as a function of time). We compare this trend with historical trajectories (blue lines) and identify the k most similar paths based on the progression of shape and amplitude. By examining the points at which these similar trajectories reached failure, we infer the remaining life of the current device.
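The trajectory-matching idea can be sketched as follows. The Euclidean distance over the overlapping window and k = 3 are assumptions, and the run-to-failure histories are simulated with lifetimes in the 150-250-cycle band reported for Figure 42.

```python
import numpy as np

def knn_rul(current_traj, histories, failure_times, k=3):
    """Estimate RUL by finding the k historical run-to-failure trajectories
    most similar to the partial trajectory observed so far, then averaging
    their remaining lives beyond the current cycle count."""
    t = len(current_traj)
    dists = [np.linalg.norm(np.asarray(h[:t]) - current_traj)
             for h in histories]
    nearest = np.argsort(dists)[:k]
    return float(np.mean([failure_times[i] - t for i in nearest]))

rng = np.random.default_rng(5)
# Simulated degradation: amplitude decays until failure at 150-250 cycles.
histories, failures = [], []
for _ in range(20):
    life = rng.integers(150, 250)
    histories.append(1.0 - np.linspace(0, 1, life)
                     + rng.normal(0, 0.02, life))
    failures.append(life)

current = histories[0][:100]   # a device observed for 100 cycles
print("estimated RUL (cycles):", knn_rul(current, histories, failures))
```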
As shown in Figure 42, the downward trend of the blue trajectories indicates progressive switch degradation over time. The red line represents the baseline for unit-level performance, while the black points denote observed lifetimes and failures, clustering between 150 and 250 cycles. This cluster marks the region where the system typically fails. The spread of the blue trajectories reflects variability in degradation behavior across systems. Using the thresholds derived in the previous section, this failure cluster is represented as a performance degradation threshold on the y-axis.
We also provided a statistical estimate of the remaining cycles to failure based on the probability density function. Figure 43 shows the switch RUL estimate, that is, the predicted remaining life under normal operating assumptions.
The blue curve denotes the probability density function, showing the probability distribution of the RUL. It begins at a high value and decreases asymmetrically, indicating that a shorter remaining life (fewer cycles) is more probable, while a longer remaining life is less likely but still possible. The yellow shaded region corresponds to the 90% confidence interval, representing the range within which there is a 90% probability that the true remaining life falls. The red line (true RUL) marks the actual remaining life, or the number of cycles left before failure, serving as a benchmark for evaluating estimation accuracy. The green line represents the estimated RUL derived from the expected remaining life prediction model (KNN).

3.3.3. Field Validation of Anomaly Detection Based on PHM for Smart Farm Equipment

In this section, we describe how anomaly detection based on PHM for smart farm equipment was validated through field trials, and how the derived results were verified.
Because sensors and actuators in smart farms are affected by on-site environmental conditions, the performance of pre-trained models may not transfer perfectly. It is therefore necessary to apply field data–based models to detect anomalies in sensors and switches and to estimate their RUL.
As shown in Table 29, field tests were conducted at Narae Trend Co., Ltd.’s internal testbed and at operational farm sites to collect data.
Narae Trend’s internal testbed is a small greenhouse with a floor area of 67 pyeong (approximately 221 m2), where switches operate under integrated environmental control. This setup allowed frequent fine switching actions, and the sensors accurately captured the data required to maintain the target environment.
Using the field validation procedure, we derived anomaly detection outcomes and RUL estimates from smart farm equipment data. For sensors, we predicted the RUL of the humidity and CO2 sensors.
Figure 44 shows the RUL estimate for the humidity sensor. The blue line represents actual humidity, and the red dashed line represents predicted RUL. The sensor functioned normally until approximately day 240. Thereafter, anomalies appeared, and the model predicted an RUL of 396 d. In practice, anomalies persisted, and the sensor was replaced at about day 400. Although this marginally exceeded the 3 d error tolerance specified for anomaly detection and RUL estimation, the prediction remained close when considering field conditions.
Figure 45 presents the RUL estimate for the CO2 sensor. For the CO2 sensor, data remained within the normal range with only small errors until about day 710. After this point, multiple anomalies appeared, and the model predicted sensor replacement after 800 d. However, in practice, the sensor was replaced around day 720, earlier than the model prediction and outside the specified error tolerance. A field analysis revealed that the sensor failed to capture rapid changes in CO2 concentration, producing inaccurate readings that the anomaly detection model did not adequately flag. To address this, we augmented the training dataset with such cases and updated the model to better handle rapid fluctuations.
Figure 46 presents the predicted RUL as a function of operating days for four switches. Throughout the test period, the switches operated normally, with no major anomalies or deviations beyond the error tolerance. The predicted curves, however, show a gradual downward trend in lifetime estimates.
As shown in Table 30, the switches exhibit a mean remaining life of 1655 d, indicating a high RUL. In smart farms, switches typically operate fewer cycles than industrial motors and are well protected by housing, allowing them to function normally in outdoor environments. The health percentage indicates the replacement timing derived from fault diagnosis, while “Time (d)” represents the period from prediction to replacement. Based on these results, and in the absence of clear failure causes, switch failures are not expected to occur frequently. However, older switches with long service histories are more likely to exhibit pronounced degradation.
To enhance field verification, we have integrated early warning mechanisms into our anomaly detection platform. During onsite operation, each sensor produces an anomaly score through LOF, Isolation Forest, and the variance-based detector, and this score is converted into a unified health index. The health index is evaluated using two thresholds: a Warning threshold and a Critical threshold. When the index exceeds the Warning threshold, the system detects early-stage degradation and notifies the farmer that the device may require inspection. If the index surpasses the Critical threshold or if repeated anomalies occur within a short time interval (e.g., three or more anomalies within 30 min), the system issues a Critical Alert indicating a high likelihood of failure.
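The two-threshold alert logic can be sketched as follows. The numeric Warning/Critical thresholds (0.6 and 0.85) are placeholders, while the repeated-anomaly rule (three or more within 30 min) follows the text.

```python
from datetime import datetime, timedelta

WARNING, CRITICAL = 0.6, 0.85   # hypothetical unified health-index thresholds

def alert_level(health_index, recent_anomalies, now,
                window=timedelta(minutes=30), repeat_limit=3):
    """Map a unified health index to an alert level. A Critical Alert is
    raised either by the score alone or by repeated anomalies within the
    configured time window."""
    recent = [t for t in recent_anomalies if now - t <= window]
    if health_index >= CRITICAL or len(recent) >= repeat_limit:
        return "CRITICAL"
    if health_index >= WARNING:
        return "WARNING"
    return "NORMAL"

now = datetime(2025, 1, 1, 12, 0)
hits = [now - timedelta(minutes=m) for m in (5, 12, 25)]
print(alert_level(0.7, hits, now))   # repeated anomalies escalate the alert
print(alert_level(0.7, [], now))     # score alone only reaches Warning
```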
In the field validation, humidity sensors exhibited repeated deviations and exceeded the Warning threshold approximately 3–5 days before replacement, while CO2 sensors generated Critical Alerts 1–2 days before abrupt drift occurred. For switch actuators, intermittent vibration and voltage anomalies led to early warnings 1–3 weeks prior to field-reported malfunction. These findings demonstrate that the early warning subsystem can successfully identify gradual degradation as well as sudden failure risks, enabling proactive maintenance actions.
Figure 47 shows the anomaly detection platform, which is accessible through a web browser. The platform is interfaced with Narae Trend’s integrated environmental control system, allowing users to monitor farm control information and switch operation data in real time, making malfunctions easy to identify. Here, ① indicates the switch configuration installed at the farm site; ② displays the current operating state of the switch; ③ shows the fault risk indicator (safe vs. at risk) produced by the anomaly detection algorithm; and ④ allows the RUL to be estimated from the graph when a fault risk is present.
Overall, the proposed PHM framework successfully addressed the research question identified in the Introduction by demonstrating reliable anomaly detection and RUL estimation for both sensors and actuators. The anomaly detection models achieved high threshold detection performance, accurately identifying deviations in humidity, CO2, vibration, noise, and voltage signals under real operating conditions. For environmental sensors, the mathematical and AI-based prediction models achieved over 90% predictive accuracy, enabling early threshold estimation and RUL forecasting up to 80 h in advance. For actuators, the hybrid anomaly detection method produced stable detection across noisy field data, and the RUL model estimated the mean remaining lifetime of switches as 1655 days, closely matching actual operational behavior.
These quantitative results demonstrate that the integrated PHM approach effectively compensates for the sensor reliability issues and equipment degradation challenges identified in the research gap. The field validation confirms that the framework not only detects anomalies before failure but also provides actionable RUL information, thereby directly fulfilling the objectives and overcoming the limitations of existing smart farm systems noted in the Introduction.

4. Conclusions

This study aimed to develop AI-based anomaly detection methods for PHM of smart farm equipment and to validate their performance in the field. By addressing key issues—sensor reliability, environmental data accuracy, and failure diagnosis and prediction—the study contributes significantly to improving the operational efficiency and sustainability of smart farm systems. As smart farms grow more complex and increasingly reliant on accurate, data-driven decision-making, predictive maintenance for sensors and actuators becomes indispensable. To meet this need, we adopted a PHM approach integrated with machine learning and tailored to smart farm conditions. The methodology involved systematic data collection and preprocessing for major sensors and the application of advanced anomaly detection models, including machine learning and deep learning techniques, to actuator data. The developed models achieved high accuracy in detecting anomalies and in estimating the RUL of smart farm equipment during experiments.
The proposed system first established sensor thresholds to separate normal from abnormal data. For the humidity and CO2 sensors, expected normal values were predicted using mathematical and AI-based models. Using these predictions, normal ranges were defined, and values outside the range were treated as threshold exceedances to infer sensor health and estimate RUL. The predictions achieved over 90% accuracy compared with observations, and thresholding effectively identified anomalies. For actuator anomalies, thresholds based on statistical indicators and machine learning successfully distinguished normal and abnormal data. A hybrid approach, combining the IQR, Z-score, and Isolation Forest, leveraged the strengths of both statistical and AI paradigms to provide robust and adaptive anomaly detection. Field validation further demonstrated the feasibility and deployability of the proposed methods in operational smart farms.
This work carries several implications for smart agriculture. It offers a scalable and reliable approach to equipment health management that can increase productivity and reduce operational risks. By demonstrating the feasibility of AI-based health monitoring, it also lays a foundation for advancing toward third-generation smart farms.
Overall, the results address immediate operational needs in smart farm management and contribute meaningfully to the broader vision of sustainable and efficient digital agriculture.
This study has several limitations that suggest directions for future research. The CO2 sensor case, in which the model underestimated failure risk during rapid concentration fluctuations, indicates that short-term volatility must be modeled more effectively through adaptive thresholds or physics-informed constraints. In addition, the validation was restricted to specific sensors and switching devices deployed in commercial greenhouses in Korea, and the reported performance reflects the characteristics of these environments. Future work will extend the PHM framework to a broader set of agricultural devices—such as ventilation fans, irrigation pumps, nutrient dosing units, shading actuators, and thermal control systems—and incorporate device-specific features and multi-device modeling strategies. Expanding datasets across diverse greenhouse structures, crop types, and climatic regions, along with the integration of physics-informed and adaptive learning modules, will further enhance the robustness and generalizability of anomaly detection and RUL prediction in rapidly changing smart farm conditions.

Author Contributions

Conceptualization, H.-O.C.; methodology, H.-O.C.; software, H.-O.C.; validation, H.-O.C.; formal analysis, H.-O.C.; investigation, H.-O.C.; resources, H.-O.C.; data curation, H.-O.C.; writing—original draft preparation, H.-O.C.; writing—review and editing, M.-H.L.; visualization, H.-O.C.; supervision, M.-H.L.; project administration, M.-H.L.; funding acquisition, M.-H.L. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the Institute of Information & Communications Technology Planning & Evaluation (IITP)-Information Technology Research Center (ITRC) grant funded by the Korea government (Ministry of Science and ICT) (IITP-2025-RS-2023-00259703). It was also supported by the Innovative Human Resource Development for Local Intellectualization program through the Institute of Information & Communications Technology Planning & Evaluation (IITP) grant funded by the Korea government (MSIT) (IITP-2025-RS-2020-II201489).

Data Availability Statement

The data presented in this study are available on request from the corresponding author. The data are not publicly available due to confidentiality agreements with the collaborating company and restrictions related to joint research.

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:
PHM: Prognostics and health management
ICT: Information and communication technology
RH: Relative humidity
VRH: Variation in relative humidity
RUL: Remaining useful life
MAE: Mean absolute error
RMSE: Root mean square error
XGBoost: Extreme gradient boosting
KNN: k-nearest neighbors
AI: Artificial intelligence
EC: Electrical conductivity
DB: Database
MTBF: Mean time between failures
PoF: Physics-of-failure
MSE: Mean square error
IQR: Interquartile range

References

  1. Cheon, Y. Analysis of the Current Status of Smart Farm Technology and Standardization Trends: A Comparative Analysis of Domestic and Overseas Cases. Master’s Thesis, Chung-Ang University, Seoul, Republic of Korea, 2023. [Google Scholar]
  2. Choe, H.; Lee, M. Development of a health management algorithm based on smart farm sensor states. J. Korea Knowl. Inf. Technol. Soc. 2023, 18, 1819–1828. [Google Scholar]
  3. Byun, J.-Y. Analysis on the Status and Future Development of Smart Farming Project; National Assembly Budget Office: Seoul, Republic of Korea, 2022. [Google Scholar]
  4. Choi, Y.-C.; Jang, I.-H. Smart farm in the fourth industrial revolution era. J. Korean Inst. Commun. Sci. 2019, 36, 9–16. [Google Scholar]
  5. Kim, H. A Study on the Diversification of Distribution Channels Through Analysis of Changes in Production and Distribution of Environment-Friendly Agricultural Products. Master’s Thesis, Dankook University Graduate School, Yongin, Republic of Korea, 2018. [Google Scholar]
  6. Jung, J.; Lee, J.; Noh, H. Web-based data analysis service for smart farms. KIPS Trans. Softw. Data Eng. 2022, 11, 355–362. [Google Scholar]
  7. Oh, J. Design and Implementation of a Smart Farm System Based on Machine Learning for Big Data Applications. Ph.D. Thesis, Daegu Catholic University, Gyeongsangbuk-do, Republic of Korea, 2018. [Google Scholar]
  8. Choe, H.; Lee, M. Artificial intelligence-based fault diagnosis and prediction for smart farm information and communication technology equipment. Agriculture 2023, 13, 2124. [Google Scholar] [CrossRef]
  9. Vahdanjoo, M.; Sørensen, C.G.; Nørremark, M. Digital transformation of the agri-food system. Curr. Opin. Food Sci. 2025, 63, 101287. [Google Scholar] [CrossRef]
  10. Kumar, V.; Sharma, K.V.; Kedam, N.; Patel, A.; Kate, T.R.; Rathnayake, U. A comprehensive review on smart and sustainable agriculture using IoT technologies. Smart Agric. Technol. 2024, 8, 100487. [Google Scholar] [CrossRef]
  11. Sharma, V.; Tripathi, A.K.; Mittal, H. Technological revolutions in smart farming: Current trends, challenges & future directions. Comput. Electron. Agric. 2022, 201, 107217. [Google Scholar] [CrossRef]
  12. Dhanaraju, M.; Chenniappan, P.; Ramalingam, K.; Pazhanivelan, S.; Kaliaperumal, R. Smart farming: Internet of Things (IoT)-based sustainable agriculture. Agriculture 2022, 12, 1745. [Google Scholar] [CrossRef]
  13. Liang, S.; Liu, P.; Zhang, Z.; Wu, Y. Research on fault diagnosis of agricultural IoT sensors based on improved dung beetle optimization–support vector machine. Sustainability 2024, 16, 10001. [Google Scholar] [CrossRef]
  14. Friha, O.; Ferrag, M.A.; Shu, L.; Maglaras, L.; Wang, X. Internet of Things for the future of smart agriculture: A comprehensive survey of emerging technologies. IEEE/CAA J. Autom. Sin. 2021, 8, 718–752. [Google Scholar] [CrossRef]
  15. Yeo, U.; Lee, I.; Kwon, K.; Ha, T.; Park, S.; Kim, R.; Lee, S. Research trends and analysis of ICT core technologies for smart farm implementation. J. Bio-Environ. Control 2016, 25, 30–41. [Google Scholar] [CrossRef]
  16. Na, M.; Park, Y.; Cho, W. Study on the optimal factors for tomato using smart farm data. J. Korean Data Inf. Sci. Soc. 2017, 28, 1427–1435. [Google Scholar]
  17. Oh, J.; Kim, H.; Kim, I. Design and implementation of a fruit harvest time prediction system using machine learning. Smart Media J. 2019, 8, 74–81. [Google Scholar]
  18. Smart Farm Korea. Introduction to Smart Agriculture (Facility Horticulture). Available online: https://www.smartfarmkorea.net/contents/view.do?menuId=M01010103 (accessed on 21 August 2025).
  19. TTAK.KO-10.1090; Service Interface against Abnormal Situation of Devices for Smart Farm Greenhouse based on Cloud Computing. Telecommunications Technology Association (TTA): Seongnam, Republic of Korea, 2018.
  20. KSPHM. Prognostics & Health Management Toward Industrial Digitalization; KSPHM: Seoul, Republic of Korea, 2019. [Google Scholar]
  21. Foucher, B.; Boullié, J.; Meslet, B.; Das, D. A Review of Reliability Prediction Methods for Electronic Devices. Microelectron. Reliab. 2002, 42, 1155–1162. [Google Scholar] [CrossRef]
  22. Choi, J.-H. Introduction to failure prediction and health management technology. J. KSME 2013, 53, 24–34. [Google Scholar]
  23. Zhao, R.; Yan, R.; Chen, Z.; Mao, K.; Wang, P.; Gao, R.X. Deep learning and its applications to machine health monitoring. Mech. Syst. Signal Process. 2020, 115, 213–237. [Google Scholar] [CrossRef]
  24. Zonta, T.; da Costa, C.A.; Righi, R.D.; Lima, M.; Li, G. Predictive maintenance in Industry 4.0: A systematic literature review. Comput. Ind. Eng. 2020, 150, 106889. [Google Scholar] [CrossRef]
  25. Jeong, Y.-E.; Kim, Y.-S. Analysis of domestic research trends on artificial intelligence-based prognostics and health management. J. Korean Soc. Qual. Manag. 2023, 51, 223–245. [Google Scholar]
  26. Borkar, R.; Abhichandani, S.; Vadrevu, N.R.T. A Novel Approach to Detect Anomaly in Payment Transactions. In Proceedings of the 2025 6th International Conference on Data Intelligence and Cognitive Informatics (ICDICI), Tirunelveli, India, 9–11 July 2025; pp. 1222–1229. [Google Scholar] [CrossRef]
  27. Mahajan, A. A Novel Hybrid Model Merging LOF and iForest Algorithms for Insider Threats Detection. In Proceedings of the 2024 4th Asian Conference on Innovation in Technology (ASIANCON), Pimpri Chinchwad, India, 23–25 August 2024; pp. 1–6. [Google Scholar] [CrossRef]
  28. Xu, H.; Pang, G.; Wang, Y.; Wang, Y. Deep Isolation Forest for Anomaly Detection. IEEE Trans. Knowl. Data Eng. 2022, 35, 12591–12604. [Google Scholar] [CrossRef]
  29. Asgarov, K.N. Unsupervised Machine Learning Methods for Real-Time Anomaly Detection in Endpoints. J. Mod. Technol. Eng. 2024, 9, 141–155. [Google Scholar] [CrossRef]
  30. An, D.; Choi, J.H.; Kim, N.H. Options for Prognostics Methods: A Review of Data-Driven and Physics-Based Prognostics. In Proceedings of the AIAA Non-Deterministic Maintenance, Monitoring and Prognostics Session, Boston, MA, USA, 5 April 2013. AIAA Paper 2013-1940. [Google Scholar] [CrossRef]
  31. Zhang, H.; Jiang, S.; Gao, D.; Sun, Y.; Bai, W. A Review of Physics-Based, Data-Driven, and Hybrid Models for Tool Wear Monitoring. Machines 2024, 12, 833. [Google Scholar] [CrossRef]
  32. Su, H.; Lee, J. Machine Learning Approaches for Diagnostics and Prognostics of Industrial Systems Using Open Source Data from PHM Data Challenges: A Review. arXiv 2023, arXiv:2312.16810. [Google Scholar] [CrossRef]
  33. Sutharssan, T.; Stoyanov, S.; Bailey, C.; Yin, C. Prognostic and Health Management for Engineering Systems: A Review of the Data-Driven Approach and Algorithms. J. Eng. 2015, 2015, 215–222. [Google Scholar] [CrossRef]
  34. Kundu, P.; Darpe, A.K.; Kulkarni, M.S. A review on diagnostic and prognostic approaches for gears. Struct. Health Monit.-Int. J. 2020, 20, 147592172097292. [Google Scholar] [CrossRef]
  35. Alghassi, A. Prognostics and Health Management of Power Electronics. 2016. Available online: http://dspace.lib.cranfield.ac.uk/handle/1826/10968 (accessed on 21 August 2025).
  36. Narae Trend Co., Ltd. System Introduction (Smart Farm Bandibburi). Available online: http://www.xspark.co.kr/ (accessed on 21 August 2025).
  37. Woosung Hitec Co., Ltd. Natural Ventilation Window Control System (Roll-Up Star). Available online: https://www.wsh.co.kr/ventcontrol2.html/ (accessed on 21 August 2025).
Figure 1. Structure of a smart farm (facility horticulture).
Figure 2. Farm cloud-based greenhouse equipment malfunction response service scenario.
Figure 3. Experience-based soundness management.
Figure 4. Basic process for building PHM.
Figure 5. Narae Trend Co., Ltd. (Bucheon City, Gyeonggi-do, Republic of Korea). Firefly complex environment control smart farm configuration diagram.
Figure 6. Voltage/current measurement main screen.
Figure 7. Driver node and main device communication setting screen.
Figure 8. Voltage change screen according to driver operation.
Figure 9. Testbed driver signal test (installation photograph).
Figure 10. Internal humidity sensor correlation analysis results.
Figure 11. Correlation analysis for CO2 concentration prediction.
Figure 12. Vibration value distribution.
Figure 13. Noise value distribution.
Figure 14. Voltage value distribution.
Figure 15. Heat temperature distribution.
Figure 16. Feature correlations in normal (left) and abnormal (right) switch operation data. Red indicates strong positive correlations, blue indicates strong negative correlations, and gray indicates feature pairs for which correlation values are undefined due to insufficient or missing data.
Figure 17. Comparison of actual humidity values with predicted values.
Figure 18. Scatterplot of actual vs. predicted values (below); residual plot (above).
Figure 19. Residuals (difference between actual and predicted humidity) over time.
Figure 20. Scatterplot of actual vs. predicted CO2 values (above); residual plot (below).
Figure 21. Distribution of residuals for CO2 predictions.
Figure 22. Residuals (difference between actual and predicted CO2 values) over time.
Figure 23. Ideal score distribution code after model training.
Figure 24. Normal opener vibration data (OPEN).
Figure 25. Normal switch vibration data (CLOSE).
Figure 26. Normal opener noise data (OPEN).
Figure 27. Normal switch noise data (CLOSE).
Figure 28. Normal switch voltage data (OPEN).
Figure 29. Normal switch voltage data (CLOSE).
Figure 30. Faulty switch vibration data (OPEN).
Figure 31. Faulty switch vibration data (CLOSE).
Figure 32. Faulty switch noise data (OPEN).
Figure 33. Faulty switch noise data (CLOSE).
Figure 34. Faulty switch voltage data (OPEN).
Figure 35. Faulty switch voltage data (CLOSE).
Figure 36. Difference between actual humidity sensor readings and predicted humidity values.
Figure 37. 30 d moving average and corresponding standard deviation of humidity deviations.
Figure 38. Humidity sensor anomaly detection graph based on predicted thresholds.
Figure 39. Difference between actual and predicted CO2 values.
Figure 40. Moving average and standard deviation of CO2 deviations.
Figure 41. CO2 anomaly detection graph based on predicted thresholds.
Figure 42. Performance degradation trend using the KNN approach. (Black asterisks (*) represent the observed degradation/failure points of the device, which are used as ground truth for evaluating the KNN-based RUL prediction).
Figure 43. Estimation of remaining switch life.
Figure 44. Humidity sensor remaining life prediction. The yellow-shaded region represents the prediction horizon during which the model forecasts future deviation trends to estimate the remaining useful life (RUL).
Figure 45. CO2 sensor remaining life prediction. The yellow-shaded region represents the prediction horizon during which the model forecasts future deviation trends to estimate the sensor’s remaining useful life (RUL).
Figure 46. Remaining life prediction for switches based on operating days. The yellow-shaded area indicates the forward-prediction interval, during which the model projects future health degradation to compute the RUL of the four switches.
Figure 47. Web-based anomaly detection platform for smart farm equipment.
Table 1. Smart farm generation classification.
Category | First Generation | Second Generation | Third Generation
Commercialization timeline | Present | 2030 | 2040
Intended effect | Improvement in convenience | Improvement in productivity | Improvement in sustainability
Key functions | Remote facility control | Precision cultivation and growth management | Full life cycle management, intelligent and automated management
Core technologies | Communication technology | Communication technology, big data, artificial intelligence (AI) | Communication technology, robots, big data, AI
Decision making and control | Human/human | Human/computer | Computer/robot
Representative examples | Smartphone greenhouse control system | Data-driven growth management software | Intelligent robotic farm
Table 2. Key components of a smart farm (facility horticulture).
Category | Details
Environmental sensors (outside) | Temperature, humidity, wind direction, wind speed, rainfall, solar radiation, etc.
Environmental sensors (inside) | Temperature, humidity, CO2, soil moisture (soil cultivation), nutrient solution measurement sensors (electrical conductivity (EC) and pH of nutrient solution), substrate moisture sensors, etc.
Imaging devices | Infrared cameras, digital video recorders, etc.
Facility-level and integrated control equipment | Ventilation, heating, energy-saving facilities, shade curtains, circulation fans, hot water and heating water control, motor control, nutrient solution unit control, LED lighting, etc.
Information management system for the optimal growth environment | Real-time monitoring of the growth environment, facility control, and analysis system with a database (DB) of environmental and growth information, etc.
Table 3. Error data for smart farm sensors.
Sensor Type | Farmers | Error Data | Notes
Temperature | 1536 | 231 | 15%
Humidity | 1536 | 672 | 43%
CO2 | 1536 | 441 | 28%
Insolation | 1232 | 156 | 12%
Wind direction | 1227 | 56 | 4%
Wind speed | 1234 | 50 | 4%
Precipitation | 1231 | 66 | 5%
Light | 1200 | 263 | 21%
Soil moisture content | 318 | 38 | 11%
Soil water tension | 326 | 55 | 16%
Soil temperature | 324 | 34 | 10%
Table 4. Usage rates of smart farm controllers.
Actuator Type | Farmers | Notes
Upper window | 1536 | 100% owned
Side window | 1536 | 100% owned
Insulating cover | 807 | Approximately 53% owned
Shade screen | 754 | Approximately 49% owned
Ventilation fan | 1383 | Approximately 90% owned
Flow fan | 1366 | Approximately 89% owned
Irrigation motor | 440 | Approximately 29% owned
Irrigation valve | 442 | Approximately 30% owned
Air conditioner | 171 | Approximately 10% owned
Table 5. External environmental sensor data.
Survey Items | Standard Words | Unit of Measurement | Data Type
Outside precipitation | Damp rain | O/X | Metadata
Outside temperature | Outside temperature | °C | Metadata
Outside relative humidity | Outside relative humidity | % | Metadata
Outside wind direction | Outside wind direction | deg | Metadata
Outside wind speed | Outside wind speed | m/s | Metadata
Outside average nighttime temperature | Outside temperature (nighttime average) | °C | Metadata
Outside average daytime temperature | Outside temperature (daytime average) | °C | Processed data
Outside maximum temperature | Outside temperature (highest) | °C | Processed data
Outside minimum temperature | Outside temperature (lowest) | °C | Processed data
Outside average temperature | Outside temperature (average) | °C | Processed data
Outside maximum solar radiation | Insolation (highest) | W/m2 | Processed data
Outside accumulated solar radiation | Insolation (accumulated) | J/cm2 | Processed data
Table 6. Internal environmental sensor data.
Survey Items | Standard Words | Unit of Measurement | Data Type
Internal temperature | Internal temperature | °C | Metadata
Internal relative humidity | Internal relative humidity | % | Metadata
Internal light intensity | Internal light | μmol/m2/s | Metadata
Internal average nighttime relative humidity | Internal relative humidity (nighttime average) | % | Processed data
Internal average daytime relative humidity | Internal relative humidity (daytime average) | % | Processed data
Internal highest relative humidity | Internal relative humidity (highest) | % | Processed data
Internal lowest relative humidity | Internal relative humidity (lowest) | % | Processed data
Internal average relative humidity | Internal relative humidity (average) | % | Processed data
Internal average nighttime temperature | Internal temperature (nighttime average) | °C | Processed data
Internal average daytime temperature | Internal temperature (daytime average) | °C | Processed data
Internal highest temperature | Internal temperature (highest) | °C | Processed data
Internal lowest temperature | Internal temperature (lowest) | °C | Processed data
Internal average temperature | Internal temperature (average) | °C | Processed data
Internal CO2 concentration | CO2 | ppm | Metadata
Table 7. Soil environmental sensor data.
Survey Items | Standard Words | Unit of Measurement | Data Type
Internal soil EC | Soil EC | dS/m | Metadata
Internal soil liquid EC | Soil liquid EC | dS/m | Metadata
Internal subsoil moisture | Soil moisture | % | Metadata
Internal subsoil temperature | Soil temperature | °C | Metadata
Table 8. Humidity sensor detailed specifications.
Items | Humidity Sensor
Supply voltage | 5 VDC
Operating temperature | −35 to 85 °C (standard) / −40 to 125 °C (extended)
Operating humidity range | 0–80% RH (relative humidity; standard) / 0–100% RH (extended)
Humidity output | −0.3 V to 5.3 V
Humidity accuracy | ±2% RH (at 25 °C, from 5 VDC)
Humidity transmitting range | 0–80% RH (standard) / 0–100% RH (extended)
Table 9. CO2 sensor detailed specifications.
Items | CO2 Sensor
Sensing method | Nondispersive infrared
Measurement range | 0–10,000 ppm
Accuracy | ±30 ppm ±5%
Response time (90%) | 150 s
Sampling interval | 3 s
Operating temperature range | 0–50 °C
Table 10. Switch status sensor data.
Survey Items | Standard Words | Unit of Measurement | Data Type
Switch voltage | Voltage | V | Metadata
Switch current | Electric current | A | Metadata
Switch vibration level | Vibration | mm/s | Metadata
Switch noise | Noise | dB | Metadata
Switch thermal temperature | Heat temperature | °C | Metadata
Table 11. Switch control data.
Survey Items | Standard Words | Unit of Measurement | Data Type
Upper window | Upper window | Open/close/stop | Metadata
Side window | Side window | Open/close/stop | Metadata
Table 12. ON/OFF control data.
Survey Items | Standard Words | Unit of Measurement | Data Type
Exhaust fan | Exhaust fan | On/off | Metadata
Flow fan | Flow fan | On/off | Metadata
Table 13. Switch specifications.
Actuator | Model | Specifications
DC motor | WSM-4035 [37] | Applications: vinyl, nonwoven fabric, thermal insulation covers, horizontal curtain switches; operating voltage: 24 VDC, 2–12.5 A, open/stop/close
Table 14. Overview of data collection settings.
Category | Description
Data collection period | 1 January 2023–31 December 2024 (2 years)
Sampling interval | 1 min logging interval
Sensors monitored | Humidity sensors (3 units); CO2 sensors (3 units)
Actuators monitored | Switch-type actuators (2 units)
Approx. total observations per sensor | ~1,051,200 records per sensor (2 years × 365 days × 24 h × 60 min)
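The record count in Table 14 is simply the product of the logging horizon and the 1 min sampling interval, as a quick arithmetic check confirms:

```python
# 2 years of 1 min logging (the paper counts each year as 365 days).
years, days_per_year, hours_per_day, minutes_per_hour = 2, 365, 24, 60
records_per_sensor = years * days_per_year * hours_per_day * minutes_per_hour
print(records_per_sensor)  # 1051200, matching the ~1,051,200 records in Table 14
```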
Table 15. Normal data.
Category | Vibration Value | Noise (dB) | Voltage (V) | Heat Temperature (°C)
Count | 20,946 | 20,946 | 20,946 | 20,946
Mean | 1.944169 | 33.10464 | 24.47733 | 13.33905
Std | 1.920376 | 28.22889 | 24.94726 | 4.209443
Min | 0 | 0 | 0 | 0.63
25% | 0.02 | 5.225 | 5.10 | 10.54
50% | 3.49765 | 48.9625 | 34.17 | 13.92
75% | 3.84665 | 60.6 | 47.9 | 16.25
Max | 4.218 | 74.9045 | 79.21775 | 32.24
Table 16. Abnormal data.
Category | Vibration Value | Noise (dB) | Voltage (V) | Heat Temperature (°C)
Count | 16,105 | 16,105 | 16,105 | 16,105
Mean | 2.398319 | 34.06961 | 27.1583 | 16.45986
Std | 2.333344 | 29.29851 | 27.43229 | 6.044037
Min | 0 | −9.96 | 0 | −0.21
25% | 0.0412 | 5.05 | 0 | 12.9
50% | 4.2114 | 41.25 | 40.39 | 15.79
75% | 4.6117 | 62.47 | 52.62 | 18.8
Max | 6 | 89.83 | 69.99 | 35.8
Table 17. Mean and standard deviation values of normal and abnormal data.
Value | Vibration (Normal / Abnormal) | Noise (Normal / Abnormal) | Voltage (Normal / Abnormal) | Heat Temperature (Normal / Abnormal)
Mean | 0.44 / 0.87 | 33.37 / 34.07 | 23.93 / 27.15 | 13.34 / 16.46
Standard deviation | 0.44 / 0.86 | 28.57 / 29.30 | 24.18 / 27.43 | 4.21 / 6.04
Table 18. Humidity prediction model performance.
Measure | Performance
Mean absolute error (MAE) | 0.96
Root mean square error (RMSE) | 1.87
R2 | 0.995
Table 19. Statistical summary of residuals.
Category | Difference Value
Mean residual | ~0.48
Standard deviation | ~1.75
Minimum residual | ~−8.22
Maximum residual | ~9.99
Table 20. CO2 concentration prediction model performance.
Measure | Performance
RMSE | 1.87
R2 | 0.815
Table 21. Statistical summary of residuals for CO2 sensor.
Category | Difference Value
Mean residual | ~0.01
Standard deviation | ~14.67
Minimum residual | ~−46.42
Maximum residual | ~70.89
Table 22. Advantages and disadvantages of statistical techniques (IQR, Z-score).
Method | Advantages | Disadvantages
IQR | Robust to extreme outliers and skewed data; easy to calculate and interpret | Less effective in detecting subtle anomalies in highly volatile data
Z-score | Provides a probabilistic interpretation of anomalies | Sensitive to skewed data and outliers; less effective with non-Gaussian distributions
Table 23. Threshold results for normal data using statistical techniques.
Function | IQR Lower Bound | IQR Upper Bound | Z-Score Lower Bound | Z-Score Upper Bound
Vibration | −5.71 | 9.85 | −3.82 | 7.71
Noise | −78.15 | 143.85 | −51.58 | 117.79
Voltage | −71.85 | 119.75 | −50.36 | 99.32
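Thresholds of the kind reported in Table 23 follow from standard formulas; the sketch below is a hedged illustration rather than the authors' exact code, and the multipliers k = 1.5 (Tukey fence) and z = 3 are conventional assumptions, since the paper does not state them:

```python
import numpy as np

def iqr_bounds(x, k=1.5):
    # Tukey fences: [Q1 - k*IQR, Q3 + k*IQR]; k = 1.5 is the conventional choice.
    q1, q3 = np.percentile(x, [25, 75])
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

def zscore_bounds(x, z=3.0):
    # Gaussian rule: mean plus/minus z standard deviations.
    mu, sigma = float(np.mean(x)), float(np.std(x))
    return mu - z * sigma, mu + z * sigma

# Synthetic stand-in signal (not the paper's measurements).
rng = np.random.default_rng(0)
vibration = rng.normal(1.9, 1.9, size=20_000)
print(iqr_bounds(vibration))
print(zscore_bounds(vibration))
```

Either pair of bounds yields an interval outside which a sample is flagged; the hybrid scheme in the text combines these with an Isolation Forest rather than using one rule alone.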
Table 24. Switch thresholds using hybrid technique.
Category | Normal Switch (OPEN) | Normal Switch (CLOSE) | Faulty Switch (OPEN) | Faulty Switch (CLOSE)
Vibration | 5.2 | 5.2 | 5.2 | 5.2
Noise | 72 | 72 | 72 | 72
Voltage | 63 | 49 | 63 | 49
Table 25. Summary of model training and validation settings.
Item | Setting
Dataset partitioning | 70% training, 15% validation, 15% test (temporal split to avoid leakage)
Cross-validation | 5-fold CV for anomaly detection models (Isolation Forest, Z-score, IQR); no CV for time-series-based RUL estimation (chronological order preserved)
Random Forest | n_estimators = 300, max_depth = 12, min_samples_split = 2
XGBoost | n_estimators = 500, learning_rate = 0.05, max_depth = 8, subsample = 0.8
Gradient Boosting | n_estimators = 300, learning_rate = 0.05, max_depth = 5
Stacking model | Base models: RF + XGBoost + GB; meta-model: linear regression
Isolation Forest | n_estimators = 200, contamination = 0.01
k-Nearest Neighbors (RUL) | k = 5, distance metric = Euclidean
Evaluation metrics | R2, MAE, RMSE for prediction; precision/recall/F1-score for anomaly detection; MAE for RUL
Software versions | Python 3.10, scikit-learn 1.3, XGBoost 1.7, NumPy 1.24, TensorFlow 2.12
Hardware | Workstation with Intel i9-12900K CPU and 64 GB RAM
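With the Isolation Forest settings listed above (n_estimators = 200, contamination = 0.01), the hybrid IQR + Z-score + Isolation Forest detector described in the text can be sketched as a simple vote; the 2-of-3 voting rule and the synthetic data are our assumptions for illustration, not the paper's exact scheme:

```python
import numpy as np
from sklearn.ensemble import IsolationForest

def hybrid_anomaly_flags(x, k=1.5, z=3.0):
    """Flag a sample when at least two of the three detectors agree
    (IQR fence, Z-score rule, Isolation Forest)."""
    x = np.asarray(x, dtype=float)
    q1, q3 = np.percentile(x, [25, 75])
    iqr = q3 - q1
    iqr_flag = (x < q1 - k * iqr) | (x > q3 + k * iqr)
    mu, sigma = x.mean(), x.std()
    z_flag = np.abs(x - mu) > z * sigma
    # Hyperparameters from Table 25: n_estimators = 200, contamination = 0.01.
    iso = IsolationForest(n_estimators=200, contamination=0.01, random_state=0)
    iso_flag = iso.fit_predict(x.reshape(-1, 1)) == -1
    votes = iqr_flag.astype(int) + z_flag.astype(int) + iso_flag.astype(int)
    return votes >= 2

rng = np.random.default_rng(1)
signal = np.concatenate([rng.normal(0, 1, 1000), [12.0, 15.0, -14.0]])  # 3 gross outliers
flags = hybrid_anomaly_flags(signal)
print(flags[-3:])  # the three injected outliers are flagged
```

The vote combines the robustness of the statistical fences with the adaptivity of the learned model, which is the rationale the paper gives for the hybrid design.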
Table 26. Humidity sensor RUL prediction model performance.
Division | Random Forest | XGBoost | Ensemble | Gradient Boosting | Stacking Model (Tuned)
MAE | 281.17 | 372.23 | 361.20 | 501.73 | 80.5
RMSE | 501.73 | 553.09 | 545.09 | 553.09 | 143.37
Table 27. CO2 sensor RUL prediction model performance.
Division | Random Forest | XGBoost | Ensemble | Gradient Boosting | Stacking Model (Tuned)
MAE | 251.18 | 372.23 | 330.10 | 211.73 | 84
RMSE | 491.42 | 663.09 | 547.95 | 423.09 | 143.33
Table 28. Health index based on sensor RUL prediction.
Outlier Rate | Status | Lifespan Recommendation
<2% | Normal | Sensor’s useful lifespan
<5% | Caution | 1 month
<11% | Warning | 1 week
>11% | Serious | Replace immediately
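Reading the table's thresholds as cumulative bands (below 2%, 2–5%, 5–11%, above 11%) — an interpretation we assume, since the table lists only upper limits — the health index reduces to a small classifier:

```python
def health_status(outlier_rate: float) -> tuple[str, str]:
    """Map a sensor's outlier rate (fraction, so 0.02 == 2%) to the
    health categories of Table 28; band boundaries are our reading."""
    if outlier_rate < 0.02:
        return "Normal", "sensor's useful lifespan"
    if outlier_rate < 0.05:
        return "Caution", "replace within 1 month"
    if outlier_rate < 0.11:
        return "Warning", "replace within 1 week"
    return "Serious", "replace immediately"

print(health_status(0.03))  # ('Caution', 'replace within 1 month')
```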
Table 29. Narae Trend Co., Ltd.’s testbed specifications.
(Photographs: outside and inside the greenhouse.)
Location | Naraetland Testbed
CEO | Choi Seung-wook
Phone | -
Crop | Strawberry
Address | 24-8 Dongyang-dong, Gyeyang-gu, Incheon
Details:
- A strawberry cultivation greenhouse with two connected bays and eight beds, floor area about 80 pyeong (~264.5 m2) and effective cultivation area about 67 pyeong (~221.5 m2).
- Built for equipment testing to improve smart farm products of Narae Trend Co., Ltd.
- The system controlled environmental devices such as vent openers, circulation fans, exhaust fans, and supplemental lighting.
- During the summer cropping season (strawberries for export or tomatoes), operational issues were collected and used for functional improvements.
- A logging device for ICT equipment stored operating logs of environmental control devices under both remote and local control.
- A test vent opener was added, and intentional malfunctions were induced to test the “emergency notification device.”
Table 30. Predicted remaining life of switches.
Switch | Time (d) | Health (%)
Switch 1 | 1550 | 20%
Switch 2 | 1728 |
Switch 3 | 1509 |
Switch 4 | 1831 |
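The mean RUL reported in the abstract follows directly from the four per-switch predictions; a quick check using the values from Table 30:

```python
# Per-switch RUL predictions in days (Table 30).
rul_days = {"Switch 1": 1550, "Switch 2": 1728, "Switch 3": 1509, "Switch 4": 1831}
mean_rul = sum(rul_days.values()) / len(rul_days)
print(mean_rul)  # 1654.5, reported in the abstract as a mean RUL of 1655 d
```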
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Citation: Choe, H.-O.; Lee, M.-H. Artificial Intelligence-Based Anomaly Detection Technology for Equipment Condition Monitoring in Smart Farms. Appl. Sci. 2025, 15, 12843. https://doi.org/10.3390/app152312843