1. Introduction
With Industry 4.0, fault diagnosis for industrial machines is complex and labor-intensive [
1]. An unforeseen malfunction could cause the entire industrial production to be halted [
1]. In addition, fault prediction in industrial equipment is necessary to save time and money spent on the overall maintenance process [
2]. Detecting fault conditions with traditional methods may create security risks [
1]. For this reason, it is important to detect early signs of failure and predict the time of failure [
1]. Early-stage fault detection is difficult because gradual deterioration occurs under normal conditions. It is important to increase the true detection power by establishing an optimal balance between sensitivity and specificity [
The rapidly increasing number of industries and machines further confirms this need every day [
1].
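As a brief illustration of this balance, sensitivity and specificity can be computed from confusion-matrix counts; the sketch below is not from the study itself, and the counts are hypothetical values chosen only for demonstration.

```python
# Illustrative sketch (not from the paper): sensitivity and specificity
# of a binary fault detector, computed from confusion-matrix counts.
def sensitivity(tp: int, fn: int) -> float:
    """True positive rate: share of real faults that are detected."""
    return tp / (tp + fn)

def specificity(tn: int, fp: int) -> float:
    """True negative rate: share of healthy states correctly passed."""
    return tn / (tn + fp)

# Hypothetical counts for a detector evaluated on 1000 samples.
tp, fn, tn, fp = 95, 5, 880, 20
print(round(sensitivity(tp, fn), 3))   # 0.95
print(round(specificity(tn, fp), 3))   # 0.978
```

Raising one of the two rates at the expense of the other (for example, flagging everything as faulty) is trivial; the balance between them is what determines true detection power.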
Fault Detection and Diagnosis, a subfield of industrial automation and control engineering, enables the identification of anomalies. Knowledge-based systems built on sensors and fixed parameters have limited ability to characterize previously unseen fault conditions and are therefore ill-suited to dynamic industrial environments. However, current approaches can overcome these limitations. Data-driven approaches, for example, capture the relationships and patterns between samples in the dataset, allowing both known and unknown failure conditions to be identified; by adapting to varied operating conditions in this way, they reduce the dependency on hand-crafted rules [
4]. In this regard, artificial intelligence techniques have the potential to produce a suitable solution for condition monitoring and fault detection [
1,
2]. There are three main steps in developing an AI system: data collection, feature extraction, and AI-based detection and identification. Data collection is the process of recording the characteristics that reflect the machine's condition, such as temperature, pressure, vibration, or oil-analysis outputs. This information is gathered through various types of equipment, such as sensors, accelerometers, or compressors [
1]. Data must be provided meticulously within the framework of legal regulations, ethical rules and confidentiality procedures [
5].
In particular, the data processing phase extracts important features that directly affect the final output [
1]. It removes outlier-causing data. This step is valuable because it increases both accuracy and efficiency [
1]. It also helps successfully identify fault patterns [
6].
AI-based detection and identification refers to machines applying cognitive abilities analogous to those of the human brain. A successfully trained AI-based model provides a robust decision-making mechanism [
1]. AI-based decision support systems enable cost-effective product acquisition thanks to developing computer technologies [
Such products, deployed in real-world scenarios, are valuable for Industry 4.0, which aims to increase production-system reliability and efficiency through real-time fault detection and diagnosis [
4].
The Industrial Artificial Intelligence (IAI) framework was created as a result of the combination of standard industrial processes with artificial intelligence technology [
8]. The IAI framework is based on the concepts of modeling, diagnosis, prediction, optimization, decision-making, and implementation. It demonstrates functionality across a wide business network [
8]. In the future, there is a need to develop new methods or improve existing methods to cope with unbalanced datasets in fault detection and diagnosis [
4]. Various studies aiming to overcome these limitations have been reviewed in the literature. Representative fault detection approaches, spanning both traditional and recent methods, are summarized below. In [
9], a dynamic Bayesian network was used to improve the accuracy and comprehensiveness of fault predictions. This approach reduces estimation errors by compensating for deficiencies in data integrity. In [
10], Support Vector Machines (SVM) and k-nearest neighbor (kNN) algorithms were used for fault detection and classification in rotating machinery. When the SHAP method is used as the feature selection criterion, both the SVM and kNN methods achieve accuracy rates above 98.5%. In [
11], a general-purpose, real-time fault diagnosis and condition monitoring system using edge artificial intelligence (Edge AI) and FIWARE is proposed. This system detected abnormal conditions of autonomous transport vehicle (ATV) equipment with the Edge AI unit, and the results were then transferred to the data storage area. The proposed system is reported to be successful in real-time monitoring of autonomous transport vehicle malfunctions. In [
12], an artificial neural network-based fault detection technique was used to address faults in rotating electrical machines and to detect short-circuit fault currents in the stator windings of permanent magnet synchronous machines. The proposed technique was reported to be effective for real-time fault detection. In [
8], a scheme using a fuzzy logic system was proposed to detect machine faults in a complex production facility. The proposed scheme was validated using real data and machine health reports. Ref. [
13] demonstrated that a deep learning model trained on one type of failure can be used to predict another. This was tested on rotating machinery using vibration signals. The results show that improved knowledge transfer reduces the need to collect data separately for each failure type. In [
14], various fault detection models were evaluated using machine learning (ML), deep learning (DL), and deep hybrid learning (DHL) methods. The performance of the proposed algorithms was tested on a predictive maintenance dataset. Experimental results show that over 90% accuracy was achieved for the Deep Forest and Gradient Boosting algorithms. In [
15], CNN, LSTM, and CNN-LSTM models were used to detect machine faults. The dataset obtained from the Microsoft case study consists of fault history, maintenance history, error states, and machine characteristics, as well as sensor data. The evaluation results indicate that the proposed hybrid CNN-LSTM framework delivers reliable outputs with high predictive accuracy. In [
16], a hybrid deep learning framework combining Convolutional Neural Networks (CNN) and Recurrent Neural Networks (RNN) methods was proposed to improve accuracy in predictive maintenance and fault detection in DC motor drives of industrial robots. Comparative analysis results indicated that the proposed CNN-RNN framework can perform faster processing with higher accuracy.
An evaluation of studies in the literature reveals that modern technology has great potential for fault detection. Predicting the possibility of a machine failure before it actually occurs prevents disruption to ongoing business processes. It also provides a more reliable and cost-effective system.
5. Results and Discussion
This study involves three main stages to maximize the final result. The first stage is preprocessing, which allows features that affect performance to be edited, added, or removed. The second stage provides the prepared dataset as input to the IWO-based artificial neural network model; here, the weight and bias values of the artificial neural network are optimized with the IWO metaheuristic optimization algorithm. The third stage initializes the weights with fractal-based values instead of random initialization. However, each of these stages improves performance only if appropriate parameters are selected. Therefore, 11 different experiments were conducted to test the artificial neural network model with different numbers of layers. The layer counts and numbers of neurons used in these experiments are given in
Table 1. To determine the optimal number of layers for the planned target in the first stage, the parameters of the IWO optimization algorithm were not changed. These parameters were chosen as VarMin = −1, VarMax = 1, nPop0 = 10, nPop = 40, Smin = 0, Smax = 10, n = 1, sigma_initial = 1, sigma_final = 0.0001.
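As context for these parameters, the two core update rules of the standard IWO algorithm can be sketched as below. This is an illustrative reading based on the original IWO formulation, not the study's own code: the formulas, the Gaussian seed dispersal, and the role of n as the nonlinear modulation index are assumptions that the paper's implementation may not follow exactly.

```python
# Hedged sketch of standard Invasive Weed Optimization (IWO) update rules,
# using the parameter names and values listed in the text.
import random

MaxIt, nPop0, nPop = 1000, 10, 40       # iterations, initial / max population
Smin, Smax = 0, 10                      # min / max seeds per weed
n = 1                                   # nonlinear modulation index (assumed role)
sigma_initial, sigma_final = 1.0, 1e-4  # dispersion standard-deviation bounds
VarMin, VarMax = -1.0, 1.0              # search-space bounds for each weight

def sigma_at(it: int) -> float:
    """Seed-dispersal standard deviation at iteration `it` (1-based):
    decays from sigma_initial to sigma_final, shaped by exponent n."""
    return ((MaxIt - it) / (MaxIt - 1)) ** n * (sigma_initial - sigma_final) + sigma_final

def seed_count(cost: float, best: float, worst: float) -> int:
    """Fitter weeds (lower cost) reproduce more, linearly between Smin and Smax."""
    if worst == best:
        return Smax
    ratio = (worst - cost) / (worst - best)
    return int(Smin + (Smax - Smin) * ratio)

def spawn(parent: list, it: int) -> list:
    """A seed is the parent position plus Gaussian noise, clipped to the bounds."""
    s = sigma_at(it)
    return [min(VarMax, max(VarMin, x + random.gauss(0.0, s))) for x in parent]
```

Under this reading, sigma_initial and sigma_final control how far seeds scatter early versus late in the run, while Smin and Smax bound how strongly fitness differences translate into reproduction.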
The experimental sets given in
Table 1 were applied for 1000 iterations. Maximum success on both the training and test datasets was achieved by Experiment 6, which had two hidden layers of 20 and 40 neurons, respectively. Fine-tuning the number of layers and neurons increased this success. Therefore, the parameters of the IWO optimization algorithm were subsequently tuned on top of this layer configuration. Five further experiments were conducted to fine-tune the parameters of the IWO optimization algorithm, as described below.
The 12th experiment retrains the Experiment 6 model, which achieved the highest success with nPop0 = 10 and nPop = 40, for 1000 iterations with nPop0 = 10 and nPop = 60.
The 13th experiment retrains the Experiment 6 model, which achieved the highest success with Smin = 0 and Smax = 10, for 1000 iterations with the assignments Smin = 5 and Smax = 15.
The 14th experiment retrains the Experiment 6 model, which achieved the highest success with n = 1 (n controls how the number of seeds varies with the iterations), for 1000 iterations with n = 1.5. A further difference from the hyperparameters of Experiment 6 is the assignment Smin = 5 and Smax = 15, which maximized success in the thirteenth experiment.
Experiment 15 retrains the Experiment 6 model, which achieved the highest success with sigma_initial = 1 and sigma_final = 0.0001, for 1000 iterations with sigma_initial = 1.5 and sigma_final = 0.007.
The parameters of the artificial neural network architecture of Experiment 6, which showed maximum performance, were kept constant for Experiments 12–15, while the parameters of the IWO optimization algorithm were adjusted. Building on these results, Experiment 16 was trained for 1000 iterations with the hyperparameters VarMin = −1, VarMax = 1, nPop0 = 10, nPop = 50, Smin = 5, Smax = 15, n = 1.5, sigma_initial = 1.5, sigma_final = 0.0001. The achieved performance metrics are given in
Table 2.
The performance metrics given in
Table 2 show that maximum performance was achieved in Experiment 16. The performance improvement in this experiment is explained by the fine-tuning of the neural network architecture and of the IWO algorithm used to optimize the network's weights and biases. Compared to the architecture trained without parameter tuning, performance improved by approximately 4%. This increase is significant because the model saturates at accuracies above 95%; even a 1% increase therefore represents a meaningful gain and indicates greater detection power through a reduced number of false positives.
All these experiments constitute a search for appropriate values for the artificial neural network model and the IWO optimization algorithm. The values selected for Experiment 16, which shows maximum performance, are summarized as follows. The artificial neural network architecture uses two hidden layers, with 20 and 40 neurons, respectively, and the hyperbolic tangent activation function in both. To find appropriate weight and bias values during training, the integrated IWO parameters VarMin, VarMax, MaxIt, nPop0, nPop, Smin, Smax, n, sigma_initial, and sigma_final were set to −1, 1, 1000, 10, 50, 5, 15, 1.5, 1.5, and 0.0001, respectively. Experiments ran for 1000 iterations. Under normal conditions, increasing the number of iterations gives the model the potential to reach a better solution; however, if the solution stops improving, the iterations should be terminated. Therefore, Experiment 16, originally run for 1000 iterations, was retrained for 2000, 3000, and 4000 iterations. A gradual performance increase was observed at 1000, 2000, and 3000 iterations, but no further improvement was obtained at 4000 iterations. The model's performance outputs for 3000 iterations are given in
Table 3.
Table 3 shows that maximum performance was obtained at 3000 iterations. Therefore, 3000 iterations were used in subsequent experiments.
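The selected architecture, two tanh hidden layers of 20 and 40 neurons, can be sketched as a forward pass; the input dimension, the sigmoid output layer for a binary fault label, and the uniform initialization within the VarMin/VarMax bounds are illustrative assumptions, since the text does not state them here.

```python
# Sketch of the network shape described above: input -> 20 -> 40 -> output,
# with tanh hidden activations. Input/output sizes are placeholders.
import numpy as np

def forward(x, params):
    """Forward pass; returns a value in (0, 1) per sample (assumed sigmoid output)."""
    (W1, b1), (W2, b2), (W3, b3) = params
    h1 = np.tanh(x @ W1 + b1)           # first hidden layer, 20 neurons
    h2 = np.tanh(h1 @ W2 + b2)          # second hidden layer, 40 neurons
    return 1.0 / (1.0 + np.exp(-(h2 @ W3 + b3)))

def init_params(n_in, n_out=1, seed=0):
    """Uniform init in [-1, 1], matching the VarMin/VarMax search bounds."""
    rng = np.random.default_rng(seed)
    sizes = [n_in, 20, 40, n_out]
    return [(rng.uniform(-1, 1, (a, b)), rng.uniform(-1, 1, b))
            for a, b in zip(sizes[:-1], sizes[1:])]

params = init_params(n_in=6)            # 6 input features is a placeholder
y = forward(np.zeros((1, 6)), params)   # shape (1, 1), value in (0, 1)
```

In the hybrid scheme, the IWO algorithm searches over the flattened weight and bias arrays in `params` rather than training them by gradient descent.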
The IWO-based ANN algorithm used in this study assigns the initial population randomly. However, to improve solution quality, the IWO initial population was instead initialized with fractal-based values derived from the Julia set. For functions of the form F(z) = z² + c, the different z values that constitute the Julia set serve as the starting points.
In the IWO algorithm, weight assignment in this study is initialized on z values. The hyperparameters of Experiment 16, which demonstrated maximum performance, were tested for 500 iterations at different z values, with the following accuracy rates (test/training): 0.1 + 0.1j: 0.9748/0.9735; −2 − 2j: 0.9699/0.9691; 2 − 2j: 0.9766/0.9774; 2 + 2j: 0.9783/0.9738; −0.8 + 0.156j: 0.9726/0.9729; −0.4 + 0.6j: 0.9779/0.9725; 0.285 + 0j: 0.9752/0.9733; −0.70176 − 0.3842j: 0.9713/0.9733; −0.1 + 0.651j: 0.9730/0.9684.
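One possible reading of this initialization is to follow the orbit of a starting point z0 under F(z) = z² + c and map the orbit values onto initial weights. The sketch below is an illustrative assumption, not the study's method: the choice of c, the handling of escaping orbits, and the tanh squashing into the weight bounds are all details the text does not specify.

```python
# Hedged sketch: deterministic initial weights in [-1, 1] derived from the
# orbit of z0 under z -> z**2 + c. The constant c and the escape handling
# are illustrative assumptions.
import math

def julia_weights(n, z0=2 - 2j, c=-0.4 + 0.6j):
    """Build n weights from the real/imaginary parts of the orbit of z0."""
    z, out = z0, []
    while len(out) < n:
        z = z * z + c
        if abs(z) > 2.0:                 # orbit escaped: pull it back into the disk
            z = z / (abs(z) ** 2)
        out.append(math.tanh(z.real))    # squash into the [-1, 1] weight bounds
        if len(out) < n:
            out.append(math.tanh(z.imag))
    return out[:n]

w = julia_weights(8)
```

Unlike random initialization, this mapping is fully deterministic: the same z0 always yields the same weight vector, which matches the repeatability argument made later in the text.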
Examining these experiments, the value 2 − 2j was chosen for z because the difference between its training and test accuracies is small, which indicates stronger generalization ability. In addition to this balance, its accuracy on the training set is higher than on the test set, suggesting the model is more reliable for real-world scenarios. The evaluation results achieved on the training and test datasets with the 2 − 2j assignment for 3000 iterations are given in
Table 4.
When
Table 4 is examined, the fractal evolution of z initialized with 2 − 2j can be seen to improve the true positive and true negative detection rates on the test dataset, which the model had never seen.
The cost related to the optimization process is given in
Figure 8.
The IWO optimization algorithm whose cost analysis is performed in
Figure 8 was initialized with fractal-based weights. However, randomness was not removed from the generation of new offspring. When the graph in
Figure 8 is examined with this in mind, the rate of incorrect predictions during training can be seen to decrease gradually over the iterations. The resulting minimization of the cost indicates the success of the model. However, it is also important to compare the cost behavior of the improved model against alternatives. To this end, the ANN-based IWO optimization algorithm initialized with fractal-based weights was compared with an ANN-based PSO algorithm and an ANN-based ABC algorithm, both likewise initialized with fractal-based weights. The costs of the optimization processes of these two hybrid structures are given in
Figure 9 and
Figure 10.
The PSO optimization algorithm, whose cost analysis is performed in
Figure 9, was initialized with fractal-based weights. However, randomness was not eliminated in the r1 and r2 assignments used in the velocity update. On the other hand, the ABC optimization algorithm, whose cost analysis is performed in
Figure 10, was initialized with fractal-based weights. However, randomness was not eliminated in the updates of the employed bees, onlooker bees, and scout bees. When the graphs given in
Figure 9 and
Figure 10 are examined based on this information, it is seen that the cost does not decrease gradually throughout the iterations. In addition, the minimum best cost point shown in
Figure 8 was not reached. This indicates that the ANN-based IWO optimization algorithm initialized with fractal-based weights achieves higher performance.
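For reference, the PSO velocity update mentioned in this comparison can be sketched as follows. The inertia weight w and acceleration coefficients c1 and c2 are illustrative values, not taken from the study; the factors r1 and r2 are the terms in which randomness remains even when the particle positions themselves are fractal-initialized.

```python
# Sketch of one PSO velocity/position update for a single particle.
# w, c1, c2 are illustrative hyperparameters (assumptions).
import random

def pso_step(x, v, pbest, gbest, w=0.7, c1=1.5, c2=1.5):
    """Update one particle dimension-wise toward its personal and global bests."""
    new_x, new_v = [], []
    for xi, vi, pi, gi in zip(x, v, pbest, gbest):
        r1, r2 = random.random(), random.random()   # randomness remains here
        vel = w * vi + c1 * r1 * (pi - xi) + c2 * r2 * (gi - xi)
        new_v.append(vel)
        new_x.append(xi + vel)
    return new_x, new_v
```

A particle already sitting at both its personal and the global best, with zero velocity, stays put; otherwise r1 and r2 inject fresh randomness at every iteration, which is the residual stochasticity noted above for the PSO and ABC baselines.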
The concept of a fractal, as used in this system, refers to complex, unbounded forms. The field that studies these forms offers solutions for dynamic systems in the complex mathematical plane. Fractals arise when even the simplest mathematical expressions are interpreted as dynamical systems. Furthermore, the fact that these fractals are generated by simple systems such as z² + c makes them accessible and applicable across disciplines [
22].
Under normal conditions, initializing the artificial neural network with random weights ensures that each neuron is unique, which improves learning and feature extraction capability. But this randomness lacks order: performance fluctuates across repeated training runs, and a poor start with the same architecture and hyperparameters can place the model directly in a shallow region of the search space, where it may become stuck. This negatively affects the performance achieved over the training process. On the other hand, stability, consistency, and generalization ability are crucial for real-world scenarios. Initializing the weights in an ordered way avoids such interruptions and supports direct achievement of the goal, eliminating the drawbacks of random initialization. Therefore, the fractal-based initialization used in this study is an important part of the optimization process, as it keeps overall success and stability within a predictable range. The dynamic nature of real-life scenarios confirms the need for such a system. The generalizability of the improved system is also crucial: it should be tested on real-life observations as well as simulated data. For this reason, this approach, which proved its success on the simulated data of the Industrial Equipment Monitoring dataset, was re-run under the same conditions on the ReneWind dataset, which consists of real data. The evaluation results achieved on the training and test datasets are given in
Table 5.
Table 5 shows the performance achieved by the IWO-based ANN model, initialized with the fractal-based z value of 2 − 2j, on the ReneWind dataset, which consists of real observations. The outputs on the training and test datasets show that the false alarm rate is low, while the true positive and true negative detection rates are high. These results indicate the generalizability of the developed hybrid structure. Additionally, both datasets used in this study (the Industrial Equipment Monitoring dataset and the ReneWind dataset) are imbalanced, and completing the detection process with high accuracy on imbalanced datasets is difficult. In fault detection especially, new methods must be developed or existing methods improved to cope with this difficulty, and the improved approach plays a complementary role in this respect. In particular, strengthening detection before a fault occurs improves net profit, and the approach is applicable because it addresses a need in real-world scenarios.
This study successfully demonstrates how the optimization process can be implemented to improve performance. Future analyses of the optimization process with different fractal sets are planned.