An Intelligent Deep Learning Technique for Predicting Hobbing Tool Wear Based on Gear Hobbing Using Real-Time Monitoring Data

Abstract: Industry 4.0 has been an impactful and much-needed revolution that has not only influenced different aspects of life but has also changed the course of manufacturing processes. The main purpose of the manufacturing industry is to increase productivity, reduce manufacturing costs, and improve the quality of the product, which in turn helps to drive economic growth and improve people's standard of living. Gear hobbing, although a highly efficient gear manufacturing process, has not received much attention in terms of Industry 4.0. Prior works mostly took simulation-based approaches that considered an individual parameter, e.g., temperature, current, or vibration, or only a few of these parameters. This work presents a real-time experimental approach that collects raw data on three parameters together, i.e., temperature, current, and vibration, using sensors placed on an industrial machine during the gear hobbing process. The data are preprocessed and then utilised for training an artificial neural network (ANN) to predict the remaining useful life (RUL) of a tool. It is demonstrated that an ANN with multiple hidden layers can predict the RUL of the tool with high accuracy. A comparison of the results shows that an ANN with multiple hidden layers predicts tool wear during worm gear hobbing more accurately than an ANN with a single hidden layer.


Introduction
Machines play a significant role in today's world and have transformed virtually every aspect of modern society. They are widely used in manufacturing and production, enabling companies to produce goods faster, more efficiently, and at a lower cost. This has helped to drive economic growth and improve standards of living around the world. Machines have revolutionised many sectors. In transportation, they enable people to travel faster, connecting people and businesses across the world. In healthcare, they enable doctors to diagnose and treat patients more effectively; for example, medical imaging machines such as magnetic resonance imaging (MRI) and computed tomography (CT) scanners allow doctors to examine the patient's body without invasive procedures. In communication, they allow people to communicate, connect, and share information easily using smartphones and social media. In agriculture, tractors and other agricultural machinery enable farmers to work larger fields, increase crop yields, and grow crops more efficiently and sustainably. In energy, machines underpin the production and distribution of energy, from power plants to wind turbines and solar panels, which has helped to enhance energy security and reduce dependence on non-renewable energy sources.
The rest of this paper is organised as follows: Section 2 gives the background and motivation; Section 3 reviews the literature; Section 4 describes the experimental setup, how the data are collected, and the proposed method to predict the RUL of the tool; the results are discussed in Section 5; and, in Section 6, the concluding remarks of this study are discussed, along with the possible future direction.

Background and Motivational Statement
Machines are devices that use energy to perform a specific task. Machine elements typically fall into two primary categories: general purpose and special purpose. These components serve as the fundamental structure for a wide variety of machines. General-purpose machine elements include fasteners such as screws, nuts, bolts, and rivets, as well as chains, shafts, keys, bearings, belts, etc. Special-purpose elements include, e.g., batteries, gears, ball bearings, springs, couplings, seals, valves, turbines in a jet engine, blades in a fan, pistons, crankshafts, etc. These elements are used in a particular machine, depending on its purpose and design. All such parts are crucial for the machine's performance [18], and they should be checked, changed, or replaced within their service period, as the malfunctioning of such parts may cause severe losses.
Generally, human operators pay little attention to the tool's life and run the machines at their maximum capacity. Ignoring the optimum conditions increases the induced vibrations, and the rising temperatures during cutting cause noticeable tool wear at the edges [19]. A tool that is not monitored by the operators under certain conditions or at particular times can wear without being noticed. This tool wear affects the surface quality and output precision of the manufactured gear. Figure 1 shows the process of gear hobbing, in which a rotating hob cutter, inclined at a certain angle, cuts a rotating workpiece according to the specification. Numerous efforts have been made to improve the gear manufacturing process, with different strategies for improving the tool's life. For instance, Mikołajczyk et al. [20] used image processing and experimentally collected data from three cutting edges to detect the tool wear and, by combining both, predicted the tool's life. Sun et al. [21] demonstrated the use of particle swarm and backpropagation techniques for predicting the geometric deviation, and the authors claimed that machine learning is an effective approach, as it predicts the output based on the available data and estimates the geometric deviation more accurately. SVR (support vector regression) is a common ML model for predicting the life of the tool, specifically in milling operations, and Bagga et al. [22] demonstrated that a support vector machine (SVM) outperforms the MVR (multivariable regression) model, resulting in a decrease in downtime with proper predictive maintenance. Kun et al. [23] demonstrated a combination of different techniques, e.g., using a relatively small dataset of tool wear and applying time series methods to predict from the previous data. Later, the authors applied PCA (principal component analysis) and feed-forward ANNs (artificial neural networks) to the newly collected dataset to predict the tool's wear and, with the help of the ARMA (autoregressive moving average) model, compared the error percentage of the final output.
Tool wear occurs for numerous reasons, such as a high feed rate, cutting speed, temperature, chatter, excessive vibrations, and related factors. Excessive tool wear not only damages the tool but also brings a high probability of tool breakage. Tool wear affects not only the tool life but also the production time and quality [24][25][26][27][28]. Surface roughness refers to the physical irregularities that are inherent in the manufacturing process. Several important parameters affect the surface quality, such as chatter, temperature, and remaining useful life (RUL) [29]. Sensors and their placement are crucial to recording the desired parameters. Table 1 lists different works that utilised different parameters to investigate the effect on the RUL of the tool.
Figure 2 shows the pieces scheduled for manufacture along with the pieces actually produced during production. The factory is supposed to meet a certain number of orders within the designated time. However, as can be observed in Figure 2, the scheduled quantity is never achieved because there is no shop floor scheduling. If any machine is not operating within the production time, the load is not redistributed among the other machines, which affects the whole production line and results in wasted time and resources and an increase in cost. Other causes could be poor worker efficiency and higher downtimes, which make it difficult to achieve the desired results.
Three important parameters, i.e., temperature, current, and vibration, are considered for this work. The values of these parameters are recorded with the help of the respective sensors to improve the efficiency of the hobbing gear-cutting process. Table 1 is used to guide the placement of the respective sensors.
The real motivation for this study is to provide a deep-learning-based approach for the gear manufacturing process using hobbing gear-cutting tools, based on the values obtained from the parameters, i.e., current, vibration, and temperature, to improve the overall product quality by predicting the RUL of cutting tools. In developed countries, industries already monitor such parameters to produce reliable and standard machines. In developing countries, these parameters are not taken into account, especially in midsize industries, which leads to inefficient production and low-quality manufactured gears. Hence, this experimental work can be useful for such industries in developing countries. The approach can also help manufacturers to improve hobbing gear manufacturing, reduce wastage, avoid using faulty cutting tools, and provide a cost-effective solution.

Table 1. Parameters, sensors, and sensor placements used in prior works.

Parameter | Sensor | Placement | Description
Thermal | PT 100 [33] | Interior of a machine tool, interior of a hob spindle's end cover, inlet pipe for lubricating oil, component column, equipment bed, hob spindle's rear end cap, hob shifting spindle bearing end, outlet pipe for lubricant, hydraulic oil's discharge pipe, workpiece clamping cylinder, and bearing end cover of the spindle [33] | Thermal deformation and tooth thickness errors are determined using fuzzy logic clustering in order to build a compensation model for an effective dry cutting process.
Power signal | Montronix PS200 DGM [34] (device) | Main power [34] | The torque of the hobbing machine is calculated using the power signal, and an ANN model is developed to determine the relation between these two quantities.
Vibration | Accelerometer [35] | Tool fixture, workpiece fixture [35] | Vibration analysis of the gear hobbing machine is carried out to improve product quality, as vibration depends on certain parameters and reduces the effectiveness of the machine in the cutting process.

Literature Review
Industry 4.0 puts a significant emphasis on the mechanisation of the industrial process. Machining is a common processing method in the production phase, and the automation of machining is a vital component of this phase. In the machining process, the cutting tool is the final executive component that comes into direct contact with the workpiece. This contact makes the cutting tool susceptible to wear, which impacts the surface quality. The majority of the issues that arise throughout the machining process are due to tool wear and damage. In recent years, much research has been conducted on tool condition monitoring (TCM) approaches that rely on deep learning (DL).

Deep-Learning-Based Models
Artificial neural networks (ANNs) are the foundation of deep learning and representation learning. A DL model contains an input layer, followed by many hidden layers, and an output layer. Each layer consists of numerous neurons, and the neurons in adjacent layers are fully interconnected with each other to form a network. The degree to which two neurons are connected is referred to as the weight, and the values of the weights are modified to minimise the output error observed on the training samples. To reach acceptable accuracy, practically all DL-based approaches require a substantial number of learning examples, which is quite difficult for TCM in terms of both cost and time.
Zhou et al. [36] presented an improved multi-scale edge-labelling graph neural network (MEGNN) as a means of increasing the recognition accuracy of deep-learning-based TCM when working with limited sample sizes. Applications of the MEGNN-based approach to the PHM 2010 milling TCM dataset reveal that it outperforms three DL-based methods (CNN, AlexNet, and ResNet) when working with small samples. D'Addona et al. [37] presented the utilisation of two nature-inspired computing techniques, ANN and (in silico) DNA-based computing (DBC), to regulate tool wear. The ANN was trained using experimental data, which comprised images of the cutting tool's worn zone. This information was then used to carry out the DBC. It has been demonstrated that the ANN can predict the degree of tool wear from a set of tool-wear images processed using a specific procedure, whereas the DBC can identify the degree to which the processed images are similar to or different from one another.

AI in Manufacturing
There has been a growing application of data-driven methodologies to machinery prognostics and maintenance management, which has resulted in the transformation of legacy manufacturing systems into smart manufacturing systems using artificial intelligence. The rapid development of artificial intelligence has led to the creation of a wide variety of machine learning algorithms, which have now found widespread use in a variety of engineering subfields. Wu et al. [38] presented a prognostic method for tool wear prediction based on random forests (RFs), and the authors compared the performance of RFs with that of feed-forward backpropagation (FFBP) artificial neural networks (ANNs) and support vector regression (SVR). Specifically, the effectiveness of FFBP ANNs, SVR, and RFs is evaluated using experimental data obtained from 315 milling tests. The outcomes of these experiments demonstrate that RFs are capable of producing more accurate predictions than SVR and FFBP ANNs with a single hidden layer. Mahmood et al. [39] proposed a method that extracted the optimum conditions for the ball mill to achieve the desired surface finish for the process. The tool life and the effect of hot machining on the tool life can be predicted using a backpropagation ANN.

Sensor Based Monitoring
An experimental approach was presented by Kuntoğlu et al. [40], in which they took into consideration five distinct sensors and adapted them to a lathe to gather data and analyse the capability of each sensor in reflecting tool wear. During the turning of AISI 5140 using coated carbide tools, measurements are taken of cutting forces, vibration, acoustic emission, temperature, and current. The data obtained indicate that temperature and acoustic emission signals appear to be effective approximately 74% of the time for flank wear. In addition, a high level of accuracy is achieved when the fuzzy-logic-based prediction of flank wear is carried out with the assistance of temperature and acoustic emission sensors. This confirmed the sensors' suitability for use in sensor fusion. Patra et al. [41] created a tool condition monitoring system for micro-drilling employing a tri-axial accelerometer, a data gathering and signal processing module, and an artificial neural network. The ANN model was created to forecast the drilled hole number by fusing the RMS values of all three directional vibration signals, as well as the spindle speed and feed parameters. The drilled hole number predicted by the ANN model is in good agreement with the experimentally obtained drilled hole number. It has also been demonstrated that the neural network model produces a lower error in hole number prediction than the regression approach.

Other Works
Zou et al. [27] proposed a reliable online hob wear state monitoring system for dry gear hobbing machines, considering the adverse effects of both the thermally induced error of the machine tool and the non-uniform machining allowance of workpieces on the characterisation ability of the hob spindle power signal. The approach relies on tracking energy usage and the thermally induced error as the thermal deformation evolves with time. Workpiece machining allowances are also gathered as their typical value. High-speed dry gear hobbing allows more cutting than alternative methods, but it also has several downsides, including increased tool wear due to a higher cutting force and temperature. Other works proposed developing a model for optimisation of the gear finish in the hobbing process [42][43][44] and the use of machine learning models to predict the RUL [4,17]. Cheng et al. [19] provided a mechanism to test how high-speed dry gear hobbing can affect tool wear. The work demonstrated a theoretical basis for working out how much cutting force and tool wear to expect using the undeformed chip geometry. The simulation results are compared with the experimental findings, and the simulation shows good agreement with the experiments in terms of the shapes and patterns of tool wear.
To the best of our knowledge, certain works adopt deep learning models (e.g., ANN) and machine learning models (e.g., SVM, logistic regression, etc.) for hobbing process tool wear, but most are simulation-based. A few other works, as discussed earlier, also proposed simulation-based experimental methods to predict tool wear. Here, we propose an experimental-based approach by considering three important parameters, i.e., temperature, current, and vibrations altogether, which are recorded using respective sensors to improve the efficiency of the hobbing gear-cutting process. The collected raw data are normalised and utilised for training and testing the deep learning model, i.e., ANN with single and multiple layers for predicting the RUL of a cutting tool.

Materials and Methods
This section presents the proposed methodology for employing the hobbing process for tool wear prediction using an artificial neural network (ANN) with single and multiple layers. This section goes into additional detail regarding the process of data collection, the initialisation of the experiment, and the implementation of ANN.

Mathematical Model
For experimentation, a linear regression model and an ANN are used to predict the remaining useful life (RUL) of the tool. Linear regression, one of the most common statistical tools, models the relationship between an input variable x and an outcome y [45], so that the outcome can be predicted from the input. Next, to find out how far the predicted data are from the actual values, the root mean squared error (RMSE), i.e., the square root of the mean squared error (MSE), is used.
The MSE in Equation (2) defines the error between the actual and predicted values over n samples, where y is the actual value and ȳ is the predicted value:
MSE = \frac{1}{n} \sum_{i=1}^{n} (y_i - \bar{y}_i)^2    (2)
In the case of curve fitting, ANN is employed for the identification of the data pattern in the input and finding the relevant target value [46] to provide predictions and approximations when other data are given as input after the relationship is established, which is discussed in Section 4.5.
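As an illustration of the error measures described in this subsection, the following minimal sketch fits a one-variable linear regression by ordinary least squares and computes the MSE and RMSE. The function names and the toy temperature and wear values are hypothetical and are not taken from the collected dataset.

```python
import math


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = a + b * x for a single input variable."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b


def mse(actual, predicted):
    """Mean squared error between actual and predicted values."""
    return sum((y - p) ** 2 for y, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Root mean squared error: the square root of the MSE."""
    return math.sqrt(mse(actual, predicted))


# Hypothetical toy data: temperature readings and tool wear in percent.
temperature = [30.0, 32.0, 34.0, 36.0, 38.0]
wear = [10.0, 21.0, 29.0, 41.0, 49.0]

a, b = fit_line(temperature, wear)
predicted = [a + b * x for x in temperature]
print(f"slope={b:.3f}, MSE={mse(wear, predicted):.3f}, RMSE={rmse(wear, predicted):.3f}")
```

The RMSE is reported in the same unit as the target, which makes it easier to interpret than the MSE when the target is a wear percentage.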

Experimental Setup
The experimental setup is shown in Figure 3. Three parameters, i.e., current, vibration, and temperature, are considered for predicting the RUL of the hobbing tool. This is because, as manufacturing with the hobbing tool proceeds, more current is required to operate the process, which results in the decay of the tool. Similarly, an increase in machine vibration and a rise in temperature decrease the RUL of the hobbing tool. To record the values of these parameters, sensors are placed at specified positions to observe the hobbing tool's behaviour, as shown in Figure 4. The current sensor is placed on the main circuit that feeds the motor from the main voltage supply. Two vibration sensors are placed to record vibration values, i.e., one sensor on the main body of the machine and the other on the tool post near the cutting tool. Since other machines running nearby might affect the vibration readings of the machine and the tool post, two vibration sensors are used so that the machine's actual vibration can be analysed and the vibration effects of nearby machines can be eliminated. The temperature sensor is placed close to the cutting tool to obtain the temperature difference between the gear manufactured by the sharp tool and the gear manufactured by the blunt tool. The introduction of microelectromechanical systems (MEMS) [47] provides a low-cost sensor technology that is used in industrial environments to obtain sufficient information for data training. The machine under observation, on which the sensors are installed, is an old USSR hobbing machine with no previously installed sensors. Because oil flows continuously over the tool, the temperature of the tool cannot be measured directly, so the temperature of the workpiece is measured instead. An infrared temperature sensor is installed close to the constantly rotating workpiece, and the temperature just after the cut is noted.
To calculate the actual vibration of the machine, the tool-post vibration reading is subtracted from the machine vibration reading. Current transformers with a measuring capacity of 30 A are attached to the main supply to determine the increase in current consumption over time. The specifications and parameters of the gear hobbing machine (USSR-5E32), e.g., feed rate and cutting time, are given in Table 2.
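The subtraction of the two vibration readings can be sketched as follows. This is only an illustration under the assumption that both accelerometers report one value per axis in the same unit and at the same instant; the function names and the sample readings are hypothetical.

```python
import math


def differential_vibration(machine_xyz, tool_post_xyz):
    """Subtract the tool-post reading from the machine reading, axis by axis."""
    return tuple(m - t for m, t in zip(machine_xyz, tool_post_xyz))


def magnitude(xyz):
    """Euclidean magnitude of a three-axis reading."""
    return math.sqrt(sum(v * v for v in xyz))


# Hypothetical simultaneous readings (X, Y, Z) in m/s^2.
machine_reading = (1.5, 2.0, 10.0)
tool_post_reading = (0.5, 0.5, 9.0)

actual = differential_vibration(machine_reading, tool_post_reading)
print(actual, round(magnitude(actual), 3))
```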

Methodological Steps
The pictorial demonstration of the methodological steps is shown in Figure 5. The methodological steps are as follows:

1. Data collection: the placed sensors are used to collect data on the parameters, i.e., current (I1, I2, I3), vibration (X, Y, Z), and temperature (T1), that affect the RUL and production time. The data help determine the machine's downtime. The complete details of data collection are discussed in Section 4.4.
2. Data filtration: apply a filter to remove the noise from the data. The process is further described in Section 4.4.
4. Assign weights: assign random weights to start the algorithm. The raw signal is checked to see whether noise is present or not. If noise is present, then the initial coordinates are identified and weights are assigned to the coordinates. The whole process is described in Section 4.4.
5. Dataset training: the normalised data are used to train the ANN and predict the RUL (80% for training and 20% for prediction).
6. Rate of activation: find the rate of activation of hidden nodes and their connections to the output and find out how often output nodes are turned on.
7. Error rate calculation: find the error rate at the output node and recalibrate all the linkages between the hidden nodes and the output nodes.
8. Applying weights and errors found: using the weights and errors found at the output node, cascade down the error to hidden nodes.
9. Weight recalibration: recalibrate the weights between the hidden nodes and the input nodes.
10. Process repetition: repeat the process until the convergence criteria are met.
11. Apply the final linkage weight score: using the final linkage weight score, determine the activation rate of the output nodes.
The complete details of steps 5 to 11 are available in Section 4.5.
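The training loop in steps 4 to 11 can be illustrated with a minimal backpropagation sketch for a network with one hidden layer. It is not the implementation used in this work: the sigmoid activation, the learning rate, the stopping tolerance, the network size, and the four toy samples are all assumptions chosen to keep the example small.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def forward(x, w_hid, w_out):
    """Activation of the hidden nodes and of the single output node."""
    h = [sigmoid(sum(w * v for w, v in zip(row, x + [1.0]))) for row in w_hid]
    y = sigmoid(sum(w * v for w, v in zip(w_out, h + [1.0])))
    return h, y


def train(samples, targets, n_hidden=3, lr=0.5, max_epochs=2000, tol=1e-3, seed=0):
    rng = random.Random(seed)
    n_in = len(samples[0])
    # Step 4: random starting weights; the last weight in each row acts as a bias.
    w_hid = [[rng.uniform(-1, 1) for _ in range(n_in + 1)] for _ in range(n_hidden)]
    w_out = [rng.uniform(-1, 1) for _ in range(n_hidden + 1)]
    losses = []
    for _ in range(max_epochs):
        total = 0.0
        for x, t in zip(samples, targets):
            # Step 6: activation of hidden and output nodes.
            h, y = forward(x, w_hid, w_out)
            # Step 7: error at the output node.
            delta_out = (y - t) * y * (1.0 - y)
            # Step 8: cascade the output error down to the hidden nodes.
            delta_hid = [delta_out * w_out[j] * h[j] * (1.0 - h[j])
                         for j in range(n_hidden)]
            # Step 7 (continued): recalibrate the hidden-to-output links.
            for j, v in enumerate(h + [1.0]):
                w_out[j] -= lr * delta_out * v
            # Step 9: recalibrate the input-to-hidden links.
            for j in range(n_hidden):
                for i, v in enumerate(x + [1.0]):
                    w_hid[j][i] -= lr * delta_hid[j] * v
            total += (y - t) ** 2
        losses.append(total / len(samples))
        # Step 10: repeat until the (assumed) convergence criterion is met.
        if losses[-1] < tol:
            break
    return w_hid, w_out, losses


# Hypothetical normalised inputs; target 0.0 = sharp tool, 1.0 = worn tool.
samples = [[0.1, 0.2], [0.2, 0.1], [0.8, 0.9], [0.9, 0.8]]
targets = [0.0, 0.0, 1.0, 1.0]

w_hid, w_out, losses = train(samples, targets)
# Step 11: use the final weights to obtain the output activation.
_, sharp_score = forward([0.1, 0.2], w_hid, w_out)
_, worn_score = forward([0.9, 0.8], w_hid, w_out)
print(f"first loss={losses[0]:.4f}, last loss={losses[-1]:.4f}")
print(f"sharp={sharp_score:.3f}, worn={worn_score:.3f}")
```

A fixed random seed is used so that repeated runs start from the same weights; in practice the starting weights would differ between runs.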

Data Collection and Acquisition
As shown in Figure 4, the sensors deployed on the machine record the data every second. An Arduino microcontroller [48] accumulates the time series data from the sensors and transfers them to the local server, where they are stored in a database. Because noise is present in the raw signals, the raw signal data must be normalised before further analysis. Accelerometers measure the increase in vibration at the tool post, an infrared temperature sensor is placed close to the cutting tool, and current sensors on the motor's 3-phase power supply measure the current consumption throughout the process. A sample of the collected raw data is given in Appendix A.1. The sample dataset shows the date, time, vibration (X1, Y1, Z1, and X2, Y2, Z2), current (I1, I2, I3), and temperature (T1). Initially, the data are collected every 3-4 s as the machine starts. When the machine is switched off, the current is at its minimum, and only noise (leakage current) is recorded. As soon as manufacturing is initiated, the current, vibration, and temperature start increasing. The complete raw dataset is available from [49].
For data preprocessing, a bandpass filter (BPF), power spectral density (PSD), a smoothing filter (locally weighted running line smoother, LOESS [50]), and a median filter (MF) are used to filter the data from the sensors. Firstly, the 1D median filter is used to reduce the high spikes in the corresponding signal. These high spikes are garbage values generated by the sensors. To filter the vibration signals, the PSD function is used to find the power density with respect to frequency. This helps to determine the frequency range where the data are present; the remaining high frequencies are noise. To filter the current and temperature signals, a smoothing filter is applied that reduces uneven spikes and smooths the signal to some extent, because the machine's motor winds up a few times during operation, which adds noise and garbage values to the signal. Lastly, a bandpass filter helps clean up the data, particularly where large amounts of gain have been added. Figure 6 shows the filtered vibration data. The y-axis of the plot shows the vibration values in metres per second squared, and the x-axis shows the samples recorded. The sample normalised data are shown in Appendix A.2. The complete normalised data are available from [49].
In the industrial environment, the real challenge during the data collection phase is that the circuitry faces back EMF (electromotive force), which can burn out the circuitry. Initially, a Raspberry Pi was used, but it did not work because of the EMF issues, and its circuit was burned. After certain protection was applied to the circuitry, the data were collected. Figure 7 shows the filtered current signals. The y-axis of the plot shows the current values, i.e., 1-6 A, and the x-axis shows the samples recorded.
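Two of the preprocessing steps described above, the 1D median filter and the power spectrum used to locate the frequency range of the data, can be sketched with the standard library alone. This is a simplified illustration rather than the filtering code used in this work: the window size, the 64 Hz sampling rate, the synthetic 5 Hz signal, and the injected spike are assumptions, and the LOESS and bandpass steps are not shown.

```python
import cmath
import math
from statistics import median


def median_filter(signal, window=3):
    """1D median filter; the window shrinks at the two edges of the signal."""
    half = window // 2
    filtered = []
    for i in range(len(signal)):
        lo = max(0, i - half)
        hi = min(len(signal), i + half + 1)
        filtered.append(median(signal[lo:hi]))
    return filtered


def power_spectrum(signal, sample_rate):
    """Naive DFT periodogram: (frequency in Hz, power) pairs up to Nyquist."""
    n = len(signal)
    mean = sum(signal) / n
    centred = [s - mean for s in signal]
    spectrum = []
    for k in range(n // 2 + 1):
        coeff = sum(c * cmath.exp(-2j * math.pi * k * i / n)
                    for i, c in enumerate(centred))
        spectrum.append((k * sample_rate / n, abs(coeff) ** 2 / n))
    return spectrum


def dominant_frequency(signal, sample_rate):
    """Frequency of the strongest non-DC component."""
    return max(power_spectrum(signal, sample_rate)[1:], key=lambda fp: fp[1])[0]


# Hypothetical vibration signal: a 5 Hz sine sampled at 64 Hz for one second,
# with one spurious spike standing in for a sensor garbage value.
sample_rate = 64
clean = [math.sin(2 * math.pi * 5 * i / sample_rate) for i in range(sample_rate)]
spiky = list(clean)
spiky[20] = 50.0

filtered = median_filter(spiky, window=3)
print(max(spiky), max(filtered))
print(dominant_frequency(filtered, sample_rate))
```

A window of three samples is enough to remove an isolated spike; wider windows remove longer bursts at the cost of flattening genuine peaks.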
Figure 8 shows a comparison between a mounted sharp tool and a mounted blunt tool based on vibration signals. In Figure 8, the green and red waves display the vibration of the sharp and blunt tools. It can be observed that the blunt tool produces a higher vibrational magnitude than the sharp tool. This affects product quality and increases the waste of resources. Vibration also plays a significant role in the manufacturing operation of gears. Gear manufacturing with a blunt tool not only takes more time but also affects other machine parts, such as the motor winding, due to the supply of high current, in addition to temperature increases. These factors gradually decrease the machine's life.
Another reason for faulty manufacturing and/or damage to a part of the tool is that workers cannot know whether the tool's life has expired until it starts damaging the manufactured gear. Hence, without any data collection or computation, it is quite difficult to predict the remaining useful life (RUL) of the gear tools, and this affects the production line of the gears, resulting in an overall decline in the machine's efficiency and an increase in cost, power consumption, and manufacturing time. During production, it is also observed that the tool wear gradually increases and the production time for each piece increases from a minimum of 24 min to a maximum of approximately 28 min. There is a noticeable increase in the current and temperature signals with time and tool wear. After collecting the data from the expired tool, a sample of the gear produced using a sharp tool is compared with a sample produced using a blunt tool, and both pieces are analysed. Figures 9 and 10 show the samples manufactured using a mounted sharp tool and a blunt tool, respectively. It is clearly visible that the blunt tool degrades the product quality compared with the sharp tool.

Classification and ANN Implementation
To perform data analysis on the normalised dataset (available at [49]), the classification process is initiated by importing the gathered data and the required utilities. All sensor values, including acceleration on the different axes, current, and temperature, are compared against the target value to check whether any variable has a strong positive or negative linear relationship with the target variable. The target value (desired output) is the tool condition. The target variable for predicting the RUL is given in Table 3. The range runs from 0% (sharp state) to 100% (blunt/worn tool). Between 0% and 100%, all states are divided into 5 classes, starting from 0% as class A to 100% as class F. The reason for dividing the target outputs into different classes is to examine the tool condition at different levels and/or time intervals. An artificial neural network (ANN) model is used for remaining useful life (RUL) prediction. In an ANN, every neuron has an activation function, which allows it to switch between the on and off states. All neural networks learn trends from the data during the training phase. A static bias, which is unique to each layer, is then added to the weighted sum of all input values using Equation (3): each input value X is multiplied by its associated weight W, and the bias b is added to the sum of these products to obtain the final value.
z = \sum_{i=1}^{n} X_i W_i + b    (3)
where X = inputs, and W = weights. The values obtained using Equation (3) are passed to the activation function (each neuron has its own activation function), and the activation function produces the neuron's final output. Numerous activation functions are available, and the choice depends on the nature of the data; the neurons are trained on these input values. Finally, the loss function is calculated in the last layer (known as the 'final neural net layer'). The loss function measures the difference between the actual and expected outputs. Once the loss is computed, backpropagation is applied. The purpose of backpropagation is to minimise the loss, which helps the neural network produce more accurate results. In backpropagation, the ANN updates all its weights from the last layer back to the first layer. To initiate the prediction process, the collected data and the necessary utilities are imported for training and testing the ANN model. Each column of the dataset is compared against the target value, which is the RUL of the tool. There are about 14 columns of input variables. If, at a certain stage, the tool is in the mid-range (Class C in the classification), the tool life prediction indicates how long the tool will last. The dataset is divided into 80% for training and 20% for testing: the model was trained on 122,778 samples and tested on 30,695 samples. Figure 11 shows a pictorial view of the input and output parameters of the ANN. The code implementation of the model is available at [49].
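The weighted sum with bias in Equation (3), followed by an activation function, and the 80/20 split can be illustrated with a short sketch. The input values, weights, and bias below are hypothetical, ReLU is used only as an example activation, and truncating the training count to an integer is an assumption about how the split was rounded; the total of 153,473 samples is simply the sum of the two reported counts.

```python
def relu(z):
    """Rectified linear unit: passes positive values and clips negatives to zero."""
    return max(0.0, z)


def neuron_output(inputs, weights, bias):
    """Weighted sum of the inputs plus a bias, passed through the activation."""
    z = sum(x * w for x, w in zip(inputs, weights)) + bias
    return relu(z)


def split_counts(n_samples, train_fraction=0.8):
    """Number of training and testing samples for a given split fraction."""
    n_train = int(n_samples * train_fraction)
    return n_train, n_samples - n_train


# Hypothetical inputs, weights, and biases for a single neuron.
active = neuron_output([1.0, 2.0, 3.0], [0.5, -1.0, 0.5], 0.25)
inactive = neuron_output([1.0, 2.0, 3.0], [0.5, -1.0, 0.5], -1.0)
print(active, inactive)

# 122,778 training samples plus 30,695 testing samples.
print(split_counts(122_778 + 30_695))
```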

Result and Discussion
Downtime is the idle time during the production phase when a machine is not manufacturing any part(s); this may be due to workers' mismanagement. The reason for analysing the downtime and working time is to examine when a machine is not operational and to improve the manufacturing process. A worker's shift is around 8 hours, and a worker should ideally produce 26 to 30 parts daily. However, it is observed that the time a worker takes between completing one workpiece and starting the next is excessive, which is a major reason for downtime. From the collected data, the overall time the workers take to manufacture one part after the next is calculated, and it is used to evaluate the workers' efficiency.
The objective of this experiment is to check how workers' performance can affect the gear hobbing process. From Figure 12, it can be confirmed that the workers' efficiency is never higher than 80%, although the slope shows a positive gradient. The workers' efficiency is therefore not up to par, which is also due to unmonitored downtime and/or human inefficiency. To counter human inefficiency, this work aims to calculate this downtime with real-time monitoring and predictive maintenance that can improve manufacturing. As discussed in Section 4.5, a multi-layer perceptron ANN is trained and tested on the dataset sampled in Appendix A to predict the RUL of the tool. The ANN is trained with 122,778 (80%) samples and tested with 30,695 (20%) samples. The activation function used is ReLU with alpha 5 × 10^-5, and the optimiser is Adam, a stochastic gradient-based optimiser. Adam is chosen because of the large training dataset, while ReLU is selected after trials with different activation functions. Table 4 shows the hidden layer size and the accuracy achieved by the respective neural network after 1000 epochs. The accuracy improves from approximately 74% to 95% as the number of hidden layers increases. The accuracy comparison between the ANNs with single and multiple hidden layers, the latter of which is also referred to as a 'deep neural network (DNN)', is shown in Figure 13. An ANN can be shallow or deep: a network with one hidden layer between the input and output is shallow, while a network with more than one hidden layer is a deep network (DN). If the data are complex or the relationship between the input variables shows non-linear characteristics, then deep networks outperform shallow neural networks. In our findings, the ANN with multiple hidden layers performed better than the simple single-layer network, as shown in Figure 13.
The F1 score and accuracy calculated for both the ANN and the DNN, with and without PCA, are presented in Table 5. With PCA, the obtained values are between 65% and 85% for both models. Without PCA, the F1 score and accuracy increase to between 75% and 95%. Both models, when trained initially, selected a default optimum value due to the high F1 score achieved. In both cases, the DNN outperformed the ANN with high accuracy. Different activation functions, i.e., ReLU, sigmoid, and tanh, are employed and compared, as shown in Figure 16. ReLU achieves the maximum accuracy, i.e., 96.56%, compared with 91.2% for tanh and sigmoid. Figures 17 and 18 display the predicted vs. actual values for the ANN with a single layer and the DNN, respectively. Over 100 samples, the ANN with a single layer makes more wrong predictions than the DNN: the former has an accuracy of around 74.56%, while the latter has 96.65% accuracy, as can be viewed in Figures 17 and 18. Based on all these results, it can be asserted that the performance of the DNN is quite encouraging for predicting the RUL of the tool.
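For reference, the two metrics reported above can be computed from predicted and actual class labels as in the sketch below. The text does not state how the F1 score is averaged over classes, so macro averaging is an assumption here, and the label lists are hypothetical.

```python
def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label equals the actual label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def f1_for_class(y_true, y_pred, label):
    """F1 = 2PR / (P + R) for one class, treating it as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == label and p == label for t, p in pairs)
    fp = sum(t != label and p == label for t, p in pairs)
    fn = sum(t == label and p != label for t, p in pairs)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def macro_f1(y_true, y_pred):
    """Unweighted mean of the per-class F1 scores (ASSUMED averaging)."""
    labels = sorted(set(y_true) | set(y_pred))
    return sum(f1_for_class(y_true, y_pred, l) for l in labels) / len(labels)


# Hypothetical labels for three RUL classes.
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0]

acc = accuracy(y_true, y_pred)
f1 = macro_f1(y_true, y_pred)
print(f"accuracy = {acc:.4f}, macro F1 = {f1:.4f}")
```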

Conclusions
This experimental work proposes a novel method for predictive maintenance and moves towards the idea of Industry 4.0. The chosen parameters for experimentation are vibration, current, and temperature. The input parameters are vibration (X, Y, and Z), current (I1, I2, and I3), and the tool's temperature (T1). The values for these parameters are collected using sensors, and the data are preprocessed. The data are then used to train a deep learning model, i.e., an ANN with single and multiple layers, that can be effectively used for predicting the remaining useful life (RUL) of the gear tool. During experimentation, the downtime of machines and the working time of workers are monitored to analyse the number of gear tool parts that can be produced in 8 working hours. It is found that the downtime is about four hours due to workers' inefficiency and that manufactured tool parts are not up to standard due to tool wear, which increases the cost of tool manufacturing and wastes resources.
To address this, an artificial neural network (ANN) is trained on a large dataset with a ratio of 80:20 for training and testing. The results confirmed that the ANN with a single layer can predict the RUL of the tool with an accuracy of 74.56%, and, with multiple hidden layers, i.e., DNN, the accuracy is improved up to 95.65%. During the trials, three activation functions, i.e., tanh, ReLU, and sigmoid, are compared. The best result is achieved by ReLU, with approximately 96% accuracy, compared with approximately 92% for the other two. Based on the results, it can be concluded that the ANN with multiple hidden layers (i.e., DNN) predicts the RUL of gear tools more accurately. This study can also provide a mechanism to improve tool part production with less waste of resources and minimisation of costs, especially for mid-level industries in developing countries.
For future research, this work has opened a direction to conduct experiments with parameters such as power and cutting force. A tool condition monitoring (TCM) system can be developed to improve product quality, tool life, and shop floor scheduling. The future focus is Industry 4.0, which includes forming an IoT-based server along with TCM. Other algorithms, e.g., the heuristic approach and the genetic algorithm, could be used for shop floor scheduling. For tool life prediction, the neuro-fuzzy network and the radial basis function network are also targeted for future work.

Data Availability Statement:
The complete raw data are available here [49].

Acknowledgments:
The authors would like to extend their gratitude to Umar Siddique, who provided a review for this study.

Conflicts of Interest:
The authors declare no conflict of interest.

Abbreviations
The following abbreviations are used in this manuscript:

A sample of the collected raw data is shown below: