Intelligent Vibration Monitoring System for Smart Industry Utilizing Optical Fiber Sensor Combined with Machine Learning

In this paper, we propose and experimentally demonstrate the combination of a fiber Bragg grating (FBG) sensing system with You Only Look Once version 7 (YOLO V7) to identify the vibration signals of a faulty machine. In the experiment, the YOLO V7 network architecture consists of a backbone, three detection heads (Head×3), a path aggregation network (PAN), and a feature pyramid network (FPN). In the proposed architecture, an FBG sensor and an FBG interrogator collect vibration data when degradation or a fault occurs. The FBG interrogator acquires the vibration data independently, and the YOLO V7 object detection algorithm then recognizes the vibration pattern of the signal. The proposed vibration detection therefore provides a reliable means of detecting faulty vibration signals and supports machine health monitoring. The approach enables high-accuracy detection of faulty signals in industrial equipment monitoring and yields a robust system, with an overall model accuracy of 99.7%. The results show that the model identifies and detects faulty vibration signals accurately and effectively.


Introduction
The traditional method of monitoring machine health has relied on an operator to handle the system. However, this manual approach lacks real-time capability, is time-consuming, and requires human resources. Many studies have been conducted to address these limitations. Fiber sensors and artificial intelligence are two rapidly evolving technologies, and combining fiber sensors with machine learning offers solutions to many sensing problems, such as monitoring the vibration signals generated by industrial motors. Sensors designed to measure various strains address numerous issues, and vibration is one of the challenges being addressed worldwide [1][2][3][4][5]. In many studies, object detection is combined with fiber sensors; this is an active topic in vibration monitoring for everyday applications such as industrial motors, automatic machines, turbines, tunnels, elevators, and railway tracks, where it is used to detect early faults and to classify intrusion signals in structures for early warning. The smart industry employs various types of motors in different output machinery [6,7]. Motor monitoring matters in many industries, especially the vehicle sector, because it enables early fault detection, performance enhancement, and health assessment. Safety is a paramount concern, which motivates the use of artificial intelligence algorithms for monitoring. Studies of electronic sensing have raised concerns such as sensitivity to electromagnetic interference and space constraints; for example, electromagnetic interference can cause electronic sensors to produce inaccurate readings [8]. To overcome these limitations, an FBG vibration sensor is used. FBGs offer high sensitivity, long-distance transmission, immunity to electromagnetic interference, high multiplexing capability, and light weight, and they can operate in tight spaces. This approach therefore addresses challenges related to long distances, space limitations, and sensitivity while delivering accurate results. Previous research highlights the unique properties of FBGs, and interrogation techniques based on both wavelength shifting and intensity have been used to create adaptable sensors [9,10]. Combined with artificial intelligence such as object detection or machine learning, this detection technique supports real-time fault detection on site, locating the faulty point or the faulty machine [9][10][11][12][13][14].
Previous detection methods include convolutional neural networks (CNNs), Faster R-CNN, and YOLO versions ranging from YOLO V1 to YOLO V5, along with YOLO V7. Each YOLO version is tailored to achieve specific improvements in the detection process. Non-linear problems involve high processing complexity and long processing times. In machine learning and deep learning, CNNs automatically extract features and learn the patterns needed to classify the generated signals. CNN methods fall into two categories: a one-dimensional convolutional neural network (1-D CNN), which receives the transmitted signal directly, and a two-dimensional convolutional neural network (2-D CNN), which receives an image converted from the signal. However, the standard 2-D CNN model can only recognize a single target object in a picture, so it cannot handle scenarios in which multiple signals are generated [15][16][17][18]. Another popular object detection algorithm is the faster region-based convolutional neural network (Faster R-CNN), which solves the multiple-object detection problem that a plain CNN classifier cannot [19]. It introduced the region proposal network (RPN), a fully convolutional network that operates on the feature maps generated by backbone networks such as the Visual Geometry Group network (VGG-16) and the residual neural network (ResNet). VGG-16 is a convolutional neural network with 16 weight layers, and ResNet learns residual functions with reference to the layer inputs. Although Faster R-CNN is popular, its two-stage design with a region proposal network is computationally demanding: training is time-consuming, and the network requires large computational resources during inference. Moreover, Faster R-CNN processes features at a single scale and relies on predefined, fixed anchor boxes, which limits its ability to handle objects of varying sizes and aspect ratios [20,21]. Results reported in object and signal detection studies vary considerably across detector versions [22], and new YOLO versions with architectural advances appear regularly [23]. In the YOLO series, YOLO V1, introduced in 2016 as a single-stage real-time object detector, divides the input image into a grid and predicts bounding boxes and class probabilities from the grid cells. Its network has 24 convolutional layers followed by two fully connected layers, and its main weakness is a relatively low mean average precision (mAP). YOLO V2, also known as YOLO9000, was introduced in 2017 to address these shortcomings; it uses the Darknet-19 backbone (19 convolutional layers), introduced anchor boxes, and improved detection accuracy [22][23][24], but it still struggles with small objects compared with modern detection methods. YOLO V3, introduced in 2018, further improved detection accuracy, particularly for small objects. It adopted a multi-scale detection approach with three detection scales and a deeper Darknet-53 backbone (53 convolutional layers) for powerful feature extraction; although this benefits the network, it makes the model more computationally expensive and complex [25,26]. YOLO V4, launched in 2020, replaced Darknet-53 with the cross-stage partial Darknet-53 (CSPDarknet-53) backbone, whose cross-stage partial connections improve the flow of information across stages. YOLO V4 also incorporates a path aggregation network (PANet) to handle multi-scale features and improve accuracy on images of different quality, and spatial pyramid pooling (SPP) to capture multi-scale contextual information effectively. A spatial attention module (SAM) can also be added to strengthen the network's focus on relevant image features; however, because these additions demand more computation than earlier versions, they increase resource requirements such as processing power and memory. YOLO V5, released in 2020, introduced new features such as an implementation in the PyTorch framework, which is easy to use and deploy, and kept the CSPDarknet-53 backbone of the previous version, although its accuracy still left room for improvement. Recent research continues to propose improved methods to increase object detection accuracy [27,28].
In this paper, the FBG sensor is employed to sense vibration from vibration sources such as the machinery in various factories, as shown in the schematic diagram in Figure 1. In the experimental setup, however, only motors are used, as a small-scale demonstration. Four vibration conditions are chosen for detection. To detect and classify the signals, YOLO V7 is employed. Although it is not the latest YOLO version, it offers improved accuracy and speed compared with earlier networks, and its evaluation metrics, such as mAP and latency, compare favorably with those of other detectors. YOLO V7 has achieved impressive detection precision in previous research. Its overall structure is similar to that of YOLO V5 (backbone, PAN, FPN, and three head scales), although the individual modules differ substantially. The Efficient Layer Aggregation Network (ELAN) is used as the feature extraction unit, and max pooling and convolution with stride 2 are used for down-sampling [29,30]. The contributions of this paper toward identifying optical fiber vibration signals are as follows.
A. Integrating FBG sensors and artificial intelligence, using an advanced YOLO V7 architecture, to detect diverse motor conditions.
B. Implementing Extended ELAN (E-ELAN) as the feature extraction unit in YOLO V7, with max pooling and stride-2 convolution for down-sampling. The network also contains a backbone, a head with three input branches, a path aggregation network, and a feature pyramid network, which improves the efficiency of signal recognition.
C. Applying the YOLO V7 detection technique to detect and classify vibration signals, which improves the capability to identify signals with high accuracy.
The rest of this paper is organized as follows: Section 2 describes the materials and methods, including data collection, label annotation, and the proposed YOLO V7 architecture with its theoretical foundation. Section 3 presents the evaluation of the YOLO V7 model through experiments on the vibration signal detection datasets. Finally, Section 4 concludes this work.

Optical Fiber Vibration Sensing System for Smart Industry
The FBG vibration sensing technique is widely used. In this experiment, FBGs are used to sense machine vibration because of their high sensitivity and their ability to transmit signals through single-mode fiber. The FBG interrogator is a wavelength detector that measures the shift in wavelength of the light reflected from the sensors, which allows it to monitor signals at different points of the setup. In the experimental setup, four motor conditions are used to generate vibration. Figure 2 shows the setup of the optical fiber vibration sensing system. The sensing process consists of several steps. First, FBG sensors are attached at different locations on the machines (motors), as shown in Figure 1. Each sensor responds to vibration at its specific location, such as the vibration-sensing points of the smart industry. Multiple couplers are deployed (coupler 1, coupler 2, and coupler 3), connecting FBG sensors in the printed circuit board factory, the electric vehicle factory, and the material factory, respectively. The whole setup transmits signals to the central unit, where optical fiber carries the reflected light to the wavelength detector. This wavelength detector is the FBG interrogator, which detects the wavelength shifts and converts them into vibration data (datasets); this process is also known as data acquisition. A single interrogator channel can detect multiple vibration sources through its channel port, which is an advantage of combining FBGs with an FBG interrogator. Figure 2 shows the real experimental setup, in which the vibration sources are two independent motors operated under four conditions (normal and faulty), as described in the data labeling part of the signal detection process. The FBG interrogator, comprising a swept laser (SL) and a photodetector (PD), collects the vibration sensed by the FBGs. A single-mode fiber cable connects the FBG sensor to the FBG interrogator, and a network cable transmits the data to a PC. The signal detection algorithm collects and processes these data and uses them to train the model. The YOLO V7 algorithm is embedded here to detect the faulty signal, and the trained model can also be used for real-time detection on test data. The parameters of the signal detection system used in this experiment are described below.
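To make the interrogator's role concrete, the sketch below locates a Bragg peak in a sampled reflection spectrum and converts its shift into strain. It rests on assumptions that the setup description does not state: the spectrum is available as (wavelength, power) samples, the peak is found with a thresholded power-weighted centroid, and strain follows the textbook relation Δλ_B/λ_B ≈ (1 - p_e)ε with p_e ≈ 0.22 for silica fiber, ignoring temperature. The function names are hypothetical, and this is not the authors' interrogator calibration.

```python
# Assumed constant: effective photo-elastic coefficient of silica fiber (typical value)
P_E = 0.22


def bragg_peak_centroid(wavelengths_nm, powers, threshold_ratio=0.5):
    """Power-weighted centroid of the samples whose power exceeds a fraction of the maximum."""
    cutoff = threshold_ratio * max(powers)
    selected = [(wl, p) for wl, p in zip(wavelengths_nm, powers) if p >= cutoff]
    total = sum(p for _, p in selected)
    return sum(wl * p for wl, p in selected) / total


def strain_from_shift(lambda_ref_nm, lambda_now_nm, p_e=P_E):
    """Convert a Bragg-wavelength shift into (dimensionless) strain, ignoring temperature."""
    return (lambda_now_nm - lambda_ref_nm) / (lambda_ref_nm * (1.0 - p_e))


# Hypothetical symmetric reflection peak centered at 1550.12 nm
wl = [1550.00 + 0.02 * i for i in range(13)]        # 1550.00 ... 1550.24 nm
pw = [0, 1, 2, 4, 7, 9, 10, 9, 7, 4, 2, 1, 0]       # arbitrary power units
center = bragg_peak_centroid(wl, pw)
micro_strain = strain_from_shift(1550.00, center) * 1e6
print(f"peak at {center:.3f} nm, strain = {micro_strain:.1f} microstrain")
```

Under these assumptions a 1550 nm grating shifts by roughly 1.2 pm per microstrain, which is why picometer-level wavelength resolution matters for vibration sensing.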

Signal Detection System
This subsection focuses on the signal detection system. The first two parts describe data collection and dataset annotation, followed by an explanation of the vibration detection algorithm (YOLO V7). In the data division step, the dataset is split into a training set and a validation set in a fixed ratio (70% and 30%, respectively). The subsection also describes the experimental environment and, finally, presents a table of the hyperparameters.
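One way to perform the 70%/30% division while keeping the four classes balanced is a per-class (stratified) split, sketched below. The paper does not state whether its split was stratified or how it was randomized; the folder layout, file names, and seed are hypothetical.

```python
import random


def stratified_split(items_by_class, train_ratio=0.7, seed=0):
    """Split each class separately so training and validation keep the same class balance."""
    rng = random.Random(seed)                  # fixed seed makes the split reproducible
    train, val = [], []
    for label, items in sorted(items_by_class.items()):
        shuffled = list(items)
        rng.shuffle(shuffled)
        n_train = round(train_ratio * len(shuffled))
        train += [(path, label) for path in shuffled[:n_train]]
        val += [(path, label) for path in shuffled[n_train:]]
    return train, val


# Hypothetical layout: 600 images per class, as in the collected dataset
classes = ["motor1_normal", "motor1_abnormal", "motor2_normal", "motor2_abnormal"]
dataset = {c: [f"{c}/frame_{i:04d}.jpg" for i in range(600)] for c in classes}
train, val = stratified_split(dataset)
print(len(train), len(val))  # 1680 720
```

Splitting per class guarantees 420 training and 180 validation images for every condition, so no class is under-represented in validation by chance.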

Data Collection
In this work, the datasets are collected through an experiment with four motor vibration conditions: motor 1 normal, motor 2 normal, motor 1 abnormal, and motor 2 abnormal. Vibration signal data for these four conditions are collected, as shown in the proposed architecture. The data are recorded as numerical values of amplitude (or power) versus time and saved as .xlsx files organized in rows and columns. The numerical data are then converted into videos using Python libraries such as pandas, matplotlib, matplotlib.animation, and datetime, and the generated videos are converted into images with Python and OpenCV, which yields the image dataset. In each image, the y-axis shows the sensor wavelength and the x-axis shows time. Each class consists of 600 images, for a total of 2400 images generated from the source videos, as shown in Table 1. The collected images are then annotated (labeled), as described below.
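Turning a numerical trace into a sequence of images amounts to viewing the signal through a moving time window, with each video frame showing one window of wavelength versus time. The sketch below assumes that framing; the window length, step, sampling rate, and synthetic trace are hypothetical, and `render_window` shows one way to draw a window with matplotlib rather than the authors' exact animation code.

```python
import math


def sliding_windows(samples, window, step):
    """Yield (start_index, window_samples) for every full window of the series."""
    for start in range(0, len(samples) - window + 1, step):
        yield start, samples[start:start + window]


def render_window(window_samples, out_path, sample_rate_hz=1000.0):
    """Draw one window as a wavelength-vs-time image (one possible rendering)."""
    import matplotlib
    matplotlib.use("Agg")                                 # off-screen rendering, no display needed
    import matplotlib.pyplot as plt

    t = [i / sample_rate_hz for i in range(len(window_samples))]
    fig, ax = plt.subplots(figsize=(4.16, 4.16), dpi=100)  # 416 x 416 pixels
    ax.plot(t, window_samples)
    ax.set_xlabel("Time (s)")
    ax.set_ylabel("Wavelength (nm)")
    fig.savefig(out_path)
    plt.close(fig)


# Hypothetical 1 s trace sampled at 1 kHz: a 50 Hz vibration around 1550 nm
trace = [1550.0 + 0.01 * math.sin(2 * math.pi * 50 * n / 1000) for n in range(1000)]
frames = list(sliding_windows(trace, window=200, step=50))
print(len(frames))  # 17
```

In a real run, the trace would come from the exported spreadsheet (for example via `pandas.read_excel`, which needs an Excel engine such as openpyxl), and each window would be passed to `render_window`.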

YOLO V7
Following the data collection and annotation methods described in the previous section, this section describes the YOLO V7 algorithm, which was introduced in 2022. It incorporates Extended Efficient Layer Aggregation Networks (E-ELAN), model scaling for concatenation-based models, and re-parameterization to achieve an appropriate balance between detection efficiency and precision. YOLO V7 is divided into four major modules: input, backbone, head, and prediction.
Input: In the first step, YOLO V7 receives the input images together with their corresponding object labels; each input is a 416 × 416 color image that is passed to the backbone network.
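If a rendered frame is not already square, it has to be mapped onto the 416 × 416 input. A common YOLO-style choice is letterboxing: scale the image to fit while keeping its aspect ratio, then pad the remainder. The paper only states the input size, so the sketch below is an assumption about preprocessing; the frame size is hypothetical, and the same mapping must be applied to label coordinates.

```python
def letterbox_params(width, height, target=416):
    """Scale factor, resized size, and left/top padding for an aspect-preserving resize."""
    scale = min(target / width, target / height)
    new_w, new_h = round(width * scale), round(height * scale)
    pad_left = (target - new_w) // 2
    pad_top = (target - new_h) // 2
    return scale, new_w, new_h, pad_left, pad_top


def to_letterbox(x, y, params):
    """Map a point from original-image pixels into the 416 x 416 network input."""
    scale, _, _, pad_left, pad_top = params
    return x * scale + pad_left, y * scale + pad_top


params = letterbox_params(640, 480)      # a hypothetical non-square frame
print(params)                            # (0.65, 416, 312, 0, 52)
print(to_letterbox(320, 240, params))    # (208.0, 208.0): the center stays centered
```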
Backbone: The backbone consists of three main modules: E-ELAN, CBS, and MP1. The CBS module has three primary components: convolution, batch normalization, and the SiLU activation function. The E-ELAN module keeps the basic ELAN design but improves the network's learning capacity by guiding different groups of computational blocks to learn more diverse features while preserving the original gradient path. The MP1 module combines CBS and MaxPool operations in an upper and a lower branch. In the upper branch, MaxPool halves the length and width of the feature map, and a CBS operation with 128 output channels halves the channel count. In the lower branch, a CBS operation with a 1 × 1 kernel and stride 1 halves the number of channels, and a CBS operation with a 3 × 3 kernel and stride 2 halves the length and width. Concatenation (Cat) then merges the features extracted by both branches. MaxPool captures the most salient information from small local regions, whereas the strided CBS retains a broader range of information from those regions; this combination greatly improves the network's capacity to extract meaningful features from the input data.
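The down-sampling arithmetic of the MP1 module can be checked by tracing tensor shapes (channels, height, width) through both branches with the standard convolution/pooling output-size formula. The sketch tracks shapes only, not weights; the 256-channel, 104 × 104 input is a hypothetical example chosen so that the upper-branch CBS has the 128 output channels mentioned above.

```python
def conv_out(size, kernel, stride, padding):
    """Standard output-size formula for convolution and pooling layers."""
    return (size + 2 * padding - kernel) // stride + 1


def mp_block_shapes(c, h, w):
    """Trace (channels, height, width) through the two branches of an MP1-style block."""
    # Upper branch: 2x2 max pooling with stride 2, then a 1x1 CBS that halves the channels
    upper = (c // 2, conv_out(h, 2, 2, 0), conv_out(w, 2, 2, 0))
    # Lower branch: 1x1 CBS (stride 1) halves channels, then 3x3 CBS with stride 2, padding 1
    lower = (c // 2, conv_out(h, 3, 2, 1), conv_out(w, 3, 2, 1))
    # For even input sizes both branches agree spatially, so they can be concatenated
    assert upper[1:] == lower[1:], "branches must match spatially before concatenation"
    merged = (upper[0] + lower[0], upper[1], upper[2])   # concatenation along channels
    return upper, lower, merged


print(mp_block_shapes(256, 104, 104))
# ((128, 52, 52), (128, 52, 52), (256, 52, 52))
```

The block therefore halves the spatial resolution while the concatenation restores the original channel count.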
Head: This part of the YOLO V7 network is built on the feature pyramid network (FPN) structure combined with a path aggregation network (PAN) design. It combines batch normalization, CBS blocks (with SiLU activation), and several convolutional layers with a cross-stage partial (CSP) spatial pyramid pooling (SPP) module, E-ELAN blocks, and MaxPool 2 (MP2) blocks, which improve accuracy and support feature extraction and model optimization. The ELAN-H layer, which combines several feature layers, is based on E-ELAN. The two MP blocks are identical except for their number of output channels.
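The three head scales can be summarized numerically. Assuming output strides of 8, 16, and 32 with three anchors per grid cell (the defaults of the public YOLO V7 implementation, not stated in this paper), a 416 × 416 input yields the grid sizes and candidate-box count below; the per-head channel count uses the four motor classes. The function name is hypothetical.

```python
STRIDES = (8, 16, 32)      # assumed output strides of the three detection heads
ANCHORS_PER_CELL = 3       # assumed number of anchors per grid cell
NUM_CLASSES = 4            # motor 1/2, normal/abnormal


def head_layout(img_size=416, strides=STRIDES, anchors=ANCHORS_PER_CELL, num_classes=NUM_CLASSES):
    """Grid size per head, output channels per head, and total candidate boxes."""
    grids = [img_size // s for s in strides]
    # Each anchor predicts 4 box terms + 1 objectness score + one score per class
    channels = anchors * (5 + num_classes)
    total_boxes = sum(anchors * g * g for g in grids)
    return grids, channels, total_boxes


print(head_layout())  # ([52, 26, 13], 27, 10647)
```

The finest grid (52 × 52) serves small patterns and the coarsest (13 × 13) large ones, which is the purpose of combining FPN and PAN features.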
Prediction: The prediction network uses three re-parameterized convolution (Rep) structures to adjust the number of channels of the head network's output features. A 1 × 1 convolution then predicts the confidence, category, and anchor box. The Rep structure is inspired by the VGG-style RepVGG design: its multi-branch training-time structure is merged into a single convolution for inference, which reduces model complexity without diminishing predictive ability.
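Why re-parameterization costs no accuracy can be shown on a single channel: a 3 × 3 branch plus a 1 × 1 branch (and, when input and output channels match, an identity branch) produce exactly the same output as one 3 × 3 convolution whose center tap absorbs the extra branches. The sketch ignores batch normalization, which real RepConv layers fold into each branch first; the kernel values and helper names are hypothetical.

```python
def conv2d_same(img, kernel):
    """Single-channel 2-D cross-correlation with zero padding (odd kernel, stride 1)."""
    k, r = len(kernel), len(kernel) // 2
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            acc = 0.0
            for di in range(k):
                for dj in range(k):
                    y, x = i + di - r, j + dj - r
                    if 0 <= y < h and 0 <= x < w:
                        acc += kernel[di][dj] * img[y][x]
            out[i][j] = acc
    return out


def add(a, b):
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def fuse_rep_branches(k3, k1, with_identity=False):
    """Fold a 1x1 branch (and optionally an identity branch) into one 3x3 kernel."""
    fused = [row[:] for row in k3]
    # A 1x1 kernel is a 3x3 kernel that is zero except at the center tap;
    # an identity branch is the same with a center weight of 1.
    fused[1][1] += k1 + (1.0 if with_identity else 0.0)
    return fused


def training_time_output(img, k3, k1, with_identity=False):
    """Multi-branch form used during training: 3x3 + 1x1 (+ identity)."""
    out = add(conv2d_same(img, k3), conv2d_same(img, [[k1]]))
    return add(out, img) if with_identity else out


img = [[1.0, 2.0, 0.0, 1.0], [0.0, 1.0, 3.0, 2.0], [2.0, 0.0, 1.0, 1.0]]
k3 = [[0.1, 0.2, 0.0], [0.0, 0.5, -0.1], [0.3, 0.0, 0.2]]
k1 = 0.7

multi = training_time_output(img, k3, k1)
single = conv2d_same(img, fuse_rep_branches(k3, k1))
max_diff = max(abs(a - b) for ra, rb in zip(multi, single) for a, b in zip(ra, rb))
print(max_diff < 1e-12)  # True
```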
To train the YOLO V7 model, the following minimum system specifications were used: an Intel(R) Core(TM) i7-9700K CPU @ 3.60 GHz, 16 GB of RAM, an NVIDIA GeForce RTX 2080 Ti GPU, and CUDA 11.8.

Model Hyperparameter
The hyperparameters listed in Table 2 play a pivotal role in the effectiveness of YOLO V7, because they determine how well training converges toward optimal results. After the system was configured, the algorithm was trained with these predefined hyperparameters.
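The hyperparameter values reported with the training results can be collected into one configuration object, which also makes the training budget explicit. The field names below are illustrative, not YOLO V7's actual configuration keys; the comment on momentum reflects how YOLO-style training scripts typically pass it to Adam, which is an assumption here. The 1680 training images follow from the 70% share of 2400.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainConfig:
    # Values as reported in the Results section; field names are illustrative only
    optimizer: str = "Adam"
    learning_rate: float = 1e-5
    momentum: float = 0.98      # with Adam, this would typically act as beta1
    weight_decay: float = 0.001
    batch_size: int = 16
    image_size: int = 416
    epochs: int = 200


def iterations(cfg, n_train_images):
    """Optimizer steps per epoch (a final partial batch counts) and over the whole run."""
    per_epoch = -(-n_train_images // cfg.batch_size)   # ceiling division
    return per_epoch, per_epoch * cfg.epochs


cfg = TrainConfig()
print(iterations(cfg, 1680))  # (105, 21000)
```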

Results
In this section, we run the proposed experimental setup with the defined parameters and report the results. We first obtain predictions on the validation dataset using the YOLO V7 algorithm and then evaluate the metrics described in the following subsections.

Prediction of YOLO V7 Model
Figure 3a-d shows the prediction results of YOLO V7; the bounding boxes highlight the target signals generated by the industrial machinery motors. Each predicted bounding box depends on both the anchor box size and the feature information output by the network. The anchor box size, also referred to as the a priori box, is set in advance from prior experience, and the network's predicted offsets adjust the position and size of this prior box to yield the predicted bounding box. Figure 3 illustrates the prediction boxes produced for the test signals, with four bounding box colors marking the four predicted classes. In Figure 3a, the light green bounding box is the predicted result when motor 1 is in normal condition. In Figure 3b, the blue box is the predicted result when motor 1 is in abnormal condition. In Figure 3c, the green box is the predicted result when motor 2 is in normal condition. Finally, in Figure 3d, the orange box is the predicted result when motor 2 is in abnormal condition.
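How the network's outputs adjust a prior box can be sketched with the decoding used in the public YOLO V5/V7 code: sigmoid-squashed offsets move the box center within its grid cell, and the width and height scale the anchor by at most a factor of four. The paper does not spell out these formulas, so treat them as an assumption; the grid position, stride, and anchor size in the example are hypothetical.

```python
import math


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def decode_box(tx, ty, tw, th, grid_x, grid_y, anchor_w, anchor_h, stride):
    """Turn raw outputs for one anchor into a box (cx, cy, w, h) in input-image pixels."""
    # Center offset lies in (-0.5, 1.5) cell units around the responsible grid cell
    cx = (sigmoid(tx) * 2.0 - 0.5 + grid_x) * stride
    cy = (sigmoid(ty) * 2.0 - 0.5 + grid_y) * stride
    # Width/height scale the prior (anchor) box by a factor in (0, 4)
    w = (sigmoid(tw) * 2.0) ** 2 * anchor_w
    h = (sigmoid(th) * 2.0) ** 2 * anchor_h
    return cx, cy, w, h


print(decode_box(0, 0, 0, 0, grid_x=10, grid_y=5, anchor_w=36, anchor_h=75, stride=16))
# (168.0, 88.0, 36.0, 75.0)
```

With all raw outputs at zero, the prediction reproduces the prior box centered in its cell, which is why a well-chosen anchor size makes learning easier.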

Model Evaluation Metrics
To evaluate the performance of the proposed model, we used recall, precision, intersection over union (IoU) with a threshold of 0.5, average precision (AP), and mean average precision (mAP) [3,31].
In Equation (1), the intersection over union (IoU) metric measures the degree of overlap between the bounding box predicted by the system, B_p, and the ground-truth bounding box, B_gt, in the original image. It is the ratio of the area of their intersection to the area of their union, where the ground truth is the actual annotation and the detection result is the output produced by the algorithm:

$$\mathrm{IoU} = \frac{\lvert B_p \cap B_{gt} \rvert}{\lvert B_p \cup B_{gt} \rvert} \tag{1}$$
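Equation (1) for axis-aligned boxes can be computed as in the sketch below, which assumes boxes given as (x_min, y_min, x_max, y_max) in pixels; the two example boxes are hypothetical. A detection would count as a match at the 0.5 threshold used here.

```python
def iou(box_a, box_b):
    """IoU of two axis-aligned boxes given as (x_min, y_min, x_max, y_max)."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)   # zero when boxes do not overlap
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


pred, truth = (50, 40, 250, 200), (60, 50, 260, 210)
score = iou(pred, truth)
print(round(score, 3), score >= 0.5)  # 0.803 True
```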

Precision and recall are calculated as follows [31]:

$$\mathrm{Precision} = \frac{TP}{TP + FP} \tag{2}$$

$$\mathrm{Recall} = \frac{TP}{TP + FN} \tag{3}$$

In Equations (2) and (3), a true positive (TP) means that the predicted label is positive and the actual label is also positive; a false positive (FP) means that the predicted label is positive but the actual label is negative; and a false negative (FN) means that the predicted label is negative but the actual label is positive. Recall therefore measures how many of the actual positive instances are found, while precision measures how many of the predictions are correct; together they assess how accurately the vibration signals are classified [3,31].
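Equations (2) and (3) translate directly into code. The counts in the example are hypothetical and are not results from this paper; the guards return 0 when a denominator is empty.

```python
def precision_recall(tp, fp, fn):
    """Precision (Eq. 2) and recall (Eq. 3) from detection counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Hypothetical counts for one class on a validation split
print(precision_recall(tp=176, fp=2, fn=4))  # roughly (0.989, 0.978)
```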
In Equation (4), the mean average precision (mAP), the most important metric commonly used for object detection, is obtained from the per-class average precision. The average precision (AP) of each of the k classes, i.e., the area under its precision-recall curve, is computed first, and the mAP is the mean of these values:

$$\mathrm{mAP} = \frac{1}{k}\sum_{i=1}^{k} \mathrm{AP}_i \tag{4}$$

In the next section, these values are measured on the vibration dataset and presented as graphs such as the confusion matrix and the precision-recall curve [3,26].
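A per-class AP and the mAP of Equation (4) can be sketched as follows. The AP here uses all-point interpolation (the area under the monotone precision envelope), which is one common definition; the paper does not state which interpolation its tooling used. Each detection is assumed to be already marked as a true or false positive at IoU ≥ 0.5, and the example detections are hypothetical.

```python
def average_precision(scores, is_tp, num_gt):
    """All-point interpolated AP for one class.

    scores: confidence of each detection; is_tp: whether that detection matched a
    ground-truth box at IoU >= 0.5; num_gt: number of ground-truth boxes.
    """
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    tp = fp = 0
    recalls, precisions = [0.0], [1.0]
    for i in order:                      # sweep the confidence threshold downward
        if is_tp[i]:
            tp += 1
        else:
            fp += 1
        recalls.append(tp / num_gt)
        precisions.append(tp / (tp + fp))
    # Precision envelope: make precision non-increasing as recall grows
    for k in range(len(precisions) - 2, -1, -1):
        precisions[k] = max(precisions[k], precisions[k + 1])
    # Sum rectangle areas wherever recall increases
    return sum((recalls[k] - recalls[k - 1]) * precisions[k] for k in range(1, len(recalls)))


def mean_average_precision(ap_per_class):
    """Equation (4): mean of the per-class AP values."""
    return sum(ap_per_class) / len(ap_per_class)


ap = average_precision([0.9, 0.8, 0.7, 0.6], [True, False, True, True], num_gt=4)
print(ap)  # 0.625
```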

Proposed Model Analysis on Experimental Vibration Datasets
We evaluated the performance of the signal detection model (YOLO V7) on the vibration dataset, as shown in Figures 4 and 5. YOLO V7 classifies the four classes accurately; the per-class performance is shown in the precision-recall curves and the confusion matrix, and the training behavior is shown in the loss graph. The confusion matrix reports the results for each class (motor 1 normal, motor 1 abnormal, motor 2 normal, and motor 2 abnormal), including the false negatives and false positives for each class. In addition, the precision-recall curve assesses how well the detection model identifies positive detections while producing few false positives. Precision is the ratio of true positives to predicted positives, and recall is the ratio of true positives to actual positives. The area under the curve (AUC) reflects the overall performance, so high precision indicates that positive vibration signals are detected accurately with minimal false positive degradation signals. In Figure 6, the loss graph displays the training and validation loss curves, with the blue curve representing the training loss and the red curve representing the validation loss. The loss function is the sum of the bounding box loss, the confidence loss, and the class loss. The bounding box loss, also known as localization loss, measures how accurately the box coordinates are predicted; the confidence loss, also known as objectness loss, measures the prediction of whether an object is present in a grid cell or anchor box; and the class loss measures the prediction of the class of the object inside the bounding box. Together, these three losses aim to improve classification, localization, and signal detection accuracy. In the experiment, the loss decreases consistently during both training and validation of the signal detection model. The training and validation losses converge at approximately 75 epochs. The model was fully stable by 160 epochs, with minimal loss. As Figure 6 shows, the loss continued to decrease after epoch 75 until the optimal epoch, and it reached its optimum at around 195 epochs. Training used a learning rate of 1 × 10⁻⁵, momentum of 0.98, weight decay of 0.001, a batch size of 16, the Adam optimizer, an image size of 416 × 416, and 200 epochs in total. The lowest validation loss, approximately 0.0005, was recorded at epoch 191, and the corresponding performance was impressive, as shown in the figures.
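The loss composition and the choice of the best epoch can be sketched as below. Following the description above, the total loss is a plain sum of the three terms (unit weights by default); actual YOLO implementations apply per-term gains, so the weight parameters are hypothetical knobs. The per-epoch loss values are hypothetical, not the curves in Figure 6.

```python
def total_loss(box_loss, obj_loss, cls_loss, w_box=1.0, w_obj=1.0, w_cls=1.0):
    """Weighted sum of the bounding box, objectness (confidence), and class losses."""
    return w_box * box_loss + w_obj * obj_loss + w_cls * cls_loss


def best_epoch(val_losses):
    """Return (epoch_number, loss) with the lowest validation loss; epochs are 1-based."""
    idx = min(range(len(val_losses)), key=val_losses.__getitem__)
    return idx + 1, val_losses[idx]


# Hypothetical per-epoch validation terms: (box, objectness, class)
terms = [(0.06, 0.02, 0.01), (0.03, 0.01, 0.004), (0.02, 0.008, 0.003), (0.021, 0.009, 0.003)]
history = [total_loss(b, o, c) for b, o, c in terms]
print(best_epoch(history))  # epoch 3 has the lowest total loss (about 0.031)
```

Keeping the checkpoint with the lowest validation loss, rather than the last one, guards against selecting a model after validation loss has started to rise.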

Conclusions
In this paper, a novel approach is proposed to monitor the health of industrial motors in the smart industry, enhancing the durability and capability of the sensing system while reducing cost. The proposed model efficiently detects vibration signals acquired with an FBG interrogator by combining them with the YOLO V7 identification technology, and it performs excellently with minimal equipment. The accuracies for the four conditions are 99.7% for motor 1 normal, 99.2% for motor 1 abnormal, 99.6% for motor 2 normal, and 99.7% for motor 2 abnormal, and the mean average precision over all classes is 99.5% at an IoU threshold of 0.5. The proposed model thus classifies the generated signals accurately and performs excellently. Therefore, combining optical fiber sensors with YOLO V7 provides an accurate, robust, sensitive, and cost-effective setup for addressing vibration problems.

Figure 1. Conceptual schematic diagram of the FBG sensing system integrated with YOLO V7 in the smart industry, with a distribution unit and a control unit serving as the central unit for the PCB factory, electric vehicle factory, and material factory. (PCB: printed circuit board; FBG: fiber Bragg grating.)

Figure 3. Predicted results of the proposed model for the motors under different conditions: (a) motor 1 normal condition, (b) motor 1 abnormal condition, (c) motor 2 normal condition, and (d) motor 2 abnormal condition.

Figure 4. Confusion matrix of the labels for the different classes (motor conditions).

Figure 5. Precision-recall curves for each class (motor condition).

Figure 6. Training and validation losses of the proposed model at various epoch numbers.

Table 1. Dataset.

The collected vibration image data are labeled with labelImg to train the model. LabelImg is an open-source graphical image annotation tool that generates labels for the corresponding images when training an object detection model. Four classes were created in labelImg: Motor 1 normal, Motor 1 Abnormal, Motor 2 normal, and Motor 2 Abnormal. After labeling, each class contains 600 labels for its corresponding images. In total, 2400 images are used to train YOLO V7, of which 70% are used for training and 30% for validation, as shown in Table 1.
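LabelImg can save annotations in YOLO text format, where each line holds a class index followed by the box center, width, and height normalized by the image size. The sketch below converts a pixel box into such a line; the class order (which determines the indices, normally taken from labelImg's class list) and the example box are hypothetical.

```python
# Hypothetical class order; in practice the indices follow labelImg's saved class list
CLASSES = ["Motor 1 normal", "Motor 1 Abnormal", "Motor 2 normal", "Motor 2 Abnormal"]


def to_yolo_line(class_name, box, img_w, img_h):
    """Convert a pixel box (x_min, y_min, x_max, y_max) into a YOLO label line.

    The line stores: class_id x_center y_center width height, with the four
    geometry values normalized by the image width/height to the range [0, 1].
    """
    x_min, y_min, x_max, y_max = box
    xc = (x_min + x_max) / 2 / img_w
    yc = (y_min + y_max) / 2 / img_h
    w = (x_max - x_min) / img_w
    h = (y_max - y_min) / img_h
    return f"{CLASSES.index(class_name)} {xc:.6f} {yc:.6f} {w:.6f} {h:.6f}"


print(to_yolo_line("Motor 2 normal", (52, 104, 364, 312), 416, 416))
# 2 0.500000 0.500000 0.750000 0.500000
```

Normalized coordinates keep the labels valid if the images are later resized, as long as the aspect ratio is preserved.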