AVOA-LightGBM Power Fiber Optic Cable Event Pattern Recognition Method Based on Wavelet Packet Decomposition

Abstract: The type of power fiber optic cable fault event obtained by analyzing the optical time domain reflectometer (OTDR) detection curve is an important basis for ensuring the operation quality of communication lines. To address the issue of low accuracy in recognizing fault event patterns, this research proposes the AVOA-LightGBM method for optical cable fault event pattern recognition based on wavelet packet decomposition. Initially, a three-layer wavelet packet decomposition is performed on different fault events, resulting in eight characteristic signals. These signals are then normalized and used as input for each recognition model. The Light Gradient Boosting Machine (LightGBM) is optimized using the African vulture optimization algorithm (AVOA) for pattern recognition. The experimental results demonstrate that this method achieves a recognition accuracy of 98.24%. It outperforms LightGBM, support vector machine (SVM), and extreme learning machine (ELM) by 3.7%, 19.15%, and 5.67%, respectively, in terms of accuracy. Moreover, it shows a 1.8% improvement compared with the combined model PSO-LightGBM.


Introduction
Due to their advantages in terms of good anti-interference performance, long distance transmission, large range coverage, and intrinsic safety [1,2], power optical cables have become an essential means of information transmission in power systems [3]. The stability of their transmission plays a crucial role in the overall safety of the power system [4,5]. However, due to their unique installation method, power optical cables are exposed to the wild for extended periods and are vulnerable to failures caused by severe weather conditions such as strong winds and icing. These failures directly jeopardize the safety and stability of the power system. Therefore, identifying the common fault event types of power optical cables is crucial for the timely resolution of fault issues and ensuring the reliability of the power communication system [6,7].
The OTDR is an instrument that utilizes Rayleigh scattering and Fresnel reflection phenomena of light as it propagates through optical fibers. It extracts valuable signals from optical attenuation information and detects and analyzes the condition of optical fibers [8,9]. OTDR offers several advantages including easy operation, convenience, low energy consumption, and compatibility with portable power sources. It only requires access to one end of the optical fiber link. In power optical cables, OTDR plays a crucial role [10][11][12]. However, the current conventional operation and maintenance management methods heavily rely on the manual inspection of physical damage to the cables, which contradicts the expectations of the smart grid. The maintenance of optical cables is challenging and inefficient due to the influence of materials and natural factors [13]. This paper proposes a method to obtain an optical power attenuation curve containing the status information of the optical cable by connecting and testing the OTDR with the optical cable under test. By analyzing this curve, it is possible to identify the fault event type of the optical cable under test [14].
Numerous studies have been conducted on OTDR event recognition methods. H. Wu et al. employed a four-layer wavelet decomposition to extract event features and combined it with a BP neural network for recognition, achieving a recognition accuracy rate of 91.1% [15]. Similarly, C. Xu et al. utilized SVM to identify time-domain features of OTDR events. Their results demonstrated that SVM can effectively identify types of optical cable faults with an accuracy of 93.8% [16]. However, this method overlooks the frequency-domain information present in OTDR signals. B. M. Tabi Fouda et al. proposed a two-level target recognition method. In this approach, the signal's short-term energy and short-term cross-threshold rate are compared with a dynamic threshold in the first level, followed by the estimation of the power spectrum of the initially screened samples. Features are then extracted and input to SVM for secondary recognition [17]. This two-level recognition method significantly improves the accuracy of recognition. J. Wu et al. proposed a multi-scale one-dimensional convolutional neural network for identifying event types [18]. This method reduces algorithm complexity by directly inputting preprocessed data into the network and achieves good recognition performance. Y. Wang et al. conducted wavelet energy spectrum analysis on the original vibration signal to extract feature vectors. They determined that the optimal number of decomposition layers for wavelet energy spectrum analysis was six. The extracted features were then classified using a relevance vector machine (RVM) to achieve the recognition of OTDR events [19]. However, the research results indicate an average recognition accuracy of only 88.60%. X. Wang et al. employed the random forest classifier to identify data after time-domain feature extraction, achieving a recognition accuracy of 96.58% [20]. However, the random forest algorithm is complex and computationally expensive. M. H. Li et al. first preprocessed the signal using an average filter and then extracted time- and frequency-domain signals for classification using a support vector machine [21]. However, this method is unable to simultaneously identify multiple events.
This paper focuses on addressing the issue of low recognition accuracy of optical cable fault event types. The research is conducted in three main areas: data feature extraction, recognition algorithm, and optimization algorithm. Initially, the original signal undergoes wavelet packet decomposition and energy spectrum construction processing. The resulting signal is then used as a feature vector for training and testing in the LightGBM algorithm. Additionally, four parameters are optimized to enhance the accuracy and performance of the LightGBM algorithm. Contributions of this paper can be summarized as follows:
(1) In order to address the issue of low recognition accuracy of power optical cable fault event types, we propose the AVOA-LightGBM optical cable fault event pattern recognition method based on wavelet packet decomposition. We simulated various types of optical cable faults in the laboratory to validate the feasibility of this identification method. Furthermore, field data demonstrate the accuracy and superiority of our proposed method.
(2) Traditional feature extraction methods may encounter problems such as information loss, inaccurate feature extraction, and loss of time series information when dealing with data sets that are non-stationary in the time domain and frequency domain. To address these issues, we use the 'rbio3.1' wavelet basis function for three-layer wavelet packet decomposition on the data set. This decomposition allows us to obtain eight frequency subband signals from the time-domain signal, and the energy value of each subband signal serves as the fault characteristic value of the signal.
(3) To improve the performance and generalization ability of the recognition model, we introduce the AVOA to determine the four parameters of the LightGBM algorithm used for classifying fault events. Experimental results show that the average accuracy rate of the model reached 98.24%, demonstrating the success of our approach.
The rest of this paper is organized as follows: Section 2 introduces the algorithm principles, including LightGBM, AVOA, and wavelet packet decomposition; Section 3 describes the data processing, analyzes the experimental results, and compares them with other models; Section 4 concludes the paper.

LightGBM
Gradient boosting decision tree (GBDT) is a well-established machine-learning algorithm [22] that has demonstrated exceptional performance and effectiveness in numerous practical applications. However, GBDT requires building a new tree for each iteration, which makes the training process slower and consumes more memory due to the large number of trees. Additionally, if the depth of the tree is large and the parameter settings are improper, it is more likely to encounter overfitting issues. Moreover, missing values in the data must be handled explicitly when using GBDT. The LightGBM model is an efficient implementation of GBDT in which a set of weak learners is combined using a boosting strategy to progressively enhance the accuracy of the overall learner. In contrast to GBDT, LightGBM incorporates a range of optimization algorithms, enabling significant reductions in memory usage and training time without compromising model accuracy; when there are missing values in the data, it can also handle these automatically without additional preprocessing steps.
The histogram algorithm discretizes continuous feature values into k integers and constructs a histogram with k bins. During traversal, the algorithm uses the discretized values as indexes to accumulate statistics in the histogram, and it then finds the optimal split point based on these indexes. This approach only requires saving the discrete values instead of the entire training set, resulting in significantly reduced memory consumption. Figure 1 illustrates the histogram algorithm.
LightGBM utilizes the leaf-wise decision tree growth strategy [23]. At each step, this strategy finds the leaf with the highest splitting gain among all the current leaves, splits it, and repeats the process. This approach offers higher precision compared with the level-wise growth strategy used in traditional GBDT. The leaf-wise decision tree growth strategy is illustrated in Figure 2.
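The histogram-based split search described above can be sketched in plain Python. This is a simplified illustration rather than LightGBM's implementation: it assumes equal-width bins and a gain computed from gradient sums only (no Hessians or regularization), and the names `bin_index` and `best_histogram_split` are hypothetical.

```python
def bin_index(value, lo, hi, k):
    """Map a continuous feature value to one of k equal-width bins."""
    if hi == lo:
        return 0
    idx = int((value - lo) / (hi - lo) * k)
    return min(max(idx, 0), k - 1)


def best_histogram_split(values, gradients, k=8):
    """Find the best bin boundary using per-bin statistics only.

    Returns (best_bin, best_gain): samples in bins <= best_bin go left.
    best_bin is -1 when no split improves on the unsplit node.
    """
    lo, hi = min(values), max(values)
    grad_sum = [0.0] * k   # accumulated gradient per bin
    count = [0] * k        # number of samples per bin
    for v, g in zip(values, gradients):
        b = bin_index(v, lo, hi, k)
        grad_sum[b] += g
        count[b] += 1

    total_g, total_n = sum(grad_sum), len(values)
    best_gain, best_bin = 0.0, -1
    left_g, left_n = 0.0, 0
    # Scan the k - 1 candidate boundaries instead of every raw value.
    for b in range(k - 1):
        left_g += grad_sum[b]
        left_n += count[b]
        right_n = total_n - left_n
        if left_n == 0 or right_n == 0:
            continue
        right_g = total_g - left_g
        # Simplified gain: assumes a unit Hessian for every sample.
        gain = (left_g ** 2 / left_n + right_g ** 2 / right_n
                - total_g ** 2 / total_n)
        if gain > best_gain:
            best_gain, best_bin = gain, b
    return best_bin, best_gain
```

The scan visits k - 1 boundaries regardless of how many samples there are, which is where the memory and time savings of the histogram approach come from.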
In the traditional gradient sampling method, samples with small gradients are typically selected, which can impact the training speed and accuracy of the model. To address this issue, LightGBM employs a technique called gradient-based one-side sampling (GOSS). This method prioritizes the retention of instances with larger gradients and randomly samples instances with smaller gradients. By doing so, accurate information gain estimates can be obtained using a smaller amount of data. This approach effectively reduces computational requirements while maintaining the predictive performance of the model.
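A minimal sketch of the GOSS idea follows, assuming the usual formulation: keep the fraction `top_rate` of instances with the largest absolute gradients, randomly draw a fraction `other_rate` of the data from the remaining instances, and up-weight the drawn instances by (1 - top_rate) / other_rate so that the gradient statistics stay approximately unbiased. The function name `goss_sample` and the default rates are illustrative choices, not values from this paper.

```python
import random


def goss_sample(gradients, top_rate=0.2, other_rate=0.1, seed=0):
    """Return {sample_index: weight} for the instances kept by GOSS."""
    n = len(gradients)
    # Sort indices by absolute gradient, largest first.
    order = sorted(range(n), key=lambda i: abs(gradients[i]), reverse=True)
    n_top = int(top_rate * n)
    n_other = int(other_rate * n)

    top = order[:n_top]    # always retained
    rest = order[n_top:]   # small-gradient instances
    rng = random.Random(seed)
    sampled = rng.sample(rest, min(n_other, len(rest)))

    # Compensate for under-sampling the small-gradient instances.
    amplify = (1.0 - top_rate) / other_rate
    weights = {i: 1.0 for i in top}
    weights.update({i: amplify for i in sampled})
    return weights
```

With 100 instances and the default rates, 20 large-gradient instances are kept with weight 1 and 10 small-gradient instances are kept with weight 8, so only 30% of the data is used when estimating the information gain.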

The African Vulture Optimization Algorithm
The AVOA is a novel intelligent optimization algorithm that draws inspiration from the foraging behavior and living habits of African vultures. The AVOA exhibits strong optimization ability and fast convergence speed [24]. The process is illustrated in Figure 3. The position of an individual African vulture within the population can be represented by a mathematical vector. The vulture population, which consists of n vultures, is represented by Formula (1):
$$X = [X_1, X_2, \ldots, X_i, \ldots, X_n] \quad (1)$$
where $n$ represents the number of vultures in the population, and $X_i$ represents the position of the i-th vulture in the population. The AVOA is an iterative process that can be expressed in the following four steps:
Phase 1: Population grouping and identification of optimal vultures. After the initial population is formed, the fitness of all vultures is calculated. The vulture position that corresponds to the optimal fitness value is considered the optimal vulture position, while the vulture position that corresponds to the suboptimal fitness value is considered the suboptimal vulture position [25]. The remaining vultures move toward the optimal and suboptimal positions using Formula (2). Formula (3) is the calculation method for the probability of selecting the best vulture. This process is repeated in every iteration, with the fitness recalculated for the entire population.
In Formulas (2) and (3), $BestVulture_1$ represents the optimal vulture position, while $BestVulture_2$ represents the suboptimal vulture position. The probability of selecting the best vulture is denoted by $p_i$, the fitness value of the other vultures is represented by $F_i$, and $t$ denotes the current iteration number. $L_1$ and $L_2$ are algorithm setting parameters that range from 0 to 1, and their sum is always equal to 1. When $L_1$ approaches 1 and $L_2$ approaches 0, the algorithm's search ability is enhanced, making it more effective in finding local solutions. On the other hand, when $L_1$ approaches 0 and $L_2$ approaches 1, the algorithm's search diversity increases, making it more effective in finding the global optimal solution.
Phase 2: Determine the hunger rate of vultures. Vultures that are full have more energy, enabling them to fly longer distances in search of food. On the other hand, vultures in a hungry state conserve energy by staying close to stronger vultures and can only search for food nearby. Additionally, vultures in a hungry state exhibit more aggressive behavior [26], which can be represented by Formula (4), where $F$ represents the hunger rate of the vulture, $T$ is the maximum number of iterations, $rand_1$ is a random number in (0, 1), and $z$ is a random number in (−1, 1) that changes with each iteration. When the value of $z$ drops below 0, it indicates that the vulture is in a hungry state. When it increases to 0, it indicates that the vulture is full. Meanwhile, $\omega$ is a parameter set before the optimization operation. Increasing the value of $\omega$ increases the probability of entering the exploration stage in the final optimization stage; otherwise, the probability decreases.
Phase 3: Exploration stage. Vultures can forage using two different strategies and select the strategy mode based on the user-defined parameter $p_1$ ($p_1 \in (0, 1)$) [27]. This stage can be represented by Formulas (5) and (6), where $P(i)$ and $P(i + 1)$ represent the position of the vulture in the t-th and (t + 1)-th iterations, respectively. $X$ is the coefficient vector, which aims to increase the random motion and changes in each iteration; it is calculated as 2 multiplied by a random number between 0 and 1. $p_1$ is the selection parameter of the exploration stage. $rand_2$, $rand_3$, and $rand_{p_1}$ are random numbers between 0 and 1. $ub$ represents the upper limit of the search space, while $lb$ represents the lower limit of the search space.
Phase 4: Exploitation stage. When the value of $|F|$ is less than 1, the AVOA enters the exploitation stage. In this stage, there are two development strategies, and the critical value for choosing between them is $|F| = 0.5$. When $|F| \geq 0.5$, the mathematical model imitates the characteristics of the vulture's spiral flight, as given by Formulas (7)-(10), where $rand_4$, $rand_5$, $rand_6$, and $rand_{p_2}$ are random numbers between 0 and 1, and the rest of the parameters have the same meaning as above.
When the value of $|F|$ is less than 0.5, the vulture experiences fatigue and hunger, which simulates its gathering behavior and triggers an offensive behavior. This behavior can be represented by Formulas (11)-(13), where $rand_{p_3}$ is a random number between 0 and 1, and $BestVulture_1(i)$ and $BestVulture_2(i)$ denote the optimal and suboptimal positions of vultures in the current iteration. $Levy(d)$ is a stochastic process that simulates a random walk, influencing the step size of each individual in the population across dimensions. A larger value of $Levy(d)$ corresponds to a larger step size, increasing the algorithm's exploratory nature but making it harder to converge to the global optimal solution. Conversely, a smaller $Levy(d)$ implies a smaller step size, making the algorithm more inclined toward local search but potentially overlooking the global optimal solution within the search space.
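The hunger rate of Phase 2 and the threshold logic that switches between Phases 3 and 4 can be sketched as follows. The formulas themselves are not reproduced in this text, so the expression for F below follows the form commonly given for the AVOA and should be read as an assumption. In particular, the random factor `h` in (−2, 2), the default `omega=2.5`, and the function names `hunger_rate` and `select_phase` are illustrative and not taken from this paper.

```python
import math
import random


def hunger_rate(t, T, omega=2.5, rng=None):
    """Hunger rate F at iteration t of T (assumed AVOA form).

    F = (2 * rand1 + 1) * z * (1 - t / T) + disturbance, where the
    disturbance term depends on omega and vanishes at t = 0 and t = T.
    """
    rng = rng or random.Random()
    rand1 = rng.random()          # random number in [0, 1)
    z = rng.uniform(-1.0, 1.0)    # sign decides hungry (<0) or full (>0)
    h = rng.uniform(-2.0, 2.0)    # assumed random scale of the disturbance
    phase = math.pi / 2.0 * t / T
    disturbance = h * (math.sin(phase) ** omega + math.cos(phase) - 1.0)
    return (2.0 * rand1 + 1.0) * z * (1.0 - t / T) + disturbance


def select_phase(F):
    """Choose the search behavior from |F| using the thresholds 1 and 0.5."""
    magnitude = abs(F)
    if magnitude >= 1.0:
        return "exploration"
    if magnitude >= 0.5:
        return "exploitation: spiral flight"
    return "exploitation: gathering and attack"
```

Because the factor (1 - t / T) shrinks as iterations proceed, |F| tends to fall over time, so the search gradually shifts from exploration toward the two exploitation strategies.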

AVOA-LightGBM Power Fiber Optic Cable Event Pattern Recognition Method Based on Wavelet Packet Decomposition
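The LightGBM algorithm offers flexibility to cater to various requirements such as model complexity control, computing efficiency, memory control, and different fields [28]. To optimize the LightGBM algorithm and enhance recognition accuracy, this paper utilizes the AVOA to optimize four LightGBM hyperparameters (learning_rate, feature_fraction, num_leaves, and max_depth); their optimization ranges are given in Table 1 [29].
Table 1. Hyperparameter optimization ranges (of the table contents, only the range [3, 12] survives).
The fitness function of the AVOA is constructed as shown in Formula (14), where $n$ is the number of samples in the test data set, $Y(i)$ is the classification value, and $f(x_i)$ is the real value.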
The AVOA-LightGBM power fiber optic cable fault event identification model based on wavelet packet decomposition is illustrated in Figure 4. It comprises three main components: data preprocessing, AVOA-LightGBM model optimization, and model classification. The specific steps are as follows:
1. Data preprocessing: The OTDR curve data are subjected to wavelet packet decomposition to extract the feature vector. This vector is then normalized and used as the input for the LightGBM recognition model. Furthermore, the data are divided into a training set and a test set for model evaluation.
2. AVOA-LightGBM model optimization: The AVOA-LightGBM optical cable event recognition model is constructed by initializing the LightGBM parameters and setting the hyperparameter optimization range. The algorithm repeatedly calculates the fitness value of the AVOA and updates the individual and population parameters until the termination condition is reached. Finally, the optimal parameters are output.
3. Model classification: The data set is input into the model for identification, and the classification result is output.
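Step 2 above requires mapping each vulture position to a set of LightGBM hyperparameters and scoring it. The sketch below shows one way this could be done. All bounds in `BOUNDS` are hypothetical placeholders, since the ranges of Table 1 are not available here, and the misclassification rate used as the fitness is an assumption rather than the paper's Formula (14). The names `decode_position` and `error_rate` are illustrative.

```python
# Hypothetical search ranges, one per optimized hyperparameter.
BOUNDS = {
    "learning_rate": (0.01, 0.5),
    "feature_fraction": (0.5, 1.0),
    "num_leaves": (2, 50),
    "max_depth": (3, 12),
}
INTEGER_PARAMS = {"num_leaves", "max_depth"}


def decode_position(position):
    """Turn a 4-dimensional position vector into a hyperparameter dict.

    Values are clipped to their bounds, and integer-valued
    hyperparameters are rounded to the nearest integer.
    """
    params = {}
    for value, (name, (lo, hi)) in zip(position, BOUNDS.items()):
        clipped = min(max(value, lo), hi)
        if name in INTEGER_PARAMS:
            params[name] = int(round(clipped))
        else:
            params[name] = clipped
    return params


def error_rate(predicted, actual):
    """Assumed fitness: fraction of misclassified samples (lower is better)."""
    wrong = sum(1 for p, a in zip(predicted, actual) if p != a)
    return wrong / len(actual)
```

In an optimization loop, each decoded dictionary would be passed to a LightGBM classifier, the classifier would be evaluated on held-out data, and `error_rate` would be returned to the AVOA as the fitness of that position.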

Experimental Data Acquisition
In this paper, an OTDR is utilized to collect the signal of the faulty power fiber optical cable. The acquisition device is depicted in Figure 5. The pulse generator is employed to control the laser diode (LD), injecting an optical pulse of specific width and period into the optical fiber via the circulator. Subsequently, after the optical signal is injected into the fiber, certain signals return due to Rayleigh backscattering and Fresnel reflection. These returning signals then pass through the coupler once again. The coupler separates the returned optical signal from the emitted one, directing it to the photodiode. The optical signal is then converted into an electrical signal, amplified by the amplifier, sampled by the DAQ sampler, and output and displayed as an optical power attenuation curve on the PC. Additionally, this curve is stored for subsequent analysis.

Experimental Data Processing
In this paper, the above-mentioned OTDR device is utilized to capture two common types of optical cable faults, namely reflection events and non-reflection events. The reflection event is depicted as point A on the left side of Figure 5, appearing as a bulge on the curve where the relative optical power experiences a sudden increase followed by attenuation. This paper collects reflection events by conducting destructive experiments such as man-made fiber breaks and poor connections at splice points. The non-reflective events are represented as B in the graph, and the curve demonstrates a stepwise decrease in relative optical power. Additionally, the paper collects non-reflective events through destructive experiments involving improper fiber fusion and excessive bending. As each event occurs at a different location, this paper segments the OTDR fault events by allocating 1000 sampling points to each event.

The OTDR signal curve undergoes reflection and attenuation under different conditions of the optical cable line. It is non-stationary in both the time domain and the frequency domain. The frequency components of the reflection event are typically concentrated in the high-frequency band, while the non-reflection event signal usually contains lower-frequency components. Wavelet packet decomposition is a time-frequency analysis method that decomposes the original signal into a series of frequency bands, which better reflects the resolution of the signal in each frequency band [30]. The steps for feature extraction using wavelet packet decomposition are as follows:

1. Select "rbio3.1" as the wavelet basis and perform a three-layer wavelet packet decomposition of the OTDR signal curve, which yields eight frequency bands.
2. Reconstruct the wavelet packet coefficient $d_j^p$ at node $(j, p)$ to obtain $D_j^p(k)$. Here, $j$ represents the number of decomposition layers, $p = 0, 1, \ldots, 2^j - 1$, and $k = 1, 2, \ldots, m$. $D_j^p(k)$ denotes the amplitude of the k-th sampling point corresponding to the p-th sub-band of the j-th layer after decomposition.
3. Calculate the energy value of each frequency band of the j-th layer according to Formula (15), where $N$ is the total number of coefficients of each frequency band.
4. To construct the feature vector, the total energy of each node is used as an element, and the energy ratio of each node is then calculated. The feature vector $X$ is expressed in Formula (16). In Formula (17), $P_i$ represents the total energy of the signal, while $E_j^i$ represents the total energy of each node.
5. The wavelet packet energy feature extraction is conducted separately on the reflection event and the non-reflection event. This process yields two sets of feature vectors, each containing 8 features. These vectors are then normalized. The feature energy and the enlarged energy of the two types of event points are depicted in Figure 6.
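The feature extraction steps above can be sketched in plain Python. To keep the example self-contained, it swaps in the Haar wavelet for the "rbio3.1" basis used in this paper, and it computes the band energy as the sum of squared coefficients, which is a common definition and an assumption here because Formula (15) is not reproduced. For the orthonormal Haar basis this equals the energy of the reconstructed sub-band signal. The function names are hypothetical, and the eight nodes are returned in natural (filter-bank) order rather than sorted by frequency.

```python
import math


def haar_step(signal):
    """One Haar analysis step: return (approximation, detail) coefficients."""
    s = 1.0 / math.sqrt(2.0)
    pairs = range(0, len(signal) - 1, 2)  # drops a trailing odd sample
    approx = [(signal[i] + signal[i + 1]) * s for i in pairs]
    detail = [(signal[i] - signal[i + 1]) * s for i in pairs]
    return approx, detail


def wavelet_packet_energy(signal, levels=3):
    """Return the normalized energy of the 2**levels wavelet packet nodes."""
    nodes = [list(signal)]
    for _ in range(levels):
        next_nodes = []
        # Unlike the plain wavelet transform, the wavelet packet transform
        # splits both the approximation and the detail branch at each level.
        for node in nodes:
            approx, detail = haar_step(node)
            next_nodes.extend([approx, detail])
        nodes = next_nodes

    energies = [sum(c * c for c in node) for node in nodes]
    total = sum(energies)
    if total == 0:
        return energies
    # Energy ratio of each node relative to the total energy.
    return [e / total for e in energies]
```

For a 1000-point event segment, three levels give eight nodes of 125 coefficients each, and the eight energy ratios form the feature vector. With the PyWavelets package, the same decomposition could instead be carried out with the "rbio3.1" basis.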

Result Analysis
This paper utilizes 500 groups of fault events collected by OTDR as the training set. It also incorporates a total of 57 actual optical cable lines from the new building of China Grid Jilin Electric Power Co., Ltd. to the Sanjiazi 220 kV substation of State Grid Jilin Electric Power Co., Ltd. as the test set for optical cable fault events. In this test set, reflection events are marked as 0, while non-reflection events are marked as 1, and the order of events is randomly shuffled. To compare and evaluate the performance of the LightGBM algorithm, this paper constructs two additional models using classic machine-learning algorithms: SVM and ELM. These models are trained and tested using the same training and test sets. The experimental results, as depicted in Figure 7, show that the LightGBM algorithm achieves the highest recognition accuracy at 94.74%. The ELM algorithm achieves a recognition accuracy of 92.98%, surpassing the SVM algorithm, which exhibits the lowest recognition accuracy at 82.46%. Therefore, it can be concluded that utilizing the LightGBM algorithm to develop the recognition model is reasonable and can yield the expected recognition effect.
Based on the previous analysis, LightGBM exposes several parameters that can be optimized, and optimizing some of them can improve the recognition results. In this study, the AVOA parameters p1, p2, and p3 were set to 0.6, 0.4, and 0.6, respectively. The population size was set to 50 and the maximum number of iterations to 50. After optimization, learning_rate was 0.3, feature_fraction was 0.6, num_leaves was 25, and max_depth was 3.
To assess the optimization effect of the AVOA, this paper also uses the classic particle swarm optimization (PSO) algorithm to build a PSO-LightGBM recognition model, and comparative experiments evaluate the recognition accuracy of the two combined models, AVOA-LightGBM and PSO-LightGBM. The results in Figure 7 show that the recognition accuracy of AVOA-LightGBM is 98.25%, while that of PSO-LightGBM is 96.49%. Both are higher than the accuracy of the unoptimized recognition model. The fitness curves of the two combined models are depicted in Figure 8. The AVOA exhibits strong global search ability and explores the search space extensively, which reduces the risk of becoming trapped in a local optimum. However, its convergence speed may vary with the complexity of the problem and the parameter settings. The PSO algorithm also has some global search ability; although it is less sensitive to parameters, it is more prone to falling into local optima.
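The reported settings can be collected into plain configuration dictionaries, as in the minimal sketch below. The hyperparameter names and values are those given in the text; the `objective` entry and the key names in `avoa_settings` are assumptions added for illustration, and `max_leaves_for_depth` is a hypothetical helper.

```python
# Hyperparameters reported after AVOA optimization (values from the text).
avoa_lightgbm_params = {
    "objective": "binary",  # ASSUMPTION: two event classes, not stated in the text
    "learning_rate": 0.3,
    "feature_fraction": 0.6,
    "num_leaves": 25,
    "max_depth": 3,
}

# AVOA settings reported in the text (the key names are hypothetical).
avoa_settings = {
    "p1": 0.6,
    "p2": 0.4,
    "p3": 0.6,
    "population_size": 50,
    "max_iterations": 50,
}


def max_leaves_for_depth(max_depth):
    """Upper bound on the number of leaves of a binary tree of this depth."""
    return 2 ** max_depth


# A depth-limited tree cannot have more than 2**max_depth leaves, so the
# depth limit, not num_leaves, is the binding constraint for these values.
effective_leaves = min(
    avoa_lightgbm_params["num_leaves"],
    max_leaves_for_depth(avoa_lightgbm_params["max_depth"]),
)
print(effective_leaves)
```

With the `lightgbm` package installed, a dictionary of this shape could be passed as the `params` argument of `lightgbm.train`. One point worth noting when reusing these values is that max_depth = 3 caps each tree at 8 leaves, so num_leaves = 25 does not constrain the model further.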

Model Evaluation
To further validate the proposed model, four indicators (accuracy, precision, recall, and F1 value) are used to evaluate its performance and compare it with the baseline models. Accuracy is the proportion of correctly predicted samples out of the total number of samples and gives an overall measure of recognition performance. Precision is the proportion of samples predicted as positive that are actually positive, while recall is the proportion of actual positive samples that are correctly predicted as positive. The F1 value, the harmonic mean of precision and recall, provides a combined assessment of the model's performance. The experimental results comparing the recognition models are presented in Table 2. Table 2 shows that the basic LightGBM model already yields favorable results for optical cable event recognition on the same data set. Owing to the AVOA's optimization ability, the AVOA-LightGBM model outperforms the others in accuracy, precision, recall, and F1 value. Its accuracy reaches 98.25%, a relative improvement of 3.7% over the single LightGBM model (94.74%) and of 19.15% over the SVM recognition algorithm (82.46%). This further supports the advantage of the algorithm used in this paper for optical cable event identification.
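The four indicators follow directly from the confusion-matrix counts, as the minimal sketch below illustrates. It treats label 1 (non-reflection) as the positive class, which is an assumption since the text does not say which class is positive; the function names and the toy labels are hypothetical. The helper `relative_gain` shows that the 3.7% and 19.15% figures quoted above match relative, not absolute, differences in accuracy.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Compute accuracy, precision, recall, and F1 for binary labels.

    ASSUMPTION: `positive` marks the positive class (here label 1).
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)

    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


def relative_gain(new, base):
    """Relative improvement of `new` over `base`, in percent."""
    return 100.0 * (new - base) / base


# Toy labels for illustration only (not data from the study).
y_true = [0, 0, 0, 1, 1, 1, 1, 0]
y_pred = [0, 0, 1, 1, 1, 1, 0, 0]
metrics = binary_metrics(y_true, y_pred)
print(metrics)

# Reported accuracies: AVOA-LightGBM vs. LightGBM and vs. SVM.
print(round(relative_gain(98.25, 94.74), 2))
print(round(relative_gain(98.25, 82.46), 2))
```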

Conclusions
To improve the accuracy of power optical cable fault event recognition, this paper presents AVOA-LightGBM, a method for pattern recognition of optical cable fault events based on wavelet packet decomposition. The feature vector used by the recognition model is obtained by applying wavelet packet decomposition to fault events and extracting the energy vector, which enables accurate identification of reflection and non-reflection events in the OTDR curve. Experimental results show that LightGBM achieves the highest recognition accuracy among the three single models tested (LightGBM, SVM, and ELM), and that the AVOA shows stronger global search capability than PSO. The proposed AVOA-LightGBM model achieves a recognition accuracy of 98.25%, a relative improvement of 3.7% over the single LightGBM model and of 19.15% over the SVM recognition algorithm. In practical applications, using the AVOA-LightGBM model to identify fault events at the event points of power fiber optic cables is of significant value for cable fault diagnosis and maintenance.
This study identified only two types of fault events: reflective and non-reflective events. However, the OTDR detection curve also contains start events and end events, so future studies should extend recognition to these two event types. In addition, the AVOA, an optimization algorithm proposed in recent years, has several limitations, such as parameter sensitivity, a limited application range, and effectiveness that varies across problems. Our future research will aim to improve the algorithm itself to address or reduce these limitations.