Article

A Sequence-Aware Surrogate-Assisted Optimization Framework for Precision Gyroscope Assembly Based on AB-BiLSTM and SEG-HHO

School of Communication and Information Engineering, Nanjing University of Posts and Telecommunications, Nanjing 210003, China
*
Author to whom correspondence should be addressed.
Electronics 2025, 14(17), 3470; https://doi.org/10.3390/electronics14173470
Submission received: 19 July 2025 / Revised: 17 August 2025 / Accepted: 25 August 2025 / Published: 29 August 2025

Abstract

High-precision assembly plays a central role in aerospace, defense, and precision instrumentation, where errors in bolt preload or tightening sequences can directly degrade product reliability and lead to costly rework. Traditional finite element analysis (FEA) offers accuracy but is too computationally expensive for iterative or real-time optimization. Surrogate models are a promising alternative, yet conventional machine learning methods often neglect the sequential and constraint-aware nature of multi-bolt assembly. To overcome these limitations, this paper introduces an integrated framework that combines an Attention-based Bidirectional Long Short-Term Memory (AB-BiLSTM) surrogate with a stratified version of the Harris Hawks Optimizer (SEG-HHO). The AB-BiLSTM captures temporal dependencies in preload evolution while providing interpretability through attention-weight visualization, linking model focus to physical assembly dynamics. SEG-HHO employs an encoding–decoding mechanism to embed engineering constraints, enabling efficient search in complex and constrained design spaces. Validation on a gyroscope assembly task demonstrates that the framework achieves high predictive accuracy (Mean Absolute Error of 3.59 × 10⁻⁵), reduces optimization cost by orders of magnitude compared with FEA, and reveals physically meaningful patterns in bolt interactions. These results indicate a scalable and interpretable solution for precision assembly optimization.

1. Introduction

High-precision assembly is a critical process in high-technology manufacturing sectors such as aerospace, precision instrumentation, and defense systems. Even minor deviations in bolt torque or tightening sequence can lead to substantial performance degradation and costly rework. Industry analyses indicate that rework and scrap costs can account for 5–30% of total manufacturing costs, with up to 30% of equipment failures traced back to improper installation and assembly procedures [1]. In aerospace manufacturing, such assembly-related defects are among the leading causes of cost overruns. A recent case study in aircraft production demonstrated that implementing advanced defect detection technologies reduced total manufacturing costs by 34.32% [2], highlighting the considerable economic burden imposed by assembly errors. These figures underscore the urgency of developing assembly optimization frameworks that are both accurate and computationally efficient.
Despite the extensive use of finite element method (FEM)-based simulations and surrogate-assisted workflows in industrial settings, significant limitations persist. FEM-based evaluations are computationally expensive—single simulation runs for complex assemblies can take from minutes to hours—rendering them impractical for iterative optimization loops in production environments [3]. Conventional surrogate models, such as support vector regression (SVR) or multilayer perceptrons (MLP), often lack the capacity to capture the sequential dependencies and coupled constraints present in precision assembly processes [4].
Recent advances in surrogate modeling have seen the rise of time-series models such as recurrent neural networks (RNNs) and attention-based architectures [5], which have been successfully applied in domains like manufacturing scheduling and predictive maintenance. However, they are rarely tailored to the physics-informed constraints and sequential dependencies of multi-fastener precision assembly. This gap is particularly evident in the lack of integration between time-series surrogate modeling and constraint-aware optimization. In particular, few studies have addressed constraint handling through constructive encoding–decoding strategies that can inherently satisfy complex physical assembly constraints. Additionally, most existing studies focus on prediction accuracy but neglect interpretability, making it difficult for engineers to link model outputs to underlying physical phenomena.
The assembly process optimization task in this study is fundamentally a constrained combinatorial optimization problem. The core challenge of such problems lies in identifying an optimal solution within a vast, discrete search space that satisfies a complex set of constraints. The academic community has extensively explored these NP-hard problems. Foundational works, such as simulated annealing proposed by Kirkpatrick et al. [6], provided a powerful heuristic framework by simulating the physical annealing process. As problem complexity has increased, more advanced methods have emerged; for instance, in handling highly constrained satisfaction problems, algorithms derived from statistical physics like Survey Propagation [7] have demonstrated unique advantages. These foundational contributions provide theoretical context for our proposed optimizer, emphasizing the need to balance exploration and exploitation in high-dimensional constrained spaces.
To address these challenges, this paper proposes an integrated framework that combines finite element simulation, deep learning-based surrogate modeling, and intelligent optimization. As illustrated in Figure 1, this framework consists of three main stages: first, a sample dataset containing various assembly process schemes is constructed through high-fidelity finite element simulation; second, an Attention-based Bidirectional Long Short-Term Memory (AB-BiLSTM) network is designed as a high-fidelity, millisecond-level surrogate for FEM simulations, while attention weight visualization enables interpretability by linking model focus to physical changes; finally, using the trained surrogate model as the fitness function, a Stratified Evolutionary Group-based Harris Hawks Optimizer (SEG-HHO) algorithm is proposed, which dynamically partitions the population into three groups with distinct tasks and introduces evolutionary operators to perform a global search for the optimal bolt tightening sequence and preloads that minimize the orthogonal axis error.
The main contributions of this paper are as follows, each directly addressing the research gaps identified above:
(1)
Integrated Framework for Constrained Assembly Optimization: To address the computational inefficiency of FEM-based simulations and the lack of integration between time-series surrogates and constraint-aware optimization, we propose an end-to-end framework that seamlessly combines FEM data generation, deep learning surrogates, and metaheuristic optimization, enabling efficient handling of sequential dependencies and physical constraints in precision assembly.
(2)
High-Efficiency Surrogate Model with Interpretability: Addressing the limitations of traditional surrogate models in capturing sequential dependencies, we developed an AB-BiLSTM model specifically designed for multi-fastener assembly. This model achieves millisecond-level predictions that replace costly finite element method (FEM) evaluations, while incorporating an attention mechanism to enhance interpretability by correlating attention weights with physical phenomena such as preload-induced deformations.
(3)
Novel Constraint-Handling Framework and Enhanced Optimizer: To address the critical challenge of incorporating complex physical rules into the optimization loop, we propose two tightly coupled innovations. First, we introduce a constructive encoding–decoding mechanism, a core contribution of our framework that guarantees 100% of candidate solutions are physically feasible by design. This approach fundamentally differs from traditional penalty or repair methods and is pivotal for efficient search. Second, to navigate the resulting complex search space, we developed the SEG-HHO algorithm. Unlike prior HHO improvements that often focus on single-operator enhancements, SEG-HHO implements a novel stratified evolutionary framework that partitions the population into functionally distinct groups, systematically balancing global exploration and local exploitation to improve convergence and maintain solution diversity.
(4)
Attention Visualization for Physical Insights: To address the interpretability gap in existing time-series models, we conduct a qualitative-to-quantitative analysis of attention weights, correlating model attention patterns with key assembly steps (e.g., symmetric bolt pairs), thus offering engineers actionable insights into error sources.
The remainder of this paper is organized as follows: Section 2 reviews related work in the field. Section 3 and Section 4 detail the proposed error prediction model, hybrid encoding–decoding mechanism, and optimization algorithm. Section 5 presents detailed comparative experiments and result analysis. Section 6 discusses the advantages, limitations, and potential extensions of the proposed framework. Finally, Section 7 concludes this study.

2. Related Work

This research intersects with three primary areas: assembly error analysis, surrogate-based engineering optimization, and metaheuristic algorithms. This section provides a review of the relevant studies in these domains.
In the domain of assembly error analysis, FEM has been the conventional tool for simulating the impact of processes like bolted connections on structural integrity [8,9]. For instance, Wang et al. [10] conducted a detailed analysis using FEM on the effect of tightening sequences on the contact pressure distribution in multi-bolted joints. While these studies offer profound insights into the physical mechanisms of error generation, their high computational cost limits their application in multi-scenario evaluation and automated optimization.
To address this efficiency bottleneck, early research explored data-driven surrogate models. Classic methods include Support Vector Regression (SVR) [11], Response Surface Methodology (RSM) [12], Kriging [13], and Radial Basis Functions (RBF) [14]. However, these models often rely on polynomial or kernel-based functions, which limits their expressive power when confronted with the high-dimensional, non-linear, and sequentially dependent nature of multi-bolt tightening processes. Their inability to capture temporal dependencies makes them suboptimal for modeling processes where the order of operations is critical.
With the rise of deep learning, models designed for sequential data have gained prominence. Recurrent Neural Networks (RNN) [15] and their advanced variants like Long Short-Term Memory (LSTM) [16] and Gated Recurrent Unit (GRU) [17] have demonstrated exceptional capabilities in capturing temporal patterns. This makes them naturally suited for modeling the bolt tightening process, where each step’s outcome is conditioned on all previous steps. While a conventional Bidirectional LSTM (BiLSTM) [18] can process the sequence from both forward and backward directions to capture context, it treats all steps with uniform importance. This is a significant drawback, as not all assembly steps contribute equally to the final error. In contrast, our proposed Attention-based BiLSTM (AB-BiLSTM) enhances this architecture by introducing a temporal attention mechanism. This allows the model to dynamically assign higher weights to more influential steps—such as the final locking torque—thereby achieving a more physically meaningful representation and superior predictive accuracy compared to standard BiLSTM models that overlook this critical temporal hierarchy.
The field of surrogate-assisted optimization continues to evolve rapidly. Beyond RNN-based architectures, other powerful models have been explored. Gaussian Processes (GPs) [19], a cornerstone of Bayesian optimization, offer a probabilistic approach by providing not only predictions but also uncertainty estimates, which can be invaluable for guiding the search process in active learning scenarios. However, standard GPs scale poorly with large datasets (typically with cubic complexity) [20] and are not inherently designed to handle sequential or permutable inputs, making them less suitable for our specific problem.
More recently, Transformer-based architectures, which have revolutionized natural language processing, are being adapted for surrogate modeling in physical systems [21]. Their self-attention mechanism allows them to capture long-range dependencies more effectively than traditional RNNs, making them a promising candidate for complex manufacturing processes. A study by Zhang et al. [5] demonstrated the potential of Transformers for predicting multi-physics field distributions in gas turbines. While powerful, Transformers often require larger datasets for effective training, and their application to constrained, permutation-based problems like ours is still an emerging research area. Given our dataset size and the clear sequential nature of the problem, the AB-BiLSTM provides a robust and efficient solution tailored to the task’s specific structure.
The task of optimizing bolt tightening parameters is a constrained combinatorial optimization problem, which is NP-hard. Metaheuristic algorithms are widely adopted for such problems due to their gradient-free nature and strong global search capabilities. Classic algorithms such as Genetic Algorithm (GA) [22,23,24], Particle Swarm Optimization (PSO) [25,26,27], and Ant Colony Optimization (ACO) [28,29,30] have a long history of application in assembly sequence planning. These algorithms have proven effective in navigating vast, discrete search spaces. However, their performance can degrade when faced with the hybrid (discrete sequence and continuous preload) and tightly constrained solution space of our problem.
The Harris Hawks Optimization (HHO) [31] is a more recent swarm intelligence algorithm that has demonstrated excellent performance in many engineering optimization problems [32,33,34,35,36]. However, the standard HHO algorithm is not without its limitations. As noted in several studies [32,36], when applied to high-dimensional or complex multimodal problems, HHO can suffer from a decline in population diversity during the later stages of the search, leading to premature convergence to local optima. To address these shortcomings, researchers have proposed numerous HHO variants. These improvements can be broadly categorized into three groups: (1) hybridization, which combines HHO with other optimizers like GA to leverage their complementary strengths; (2) population enhancement, which introduces mechanisms like chaotic initialization or population stratification to improve diversity and exploration; and (3) operator modification, which refines the exploration or exploitation phases of HHO. Our proposed SEG-HHO spans all three categories, introducing a stratified evolutionary framework and hybrid search mechanisms to achieve a more robust balance between exploration and exploitation.
The integration of metaheuristics with simulation and surrogate models is a highly active research area, enabling the optimization of complex systems that were previously computationally intractable. Recent works continue to push the boundaries of this paradigm. For instance, Zhang et al. [4] applied a surrogate-assisted optimization framework to manage O-ring stress in solid rocket motor assembly, highlighting the continued relevance of this approach in aerospace. In another recent work, Xu et al. [16] developed a neural network to predict bolt preload from sensor signals, which can be integrated with optimization algorithms for real-time control. Furthermore, Wan et al. [22] proposed a multi-solution genetic algorithm for assembly sequence planning, demonstrating the ongoing effort to enhance metaheuristics for manufacturing applications. These studies underscore a clear trend towards creating tightly coupled, data-driven optimization loops in intelligent manufacturing. However, for bolt parameter optimization, comparing a few predefined strategies using FEM simulation or experimental validation remains a mainstream approach [10]. This “compare-and-verify” model is essentially an inefficient trial-and-error method that explores a very limited portion of the solution space. When optimization algorithms are indeed deployed, their integration method with high-fidelity simulations or surrogate models becomes the new bottleneck. As demonstrated in the work of Ni et al. [37], this indirect coupling either inherits the high computational cost of FEM or risks the accuracy of its optimization results being limited by an overly simplistic surrogate model.
Our review of the literature reveals two primary gaps. First, although sequence-aware models such as LSTMs have been employed in manufacturing tasks, they are typically developed for generic scheduling or monitoring purposes and rarely tailored to physics-informed constraints in multi-fastener precision assembly. Moreover, these approaches usually lack explicit interpretability, which is critical for engineering decision support. Advanced architectures such as Transformers are only in the nascent stages of adoption in this field. Thus, there is a need for surrogate models that are both accurate and interpretable within the assembly context. Second, although numerous HHO variants have been proposed, few adopt a systematic population structuring strategy that explicitly separates exploration from exploitation. Furthermore, most existing studies rely on a “compare-and-verify” paradigm rather than developing fully automated optimization frameworks.
Therefore, this research is motivated by the need to create an integrated, automated framework. We propose the AB-BiLSTM to build a high-fidelity surrogate that captures sequential dependencies with enhanced interpretability. We then develop the SEG-HHO algorithm, with its stratified population structure, to efficiently and reliably navigate the complex, constrained search space, thereby bridging the identified gaps in the literature.

3. Surrogate Modeling for Assembly Error Prediction

3.1. Dataset Generation via FEM Simulation

To acquire high-fidelity data for training the surrogate model, this study employs the finite element analysis software ANSYS 2023R1 to model and simulate the inertial gyroscope’s bolt assembly process.
The actual assembly process involves two stages: “initial tightening” and “final tightening.” However, according to engineering expert opinion, the final tightening stage has a decisive impact on assembly accuracy. This was corroborated by our preliminary feature selection experiments (detailed in Section 3.3.1), which showed that including initial preload features did not significantly improve prediction accuracy. Furthermore, while the actual assembly contains 48 bolts, the tightening state of 24 key bolts dominates the final structural deformation. Therefore, to enhance simulation efficiency while preserving the problem’s critical physical characteristics, we made the following simplifications: (1) the assembly process only considers the final tightening stage and (2) the analysis focuses on the 24 key bolts.
Based on these assumptions, we established a finite element model comprising the inner and outer rings, rotors, and 48 bolts (see Figure 2). The material properties were set according to the actual engineering materials, and the mesh was refined in critical areas. We used programmatic scripts to control ANSYS for batch simulations, with each data sample representing a complete assembly process. The preload vector $P$ was sampled within the range of [3800 N, 7500 N], which was determined based on expert knowledge and torque-to-preload conversion principles. The tightening sequence vector $S$ was generated to satisfy the physical constraints described in Section 3.2. Both $P$ and $S$ were initialized using Latin Hypercube Sampling (LHS) to ensure sufficient coverage of the solution space. The output label for each sample is the final assembly's orthogonal axis error, $\varepsilon_{ortho}$, obtained from the simulation. Specifically, the two symmetric rotors on the inner ring and the outer ring each define a central axis. In an ideal state, the central axis of the inner ring and that of the outer ring should be perfectly orthogonal (at a 90° angle). However, non-uniform forces during the assembly process cause minute deviations in these axes. $\varepsilon_{ortho}$ is defined as the deviation of the actual angle between these two axes from the ideal 90°. Through this method, we generated a total of 2000 valid "assembly process–assembly error" data pairs, forming the raw dataset for this study.
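To make the dataset-generation step concrete, the following is a minimal sketch of how the LHS preload design could be produced; the use of SciPy's qmc module, the seed, and the variable names are illustrative assumptions rather than the authors' actual batch scripts.

```python
# Sketch of the LHS preload sampling step; bolt count (24), sample count
# (2000), and preload bounds [3800 N, 7500 N] follow the text above.
import numpy as np
from scipy.stats import qmc

N_BOLTS, N_SAMPLES = 24, 2000
P_MIN, P_MAX = 3800.0, 7500.0

sampler = qmc.LatinHypercube(d=N_BOLTS, seed=42)   # stratified 24-D design
unit = sampler.random(n=N_SAMPLES)                 # values in [0, 1)
preloads = qmc.scale(unit, [P_MIN] * N_BOLTS, [P_MAX] * N_BOLTS)

# Each row is one candidate preload vector P handed to the ANSYS batch run;
# a constraint-respecting tightening sequence S is generated alongside it.
print(preloads.shape)  # (2000, 24)
```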

3.2. Physical Constraints of the Assembly Process

The inertial gyroscope assembly studied in this paper possesses a precision structure, featuring an inner and an outer ring, each housing two rotors: one for angular velocity measurement and the other serving as a drive motor. Each rotor is secured by 6 bolts, resulting in a total of 24 critical axial bolt connections that require precise tightening. To guarantee the final performance of the gyroscope, its assembly process must adhere to a strict set of procedural specifications derived from engineering practice. The core physical objective of these specifications is to achieve a uniform stress distribution and to minimize assembly-induced structural deformations and asymmetries, which are the primary sources of orthogonal error.
These process specifications are embodied in the following four key physical constraints:
(1)
Grouped Assembly: The 24 bolts are logically divided into four groups, corresponding to the four rotors (inner ring measurement, inner ring drive, outer ring measurement, outer ring drive). During assembly, all bolts within one group must be tightened before proceeding to the next.
(2)
Fixed Inter-Group Sequence: The tightening order of the four bolt groups is strictly fixed: first the inner ring measurement rotor, followed by the inner ring drive rotor, then the outer ring measurement rotor, and finally the outer ring drive rotor. This constraint ensures consistency in the macroscopic loading process.
(3)
Symmetric Pairing: Within each 6-bolt group, the bolts are paired to form three physically symmetric pairs. Adhering to the principle of symmetry is crucial for ensuring balanced load application and counteracting bending moments.
(4)
Adjacent Tightening of Pairs: This is the most critical micro-operational constraint. Within a symmetric pair, the tightening operations for the two bolts must be consecutively adjacent. This action aims to rapidly close the local moment loop, preventing minor component warping due to one-sided forces, which is paramount for controlling the final orthogonal axis error.

3.3. Data Preprocessing and Feature Engineering

3.3.1. Feature Selection

To further refine the surrogate inputs, we excluded the initial preload variable. This decision was motivated by both expert insights and experimental evidence. According to domain experts from our collaborating research institute, the initial preload has negligible influence on the subsequent error propagation, thus adding unnecessary complexity to the surrogate. To validate this claim, we conducted a comparative analysis: when the initial preload was included, the model’s predictive accuracy degraded notably, with the R2 score dropping from 0.5353 to −0.0331. This can be explained by the fact that the initial preload is highly uncertain in practice, and including it introduces noise rather than an informative signal. Excluding this feature not only simplified the model but also improved its stability. A concise summary of this comparison is presented in Table 1. Consequently, we regenerated the dataset with only the final preload considered and excluded all features related to initial preload during feature engineering, aiming to enhance both model performance and generalization capability. Beyond simplifying the model structure, this feature selection strategy is highly practical, as it focuses on the final state of the tightening process, which is usually of key interest in engineering applications.

3.3.2. Data Transformation and Normalization

Input Features: The bolt ID (1-24) in the tightening sequence, being categorical data, is fed directly to the model as integers to be handled by an internal embedding layer. The continuous preload feature is normalized using Z-score standardization (StandardScaler) to follow a standard normal distribution with a mean of 0 and a variance of 1. This method is more robust to potential outliers compared to Min-Max scaling and is beneficial for the stable training of neural networks by centering the data.
Target Output: Considering the orthogonal axis error is a non-negative value exhibiting a right-skewed distribution close to zero, we first apply a log(1 + p) transformation to mitigate the skewness. Subsequently, we use Min-Max Normalization (MinMaxScaler) to scale the target values into the [0, 1] range. Alternative transformations (e.g., Z-score normalization, power transformation, and Min-Max Normalization alone) were evaluated during preprocessing, but log(1 + p) with Min-Max Normalization yielded the most stable performance for subsequent model training. This series of transformations not only makes the target distribution easier for the model to learn but also helps ensure that the de-normalized model output cannot take a physically impossible negative value.
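A minimal sketch of this preprocessing pipeline, assuming scikit-learn's scalers, is given below; the stand-in arrays and variable names are illustrative only.

```python
# Sketch of the input/target transformations described above.
import numpy as np
from sklearn.preprocessing import StandardScaler, MinMaxScaler

preloads = np.random.uniform(3800, 7500, (2000, 24))   # stand-in inputs
errors = np.abs(np.random.normal(0, 1e-4, (2000, 1)))  # stand-in targets

# Continuous preloads: Z-score standardization (mean 0, variance 1).
x_scaler = StandardScaler()
preloads_std = x_scaler.fit_transform(preloads)

# Targets: log(1 + p) to reduce the right skew, then Min-Max scaling to [0, 1].
y_scaler = MinMaxScaler()
errors_scaled = y_scaler.fit_transform(np.log1p(errors))

def inverse_target(y_scaled):
    # De-normalization applied to model outputs at inference time.
    return np.expm1(y_scaler.inverse_transform(y_scaled))
```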

3.4. The Proposed AB-BiLSTM Predictive Model

In the complex assembly process, the impact of different tightening steps on the final assembly accuracy is not equal. Some key steps, such as initial diagonal tightening or final torque loading, may contain more decisive information than intermediate steps. However, standard RNNs or LSTMs may assign similar weights to all time steps when processing long sequences or “forget” the impact of early key steps due to long-distance dependency issues. To overcome this limitation, this paper designs an Attention-based Bidirectional Long Short-Term Memory (AB-BiLSTM) network. The core idea of this model is to treat the complex assembly process as a time-series problem and leverage a deep learning model to automatically capture the profound non-linear and temporal dependencies within it.
As illustrated in Figure 3, the overall architecture of the AB-BiLSTM model follows an “Encode–Attend–Decode” pattern, primarily consisting of four core modules:
(1)
Input Embedding Layer: Converts the heterogeneous input of each time step (a discrete bolt ID and a continuous preload value) into a unified high-dimensional feature vector.
(2)
BiLSTM Feature Extraction Layer: Processes the feature sequence from both forward and backward directions to capture the complete contextual information of each assembly step.
(3)
Temporal Attention Layer: Applies weights to the output of the BiLSTM layer to dynamically identify and focus on the “critical steps” that have the greatest impact on the final error.
(4)
Output Layer: Integrates the weighted information and regresses the final orthogonal axis error value.
Before detailing each module, we provide a concise summary of the model’s input–output definition in Table 2. This table explicitly defines each component’s meaning, type, dimension, and preprocessing method.
The following subsections will provide a detailed explanation of the implementation of each module:

3.4.1. Input Embedding Layer

For the input features (bolt ID and preload) at each time step, the model first maps them into a unified high-dimensional feature space via a hybrid embedding layer. The discrete bolt ID is converted into a dense vector $e_{id} \in \mathbb{R}^{d_e}$ using a trainable embedding layer, allowing the model to learn the underlying relationships (e.g., symmetry) between different bolts autonomously. The continuous preload value is projected into a vector $e_{force} \in \mathbb{R}^{d_e}$ using a fully connected layer. Finally, the two vectors are concatenated to form a fused feature vector for the time step, $x_t = [e_{id}; e_{force}] \in \mathbb{R}^{2 d_e}$, which serves as the input for subsequent layers.
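A minimal PyTorch sketch of this hybrid embedding is shown below; the embedding dimension ($d_e = 32$), 0-based bolt IDs, and the class name are illustrative assumptions.

```python
# Sketch of the hybrid input embedding: bolt ID -> trainable embedding,
# preload -> linear projection, concatenated per time step.
import torch
import torch.nn as nn

class HybridEmbedding(nn.Module):
    def __init__(self, n_bolts=24, d_e=32):
        super().__init__()
        self.id_embed = nn.Embedding(n_bolts, d_e)   # discrete bolt ID (0-23)
        self.force_proj = nn.Linear(1, d_e)          # continuous preload value

    def forward(self, bolt_ids, preloads):
        # bolt_ids: (batch, T) long tensor; preloads: (batch, T) float tensor
        e_id = self.id_embed(bolt_ids)                     # (batch, T, d_e)
        e_force = self.force_proj(preloads.unsqueeze(-1))  # (batch, T, d_e)
        return torch.cat([e_id, e_force], dim=-1)          # x_t: (batch, T, 2*d_e)
```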

3.4.2. BiLSTM Feature Extraction Layer

To accurately capture the complex dependency structures in the bolt tightening sequence, we employ a bidirectional LSTM (BiLSTM) network, which processes the input in both forward and backward directions. This structure enables the model to utilize full contextual information from the sequence, capturing both past and future interactions between tightening steps.
Let the embedded input at time step $t$ be $x_t \in \mathbb{R}^{d}$. The forward LSTM computes a hidden state $\overrightarrow{h}_t$, and the backward LSTM computes $\overleftarrow{h}_t$. The final representation for step $t$ is the concatenation:
$$h_t = \left[ \overrightarrow{h}_t; \overleftarrow{h}_t \right] \in \mathbb{R}^{2 d_h},$$
where $d_h$ is the hidden dimension of each LSTM. The entire sequence of hidden states is denoted $H = [h_1, \ldots, h_T] \in \mathbb{R}^{T \times 2 d_h}$.
This dual-directional representation enables the model to infer the relative importance of tightening steps not only from their local context but also from the global sequence pattern.

3.4.3. Temporal Attention Mechanism

To differentiate the importance of various tightening steps, an attention mechanism is introduced after the BiLSTM layer. Let $H = [h_1, h_2, \ldots, h_T] \in \mathbb{R}^{T \times d}$ denote the sequence of hidden states from the BiLSTM, where $T$ is the sequence length and $d$ is the dimensionality of each hidden state.
The attention mechanism computes a context vector $c \in \mathbb{R}^{d}$, representing a weighted summary of the entire sequence, as follows:
  • Each hidden state is passed through a feedforward attention network to compute an unnormalized attention score $e_t$:
    $$e_t = v^{\top} \tanh(W_a h_t + b_a),$$
    where $W_a \in \mathbb{R}^{d_a \times d}$, $b_a \in \mathbb{R}^{d_a}$, and $v \in \mathbb{R}^{d_a}$ are trainable parameters, and $d_a$ is the attention dimension.
  • The attention weights $\alpha_t$ are obtained via a softmax over the attention scores:
    $$\alpha_t = \frac{\exp(e_t)}{\sum_{k=1}^{T} \exp(e_k)},$$
  • The final context vector $c$ is computed as the weighted sum of the hidden states:
    $$c = \sum_{t=1}^{T} \alpha_t h_t,$$
This context vector $c$ encodes the dynamic contribution of each assembly step and is passed to the output layer for final error prediction.
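The following PyTorch sketch illustrates this attention layer over the BiLSTM states; the attention dimension ($d_a = 64$) and class name are illustrative, and the layer returns the weights $\alpha_t$ so they can later be visualized.

```python
# Sketch of the temporal attention layer: e_t = v^T tanh(W_a h_t + b_a),
# alpha = softmax(e), c = sum_t alpha_t * h_t.
import torch
import torch.nn as nn

class TemporalAttention(nn.Module):
    def __init__(self, d, d_a=64):
        super().__init__()
        self.W_a = nn.Linear(d, d_a)            # computes W_a h_t + b_a
        self.v = nn.Linear(d_a, 1, bias=False)  # computes v^T tanh(...)

    def forward(self, H):
        # H: (batch, T, d) hidden states from the BiLSTM, with d = 2*d_h
        scores = self.v(torch.tanh(self.W_a(H)))  # e_t: (batch, T, 1)
        alpha = torch.softmax(scores, dim=1)      # attention weights over T
        context = (alpha * H).sum(dim=1)          # c: (batch, d)
        return context, alpha.squeeze(-1)         # weights kept for visualization
```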

3.4.4. Output Layer and Loss Function

Finally, the context vector c is passed through a fully connected layer to regress the final assembly error (which has been transformed and normalized). The output layer employs the Softplus activation function to ensure the model’s output is strictly non-negative.
To train the model, we designed a combined loss function to handle the small-valued, outlier-sensitive target variable. This function integrates the Mean Absolute Error (MAE) and the Log-Cosh loss [38]:
$$L = \lambda \times L_{MAE} + (1 - \lambda) \times L_{LogCosh},$$
The Log-Cosh loss is defined as $L_{LogCosh} = \log(\cosh(\hat{y} - y))$, where $\hat{y}$ is the predicted value and $y$ is the true value, and $\lambda \in (0, 1)$ is the weighting coefficient.
This combined loss function was carefully designed to match the prediction target of assembly error in this study. Specifically, the Log-Cosh loss offers a unique advantage: when the prediction error is small (as is the case for most instances in our task), its functional form closely approximates that of Mean Squared Error (MSE), providing a smooth, twice-differentiable surface that facilitates stable and precise fine-tuning near the optimal solution. Conversely, when the error is large, it behaves similarly to MAE, which significantly reduces the model’s sensitivity to potential outliers in the dataset, thereby enhancing its robustness. By combining Log-Cosh directly with MAE, we further amplify this robustness to outliers while maintaining a smooth gradient for optimization, making it an ideal choice for predicting small, non-negative error values.
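A short sketch of this combined loss, assuming PyTorch, is given below; the numerically stable identity $\log(\cosh(x)) = |x| + \mathrm{log1p}(e^{-2|x|}) - \log 2$ is used to avoid overflow for large residuals, and the default $\lambda$ is illustrative.

```python
# Sketch of the combined MAE + Log-Cosh loss described above.
import math
import torch

def combined_loss(y_pred, y_true, lam=0.5):
    x = y_pred - y_true
    mae = torch.mean(torch.abs(x))
    # Stable log(cosh(x)) = |x| + log1p(exp(-2|x|)) - log(2)
    log_cosh = torch.mean(torch.abs(x)
                          + torch.log1p(torch.exp(-2.0 * torch.abs(x)))
                          - math.log(2.0))
    return lam * mae + (1.0 - lam) * log_cosh
```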

4. Process Parameter Optimization via SEG-HHO

4.1. Mathematical Formulation of the Optimization Problem

Leveraging the trained surrogate model $f_{AB\text{-}BiLSTM}(\cdot)$ from Section 3.4, the process parameter optimization for the gyroscope assembly can be modeled as a constrained minimization problem. Our goal is to find an optimal tightening sequence $S$ and a preload vector $P$ that minimize the orthogonal axis error predicted by the surrogate model:
  • Decision Variables
    • The tightening sequence vector $S = [s_0, s_1, \ldots, s_{23}]$, where $s_t$ represents the ID of the bolt operated on at the $t$-th tightening step. $S$ is a permutation of the set $\{0, 1, \ldots, 23\}$.
    • The preload vector $P = [p_0, p_1, \ldots, p_{23}]$, where $p_i$ represents the final preload value for the bolt with ID $i$.
  • Objective Function
    $$\min_{S, P} f_{AB\text{-}BiLSTM}(S, P).$$
  • Constraints
    To formally describe the assembly constraints detailed in Section 3.2, we first define the bolt groups. The 24 bolts are partitioned into four disjoint sets: $G_0 = \{0, \ldots, 5\}$, $G_1 = \{6, \ldots, 11\}$, $G_2 = \{12, \ldots, 17\}$, and $G_3 = \{18, \ldots, 23\}$. The decision variables must satisfy the following conditions:
    • $p_i \in [P_{min}, P_{max}]$ for all $i \in \{0, \ldots, 23\}$.
    • $S$ must satisfy the four process constraints of grouping, group order, symmetry, and adjacency as defined in Section 3.2.
This mathematical model clearly defines the optimization task. However, because the decision variable $S$ is a discrete permutation subject to complex combinatorial constraints, traditional gradient-based optimization methods are not directly applicable. Therefore, we employ a metaheuristic algorithm and handle these constraints using a specialized encoding–decoding mechanism, introduced in Section 4.2 and Section 4.3.

4.2. The Improved Harris Hawks Optimization Algorithm, SEG-HHO

4.2.1. Standard HHO

The Harris Hawks Optimization (HHO) algorithm is a metaheuristic that mimics the cooperative predatory behavior of Harris's hawks. Its core lies in simulating the strategies of exploration and exploitation that the hawk flock employs at different stages to capture prey (the optimal solution). A key parameter in the optimization process is the prey's escaping energy $E$, whose magnitude decreases from at most 2 toward 0 as the iteration count $t$ increases. Its formula is:
$$E = 2 E_0 \left( 1 - \frac{t}{T_{max}} \right),$$
where $T_{max}$ is the maximum number of iterations and $E_0$ is a random initial energy varying within $(-1, 1)$. The HHO process primarily consists of two phases:
  • Exploration Phase ($|E| \ge 1$): In this phase, the hawks search widely for prey, updating their position $X(t+1)$ using one of two strategies selected by a random number $q$:
    $$X(t+1) = \begin{cases} X_{rand}(t) - r_1 \left| X_{rand}(t) - 2 r_2 X(t) \right|, & q < 0.5 \\ \left( X_{rabbit}(t) - X_m(t) \right) - r_3 \left( lb + r_4 (ub - lb) \right), & q \ge 0.5 \end{cases}$$
    where $X_{rabbit}$ is the current best solution (the prey's location), $X_{rand}$ is a randomly selected hawk, $X_m$ is the average position of the population, $r_1$ to $r_4$ are random numbers in $(0, 1)$, and $lb$ and $ub$ are the lower and upper bounds of the search space.
  • Exploitation Phase ($|E| < 1$): The hawks besiege the discovered prey. This phase simulates four different attack behaviors based on the energy $|E|$ and the escape probability $r$:
  • Soft Besiege: When $r \ge 0.5$ and $|E| \ge 0.5$, the hawks gradually encircle the prey:
    $$X(t+1) = \Delta X(t) - E \left| J X_{rabbit}(t) - X(t) \right|,$$
    where $\Delta X(t) = X_{rabbit}(t) - X(t)$ is the positional difference between the current individual and the prey, $J = 2(1 - r_5)$ is the random jump strength of the prey during escape, and $r_5$ is a random number in $(0, 1)$.
  • Hard Besiege: When $r \ge 0.5$ and $|E| < 0.5$, the prey is exhausted and the hawks tighten their encirclement:
    $$X(t+1) = X_{rabbit}(t) - E \left| \Delta X(t) \right|,$$
  • Soft Besiege with Progressive Rapid Dives: When $r < 0.5$ and $|E| \ge 0.5$, the prey attempts an erratic escape, and the hawks employ dive strategies based on Levy Flight (LF):
    $$X(t+1) = \begin{cases} Y, & f(Y) < f(X(t)) \\ Z, & f(Z) < f(X(t)) \end{cases}$$
    $$Y = X_{rabbit}(t) - E \left| J X_{rabbit}(t) - X(t) \right|,$$
    $$Z = Y + S \times LF(D),$$
    $$LF(x) = 0.01 \times \frac{u \times \sigma}{|v|^{1/\beta}},$$
    $$\sigma = \left( \frac{\Gamma(1 + \beta) \times \sin(\pi \beta / 2)}{\Gamma\left( \frac{1 + \beta}{2} \right) \times \beta \times 2^{\frac{\beta - 1}{2}}} \right)^{1/\beta},$$
    where $f(\cdot)$ is the individual's fitness value; $D$ is the dimension of the problem; $S$ is a $1 \times D$ random vector; $u$ and $v$ are random numbers within $(0, 1)$; $\beta$ is usually set to 1.5; and $\Gamma$ is the standard gamma function, $\Gamma(x) = \int_0^{+\infty} e^{-t} t^{x-1} \, dt$.
  • Hard Besiege with Progressive Rapid Dives: When $r < 0.5$ and $|E| < 0.5$, the hawks launch a final assault, incorporating Levy Flight in the final encirclement:
    $$X(t+1) = \begin{cases} Y, & f(Y) < f(X(t)) \\ Z, & f(Z) < f(X(t)) \end{cases}$$
    $$Y = X_{rabbit}(t) - E \left| J X_{rabbit}(t) - X_m(t) \right|,$$
    $$Z = Y + S \times LF(D),$$
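For reference, a NumPy sketch of the Levy-flight step is given below. Note that the original HHO reference implementation draws $u$ from $N(0, \sigma^2)$ and $v$ from $N(0, 1)$ (Mantegna's algorithm), and the sketch follows that convention.

```python
# Sketch of LF(D) used in the rapid-dive updates Z = Y + S * LF(D).
import math
import numpy as np

def levy_flight(dim, beta=1.5):
    sigma = (math.gamma(1 + beta) * math.sin(math.pi * beta / 2)
             / (math.gamma((1 + beta) / 2) * beta
                * 2 ** ((beta - 1) / 2))) ** (1 / beta)
    u = np.random.normal(0.0, sigma, dim)  # u ~ N(0, sigma^2)
    v = np.random.normal(0.0, 1.0, dim)    # v ~ N(0, 1)
    return 0.01 * u / np.abs(v) ** (1 / beta)

# Dive candidate: Z = Y + S * LF(D), with S a random 1-by-D vector, e.g.
# Z = Y + np.random.rand(D) * levy_flight(D)
```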

4.2.2. The Proposed Stratified and Hybrid Search Strategies

To overcome the limitations of the standard HHO, such as its tendency to fall into local optima and its insufficient exploration capability on complex optimization problems, this paper introduces improvements in three key areas: population initialization, population structure, and search strategy, forming the Stratified Evolutionary Group-based HHO (SEG-HHO) algorithm.
Initialization via Tent Chaotic Map. Unlike the standard random initialization of HHO, which can lead to population clustering, our algorithm employs a Tent chaotic map to generate the initial population. The ergodic and non-repetitive properties of chaotic systems allow the initial population to be distributed more uniformly throughout the search space, thereby enhancing the algorithm's global search efficiency. The mathematical definition of the Tent map is as follows:
$$z_{t+1} = \begin{cases} z_t / \mu, & 0 < z_t \le \mu \\ (1 - z_t)/(1 - \mu), & \mu < z_t < 1 \end{cases}$$
where $z_t \in (0, 1)$ is the chaotic variable at iteration $t$, and $\mu \in (0, 1)$ is the control parameter, typically set to $\mu = 0.5$ for standard Tent behavior.
We first randomly generate an initial chaotic vector $z_0$ and then iterate the map $N$ times (with $N$ being the population size) to produce a set of chaotic values $z_1, z_2, \ldots, z_N$. Finally, these values are mapped to the problem's search space $[X_{min}, X_{max}]$ to generate the initial population's position vectors $X$ (each individual has $D$ dimensions) using the following equation:
$$X_{i,j} = X_{min}(j) + z_{i,j} \times \left( X_{max}(j) - X_{min}(j) \right), \quad i = 1, \ldots, N;\ j = 1, \ldots, D,$$
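A compact NumPy sketch of this initialization is shown below; iterating a $D$-dimensional chaotic vector once per individual, the starting range (0.01, 0.99), and $\mu = 0.5$ are illustrative choices consistent with the equations above.

```python
# Sketch of Tent-map population initialization.
import numpy as np

def tent_init(n_pop, dim, x_min, x_max, mu=0.5, seed=0):
    rng = np.random.default_rng(seed)
    z = rng.uniform(0.01, 0.99, size=dim)   # initial chaotic vector z_0
    pop = np.empty((n_pop, dim))
    for i in range(n_pop):
        # One Tent iteration: z/mu on (0, mu], (1 - z)/(1 - mu) on (mu, 1)
        z = np.where(z <= mu, z / mu, (1.0 - z) / (1.0 - mu))
        pop[i] = x_min + z * (x_max - x_min)  # map into the search space
    return pop
```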
Stratified Evolutionary Framework. Building on this, we introduce the concept of stratified evolution, where the population is dynamically divided into three functionally distinct sub-populations in each generation based on individual fitness values: the top group (top 1/3, responsible for deep exploitation), the middle group (middle 1/3, responsible for adaptive hybrid strategy), and the bottom group (bottom 1/3, responsible for broad exploration). This structured population makes it possible to design targeted search strategies for sub-populations with different characteristics.
Hybrid Search Mechanisms for Different Evolutionary Groups. To achieve efficient global optimization, each sub-population is assigned a unique update strategy designed to overcome specific deficiencies of the standard HHO:
  • Bottom Group: Incorporating SCA for Structured Exploration. The exploration mechanism of standard HHO relies on random walks, which can sometimes result in inefficient search paths. To address this, we integrate the core principles of the Sine Cosine Algorithm (SCA) to guide the bottom group. SCA leverages the smooth, periodic oscillations of sine and cosine functions, enabling exploratory individuals to probe the space around the current best solution in a wave-like, more directional manner. Its position is updated as follows:
    When $q < 0.5$:
    $$X(t+1) = \begin{cases} X(t) + R_1 \times \sin(R_2) \times \left| R_3 X_{rand}(t) - X(t) \right|, & R_4 < 0.5 \\ X(t) + R_1 \times \cos(R_2) \times \left| R_3 X_{rand}(t) - X(t) \right|, & R_4 \ge 0.5 \end{cases}$$
    where $R_1$ is a control parameter that linearly decreases from 2 to 0 to balance exploration and exploitation, and $R_2$, $R_3$, and $R_4$ are random control parameters in the ranges $[0, 2\pi]$, $[0, 2]$, and $[0, 1]$, respectively. This structured exploration covers promising regions more effectively than purely random jumps.
  • Middle Group: Alternating Hybrid Search with HHO and Differential Evolution (DE). To improve exploration–exploitation balance and enhance population diversity, the middle group adopts an alternating hybrid strategy based on iteration count. Specifically, the algorithm executes HHO during odd-numbered iterations and a DE scheme during even-numbered iterations.
    In the HHO phase (odd iterations), the algorithm determines its behavior based on the prey’s escaping energy. When the energy is high, the algorithm performs exploration using enhanced strategies inspired by SCA, which enables the search agents to traverse the space more dynamically. When the energy is low, the algorithm switches to the four classical exploitation strategies of HHO, allowing the population to rapidly converge toward the current best solution.
    In the DE phase (even iterations), the algorithm introduces the "DE/rand/1" mutation and binomial crossover to inject diversity and prevent premature convergence. For each target vector $X_i$, a mutant vector $V_i$ is generated using (a minimal sketch appears after this list):
    $$V_i(t) = X_{r_1}(t) + F_{mut} \times \left( X_{r_2}(t) - X_{r_3}(t) \right),$$
    where $r_1$, $r_2$, and $r_3$ are distinct random indices from the population and $F_{mut}$ is the mutation scaling factor. This is followed by binomial crossover to produce the trial vector $U_i$.
    DE generates new individuals based on the difference vectors between existing population members, which is an extremely effective mechanism for maintaining diversity. By introducing this “perturbation,” we significantly enhance the algorithm’s ability to escape local optima in complex multimodal problems.
  • Top Group: Intensifying Local Exploitation to Accelerate Convergence. For the top group, which consists of the fittest individuals, the primary mission is the rapid and precise exploitation of promising regions already discovered. Therefore, we retain and intensify the highly efficient hard and soft besiege strategies from the standard HHO. This ensures that the algorithm can leverage the collective intelligence of the population in its final stages to perform a swift refinement of the optimal solution, thereby guaranteeing final convergence accuracy.
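To make the middle group's even-iteration step concrete, the following NumPy sketch implements the DE/rand/1 mutation with binomial crossover; the $F_{mut}$ and $Cr$ defaults are illustrative.

```python
# Sketch of the DE phase applied to the middle group: DE/rand/1 mutation
# followed by binomial crossover; greedy selection happens after evaluation.
import numpy as np

def de_step(pop, f_mut=0.5, cr=0.9, rng=None):
    rng = rng or np.random.default_rng()
    n, d = pop.shape
    trials = np.empty_like(pop)
    for i in range(n):
        # Three distinct random indices, all different from i.
        r1, r2, r3 = rng.choice([j for j in range(n) if j != i], 3, replace=False)
        mutant = pop[r1] + f_mut * (pop[r2] - pop[r3])   # DE/rand/1
        mask = rng.random(d) < cr                        # binomial crossover
        mask[rng.integers(d)] = True                     # keep >= 1 mutant gene
        trials[i] = np.where(mask, mutant, pop[i])
    return trials
```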
The complete process of the proposed SEG-HHO algorithm is summarized in Algorithm 1, and its flowchart is shown in Figure 4.
Algorithm 1 The Proposed SEG-HHO Algorithm
Input:
  N: Population size
  D: Problem dimension
  T_max: Maximum number of iterations
  F_mut, Cr: Parameters for Differential Evolution
  LB, UB: Lower and upper bounds of the search space
Output:
  X_best: The best solution found
  f(X_best): The fitness of the best solution

1:  Initialize population X = {X_1, X_2, ..., X_N} using the Tent chaotic map.
2:  Evaluate the fitness f(X_i) for each individual X_i.
3:  Identify the initial global best solution X_best.
4:
5:  for t = 1 to T_max do
6:      Sort the population X by fitness in ascending order.
7:      Partition X into three groups:
8:          X_top ← Top N/3 individuals (indices 1 to N/3).
9:          X_mid ← Middle N/3 individuals (indices N/3 + 1 to 2N/3).
10:         X_btm ← Bottom N/3 individuals (indices 2N/3 + 1 to N).
11:
12:     // Process the Top Group: Intensive Exploitation
13:     for each individual X_i in X_top do
14:         X_i ← Apply_HHO_Exploitation_Phase(X_i, X_best)
15:     end for
16:
17:     // Process the Bottom Group: SCA-Enhanced Exploration
18:     for each individual X_i in X_btm do
19:         // This group exclusively uses the SCA-enhanced exploration logic
20:         X_i ← Apply_SCA_Enhanced_Exploration_Phase(X_i)
21:     end for
22:
23:     // Process the Middle Group: Alternating Hybrid Search
24:     if mod(t, 2) ≠ 0 then // Odd Iteration
25:         for each individual X_i in X_mid do
26:             X_i ← Apply_Full_HHO_Operator(X_i, X_best) // Includes both SCA-enhanced exploration and exploitation
27:         end for
28:     else // Even Iteration
29:         for each individual X_i in X_mid do
30:             X_i ← Apply_DE_Operator(X_i, X_mid) // DE operates within its own subgroup
31:         end for
32:     end if
33:
34:     // Combine, Evaluate, and Update Global Best
35:     Combine all updated individuals from the three groups into the main population X.
36:     Evaluate the fitness of any newly generated individuals.
37:     Update X_best if a better solution is found in the current population X.
38: end for
39:
40: Return X_best
To further clarify the roles of each sub-population and the novelty of our proposed hybrid strategy, Table 3 provides an intuitive summary of the SEG-HHO phases, and Table 4 offers a detailed comparison between SEG-HHO and its foundational algorithms.
As shown in Table 3, the core novelty of SEG-HHO is not merely the inclusion of DE and SCA operators but the creation of a systematic, role-based population structure. Unlike many hybrid algorithms that apply the same logic to all individuals, SEG-HHO assigns specific, complementary tasks to different population segments, creating a more organized and efficient search dynamic. Table 4 further contrasts SEG-HHO with its component algorithms, highlighting its unique synthesis.

4.3. Encoding and Decoding Mechanism for Assembly Constraints

To employ the continuous variable-based SEG-HHO algorithm for solving the mixed-variable optimization problem, which involves continuous preload variables and discrete tightening sequence variables, an effective encoding and decoding scheme is imperative. This scheme is designed to map a complex solution that adheres to the assembly process constraints (as detailed in Section 3.2) into a standardized real-valued vector, thereby building a bridge from the algorithm’s search space to the physically feasible solution space.

4.3.1. Definition of the Hybrid Encoding Vector

We encode each candidate solution into a 48-dimensional real-valued vector $X \in \mathbb{R}^{48}$. This vector is composed of three concatenated parts:
$$X = [P; K; \Delta],$$
where:
$P = [p_0, p_1, \ldots, p_{23}]$ is a 24-dimensional vector directly representing the preload values of the 24 bolts. Each element $p_i$ is constrained within a predefined physical boundary $[P_{min}, P_{max}]$.
$K = [k_0, k_1, \ldots, k_{11}]$ is a 12-dimensional vector used to encode the relative tightening order of the three symmetric bolt pairs within each of the four bolt groups.
$\Delta = [\delta_0, \delta_1, \ldots, \delta_{11}]$ is a 12-dimensional vector used to encode the internal tightening direction (i.e., which bolt is tightened first) for each of the 12 symmetric pairs.
Each element of the $K$ and $\Delta$ vectors is bounded in the range $[0, 1]$.

4.3.2. The Decoding Process

Decoding is the process of converting an arbitrary 48-dimensional vector X , generated by the optimization algorithm, into a unique and physically valid tightening sequence S and a preload vector P .
  • Preload Decoding: The vector P requires no further decoding; its 24 elements directly correspond to the final preload values for the 24 bolts.
  • Tightening Sequence Decoding: The generation of the tightening sequence $S$ adheres to the predefined assembly constraints and is achieved through a programmatic decoding of the $K$ and $\Delta$ vectors. The decoding algorithm proceeds according to the fixed group order ($G_0 \to G_1 \to G_2 \to G_3$), with the following steps (a code sketch is given at the end of this subsection):
    • For the $g$-th bolt group ($g \in \{0, 1, 2, 3\}$), the corresponding sequence-encoding sub-vector $K_g = [k_{3g}, k_{3g+1}, k_{3g+2}]$ and direction-encoding sub-vector $\Delta_g = [\delta_{3g}, \delta_{3g+1}, \delta_{3g+2}]$ are extracted.
    • The argsort operation is applied to $K_g$ to determine the tightening order $Order_{pair}$ of the three symmetric pairs within the group.
    • The three pairs are processed sequentially according to $Order_{pair}$. For the $j$-th symmetric pair $\{A, B\}$, the corresponding direction-encoding value $\delta_{g,j}$ is queried. If $\delta_{g,j} < 0.5$, the tightening order is $A \to B$; otherwise, it is $B \to A$.
    • After the decoding for all four groups has been executed, a final tightening sequence vector $S$ of length 24, which fully satisfies all grouping, symmetry, and adjacency constraints, is obtained by concatenation.
Through this encoding–decoding mechanism, every candidate solution generated in each iteration of the optimization algorithm can be deterministically mapped to a valid assembly process scheme. Its performance is then rapidly evaluated by the AB-BiLSTM surrogate model described in Section 3.4.
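The decoding logic can be sketched in a few lines of Python. Note that the symmetric-pair layout assumed below (bolts $6g+j$ and $6g+j+3$ forming the $j$-th pair of group $g$) is a hypothetical placeholder, since the real pairing follows the gyroscope's geometry.

```python
# Sketch of the constructive decoder mapping (K, Delta) to a feasible
# tightening sequence S; the pair table is an illustrative assumption.
import numpy as np

def decode_sequence(K, Delta):
    sequence = []
    for g in range(4):                       # fixed group order G0 -> G3
        k_g = K[3 * g: 3 * g + 3]            # pair-order genes for group g
        d_g = Delta[3 * g: 3 * g + 3]        # pair-direction genes for group g
        for j in np.argsort(k_g):            # order the three symmetric pairs
            a, b = 6 * g + j, 6 * g + j + 3  # hypothetical j-th pair {A, B}
            sequence += [a, b] if d_g[j] < 0.5 else [b, a]
    return sequence                          # length-24 permutation of 0..23

X = np.random.rand(48)                       # a raw SEG-HHO candidate vector
P = 3800 + X[:24] * (7500 - 3800)            # preload genes in [P_min, P_max]
S = decode_sequence(X[24:36], X[36:48])      # feasible by construction
```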
The encoding and decoding mechanisms are shown in Figure 5 and Figure 6. Figure 5 provides a conceptual and schematic overview of the encoding–decoding mechanism, illustrating the mapping from the 48-dimensional continuous vector $X$, optimized by SEG-HHO, to the final physically constrained assembly plan, consisting of a discrete tightening sequence $S$ and a continuous preload vector $P$. The decoding logic ensures all assembly rules (grouping, symmetry, and adjacency) are inherently satisfied. Figure 6 illustrates the decoding process with a concrete numerical example.

4.3.3. Discussion on the Constraint-Handling Strategy

The encoding–decoding mechanism employed in this study is a constructive approach to constraint handling. Its primary advantage lies in the deterministic decoding process, which guarantees that any candidate solution vector X generated by the optimization algorithm in the continuous search space is uniquely mapped to a feasible assembly scheme that is 100% compliant with all physical constraints (e.g., grouping, symmetry, and adjacency). This “feasible-by-design” paradigm precludes the generation of any physically infeasible solutions during the optimization process.
To better contextualize our choice of strategy, it is useful to compare it with two other widely used constraint-handling techniques:
Penalty Functions: This method allows the algorithm to generate infeasible solutions but discourages them by adding a large penalty term to the fitness function. While simple to implement, its performance is highly sensitive to the penalty coefficient, which, if too small, fails to effectively exclude infeasible solutions, and if too large, can distort the fitness landscape and mislead the search.
Repair Operators: This approach programmatically modifies an infeasible solution to make it feasible before evaluation. While it avoids parameter tuning, designing an efficient repair operator that does not disrupt the beneficial schemata within a solution is challenging and may introduce additional computational overhead.
Given the highly coupled and strict nature of the assembly constraints in this task, we selected the constructive encoding–decoding approach. This method ensures that the surrogate model is always evaluated on physically meaningful, feasible inputs, which preserves the smoothness of the fitness landscape and eliminates the complexity of tuning penalty or repair parameters. Although the initial design of the decoder is more complex, it establishes a solid foundation for efficient and reliable optimization.

5. Experiments and Result Analysis

5.1. Experimental Setup

5.1.1. Dataset Overview and Partitioning

The machine learning experiments in this study are based on the dataset of 2000 samples generated via FEM simulation, as described in Section 3.1. To reliably evaluate model performance, we strictly partitioned the dataset into a training set, a validation set, and a test set, with a ratio of 60%:20%:20%. Model training was conducted on the training set, hyperparameter tuning was based on performance on the validation set, and the final model evaluation was performed on the independent test set, which the model had never seen before.

5.1.2. Experimental Environment

To ensure computational efficiency and software compatibility, the simulations and model training tasks were executed on the same physical server but under different operating systems. Specifically, the finite element simulations were performed under Windows 10 Pro using ANSYS 2023R1 software, while the training and validation of deep learning models were conducted under Ubuntu 22.04 LTS, optimized for GPU-based computing and open-source frameworks. The server is equipped with two physical Intel Xeon Silver 4310 CPUs, an NVIDIA RTX A6000 GPU (48 GB VRAM), and 128 GB RAM, supporting a dual-boot configuration.
The benchmark performance tests on the CEC2005 functions were carried out on a local PC (CPU: Intel i5-12400F, RAM: 16 GB, OS: Windows 10) using MATLAB R2021a to run the official benchmark suite.
All other algorithm components, including model inference and assembly optimization, were implemented in Python 3.11, using open-source libraries such as PyTorch-Lightning 2.5.0, NumPy 2.3.2, and SciPy 1.16.0.

5.1.3. Performance Metrics

To comprehensively evaluate the performance of both the predictive models and the optimization algorithms, we adopted a set of metrics tailored to each task category.
Predictive Model Performance Metrics: The accuracy of the AB-BiLSTM prediction model was assessed using four standard regression metrics: Root Mean Squared Error (RMSE), Mean Squared Error (MSE), Mean Absolute Error (MAE), and the Coefficient of Determination ($R^2$). Given a ground-truth vector $y$, a predicted vector $\hat{y}$, and the mean of the true values $\bar{y}$, their definitions are as follows:
$$RMSE = \sqrt{\frac{1}{n} \sum_{i=1}^{n} \left( y_i - \hat{y}_i \right)^2},$$
$$MSE = \frac{1}{n} \sum_{i=1}^{n} \left( y_i - \hat{y}_i \right)^2,$$
$$MAE = \frac{1}{n} \sum_{i=1}^{n} \left| y_i - \hat{y}_i \right|,$$
$$R^2 = 1 - \frac{\sum_{i=1}^{n} \left( y_i - \hat{y}_i \right)^2}{\sum_{i=1}^{n} \left( y_i - \bar{y} \right)^2},$$
These metrics collectively reflect the model’s prediction accuracy, stability, and fitting quality.
Optimization Algorithm Benchmark Metrics: For benchmarking the optimization algorithms on the CEC2005 test suite, we employed classical numerical indicators, including the best, mean, worst, and standard deviation of objective values across 30 independent runs. To further assess statistical significance and overall performance ranking, we applied the Friedman test, a non-parametric method that ranks algorithms across all benchmark functions and computes an average rank score. This metric provides a fair and comparative view of algorithm robustness and consistency across different problem types.
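For completeness, a brief sketch of how these metrics can be computed, assuming scikit-learn and SciPy with placeholder data, is shown below.

```python
# Sketch of the regression metrics and the Friedman test on toy data.
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from scipy.stats import friedmanchisquare

y_true = np.array([1.2e-4, 8.0e-5, 3.1e-4, 5.5e-5])   # placeholder targets
y_pred = np.array([1.1e-4, 9.2e-5, 2.9e-4, 6.0e-5])   # placeholder predictions

mse = mean_squared_error(y_true, y_pred)
print("RMSE:", np.sqrt(mse), "MSE:", mse,
      "MAE:", mean_absolute_error(y_true, y_pred),
      "R2:", r2_score(y_true, y_pred))

# Friedman test: one array of per-function objective values per algorithm
# (placeholder numbers; in the paper, means over 30 runs per CEC2005 function).
stat, p = friedmanchisquare([1.1, 2.0, 0.9], [1.5, 2.2, 1.3], [1.0, 1.8, 1.1])
```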

5.2. Performance Evaluation of Predictive Models

5.2.1. Model Selection

To ensure that the proposed AB-BiLSTM model achieves its optimal performance and to establish a fair platform for subsequent baseline comparisons, we first conducted a systematic search and optimization for the model’s key hyperparameters. We employed the Optuna framework, a state-of-the-art automated hyperparameter search tool that uses the Tree-structured Parzen Estimator (TPE) algorithm to execute Bayesian optimization. The optimization objective was set to minimize the combined loss on the validation set through 150 trials. The search space for the key hyperparameters is detailed in Table 5.
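A condensed sketch of such an Optuna study is shown below; the search ranges and the train_and_validate helper are illustrative assumptions standing in for the actual Table 5 search space and training loop.

```python
# Sketch of TPE-based hyperparameter search with Optuna (150 trials,
# minimizing the combined validation loss).
import optuna

def objective(trial):
    params = {
        "hidden_dim": trial.suggest_categorical("hidden_dim", [64, 128, 256]),
        "num_layers": trial.suggest_int("num_layers", 1, 3),
        "dropout": trial.suggest_float("dropout", 0.0, 0.5),
        "lr": trial.suggest_float("lr", 1e-4, 1e-2, log=True),
        "loss_lambda": trial.suggest_float("loss_lambda", 0.0, 1.0),
    }
    return train_and_validate(params)  # hypothetical: returns validation loss

study = optuna.create_study(direction="minimize",
                            sampler=optuna.samplers.TPESampler(seed=42))
study.optimize(objective, n_trials=150)
print(study.best_params)
```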
After the optimization process, the resulting optimal hyperparameter combination is presented in Table 6. To maintain fairness in the following comparative experiments, all sequence-based models (e.g., RNN, LSTM, GRU, and BiLSTM) uniformly adopted the same key architectural parameters, such as the number of layers and hidden dimensions.
To comprehensively evaluate the performance of the proposed model, we selected several classic machine learning and deep learning models as benchmarks. The architectural parameters for all comparative models are configured as shown in Table 6 to ensure the rigor of the comparison. For non-sequential baseline models (MLP and 1D-CNN), standard architectures were adopted with commonly used activation functions (e.g., ReLU). The layer dimensions of these models were configured to match the hidden layer dimension of the sequence models, enabling a fair structural comparison. The detailed architectures for all comparative models are shown in Table 7.

5.2.2. Experiment Results

In this section, we present a comprehensive performance comparison of the proposed AB-BiLSTM model against several benchmarks and provide an in-depth analysis of its training process and prediction efficiency.
First, we examined the training convergence of the AB-BiLSTM model. As shown in Figure 7, both the training loss and validation loss decrease steadily with the number of epochs and eventually converge to a low level. The absence of a significant gap between the two curves indicates that the model learned effectively without suffering from severe overfitting, thus validating its good generalization ability.
Table 8 summarizes the comprehensive performance metrics of all models on the test set. The proposed AB-BiLSTM model outperforms all baseline models across all key evaluation metrics. Specifically, the AB-BiLSTM achieved an MAE of 3.5949E−05 and an RMSE of 5.3333E−05, both the lowest among all models, indicating the smallest average deviation between its predictions and the true values. Its R2 score reached 0.8173, surpassing the sub-optimal LSTM (R2 = 0.8077) and BiLSTM (R2 = 0.8017) models. These results demonstrate that the introduction of the temporal attention mechanism effectively enhances the model’s ability to capture critical information within the assembly process, leading to more accurate error prediction.
We further investigated the model’s prediction behavior in the region of extremely small target values. Figure 8 illustrates the trend of the R2 score as the true error value changes. As the true error value approaches zero, the R2 metric becomes extremely sensitive and exhibits a non-linear decline, even yielding negative values. This is not indicative of model failure but is an inherent characteristic of the R2 metric itself: when the true values are nearly constant and close to zero, the denominator of R2 shrinks, so even a minute prediction deviation from the near-zero baseline can drive the score far below 1. This analysis supports our choice of MAE as a core evaluation metric and our design of a combined loss function.
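A small numerical illustration of this effect, using constructed values rather than the experimental data, makes the point concrete:

```python
import numpy as np

y_true = np.array([1.0e-6, 2.0e-6, 1.5e-6, 1.2e-6])  # near-zero ground truth
y_pred = y_true + 1.0e-6                              # tiny constant offset

ss_res = np.sum((y_true - y_pred) ** 2)
ss_tot = np.sum((y_true - y_true.mean()) ** 2)
print(1 - ss_res / ss_tot)  # about -6.0: strongly negative R^2 despite a 1e-6 MAE
```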
To provide a more granular view of the model’s prediction performance, we have included additional error analyses in the appendix. Appendix A.1 presents a scatter plot of the predicted versus actual orthogonal axis errors on the test set, which visually demonstrates the high correlation between the model’s predictions and the ground truth values. Furthermore, Appendix A.2 contains a histogram of prediction errors, illustrating that the errors are tightly centered around zero with a quasi-normal distribution, confirming the absence of systematic bias in our model.
When evaluating computational efficiency, we also compared the training time and single-prediction latency of each model, as reported in Table 8. All deep learning-based surrogate models complete a prediction at the millisecond level, whereas a traditional FEM simulation requires approximately 14 min for a single run. Taking the proposed AB-BiLSTM model as an example, its single-prediction latency is merely 1.1753 ms, which yields a speed-up ratio of approximately 714,711, an efficiency improvement of several orders of magnitude over FEM. The comparison against FEM is summarized in Table 9.
In summary, although the AB-BiLSTM exhibits a slightly longer training time (189.8061 s) and prediction latency (1.175323 ms) due to its more complex architecture, it achieves a decisive advantage in prediction accuracy. This characteristic—trading a marginal cost in efficiency for a substantial gain in precision—highlights the immense potential and practical value of the proposed framework for engineering applications that require rapid design optimization and iteration.

5.2.3. Analysis of the Attention Mechanism

To dissect and validate the pivotal role of the attention mechanism within the AB-BiLSTM model, a comprehensive evaluation involving an ablation study and visualization analysis was performed.
Ablation Study: First, the contribution of the attention mechanism was quantified by comparing the performance of the AB-BiLSTM model against a standard BiLSTM model from which the attention module was removed, as shown in Table 10. The results indicate that the AB-BiLSTM achieved a 3.92% relative reduction in MAE, along with reductions of 8.54% and 4.18% in MSE and RMSE, respectively, and a 1.95% improvement in the R2 score. This significant performance gain confirms that the introduction of the attention mechanism is a key factor in the model’s superior predictive accuracy.
Attention Weight Visualization and Pattern Analysis: Second, to investigate how the model leverages the attention mechanism to learn physically meaningful patterns, we visualized the attention weight distributions for multiple test samples, as shown in Figure 9.
  • Global Pattern: A Focus on the “Bookends” of the Process. We first visualized the attention weight heatmaps for ten representative test samples, as shown in Figure 9. The results reveal a consistent global pattern: firstly, the attention weights are significantly concentrated in the final stage of the assembly process (steps 19–24) in most samples (7 of 10), with the final step (step 24) always receiving one of the highest weights. Secondly, the first step in many samples also receives relatively high attention, forming a “bookend” pattern of focus. This pattern is highly interpretable from a physical standpoint: the first step establishes the initial baseline for the entire process, while the final few steps “lock in” the ultimate stress state and micro-deformations, making both ends critically influential on the final error.
  • Local Pattern: The Non-linear Nature of Attention Allocation. To further investigate the details, we performed a case study on a representative sample (corresponding to Sample 15 in Figure 9), overlaying the preload curve, preload changes (ΔPreload), and attention weights, as shown in Figure 10.
It is visually apparent that the fluctuations in attention weights (bar chart) do not directly correspond to the magnitude of the ΔPreload values (numbers above the curve). To verify this observation quantitatively, we conducted a Pearson correlation analysis between ΔPreload and the change in attention weights (ΔAttention) for this sample, as depicted in Figure 11; a minimal sketch of this check is given after this list. The result confirms a near-zero linear correlation (r = −0.0251, p = 0.907), providing strong evidence that the model’s attention allocation is not a simple reaction to drastic changes in force but rather a more complex, non-linear decision process contingent on global context and step position.
  • Structural Knowledge Learning: Capturing the “Rhythm” of Assembly. Given that attention is not merely a response to force values, we hypothesized that it might have learned the inherent structural knowledge of the assembly process, particularly the grouping of bolts. To test this, we statistically analyzed the absolute change in attention (|ΔAttention|) across all test samples, comparing the changes that occur within a bolt group to those at the transition between groups. The box plot in Figure 12 reveals a profound finding: the variance of |ΔAttention| is significantly lower when switching between groups (right box in Figure 12) than within a group (left box in Figure 12).
This indicates that the model has learned a generalizable and standardized “group-tightening” template or meta-strategy. The smooth transition in attention at group switches suggests the model perceives the process for different component groups as highly consistent. The greater variance within a group implies that the model’s more fine-grained decision-making is focused on assessing the relative importance of the steps inside a single component’s assembly. This discovery confirms that the attention mechanism has learned not only “which steps are important” but, on a deeper level, the very “rhythm” and “modularity” of the assembly task.
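As referenced under the local-pattern analysis above, the ΔPreload versus ΔAttention correlation check can be reproduced with SciPy; the arrays below are placeholders for one sample's 24 per-step values:

```python
import numpy as np
from scipy.stats import pearsonr

rng = np.random.default_rng(0)
preload = rng.random(24)      # per-step preload values (placeholder)
attention = rng.random(24)    # per-step attention weights (placeholder)

delta_preload = np.diff(preload)       # step-to-step preload changes
delta_attention = np.diff(attention)   # step-to-step attention changes

r, p = pearsonr(delta_preload, delta_attention)
print(f"r = {r:.4f}, p = {p:.3f}")  # the paper reports r = -0.0251, p = 0.907
```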

5.2.4. Anti-Noise Robustness Test

To evaluate the robustness of the proposed method under noisy input conditions, which simulates the sensor measurement errors common in industrial assembly lines, we conducted a systematic anti-noise test. Gaussian noise was injected into the normalized preload features at four intensity levels: 1%, 5%, 10%, and 15%, relative to the feature’s standard deviation.
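The injection procedure can be sketched as follows, with noise scaled to each feature's standard deviation (the data matrix is a placeholder):

```python
import numpy as np

def inject_gaussian_noise(features: np.ndarray, level: float,
                          rng: np.random.Generator) -> np.ndarray:
    """Add zero-mean Gaussian noise with std = level * per-feature std."""
    sigma = features.std(axis=0, keepdims=True)   # per-feature std
    return features + rng.normal(0.0, level * sigma, size=features.shape)

rng = np.random.default_rng(0)
preloads = rng.random((2000, 24))                 # placeholder preload matrix
for level in (0.01, 0.05, 0.10, 0.15):            # the four tested intensities
    noisy = inject_gaussian_noise(preloads, level, rng)
```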
As illustrated in Figure 13 and detailed in Table 11, the models exhibited two distinct response patterns to the introduced noise. The first group, comprising the gated sequence-aware models (AB-BiLSTM, BiLSTM, LSTM, and GRU), demonstrated superior resilience. These models started from a low MAE baseline (approximately 4.0 × 10−5 at the 1% noise level) and showed only a graceful, gradual degradation in performance as noise levels increased. For instance, the proposed AB-BiLSTM model, which achieved the lowest MAE across all noise levels, saw its MAE rise only from 3.94 × 10−5 to 4.45 × 10−5 as the noise level increased from 1% to 15%. This confirms its stable predictive capability in potentially noisy environments. Conversely, the second group, comprising the non-sequential models (MLP and 1D-CNN) together with the simple RNN, had a much higher initial MAE (approximately 1.0 × 10−4) and was relatively insensitive to increasing noise. This suggests that these models failed to learn the underlying physical patterns in the first place, leaving their performance consistently poor regardless of the noise level.
These results clearly reveal that models capable of effectively capturing the temporal dependencies of the assembly process universally exhibit stronger robustness. The ability to understand the full context of the sequence is key to resisting local data perturbations. Among all high-performing models, our proposed AB-BiLSTM not only achieves the highest accuracy in the absence of noise but also maintains its performance advantage across all tested noise levels, demonstrating its suitability for real-world industrial applications where measurement uncertainties are unavoidable.

5.3. Performance Verification of Optimization Algorithms

To comprehensively assess the global search capability and robustness of the proposed SEG-HHO algorithm, we conducted numerical experiments on the well-established CEC2005 benchmark suite, as shown in Table 12. This suite comprises 23 classical functions categorized into unimodal (F1–F7), basic multimodal (F8–F13), and expanded/composite multimodal (F14–F23) functions, providing a standardized platform to assess these core attributes across a wide spectrum of optimization complexities. While CEC2005 focuses on unconstrained scenarios, it serves as a preliminary validation before applying SEG-HHO to constrained, mixed-variable engineering problems.
SEG-HHO was benchmarked against ten baseline algorithms: Ant Lion Optimizer (ALO) [39], Bayesian Optimization (BO) [40], Covariance Matrix Adaptation Evolution Strategy (CMA-ES) [41], Grey Wolf Optimizer (GWO) [42], Moth-Flame Optimization (MFO) [43], Particle Swarm Optimization (PSO) [44], Sine Cosine Algorithm (SCA) [45], Whale Optimization Algorithm (WOA) [46], Waterwheel Plant Algorithm (WWPA) [47], and the original HHO [31]. For all algorithms, the population size and the number of iterations were set to 30 and 500, respectively, on each test function under identical experimental settings. For each function, the best (Min), mean (Mean), worst (Max), and standard deviation (Std) values were recorded, along with the Friedman rank statistics. The detailed numerical results are summarized in Table 13.
In terms of average performance (Mean), SEG-HHO achieved the best results on 17 out of 23 functions, outperforming all baseline algorithms. Furthermore, SEG-HHO consistently exhibited lower standard deviations across the majority of functions, indicating superior stability and repeatability in convergence behavior.
The statistical ranking based on the Friedman test further confirms SEG-HHO’s advantage, with the lowest average rank (AvgRank = 2.0435) among all methods, indicating its consistently superior performance across diverse problem types.
Figure 14 displays the convergence curves of all algorithms on selected functions. As can be clearly observed in this figure, the proposed SEG-HHO algorithm demonstrates superior convergence performance on the majority of the test functions.
Specifically, on unimodal functions (e.g., F1 and F2), the convergence curve of SEG-HHO descends more rapidly and converges to a lower final fitness value, indicating that its exploitation capability has been effectively enhanced. On more complex multimodal functions with numerous local optima (e.g., F9 and F11), SEG-HHO not only converges faster but is also capable of escaping local optima to find higher-quality solutions. This is attributed to its stratified evolution and hybrid search strategies, which effectively balance the algorithm’s global exploration and local exploitation capabilities. On the hybrid composite functions (e.g., F21 and F23), several baseline methods, including the original HHO, suffer from early stagnation. SEG-HHO, however, sustains an active search throughout the entire iteration process and converges smoothly to a high-quality solution, demonstrating robustness in navigating complex multimodal landscapes.
Figure 15 shows the box plots of the result distributions of each algorithm on the 23 benchmark functions. SEG-HHO exhibits the lowest box heights among the compared algorithms on most functions, and its result distributions contain fewer outliers, further demonstrating its advantages in convergence accuracy and stability.
In addition to optimization performance, we evaluated the computational efficiency of SEG-HHO and the baselines. Detailed runtime comparisons are provided in Appendix B (Table A1). On average, SEG-HHO requires approximately 0.15 s per run across the CEC2005 suite, which is modestly higher than simpler algorithms like SCA (0.04 s) due to its enhanced mechanisms. However, this trade-off is advantageous, as SEG-HHO delivers superior solution quality and convergence reliability, making it suitable for engineering scenarios where evaluation costs (e.g., via surrogates) are low compared to direct simulations (detailed in Section 5.4).
In summary, SEG-HHO achieves consistent and competitive convergence behavior across diverse function types, combining convergence speed, solution accuracy, and global optimization stability.

5.4. Analysis of the Optimal Assembly Scheme

To further evaluate the practical applicability of the proposed SEG-HHO algorithm in constrained engineering optimization, building on its foundational performance in the CEC2005 benchmarks, we applied it to the gyroscope bolt assembly sequence and preload optimization problem formulated earlier in this paper. This experiment highlights SEG-HHO’s capability to handle real-world constraints via the encoding–decoding mechanism, with a focus on comparing its convergence performance and final solution quality against the original HHO algorithm.
The overall optimization objective is to minimize the orthogonal axis error induced by the bolt assembly process, which is achieved by jointly optimizing the tightening sequence and the preload forces of all bolts. This contributes directly to improving the spatial attitude accuracy of the gyroscope system.
Both SEG-HHO and HHO were configured with a population size of 30 over 500 iterations. Figure 16 presents the convergence comparison. SEG-HHO not only converges faster in the early stages but also continues improving during later iterations. The final fitness value reached by SEG-HHO is 2.2398 × 10−5, outperforming HHO’s result of 2.7149 × 10−5, which corresponds to a relative error reduction of approximately 17.5%, computed as $\frac{2.7149 \times 10^{-5} - 2.2398 \times 10^{-5}}{2.7149 \times 10^{-5}} \times 100\% \approx 17.5\%$.
The decoded optimal tightening plan is summarized in Table 14, detailing each tightening step, bolt ID, and assigned preload. The solution demonstrates a non-linear sequence with a balanced preload distribution ranging from 3922 N to 7500 N. Notably, the algorithm tends to assign higher preload forces in the later tightening steps while maintaining moderate loads in the initial steps. This strategy may help compensate for cumulative deviations introduced earlier, thus aiding in reducing the final orthogonal axis error.
It is worth highlighting that using high-fidelity FEM simulations as the fitness function would require approximately 15,000 evaluations (30 individuals over 500 iterations), taking over 3500 h in total and making the process computationally infeasible. In contrast, this study utilizes the AB-BiLSTM-based surrogate model in place of FEM, enabling millisecond-level evaluations and significantly reducing optimization overhead. To further quantify efficiency, runtime comparisons for SEG-HHO and HHO on this problem are detailed in Appendix B (Table A2). SEG-HHO completes in about 91 s in total (0.18 s per iteration), slightly less than HHO’s 105 s despite its more complex hybrid strategies, versus the infeasible FEM-based optimization (>3500 h).
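A simplified sketch of this surrogate-in-the-loop evaluation is given below; decode mirrors the spirit of the encoding–decoding mechanism via random-key sorting, while model and the preload bounds are illustrative stand-ins rather than the exact implementation:

```python
import numpy as np

def decode(x: np.ndarray, preload_min: float = 3000.0,
           preload_max: float = 7500.0):
    """Map a continuous 48-dimensional candidate to a feasible plan:
    the first 24 entries are sorted into a tightening permutation
    (random-key decoding); the last 24 are clipped to the preload bounds
    (bounds here are illustrative assumptions)."""
    sequence = np.argsort(x[:24])                       # bolt tightening order
    preloads = np.clip(x[24:], preload_min, preload_max)
    return sequence, preloads

def fitness(x: np.ndarray, model) -> float:
    """One millisecond-level surrogate call replaces a ~14 min FEM run."""
    sequence, preloads = decode(x)
    return float(model.predict(sequence, preloads))     # predicted axis error
```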
In summary, SEG-HHO, empowered by a hybrid framework that combines surrogate-based modeling with intelligent optimization, effectively solves the preload and sequencing optimization problem without relying on costly simulations, offering a practical and efficient approach for high-precision assembly systems.

6. Discussion

This study introduces and validates a data-driven optimization framework for precision bolted assembly. Our findings confirm two central hypotheses. First, by conceptualizing the bolt tightening process as a time series, our AB-BiLSTM surrogate model accurately predicted the final orthogonal axis error, significantly outperforming non-sequential models. The attention mechanism proved crucial, not as a black box, but as an interpretable tool that learned physically meaningful patterns, such as focusing on the initial and final stages of the process. Second, the proposed SEG-HHO algorithm, with its stratified population structure, demonstrated superior global search capabilities, mitigating the premature convergence often seen in standard metaheuristics when applied to complex, constrained engineering problems.
While the proposed framework shows significant promise, we acknowledge several limitations. These highlight critical gaps between simulation and real-world application, providing a clear direction for future research to address.
  • Simulation-to-Reality Gap: The current study is based entirely on high-fidelity FEM simulation data. This was a deliberate and necessary first step due to the extreme manufacturing cost and low production volume of the specialized gyroscope under study, which makes extensive physical experimentation infeasible at this early stage. This simulation-first approach allowed us to establish a robust proof-of-concept and pre-train a foundational model. The immediate and most critical next step is to bridge the simulation-to-reality gap by incorporating scarce real-world sensor data (e.g., torque readings from instrumented torque wrenches) to fine-tune and calibrate the surrogate model, enhancing its real-world applicability. Future work will also explore the impact of manufacturing tolerances (e.g., rotor coaxiality and hole clearances) and material property variations, which were simplified in the current model.
  • Generalizability and Scalability:
    (1)
    The model was validated on a specific 24-bolt gyroscope assembly. To prove broader generalizability, future work should test the framework on diverse bolt configurations and assembly structures, both in simulation and on physical systems (e.g., satellite payload assembly and semiconductor equipment). Beyond the test case, the general design of the framework enhances transferability: the AB-BiLSTM surrogate processes sequential data agnostic to specific patterns, while the SEG-HHO algorithm incorporates an encoding–decoding mechanism that decouples problem-specific constraints (e.g., symmetry and grouping) from the search procedure, enabling adaptation to new constraint types in other domains (e.g., multi-stage tightening, contact uniformity requirements, or tool accessibility constraints in aerospace assembly).
    (2)
    Regarding scalability, the inference time of the AB-BiLSTM model scales linearly with the number of bolts (O(n) complexity) [48], as RNN/LSTM architectures process sequences step-by-step. This suggests efficiency for larger systems, but current experiments are limited to 24 bolts. Scaling to more bolts (e.g., 48) or complex constraints (e.g., dynamic preload interactions) requires further validation of training time growth (potentially quadratic with dataset size expansion [48]) and overfitting risks. While the AB-BiLSTM surrogate scales efficiently, the underlying optimization problem for SEG-HHO faces exponential growth in search space with the number of bolts. To manage this complexity for larger systems, hierarchical optimization or advanced optimizers (e.g., evolutionary algorithms with adaptive operators) will be necessary.
    (3)
    Finally, cross-domain validation remains an open task. A potential roadmap includes the following: (1) within-domain tests on unseen bolt configurations (e.g., varying preload ranges or expanding to 48 bolts); (2) testing domain-specific datasets (e.g., aerospace or semiconductor assemblies); (3) recalibrating physics-based constraints to match new conditions; and (4) collaborating with industrial partners to verify transferability on real production data.
  • From Single-Objective to Multi-Objective Optimization: Our current framework minimizes orthogonal axis error as a single objective. In real-world manufacturing, engineers must balance competing objectives such as minimizing error, ensuring uniform stress distribution, reducing assembly time, and minimizing tool wear. A critical future direction is extending the framework to multi-objective optimization. This would require modifying the AB-BiLSTM surrogate to output multiple objectives (e.g., error and stress) and integrating it with multi-objective optimizers such as NSGA-II or an improved HHO variant. The resulting Pareto front of optimal solutions would enable engineers to make informed, context-aware decisions.
  • Path to Industrial Deployment: For practical deployment, this framework could be integrated into Computer-Aided Process Planning (CAPP) systems to generate optimal assembly plans offline. A more advanced implementation would involve creating a near real-time monitoring system on the assembly line. By integrating with IoT sensors on smart tools, the system could continuously feed live data to the AB-BiLSTM model for dynamic calibration, generating context-aware suggestions for process adjustments. This would bridge the gap between simulation predictions and physical execution, enabling iterative refinement of assembly quality.

7. Conclusions

This study successfully demonstrates that a tightly integrated framework of sequence-aware deep learning and advanced metaheuristic optimization can effectively solve complex, constrained assembly problems. We move beyond simple “compare-and-verify” approaches to offer a fully automated, data-driven optimization loop.
The key lessons learned are threefold. First, modeling the assembly process as a sequence is not just beneficial but critical for capturing the underlying physics that non-sequential models miss. Second, attention mechanisms can serve as a powerful bridge between predictive accuracy and engineering insight, revealing that certain process steps are more critical than others. Third, structured, multi-strategy optimization algorithms like SEG-HHO are essential for navigating the complex search spaces typical of real-world engineering problems, providing more robust and higher-quality solutions than their monolithic counterparts.
This work establishes a foundational component for digital twin implementation in precision manufacturing. By replacing costly physical trials or slow simulations with a fast and accurate surrogate, this approach dramatically accelerates the design-optimization cycle. The immediate path forward involves calibrating this framework with limited physical sensor data to bridge the simulation-to-reality gap. Subsequently, we aim to expand this single-objective system into a multi-objective decision-support tool. To enhance predictive power while addressing practical constraints, future work will focus on two key directions: (1) developing lightweight Transformers adapted for small datasets through techniques such as transfer learning from pre-trained models and knowledge distillation; and (2) implementing model compression strategies (e.g., quantization, pruning, and parameter sharing) to enable deployment on resource-constrained systems, such as edge devices in Industrial Internet of Things (IIoT) environments.

Author Contributions

Conceptualization, D.L.; methodology, D.L. and Y.J.; validation, D.L.; formal analysis, D.L.; investigation, D.L.; resources, D.L. and H.Y.; data curation, D.L.; writing—original draft preparation, D.L.; writing—review and editing, D.L. and H.Y.; visualization, D.L.; supervision, H.Y.; project administration, D.L. and H.Y.; funding acquisition, H.Y. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the National Defense Basic Scientific Research Program of China, grant number JCKY2023203A004.

Data Availability Statement

The dataset used in this study was generated via high-fidelity finite element simulations. Due to confidentiality agreements with the collaborating research institute and the sensitive engineering context of the underlying CAD and simulation models, the data cannot be deposited in a public repository. However, the data are available from the corresponding author (yhg@njupt.edu.cn) upon reasonable request for academic verification purposes. The source code for the core algorithms implemented in this study is also available from the corresponding author upon reasonable request.

Conflicts of Interest

The authors declare no conflicts of interest. The funders had no role in the design of this study; in the collection, analyses, or interpretation of data; in the writing of this manuscript; or in the decision to publish the results.

Abbreviations

The following abbreviations are used in this manuscript:
FEM: Finite Element Method
FEA: Finite Element Analysis
LHS: Latin Hypercube Sampling
SVR: Support Vector Regression
RSM: Response Surface Methodology
RNN: Recurrent Neural Network
LSTM: Long Short-Term Memory
GRU: Gated Recurrent Unit
MLP: Multi-Layer Perceptron
CNN: Convolutional Neural Network
AB-BiLSTM: Attention-based Bidirectional Long Short-Term Memory
GA: Genetic Algorithm
PSO: Particle Swarm Optimization
ACO: Ant Colony Optimization
ALO: Ant Lion Optimizer
HHO: Harris Hawks Optimization
LF: Levy Flight
DE: Differential Evolution
SEG-HHO: Stratified Evolutionary Group-based Harris Hawks Optimization
SCA: Sine Cosine Algorithm
BO: Bayesian Optimization
GWO: Grey Wolf Optimizer
MFO: Moth-Flame Optimization
WOA: Whale Optimization Algorithm
WWPA: Waterwheel Plant Algorithm
MSE: Mean Squared Error
MAE: Mean Absolute Error
RMSE: Root Mean Squared Error
R2: Coefficient of Determination
TPE: Tree-structured Parzen Estimator
CAPP: Computer-Aided Process Planning
IIoT: Industrial Internet of Things

Appendix A

Appendix A.1

Figure A1 visually confirms the strong linear relationship and high correlation between our model’s predictions and the ground truth, with data points clustering tightly around the y = x line.
Figure A1. Scatter plot of AB-BiLSTM.

Appendix A.2

Figure A2 shows that the prediction errors are centered around zero and approximate a normal distribution, indicating that the model is well-calibrated and does not suffer from systematic bias.
Figure A2. Prediction error distribution of AB-BiLSTM.

Appendix B

To provide a comprehensive view of computational efficiency, this appendix presents runtime data for the compared algorithms. All measurements are averaged over 30 independent runs in the environment described in Section 5.1.2. For brevity, CEC2005 results (Table A1) aggregate averages across all 23 functions. Standard deviations are included to reflect variability.
Table A1. Average runtime (in seconds) per run across all CEC2005 functions.

Algorithm | Mean Runtime (s) | Std (s)
ALO | 1.1273 | 0.0247
BO | 0.1032 | 0.0036
CMA-ES | 0.2100 | 0.0072
GWO | 0.0370 | 0.0020
MFO | 0.0376 | 0.0014
PSO | 0.0599 | 0.0024
SCA | 0.0360 | 0.0021
WOA | 0.0312 | 0.0014
WWPA | 0.0958 | 0.0060
HHO | 0.0830 | 0.0028
SEG-HHO | 0.1534 | 0.0060
To further quantify efficiency in the gyroscope assembly optimization problem, runtime comparisons for SEG-HHO and HHO on this problem are detailed in Table A2.
Table A2. Runtime comparison for the gyroscope assembly optimization problem (500 iterations, population size 30).

Project | Total Execution Time | Average Iteration Time
FEM simulation | About 3500 h | About 7 h
Surrogate-assisted standard HHO | 104.9993 s | 0.2099 s
Surrogate-assisted SEG-HHO | 91.1565 s | 0.1823 s
Note: SEG-HHO’s slightly lower total time here (despite higher per-iteration complexity) results from faster convergence in practice, reducing effective iterations needed for stabilization. When paired with the AB-BiLSTM surrogate, the full framework runs in under 2 min, versus infeasible FEM-based optimization (>3500 h).

References

1. Are Manufacturing Errors Hurting Your Business? Reliability Solutions. Available online: https://reliabilitysolutions.net/articles/the-cost-of-assembly-and-installation-errors/ (accessed on 16 August 2025).
2. Shafi, I.; Mazhar, M.F.; Fatima, A.; Alvarez, R.M.; Miró, Y.; Espinosa, J.C.M.; Ashraf, I. Deep Learning-Based Real Time Defect Detection for Optimization of Aircraft Manufacturing and Control Performance. Drones 2023, 7, 31.
3. Li, Z.; Li, X.; Han, Y.; Zhang, P.; Zhang, Z.; Zhang, M.; Zhao, G. A Review of Aeroengines’ Bolt Preload Formation Mechanism and Control Technology. Aerospace 2023, 10, 307.
4. Zhang, J.; Wang, Y.; Wang, J.; Cao, R.; Xu, Z. Online O-Ring Stress Prediction and Bolt Tightening Sequence Optimization Method for Solid Rocket Motor Assembly. Machines 2023, 11, 387.
5. Zhang, W.; Liu, Z.; Song, Y.; Lu, Y.; Feng, Z. Prediction of Multi-Physics Field Distribution on Gas Turbine Endwall Using an Optimized Surrogate Model with Various Deep Learning Frames. Int. J. Numer. Methods Heat Fluid Flow 2023, 34, 2865–2889.
6. Kirkpatrick, S.; Gelatt, C.D.; Vecchi, M.P. Optimization by Simulated Annealing. Science 1983, 220, 671–680.
7. Marino, R. Learning from Survey Propagation: A Neural Network for MAX-E-3-SAT. Mach. Learn. Sci. Technol. 2021, 2, 035032.
8. Tatar, E.; Alper, S.E.; Akin, T. Quadrature-Error Compensation and Corresponding Effects on the Performance of Fully Decoupled MEMS Gyroscopes. J. Microelectromech. Syst. 2012, 21, 656–667.
9. Grzejda, R. Analysis of the Tightening Process of an Asymmetrical Multi-Bolted Connection. Mach. Dyn. Res. 2016, 39, 25–32.
10. Wang, Y.; Liu, Y.; Wang, J.; Zhang, J.; Zhu, X.; Xu, Z. Research on Process Planning Method of Aerospace Engine Bolt Tightening Based on Digital Twin. Machines 2022, 10, 1048.
11. Shi, M.; Lv, L.; Sun, W.; Song, X. A Multi-Fidelity Surrogate Model Based on Support Vector Regression. Struct. Multidisc. Optim. 2020, 61, 2363–2375.
12. Gogu, C.; Passieux, J.-C. Efficient Surrogate Construction by Combining Response Surface Methodology and Reduced Order Modeling. Struct. Multidisc. Optim. 2013, 47, 821–837.
13. Zhou, Y.; Lu, Z. An Enhanced Kriging Surrogate Modeling Technique for High-Dimensional Problems. Mech. Syst. Signal Process. 2020, 140, 106687.
14. Durantin, C.; Rouxel, J.; Désidéri, J.-A.; Glière, A. Multifidelity Surrogate Modeling Based on Radial Basis Functions. Struct. Multidisc. Optim. 2017, 56, 1061–1075.
15. Li, Z.; Li, X.-C.; Wu, Z.-M.; Zhu, Y.; Mao, J.-F. Surrogate Modeling of High-Speed Links Based on GNN and RNN for Signal Integrity Applications. IEEE Trans. Microw. Theory Techn. 2023, 71, 3784–3796.
16. Xu, L.; Xu, Y.; Wang, K.; Ye, L.; Zhang, W. Two-Stream Bolt Preload Prediction Network Using Hydraulic Pressure and Nut Angle Signals. Eng. Appl. Artif. Intell. 2024, 136, 109029.
17. Li, X.; Peng, C.; Zhao, Y.; Xia, X. A Hybrid DSCNN-GRU Based Surrogate Model for Transient Groundwater Flow Prediction. Appl. Sci. 2025, 15, 4576.
18. Song, B.; Liu, Y.; Fang, J.; Liu, W.; Zhong, M.; Liu, X. An Optimized CNN-BiLSTM Network for Bearing Fault Diagnosis under Multiple Working Conditions with Limited Training Samples. Neurocomputing 2024, 574, 127284.
19. Xie, Y.; Wu, D.; Qiang, Z. A Unifying View for the Mixture Model of Sparse Gaussian Processes. Inf. Sci. 2024, 660, 120124.
20. Schweidtmann, A.M.; Bongartz, D.; Grothe, D.; Kerkenhoff, T.; Lin, X.; Najman, J.; Mitsos, A. Deterministic Global Optimization with Gaussian Processes Embedded. Math. Prog. Comp. 2021, 13, 553–581.
21. Zhongbo, Y.; Hien, P.L. Pre-Trained Transformer Model as a Surrogate in Multiscale Computational Homogenization Framework for Elastoplastic Composite Materials Subjected to Generic Loading Paths. Comput. Methods Appl. Mech. Eng. 2024, 421, 116745.
22. Wan, X.; Liu, K.; Qiu, W.; Kang, Z. An Assembly Sequence Planning Method Based on Multiple Optimal Solutions Genetic Algorithm. Mathematics 2024, 12, 574.
23. Zheng, Y.; Chen, L.; Wu, D.; Jiang, P.; Bao, J. Assembly Sequence Planning Method for Optimum Assembly Accuracy of Complex Products Based on Modified Teaching–Learning Based Optimization Algorithm. Int. J. Adv. Manuf. Technol. 2023, 126, 1681–1699.
24. Malek, N.G.; Eslamlou, A.D.; Peng, Q.; Huang, S. Effective GA Operator for Product Assembly Sequence Planning. Comput.-Aided Des. Appl. 2024, 21, 713–728.
25. Zhang, W. Assembly Sequence Intelligent Planning Based on Improved Particle Swarm Optimization Algorithm. Manuf. Technol. 2023, 23, 557–563.
26. Ab Rashid, M.F.F.; Tiwari, A.; Hutabarat, W. Integrated Optimization of Mixed-Model Assembly Sequence Planning and Line Balancing Using Multi-Objective Discrete Particle Swarm Optimization. Artif. Intell. Eng. Des. Anal. Manuf. 2019, 33, 332–345.
27. Yang, J.; Liu, F.; Dong, Y.; Cao, Y.; Cao, Y. Multiple-Objective Optimization of a Reconfigurable Assembly System via Equipment Selection and Sequence Planning. Comput. Ind. Eng. 2022, 172, 108519.
28. Han, Z.; Wang, Y.; Tian, D. Ant Colony Optimization for Assembly Sequence Planning Based on Parameters Optimization. Front. Mech. Eng. 2021, 16, 393–409.
29. Tseng, H.-E.; Chang, C.-C.; Lee, S.-C.; Huang, Y.-M. Hybrid Bidirectional Ant Colony Optimization (Hybrid BACO): An Algorithm for Disassembly Sequence Planning. Eng. Appl. Artif. Intell. 2019, 83, 45–56.
30. Xing, Y.; Wu, D.; Qu, L. Parallel Disassembly Sequence Planning Using Improved Ant Colony Algorithm. Int. J. Adv. Manuf. Technol. 2021, 113, 2327–2342.
31. Heidari, A.A.; Mirjalili, S.; Faris, H.; Aljarah, I.; Mafarja, M.; Chen, H. Harris Hawks Optimization: Algorithm and Applications. Future Gener. Comput. Syst. 2019, 97, 849–872.
32. Devan, P.A.M.; Ibrahim, R.; Omar, M.; Bingi, K.; Abdulrab, H. A Novel Hybrid Harris Hawk-Arithmetic Optimization Algorithm for Industrial Wireless Mesh Networks. Sensors 2023, 23, 6224.
33. Yagmur, N.; Dag, İ.; Temurtas, H. Classification of Anemia Using Harris Hawks Optimization Method and Multivariate Adaptive Regression Spline. Neural Comput. Appl. 2024, 36, 5653–5672.
34. Pavan, G.; Ramesh Babu, A. Enhanced Randomized Harris Hawk Optimization of PI Controller for Power Flow Control in the Microgrid with the PV-Wind-Battery System. Sci. Technol. Energy Transit. 2024, 79, 45.
35. Ryalat, M.H.; Dorgham, O.; Tedmori, S.; Al-Rahamneh, Z.; Al-Najdawi, N.; Mirjalili, S. Harris Hawks Optimization for COVID-19 Diagnosis Based on Multi-Threshold Image Segmentation. Neural Comput. Appl. 2023, 35, 6855–6873.
36. Akl, D.T.; Saafan, M.M.; Haikal, A.Y.; El-Gendy, E.M. IHHO: An Improved Harris Hawks Optimization Algorithm for Solving Engineering Problems. Neural Comput. Appl. 2024, 36, 12185–12298.
37. Ni, J.; Tang, W.C.; Pan, M.; Qiu, X.; Xing, Y. Assembly Sequence Optimization for Minimizing the Riveting Path and Overall Dimensional Error. Proc. Inst. Mech. Eng. Part B J. Eng. Manuf. 2017, 232, 2605–2615.
38. Xu, X.; Li, J.; Yang, Y.; Shen, F. Toward Effective Intrusion Detection Using Log-Cosh Conditional Variational Autoencoder. IEEE Internet Things J. 2021, 8, 6187–6196.
39. Mirjalili, S. The Ant Lion Optimizer. Adv. Eng. Softw. 2015, 83, 80–98.
40. Frazier, P.I. A Tutorial on Bayesian Optimization. arXiv 2018, arXiv:1807.02811.
41. Hansen, N. The CMA Evolution Strategy: A Tutorial. arXiv 2016, arXiv:1604.00772.
42. Mirjalili, S.; Mirjalili, S.M.; Lewis, A. Grey Wolf Optimizer. Adv. Eng. Softw. 2014, 69, 46–61.
43. Mirjalili, S. Moth-Flame Optimization Algorithm: A Novel Nature-Inspired Heuristic Paradigm. Knowl.-Based Syst. 2015, 89, 228–249.
44. Wang, D.; Tan, D.; Liu, L. Particle Swarm Optimization Algorithm: An Overview. Soft Comput. 2018, 22, 387–408.
45. Mirjalili, S. SCA: A Sine Cosine Algorithm for Solving Optimization Problems. Knowl.-Based Syst. 2016, 96, 120–133.
46. Mirjalili, S.; Lewis, A. The Whale Optimization Algorithm. Adv. Eng. Softw. 2016, 95, 51–67.
47. Abdelhamid, A.A.; Towfek, S.K.; Khodadadi, N.; Alhussan, A.A.; Khafaga, D.S.; Eid, M.M.; Ibrahim, A. Waterwheel Plant Algorithm: A Novel Metaheuristic Optimization Method. Processes 2023, 11, 1502.
48. Kratzert, F.; Klotz, D.; Shalev, G.; Klambauer, G.; Hochreiter, S.; Nearing, G. Benchmarking a Catchment-Aware Long Short-Term Memory Network (LSTM) for Large-Scale Hydrological Modeling. Hydrol. Earth Syst. Sci. Discuss. 2019, 2019, 1–32.
Figure 1. Framework overview.
Figure 2. Gyroscope model in ANSYS: (a) different components of the inertial gyroscope; (b) 24 key bolts are displayed in green (highlighted by ANSYS to indicate the bolt models).
Figure 3. The total architecture of AB-BiLSTM.
Figure 4. Flowchart of SEG-HHO. The blue line represents odd iterations, and the green line represents even iterations.
Figure 5. Conceptual diagram of the encoding–decoding mechanism.
Figure 6. Encoding and decoding mechanisms with a concrete numerical example.
Figure 7. The loss curve of the proposed AB-BiLSTM.
Figure 8. Non-linear drop of R2.
Figure 9. Attention visualization heatmap.
Figure 10. Preload curve with attention weights for sample 15.
Figure 11. Correlation between ΔPreload and ΔAttention.
Figure 12. Boxplot of the variance of attention changes.
Figure 13. MAE comparison across models under different noise levels.
Figure 14. Convergence curves of SEG-HHO and comparative algorithms on the CEC2005 functions: (a) F1; (b) F2; (c) F3; (d) F4; (e) F5; (f) F6; (g) F7; (h) F8; (i) F9; (j) F10; (k) F11; (l) F12; (m) F13; (n) F14; (o) F15; (p) F16; (q) F17; (r) F18; (s) F19; (t) F20; (u) F21; (v) F22; (w) F23.
Figure 15. Boxplot of solution distributions for all algorithms on the 23 CEC2005 benchmark functions. Colors are used only for visual distinction among the algorithms. Panels (a) through (w) correspond to F1 through F23, as in Figure 14.
Figure 16. Convergence curve of SEG-HHO and standard HHO.
Table 1. R2 scores of models with/without initial preload.

Models | Without Initial Preload | With Initial Preload
RNN | 0.2925 | 0.1693
LSTM | 0.5651 | −0.0165
BiLSTM | 0.5353 | −0.0311
Table 2. Input–output format of the AB-BiLSTM model.

Component | Description | Data Type/Dimension | Normalization/Encoding
Bolt ID Sequence | Order of bolts tightened in process | Integer, length 24 | Trainable embedding (dim = 32)
Preload Sequence | Final preload value for each bolt | Float, length 24 | Z-score normalization
Output | Predicted orthogonal axis error | Float, scalar | log(1 + p) + Min–Max normalization
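A sketch of the preprocessing implied by Table 2 is given below; the normalization statistics would be computed on the training set, and the argument names are illustrative:

```python
import numpy as np

def preprocess(bolt_ids, preloads, error,
               preload_mean, preload_std, err_min, err_max):
    ids = np.asarray(bolt_ids, dtype=np.int64)          # fed to the embedding layer
    z_preloads = (np.asarray(preloads) - preload_mean) / preload_std  # z-score
    log_err = np.log1p(error)                           # log(1 + p) transform
    target = (log_err - err_min) / (err_max - err_min)  # min-max scaling to [0, 1]
    return ids, z_preloads, target
```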
Table 3. Summary of SEG-HHO evolutionary phases.

Group (Fitness-Based) | Primary Goal (Intuitive Explanation) | Core Mechanism
Top Group | Deep Exploitation: Quickly refine the best-known solutions. | Standard HHO’s highly efficient “Besiege” strategies.
Middle Group | Balanced Search: Adaptively explore new regions while also exploiting promising ones. | Alternating Hybrid Strategy: HHO for exploitation and Differential Evolution (DE) for injecting diversity.
Bottom Group | Broad Exploration: Search widely to escape local optima and discover entirely new promising areas. | Sine Cosine Algorithm (SCA)-guided search for structured, wave-like exploration.
Table 4. Comparison of SEG-HHO with HHO, DE, and SCA.

Algorithm | Core Idea | Exploration Strategy | Exploitation Strategy | Key Contribution in SEG-HHO
HHO | Mimics cooperative hunting of hawks. | Hawks track prey based on random perches or average location. | Four “besiege” strategies based on prey energy. | Provides the foundational framework and powerful exploitation operators (used by Top and Middle Groups).
DE | Uses vector differences between individuals to create new candidates. | “DE/rand/1” mutation creates diverse trial vectors. | Crossover and selection retain better solutions. | Injects significant population diversity to prevent stagnation (used by Middle Group).
SCA | Uses sine/cosine functions to explore/exploit space around the best solution. | Large-amplitude oscillations for global search. | Small-amplitude oscillations for local search. | Provides a structured, directional exploration mechanism, superior to random walks (used by Bottom Group).
SEG-HHO | Stratified, multi-strategy co-evolution. | Systematic and layered: SCA for broad search (Bottom); DE for diversity (Middle). | Focused and intensified: Standard HHO for deep exploitation (Top); HHO for refinement (Middle). | Synthesizes the strengths of HHO, DE, and SCA into a cohesive, role-based framework that dynamically balances the search.
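The stratified update summarized in Tables 3 and 4 can be sketched as follows; the three update operators are stand-ins for the HHO, DE, and SCA moves, and the even/odd alternation follows the flowchart in Figure 4:

```python
import numpy as np

def stratified_step(population: np.ndarray, fitness: np.ndarray,
                    hho_update, de_update, sca_update, iteration: int):
    order = np.argsort(fitness)                # ascending: best individuals first
    top, mid, bot = np.array_split(order, 3)   # fitness-based strata

    population[top] = hho_update(population[top])      # deep exploitation
    if iteration % 2 == 0:                             # alternating hybrid
        population[mid] = hho_update(population[mid])
    else:
        population[mid] = de_update(population[mid])   # diversity injection
    population[bot] = sca_update(population[bot])      # broad exploration
    return population
```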
Table 5. Hyperparameter search space for Optuna.

Hyperparameter | Search Range/Type
Batch Size | Categorical [16, 32, 64, 128]
Learning Rate | Log-uniform [1e−5, 1e−3]
Optimizer | Categorical [Adam, AdamW]
Number of Layers | Integer [1, 5]
Embedding Dimension | Categorical [16, 32, 64, 128]
Hidden Dimension | Categorical [32, 64, 128, 256]
Dropout Rate | Categorical [0.0, 0.1, 0.2, 0.3]
Table 6. Optimal hyperparameter configuration for the AB-BiLSTM model.

Hyperparameter | Value
Batch Size | 16
Learning Rate | 8.8051E−5
Optimizer | AdamW (weight_decay = 1e−4)
Number of Layers | 5
Embedding Dimension | 32
Hidden Dimension | 256
Dropout Rate | 0.1
Table 7. Architectural parameter configurations for all comparative models.

Model | Architecture Configuration | Key Hyperparameters
MLP | Flatten() → Dense(256, ReLU) → Dropout(0.1) → Dense(128, ReLU) → Dropout(0.1) → Dense(1, Softplus) | Neurons: 256 (L1), 128 (L2); Dropout: 0.1
1D-CNN | Conv1D(256, kernel = 3, ReLU) → Dropout(0.1) → Conv1D(256, kernel = 3, ReLU) → AdaptiveAvgPool1D(1) → Flatten() → Dense(1, Softplus) | Filters: 256; Kernel: 3; Dropout: 0.1
RNN | Embedding(32) → RNN(256, dropout = 0.1) × 5 → Dense(1, Softplus) | Embed Dim: 32; Hidden Dim: 256; Layers: 5; Dropout: 0.1
LSTM | Embedding(32) → LSTM(256, dropout = 0.1) × 5 → Dense(1, Softplus) | Embed Dim: 32; Hidden Dim: 256; Layers: 5; Dropout: 0.1
BiLSTM | Embedding(32) → BiLSTM(256, dropout = 0.1) × 5 → Dense(1, Softplus) | Embed Dim: 32; Hidden Dim: 256; Layers: 5; Dropout: 0.1
Table 8. Test performance of the models.

Model | MSE | RMSE | MAE | R2 | Latency (ms) | Train Time (s)
AB-BiLSTM | 2.8444E−09 | 5.3333E−05 | 3.5949E−05 | 0.8173 | 1.175323 | 189.8061
BiLSTM | 3.0872E−09 | 5.5562E−05 | 3.7415E−05 | 0.8017 | 1.256722 | 224.274
MLP | 1.5446E−08 | 1.2428E−04 | 9.8472E−05 | 0.0079 | 1.130914 | 55.7333
1D-CNN | 1.5641E−08 | 1.2506E−04 | 9.8688E−05 | −0.0045 | 1.021677 | 68.3319
RNN | 1.4731E−08 | 1.2137E−04 | 9.5717E−05 | 0.0538 | 1.007865 | 60.1832
LSTM | 2.9932E−09 | 5.4710E−05 | 3.6671E−05 | 0.8077 | 1.357545 | 175.3624
GRU | 2.9833E−09 | 5.4619E−05 | 3.6816E−05 | 0.8083 | 1.362813 | 155.0558
Note: The values in bold highlight the optimal performance achieved for each respective metric.
Table 9. Comparison results of FEM vs. AB-BiLSTM.

Project | Value
ANSYS Average Simulation Time | 14 min/sample
AB-BiLSTM Single-Sample Prediction Time | 1.3413 ms
Total Training Time | 189.8061 s
Model Accuracy (MAE) | 3.5949E−05
Speedup Ratio | 714,711
Table 10. Result of ablation study.

Metric | AB-BiLSTM | BiLSTM | Improvement
MSE | 2.8444E−09 | 3.0872E−09 | 8.54%
RMSE | 5.3333E−05 | 5.5562E−05 | 4.18%
MAE | 3.5949E−05 | 3.7415E−05 | 3.92%
R2 | 0.8173 | 0.8017 | 1.95%
Note: The values in bold highlight the optimal performance achieved for each respective metric.
Table 11. Detailed performance metrics of models under varying noise levels.

Noise Level | Model | RMSE | MAE | R2
1% | AB-BiLSTM | 6.3361E−05 | 3.9414E−05 | 0.7282
1% | BiLSTM | 6.5385E−05 | 4.0155E−05 | 0.7165
1% | MLP | 1.2535E−04 | 9.8554E−05 | −0.0053
1% | 1D-CNN | 1.2792E−04 | 1.0122E−04 | −0.0469
1% | RNN | 1.2701E−04 | 9.7619E−05 | −0.0321
1% | LSTM | 6.6070E−05 | 4.1412E−05 | 0.7107
1% | GRU | 6.5368E−05 | 4.1342E−05 | 0.7166
5% | AB-BiLSTM | 6.3589E−05 | 3.9748E−05 | 0.7263
5% | BiLSTM | 6.5466E−05 | 4.1059E−05 | 0.7258
5% | MLP | 1.2779E−04 | 1.0108E−04 | −0.0448
5% | 1D-CNN | 1.2535E−04 | 9.8513E−05 | −0.0052
5% | RNN | 1.2699E−04 | 9.7622E−05 | −0.0317
5% | LSTM | 6.7211E−05 | 4.2610E−05 | 0.7110
5% | GRU | 6.5344E−05 | 4.1658E−05 | 0.7168
10% | AB-BiLSTM | 6.7938E−05 | 4.1324E−05 | 0.7047
10% | BiLSTM | 6.7908E−05 | 4.3254E−05 | 0.7150
10% | MLP | 1.2775E−04 | 1.0095E−04 | −0.0441
10% | 1D-CNN | 1.2536E−04 | 9.8523E−05 | −0.0054
10% | RNN | 1.2699E−04 | 9.7630E−05 | −0.0317
10% | LSTM | 6.8729E−05 | 4.3845E−05 | 0.6978
10% | GRU | 6.6685E−05 | 4.3753E−05 | 0.7050
15% | AB-BiLSTM | 6.9643E−05 | 4.4518E−05 | 0.6897
15% | BiLSTM | 6.9893E−05 | 4.6727E−05 | 0.6875
15% | MLP | 1.2816E−04 | 1.0136E−04 | −0.0508
15% | 1D-CNN | 1.2537E−04 | 9.8543E−05 | −0.0056
15% | RNN | 1.2737E−04 | 9.8130E−05 | −0.0379
15% | LSTM | 6.8646E−05 | 4.5889E−05 | 0.6885
15% | GRU | 6.9545E−05 | 4.5893E−05 | 0.6806
Note: The values in bold highlight the optimal performance achieved for each respective metric.
Table 12. Test functions.

Function | Dimension | Domain | Theoretical Optimum
$F_1(x)=\sum_{i=1}^{n} x_i^2$ | 30 | $[-100, 100]$ | 0
$F_2(x)=\sum_{i=1}^{n}|x_i|+\prod_{i=1}^{n}|x_i|$ | 30 | $[-10, 10]$ | 0
$F_3(x)=\sum_{i=1}^{n}\big(\sum_{j=1}^{i} x_j\big)^2$ | 30 | $[-100, 100]$ | 0
$F_4(x)=\max_i\{|x_i|,\ 1\le i\le n\}$ | 30 | $[-100, 100]$ | 0
$F_5(x)=\sum_{i=1}^{n-1}\big[100(x_{i+1}-x_i^2)^2+(x_i-1)^2\big]$ | 30 | $[-30, 30]$ | 0
$F_6(x)=\sum_{i=1}^{n}(\lfloor x_i+0.5\rfloor)^2$ | 30 | $[-100, 100]$ | 0
$F_7(x)=\sum_{i=1}^{n} i\,x_i^4+\mathrm{random}[0,1)$ | 30 | $[-1.28, 1.28]$ | 0
$F_8(x)=\sum_{i=1}^{n}-x_i\sin\big(\sqrt{|x_i|}\big)$ | 30 | $[-500, 500]$ | −12,569.4
$F_9(x)=\sum_{i=1}^{n}\big[x_i^2-10\cos(2\pi x_i)+10\big]$ | 30 | $[-5.12, 5.12]$ | 0
$F_{10}(x)=-20\exp\big(-0.2\sqrt{\tfrac{1}{n}\sum_{i=1}^{n}x_i^2}\big)-\exp\big(\tfrac{1}{n}\sum_{i=1}^{n}\cos(2\pi x_i)\big)+20+e$ | 30 | $[-32, 32]$ | 0
$F_{11}(x)=\tfrac{1}{4000}\sum_{i=1}^{n}x_i^2-\prod_{i=1}^{n}\cos\big(\tfrac{x_i}{\sqrt{i}}\big)+1$ | 30 | $[-600, 600]$ | 0
$F_{12}(x)=\tfrac{\pi}{n}\big\{10\sin^2(\pi y_1)+\sum_{i=1}^{n-1}(y_i-1)^2[1+10\sin^2(\pi y_{i+1})]+(y_n-1)^2\big\}+\sum_{i=1}^{n}u(x_i,10,100,4)$, with $y_i=1+\tfrac{x_i+1}{4}$ | 30 | $[-50, 50]$ | 0
$F_{13}(x)=0.1\big\{\sin^2(3\pi x_1)+\sum_{i=1}^{n}(x_i-1)^2[1+\sin^2(3\pi x_i+1)]+(x_n-1)^2[1+\sin^2(2\pi x_n)]\big\}+\sum_{i=1}^{n}u(x_i,5,100,4)$ | 30 | $[-50, 50]$ | 0
$F_{14}(x)=\big(\tfrac{1}{500}+\sum_{j=1}^{25}\tfrac{1}{j+\sum_{i=1}^{2}(x_i-a_{ij})^6}\big)^{-1}$ | 2 | $[-65, 65]$ | 1
$F_{15}(x)=\sum_{i=1}^{11}\big[a_i-\tfrac{x_1(b_i^2+b_i x_2)}{b_i^2+b_i x_3+x_4}\big]^2$ | 4 | $[-5, 5]$ | 0.0003075
$F_{16}(x)=4x_1^2-2.1x_1^4+\tfrac{1}{3}x_1^6+x_1x_2-4x_2^2+4x_2^4$ | 2 | $[-5, 5]$ | −1.0316285
$F_{17}(x)=\big(x_2-\tfrac{5.1}{4\pi^2}x_1^2+\tfrac{5}{\pi}x_1-6\big)^2+10\big(1-\tfrac{1}{8\pi}\big)\cos x_1+10$ | 2 | $[-5, 5]$ | 0.398
$F_{18}(x)=\big[1+(x_1+x_2+1)^2(19-14x_1+3x_1^2-14x_2+6x_1x_2+3x_2^2)\big]\times\big[30+(2x_1-3x_2)^2(18-32x_1+12x_1^2+48x_2-36x_1x_2+27x_2^2)\big]$ | 2 | $[-2, 2]$ | 3
$F_{19}(x)=-\sum_{i=1}^{4}c_i\exp\big(-\sum_{j=1}^{3}a_{ij}(x_j-p_{ij})^2\big)$ | 3 | $[0, 1]$ | −3.86
$F_{20}(x)=-\sum_{i=1}^{4}c_i\exp\big(-\sum_{j=1}^{6}a_{ij}(x_j-p_{ij})^2\big)$ | 6 | $[0, 1]$ | −3.32
$F_{21}(x)=-\sum_{i=1}^{5}\big[(X-a_i)(X-a_i)^T+c_i\big]^{-1}$ | 4 | $[0, 10]$ | −10
$F_{22}(x)=-\sum_{i=1}^{7}\big[(X-a_i)(X-a_i)^T+c_i\big]^{-1}$ | 4 | $[0, 10]$ | −10
$F_{23}(x)=-\sum_{i=1}^{10}\big[(X-a_i)(X-a_i)^T+c_i\big]^{-1}$ | 4 | $[0, 10]$ | −10
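For verification, two of the functions in Table 12 are implemented below from their standard definitions (a sanity-check sketch, not the official benchmark suite code):

```python
import numpy as np

def f1_sphere(x: np.ndarray) -> float:        # F1: unimodal, optimum 0 at x = 0
    return float(np.sum(x ** 2))

def f9_rastrigin(x: np.ndarray) -> float:     # F9: multimodal, optimum 0 at x = 0
    return float(np.sum(x ** 2 - 10.0 * np.cos(2.0 * np.pi * x) + 10.0))

assert f1_sphere(np.zeros(30)) == 0.0
assert f9_rastrigin(np.zeros(30)) == 0.0
```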
Table 13. Performance comparison of SEG-HHO and ten baseline algorithms on the CEC2005 test suite (30 independent runs).

Function | Metric | SEG-HHO | HHO | ALO | BO | CMA-ES | GWO | MFO | PSO | SCA | WOA | WWPA
F1 | Mean | 4.0221E−139 | 4.3237E−97 | 1.2531E−08 | 1.0253E−11 | 2.1337E−22 | 3.3835E−57 | 4.5905E−13 | 1.7904E+01 | 1.0187E−11 | 5.0654E−75 | 9.6433E−01
F1 | Std | 2.1893E−138 | 2.2390E−96 | 1.2114E−08 | 1.3546E−12 | 1.8426E−22 | 9.5492E−57 | 1.2655E−12 | 1.6916E+01 | 4.2301E−11 | 2.7739E−74 | 1.9933
F1 | Min | 1.6133E−171 | 9.5971E−115 | 4.3733E−09 | 7.3048E−12 | 1.3552E−23 | 1.8406E−60 | 3.4952E−15 | 2.5417 | 7.2846E−19 | 9.8147E−92 | 1.4099E−05
F1 | Max | 1.1994E−137 | 1.2279E−95 | 4.9236E−08 | 1.1980E−11 | 7.7019E−22 | 4.9406E−56 | 6.1571E−12 | 7.6199E+01 | 2.2515E−10 | 1.5193E−73 | 9.7573E+00
F1 | Rank | 1 | 2 | 9 | 8 | 5 | 4 | 6 | 11 | 7 | 3 | 10
F2 | Mean | 1.3457E−73 | 1.8175E−50 | 4.3569E−01 | 4.4464E−09 | 1.1494E−01 | 7.4493E−33 | 6.7998E−09 | 1.8576 | 1.7365E−09 | 1.1527E−50 | 2.1362
F2 | Std | 4.4841E−73 | 9.4932E−50 | 8.8104E−01 | 4.1546E−10 | 6.2953E−01 | 1.5590E−32 | 9.5554E−09 | 2.5486 | 3.8080E−09 | 6.2569E−50 | 1.7541
F2 | Min | 8.8058E−88 | 7.5733E−62 | 2.1778E−05 | 3.2226E−09 | 3.6201E−12 | 3.5334E−35 | 4.3657E−10 | 3.6887E−01 | 1.1886E−12 | 5.1848E−59 | 7.0394E−02
F2 | Max | 1.9042E−72 | 5.2069E−49 | 3.2766E+00 | 5.0923E−09 | 3.4481 | 7.9383E−32 | 3.9520E−08 | 1.1626E+01 | 2.0214E−08 | 3.4280E−49 | 5.6360
F2 | Rank | 1 | 3 | 9 | 6 | 8 | 4 | 7 | 10 | 5 | 2 | 11
F3 | Mean | 2.2224E−113 | 8.3627E−87 | 7.5129E−02 | 1.0715E−11 | 8.5932E+02 | 1.5744E−24 | 7.5142E−02 | 7.3812E+01 | 5.2861E−03 | 2.5802E+02 | 1.8125E+01
F3 | Std | 1.2173E−112 | 3.3771E−86 | 2.8736E−01 | 1.4833E−12 | 2.4910E+03 | 4.1683E−24 | 1.1806E−01 | 7.5241E+01 | 2.5573E−02 | 6.0142E+02 | 3.5604E+01
F3 | Min | 3.0890E−163 | 1.5349E−103 | 1.1691E−05 | 6.6643E−12 | 3.3521E−16 | 2.4941E−30 | 3.0860E−04 | 3.4035 | 1.9934E−07 | 2.7756E−01 | 1.9237E−02
F3 | Max | 6.6672E−112 | 1.7628E−85 | 1.5559 | 1.3073E−11 | 1.1638E+04 | 1.9458E−23 | 4.9815E−01 | 3.8305E+02 | 1.4050E−01 | 3.0595E+03 | 1.8765E+02
F3 | Rank | 1 | 2 | 6 | 4 | 11 | 3 | 7 | 9 | 5 | 10 | 8
F4 | Mean | 1.2864E−70 | 2.5824E−50 | 7.5664E−03 | 5.1141E−09 | 2.4604E−10 | 2.3031E−18 | 2.3634 | 3.1349 | 4.8785E−04 | 5.3139 | 2.8517E−01
F4 | Std | 4.1288E−70 | 6.2700E−50 | 2.6260E−02 | 5.6416E−10 | 1.3137E−10 | 4.7874E−18 | 4.7209 | 1.3879 | 7.8757E−04 | 9.8575 | 2.4805E−01
F4 | Min | 2.4386E−88 | 4.7659E−59 | 1.2017E−04 | 3.6044E−09 | 7.4107E−11 | 8.6002E−21 | 2.5910E−03 | 7.9789E−01 | 1.9837E−06 | 4.4480E−03 | 1.5187E−02
F4 | Max | 1.7298E−69 | 3.0726E−49 | 1.4461E−01 | 6.2106E−09 | 5.4165E−10 | 2.4514E−17 | 2.0960E+01 | 6.1695 | 3.5907E−03 | 4.1523E+01 | 9.2019E−01
F4 | Rank | 1 | 2 | 7 | 5 | 4 | 3 | 9 | 10 | 6 | 11 | 8
F5 | Mean | 1.3699E−02 | 3.6246E−03 | 1.4427E+02 | 8.9375 | 1.1373E+03 | 6.8806 | 2.9057E+02 | 8.7679E+02 | 7.3977 | 1.4575E+01 | 3.4891E+01
F5 | Std | 2.0436E−02 | 6.5977E−03 | 3.0888E+02 | 2.2660E−02 | 4.6894E+03 | 6.1221E−01 | 7.5627E+02 | 9.4807E+02 | 3.9958E−01 | 4.2675E+01 | 9.8493E+01
F5 | Min | 1.1309E−04 | 7.3646E−08 | 3.3591E−04 | 8.8810 | 2.8821E−02 | 5.5997 | 5.3872E−01 | 6.5256E+01 | 6.7225 | 6.0650 | 6.3616E−02
F5 | Max | 8.1651E−02 | 2.6413E−02 | 1.4097E+03 | 8.9830 | 2.4505E+04 | 8.0688 | 3.0219E+03 | 3.5664E+03 | 8.7127 | 2.4052E+02 | 5.4888E+02
F5 | Rank | 2 | 1 | 8 | 5 | 11 | 3 | 9 | 10 | 4 | 6 | 7
F6 | Mean | 7.1125E−04 | 3.9286E−05 | 9.5960E−09 | 1.1660 | 1.8457E−22 | 8.3539E−03 | 2.1482E−13 | 1.9341E+01 | 4.4334E−01 | 1.1771E−03 | 1.5115
F6 | Std | 1.0798E−03 | 4.8862E−05 | 7.3957E−09 | 3.6229E−01 | 1.8178E−22 | 4.5738E−02 | 3.4699E−13 | 2.0632E+01 | 1.2892E−01 | 1.8088E−03 | 2.7255
F6 | Min | 2.7693E−06 | 2.1451E−08 | 1.4970E−09 | 6.0281E−01 | 2.2973E−23 | 1.3786E−06 | 1.6529E−16 | 6.5394E−01 | 1.3759E−01 | 1.6681E−04 | 6.4583E−04
F6 | Max | 4.8326E−03 | 1.8459E−04 | 3.2171E−08 | 1.9152 | 7.1981E−22 | 2.5052E−01 | 1.3355E−12 | 9.3079E+01 | 7.9438E−01 | 9.3304E−03 | 1.1056E+01
F6 | Rank | 5 | 4 | 3 | 9 | 1 | 7 | 2 | 11 | 8 | 6 | 10
F7 | Mean | 2.4234E−04 | 1.3449E−04 | 2.5568E−02 | 1.3099E−03 | 9.4317E−02 | 6.4183E−04 | 1.0582E−02 | 1.2460E−02 | 3.3207E−03 | 4.1173E−03 | 5.5540E−01
F7 | Std | 2.1518E−04 | 1.1525E−04 | 1.5359E−02 | 5.6530E−04 | 4.8975E−01 | 4.0287E−04 | 5.7722E−03 | 8.3088E−03 | 3.0142E−03 | 4.7981E−03 | 1.0855
F7 | Min | 2.1300E−05 | 6.7575E−06 | 5.9484E−03 | 3.7901E−04 | 1.9734E−03 | 1.7332E−04 | 2.5292E−03 | 1.4248E−03 | 8.3565E−05 | 4.2644E−05 | 7.4487E−03
F7 | Max | 4.2719E−04 | 5.2581E−04 | 6.4417E−02 | 2.4076E−03 | 2.6873 | 1.8263E−03 | 2.5607E−02 | 3.6523E−02 | 1.2389E−02 | 1.8394E−02 | 5.1779
F7 | Rank | 2 | 1 | 9 | 4 | 10 | 3 | 7 | 8 | 5 | 6 | 11
F8 | Mean | −4.1898E+03 | −4.1337E+03 | −2.4084E+03 | −2.1373E+03 | −2.0065E+03 | −2.7073E+03 | −3.2070E+03 | −3.0933E+03 | −2.1513E+03 | −3.3437E+03 | −4.1179E+03
F8 | Std | 7.7913E−02 | 2.1435E+02 | 5.6280E+02 | 2.5236E+02 | 1.0598E+02 | 3.9449E+02 | 3.3510E+02 | 3.0495E+02 | 1.7189E+02 | 5.9841E+02 | 9.7660E+01
F8 | Min | −4.1898E+03 | −4.1898E+03 | −4.1898E+03 | −1.7517E+03 | −2.2205E+03 | −3.9528E+03 | −3.8345E+03 | −3.7059E+03 | −2.5033E+03 | −4.1897E+03 | −4.1897E+03
F8 | Max | −4.1895E+03 | −3.2607E+03 | −1.8059E+03 | −2.6788E+03 | −1.8059E+03 | −1.7848E+03 | −2.5226E+03 | −2.5019E+03 | −1.8031E+03 | −2.2658E+03 | −3.8155E+03
F8 | Rank | 1 | 2 | 8 | 10 | 11 | 7 | 5 | 6 | 9 | 4 | 3
F9 | Mean | 0.0000 | 0.0000 | 2.1193E+01 | 2.5615E+01 | 8.1279E+01 | 1.0194 | 2.4883E+01 | 2.2596E+01 | 3.5449E−01 | 4.6969E−01 | 2.5361E+01
F9 | Std | 0.0000 | 0.0000 | 9.2015 | 1.9668E+01 | 1.0483E+01 | 1.7340 | 1.3398E+01 | 7.5508 | 1.4250 | 2.5726 | 2.1420E+01
F9 | Min | 0.0000 | 0.0000 | 5.9698 | 0.0000 | 5.9464E+01 | 0.0000 | 5.9698 | 6.4443 | 0.0000 | 0.0000 | 3.4409
F9 | Max | 0.0000 | 0.0000 | 4.1788E+01 | 5.1699E+01 | 1.0049E+02 | 4.5936 | 5.0814E+01 | 4.1797E+01 | 7.6396 | 1.4091E+01 | 1.0022E+02
F9 | Rank | 1.5 | 1.5 | 6 | 10 | 11 | 5 | 8 | 7 | 3 | 4 | 9
F10 | Mean | 8.8818E−16 | 8.8818E−16 | 1.1556E−01 | 2.2528E−09 | 7.2736E+00 | 7.9936E−15 | 9.3379E−02 | 3.5630 | 2.5722E−02 | 4.6777E−15 | 2.1494
F10 | Std | 0.0000 | 0.0000 | 3.5245E−01 | 9.6568E−10 | 9.7253E+00 | 2.2853E−15 | 3.6117E−01 | 8.3052E−01 | 1.4087E−01 | 2.2726E−15 | 1.5578
F10 | Min | 8.8818E−16 | 8.8818E−16 | 1.8640E−05 | 8.2867E−10 | 4.4702E−12 | 4.4409E−15 | 1.6146E−08 | 1.8003 | 4.6100E−10 | 8.8818E−16 | 1.9373E−02
F10 | Max | 8.8818E−16 | 8.8818E−16 | 1.1551 | 4.7654E−09 | 1.9955E+01 | 1.5099E−14 | 1.6462 | 5.2139 | 7.7157E−01 | 7.9936E−15 | 4.5925
F10 | Rank | 1.5 | 1.5 | 8 | 5 | 11 | 4 | 7 | 10 | 6 | 3 | 9
F11 | Mean | 0.0000 | 0.0000 | 1.7695E−01 | 1.7097E−13 | 1.5612E−03 | 1.8052E−02 | 1.5688E−01 | 1.2383 | 1.3477E−01 | 4.9390E−02 | 1.4376E−01
F11 | Std | 0.0000 | 0.0000 | 1.0286E−01 | 9.9430E−14 | 3.2032E−03 | 2.5002E−02 | 1.2108E−01 | 2.8534E−01 | 2.0162E−01 | 1.2370E−01 | 2.0097E−01
F11 | Min | 0.0000 | 0.0000 | 3.9459E−02 | 5.3957E−14 | 0.0000 | 0.0000 | 2.9555E−02 | 6.8737E−01 | 4.6629E−15 | 0.0000 | 1.1735E−04
F11 | Max | 0.0000 | 0.0000 | 4.9965E−01 | 5.0770E−13 | 9.8573E−03 | 9.5327E−02 | 6.3256E−01 | 2.0188 | 7.8036E−01 | 4.8854E−01 | 7.5767E−01
F11 | Rank | 1.5 | 1.5 | 10 | 3 | 4 | 5 | 9 | 11 | 7 | 6 | 8
F12 | Mean | 1.6329E−04 | 2.3626E−05 | 3.0189 | 1.2885E−01 | 3.0273E−20 | 6.4181E−03 | 2.3710E−01 | 1.5398 | 9.9768E−02 | 8.0953E−03 | 6.6295E−01
F12 | Std | 2.0591E−04 | 3.8893E−05 | 2.9063 | 8.6162E−02 | 4.6130E−20 | 1.1701E−02 | 7.4049E−01 | 1.4066 | 4.1493E−02 | 1.1521E−02 | 9.1868E−01
F12 | Min | 3.2988E−08 | 8.9026E−08 | 1.3255E−08 | 2.4600E−02 | 8.6676E−22 | 3.2591E−07 | 2.2052E−16 | 7.1929E−02 | 3.3306E−02 | 1.8571E−04 | 2.1025E−03
F12 | Max | 9.0597E−04 | 2.1115E−04 | 9.4435E+00 | 3.3026E−01 | 1.9621E−19 | 3.9660E−02 | 3.6897 | 6.2637 | 1.7897E−01 | 4.0584E−02 | 3.2552
F12 | Rank | 3 | 2 | 11 | 7 | 1 | 4 | 8 | 10 | 6 | 5 | 9
F13 | Mean | 1.3790E−03 | 8.6393E−05 | 1.7998E−03 | 4.1957E−01 | 4.8367E−20 | 2.3475E−02 | 4.7612E−03 | 1.4157 | 3.4762E−01 | 2.5076E−02 | 1.2071E−01
F13 | Std | 2.4770E−03 | 1.9683E−04 | 4.9380E−03 | 1.7401E−01 | 7.5554E−20 | 4.3276E−02 | 8.9788E−03 | 1.2985 | 8.8778E−02 | 3.1795E−02 | 1.4885E−01
F13 | Min | 1.3859E−05 | 1.3577E−07 | 2.3292E−09 | 1.5318E−01 | 1.5840E−21 | 2.9422E−06 | 4.3275E−16 | 1.3607E−01 | 1.6382E−01 | 4.3244E−04 | 1.5739E−03
F13 | Max | 1.2234E−02 | 8.7306E−04 | 2.1027E−02 | 7.8560E−01 | 3.0408E−19 | 1.0476E−01 | 4.3949E−02 | 5.2993 | 5.0547E−01 | 1.0903E−01 | 6.5415E−01
F13 | Rank | 3 | 2 | 4 | 10 | 1 | 6 | 5 | 11 | 9 | 7 | 8
F14 | Mean | 9.9800E−01 | 1.5156 | 2.1861 | 1.3808 | 5.6075 | 6.9591 | 2.9717 | 9.9842E−01 | 1.7240 | 3.0226 | 1.6062
F14 | Std | 5.7011E−10 | 1.2596 | 1.4746 | 5.8174E−01 | 3.4400 | 4.6046 | 2.3075 | 1.4717E−03 | 1.8861 | 3.5889 | 1.4332
F14 | Min | 9.9800E−01 | 9.9800E−01 | 9.9800E−01 | 9.9800E−01 | 1.1420 | 9.9800E−01 | 9.9800E−01 | 9.9800E−01 | 9.9800E−01 | 9.9800E−01 | 9.9800E−01
F14 | Max | 9.9800E−01 | 5.9288 | 6.9033 | 3.1916 | 1.3619E+01 | 1.2671E+01 | 8.8408 | 1.0053 | 1.0763E+01 | 1.0763E+01 | 6.9155
F14 | Rank | 1 | 4 | 7 | 3 | 10 | 11 | 8 | 2 | 6 | 9 | 5
F15 | Mean | 4.5043E−04 | 4.5478E−04 | 2.1646E−03 | 4.3284E−04 | 3.3546E−03 | 3.1136E−03 | 1.5957E−03 | 6.3656E−03 | 9.0330E−04 | 6.3855E−04 | 4.5102E−03
F15 | Std | 2.6430E−04 | 3.1606E−04 | 5.0239E−03 | 1.2105E−04 | 3.2217E−03 | 6.8860E−03 | 3.5600E−03 | 8.5956E−03 | 3.3178E−04 | 3.8501E−04 | 2.6013E−03
F15 | Min | 3.1107E−04 | 3.0775E−04 | 3.5349E−04 | 3.2073E−04 | 1.2444E−03 | 3.0756E−04 | 5.7500E−04 | 5.8275E−04 | 3.5141E−04 | 3.0750E−04 | 6.5236E−04
F15 | Max | 1.4880E−03 | 1.4592E−03 | 2.0885E−02 | 8.4493E−04 | 1.7946E−02 | 2.0363E−02 | 2.0363E−02 | 2.0371E−02 | 1.5359E−03 | 1.8719E−03 | 1.0673E−02
F15 | Rank | 2 | 3 | 7 | 1 | 9 | 8 | 6 | 11 | 5 | 4 | 10
F16 | Mean | −1.0316 | −1.0316 | −1.0316 | −1.0316 | −1.0316 | −1.0316 | −1.0316 | −1.0315 | −1.0316 | −1.0316 | −9.1372E−01
F16 | Std | 1.3115E−09 | 3.9422E−09 | 1.7408E−13 | 2.2544E−06 | 6.7752E−16 | 1.7814E−08 | 6.7752E−16 | 1.7847E−04 | 3.9793E−05 | 7.3095E−10 | 8.7428E−02
F16 | Min | −1.0316 | −1.0316 | −1.0316 | −1.0316 | −1.0316 | −1.0316 | −1.0316 | −1.0316 | −1.0316 | −1.0316 | −1.0151
F16 | Max | −1.0316 | −1.0316 | −1.0316 | −1.0316 | −1.0316 | −1.0316 | −1.0316 | −1.0308 | −1.0315 | −1.0316 | −6.6497E−01
F16 | Rank | 5 | 5 | 5 | 5 | 5 | 5 | 5 | 10 | 5 | 5 | 11
F17 | Mean | 3.9789E−01 | 3.9789E−01 | 3.9789E−01 | 4.0133E−01 | 4.0372E−01 | 3.9789E−01 | 3.9789E−01 | 3.9911E−01 | 4.0019E−01 | 3.9790E−01 | 4.2217E−01
F17 | Std | 1.1558E−05 | 2.1284E−05 | 1.3221E−13 | 1.0291E−02 | 2.0633E−02 | 4.0530E−06 | 0.0000 | 2.0462E−03 | 2.2699E−03 | 1.8988E−05 | 2.1706E−02
F17 | Min | 3.9789E−01 | 3.9789E−01 | 3.9789E−01 | 3.9792E−01 | 3.9789E−01 | 3.9789E−01 | 3.9789E−01 | 3.9790E−01 | 3.9807E−01 | 3.9789E−01 | 3.9883E−01
Max3.9795E−013.9798E−013.9789E−014.5511E−015.0270E−013.9790E−013.9789E−014.0639E−014.0767E−013.9796E−014.7649E−01
Rank351.591041.578611
F18Mean3.00003.00003.00003.20915.73313.00003.00003.00883.00013.90064.1195
Std1.9351E−073.7305E−074.9870E−135.3784E−011.4970E+015.3590E−051.2148E−151.1239E−022.0373E−044.93258.6770E−01
Min3.00003.00003.00003.00043.00003.00003.00003.00033.00003.00003.0674
Max3.00003.00003.00005.92368.4992E+013.00023.00003.04163.00103.0016E+016.1939
Rank3.53.51.581151.576910
F19Mean−3.8628−3.8594−3.8628−3.7976−3.8628−3.8616−3.8628−3.8618−3.8528−3.8518−3.6951
Std7.0773E−053.3573E−037.6683E−135.6929E−022.7101E−152.4787E−032.7101E−151.9228E−032.6319E−031.2384E−021.0024E−01
Min−3.8628−3.8628−3.8628−3.8628−3.8628−3.8628−3.8628−3.8628−3.8576−3.8621−3.8404
Max−3.8624−3.8476−3.8628−3.6714−3.8628−3.8549−3.8628−3.8548−3.8437−3.8138−3.4705
Rank4721026258911
F20Mean−3.2886−3.0837−3.2701−3.0421−3.1923−3.2612−3.2284−3.2379−2.9662−3.2073−2.3642
Std6.3050E−021.2127E−016.0326E−021.7531E−012.8053E−017.4955E−025.3868E−021.0683E−011.9522E−011.1681E−013.1541E−01
Min−3.3220−3.3088−3.3220−3.3175−3.3220−3.3220−3.3220−3.3140−3.1538−3.3217−3.1143
Max−3.1196−2.8477−3.1981−2.6698−1.9084−3.0816−3.1376−2.8946−2.2364−3.0092−1.7193
Rank1829735410611
F21Mean−1.0136E+01−5.2134−6.3718−4.5390−7.1262−8.6158−7.1338−9.2629−2.6131−8.5277−5.3737
Std3.3253E−028.8795E−013.07413.0250E−013.19732.64613.18672.08001.94872.52771.7623
Min−1.0153E+01−9.9147−1.0153E+01−4.9442−1.0153E+01−1.0153E+01−1.0153E+01−1.0140E+01−6.2452−1.0151E+01−1.0081E+01
Max−1.0002E+01−5.0282−2.6305−3.5246−2.6305−2.6301−2.6305−2.6038−4.9652E−01−2.6298−3.5762
Rank1971063521148
F22Mean−1.0398E+01−5.2393−6.7556−4.2772−9.0340E+00−1.0185E+01−6.5457−1.0268E+01−3.2287−7.0900−5.7023
Std5.9680E−038.5907E−013.53734.0367E−012.8140E+001.18563.55391.9469E−012.06363.06211.9896
Min−1.0403E+01−9.7876−1.0403E+01−4.8239−1.0403E+01−1.0403E+01−1.0403E+01−1.0400E+01−9.2492−1.0401E+01−1.0198E+01
Max−1.0382E+01−5.0599−1.8376−3.3661−2.7519−3.9070−1.8376−9.5134−5.2104E−01−1.8374−3.7635
Rank1961043721158
F23Mean−1.0529E+01−4.8521−5.7379−4.2550−1.0043E+01−1.0084E+01−7.7387−1.0233E+01−3.4865−6.9959−5.8250
Std9.5381E−038.3014E−013.31661.00751.88781.75143.55731.05861.41883.5046E+001.7594
Min−1.0536E+01−5.1284−1.0536E+01−7.5831−1.0536E+01−1.0536E+01−1.0536E+01−1.0536E+01−5.0160−1.0536E+01−1.0141E+01
Max−1.0496E+01−2.3986−1.8595−2.3182−2.4273−2.4217−1.6766−4.6748−9.4350E−01−1.6756−3.2967
Rank1981043521167
Average Rank2.04353.8261 6.3043 7.0000 6.82614.73915.8696 7.6522 7.0000 5.9130 8.8261
Total Rank1268.5734108.5511
Note: The values in bold highlight the optimal performance achieved for each respective metric.
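For readers reproducing the evaluation, the ranking arithmetic above is mechanical. The following is a minimal sketch (not the authors' code), assuming a hypothetical `mean_errors` matrix of shape (n_functions, n_algorithms) where lower is better; `scipy.stats.rankdata` averages tied ranks, which is how fractional ranks such as 1.5 arise, and the Total Rank row is simply a ranking of the Average Rank row.

```python
# Minimal sketch: Friedman-style ranking of optimizers from mean
# benchmark errors. `mean_errors` has shape (n_functions, n_algorithms);
# lower error is better. Not the paper's implementation.
import numpy as np
from scipy.stats import rankdata

def rank_table(mean_errors: np.ndarray):
    # Rank algorithms within each benchmark function; ties receive the
    # average of the ranks they span (e.g., two joint firsts -> 1.5).
    per_function = np.vstack([rankdata(row) for row in mean_errors])
    average_rank = per_function.mean(axis=0)
    # Total rank: rank the algorithms by their average rank.
    total_rank = rankdata(average_rank)
    return per_function, average_rank, total_rank

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    errors = rng.random((23, 11))  # hypothetical F1-F23 results
    per_f, avg, total = rank_table(errors)
    print(avg.round(4), total)
```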
Table 14. Decoded optimal tightening plan. (Bolt IDs are indexed 0–23.)

Tightening Step    Bolt ID to Tighten    Optimal Preload (N)
1                  1                     3922.0301
2                  4                     6018.8798
3                  3                     4227.532
4                  0                     3936.155
5                  2                     7498.363
6                  5                     3974.7681
7                  6                     5928.3233
8                  9                     5408.6954
9                  11                    6107.7574
10                 8                     4014.669
11                 7                     6215.32
12                 10                    5646.2751
13                 13                    4208.5797
14                 16                    3977.4113
15                 17                    7092.6078
16                 14                    6096.561
17                 12                    5722.6305
18                 15                    3957.5649
19                 21                    5594.9603
20                 18                    7500
21                 23                    7365.7369
22                 20                    4090.1051
23                 19                    7498.3514
24                 22                    7320.901
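A plan of this form is obtained by decoding a continuous SEG-HHO position vector into a bolt permutation plus per-bolt preloads. The paper's exact encoding–decoding mechanism is defined in the methods; the sketch below shows one common random-key realization under stated assumptions (the function and bound names are hypothetical, and the 3500 N lower bound is assumed, while the 7500 N upper bound is suggested by step 20 of the table hitting exactly 7500 N): the first half of the vector is argsorted into a tightening order, and the second half is affinely scaled into the preload bounds.

```python
# Illustrative random-key decode (assumed, not the paper's exact
# scheme): a 2*n continuous position vector -> tightening order + preloads.
import numpy as np

P_MIN, P_MAX = 3500.0, 7500.0  # hypothetical preload bounds in N

def decode(position: np.ndarray, n_bolts: int = 24):
    keys, loads = position[:n_bolts], position[n_bolts:]
    # The bolt with the smallest key is tightened first, and so on;
    # argsort of continuous keys always yields a valid permutation.
    order = np.argsort(keys)
    # Map each preload gene from [0, 1] into the engineering bounds.
    preload = P_MIN + np.clip(loads, 0.0, 1.0) * (P_MAX - P_MIN)
    return [(step + 1, int(bolt), float(preload[bolt]))
            for step, bolt in enumerate(order)]

plan = decode(np.random.default_rng(1).random(48))
print(plan[:3])  # e.g., [(1, bolt_id, preload_N), ...]
```

A decode of this kind keeps every candidate feasible by construction, which is one way an encoding–decoding layer can embed sequence and bound constraints without penalty terms.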