Prediction of Array Antenna Assembly Accuracy Based on Auto-Encoder and Boosting-OSKELM

Abstract: As a critical component for space exploration, navigation, and national defense, the array antenna is of indispensable strategic significance. However, the large number of parts and the complex assembly process make it hard for an array antenna to meet the assembly standard, which causes repeated rework and delay. To predict the assembly accuracy of array antennas accurately and efficiently, this paper proposes a prediction method based on an auto-encoder and an online sequential kernel extreme learning machine with boosting (Boosting-OSKELM). The method is divided into two steps. First, an auto-encoder with the fine-tuning trick is trained and used for representation reduction of the data. Then, the reduced data are taken as the input of Boosting-OSKELM to complete the initial training of the model. When new sample data are generated, Boosting-OSKELM corrects the model online through rapid iteration. Tests show that the average MSE of Boosting-OSKELM and ANN is 0.061 and 0.12, and the time consumption is 0.85 s and 15 s, respectively. This indicates that the method is robust in prediction accuracy and online learning ability, which is conducive to the development of array antenna assembly.


Introduction
The electronic information industry is an important driving force for today's economic and social development. It is a strategic, basic, and leading pillar industry of the national economy, which plays an important role in promoting economic growth, industrial structure, changing development mode, and maintaining national security [1,2]. In recent years, the high-precision integrated antenna (array antenna), as an indispensable part of early warning and detection systems, has become the core of the national major project "Space-Earth Integration Network" [3]. The basic structure of the array antenna body is shown in Figure 1. It can be mainly divided into three parts, namely the antenna subarray element, the function layer, and power. However, the specific configuration of the array antenna is rather complex. Taking the new exploration satellite as an example, its array antenna is composed of 500 subarrays and more than 1 million parts, with over 100 million assembly welding points. Moreover, it must remain in trouble-free service for more than 8 years in the harsh space environment, which imposes extremely high requirements.
Due to the complexity of the structure and the high quality requirements, array antennas are mostly assembled manually or with the help of mechanical equipment [4]. Because of the uncertainty of assembly activities, the assembly process is often unqualified or the mechanical and electrical performance cannot meet the design standard, which results in an average of more than 12 months of repeated adjustment and seriously reduces assembly efficiency [5]. Therefore, in order to shorten the assembly cycle of array antennas and guarantee product quality, it is urgent to introduce an effective method of assembly accuracy prediction into their manufacturing.
There has been much related research in this field. In order to study the effect of assembly error on the antenna gain, Guo et al. [6] proposed an accurate gain prediction model using an improved XGBoost algorithm and the transfer learning method, based on simulation data and experience. Given that the geometric characteristics of the parts and components of an aero-engine rotor are not related to the measurement datum, Liu et al. [7] proposed a datum error elimination method that makes the rotor characteristic matrix and assembly model more accurate, thereby improving the prediction of assembly accuracy. Mu et al. [8] studied the construction method of composite processing components considering the manufacturing error and deformation of parts and proposed a new prediction method for aero-engine high-pressure rotor systems. Aiming at the goals of Zero-defect Manufacturing, Elisa et al. [9] established a diagnostic tool that provides in-line identification of the critical steps of assembly processes. The methodology is based on a self-adaptive defect prediction model of the process, which can be updated with the input of new data. The research mentioned above either established a data-driven model from historical and simulation data or built a mechanical model based on physical principles. However, both are offline prediction models, which makes rapid prediction difficult in some complex cases. Moreover, they lack the ability of model iteration, leading to low efficiency when new sample data are used for online correction of the model.
In recent years, digital twin technology has been widely studied and applied in advanced manufacturing. As a virtual-physical fusion technology, it can realize the virtual-physical interaction, data fusion, decision analysis, and iterative optimization of the whole assembly process by using the twin data of the assembly context, based on the virtual assembly information model and the quantitative calculation of assembly quality [10]. The implementation of digital twin technology is inseparable from artificial intelligence technology, which provides high-performance data analysis and real-time prediction. However, the assembly process of an array antenna has the following features, which make assembly accuracy prediction difficult: (1) The assembly process of an array antenna is complex and variable, and its assembly sample data mostly lie in a high-dimensional space, so a single sample may carry redundant or even contradictory information across its features, which is detrimental to the prediction of assembly accuracy. (2) Array antennas are produced in small batches, so historical sample data are limited, which hinders the training of machine learning models.
As to the problem of high dimensionality, data mining technology can be used to extract valuable information [11]. The commonly used data mining method is representation reduction, which is a dimension-reducing or feature extraction method. It can reduce the distance between sample points in high-dimensional space and preserve the valuable information of the samples as much as possible, which helps improve the inference speed and accuracy of the prediction algorithm [12]. In traditional machine learning, the commonly used feature extraction methods fall into two groups: linear dimension reduction represented by PCA [13] and nonlinear dimension reduction based on manifold learning. PCA may lose more information in nonlinear problems, while manifold learning (such as LLE [14] and t-SNE [15]) starts from the relationship between samples, which makes the dimension reduction highly effective but also computationally expensive. Therefore, traditional methods cannot complete the dimension reduction within seconds.
For few-shot learning problems in engineering projects, the Kriging algorithm is often used. Although this method does not rely on the amount of data, it still depends on the distribution of the data. If the data are poorly distributed, the prediction accuracy of Kriging may be greatly affected [16]. In addition, statistical learning methods such as the K-nearest neighbor algorithm (KNN) [17] and Support Vector Regression (SVR) [18] can also adapt to few-shot learning problems. However, these algorithms usually learn offline, and they offer no good online incremental learning method for new sample data generated in the future.
In view of the issues above, the main contributions of this paper are as follows: (1) An improved auto-encoder is proposed to implement feature extraction, which not only has high computational efficiency but also can adapt to small sample problems. (2) An online sequential kernel extreme learning machine with a boosting strategy (Boosting-OSKELM) is proposed to adapt to online learning. This method learns and iterates quickly, which meets online learning requirements.

Dimension Reduction Principle of Auto-Encoder
In recent years, neural networks have made remarkable achievements in many fields, such as genetics [19], graph classification [20], medical diagnosis [21], and fault diagnosis [22]. According to the universal approximation theorem [23], a neural network can, in theory, approximate any continuous function. Its general structure is mainly divided into three parts: the input layer, the hidden layer, and the output layer. The input layer and output layer correspond to the input data and the prediction results, respectively, and the hidden layer in between introduces a nonlinear activation function, which enables the neural network to learn more hidden feature information. Therefore, the neural network is good at multi-level representation and data prediction of nonlinear systems.
In the field of data dimension reduction, the common neural network model is the auto-encoder (AE), a neural network that aims to restore its input data to the greatest extent [24]. Figure 2 shows the structure of a single-layer auto-encoder (SAE), which is fully connected between layers. Assuming that sample $i$ is $x^{(i)} \in \mathbb{R}^{d}$, the AE maps this sample to a new feature space, $z^{(i)} \in \mathbb{R}^{m}$; the AE then reconstructs the original sample from the new representation, and the reconstructed sample is defined as $\hat{x}^{(i)}$.
As can be seen from Figure 2, the AE can be divided into two parts: the encoder $f: \mathbb{R}^{d} \rightarrow \mathbb{R}^{m}$ and the decoder $g: \mathbb{R}^{m} \rightarrow \mathbb{R}^{d}$. The learning goal of the AE is to minimize the reconstruction loss shown in Equation (1).
$$L = \sum_{i=1}^{N} \left\| x^{(i)} - \hat{x}^{(i)} \right\|^{2} \tag{1}$$
where $\|\cdot\|$ is the vector 2-norm. The mapping $f$ from $x^{(i)}$ to $z^{(i)}$ is represented in Equation (2), where $W^{(1)}$ and $b^{(1)}$ are the connection weight and bias of the encoder and $\sigma(\cdot)$ is the activation function. Similarly, the mapping $g$ from $z^{(i)}$ to $\hat{x}^{(i)}$ is represented by Equation (3), where $W^{(2)}$ and $b^{(2)}$ are the connection weight and bias of the decoder.
$$z^{(i)} = f\left(x^{(i)}\right) = \sigma\left(W^{(1)} x^{(i)} + b^{(1)}\right) \tag{2}$$
$$\hat{x}^{(i)} = g\left(z^{(i)}\right) = \sigma\left(W^{(2)} z^{(i)} + b^{(2)}\right) \tag{3}$$
If the dimension $m$ of the feature space is less than the dimension $d$ of the original space, the AE can be regarded as a feature extraction method, and the hidden-layer output $z^{(i)}$ serves as the reduced representation of the sample. If necessary, in order to prevent over-fitting, tied weights can be considered, that is, $W^{(2)} = \left(W^{(1)}\right)^{\mathsf{T}}$. Regularization is then applied as shown in Equation (4), where $\|\cdot\|_{F}$ is the Frobenius norm of a matrix and $\lambda$ is the regularization coefficient.
$$L = \sum_{i=1}^{N} \left\| x^{(i)} - \hat{x}^{(i)} \right\|^{2} + \lambda \left\| W^{(1)} \right\|_{F}^{2} \tag{4}$$
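The reconstruction objective with tied weights and a Frobenius-norm penalty can be sketched numerically. The following is a minimal NumPy illustration, not the authors' implementation: the tanh activation, the layer sizes (13 inputs and 8 hidden units), the initialization scale, the value of `lam`, and all function names are assumptions chosen for the example.

```python
import numpy as np

def init_tied_autoencoder(d, m, seed=0):
    """Randomly initialise a tied-weight single-layer auto-encoder (d -> m -> d)."""
    rng = np.random.default_rng(seed)
    W = rng.normal(scale=0.1, size=(m, d))  # encoder weight; the decoder reuses W.T
    b1 = np.zeros(m)                         # encoder bias
    b2 = np.zeros(d)                         # decoder bias
    return W, b1, b2

def encode(X, W, b1):
    """Encoder: z = tanh(W x + b1), applied to each row of X (shape (N, d))."""
    return np.tanh(X @ W.T + b1)

def decode(Z, W, b2):
    """Decoder with tied weights: x_hat = tanh(W^T z + b2)."""
    return np.tanh(Z @ W + b2)

def regularized_loss(X, W, b1, b2, lam):
    """Sum of squared reconstruction errors plus lam * ||W||_F^2."""
    X_hat = decode(encode(X, W, b1), W, b2)
    reconstruction = np.sum((X - X_hat) ** 2)
    return reconstruction + lam * np.sum(W ** 2)

# Illustrative random data: 40 samples with 13 features scaled to [0, 1].
X = np.random.default_rng(1).uniform(0.0, 1.0, size=(40, 13))
W, b1, b2 = init_tied_autoencoder(d=13, m=8)
Z = encode(X, W, b1)
print(Z.shape, regularized_loss(X, W, b1, b2, lam=0.01))
```

In practice the parameters would be optimised by gradient descent in a deep-learning framework; the sketch only evaluates the objective for fixed parameters.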
In some scenarios, the number of hidden layers can be increased to form a deep auto-encoder (DAE). Theoretically, deeper neural networks have more neurons and stronger learning capacity. In practice, however, blindly stacking layers may lead to vanishing or exploding gradients, so that training fails to converge. Therefore, the construction of a neural network requires skill and experience.

Fine-Tuning Trick
In early neural network training, deep networks were difficult to train when the model parameters were randomly initialized. In this regard, when training a deep belief network, Hinton et al. [25] proposed a greedy pre-training method, in which the restricted Boltzmann machine of each layer is pre-trained in turn and the output layer is added at the end, after which the whole network is adjusted. This adjustment is called fine-tuning, which is a common and important deep-learning training skill. At present, it is widely used in many fields of artificial intelligence, especially in transfer learning. The basic idea of fine-tuning is that, starting from the pre-trained network, the network is modified through a traditional global learning algorithm so that the model converges to a better local optimum.
This paper uses the fine-tuning trick based on supervised learning. The basic process is shown in Figure 3. First, the AE is combined with a multi-layer perceptron (MLP), that is, the output of the encoder is used as the input of the MLP. The combined network is then trained on the regression problem over the dataset, which is called pre-training. The MLP is chosen because it allows the weight parameters of the AE to be updated through gradient backpropagation. After pre-training, the AE is separated from the MLP and the connection weights learned in the pre-training stage are retained. On this basis, the weights of the AE are retrained, which is called tuning. After fine-tuning is completed, the decoder part of the AE is removed while the encoder is retained, which gives the representation reduction model of the sample data.

Online Learning Model
In theory, the network structure of the encoder and MLP can be directly used to predict assembly accuracy. However, due to the lack of available training samples, the generalization performance of the trained neural network may be poor. In addition, from the perspective of online model correction, although neural networks can correct the model through gradient descent, this relies heavily on batch data. If the model is corrected with only a small number of samples, its prediction performance may drop sharply due to the problem of "data poisoning" [26].
In order to solve the problem of online model learning, scholars have proposed a large number of online learning algorithms. Among them, the online sequential extreme learning machine (OSELM) is favored by many scholars because of its fast training speed and good generalization performance, which allow efficient online prediction of sample data [27]. It has the following advantages: (1) The learning speed of OSELM is very fast, which avoids the slow gradient backpropagation updates of traditional BP neural networks. (2) It is easy for OSELM to obtain the global optimal solution, because the network weights are solved by a least-squares optimization model. (3) OSELM has few parameters, which avoids the strong influence of the learning rate on the performance of BP neural networks. (4) As OSELM does not rely on gradient descent but updates its parameters through matrix transformations, it computes faster and therefore has a strong online learning ability. Additionally, because of its simple structure, OSELM is less prone to overfitting, which makes it suitable for small sample data.
OSELM derives from the extreme learning machine (ELM) [28], on the basis of which the incremental learning formula for new samples is obtained. ELM is a feedforward neural network, and its basic structure is shown in Figure 4, where w represents the weights between the input layer and hidden layer neurons, b stands for the biases, β is the weight between the hidden layer and output layer neurons, and g is the activation function of the hidden layer neurons.

Unlike the currently popular deep neural networks, ELM does not update the network weights through gradient backpropagation but solves for them through the Moore-Penrose generalized inverse. Define a dataset with $N$ samples $(X, T)$, where $X = [x^{(1)}, x^{(2)}, \ldots, x^{(N)}]^{\mathsf{T}}$, $x^{(n)} \in \mathbb{R}^{d}$, and $T = [t^{(1)}, t^{(2)}, \ldots, t^{(N)}]^{\mathsf{T}}$. If the number of neurons in the hidden layer is $L$ and its activation function is $g$, the final output of ELM on the dataset is shown in Equation (5).
$$f_{L}(X) = \sum_{i=1}^{L} \beta_{i}\, g_{i}\left(w_{i} X + b_{i}\right) \tag{5}$$
Define $h_{i} = g_{i}(w_{i} X + b_{i})$ and convert Equation (5) to a matrix representation to obtain the output shown in Equation (6).
$$f_{L}(X) = H\beta \tag{6}$$
where $H = [h_{1}, h_{2}, \ldots, h_{L}]$ is the output matrix of the hidden layer.
ELM training is mainly divided into two stages. The first stage is random mapping, where ELM randomly initializes the weights $w_{i}$ and biases $b_{i}$ from the input layer to the hidden layer. The second stage is linear parameter solving. Given the weights and biases from the first stage, $\beta$ can be solved from Equation (6) and the optimization problem shown in Equation (7).
$$\min_{\beta} \left\| H\beta - T \right\|^{2} \tag{7}$$
In order to improve the numerical stability of the solution, the regularization coefficient $C$ and the identity matrix $I$ are introduced. Because the matrix $H$ usually has full row rank, the optimal value of $\beta$ is given by Equation (8), in which $H^{+}$ denotes the (regularized) Moore-Penrose generalized inverse of $H$.
$$\beta^{*} = H^{+} T = H^{\mathsf{T}}\left(\frac{I}{C} + HH^{\mathsf{T}}\right)^{-1} T \tag{8}$$
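The two training stages described above (random mapping, then a regularized linear solve for β) can be sketched as follows. This is a minimal NumPy illustration under stated assumptions: the tanh activation, the hidden size `L=50`, the value of `C`, the synthetic sine data, and the function names `elm_fit` and `elm_predict` are all hypothetical choices, not taken from the paper.

```python
import numpy as np

def elm_fit(X, T, L=50, C=1e3, seed=0):
    """Train a basic ELM: random hidden layer, then a regularized linear solve."""
    rng = np.random.default_rng(seed)
    N, d = X.shape
    W = rng.normal(size=(d, L))   # stage 1: random input weights (never trained)
    b = rng.normal(size=L)        # stage 1: random hidden biases (never trained)
    H = np.tanh(X @ W + b)        # hidden-layer output matrix, shape (N, L)
    # Stage 2: beta = H^T (I/C + H H^T)^{-1} T, solved without forming the inverse.
    beta = H.T @ np.linalg.solve(np.eye(N) / C + H @ H.T, T)
    return W, b, beta

def elm_predict(X, W, b, beta):
    """Apply the fixed random mapping and the learned output weights."""
    return np.tanh(X @ W + b) @ beta

# Synthetic regression data used only for demonstration.
rng = np.random.default_rng(42)
X = rng.uniform(-1.0, 1.0, size=(60, 3))
T = np.sin(X.sum(axis=1, keepdims=True))
W, b, beta = elm_fit(X, T)
mse = float(np.mean((elm_predict(X, W, b, beta) - T) ** 2))
print("training MSE:", mse)
```

Using `np.linalg.solve` rather than an explicit inverse is a numerical convenience; it computes the same β as the closed-form expression.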
Although the learning speed of ELM is very fast, its prediction performance still lags behind that of popular deep neural networks. In order to enhance the nonlinear fitting ability of ELM, a kernel function is introduced to form the kernel extreme learning machine (KELM) [29].
The kernel function $K(x^{(i)}, x^{(j)}) = h(x^{(i)}) \cdot h(x^{(j)})$ is a common tool for solving nonlinear problems. It maps the data in the original feature space to a new high-dimensional one. Learning takes place implicitly in the new feature space, and there is no need to define the kernel mapping function explicitly. The kernel matrix $\Omega = HH^{\mathsf{T}}$ is defined according to the Mercer condition [30], as shown in Equation (9).
$$\Omega = HH^{\mathsf{T}} = \begin{bmatrix} K(x^{(1)}, x^{(1)}) & \cdots & K(x^{(1)}, x^{(N)}) \\ \vdots & \ddots & \vdots \\ K(x^{(N)}, x^{(1)}) & \cdots & K(x^{(N)}, x^{(N)}) \end{bmatrix} \tag{9}$$
Common kernel functions are as follows:
(1) Polynomial kernel function
$$K(x^{(i)}, x^{(j)}) = \left(a\, x^{(i)} \cdot x^{(j)} + b\right)^{p} \tag{10}$$
(2) Gaussian kernel function
$$K(x^{(i)}, x^{(j)}) = \exp\left(-\frac{\left\| x^{(i)} - x^{(j)} \right\|^{2}}{2\sigma^{2}}\right) \tag{11}$$
(3) Linear kernel function (i.e., no kernel)
$$K(x^{(i)}, x^{(j)}) = x^{(i)} \cdot x^{(j)} \tag{12}$$
where $a$, $b$, $p$, and $\sigma$ are constants.
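The three kernels named above and the resulting kernel matrix can be computed in a vectorised way. The sketch below is illustrative only: the default constants, the `2 * sigma**2` convention in the Gaussian kernel, and the function names are assumptions, since the paper's exact parameterisation is not recoverable from this text.

```python
import numpy as np

def polynomial_kernel(X, Y, a=1.0, b=1.0, p=2):
    """Polynomial kernel (a * <x, y> + b)^p for all row pairs of X and Y."""
    return (a * (X @ Y.T) + b) ** p

def gaussian_kernel(X, Y, sigma=1.0):
    """Gaussian kernel exp(-||x - y||^2 / (2 sigma^2)) for all row pairs."""
    sq_dist = (np.sum(X ** 2, axis=1)[:, None]
               + np.sum(Y ** 2, axis=1)[None, :]
               - 2.0 * (X @ Y.T))
    # Clip tiny negative values caused by floating-point round-off.
    return np.exp(-np.maximum(sq_dist, 0.0) / (2.0 * sigma ** 2))

def linear_kernel(X, Y):
    """Linear kernel <x, y>, equivalent to using no kernel."""
    return X @ Y.T

# Kernel matrix Omega on a small random sample set (illustrative data).
X = np.random.default_rng(0).normal(size=(5, 3))
Omega = gaussian_kernel(X, X)
print(Omega.shape)
```

Each function returns the full matrix of pairwise kernel values, so calling it with the training set on both sides yields Ω directly.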
It can be clearly seen that, once the kernel function is introduced, ELM is no longer affected by the random weights $w$ and biases $b$, and the prediction for a new sample $x$ can be calculated directly according to Equation (13).
$$f(x) = h(x) H^{\mathsf{T}}\left(\frac{I}{C} + HH^{\mathsf{T}}\right)^{-1} T = \left[K(x, x^{(1)}), \ldots, K(x, x^{(N)})\right]\left(\frac{I}{C} + \Omega\right)^{-1} T \tag{13}$$
When the kernel function is introduced into OSELM, an online sequential kernel extreme learning machine (OSKELM) is formed. For the dataset $\{(x^{(i)}, t^{(i)})\}$ collected up to time $t$, the prediction result is $f(x) = h(x)\beta_{t}$, where $\beta_{t}$ is $\beta$ at time $t$. Defining $k_{t}(x)$ and $\theta_{t}$ as shown in Equations (14) and (15), Equation (16) can be derived.
Thus, the iterative formula of the kernel function coefficient vector is obtained, as shown in Equation (21). In this way, new samples can be predicted according to Equation (16).
To sum up, the online training of OSKELM does not need to merge the old and new data for retraining; instead, it absorbs the information of new samples by updating the matrix $A_{t+1}$. After updating, the old dataset information is no longer needed, which greatly reduces the computational complexity and improves efficiency.
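One common way to realise such an online update is a block-matrix (Schur complement) inversion of the regularized kernel matrix. The sketch below is offered under that assumption and is not a reproduction of the paper's Equations (14) to (21): it assumes $A_t = I/C + \Omega_t$ and $\theta_t = A_t^{-1} T_t$, and the class name `OSKELM`, its methods, and the Gaussian kernel settings are hypothetical. For simplicity it keeps the stored inputs and targets, which are needed to evaluate kernels against new points and to refresh the coefficients; what it avoids is re-inverting the full matrix.

```python
import numpy as np

def gaussian_kernel(X, Y, sigma=1.0):
    sq = (np.sum(X ** 2, axis=1)[:, None] + np.sum(Y ** 2, axis=1)[None, :]
          - 2.0 * (X @ Y.T))
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * sigma ** 2))

class OSKELM:
    """Kernel ELM with a sample-by-sample update of A^{-1} (illustrative sketch)."""

    def __init__(self, kernel=gaussian_kernel, C=100.0):
        self.kernel, self.C = kernel, C

    def fit(self, X, T):
        """Initial batch training: A = I/C + Omega, theta = A^{-1} T."""
        self.X, self.T = np.array(X, dtype=float), np.array(T, dtype=float)
        A = np.eye(len(self.X)) / self.C + self.kernel(self.X, self.X)
        self.A_inv = np.linalg.inv(A)
        self.theta = self.A_inv @ self.T
        return self

    def partial_fit(self, x_new, t_new):
        """Absorb one new sample by growing A^{-1} with a block inversion."""
        x_new = np.asarray(x_new, dtype=float).reshape(1, -1)
        t_new = np.asarray(t_new, dtype=float).reshape(1, -1)
        k = self.kernel(self.X, x_new)                    # kernel column, shape (t, 1)
        c = 1.0 / self.C + self.kernel(x_new, x_new)[0, 0]
        Ak = self.A_inv @ k
        s = c - (k.T @ Ak)[0, 0]                          # Schur complement (scalar)
        top_left = self.A_inv + (Ak @ Ak.T) / s
        self.A_inv = np.block([[top_left, -Ak / s],
                               [-Ak.T / s, np.array([[1.0 / s]])]])
        self.X = np.vstack([self.X, x_new])
        self.T = np.vstack([self.T, t_new])
        self.theta = self.A_inv @ self.T                  # refreshed coefficient vector
        return self

    def predict(self, X):
        return self.kernel(np.asarray(X, dtype=float), self.X) @ self.theta

# Demonstration on synthetic data: 20 initial samples, then 5 arriving online.
rng = np.random.default_rng(0)
X = rng.uniform(-1.0, 1.0, size=(25, 3))
T = np.sin(X.sum(axis=1, keepdims=True))
model = OSKELM().fit(X[:20], T[:20])
for i in range(20, 25):
    model.partial_fit(X[i], T[i])
print(model.predict(X[:3]).ravel())
```

Each update costs on the order of t² operations for t stored samples, compared with t³ for inverting the enlarged matrix from scratch.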

Boosting-OSKELM
The goal of a supervised learning algorithm is to train stable models that perform well in all aspects. Most of the time, however, a supervised learning algorithm performs decently only in specific fields, which is called weak learning. According to ensemble learning theory, weak learners and strong learners are actually equivalent, as several weak learners can reach the same prediction performance as a strong learner through a suitable combination method. Boosting is one such common ensemble learning technique. Its basic idea is to use each weak learner to correct the prediction errors of the preceding weak learners.
Based on the idea of boosting, in order to exploit the performance of the learner as much as possible, this paper ensembles multiple extreme learning machines with different kernels and proposes the Boosting-OSKELM algorithm, which is shown in Figure 5. First, the polynomial kernel OSKELM is trained on the original dataset. Then, the residual between the fitted result and the true result on the training set is calculated, the labels of the original data are replaced with this residual, and the Gaussian kernel OSKELM is used to learn and predict the new residual labels. Finally, the residual is calculated again, namely the 'residual' of the residual, and so on.
The pseudo-code of Boosting-OSKELM is shown in Algorithm 1. In practice, the training of the model does not have to follow the boosting order Polynomial kernel → Gaussian kernel → Linear kernel; the order can be adjusted appropriately, and a kernel can even be reused, for example Gaussian kernel → Gaussian kernel → Gaussian kernel.
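The residual-fitting scheme described above can be sketched as a short loop. This is a hedged illustration rather than the paper's Algorithm 1: each stage is a batch kernel ELM standing in for an OSKELM learner, and the kernel constants, the value of `C`, the synthetic data, and all function names are assumptions.

```python
import numpy as np

def poly_kernel(X, Y, a=1.0, b=1.0, p=2):
    return (a * (X @ Y.T) + b) ** p

def gauss_kernel(X, Y, sigma=1.0):
    sq = (np.sum(X ** 2, axis=1)[:, None] + np.sum(Y ** 2, axis=1)[None, :]
          - 2.0 * (X @ Y.T))
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * sigma ** 2))

def linear_kernel(X, Y):
    return X @ Y.T

def fit_stage(X, target, kernel, C):
    """One kernel-ELM stage: theta = (I/C + Omega)^{-1} target."""
    A = np.eye(len(X)) / C + kernel(X, X)
    return np.linalg.solve(A, target)

def boosting_fit(X, T, kernels, C=100.0):
    """Fit the stages in order; each stage learns the residual left by earlier ones."""
    thetas, residual = [], np.array(T, dtype=float)
    for kernel in kernels:
        theta = fit_stage(X, residual, kernel, C)
        residual = residual - kernel(X, X) @ theta   # the 'residual of the residual'
        thetas.append(theta)
    return thetas

def boosting_predict(X_new, X_train, kernels, thetas):
    """The final prediction is the sum of all stage predictions."""
    return sum(kernel(X_new, X_train) @ theta
               for kernel, theta in zip(kernels, thetas))

# Synthetic demonstration data (not the paper's simulation results).
rng = np.random.default_rng(0)
X = rng.uniform(-1.0, 1.0, size=(40, 13))
T = np.sin(X[:, :2].sum(axis=1, keepdims=True))
kernels = [poly_kernel, gauss_kernel, linear_kernel]
thetas = boosting_fit(X, T, kernels)
for n in range(1, len(kernels) + 1):
    pred = boosting_predict(X, X, kernels[:n], thetas[:n])
    print(n, "stage(s), training MSE:", float(np.mean((pred - T) ** 2)))
```

Because each stage only receives what earlier stages failed to fit, reordering or repeating kernels, as the text allows, amounts to changing the `kernels` list.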

Data Description
The simplified array antenna subarray unit is shown in Figure 6. It is a stacked structure of three flexible plates: from bottom to top, a soaking plate, a PCB plate, and a backing plate. The surface of the soaking plate has circular bosses of different heights, which are used to insert different panels. There are slightly raised bosses on the surface of the PCB for welding the connector, and the other end of the connector is assembled with the insertion pins at the bottom of the backing plate. The panels are fixed to each other by screws, and different screw preloads cause different degrees of deformation between the panels, resulting in greater stress at the welding position of the connector, which seriously affects the assembly quality of the array antenna.
Based on ANSYS simulation software, deformation simulations under different preloads were carried out, one of which is shown in Figure 7. It can be seen that, because the backing plate arches towards the middle under the action of the preload, the middle area shows a large relative displacement, which is the main source of assembly error. In order to measure the assembly accuracy conveniently, this paper selects the relative displacement in the X and Y directions of the position shown in Figure 8 as the prediction target.
In this case, the input data of the model are the preloads of the screws (13 screws in total), and the output is the relative displacement of each position shown in Figure 7. Numerical simulation in ANSYS was used to obtain 40 corresponding data samples, as shown in Tables 1 and 2. The simulation results show that the deformations of the sample points are relatively close to one another, so the MinMax normalization strategy is more suitable for data normalization. As a result, the data in both Tables 1 and 2 are normalized according to Equation (22).
$$x' = \frac{x - x_{\min}}{x_{\max} - x_{\min}} \tag{22}$$
where $x_{\min}$ and $x_{\max}$ are the minimum and maximum values of the corresponding feature.
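MinMax normalization as described above can be applied column by column. The sketch below is illustrative: the function names and the sample preload values are made up for the example, and the guard for constant columns is an added assumption rather than something stated in the paper.

```python
import numpy as np

def minmax_fit(X):
    """Return the per-feature minimum and maximum of the training data."""
    return X.min(axis=0), X.max(axis=0)

def minmax_transform(X, x_min, x_max):
    """Scale each feature to [0, 1] using (x - min) / (max - min)."""
    span = np.where(x_max > x_min, x_max - x_min, 1.0)  # avoid dividing by zero
    return (X - x_min) / span

# Hypothetical preload values for 4 samples and 3 screws (illustrative only).
preloads = np.array([[100.0, 120.0, 90.0],
                     [150.0, 110.0, 95.0],
                     [130.0, 140.0, 80.0],
                     [110.0, 100.0, 85.0]])
x_min, x_max = minmax_fit(preloads)
scaled = minmax_transform(preloads, x_min, x_max)
print(scaled.min(axis=0), scaled.max(axis=0))
```

The minimum and maximum should be taken from the training data and reused for any later samples, so that new data are scaled consistently.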

Auto-Encoder Training
The training of neural networks differs from that of traditional machine learning. It involves a large number of hyper-parameters, including the network structure, the learning rate, and the choice of optimizer. A traditional grid search over them would occupy a large amount of computing resources. Therefore, when training the AE, this paper assigns empirical fixed values to some hyper-parameters.
(1) As to the network architecture, given the number of input features, the structure of the hidden layer should be as simple as possible; otherwise, a complex structure will easily cause overfitting. Based on experience, the initial AE architecture is set to 13-8-13;
(2) As to the activation function, the commonly used nonlinear activation functions are the ReLU, Sigmoid, and Tanh functions. Compared with the other two, the Tanh function has a relatively wide output range, which is more conducive to distinguishing the reduced representations of different samples;
(3) As to the optimizer, the adaptive moment estimation (Adam) proposed by Kingma et al. [31] is used. It retains the advantages of SGD (stochastic gradient descent) and introduces momentum, so that the convergence of the neural network is accelerated and the learning rate gradually declines with the number of iterations, which helps find a better local optimum. The initial learning rate is 0.01;
(4) Due to the small sample size in the pre-training stage, regularization should be considered in order to prevent over-fitting. The regularization form shown in Equation (4) is adopted, taking λ = 20.
To sum up, the empirical values for the parameters are shown in Table 3. The Python-based TensorFlow 2 framework is used in this paper. Under the given parameters, the loss function changes with epochs as shown in Figure 9. It can be seen that after 500 epochs, the minimum loss (loss*) on the test set is 0.040.
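The pre-training setup above can be sketched as follows. This is a minimal illustration and not the TensorFlow 2 implementation used in the paper: it is written in plain NumPy, it replaces Adam with ordinary full-batch gradient descent, it assumes a linear output layer, and it uses a generic L2 weight penalty with a hypothetical coefficient instead of the form of Equation (4) with λ = 20. The learning rate of 0.1 and the random input data are likewise assumptions made for the sketch.

```python
import numpy as np

def init_params(n_in=13, n_hidden=8, seed=0):
    rng = np.random.default_rng(seed)
    return {
        "W1": rng.normal(0.0, 0.1, size=(n_in, n_hidden)),
        "b1": np.zeros(n_hidden),
        "W2": rng.normal(0.0, 0.1, size=(n_hidden, n_in)),
        "b2": np.zeros(n_in),
    }

def encode(params, X):
    # Tanh hidden layer: the 8-dimensional reduced representation.
    return np.tanh(X @ params["W1"] + params["b1"])

def decode(params, H):
    # Linear output layer (an assumption; the output activation is not stated).
    return H @ params["W2"] + params["b2"]

def loss_and_grads(params, X, weight_decay):
    H = encode(params, X)
    diff = decode(params, H) - X
    penalty = weight_decay * (np.sum(params["W1"] ** 2) + np.sum(params["W2"] ** 2))
    loss = np.mean(diff ** 2) + penalty
    d_out = 2.0 * diff / diff.size
    # Backpropagate through tanh: tanh'(z) = 1 - tanh(z)^2.
    d_hidden = (d_out @ params["W2"].T) * (1.0 - H ** 2)
    grads = {
        "W2": H.T @ d_out + 2.0 * weight_decay * params["W2"],
        "b2": d_out.sum(axis=0),
        "W1": X.T @ d_hidden + 2.0 * weight_decay * params["W1"],
        "b1": d_hidden.sum(axis=0),
    }
    return loss, grads

def train(X, epochs=500, lr=0.1, weight_decay=1e-4, seed=0):
    # Plain gradient descent stands in for Adam in this sketch.
    params = init_params(X.shape[1], 8, seed)
    history = []
    for _ in range(epochs):
        loss, grads = loss_and_grads(params, X, weight_decay)
        history.append(float(loss))
        for name in params:
            params[name] -= lr * grads[name]
    return params, history

# Hypothetical normalized input: 40 samples x 13 screw preloads in [0, 1].
rng = np.random.default_rng(42)
X_demo = rng.uniform(0.0, 1.0, size=(40, 13))
params, history = train(X_demo)
codes = encode(params, X_demo)  # shape (40, 8)
```

Only `encode` is needed downstream, which matches the idea of extracting the encoder with its weights retained after training.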
In the stage of fine-tuning, the MLP network architecture is set to 13-16-32-16. As to the selection of the activation function, neural network regression problems generally use the ReLU activation function, which can alleviate vanishing or exploding gradients in deeper neural networks. Other parameters of the network are consistent with those of the AE.
In order to make the whole network develop towards optimizing the AE network, the loss function is modified as shown in Equation (23).

$$L = \eta_1 L_1 + \eta_2 L_2 \tag{23}$$

where L1 represents the reconstruction loss of AE and L2 represents the prediction loss of MLP. η1 and η2 are balance coefficients, used to balance the weight between the reconstruction and prediction losses. The purpose of fine-tuning is to obtain a more refined low-dimensional representation of the sample, so the reconstruction loss should be fully considered. This paper takes η1 = 0.4 and η2 = 0.6.

Since the pre-training has already adjusted the network weights to a reasonable range, the learning rate can be appropriately reduced at this training stage. In this paper, the initial learning rate is reduced from 0.01 to 0.002, and the change of loss with epoch is shown in Figure 10. Owing to pre-training, the AE shows low loss at the beginning of training. As the number of iterations increases, the loss first decreases and then begins to rise, which indicates that the model starts to overfit. After tuning, the optimal loss of AE is 0.037, which is less than the pre-training loss in Figure 9 (0.040). It can be seen that fine-tuning is effective.
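The weighting of the two losses can be made concrete with a small sketch. It assumes that Equation (23) is the weighted sum of the reconstruction and prediction losses that the text describes, and that both losses are mean squared errors; the function names and the example loss curve are hypothetical.

```python
import numpy as np

def mse(pred, target):
    """Mean squared error between two arrays of the same shape."""
    pred, target = np.asarray(pred, dtype=float), np.asarray(target, dtype=float)
    return float(np.mean((pred - target) ** 2))

def fine_tune_loss(x, x_recon, y, y_pred, eta1=0.4, eta2=0.6):
    """Weighted sum of AE reconstruction loss (L1) and MLP prediction loss (L2)."""
    l1 = mse(x_recon, x)  # reconstruction loss of the AE
    l2 = mse(y_pred, y)   # prediction loss of the MLP
    return eta1 * l1 + eta2 * l2

def best_epoch(losses):
    """Index of the minimum loss, used to stop before overfitting sets in."""
    return int(np.argmin(losses))

# Toy check: L1 = 1.0 and L2 = 4.0 give 0.4 * 1.0 + 0.6 * 4.0 = 2.8.
example = fine_tune_loss(x=[0.0, 0.0], x_recon=[1.0, 1.0], y=[0.0], y_pred=[2.0])

# A hypothetical loss curve that first falls and then rises again.
curve = [0.050, 0.041, 0.037, 0.039, 0.045]
stop_at = best_epoch(curve)
```

Selecting the epoch with the smallest loss is one simple way to handle the decrease-then-rise pattern described above; the paper does not state which selection rule it used.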
Then the encoder of the AE is extracted after training, with the weights retained. This part is used to reduce the representation of the screw preloads, and the results are shown in Table 4, where HD means hidden dimension.

Online Prediction of Assembly Error
After the representation reduction of the dataset, the number of features is reduced from 13 to 8, and the data are then passed to KELM as input. For KELM, the kernel function needs to be determined first. Among the three kernel functions, the Polynomial kernel function and the Gaussian kernel function require constant coefficient values, and the candidate values are shown in Table 5. Although only three kinds of kernel functions are used here, searching jointly over the hyper-parameters of the kernel functions easily causes combinatorial explosion. In order to show the process and effectiveness of the method, this paper tests only 9 lifting (boosting) orders and selects the optimal one from them.
Based on the candidate constant values provided in Table 5 and the 9 lifting sequences, the model was first trained on the training set and its prediction performance was then tested on the test set. In order to further verify the effectiveness of representation reduction by AE, the experiment also compared the prediction result with the one obtained without representation reduction. The final results are shown in Table 6, where L, G, and P represent the Linear kernel, Gaussian kernel, and Polynomial kernel, respectively. It can be seen from Table 6 that the performance of the nine lifting sequences is similar. Taking G-P-L as the lifting sequence gives the smallest prediction error, and the corresponding optimal parameters are a = 3, b = 3, p = 1, σ = 100. In addition, comparing the prediction results before and after representation reduction shows that the prediction accuracy after reduction is slightly higher, which indicates that the representation reduction of the sample data is effective.
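The offline experiment can be sketched in NumPy as follows. The sketch makes several assumptions that are not confirmed by this section: the Gaussian kernel is taken as exp(-||x - y||^2 / (2σ^2)), the Polynomial kernel as (a x·y + b)^p, boosting is implemented as sequential fitting of residuals in the order G-P-L, and the regularization constant `C` is a hypothetical value. The training data are random stand-ins for the 8-dimensional codes and the displacements.

```python
import numpy as np

def linear_kernel(A, B):
    return A @ B.T

def gaussian_kernel(A, B, sigma=100.0):
    # Squared Euclidean distances between all rows of A and B.
    sq = (A ** 2).sum(1)[:, None] + (B ** 2).sum(1)[None, :] - 2.0 * (A @ B.T)
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * sigma ** 2))

def polynomial_kernel(A, B, a=3.0, b=3.0, p=1):
    return (a * (A @ B.T) + b) ** p

class KELM:
    """Kernel extreme learning machine: alpha = (K + I/C)^-1 T."""

    def __init__(self, kernel, C=10.0):
        self.kernel, self.C = kernel, C

    def fit(self, X, T):
        self.X = X
        K = self.kernel(X, X)
        self.alpha = np.linalg.solve(K + np.eye(len(X)) / self.C, T)
        return self

    def predict(self, X_new):
        return self.kernel(X_new, self.X) @ self.alpha

def fit_boosted(X, T, kernels, C=10.0):
    """Fit one KELM per kernel, each on the residual left by the previous ones."""
    models, residual = [], T.copy()
    for kernel in kernels:
        model = KELM(kernel, C).fit(X, residual)
        residual = residual - model.predict(X)
        models.append(model)
    return models

def predict_boosted(models, X_new):
    return sum(model.predict(X_new) for model in models)

# Hypothetical data: 40 samples of 8-dimensional codes and one displacement.
rng = np.random.default_rng(0)
X_train = rng.uniform(-1.0, 1.0, size=(40, 8))
T_train = np.sin(X_train.sum(axis=1, keepdims=True)) + 0.1 * X_train[:, :1] ** 2

sequence = [gaussian_kernel, polynomial_kernel, linear_kernel]  # G-P-L
models = fit_boosted(X_train, T_train, sequence)
pred = predict_boosted(models, X_train)
```

Because every stage solves a regularized kernel system on the current residual, the training residual cannot grow from one stage to the next in this scheme, which is the property the lifting order is meant to exploit.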
The above experimental data are only the prediction results of KELM under the boosting strategy, that is, an offline prediction used to determine the parameters of Boosting-OSKELM. In order to verify the online prediction performance of Boosting-OSKELM, a simple sequential addition principle is adopted, as shown in Figure 11. The new sample points generated at time t + 1 are directly added to the training set, and incremental learning is carried out according to Equation (20). Finally, the error between the prediction and the ground truth is obtained.
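The sequential addition principle can be sketched as a test-then-train loop. Equation (20) is not reproduced in this section, so the update below uses the standard block-matrix (Schur complement) identity for growing the inverse of K + I/C by one sample, which may differ in form from the paper's equation. The Gaussian kernel width, the constant `C`, and the data are hypothetical.

```python
import numpy as np

def rbf(A, B, sigma=1.0):
    sq = (A ** 2).sum(1)[:, None] + (B ** 2).sum(1)[None, :] - 2.0 * (A @ B.T)
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * sigma ** 2))

def grow_inverse(A_inv, k, c):
    """Inverse of [[A, k], [k^T, c]] given A_inv, for symmetric A."""
    u = A_inv @ k
    s = c - k @ u  # Schur complement (a scalar)
    top_left = A_inv + np.outer(u, u) / s
    return np.block([
        [top_left, (-u / s)[:, None]],
        [(-u / s)[None, :], np.array([[1.0 / s]])],
    ])

# Hypothetical stream: 20 initial samples, then 10 arriving one at a time.
rng = np.random.default_rng(1)
X = rng.uniform(-1.0, 1.0, size=(30, 8))
T = np.sin(X.sum(axis=1, keepdims=True))
C, n0 = 10.0, 20

X_seen, T_seen = X[:n0], T[:n0]
A_inv = np.linalg.inv(rbf(X_seen, X_seen) + np.eye(n0) / C)

errors = []
for x_new, t_new in zip(X[n0:], T[n0:]):
    k = rbf(X_seen, x_new[None, :])[:, 0]
    alpha = A_inv @ T_seen
    # Predict first, then learn from the new sample.
    errors.append(float(np.mean((k @ alpha - t_new) ** 2)))
    A_inv = grow_inverse(A_inv, k, 1.0 + 1.0 / C)  # rbf(x, x) = 1
    X_seen = np.vstack([X_seen, x_new])
    T_seen = np.vstack([T_seen, t_new])

# Reference: invert the full regularized kernel matrix in one step.
A_direct = np.linalg.inv(rbf(X, X) + np.eye(len(X)) / C)
```

Each update costs on the order of n^2 operations instead of the n^3 needed for a fresh inversion, which is consistent with the short update times reported for the matrix-based model.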
To further verify the online learning ability of Boosting-OSKELM, the artificial neural network (ANN) is selected for comparison. Table 7 compares the calculation efficiency of Boosting-OSKELM and ANN, where the time and iteration counts are averages over 10 repeated experiments. Because the former model relies on matrix operations rather than gradient descent, it takes less than 1 s (0.85 s), whereas the latter takes nearly 7.2 s, with an early-stopping mechanism and convergence at the 15th iteration. Also, the average MSE values of Boosting-OSKELM and ANN are 0.061 and 0.122, respectively. Thus the proposed model is superior to the traditional ANN in computing speed, which makes it more suitable for online prediction.

The online prediction results are shown in Figure 12. According to this figure, the prediction error shows a downward trend as the number of samples increases. Due to the randomness of the sample distribution, there are some fluctuations in the online prediction process, but the fluctuations of Boosting-OSKELM are smaller than those of ANN, which shows that Boosting-OSKELM has stronger online learning adaptability.

Conclusions
As to the assembly of the array antenna, due to the production mode, the sample data of the array antenna are high-dimensional and few in number, which makes the prediction of its assembly accuracy difficult. Therefore, this paper presents a data representation reduction method based on AE with the fine-tuning trick and an online prediction method based on Boosting-OSKELM. The experiment results show that the average MSE of Boosting-OSKELM and ANN is 0.061 and 0.12, and the time consumption is 0.85 s and 15 s, respectively. After analysis and discussion, the main conclusions are as follows.
(1) The representation reduction by AE can not only remove the redundant information in the original data but also meet the real-time requirements in the digital twin.
(2) With the help of multiple kernel functions and ensemble learning, Boosting-OSKELM can better adapt to online nonlinear learning problems. Compared with the traditional ANN, its generalization performance is relatively stable. Therefore, the proposed method shows potential in other small sample problems.
As for future research, although this paper has addressed the small sample problem, more data need to be sampled in order to improve the accuracy and robustness of the model. Also, the implementation of the presented model requires support from hardware and software. Therefore, the communication mode between intelligent prediction algorithms, parallel computing of industrial big data, and efficient storage can be the next research directions.

Figure 1. The basic structure of array antenna body.

Figure 3. The basic process of fine-tuning.

Figure 8. Selected assembly accuracy prediction position: (a) positions in the PCB plate; (b) positions in the backing plate.

Figure 10. AE loss at the tuning stage.

Figure 12. Online prediction performance over time.

Table 3. Empirical values of AE parameters.

Table 4. Representation reduction of screw preloads.

Table 5. Optional hyper-parameters of the kernel function.

Table 6. Optimal results under nine lifting sequences.

Table 7. Average time consumption between two models under 10 repeated experiments.