1. Introduction
The piezoelectric actuator (PEA) serves as a typical precision positioning device that uses piezoelectric ceramics as the driving element and a compliant mechanism for motion guidance [
1]. It has been widely employed in micro/nanoscale positioning systems owing to its superior characteristics, such as high resolution, high output force, and rapid response [
2]. However, the inherent hysteretic nonlinearity and low damping vibration characteristics of its mechanical structure severely impair both the positioning accuracy and dynamic response speed of PEAs. In general, under open-loop control, the tracking error induced by the hysteretic nonlinearity of a PEA can reach up to 15% of its full-scale range; this error may even exceed 35% as the frequency of the input signal increases [
3,
4]. Therefore, the development of an accurate mathematical model and a practical control strategy for PEAs is essential to improve their overall performance.
In recent years, neural networks have emerged as a promising approach for hysteresis modeling. Their superior approximation capabilities enable them to effectively capture the intricate dynamic behaviors of nonlinear systems. Long short-term memory (LSTM) networks exhibit the ability to model hysteresis across a wide frequency range, as they leverage their long-term memory characteristics [
5]. A sequence-to-sequence LSTM (LSTMseq2seq) framework was developed by [
6] to model PEA systems, which effectively alleviates the common issues of gradient explosion and gradient vanishing associated with recurrent neural networks (RNNs). Additionally, an inversion model based on RNNs was proposed by [
7], denoted as RNNinv, for compensating nonlinearities in PEAs. Although high modeling accuracy is desirable for neural network-based frameworks, their generalization ability is crucial for the accurate modeling and control of PEAs under diverse operating conditions. Current neural network-based approaches typically rely on two key assumptions: (1) training and testing data are derived from the same feature space and follow an identical probability distribution; (2) sufficient data are available to train an effective model [
8]. However, these assumptions do not always hold in practical scenarios, creating an urgent need for methods to address challenges related to data distribution discrepancies.
Challenges associated with distribution discrepancies have spurred the development of transfer learning, a methodology that transfers knowledge from a well-trained domain (source domain) to another domain (target domain). Notably, transfer learning does not strictly require the data of the source and target domains to match an identical distribution [
9]. For instance, a fine-tuning deep transfer learning method based on LSTM networks was proposed by [
10] to address the problem of insufficient training data for measurements from new air quality monitoring sites. A domain-adversarial neural network (DANN) was employed to predict the remaining service life of aero engines [
11]. Most existing transfer learning methods only use a single-source domain; accordingly, multi-source transfer learning (MSTL) has been proposed to effectively leverage knowledge from multiple domains [
12,
13]. A set-based boosting technique enhanced the performance of each source task while assigning higher weights to tasks with stronger positive transferability [
14]. An ensemble learning and tri-transfer model was introduced by [
15] to develop a multi-source ensemble transfer learning (METL) approach for the initial diagnosis of Alzheimer’s disease. These studies demonstrate that multi-source transfer learning outperforms single-source transfer learning approaches. However, while using multi-source domains offers significant advantages, it also poses the challenge of identifying and selecting valuable knowledge from these domains. A transfer learning framework was designed as source-selection-free transfer learning (SSFTL), which utilizes tags from the delicious website to construct a semantic similarity relationship between the source and target domains via Laplace feature mapping, thereby enabling automatic source domain selection [
16]. In the context of transfer learning, several similarity metrics are available, including proxy A-distance (PAD) [
17,
18], maximum mean discrepancy (MMD) [
19], soft dynamic time warping (Soft-DTW) [
20], and CORAL [
21]. These metrics, often paired with classifiers, assess the alignment between the source and target domains by analyzing classifier error. In behavior recognition, the similarity between the source and target domains can be assessed by integrating the similarity of sensor data from body parts with the semantic correlations of the corresponding body parts [
22].
To the best of our knowledge, relatively few studies have focused on the development of a selective multi-source ensemble transfer learning (SMETL) algorithm and its application to PEAs. Existing research on transfer learning, ensemble learning, and their integration primarily emphasizes classification tasks and adversarial domain adaptation, while little attention has been paid to parameter sharing (including pre-training and fine-tuning) and multi-source ensemble learning methods specifically designed for PEAs. In this study, an SMETL framework is proposed, whose key technologies are summarized as follows:
(1) The potential of transfer learning lies in its ability to adapt across different domains. As demonstrated by methods such as MSTL and SSFTL, multi-source approaches offer promising ways to improve performance. However, despite the availability of multiple datasets, the relevance between these source datasets and the target dataset is often ambiguous. In this context, PAD is employed as a similarity metric between the source and target domains. By analyzing the correlation between PAD values and the evaluation metrics of single-source transfer learning models, it is demonstrated that PAD can effectively quantify the similarity between actuators.
(2) Since blind knowledge transfer from target-irrelevant source datasets often results in deteriorated displacement control performance, the SMETL framework innovatively adopts a greedy ensemble transfer learning strategy. Specifically, this strategy constructs the ensemble by first sorting all candidate transfer learning models in ascending order of the PAD value between their respective source domains and the target domain, then sequentially adding each candidate model and only retaining those that improve the ensemble’s performance on the target domain validation set. Evaluation and analysis results demonstrate that this strategy not only enhances performance compared to individual single-source transfer learning models but also effectively avoids negative transfer.
This paper is organized as follows:
Section 2 presents the research framework, model architecture, and methodologies used in this study.
Section 3 elaborates on the relationship between the performance of single-source transfer learning models and PAD values, and verifies the necessity of multi-source transfer learning.
Section 4 investigates the influences of source domain data volume, target domain data volume, and ensemble strategy on the multi-source transfer learning model; additionally, a comparison with representative transfer learning frameworks is conducted to validate the effectiveness of the proposed SMETL framework.
Section 5 concludes this paper and outlines potential directions for future research.
2. Methodology
As illustrated in
Figure 1, the SMETL framework consists of two core steps, which are detailed as follows:
Step 1: Source domain selection.
Given $N$ source domains (denoted as $\mathcal{D}_{S_1}, \mathcal{D}_{S_2}, \ldots, \mathcal{D}_{S_N}$) and one target domain (denoted as $\mathcal{D}_T$), the PAD algorithm is first employed to calculate the data distribution distance between each source domain and the target domain. Subsequently, the strong linear correlation between PAD values and transfer learning-based feedforward control performance is verified. For each source domain selected based on PAD screening, a pre-trained GRU-CNN model is constructed using the source domain's dataset. These pre-trained models are then fine-tuned on the target domain's data to adapt their parameters to the target domain's distribution characteristics.
Step 2: Multi-source greedy ensemble transfer learning.
After Step 1, a greedy ensemble transfer learning strategy is used to develop a hybrid model tailored to the target domain. The primary objective of this step is to enhance the transferability of knowledge from multi-source domains to the target domain, while simultaneously mitigating the risk of negative transfer that may arise from individual source domains with low similarity to the target. To validate the effectiveness of SMETL, the control performance of the proposed SMETL model is evaluated and compared with that of each individual single-source transfer learning model.
Figure 1.
Schematic diagram of selective multi-source ensemble transfer learning.
2.1. Similarity Between Domains
The higher the similarity between a source domain and the target domain, the more effectively knowledge from the source domain can be transferred to the target domain, thereby avoiding the risk of negative transfer [
23]. Given the availability of multi-source domains, identifying a reliable metric to quantify inter-domain similarity is essential for selecting source domains with high transfer potential. In this study, PAD is employed to calculate the distribution distance between each source domain and the target domain, enabling the quantification of similarity between these two domains. Specifically, a smaller PAD value indicates a smaller distribution discrepancy between the source and target domains, and thus a higher degree of inter-domain similarity.
Given a source domain $\mathcal{D}_S$ and a target domain $\mathcal{D}_T$, let a labeled source sample $S = \{\mathbf{x}_i\}_{i=1}^{n} \subset \mathcal{X}$ be drawn from $\mathcal{D}_S$ and a target sample $T = \{\mathbf{x}_i\}_{i=n+1}^{N} \subset \mathcal{X}$ be drawn from $\mathcal{D}_T$, where $\mathcal{X}$ denotes the input space and $\{0, 1\}$ represents the set of two possible labels. Specifically, all instances in the source sample $S$ are assigned the label 0, while all instances in the target sample $T$ are assigned the label 1. For a symmetric hypothesis class $\mathcal{H}$, the empirical $\mathcal{H}$-divergence between $S$ and $T$ is defined as
$$\hat{d}_{\mathcal{H}}(S, T) = 2\left(1 - \min_{\eta \in \mathcal{H}}\left[\frac{1}{n}\sum_{i=1}^{n} I\big[\eta(\mathbf{x}_i) = 0\big] + \frac{1}{N - n}\sum_{i=n+1}^{N} I\big[\eta(\mathbf{x}_i) = 1\big]\right]\right), \quad (1)$$
where $N$ denotes the total number of samples, and $I[a]$ represents the indicator function, which takes a value of 1 if the predicate $a$ holds true and 0 otherwise.
The risk $\epsilon$ of the classifier trained on this relabeled dataset approximates the "min" component of Equation (1). Given a classification error $\epsilon$ associated with the task of discriminating between source and target examples, the proxy A-distance $\hat{d}_A$ can be defined as
$$\hat{d}_A = 2(1 - 2\epsilon). \quad (2)$$
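For illustration, a minimal Python sketch of the PAD computation is given below. It labels source samples as 0 and target samples as 1, trains a simple domain classifier, and converts the resulting held-out classification error into a PAD value via Equation (2). The choice of scikit-learn's logistic regression as the hypothesis class and the flattened feature representation are illustrative assumptions, not the exact configuration used in this work.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

def proxy_a_distance(source_data, target_data, seed=1024):
    """Estimate the proxy A-distance between two domains.

    source_data, target_data: arrays of shape (n_samples, n_features),
    e.g., flattened voltage/displacement windows from each actuator.
    """
    # Relabel the data: source samples -> 0, target samples -> 1
    X = np.vstack([source_data, target_data])
    y = np.hstack([np.zeros(len(source_data)), np.ones(len(target_data))])

    # Hold out part of the data to estimate the domain-discrimination error
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=0.3, random_state=seed, stratify=y)

    clf = LogisticRegression(max_iter=1000)   # stand-in hypothesis class
    clf.fit(X_tr, y_tr)

    eps = 1.0 - clf.score(X_te, y_te)         # classification error epsilon
    return 2.0 * (1.0 - 2.0 * eps)            # Equation (2)
```

A PAD value close to 2 indicates that the two domains are easily separable (low similarity), whereas a value close to 0 indicates nearly indistinguishable distributions.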
2.2. GRU-CNN
As shown in
Figure 2, the GRU-CNN framework comprises an input layer, one gated recurrent unit (GRU) layer, two convolutional neural network (CNN) layers, three dense layers, and an output layer. The input layer accepts sequences in a three-dimensional format, typically denoted as batch size × sequence length × feature dimension. These sequences are then processed by the GRU layer, which captures temporal correlations within the sequences. Specifically, the GRU layer is designed to characterize the long-term dependence and nonlinear memory behaviors of hysteresis, while alleviating the issues of gradient vanishing or gradient explosion when handling long sequences. Subsequently, the multi-layer CNN employs sliding convolution kernels to extract local trend features from the entire output of the GRU layer. The CNN layer enhances the model's ability to capture local patterns, thereby contributing to improved control accuracy. The framework concludes with three dense layers: the first incorporates a rectified linear unit (ReLU) activation function to prevent vanishing gradients, and the third is used for data size reshaping. When the batch size and feature dimension are set to 1, given an input sequence $u = (u_1, u_2, \ldots, u_L)$ and an output sequence $y = (y_1, y_2, \ldots, y_L)$, the GRU-CNN model aims to establish a mapping from the input sequence to the output sequence. The mean square error (MSE) is selected as the objective function to minimize the error between the true output sequence $y$ and the predicted output sequence $\hat{y}$.
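A minimal PyTorch sketch of the GRU-CNN architecture described above is provided below. The layer widths, kernel sizes, and channel counts are placeholder values; the actual settings are obtained by Bayesian optimization as described in Section 3.2.

```python
import torch
import torch.nn as nn

class GRUCNN(nn.Module):
    """Sketch of the GRU-CNN inverse model: displacement sequence -> voltage sequence."""

    def __init__(self, hidden_size=64, cnn_channels=32, kernel_size=5):
        super().__init__()
        # GRU layer captures the long-term memory behavior of hysteresis
        self.gru = nn.GRU(input_size=1, hidden_size=hidden_size, batch_first=True)
        # Two CNN layers extract local trend features from the GRU output
        self.cnn = nn.Sequential(
            nn.Conv1d(hidden_size, cnn_channels, kernel_size, padding=kernel_size // 2),
            nn.Conv1d(cnn_channels, cnn_channels, kernel_size, padding=kernel_size // 2),
        )
        # Three dense layers; the first uses ReLU, the last maps to one output per time step
        self.dense = nn.Sequential(
            nn.Linear(cnn_channels, 64), nn.ReLU(),
            nn.Linear(64, 32),
            nn.Linear(32, 1),
        )

    def forward(self, x):                   # x: (batch, seq_len, 1)
        h, _ = self.gru(x)                  # (batch, seq_len, hidden_size)
        h = self.cnn(h.transpose(1, 2))     # Conv1d expects (batch, channels, seq_len)
        h = h.transpose(1, 2)               # back to (batch, seq_len, cnn_channels)
        return self.dense(h)                # (batch, seq_len, 1)

# The training objective is the MSE between the true and predicted sequences
criterion = nn.MSELoss()
```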
For feedforward control, as depicted in
Figure 3, the controlled variable corresponds to the output displacement of the PEA, and the manipulated variable refers to the input voltage applied to the PEA. The reference displacement of the PEA is first input to the GRU-CNN inverse model. This inverse model computes the corresponding driving voltage required to achieve the reference displacement, thereby compensating for the hysteretic nonlinearity and low damping vibration characteristics of the PEA. Aligned with this control logic, the GRU-CNN model takes the PEA's measured displacement information as input and outputs the corresponding driving control voltage.
2.3. Ensemble Transfer Learning
Transfer learning schemes are designed to leverage pre-existing empirical knowledge to develop new models, which can be applied to either similar or entirely distinct PEAs. As documented in numerous studies, the advantages of transfer learning include faster convergence speed, enhanced generalization capability, improved control accuracy, and increased robustness, with the latter two advantages being particularly prominent in scenarios characterized by data scarcity [
24]. Parameter sharing [
25] is the most widely used model-based transfer learning approach. This method comprises two core steps: pre-training and fine-tuning. In the pre-training step, a source model is obtained by training a neural network on source domain data. In the fine-tuning step, only the final several layers are fine-tuned using a smaller volume of target domain data to generate the target model. By leveraging the pre-trained source model, the target model avoids the need for training from scratch, thereby reducing the computational cost and time required for model training.
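The pre-training/fine-tuning procedure can be sketched as follows, assuming the GRUCNN class above and a DataLoader over the (small) target-domain training set. In this sketch the GRU and CNN parameters are frozen and only the dense layers are adapted; as described in Section 3.2, the pre-trained parameters may alternatively serve only as initial values for full fine-tuning.

```python
import copy
import torch

def fine_tune(pretrained_model, target_loader, epochs=100, lr=5e-4):
    """Parameter-sharing transfer: start from the source model, adapt on target data."""
    model = copy.deepcopy(pretrained_model)

    # Freeze the GRU and CNN layers (illustrative choice); only dense layers are trained
    for p in model.gru.parameters():
        p.requires_grad = False
    for p in model.cnn.parameters():
        p.requires_grad = False

    optimizer = torch.optim.Adamax(
        (p for p in model.parameters() if p.requires_grad), lr=lr)
    criterion = torch.nn.MSELoss()

    for _ in range(epochs):
        for x, y in target_loader:          # small target-domain training set
            optimizer.zero_grad()
            loss = criterion(model(x), y)
            loss.backward()
            optimizer.step()
    return model
```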
Ensemble learning focuses on integrating multiple weak learners into a more powerful ensemble learner, rather than pursuing a single sophisticated model to achieve optimal performance [
26]. However, an excessively large number of models in an ensemble significantly increases computational overhead [
27]. Thus, minimizing this computational burden while ensuring predictive performance is crucial. An ensemble may include models with differing predictive performance levels, and integrating these models may fail to deliver the desired performance gains. Furthermore, an excessive number of high-performance models in an ensemble can lead to overfitting. To address these challenges, a greedy ensemble transfer learning strategy is proposed. This strategy constructs the ensemble by sequentially adding each candidate transfer learning model: a model is retained only if it improves the ensemble’s performance on the target domain validation set. Prior to initiating this process, all candidate models are sorted in ascending order of the PAD value between their corresponding source domains and the target domain. This ranking logic guarantees that the ensemble will not perform worse than the best individual transfer learning model on the validation set.
Algorithm 1 presents the pseudo-code for the proposed greedy ensemble transfer learning strategy in detail. Each GRU-CNN-based single-source transfer learning model processes the target domain input independently and generates individual predictive outputs. Given that the selected transfer learning models exhibit comparable performance on the target domain validation set, a simple arithmetic averaging method is adopted to compute the mean of all individual outputs. This averaged value is then used as the final output of the SMETL framework.
| Algorithm 1. Greedy Ensemble Transfer Learning |
| Input: Potential source domains $\mathcal{D}_{S_1}, \ldots, \mathcal{D}_{S_N}$ (sorted by their PAD value with respect to the target domain in ascending order) |
| Target domain $\mathcal{D}_T$, divided into the training set $\mathcal{D}_T^{train}$ and validation set $\mathcal{D}_T^{val}$ |
| GRU-CNN model architecture |
| Output: Ensemble prediction function $F$ of the selected transfer models |
| 1: $E \leftarrow \varnothing$ (set of selected transfer models) |
| 2: $e_{best} \leftarrow +\infty$ (best validation error so far) |
| 3: for $i = 1$ to $N$ do |
| 4:  pre-train a GRU-CNN model on $\mathcal{D}_{S_i}$ |
| 5:  fine-tune it on $\mathcal{D}_T^{train}$ to obtain the transfer model $f_i$ |
| 6:  $F_{cand} \leftarrow$ arithmetic mean of the outputs of the models in $E \cup \{f_i\}$ |
| 7:  $e_{cand} \leftarrow$ error of $F_{cand}$ on $\mathcal{D}_T^{val}$ |
| 8:  if $e_{cand} < e_{best}$ then |
| 9:   $E \leftarrow E \cup \{f_i\}$, $e_{best} \leftarrow e_{cand}$ |
| 10: return $F(x) = \frac{1}{|E|}\sum_{f \in E} f(x)$ |
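A compact Python rendering of Algorithm 1 is given below. The fine-tuned candidate models are assumed to be supplied in ascending order of their PAD values, and validation_error is an assumed helper that returns, e.g., the RMSE of a prediction function on the target-domain validation set.

```python
def greedy_ensemble(candidate_models, validation_error):
    """Greedy selection over single-source transfer models sorted by ascending PAD.

    candidate_models: list of fine-tuned GRU-CNN transfer models (callable on inputs).
    validation_error: callable mapping a prediction function to its error on the
                      target-domain validation set.
    Returns the ensemble prediction function (arithmetic mean of selected models).
    """
    selected = []
    best_err = float("inf")

    def make_predictor(models):
        # Simple arithmetic averaging of the individual model outputs
        return lambda x: sum(m(x) for m in models) / len(models)

    for model in candidate_models:          # ascending PAD order
        trial = selected + [model]
        err = validation_error(make_predictor(trial))
        if err < best_err:                  # retain the model only if it helps
            selected = trial
            best_err = err

    return make_predictor(selected)
```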
3. Experiments and Analysis
This section demonstrates the effectiveness of the proposed SMETL framework. First, a GRU-CNN is employed to model the mapping relationship between the input displacement and output driving voltage of PEAs. Next, single-source transfer learning models are constructed for each candidate source domain. The PAD algorithm is then used to calculate the data distribution distance between each source domain and the target domain. By analyzing the correlation between the control performance of these single-source transfer learning models and their corresponding PAD values, the rationality and accuracy of using PAD to quantify inter-actuator data similarity are verified. Finally, SMETL is implemented to integrate valid knowledge from the screened high-similarity source domains, yielding the final SMETL-based multi-source transfer learning model. Notably, SMETL exhibits flexibility in adapting to an arbitrary number of candidate source domains. For the case study in this work, three candidate source domains are selected to form the source domain set.
3.1. Data Acquisition
To address the integration of distributions across multiple source domains and validate the effectiveness of the proposed SMETL framework, three source domains and four target domains were utilized. As depicted in
Figure 4, two types of PEAs were employed in the source domains, differentiated by the length of their flexible hinges. Source domain #1, source domain #2, and target domain #1 were constructed using data from PEAs #1, #2, and #4, respectively. These three actuators share the same mechanical structure but originate from different production batches. Source domain #3 and target domain #2 were constructed using data from PEAs #3 and #5, respectively. These actuators also share an identical mechanical structure, which is distinct from that of PEAs #1, #2, and #4. For the PEA shown in
Figure 4c, the longitudinal voltage and displacement are designated as data for target domain #3, and the lateral voltage and displacement are designated as data for target domain #4.
Figure 5 presents the Gaussian kernel density estimates of all datasets corresponding to the PEAs. Even under the same operating conditions, the Gaussian kernel density estimates of actuators with the same mechanical structure but from different production batches exhibit discrepancies. This phenomenon is attributed to manufacturing factors such as assembly errors and preload variations. The preload value of the PEAs is determined based on the recommended range of 400–600 N. Notably, the kernel density estimates of all datasets exhibit roughly similar distributions. This similarity satisfies the core prerequisite for transfer learning.
Each source domain contains 15,000 samples, while each target domain contains 1000 samples, as shown in
Table 1. All data samples were split into 10 folds for cross-validation. The validation subsets were employed to tune the near-optimal hyperparameters of each transfer learning model. Furthermore, the generalization ability of the models was evaluated individually with an additional 1200 test samples per target domain. These test samples were excluded from both model training and fine-tuning, thereby ensuring an unbiased evaluation of model performance.
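The data split described above can be sketched as follows with scikit-learn's KFold; the array of target-domain sample indices is a stand-in for the actual dataset.

```python
import numpy as np
from sklearn.model_selection import KFold

target_samples = np.arange(1000)            # stand-in for the 1000 target-domain samples
kf = KFold(n_splits=10, shuffle=True, random_state=1024)

for train_idx, val_idx in kf.split(target_samples):
    train_set, val_set = target_samples[train_idx], target_samples[val_idx]
    # ... fine-tune on train_set, tune hyperparameters on val_set ...

# The additional 1200 test samples per target domain are held out entirely
# and used only for the final, unbiased evaluation.
```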
All samples were collected from the piezoelectric actuator test platform, as illustrated in
Figure 6. Each sample comprises a continuous voltage sequence and a corresponding displacement sequence, each spanning 1000 time steps. Each PEA was equipped with a laser displacement sensor LK-H020 (Keyence Corporation, Osaka, Japan), featuring a measurement range of ±3 mm and a measurement accuracy of 0.02 μm (±0.02% F.S.). The sensor was paired with a controller LK-G5001 (Keyence Corporation, Osaka, Japan) that provides analog output signals corresponding to displacement, with a voltage range of 0 V to 10 V. To drive the PEAs, a power amplifier module with a fixed gain of 15 was used. Input voltage curves were generated using a MATLAB (v2024a) program, with amplitudes ranging from 0 V to +10 V; accordingly, the power amplifier module’s output voltage ranged from 0 V to +150 V. All input and output signals were synchronously acquired using a 16-bit data acquisition card NI USB-6218 (National Instruments, Austin, TX, USA), with the sampling frequency set to 10 kHz.
Furthermore, all experimental samples underwent normalization to standardize the scales of different variables, using the min-max normalization method. Its mathematical expression is given by
$$x_{\mathrm{norm}} = \frac{x - x_{\min}}{x_{\max} - x_{\min}},$$
where $x$ denotes the original variable, $x_{\max}$ and $x_{\min}$ are the maximum and minimum values of $x$, and $x_{\mathrm{norm}}$ denotes the normalized variable. The min-max method scales the original variable to the range [0, 1], making it particularly suitable for experimental scenarios where variables do not follow a normal distribution.
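As a small worked example, the min-max scaling of a measured sequence can be written as:

```python
import numpy as np

def min_max_normalize(x):
    """Scale a 1-D signal to the range [0, 1]."""
    x = np.asarray(x, dtype=float)
    return (x - x.min()) / (x.max() - x.min())

# Example: a displacement sequence (values in micrometres, for illustration)
disp = np.array([0.0, 3.2, 7.5, 12.1, 9.4])
print(min_max_normalize(disp))   # [0.    0.264 0.620 1.    0.777] (approx.)
```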
3.2. Single-Source Transfer Learning
The effectiveness of transfer learning is critically dependent on the degree of similarity between the source and the target domains. Only when shared knowledge exists between them can transfer learning be performed effectively. Conversely, if the similarity is low, knowledge acquired from the source domain may have a detrimental impact on the target domain, resulting in negative transfer. Therefore, it is essential to ensure data similarity between the source and target domains and to identify transferable components via an appropriate approach.
All models were trained on a computer equipped with an Intel(R) Core(TM) i9-12900K CPU @ 3.19 GHz (Intel Corporation, Santa Clara, CA, USA) and an NVIDIA RTX 3080 Ti GPU (NVIDIA Corporation, Santa Clara, CA, USA). The Adamax optimizer [
28] was employed to minimize the objective function during training, and all experiments were implemented using Python 3.6.13 with PyTorch 1.10.2 and CUDA 11.3.1. A fixed random seed of 1024 was adopted to ensure consistent initialization of model parameters across multiple training sessions. For each pre-trained model from the source domains, the key hyperparameters, including the learning rate, the number of neural units in the GRU and dense layers, the kernel size of the CNN layers, the batch size, and the dropout rate, were optimized using a Bayesian optimization algorithm, with the MSE on the validation set adopted as the optimization objective function. A dropout rate of 0.1 was incorporated to mitigate overfitting by randomly setting a subset of neuron outputs to zero during training. The number of training epochs was set to 100, the batch size was configured to 100, and an initial learning rate of 0.001 was adopted. Detailed hyperparameter settings for the GRU-CNN structure are provided in
Table 2.
Each pre-trained model was paired with the target domains to construct single-source transfer learning models. In this study, for target domains #1 and #2, the pre-trained model was fine-tuned while fully retaining the structure and parameters of its GRU and CNN layers to mitigate overfitting. The existing parameters of these layers served as initial values for training, whereas the parameters in the dense layers were randomly initialized. For target domains #3 and #4, all layer parameters of the pre-trained model were directly adopted as initialization parameters for the fine-tuning process. The learning rate was adjusted automatically based on the model's performance on the validation set: if the validation performance did not improve within 10 epochs, the learning rate was reduced by a factor of 0.5. For all fine-tuning processes, the initial learning rate was set to 5 × 10⁻⁴, with other parameters remaining unchanged, and an early stopping mechanism was employed to further prevent overfitting.
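A minimal sketch of this fine-tuning loop, using PyTorch's built-in ReduceLROnPlateau scheduler, is shown below. The training and evaluation routines are caller-supplied placeholders, and the early-stopping patience value is an assumption.

```python
import torch

def fine_tune_with_schedule(model, train_one_epoch, evaluate,
                            epochs=100, lr=5e-4, es_patience=20):
    """Fine-tuning loop with learning-rate scheduling and early stopping."""
    optimizer = torch.optim.Adamax(model.parameters(), lr=lr)
    # Halve the learning rate if the validation loss does not improve for 10 epochs
    scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(
        optimizer, mode="min", factor=0.5, patience=10)

    best_val, wait = float("inf"), 0
    for _ in range(epochs):
        train_one_epoch(model, optimizer)   # caller-supplied training step
        val_loss = evaluate(model)          # caller-supplied validation MSE
        scheduler.step(val_loss)            # scheduler monitors validation loss
        if val_loss < best_val:
            best_val, wait = val_loss, 0
        else:
            wait += 1
            if wait >= es_patience:         # early stopping
                break
    return model
```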
Table 3 presents the test results of single-source transfer learning models across the four target domains. Models trained exclusively on the respective target domain datasets are denoted as M.1, M.2, M.3, and M.4. For M.1–M.4, the training epochs were set to 100, with an initial learning rate of 0.001. Notably, all single-source transfer learning models outperformed the target-domain-only trained models. The relatively inferior displacement control performance of M.1 stems from the scarcity of training data, as only 1000 samples were available. M.2 exhibited a similar performance constraint, which underscores the critical role of data volume in training GRU-CNN models for PEA displacement control. Target domains #3 and #4 exhibited domain shift relative to #1 and #2, thereby degrading the performance of all single-source transfer learning models. Nevertheless, M.3 and M.4 still achieved improvements over the target-domain-only trained models. These results validate that transfer learning is an effective strategy for addressing the challenges associated with limited training data and mismatched data distributions for target PEAs.
Table 4 presents the PAD calculation results. A smaller PAD value indicates a higher similarity between the two domain distributions, whereas a larger value signifies a greater discrepancy between the domains. As evident from
Table 3 and
Table 4, the control performance of single-source transfer learning models gradually degrades as the similarity between the source and target domains decreases. For target domain #4, the PAD values corresponding to source domain #2 and source domain #3 are both 1.985. However, the single-source model built with source domain #2 and target domain #4 achieves an MAE of 0.245 and an RMSE of 0.361, outperforming the model constructed with source domain #3 and target domain #4. As noted in existing literature, similarity metrics only serve as references for source domain selection, and their results are not always accurate. Nevertheless, after excluding individual outliers, a strong linear correlation between PAD values and control performance is verified. Thus, it is reasonable to employ PAD to quantify the similarity between source and target domains for PEA control tasks.
3.3. Multi-Source Ensemble Learning
To verify the superior performance of multi-source transfer learning models compared with single-source transfer learning models, evaluation metrics were extracted from the results presented in
Table 5. For target domain #1, performance varies across different source combinations. Among all single-source transfer learning models, the model utilizing source #1 achieves the optimal performance, followed by that using source #2, while the model based on source #3 exhibits the lowest performance. For two-source models, the combination of source #1 and #2 achieves the best performance, outperforming the combinations of source #1 and #3 as well as source #2 and #3. Notably, the three-source model (source #1, #2, and #3) exhibits slightly degraded performance compared to the optimal two-source model, indicating that simply increasing the number of source domains does not always enhance performance. In summary, multi-source ensemble learning can effectively leverage data from compatible source domains to enhance the learning process for target PEA control tasks corresponding to target domain #1.
A similar yet distinct trend is observed across target domains #2, #3, and #4. For target domain #2, among all single-source models, source #3 yields the optimal performance, outperforming source #1 and source #2. For two-source models, the combination of source #1 and #3 achieves the best performance, while the three-source model exhibits marginally inferior performance. This phenomenon underscores that the similarity between the added source domains and the target domain, rather than merely the number of source domains, is the critical factor for model performance.
In multi-source transfer learning, exhaustive search is a conventional method for identifying optimal source domain combinations, as it evaluates all possible combinations of candidate source domains. For 3 candidate source domains, exhaustive search requires testing a total of 7 combinations. In contrast, the PAD-ranking-guided source domain selection strategy follows a similarity-based descending order, with source #1 being the most similar to the target domain, followed by source #2 and then source #3, and only requires testing a maximum of 3 combinations. Notably, for all target domains #1–#4, this PAD-guided strategy identified the same optimal source domain combinations as exhaustive search. Specifically, source #1 and #2 are optimal for target domains #1, #3, and #4, and source #1 and #3 are optimal for target domain #2. These findings validate that the PAD-ranking-guided strategy achieves an optimal trade-off between computational efficiency and model performance.