1. Introduction
Wireless power transfer (WPT) technology eliminates the constraints of physical connectors and offers convenient, safe, and flexible power delivery across industries. Its applications span electric transportation [1,2], portable electronics [3,4], and medical implants [5,6,7], where reliable and contactless power transfer is critical.
Constant-current (CC) and constant-voltage (CV) charging modes [8,9] of WPT systems are widely used. Previous studies have achieved load-independent CC or CV output with input zero-phase angle (ZPA) by incorporating compensation networks operating at natural resonance frequencies, such as series–series (SS), parallel–parallel (PP), series–parallel (SP), and parallel–series (PS) [10], as well as higher-order compensation networks, such as LCC-S, LCC–LCC, LCL-S, and S-LCC [11,12].
However, the properties of such topologies are derived from the fundamental harmonic approximation (FHA) method, which neglects the higher-order harmonic components of the square-wave voltage generated by DC/AC inverters. Thus, the output DC current in CC mode deviates under varying load conditions. The effective resistance of the battery increases as the charging process progresses. When the load resistance rises, the output current in CC mode attenuates due to the heavy load, as shown in Figure 1. This current drop has consequences beyond prolonged charging time. Since the accuracy of the Coulomb counting method relies heavily on a precise current profile [13,14], this attenuation introduces significant State of Charge (SOC) estimation errors in the Battery Management System (BMS) if not taken into account. It also triggers premature CC-to-CV mode transitions, preventing the batteries from being fully charged [15]. Prolonged exposure to suboptimal charging conditions accelerates battery degradation [16].
Moreover, the change in mutual inductance caused by misalignment [17,18] also leads to output deviation. Numerous effective methods have been proposed to tackle this problem by optimizing the coupler structure [19,20,21] and designing topologies with high tolerance to misalignment [22,23]. However, owing to the lack of control, these methods can only mitigate the effects of load variation and misalignment to a certain extent.
Therefore, closed-loop control is a widely adopted strategy to realize stable output for WPT systems. As shown in Figure 2, a typical WPT system is composed of a DC/AC inverter, a coupler, an AC/DC rectifier and a compensation network. To maintain the DC output at a preset value, a DC/DC converter, such as a Boost [24] or Buck–Boost [25] converter, is cascaded with the inverter. An analog-to-digital converter (ADC) sampling module is placed at the output stage to acquire the feedback signal for the closed-loop control, which adjusts the conversion ratio of the DC/DC module. Classical control methods, including Proportional-Integral (PI) control, Model Predictive Control (MPC) and Sliding Mode Control (SMC), are widely adopted.
However, these methods require additional detection devices and a communication channel to transfer data between the transmitter side (TX) and the receiver side (RX), which increases the cost and complexity of both hardware and software. Moreover, the variation of load resistance and mutual inductance can hardly be measured by detection devices at the RX. Consequently, considerable research efforts have been directed towards WPT control strategies based on information only from the TX [26,27,28].
Deep learning (DL) is an emerging technology for complex nonlinear problems, which has achieved outstanding success in computer vision, natural language processing and advanced control. Reference [29] proposed a machine learning method using random forest and AdaBoost to estimate load resistance and coupling coefficient from transmitter-side measurements. However, this approach relies on harmonic features extracted manually via fast Fourier transform (FFT), which introduces additional computational overhead. While DL can bypass manual FFT feature extraction, existing network architectures remain insufficient for analyzing harmonic-rich WPT waveforms. The image-based CNN [30] relies on external hardware and inherently neglects electrical dynamics. Furthermore, although LSTM [31] and TCN [32] process temporal data, their structures are poorly matched to WPT waveform analysis: LSTM focuses on sequential temporal dependencies, TCN prioritizes long-term dependencies, and both lack parallel multi-scale receptive fields. Thus, they fail to extract harmonic-rich features concurrently. Moreover, few prior works simultaneously predict the load resistance and mutual inductance using only TX information while achieving output regulation.
Moreover, the data used in the aforementioned studies to train and validate DL models is mostly generated by offline simulation tools such as MATLAB/Simulink, LTspice or PLECS. Such data has an inherent discrepancy from physical scenarios due to component parameter tolerances, parasitic effects, and switching dynamics [33,34]. This limitation reduces the prediction accuracy under practical operating conditions. Transfer learning (TL) [35] is a machine learning paradigm in which knowledge acquired from a source domain is leveraged to improve learning efficiency and performance on a different but related target domain. A growing number of studies have applied TL to power converters [36,37], while its application in WPT parameter prediction and control remains scarce.
To tackle these problems, this work proposes a Multi-Scale Parallel Convolutional (MSPC) deep learning model that jointly predicts the load resistance and mutual inductance, as well as the deviation factor of the DC output and the regulation coefficient of the DC input voltage, using only the TX current waveform of the WPT system. The MSPC model adopts a parallel structure of convolutional kernels with various dilation rates, which can capture multi-scale features from AC waveforms. This model is tailored to the harmonic-rich waveforms produced by power semiconductor switching. TL is applied to enhance the prediction accuracy by minimizing the discrepancy between simulation data and physical data. Specifically, the MSPC model is first pre-trained using simulation datasets. The subsequent fully connected layers of the model are then updated via fine-tuning. Once the characteristics are predicted, the regulation coefficient is used to optimize the output of the WPT system by regulating the input DC voltage. An SS-compensated experimental prototype is built to demonstrate the advantages and potential of the proposed method in joint characteristic prediction and output optimization using information from only the TX of WPT systems.
2. Analysis of SS-Compensated WPT System
The proposed data-driven method is designed and tested for an example of an SS-compensated WPT system. The schematic of the whole system is given in
Figure 3, where
and
are the self-inductances of the coupler and
M is the mutual inductance between two coils.
and
represent the compensation capacitors of the TX and RX, respectively. The full-bridge inverter consists of
–
, converting a DC input voltage
to an AC input voltage
with an operation frequency of angular frequency
. And
refers to the current flowing through the capacitor
. A full-bridge rectifier consists of
–
and the filtering capacitor
. It generates the DC output current
flowing through the load resistance
.
The coupler can be modeled as a loosely coupled transformer. The equivalent circuit of the SS compensated WPT system is shown in
Figure 4. To ensure input ZPA and load-independent ZPA, the values of
and
are subject to
where
is the resonant frequency of the system. The SS compensation topology is able to achieve CC output operating at
, which is the frequency of the fundamental wave of
. Based on the FHA method,
is given as
where
D is the duty cycle of the PWM wave. And the output current
is a load-independent expression given as
However, Equation (
3) is derived under the condition that the higher-order harmonics of
are neglected, whereas its complete Fourier-series expression is given as
Therefore, the AC input of the subsequent stage can be viewed as the superposition of a fundamental voltage source and higher-order harmonic voltage sources. As shown in
Figure 4, the input current
of the rectifier can be expressed as
The higher-order harmonic components are not at resonance. Therefore, the output current consists of superimposed fundamental and harmonic components following rectification, which exhibits attenuation under heavy load conditions. The content of harmonic components can be calculated by
where
and
denote the amplitude of the n-th harmonic and fundamental components of
, respectively. This work simulates the third and fifth harmonic contents of
in the SS-compensated WPT system using MATLAB/Simulink 2024a. The simulation parameters are based on the design values listed in
Table 1, with
adjusted to 45 nF to ensure zero-voltage switching (ZVS). The variations in
and
M are set from 1 to 30 Ω and 3 to 7 μH, with steps of 1 Ω and 0.1 μH, respectively. The contents of the third and fifth harmonics are illustrated in
Figure 5. The result indicates that the harmonic content of
decreases as the load resistance and mutual inductance increase.
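The harmonic content described above can be checked numerically. The following is a minimal sketch, assuming the content is the ratio of the n-th harmonic amplitude to the fundamental amplitude and that the samples span exactly one period. The function names and the ideal square-wave input are illustrative assumptions and are not taken from the original simulation setup.

```python
import cmath
import math


def harmonic_amplitude(samples, n):
    """Amplitude of the n-th harmonic, assuming `samples` spans exactly one period."""
    count = len(samples)
    acc = sum(x * cmath.exp(-2j * math.pi * n * k / count)
              for k, x in enumerate(samples))
    return 2.0 * abs(acc) / count


def harmonic_content(samples, n):
    """Ratio of the n-th harmonic amplitude to the fundamental amplitude."""
    return harmonic_amplitude(samples, n) / harmonic_amplitude(samples, 1)


# Illustrative input: one period of an ideal unit square wave (full duty cycle).
N = 2000
square = [1.0 if k < N // 2 else -1.0 for k in range(N)]
third = harmonic_content(square, 3)
fifth = harmonic_content(square, 5)
```

For an ideal square wave the third and fifth harmonic contents come out close to 1/3 and 1/5, which illustrates how much of the inverter voltage the FHA leaves out.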
The majority of research treats the rectifier as an equivalent pure resistance, whose value equals
. In practice, the input impedance of the rectifier is inductive due to the nonlinearity of the diodes [
38,
39]. This occurs because the conduction of diodes introduces a phase shift, known as the rectifier angle
between the fundamental voltage and current. Therefore, the input impedance
can be modeled as an effective resistance
and an effective inductance
in series, whose value can be expressed as
As shown in
Figure 3,
and
represent the input voltage and current of the rectifier. Assuming the WPT system operates in steady state, the output voltage
remains constant. Therefore, assuming
D equals 1,
is a square wave with an amplitude of
, which can be expressed as
where
is the switching period and
is the forward voltage drop of each diode. Based on the FHA method, the fundamental component
of
can be expressed as
As shown in
Figure 6,
is the rectified current of
. And
is the fundamental component of
, which can be calculated by
where
is the amplitude of
, which can then be given as
Hence, by substituting Equations (
9) and (
11), the phasor
can be expressed as
By substituting Equation (
12),
can be expressed as
By substituting Equation (
12),
can be expressed as
Hence, the effective circuit of the SS-compensated WPT system can be modeled as shown in
Figure 7.
Under actual operating conditions, an SS-compensated WPT system can hardly achieve strictly load-independent CC output. Both simulation and experimental studies confirm that the actual output current decreases with increasing load resistance.
3. MSPC Model and Transfer Learning Framework
To achieve characteristic prediction via information from the TX, as well as optimization of the decayed output current
, this work proposes an MSPC model assisted by TL. To precisely quantify the extent of output deviation and input regulation, this paper defines two scaling factors that serve as the prediction target of the deep learning model. Firstly, we define the output deviation factor of
as
where
and
represent the output current under actual and theoretical conditions derived from the FHA method, respectively. Then, we define the regulation coefficient of
as
where
denotes the regulated DC input voltage required to maintain the CC output, and
denotes the initial DC input voltage. By scaling the input voltage by the regulation coefficient, the output current can be calibrated to approach its theoretical value. To achieve the joint prediction of the load resistance, the mutual inductance M, the deviation factor and the regulation coefficient, the data-driven method is described in the following sections.
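The two scaling factors can be illustrated with a few lines of code. This is a minimal sketch under stated assumptions: the regulation coefficient is taken as the ratio of the regulated to the initial DC input voltage, which matches the description of scaling the input voltage by this factor, while the deviation factor is assumed here to be the ratio of the actual to the FHA-predicted output current. All function names and numeric values are hypothetical.

```python
def deviation_factor(i_actual, i_theoretical):
    """Assumed definition: actual output current divided by the FHA prediction."""
    return i_actual / i_theoretical


def regulation_coefficient(v_regulated, v_initial):
    """Ratio of the regulated DC input voltage to the initial DC input voltage."""
    return v_regulated / v_initial


def regulated_input_voltage(v_initial, coefficient):
    """Input voltage to apply once the regulation coefficient has been predicted."""
    return coefficient * v_initial


# Hypothetical operating point: 2.0 A expected from the FHA, 1.9 A actually delivered.
alpha = deviation_factor(1.9, 2.0)
beta = regulation_coefficient(25.2, 24.0)
v_new = regulated_input_voltage(24.0, beta)
```

In this toy case the output has drooped to 95% of its theoretical value and a 5% higher input voltage is requested.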
3.1. Multi-Scale Parallel Convolutional Model
In this paper, the MSPC model is proposed to jointly predict the load resistance, mutual inductance M, deviation factor and regulation coefficient from the transmitter-side current waveform. As illustrated in Figure 8, the proposed network employs a parallel multi-branch architecture in which each branch adopts distinct dilation rates to capture dynamic information across different temporal scales. The model consists of two core modules: a Multi-Scale Feature Extractor and a Global Regressor.
3.1.1. Multi-Scale Feature Extractor
The Multi-Scale Feature Extractor serves as the core module of the MSPC model, utilizing one-dimensional dilated convolutions to capture multi-scale temporal dynamics in power electronics waveforms. For a given input sequence
, where
B denotes the batch size and
L denotes the sequence length, the dilated convolution operation at position
t with dilation rate
d is defined as
where
is the convolution kernel of size
k and
d controls the spacing between elements in the input sequence. This formulation enables the network to expand its receptive field exponentially without increasing the number of parameters or sacrificing resolution.
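A dilated convolution can be written directly from this description. The sketch below is a minimal single-channel version without padding, assuming the forward index convention x[t + d·i] used by common deep learning frameworks; the convention in the original equation may differ. The function name is illustrative.

```python
def dilated_conv1d(x, w, d):
    """Valid 1D dilated convolution: y[t] = sum_i w[i] * x[t + d * i]."""
    k = len(w)
    span = d * (k - 1)  # distance between the first and last kernel tap
    return [sum(w[i] * x[t + d * i] for i in range(k))
            for t in range(len(x) - span)]


# Same three weights, two dilation rates: the kernel covers 3 or 5 input samples.
x = list(range(10))
w = [1, 0, -1]
y_d1 = dilated_conv1d(x, w, 1)  # each output is x[t] - x[t + 2]
y_d2 = dilated_conv1d(x, w, 2)  # each output is x[t] - x[t + 4]
```

The number of weights stays at three in both calls, while the span of input samples covered by the kernel grows with the dilation rate.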
As shown in
Figure 8, the extractor adopts a parallel three-branch structure. Each branch comprises two stacked dilated convolutional layers with distinct dilation rates. As discussed in
Section 2,
the input current waveform contains superimposed fundamental and higher-order harmonic components, whose contents vary significantly with the load resistance and the mutual inductance M. The dilation rates are designed to cover the temporal scales corresponding to these components. The high-frequency branch employs dilation rates
and
, capturing rapid transients associated with switching events. The mid-frequency branch utilizes dilation rates
and
, extracting intermediate-scale features related to envelope variations. The low-frequency branch applies dilation rates
and
, capturing long-term trends and gradual changes in the waveform.
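The temporal scale covered by each branch can be estimated from its receptive field. The sketch below assumes stride-1 layers and uses hypothetical kernel sizes and dilation rates chosen only for illustration; they are not the values of the proposed model.

```python
def receptive_field(kernel_size, dilations):
    """Receptive field, in samples, of stacked stride-1 dilated convolutions."""
    rf = 1
    for d in dilations:
        rf += (kernel_size - 1) * d
    return rf


# Hypothetical exponential scheme with kernel size 3 (illustrative values only).
branches = {"high": (1, 2), "mid": (4, 8), "low": (16, 32)}
fields = {name: receptive_field(3, rates) for name, rates in branches.items()}
```

With these illustrative rates, the three branches see 7, 25 and 97 samples respectively, so each branch responds to a different range of waveform features.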
For the
j-th branch, the feature extraction process can be expressed as
where
and
denote the convolution kernels of the first and second layers in branch
j, respectively, while
and
are their corresponding dilation rates. The function
represents the ReLU activation function. Each branch outputs a feature map
, with 8 representing the number of output channels. WPT waveforms exhibit structured and sparse frequency-domain properties, making eight channels sufficient to encode essential harmonic information. Furthermore, this compact dimension acts as implicit regularization to prevent overfitting given the limited training data.
The multi-scale features from all three branches are then concatenated along the channel dimension:
where
represents the concatenated feature map, while
,
, and
denote the outputs of the high-frequency, mid-frequency, and low-frequency branches, respectively. A pointwise convolution (kernel size 1) is subsequently applied to fuse the multi-scale features:
where
is the fusion kernel, and
denotes standard convolution with stride 1. This design enables the model to simultaneously capture high-frequency transients from switching events, mid-frequency variations from control dynamics, and low-frequency power trends.
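The concatenation and pointwise fusion steps can be sketched without any framework. This is a minimal illustration in which feature maps are nested lists shaped [channels][time]; the number of fused output channels and the uniform weights are hypothetical.

```python
def concat_channels(*feature_maps):
    """Concatenate feature maps shaped [channels][time] along the channel axis."""
    merged = []
    for fmap in feature_maps:
        merged.extend(fmap)
    return merged


def pointwise_conv(fmap, weights):
    """Kernel-size-1 convolution: mix channels independently at every time step."""
    length = len(fmap[0])
    return [[sum(w_c * fmap[c][t] for c, w_c in enumerate(row))
             for t in range(length)]
            for row in weights]


# Three branches with 8 channels each and 5 time steps (constant dummy values).
h_high = [[1.0] * 5 for _ in range(8)]
h_mid = [[2.0] * 5 for _ in range(8)]
h_low = [[3.0] * 5 for _ in range(8)]
h_cat = concat_channels(h_high, h_mid, h_low)        # 24 channels
fusion_weights = [[1.0 / 24] * 24 for _ in range(4)]  # 4 output channels (assumed)
h_fused = pointwise_conv(h_cat, fusion_weights)
```

Because the kernel size is 1, the fusion changes only the channel dimension and leaves the temporal length untouched.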
3.1.2. Global Regressor
The Global Regressor transforms the multi-scale features into task-specific predictions. First, an adaptive global average pooling layer aggregates the temporal dimension:
This operation compresses the temporal information into a fixed-size vector while preserving the channel-wise characteristics. The pooled features are then passed through a fully connected layer:
where
and
are learnable parameters and
denotes the number of prediction targets. The final output vector is given as
To balance learning across all tasks, the loss weights for the four output variables (load resistance, mutual inductance M, deviation factor and regulation coefficient) are kept equal. The overall MSPC model thus achieves joint prediction by leveraging multi-scale temporal features extracted from the TX current waveform, enabling accurate characterization of system states under varying load and coupling conditions. The detailed architecture and parameters of the MSPC model are given in Appendix A, Table A1.
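The Global Regressor reduces to two small operations, sketched below in plain Python. The weights, bias and the two-channel input are arbitrary illustrative values; only the four-element output mirrors the four prediction targets.

```python
def global_average_pool(fmap):
    """Average each channel of a [channels][time] feature map over time."""
    return [sum(channel) / len(channel) for channel in fmap]


def fully_connected(vec, weights, bias):
    """Affine map from the pooled vector to the prediction targets."""
    return [sum(w * v for w, v in zip(row, vec)) + b
            for row, b in zip(weights, bias)]


fmap = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]   # 2 channels, 3 time steps
pooled = global_average_pool(fmap)
weights = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, -0.5]]
bias = [0.0, 0.0, 0.0, 1.0]
outputs = fully_connected(pooled, weights, bias)  # 4 prediction targets
```

Since the pooling averages over the whole time axis, the regressor input has a fixed size regardless of the window length.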
3.2. Transfer Learning
Transfer learning is a machine learning paradigm that aims to transfer knowledge learned from one task (source domain) to a different but related task (target domain) to improve learning efficiency and performance. The core concept of TL can be formalized as
where
is the optimal model for the target task;
represents the target domain training data;
denotes the knowledge extracted from the source domain
;
represents the transfer learning algorithm that integrates target data with source knowledge. In this work, the source domain
consists of abundant simulation data. The target domain
contains limited experimentally detected waveforms of
. A fine-tuning strategy is employed to bridge the gap between simulated and physical scenarios. As shown in
Figure 9, the MSPC model is first pre-trained on
to extract general waveform features. The model parameters are divided into two parts based on their functions. The Multi-Scale Feature Extractor parameters
include the multi-scale parallel convolutional layers. The Global Regressor parameters
include the global averaging and fully connected layers. The pre-training stage on
solves
where
is the loss function on the source domain.
are the pre-trained parameters. In this work, the Mean Squared Error (MSE) is adopted as the loss function; thus, the above equation can be formulated in detail as
where
is the number of samples in the source domain dataset
,
denotes the ground-truth vector for the
i-th sample as defined in Equation (
23),
represents the corresponding predicted vector, and
denotes the squared L2 norm.
The model then adapts to the target domain using the knowledge
. The Multi-Scale Feature Extractor
is frozen to preserve its general feature extraction capability. Only the Global Regressor parameters are fine-tuned with the limited target data. This fine-tuning process is formulated as
where
is the loss function on the target domain.
represents the optimized regressor parameters adapted to the real-world data distribution. Similarly,
shares the same MSE formulation as
, calculated over
.
The final model for the target task combines the frozen feature extractor with the fine-tuned regressor. Pre-trained on abundant simulation data, the Multi-Scale Feature Extractor captures the parameter-dependent harmonic characteristics with respect to the load resistance and the mutual inductance M. The discrepancy between the simulated and measured current waveforms mainly lies in amplitude and initial phase, while the variation laws of the waveform across scenarios remain the same. Therefore, only the Global Regressor requires adjustment to compensate for these physical offsets. This approach mitigates the discrepancy between the source and target domains without requiring extensive experimental data. The overall framework of the MSPC model is illustrated in Figure 9.
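The freeze-and-fine-tune procedure can be illustrated on a toy problem. In this minimal sketch, `frozen_features` is a hypothetical stand-in for the pre-trained extractor and the regressor is a single linear layer trained by gradient descent on the MSE; the data, labels and learning rate are invented for illustration.

```python
def frozen_features(x):
    """Hypothetical stand-in for the frozen extractor: window mean and peak."""
    return [sum(x) / len(x), max(abs(v) for v in x)]


def predict(features, w, b):
    return sum(wi * fi for wi, fi in zip(w, features)) + b


def mse(data, w, b):
    return sum((predict(f, w, b) - y) ** 2 for f, y in data) / len(data)


def fine_tune(data, w, b, lr=0.05, epochs=200):
    """Gradient descent on the regressor only; the extractor is never updated."""
    n = len(data)
    for _ in range(epochs):
        grad_w = [0.0] * len(w)
        grad_b = 0.0
        for f, y in data:
            err = predict(f, w, b) - y
            for j, fj in enumerate(f):
                grad_w[j] += 2 * err * fj / n
            grad_b += 2 * err / n
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


# Toy target-domain data: features are computed once because the extractor is frozen.
windows = [[0.5 * a, -a, a, 0.2 * a] for a in (0.5, 1.0, 1.5, 2.0)]
target = [(frozen_features(x), 2.0 * max(abs(v) for v in x) + 0.1) for x in windows]
w_pre, b_pre = [0.0, 1.5], 0.0          # pretend these came from pre-training
loss_before = mse(target, w_pre, b_pre)
w_ft, b_ft = fine_tune(target, w_pre, b_pre)
loss_after = mse(target, w_ft, b_ft)
```

Only the few regressor parameters change, which is why a small target dataset can be enough for this stage.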
3.3. Data Augmentation
All fine-tuning data in the target domain is derived from actual measurements taken on each WPT prototype. When the variation ranges of the load resistance and the mutual inductance M are wide, obtaining sufficient valid data for fine-tuning still requires substantial work. To reduce the workload of collecting experimental data, data augmentation is applied to increase the diversity of the target-domain data. Although these methods slightly alter the original information in the target domain data, they simulate the inevitable errors encountered in real-world conditions. This enhances the practical applicability of this data-driven approach. The specific data augmentation methods deployed are phase shifting, frequency distortion and amplitude scaling. Figure 9 shows the framework of TL assisted by data augmentation. Details and mechanisms are as follows.
To begin with, since the input data is obtained using sliding windows, the initial phase of the current waveform is typically random. Phase shifting increases the amount of valid data by copying the waveform starting from different phase points in the range of . Moreover, the PWM signal driving the inverter transistors is generated by a digital controller such as a DSP or an FPGA, so there is a slight, unavoidable discrepancy between the actual and preset resonance frequencies. Therefore, data points are randomly added to or removed from the arrays to simulate a deviation of approximately in frequency. Finally, since waveform measurement inevitably has minor amplitude errors, the amplitude of the waveforms in the target domain is randomly scaled up or down by 0.01–0.1 times.
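The three augmentation methods can be sketched as follows. This is an illustrative implementation under several assumptions: phase shifting is modeled as a circular shift of a periodic window, inserted points are linearly interpolated from their neighbors, and amplitude scaling multiplies the waveform by 1 ± u with u drawn from 0.01 to 0.1. The function names and the test waveform are hypothetical.

```python
import math
import random


def phase_shift(wave, start):
    """Circularly shift the waveform so that it begins at sample index `start`."""
    start %= len(wave)
    return wave[start:] + wave[:start]


def frequency_distort(wave, n_points, rng):
    """Insert (n_points > 0) or delete (n_points < 0) samples at random positions."""
    out = list(wave)
    for _ in range(abs(n_points)):
        idx = rng.randrange(1, len(out) - 1)
        if n_points > 0:
            out.insert(idx, 0.5 * (out[idx - 1] + out[idx]))  # interpolated sample
        else:
            del out[idx]
    return out


def amplitude_scale(wave, rng, low=0.01, high=0.1):
    """Scale the waveform by 1 + u or 1 - u, with u uniform in [low, high]."""
    u = rng.uniform(low, high)
    factor = 1 + u if rng.random() < 0.5 else 1 - u
    return [factor * v for v in wave]


rng = random.Random(0)
wave = [math.sin(2 * math.pi * k / 50) for k in range(200)]
shifted = phase_shift(wave, 10)
stretched = frequency_distort(wave, 3, rng)
compressed = frequency_distort(wave, -3, rng)
scaled = amplitude_scale(wave, rng)
```

Note that adding or removing points changes the array length, so a fixed-length window would have to be re-cut afterwards before the data is fed to the network.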
4. Online Validation of the Data-Driven Method
4.1. Benchmark Models
In this work, we compare four benchmark DL models with the proposed MSPC model to evaluate its predictive performance. The benchmark models are: Convolutional Neural Network (CNN), Temporal Convolutional Network (TCN), Long Short-Term Memory (LSTM) and Bidirectional Long Short-Term Memory (BiLSTM). They serve as basic and commonly used DL models for sequential prediction.
CNN is a widely used model initially designed for computer vision, which is also highly effective for time series processing in industrial scenarios. In this work, the 1D CNN consists of two convolutional layers (kernel sizes 7 and 5, filters = 16) with batch normalization, ReLU, and dropout (0.1); a max pooling layer (kernel size = 2) for downsampling; and global average pooling followed by a fully connected layer for final prediction.
The basic TCN consists of causal convolution and dilated convolution modules, serving as an enhanced CNN variant specifically designed for time series processing. It uses causal convolutions with dilation rates of 1 and 2 (kernel size = 3, filters = 16) to expand receptive fields without pooling. The remaining structure (global average pooling and fully connected layer) is identical to CNN.
LSTM employs unique gating units to control memory retention and updates, allowing it to effectively learn long-term dependencies in sequences, which has established it as a classic solution for various time-series tasks. In this work, the LSTM model is configured with a hidden size of 64, 2 layers, and a dropout rate of 0.3 to serve as a recurrent baseline for comparison.
As an extension of the standard LSTM, BiLSTM processes input sequences in both forward and backward directions, enabling it to capture contextual information from past and future states simultaneously. In this work, the BiLSTM is configured with a hidden size of 64, 2 layers, and a dropout rate of 0.3, serving as a bidirectional recurrent baseline for comparison.
4.2. Generation of Training Data in the Source Domain
Data from the source domain is generated by MATLAB/Simulink 2024a, with the load resistance varied from 1 to 30 Ω in steps of 1 Ω and the mutual inductance M varied from 3 to 7 μH in steps of 0.1 μH. The waveform in each scenario has a sequence length of 5000. A sliding window with a window size of 2000 and a stride of 100 is used to segment the initial data. This method turns the shape of
into
, and the shape of labels is
according to Equation (
23).
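The segmentation arithmetic can be verified with a short sketch. The window size, stride, sequence length and parameter grids are taken from the text; the function name is illustrative, and the exact tensor layout of the resulting dataset is not assumed here.

```python
def sliding_windows(sequence, window=2000, stride=100):
    """Cut a sequence into overlapping fixed-length windows."""
    return [sequence[s:s + window]
            for s in range(0, len(sequence) - window + 1, stride)]


n_resistances = len(range(1, 31))               # 1 to 30 ohm in 1 ohm steps
n_inductances = round((7.0 - 3.0) / 0.1) + 1    # 3 to 7 uH in 0.1 uH steps
n_scenarios = n_resistances * n_inductances

windows_per_scenario = len(sliding_windows(list(range(5000))))
n_samples = n_scenarios * windows_per_scenario
```

Each of the 1230 scenarios yields 31 windows of 2000 points, giving 38,130 samples in total.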
To generate highly accurate ground-truth labels for the neural network, an automated Simulink-based simulation framework combined with Brent’s method is developed. Specifically, for a given combination of
and
M, Brent’s method is employed as an iterative optimizer to dynamically adjust
in the simulation model. The algorithm switches between inverse quadratic interpolation, the secant method, and bisection depending on convergence conditions. For instance, when the interpolation fails or only two points are valid, the next candidate is computed by the secant update rule:
where
represents the output current error. This process of running a full simulation cycle and dynamically updating the input voltage repeats automatically until
converges to a predefined tolerance threshold. In this work, the iteration terminates when the voltage
variation between consecutive steps falls below 0.01 V, corresponding to a current error of approximately 1 mA. Through this mechanism, the exact
required to counteract the non-ideal current droop is reliably obtained for every operating scenario.
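The secant step of this search can be sketched in isolation. The code below implements only the secant update, not the full Brent procedure with its interpolation and bisection safeguards, and it replaces the Simulink run with a hypothetical closed-form current model `toy_output_current`. The 0.01 V stopping threshold follows the text; the 2 A target and starting voltages are invented.

```python
def secant_solve(error, v0, v1, tol=0.01, max_iter=50):
    """Find v with error(v) close to 0; stop when successive voltages differ by < tol."""
    e0, e1 = error(v0), error(v1)
    for _ in range(max_iter):
        if e1 == e0:
            break  # flat secant line: no further update possible
        v2 = v1 - e1 * (v1 - v0) / (e1 - e0)
        if abs(v2 - v1) < tol:
            return v2
        v0, e0, v1, e1 = v1, e1, v2, error(v2)
    return v1


def toy_output_current(v_in):
    """Hypothetical stand-in for one full simulation run (mildly nonlinear)."""
    return 0.08 * v_in - 0.0002 * v_in ** 2


TARGET_CURRENT = 2.0  # amperes, illustrative


def current_error(v_in):
    return toy_output_current(v_in) - TARGET_CURRENT


v_required = secant_solve(current_error, 24.0, 25.0)
```

In practice a library routine such as `scipy.optimize.brentq` provides the complete Brent algorithm when a bracketing interval is available.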
4.3. Validation Results of Prediction Accuracy
To evaluate the performance of time series predictive models, the following metrics are utilized to quantify the prediction accuracy. Mean Absolute Error (MAE) quantifies the average absolute value of prediction error, which is expressed as
where
n denotes the sample size and
denotes the average value of the actual samples. Root Mean Squared Error (RMSE) is defined as the square root of the mean squared errors, which is expressed as
Coefficient of Determination (R²) quantifies how well the regression model fits the data. A value closer to 1 indicates better predictive performance, whereas a low R² indicates a poor fit. It is calculated by
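For reference, the three metrics can be written out explicitly. This is a minimal sketch using their standard textbook definitions; the function names and sample values are illustrative.

```python
import math


def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root mean squared error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


y_true = [1.0, 2.0, 3.0, 4.0]
y_pred = [1.1, 1.9, 3.2, 3.8]
scores = (mae(y_true, y_pred), rmse(y_true, y_pred), r2(y_true, y_pred))
```

MAE and RMSE are errors to be minimized, while R² is a goodness-of-fit score whose best value is 1.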
Data from
is split into a training set and a validation set at a 7:3 ratio. To prevent data leakage caused by the overlapping sliding windows, this split is performed at the scenario level (combinations of
and
M) rather than the window level, ensuring no overlap between training and validation scenarios. Both the training data and the labels were normalized. The hardware and software environment for training can be seen in
Table 2. To ensure the fairness of the comparison, all baseline models have been tuned to their optimal configurations. The adaptive average pooling layer, flatten operation and final fully connected layer are kept identical across the MSPC, CNN and TCN models. All models are trained for 500 epochs with a learning rate of 0.001. The validation results are shown in Table 3. The MSPC model achieves the best values of all three metrics, and its R² is extremely close to 1. These findings indicate that the MSPC model outperforms the other models in prediction accuracy, owing to its ability to extract dynamic features at multiple time scales of the WPT waveform, which contains high-frequency current components induced by power electronic switching devices.
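The scenario-level split described in this subsection can be sketched as follows. The grid of scenarios follows the ranges given in the text, while the shuffling seed and the function name are assumptions made for illustration.

```python
import random


def scenario_split(scenarios, train_ratio=0.7, seed=0):
    """Split whole scenarios, so windows from one scenario never straddle both sets."""
    rng = random.Random(seed)
    shuffled = list(scenarios)
    rng.shuffle(shuffled)
    cut = round(train_ratio * len(shuffled))
    return shuffled[:cut], shuffled[cut:]


# One scenario per (load resistance in ohm, mutual inductance in uH) pair.
scenarios = [(r, round(3.0 + 0.1 * j, 1)) for r in range(1, 31) for j in range(41)]
train_scenarios, val_scenarios = scenario_split(scenarios)
```

Sliding windows are generated only after this split, so overlapping windows from the same waveform cannot leak from the training set into the validation set.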
4.4. Sensitivity Analysis on Dilation Rates
To justify the selection of dilation rates of
mentioned in
Section 3.1.1, a sensitivity analysis is conducted by comparing the proposed scheme with two alternative configurations. The architecture and other hyperparameters of the MSPC model are kept identical. The validation results are presented in
Table 4.
Scheme 1 yields the highest RMSE (0.0332) due to its insufficient receptive field to capture the long-term fundamental envelope of . Scheme 2 also degrades performance since the overly sparse dilation skips crucial sampling points of the higher-order harmonic components. The proposed exponential scheme () achieves the optimal balance, which best exploits the multi-scale feature extraction capability of the MSPC model.
4.5. Ablation Study on Multi-Scale Architecture
To verify the necessity of the multi-scale parallel design in the MSPC model, an ablation study is conducted. As given in Equation (
19),
is formed by concatenating
,
and
in parallel. The performance of the MSPC model employing one or two branches is assessed. All variants were trained and validated under identical conditions. The results are shown in
Table 5. The validation results indicate that relying on a single temporal scale yields suboptimal prediction accuracy. Specifically,
achieves the best single-branch performance (
). However, its prediction error remains relatively high (MAE = 0.0265). Conversely,
and
branches alone struggle to capture the comprehensive waveform dynamics, resulting in lower R² values. This confirms that a single receptive field is insufficient to characterize the harmonic-rich WPT waveforms. Combining any two branches leads to a substantial performance improvement. Notably, the combination of
yields an R² of 0.9901.
Ultimately, the proposed full architecture achieves the best performance across all schemes (MAE = 0.0105, RMSE = 0.0156, R² = 0.9969). This indicates that the multi-frequency features associated with switching events play a necessary role in completing the waveform representation and minimizing prediction errors. Therefore, the parallel multi-scale design is necessary to comprehensively extract the multi-scale harmonic information in WPT systems.
4.6. Computational Complexity and Inference Latency
To evaluate the feasibility of deploying the MSPC model on resource-constrained embedded systems, a comparison of computational complexity is conducted. Floating Point Operations (FLOPs) represent the total number of addition and multiplication operations required to perform a single model inference. A lower FLOPs count indicates reduced computational demand, resulting in faster processing speed on embedded platforms. This work compares the FLOPs and parameter counts of the MSPC network with four benchmark models and three widely used algorithms for edge computing (TinyRNN, MiniRocket and 1D Transformer). The results are shown in
Table 6. The shapes of input and output data of validation models remain consistent.
As observed, LSTM, BiLSTM and 1D Transformer suffer from massive computational overhead, while the MSPC model maintains a lightweight profile with 2.096 FLOPs and 1.147 parameters. This demonstrates the potential of the MSPC model for future deployment on embedded devices. Furthermore, the inference latency of the proposed MSPC model was measured. The average single-inference time over 100 consecutive runs is merely 1.337 ms. This verifies the real-time feasibility of the MSPC model for embedded WPT controllers.
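Parameter and FLOP counts of convolutional layers, as well as an average latency, can be estimated with a few helper functions. This is a rough sketch that counts one multiplication and one addition per multiply-accumulate and ignores bias additions in the FLOP count; the layer dimensions and the dummy workload are hypothetical and do not reproduce the figures in Table 6.

```python
import time


def conv1d_params(c_in, c_out, k, bias=True):
    """Number of learnable parameters of a 1D convolution layer."""
    return c_in * c_out * k + (c_out if bias else 0)


def conv1d_flops(c_in, c_out, k, l_out):
    """Additions plus multiplications of a 1D convolution (bias ignored)."""
    return 2 * c_in * c_out * k * l_out


def average_latency_ms(fn, runs=100):
    """Mean wall-clock time of `fn` over consecutive runs, in milliseconds."""
    start = time.perf_counter()
    for _ in range(runs):
        fn()
    return (time.perf_counter() - start) * 1000.0 / runs


# Hypothetical layer: 1 input channel, 8 output channels, kernel 3, 2000 outputs.
params = conv1d_params(1, 8, 3)
flops = conv1d_flops(1, 8, 3, 2000)
latency = average_latency_ms(lambda: sum(range(1000)))
```

FLOP-counting conventions differ between tools (some report multiply-accumulates instead), so counts from different sources should be compared with care.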