An Adaptive Multi-Task Gaussian Process Regression Approach for Harmonic Modeling of Aggregated Loads in High-Voltage Substations

Zheng, Jiahui; Song, Kun; Duan, Jiaqi; Wang, Yang

doi:10.3390/en18174670

¹ College of Electrical Engineering, Sichuan University, Chengdu 610065, China

² Chengdu Electric Power Supply Company, State Grid Sichuan Power Supply Company, Chengdu 610041, China

* Author to whom correspondence should be addressed.

Energies 2025, 18(17), 4670; https://doi.org/10.3390/en18174670

Submission received: 7 July 2025 / Revised: 8 August 2025 / Accepted: 18 August 2025 / Published: 3 September 2025

Abstract

To address the challenges of complex harmonic characteristics, multi-source coupling, and strong time variability in aggregated loads downstream of high-voltage substations, this paper proposes an Adaptive Multi-Task Gaussian Process Regression (AMT-GPR) method for harmonic modeling. First, field measurements from the medium-voltage side of a 500 kV substation are denoised and analyzed using the Fourier transform to reveal the dynamic patterns and interdependencies of harmonic current magnitudes. Then, a multi-task GPR framework is constructed, incorporating task correlation modeling and adaptive kernel functions to capture inter-task coupling and differences in feature scales. Finally, a probabilistic harmonic model is developed based on multiple sets of measured data, and the modeling performance of AMT-GPR is compared with single-task GPR, conventional MT-GPR, and mainstream machine learning approaches including RBF, LS-SVM, and LSTM. Simulation results demonstrate that traditional harmonic modeling methods are insufficient to capture the dynamic behavior and uncertainty of aggregated loads, whereas AMT-GPR maintains strong robustness under small-sample conditions, significantly reduces prediction errors, and yields narrower uncertainty intervals than the baseline models. These findings validate the effectiveness of the proposed method in modeling harmonics of aggregated loads in high-voltage substations and provide theoretical support for subsequent harmonic assessment and mitigation strategies.

Keywords: high-voltage substation; aggregated loads; Gaussian process regression; multi-task learning; adaptive kernel function; probabilistic harmonic model

1. Introduction

With the widespread deployment of nonlinear loads and power electronic devices in power systems, the complexity and impact of harmonic issues have increased significantly. Harmonics not only degrade power quality but also pose systemic risks such as equipment resonance and protection malfunctions [1,2,3,4,5]. The mechanisms behind harmonic generation in modern power systems have evolved beyond traditional load-side or local circuit characteristics, now encompassing system-wide coupling effects involving power sources, network topology, and load behaviors. Against this backdrop, high-accuracy modeling of harmonic sources has become a critical requirement for ensuring the safe and stable operation of power systems.

Current harmonic source modeling methods can be broadly categorized into model-driven and data-driven approaches, each with distinct advantages and limitations. Model-driven methods are grounded in circuit theory of power electronic devices and aim to analytically describe the nonlinear behavior of harmonic sources through equivalent circuit models, such as switching function models and state-space models. These approaches can be further divided into two subcategories: (1) Generic models typically represent harmonic sources as harmonic-controlled sources based on fundamental current control [6] and describe harmonic voltage–current coupling characteristics using equivalent circuits or Norton’s theorem [7,8,9]. Recent studies have introduced coupled admittance matrices to capture inter-harmonic interactions [10,11,12,13,14,15] and applied optimization techniques such as partial least squares [14] to address parameter identification challenges caused by collinearity. However, such models often lack the flexibility to dynamically represent time-varying harmonic behaviors and require re-identification of parameters for different devices and operating conditions. (2) Specific models are developed based on the physical mechanisms of particular harmonic sources, such as electric arc furnaces or wind power inverters, offering higher modeling accuracy. 
For instance, electric arc furnace models based on stochastic processes and chaos theory [16,17,18,19,20] effectively characterize the dynamic uncertainties of arc behavior; harmonic models of doubly-fed induction generators (DFIGs) [21,22] reveal the relationship between stator-side harmonics and control parameters; and the dead-time effects and frequency coupling mechanisms in grid-connected inverters have been extensively investigated [23], leading to the development of dead-time-coupled harmonic source models that describe the propagation of harmonic voltages through outer control loops, inner current loops, and phase-locked loops (PLLs). Nonetheless, the applicability of specific models is often limited to their corresponding scenarios, making them difficult to generalize across diverse types of harmonic sources.

Data-driven methods leverage machine learning and statistical analysis techniques to extract nonlinear mapping relationships from harmonic monitoring data without relying on prior circuit knowledge, making them well-suited for scenarios involving multiple coupled harmonic sources. Traditional approaches such as weighted least squares [24,25], partial least squares [26,27], and complex linear regression [28,29] enhance noise resistance through optimization algorithms but fail to fully capture the dynamic characteristics of harmonics. In recent years, neural networks have emerged as a prominent research direction. Radial basis function (RBF) networks [30,31] reduce computational complexity through adaptive structural adjustments; improved autoregressive networks [32] and generalized regression networks [33] integrate multi-frequency features to enhance dynamic modeling capabilities; and deep learning techniques, such as hybrid models combining convolutional neural networks (CNNs) and long short-term memory (LSTM) networks [34], further strengthen the representation of time-varying features. Nevertheless, data-driven methods still face challenges including limited interpretability of physical mechanisms, insufficient online adaptability, and high computational demands.

Aggregated loads downstream of high-voltage substations typically consist of numerous heterogeneous nonlinear devices, whose harmonic characteristics exhibit the following complexities: (1) multi-source coupling effects, where the superposition of harmonic emissions from different loads leads to a nonlinear enhancement of the overall harmonic distribution; (2) dynamic uncertainty, where random factors such as load switching and operational condition changes cause the harmonic spectrum to vary over time; and (3) small-sample challenges, where the difficulty in obtaining harmonic data from certain loads hinders the generalization capability of modeling approaches. Traditional harmonic modeling methods are mostly based on deterministic assumptions, making them inadequate for capturing the dynamic behavior and uncertainties of aggregated loads. These limitations are particularly pronounced in high-voltage substation scenarios, where multiple stochastic factors coexist. Therefore, developing a modeling approach capable of accurately characterizing the harmonic behavior of aggregated loads in high-voltage substations has become a critical focus in harmonic modeling research.

Multi-Task Gaussian Process Regression (MT-GPR), as an advanced machine learning technique, is well-suited for handling multivariate, nonlinear, and uncertain data, offering strong generalization capabilities and high predictive accuracy. Its core advantage lies in leveraging the correlation information shared among tasks to enhance overall model performance, making it particularly suitable for harmonic modeling of aggregated loads with common characteristics and dynamic behavior.

The measured data exhibit complex harmonic relationships, and conventional MT-GPR cannot dynamically adjust the correlations among different harmonic current components. To address the time-varying harmonic characteristics of aggregated loads in high-voltage substations, this paper therefore proposes a harmonic modeling method based on Adaptive Multi-Task Gaussian Process Regression (AMT-GPR). First, field measurement data from a high-voltage substation are collected and analyzed, revealing that different harmonic orders exhibit distinct temporal features. Second, a correlation analysis of harmonic current components indicates the presence of coupling relationships, particularly among odd-order harmonics. Then, by incorporating adaptive kernel functions and a multi-task learning framework, the Gaussian Process Regression model captures the inter-task coupling while adaptively accommodating feature differences across harmonic orders. This enables the construction of a probabilistic harmonic model for aggregated loads in high-voltage substations, significantly improving prediction accuracy and model robustness. Finally, the proposed method is validated through a comparative analysis between simulation results and real-world measurement data, demonstrating its accuracy and reliability in practical applications.

2. Measured Data Analysis

This study conducts a harmonic characteristic analysis based on actual operational data from multiple 500 kV high-voltage substations. The measurement points are located at the medium-voltage side (220 kV) of high-voltage transformers, directly capturing the harmonic characteristics on the load side. The data consist of three-phase voltage and current waveform recordings over several consecutive days from multiple substations, as illustrated in Figure 1. A total of 22 sets of 24 h measurement data were collected from eight different monitoring points, with individual recording durations ranging from 1 to 3 days. These datasets cover typical weekday load variations and comprehensively reflect the harmonic behavior under various loading conditions.

To ensure the accuracy and reliability of the harmonic analysis, the collected raw data were systematically preprocessed. The main steps and methods are as follows:

2.1. Denoising

The raw data often contain high-frequency noise and outliers, which can compromise the accuracy of harmonic analysis. Wavelet transform (WT) was employed to denoise the voltage and current waveform data. The wavelet transform is defined by the following equation:

W (a, b) = \frac{1}{\sqrt{a}} \int_{- \infty}^{\infty} x (t) ψ (\frac{t - b}{a}) d t

(1)

Here, x(t) represents the original signal, and ψ(t) denotes the mother wavelet function. The wavelet transform is governed by the scale parameter a and the translation parameter b. By selecting an appropriate wavelet basis function, the signal can be decomposed into sub-signals across different frequency bands, allowing high-frequency noise components to be removed while retaining the essential signal information. The denoised voltage and current waveforms, along with the residuals, are shown in Figure 2.

As shown in Figure 2, the residuals of both voltage and current exhibit small fluctuations with relatively uniform distribution, indicating that the denoised signals effectively retain their original characteristics while the noise components have been successfully separated. No apparent periodic patterns or abnormal spikes are observed, suggesting that the denoising process effectively removed high-frequency noise and random disturbances.
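The denoising step can be illustrated with a minimal sketch. The paper does not state the wavelet basis, the decomposition depth, or the thresholding rule, so everything specific here is an assumption made for illustration: a Haar wavelet, two decomposition levels, soft thresholding with the universal threshold, and a noise estimate from the median absolute deviation of the finest detail coefficients. The function names and the synthetic test signal are hypothetical.

```python
import numpy as np

def haar_dwt(x):
    """One level of the orthonormal Haar transform (len(x) must be even)."""
    x = np.asarray(x, dtype=float)
    approx = (x[0::2] + x[1::2]) / np.sqrt(2.0)
    detail = (x[0::2] - x[1::2]) / np.sqrt(2.0)
    return approx, detail

def haar_idwt(approx, detail):
    """Inverse of haar_dwt."""
    x = np.empty(2 * approx.size)
    x[0::2] = (approx + detail) / np.sqrt(2.0)
    x[1::2] = (approx - detail) / np.sqrt(2.0)
    return x

def wavelet_denoise(x, levels=2):
    """Soft-threshold the detail coefficients of a multi-level Haar transform.

    len(x) must be divisible by 2**levels.
    """
    approx = np.asarray(x, dtype=float)
    details = []
    for _ in range(levels):
        approx, d = haar_dwt(approx)
        details.append(d)
    # Assumed noise estimate: median absolute deviation of the finest details.
    sigma = np.median(np.abs(details[0])) / 0.6745
    thr = sigma * np.sqrt(2.0 * np.log(len(x)))  # universal threshold
    for d in reversed(details):
        d = np.sign(d) * np.maximum(np.abs(d) - thr, 0.0)  # soft thresholding
        approx = haar_idwt(approx, d)
    return approx

# Synthetic stand-in signal (not measured data): four cycles, 256 samples each.
rng = np.random.default_rng(0)
n = np.arange(1024)
clean = np.sin(2 * np.pi * n / 256) + 0.05 * np.sin(2 * np.pi * 3 * n / 256)
noisy = clean + 0.05 * rng.standard_normal(n.size)

denoised = wavelet_denoise(noisy, levels=2)
residual = noisy - denoised  # the quantity inspected in a residual plot
mse_noisy = np.mean((noisy - clean) ** 2)
mse_denoised = np.mean((denoised - clean) ** 2)
```

The Haar basis keeps the sketch dependency-free but produces a staircase-like output; a smoother wavelet family would normally be preferred for power waveforms.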

2.2. Harmonic Extraction

To investigate the harmonic characteristics on the medium-voltage side of high-voltage substations, the discrete Fourier transform (DFT) is applied to the measured waveform data for spectral analysis, enabling the extraction of fundamental and harmonic components of the voltage and current signals. For signals sampled at 256 points per cycle, the DFT is calculated using the following equation:

X (k) = \sum_{n = 0}^{N - 1} x (n) e^{- j \frac{2 π}{N} k n}

(2)

Here, N denotes the number of sampling points (with N = 256), n is the sample index, k is the frequency index, and X(k) represents the signal in the frequency domain. To reduce spectral leakage, a Hanning (Hann) window is applied to the samples before the transform. The window function is defined as follows:

w (n) = 0.5 (1 - \cos (\frac{2 π n}{N - 1}))

(3)

The extracted harmonic frequency range spans from the fundamental frequency (50 Hz) up to the 12th harmonic (600 Hz).
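Equations (2) and (3) can be sketched in NumPy as follows. This is an illustrative sketch rather than the authors' processing chain. The function name `harmonic_amplitudes`, the ten-cycle analysis window, and the synthetic test signal are assumptions. A multi-cycle window is used because the main lobe of a Hann window spans several DFT bins, so with a single 256-point cycle adjacent harmonic orders would overlap. The amplitude is rescaled by the window sum, which is also an assumption about normalization.

```python
import numpy as np

def harmonic_amplitudes(x, samples_per_cycle=256, max_order=12):
    """Peak amplitudes of harmonic orders 1..max_order from a Hann-windowed DFT.

    x must contain a whole number of fundamental cycles.
    """
    x = np.asarray(x, dtype=float)
    n_total = x.size
    n_cycles = n_total // samples_per_cycle
    if n_cycles * samples_per_cycle != n_total:
        raise ValueError("x must span a whole number of cycles")
    w = np.hanning(n_total)        # symmetric Hann window, as in Equation (3)
    spectrum = np.fft.rfft(x * w)  # Equation (2) applied to the windowed signal
    orders = np.arange(1, max_order + 1)
    bins = orders * n_cycles       # harmonic h falls on bin h * n_cycles
    # Divide by the window sum (coherent gain), double for a one-sided spectrum.
    return 2.0 * np.abs(spectrum[bins]) / w.sum()

# Synthetic stand-in signal (not measured data): ten cycles at 256 samples each.
n = np.arange(10 * 256)
theta = 2 * np.pi * n / 256
signal = (100 * np.sin(theta)
          + 5 * np.sin(3 * theta + 0.3)
          + 3 * np.sin(5 * theta - 1.0))

amps = harmonic_amplitudes(signal)
# amps[0] is the fundamental, amps[2] the 3rd harmonic, amps[4] the 5th.
```

Applying the function to consecutive windows of a day-long recording would give one amplitude vector per window, which is the form of data the later sections model.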

DFT analysis was performed on one-minute monitoring data over the course of a full day. Figure 3 illustrates the variation in harmonic characteristics of aggregated loads at a typical high-voltage substation.

From Figure 3, the following observations can be made: (1) the fundamental current and voltage exhibit non-stationary time series characteristics; (2) the amplitudes of the third and fifth harmonic currents and voltages are significantly higher than those of other harmonic orders; (3) harmonics with larger amplitudes, such as the third harmonic, show notable fluctuations in both voltage and current; and (4) even-order harmonic components are approximately zero. Therefore, this study focuses on the 3rd, 5th, 7th, 9th, and 11th harmonics.

In practical power system harmonic analysis, harmonic amplitude is a key indicator for evaluating the severity of harmonic distortion. It directly reflects the degradation level of power quality and has a significant impact on equipment operation, system stability, and user safety. In contrast, harmonic phase angles generally have a less pronounced effect on the overall system, especially in scenarios involving multiple harmonic sources, where their influence becomes increasingly complex due to variations in system topology and load characteristics. Hence, from an engineering perspective, this study concentrates on the analysis and modeling of harmonic amplitudes to streamline the research scope and enhance its feasibility and practical value.

For aggregated nonlinear loads in high-voltage substations, the time-varying characteristics of harmonic currents exhibit notable regularity and complexity. The harmonic current profiles from two consecutive days at the same substation are shown in Figure 4.

Analysis of the harmonic current curves reveals that the amplitude variations across different harmonic orders are relatively smooth. This indicates that the fluctuations in harmonic currents are primarily influenced by the overall system load conditions rather than by instantaneous variations in individual local loads, making it difficult to accurately distinguish between different operating states using traditional methods such as abrupt change detection. Additionally, the harmonic profiles for two consecutive days at the same substation exhibit strong similarity. For instance, the third harmonic current consistently peaks during the same time window (e.g., 18:00–22:00) on both days and shows comparable fluctuation patterns during other periods. This suggests that the variation in harmonic currents follows a temporal continuity and periodic pattern, likely driven by daily cyclic load behaviors, such as those associated with industrial activities or residential electricity consumption.

To further investigate the time-varying nature of harmonics, this study focuses on the statistical characteristics and variation patterns of harmonic amplitudes. Time-series analysis of harmonic currents over multiple consecutive days at the same substation is performed to extract statistical indicators such as mean and variance, thereby quantifying the fluctuation range. Figure 5 illustrates the statistical characteristics of the third harmonic current at a high-voltage substation, including its mean and variance, with the curves representing the temporal trends of these statistical features.

From Figure 5, the following observations can be made: (1) The mean curve of the harmonic current is relatively smooth, with the third harmonic current maintaining a mean value between approximately 91.7 A and 155.7 A over the 24 h period, indicating considerable long-term variation in the overall harmonic current level. (2) The variance curve reflects the degree of fluctuation in the harmonic current. The variance of the third harmonic current peaks between 18:00 and 22:00, suggesting that the fluctuation range is greatest during the evening load peak, while it remains relatively stable during other periods. By correlating harmonic amplitude data with load curves and relevant operational parameters, the relationship between harmonic amplitude and system operating conditions is analyzed. The results show that the time-varying characteristics of harmonic amplitudes are closely associated with load variations, with more pronounced fluctuations observed during peak load periods.
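One way to obtain mean and variance curves of this kind is to stack the daily amplitude series and take statistics across days for each time-of-day slot. The sketch below assumes this across-day definition, which the text does not spell out. The function name `daily_profile_stats` and the data are synthetic placeholders, not measurements.

```python
import numpy as np

def daily_profile_stats(amplitudes):
    """Mean and sample variance across days for every time-of-day slot.

    amplitudes: array of shape (n_days, n_slots), e.g. n_slots = 1440 for
    one-minute values.
    """
    a = np.asarray(amplitudes, dtype=float)
    return a.mean(axis=0), a.var(axis=0, ddof=1)

# Synthetic stand-in data (not measurements): three days of one-minute values
# with a larger spread between 18:00 and 22:00.
rng = np.random.default_rng(0)
minutes = np.arange(1440)
evening = (minutes >= 18 * 60) & (minutes < 22 * 60)
base = 100.0 + 20.0 * np.sin(2 * np.pi * (minutes - 360) / 1440)
spread = np.where(evening, 8.0, 2.0)
days = base + spread * rng.standard_normal((3, 1440))

mean_curve, var_curve = daily_profile_stats(days)
```

With only a few days available the per-slot variance is itself noisy, so averaging it over a time window such as the evening peak gives a more stable indicator.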

3. General Probabilistic Harmonic Model Based on Adaptive Multi-Task Gaussian Process Regression

3.1. Gaussian Process Regression

Gaussian Process Regression (GPR), as a non-parametric regression method based on Bayesian theory, is capable of handling complex nonlinear relationships while quantifying the uncertainty of predictions through Gaussian distributions. GPR has been widely applied across various fields such as machine learning, signal processing, and data mining, particularly in tasks requiring high prediction accuracy and uncertainty estimation. One of the core advantages of GPR is its ability to directly model the functional relationship between inputs and outputs while providing confidence intervals for the predicted values. Moreover, GPR does not impose a fixed parametric form on the input–output relationship, offering strong flexibility and adaptability.

A Gaussian process defines the relationship among random variables in a function space using a Gaussian distribution. For an input vector X = [x_{1}, x_{2}, …, x_{n}] and output vector y = [y_{1}, y_{2}, …, y_{n}], a Gaussian process is defined as follows:

y_{i} = f(x_{i}), \quad f \sim \mathcal{GP}(m(x), k(x, x^{'}))

(4)

where m(x) is the mean function, and k(x, x^{'}) is the covariance function.

When constructing a general model of harmonic sources within the framework of Gaussian Process Regression (GPR), the key lies in the appropriate design of the mean and covariance functions. According to theoretical analysis, the functional relationship between harmonic current and voltage can be expressed as follows:

{\dot{I}}_{h} = F_{h} (C, {\dot{U}}_{1}, {\dot{U}}_{2}, \dots, {\dot{U}}_{s})

(5)

Here, {\dot{U}}_{s} denotes the phasor of the sth harmonic voltage, C represents the current operating condition parameters, F_h denotes the functional relationship, and {\dot{I}}_{h} denotes the phasor of the hth harmonic current.

However, in practical applications, there are no standardized technical specifications for selecting the operating condition parameters C and the function F_h, and some devices only provide the magnitudes of harmonic voltages and currents. The modeling process therefore focuses on capturing the intrinsic relationship between the magnitudes of harmonic current and voltage. Moreover, for typical harmonic sources in power grids, such as PV inverters, induction furnaces, and computers, the emission of harmonic currents is mainly determined by the load operating state and is relatively insensitive to voltage distortion. To establish a generalized model, two additional parameters are introduced into Equation (5). Specifically, the fundamental current I₁ reflects the device’s operating power, and the constant term G_h describes its inherent harmonic characteristics. Based on this, the improved harmonic current–voltage relationship model is formulated as follows:

{\dot{I}}_{h} = F_{h} (I_{1}, {\dot{U}}_{1}, {\dot{U}}_{2}, \dots, {\dot{U}}_{d}) + G_{h}

(6)

For a harmonic dataset D = {(x_{i}, I_{h, i})}_{i = 1}^{n}, where x_{i} ∈ ℝ^{d + 1} denotes the ith input vector consisting of the fundamental voltage, fundamental current, and the harmonic voltages of various orders, and I_{h, i} ∈ ℝ represents the ith observation of the hth harmonic current, the input matrix X = (x_{1}, …, x_{n})^{T} ∈ ℝ^{n × (d + 1)} and the output vector I_{h} = (I_{h, 1}, …, I_{h, n})^{T} ∈ ℝ^{n × 1} can be defined accordingly. Then, the Gaussian Process can be redefined as follows:

{\dot{I}}_{h, i} = F_{h}(x_{i}) + G_{h, i}, \quad F_{h}(x) \sim \mathcal{GP}(m(x), k(x, x^{'}))

(7)

Here, x, x^{'} ∈ X.

To simplify the problem without loss of generality, the mean function m(x) is commonly assumed to be zero. The covariance function, on the other hand, can take various forms, including constant, linear, Matern, radial basis function (RBF), or a combination of multiple kernels.

By further considering the influence of noise, the regression problem in Equation (7) can be formulated as a Gaussian Process Regression (GPR) problem. Under the Bayesian framework, GPR places a prior over functions and updates it to a posterior distribution by conditioning on the training data (X, I_{h}); predictions at a test input x^{*} are then obtained from this posterior. The joint prior distribution of the observed output vector I_{h} and the predicted output I_{h}^{*} is expressed as follows:

\begin{bmatrix} I_{h} \\ I_{h}^{*} \end{bmatrix} \sim N \left( \begin{bmatrix} 0 \\ 0 \end{bmatrix}, \begin{bmatrix} K(X, X) + σ_{n}^{2} E_{n} & K(X, x^{*}) \\ K(x^{*}, X) & k(x^{*}, x^{*}) \end{bmatrix} \right)

(8)

Here, K(X, X) ∈ ℝ^{n × n} is a symmetric and positive semi-definite covariance matrix with elements K_{ij} = k(x_{i}, x_{j}). K(X, x^{*}) = K(x^{*}, X)^{T} ∈ ℝ^{n × 1} is the covariance vector between the test input x^{*} and the training input matrix X. k(x^{*}, x^{*}) is the prior variance at the test input. σ_{n}^{2} denotes the variance of the independent Gaussian white noise in GPR, and E_{n} is the n × n identity matrix. By conditioning the joint Gaussian prior distribution on the observed vector I_{h}, the posterior distribution of I_{h}^{*} can be analytically derived as follows:

\begin{cases} I_{h}^{*} \mid X, I_{h}, x^{*} \sim N({\bar{I}}_{h}^{*}, \mathrm{cov}(I_{h}^{*})) \\ {\bar{I}}_{h}^{*} = K(x^{*}, X) {[K(X, X) + σ_{n}^{2} E_{n}]}^{-1} I_{h} \\ \mathrm{cov}(I_{h}^{*}) = k(x^{*}, x^{*}) - K(x^{*}, X) {[K(X, X) + σ_{n}^{2} E_{n}]}^{-1} K(X, x^{*}) \end{cases}

(9)

Here, {\bar{I}}_{h}^{*} and cov(I_{h}^{*}) represent the mean and variance of the predicted value I_{h}^{*} at the test point x^{*}, respectively. To compute the predictive mean and variance, the hyperparameter set θ = {σ_{f}^{2}, σ_{n}^{2}} of the GPR model (the signal variance of the kernel and the noise variance) is inferred by minimizing the Negative Logarithmic Marginal Likelihood (NLML), which is given by the following:

θ_{opt} = \underset{θ}{\arg\min} \, \mathrm{NLML}

(10)

\mathrm{NLML} = - \log p(I_{h} \mid X, θ) = \frac{1}{2} I_{h}^{T} C^{-1} I_{h} + \frac{1}{2} \log |C| + \frac{n}{2} \log 2π

(11)

Here, C = K(X, X) + σ_{n}^{2} E_{n}. The NLML can be minimized using its partial derivatives with respect to θ and an efficient gradient-based optimization algorithm, such as L-BFGS-B.
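Equations (9) and (11) can be evaluated with a few lines of NumPy. The sketch below is a minimal illustration under stated assumptions, not the authors' implementation: it uses a Matern kernel with ν = 3/2 as the covariance function, treats the kernel length scale as an additional hyperparameter with a hand-picked value, and works on synthetic one-dimensional data. The names `matern32` and `gpr_fit_predict` are hypothetical. A Cholesky factorization replaces the explicit matrix inverse for numerical stability.

```python
import numpy as np

def matern32(xa, xb, sigma_f2=1.0, length=1.0):
    """Matern kernel (nu = 3/2) between row-wise inputs xa (n, d) and xb (m, d)."""
    d = np.sqrt(((xa[:, None, :] - xb[None, :, :]) ** 2).sum(axis=-1))
    r = np.sqrt(3.0) * d / length
    return sigma_f2 * (1.0 + r) * np.exp(-r)

def gpr_fit_predict(X, y, X_star, sigma_f2, length, sigma_n2):
    """Posterior mean and variance (Equation (9)) and NLML (Equation (11))."""
    n = X.shape[0]
    C = matern32(X, X, sigma_f2, length) + sigma_n2 * np.eye(n)
    L = np.linalg.cholesky(C)                            # C = L L^T
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))  # C^{-1} y
    K_s = matern32(X, X_star, sigma_f2, length)          # K(X, x*), one column per test point
    mean = K_s.T @ alpha
    V = np.linalg.solve(L, K_s)
    # k(x*, x*) equals sigma_f2 for this kernel; subtract the explained part.
    var = sigma_f2 - np.sum(V ** 2, axis=0)
    # 0.5 * log|C| equals the sum of the log-diagonal of the Cholesky factor.
    nlml = 0.5 * y @ alpha + np.log(np.diag(L)).sum() + 0.5 * n * np.log(2 * np.pi)
    return mean, var, nlml

# Synthetic stand-in data (not measurements).
rng = np.random.default_rng(1)
X = np.linspace(0.0, 1.0, 25)[:, None]
y = np.sin(2 * np.pi * X[:, 0]) + 0.05 * rng.standard_normal(25)
X_star = np.array([[0.33], [0.5], [1.5]])

mean, var, nlml = gpr_fit_predict(
    X, y, X_star, sigma_f2=1.0, length=0.3, sigma_n2=0.05 ** 2
)
```

The predictive variance grows for the test point at 1.5, which lies outside the training range. In a full workflow the returned NLML would be passed to an optimizer such as L-BFGS-B to tune the hyperparameters.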

3.2. Multi-Task Gaussian Process Regression

In practical applications, harmonic sources typically involve multiple harmonic components, including currents and voltages at various orders. Therefore, harmonic source modeling can be regarded as a multi-output regression problem, where each harmonic current corresponds to one output. These harmonic components may exhibit complex coupling relationships. Although modeling each harmonic current individually using Gaussian Process Regression (GPR) can address the multi-output issue, this approach neglects the interdependence among different harmonic currents. As a result, it may lead to suboptimal prediction performance and fails to meet the practical engineering requirement of synchronous analysis across harmonic orders.

To investigate the coupling among different harmonics, this study utilizes harmonic monitoring data from a 500 kV substation. The Pearson Correlation Coefficient (PCC) and Spearman Correlation Coefficient (SCC) are computed among harmonic currents of various orders to evaluate their linear and monotonic correlations, respectively. A heatmap is then plotted, as shown in Figure 6, to visualize these relationships.

Pearson Correlation Coefficient (PCC)

ρ_{x, y} = \frac{\sum_{i = 1}^{n} (x_{i} - \bar{x}) (y_{i} - \bar{y})}{\sqrt{\sum_{i = 1}^{n} {(x_{i} - \bar{x})}^{2}} \sqrt{\sum_{i = 1}^{n} {(y_{i} - \bar{y})}^{2}}}

(12)

Here, x and y represent the sampled data of two different harmonic current components, \bar{x} and \bar{y} denote their sample means, and n is the total number of observations. The PCC measures the linear correlation between the two variables, with values ranging from −1 to 1. It is used to evaluate the linear trend in the variation of harmonic currents.

Spearman Correlation Coefficient (SCC)

ρ_{s} = \frac{\sum_{i = 1}^{n} (R_{i} - \bar{R}) (S_{i} - \bar{S})}{\sqrt{\sum_{i = 1}^{n} {(R_{i} - \bar{R})}^{2}} \sqrt{\sum_{i = 1}^{n} {(S_{i} - \bar{S})}^{2}}} = 1 - \frac{6 \sum d_{i}^{2}}{n (n^{2} - 1)}

(13)

Here, R_i and S_i denote the rank values of the ith observation for variables x and y, respectively, \bar{R} and \bar{S} are the average ranks of x and y, and d_i = R_i − S_i; the second equality holds when there are no tied ranks. The SCC is computed based on rank statistics and is used to evaluate the monotonic relationship between variables, so it also captures dependencies that are monotonic but nonlinear. Like PCC, SCC ranges from −1 to 1, making it useful for identifying potential nonlinear coupling characteristics among harmonic currents.
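A short NumPy sketch of Equations (12) and (13) makes the difference between the two coefficients concrete. The helper names and the synthetic data are assumptions, and the ranking helper assumes there are no tied values, which is the condition under which both forms of Equation (13) agree.

```python
import numpy as np

def pearson(x, y):
    """Pearson correlation coefficient, Equation (12)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    xc, yc = x - x.mean(), y - y.mean()
    return (xc @ yc) / np.sqrt((xc @ xc) * (yc @ yc))

def ranks(v):
    """Ranks 1..n of the entries of v (assumes no tied values)."""
    order = np.argsort(v)
    r = np.empty(len(v), dtype=float)
    r[order] = np.arange(1, len(v) + 1)
    return r

def spearman(x, y):
    """Spearman coefficient as the Pearson coefficient of the ranks."""
    return pearson(ranks(x), ranks(y))

def spearman_from_d(x, y):
    """Spearman coefficient from rank differences (second form of Equation (13))."""
    d = ranks(x) - ranks(y)
    n = len(d)
    return 1.0 - 6.0 * np.sum(d ** 2) / (n * (n ** 2 - 1))

# Synthetic stand-in data: y is a monotonic but strongly nonlinear function of x,
# z is a noisy linear function of x.
rng = np.random.default_rng(2)
x = rng.uniform(0.5, 2.0, 200)
y = np.exp(3.0 * x)
z = x + 0.3 * rng.standard_normal(200)

pcc_xy, scc_xy = pearson(x, y), spearman(x, y)
scc_xz, scc_xz_d = spearman(x, z), spearman_from_d(x, z)
```

For the monotonic pair the SCC is exactly 1 while the PCC stays below 1, which is why the two coefficients are reported side by side. For several harmonic orders stored as rows of one array, `np.corrcoef` returns the full PCC matrix used for a heatmap.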

From the heatmaps, the following observations can be made: (1) There exists a certain degree of correlation among odd-order harmonic currents. This correlation mainly arises from their common origin—nonlinear loads in the power system—and the fact that their frequencies are all odd multiples of the fundamental frequency. (2) Even-order harmonic currents also exhibit some correlation, but it is relatively weaker. This is primarily due to the characteristics of the power system and the generation mechanisms of even harmonics. For example, even-order harmonics are typically negligible in a balanced three-phase system. (3) There is generally no significant correlation between odd- and even-order harmonic currents. This is because they are generated by different mechanisms, have distinct frequency characteristics, and may be influenced by different factors in the power system.

The correlation among harmonic currents may vary depending on system states, load characteristics, and operating conditions. Therefore, a detailed analysis tailored to specific scenarios is essential. To this end, the Single-Task Gaussian Process Regression (ST-GPR) model is extended to a Multi-Task Gaussian Process Regression (MT-GPR) model. By fully incorporating the interdependencies among harmonic currents in the modeling process, this approach not only significantly improves prediction accuracy but also enables the construction of a unified model that captures the joint behavior of multiple harmonic currents. The harmonic source modeling process based on MT-GPR is outlined as follows:

For homogeneous training sets X₁ = ⋯ = X_T = X, assume that the function set F = {F₁, F₂, …, F_T} follows a Gaussian process, i.e., F(x) ~ GP(0, κ(x, x′)). Then, the multi-output covariance function κ(x, x′) ∈ ℝ^{T×T} is defined as follows:

κ (x, x^{'}) = (\begin{matrix} k_{11} (x, x^{'}) & \dots & k_{1 T} (x, x^{'}) \\ ⋮ & ⋱ & ⋮ \\ k_{T 1} (x, x^{'}) & \dots & k_{T T} (x, x^{'}) \end{matrix})

(14)

Here, x, x′ ∈ X, and k_{tt′}(x, x′) (1 ≤ t, t′ ≤ T) denotes the covariance between outputs F_t(x) and F_{t′}(x′).

Similar to standard GPR, for a given harmonic training set \tilde{X} = {X_{1}, …, X_{T}}^{T} and the corresponding output observations I = {I_{1}, …, I_{T}}^{T}, the posterior distribution of the predicted output at a test point x^{*} is defined as follows:

I^{*} \mid \tilde{X}, I, x^{*} \sim N({\bar{I}}^{*}, \mathrm{cov}(I^{*}))

(15)

where the predictive mean and covariance are given by the following:

{\bar{I}}^{*} = κ(x^{*}, X) {[κ(X, X) + Σ_{n} \otimes E_{n}]}^{-1} I

(16)

\mathrm{cov}(I^{*}) = κ(x^{*}, x^{*}) - κ(x^{*}, X) {[κ(X, X) + Σ_{n} \otimes E_{n}]}^{-1} κ(X, x^{*})

(17)

Here, κ(X, X) ∈ ℝ^{nT×nT}, κ(X, x^{*}) = (k_{tt′}(X, x^{*})) ∈ ℝ^{nT×T}, κ(x^{*}, x^{*}) = (k_{tt′}(x^{*}, x^{*})) ∈ ℝ^{T×T}, and Σ_{n} ∈ ℝ^{T×T} is a diagonal matrix whose elements represent the observation noise variances for each task.

Similar to standard GPR, the hyperparameters in MT-GPR, including those in the kernel functions {k_{tt′}}_{1≤t,t′≤T} and the noise variances {σ_{n,t}^{2}}_{1≤t≤T}, can be optimized by minimizing the objective in Equation (10).
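The multi-task prediction in Equations (16) and (17) can be sketched for the simplest case, in which all cross-task covariances share one input kernel and differ only by a task covariance matrix A, so that κ(X, X) = A ⊗ K(X, X). This separable form, the Matern kernel, the task-major stacking of the outputs, and all names and data below are assumptions made for illustration. The noise term is built as the Kronecker product of the diagonal task-noise matrix with the n × n identity so that it matches the nT × nT size of κ(X, X).

```python
import numpy as np

def matern32(xa, xb, length=1.0):
    """Unit-variance Matern kernel (nu = 3/2) between row-wise input sets."""
    d = np.sqrt(((xa[:, None, :] - xb[None, :, :]) ** 2).sum(axis=-1))
    r = np.sqrt(3.0) * d / length
    return (1.0 + r) * np.exp(-r)

def mtgpr_predict(X, Y, x_star, A, length, noise_vars):
    """Predictive mean and covariance of Equations (16)-(17) at one test point.

    X: (n, d) shared inputs; Y: (n, T) outputs, one column per task;
    A: (T, T) positive semi-definite task covariance (assumed separable kernel);
    noise_vars: (T,) per-task noise variances.
    """
    n, T = Y.shape
    K = matern32(X, X, length)
    k_s = matern32(X, x_star[None, :], length)       # (n, 1)
    kappa = np.kron(A, K)                            # (nT, nT)
    noise = np.kron(np.diag(noise_vars), np.eye(n))  # task noise expanded to (nT, nT)
    kappa_s = np.kron(A, k_s)                        # kappa(X, x*), shape (nT, T)
    I_stacked = Y.T.reshape(-1)                      # task-major stacking of outputs
    C = kappa + noise
    mean = kappa_s.T @ np.linalg.solve(C, I_stacked)         # (T,)
    # kappa(x*, x*) = A * k(x*, x*) and k(x*, x*) = 1 for this kernel.
    cov = A - kappa_s.T @ np.linalg.solve(C, kappa_s)        # (T, T)
    return mean, cov

# Synthetic stand-in data: two correlated tasks on shared inputs.
X = np.linspace(0.0, 1.0, 15)[:, None]
Y = np.column_stack([
    np.sin(2 * np.pi * X[:, 0]),
    0.5 * np.sin(2 * np.pi * X[:, 0]) + 0.1 * X[:, 0],
])
A = np.array([[1.0, 0.8], [0.8, 1.0]])

mean, cov = mtgpr_predict(X, Y, X[7], A, length=0.3,
                          noise_vars=np.array([1e-8, 1e-8]))
```

With almost no observation noise the prediction at a training input reproduces the observed values of both tasks, and the off-diagonal entry of `cov` carries the predicted coupling between them.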

3.3. Adaptive Kernel Function Design

In kernel functions, the length-scale parameter l controls how quickly the covariance between two inputs decays with their distance and therefore the smoothness of the modeled function. A smaller length scale makes the kernel more sensitive to variations in the input, allowing it to capture local features, while a larger length scale results in smoother responses, favoring the modeling of global trends. In multi-task learning, different tasks (i.e., harmonic currents of different orders) may exhibit the following differences:

Different feature scales—for instance, as demonstrated by the analysis in Figure 3, lower-order harmonic currents typically have larger magnitudes, while higher-order components are relatively smaller.
Different data distributions—harmonic currents of different orders may follow distinct statistical patterns.

To accommodate these differences, assigning an independent and learnable length-scale parameter l_i to each task enables the kernel to better capture task-specific feature scales, thereby improving the model’s flexibility and adaptability. The Matern kernel adapts well to multimodal data and to data with steep changes, which makes it advantageous for complex datasets. Sawtooth-like waveforms, for example, feature steep edges and sharp changes; because the Matern kernel imposes weaker smoothness assumptions than the RBF kernel, it can follow such changes without overfitting the data. In this work, we adopt the Matern kernel with smoothness parameter ν = 3/2, which is defined as follows:

k^{(i)} (x, x^{'}) = σ_{f}^{2} [1 + \frac{\sqrt{3} d (x, x^{'})}{l_{i}}] \exp (- \frac{\sqrt{3} d (x, x^{'})}{l_{i}})

(18)

Here, σ_{f}^{2} denotes the signal variance, l_{i} is the length-scale parameter of task i, and d(x, x^{'}) represents the Euclidean distance between inputs.

By combining task-specific length scales with a task relationship matrix, a multi-task kernel function is constructed as follows:

k ((x_{i}, t_{i}), (x_{j}, t_{j})) = A_{t_{i} t_{j}} \cdot k^{(t_{i})} (x_{i}, x_{j})

(19)

Here, t_{i} and t_{j} denote the task indices (i.e., the orders of harmonic currents) of inputs x_{i} and x_{j}, and A_{t_{i} t_{j}} is the entry of the task relationship matrix A that characterizes the correlation between tasks t_{i} and t_{j}.

Similar to standard GPR, the task-specific length scales and the task relationship matrix A are jointly optimized by minimizing the Negative Logarithmic Marginal Likelihood (NLML).
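The structure of the multi-task kernel in Equations (18) and (19) can be made concrete by assembling the stacked covariance matrix block by block. The sketch follows Equation (19) as printed, so the block linking tasks t and t′ uses the length scale of the row task t; whether cross-task blocks are treated differently in the authors' implementation is not stated here. The function names, the task matrix A, and the length scales are illustrative assumptions.

```python
import numpy as np

def matern32(xa, xb, sigma_f2, length):
    """Equation (18): Matern kernel (nu = 3/2) with one length scale."""
    d = np.sqrt(((xa[:, None, :] - xb[None, :, :]) ** 2).sum(axis=-1))
    r = np.sqrt(3.0) * d / length
    return sigma_f2 * (1.0 + r) * np.exp(-r)

def adaptive_mt_kernel(X, A, lengths, sigma_f2=1.0):
    """Stacked (nT, nT) matrix whose block (t, s) is A[t, s] * k^(t)(X, X)."""
    T = A.shape[0]
    rows = []
    for t in range(T):
        K_t = matern32(X, X, sigma_f2, lengths[t])  # kernel with task t's length scale
        rows.append(np.hstack([A[t, s] * K_t for s in range(T)]))
    return np.vstack(rows)

# Illustrative values only: three tasks, six shared input points.
X = np.linspace(0.0, 1.0, 6)[:, None]
A = np.array([[1.0, 0.6, 0.2],
              [0.6, 1.0, 0.4],
              [0.2, 0.4, 1.0]])

K_adapt = adaptive_mt_kernel(X, A, lengths=[0.2, 0.5, 1.0])
K_shared = adaptive_mt_kernel(X, A, lengths=[0.5, 0.5, 0.5])
```

When all tasks share one length scale, the matrix reduces to the Kronecker product of A with the single-task kernel matrix, which is the conventional MT-GPR structure; the task-specific length scales are what distinguish the adaptive variant.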

The kernel is adaptive in the sense that its parameters are not fixed in advance but are adjusted by an optimization algorithm during training so that the model fits the training data better. The procedure has two stages:

Initial setting: When the model is defined, the parameters are initialized to predefined values.
Optimization process: The parameters are then updated iteratively. In each iteration, the optimization algorithm evaluates the loss function (such as the negative log-likelihood), computes its gradient with respect to the parameters, and adjusts the parameters accordingly.

This ensures that the model is optimized gradually during training, thereby achieving better data fitting.
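The two stages can be sketched for a single length scale as follows. This is a dependency-free illustration, not the authors' training code: it assumes a one-dimensional input, fixed signal and noise levels, a finite-difference gradient, and a clipped step with simple backtracking. All names and the synthetic data are hypothetical.

```python
import math

def matern32(d, sigma_f, length_scale):
    r = math.sqrt(3.0) * d / length_scale
    return sigma_f ** 2 * (1.0 + r) * math.exp(-r)

def cholesky(M):
    """Lower-triangular Cholesky factor of a symmetric positive-definite matrix."""
    n = len(M)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = M[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L

def nlml(log_l, x, y, sigma_f=1.0, noise=0.1):
    """Negative log marginal likelihood as a function of the log length scale."""
    n, l = len(x), math.exp(log_l)
    K = [[matern32(abs(x[i] - x[j]), sigma_f, l) + (noise ** 2 if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    L = cholesky(K)
    z = []  # forward substitution: L z = y, so y^T K^{-1} y = z^T z
    for i in range(n):
        z.append((y[i] - sum(L[i][k] * z[k] for k in range(i))) / L[i][i])
    log_det = 2.0 * sum(math.log(L[i][i]) for i in range(n))
    return 0.5 * sum(v * v for v in z) + 0.5 * log_det + 0.5 * n * math.log(2.0 * math.pi)

def fit_length_scale(x, y, log_l0=0.0, lr=0.05, steps=50, eps=1e-5):
    log_l = log_l0                                   # initial setting
    history = [nlml(log_l, x, y)]
    for _ in range(steps):                           # optimization process
        grad = (nlml(log_l + eps, x, y) - nlml(log_l - eps, x, y)) / (2.0 * eps)
        step = max(-1.0, min(1.0, lr * grad))        # clip the step for stability
        candidate = log_l - step
        value = nlml(candidate, x, y)
        if value < history[-1]:                      # accept only if the loss decreases
            log_l = candidate
            history.append(value)
        else:
            lr *= 0.5                                # otherwise shrink the step size
    return math.exp(log_l), history

x = [0.15 * i for i in range(20)]                    # synthetic inputs
y = [math.sin(2.0 * xi) for xi in x]                 # synthetic targets
l_opt, history = fit_length_scale(x, y)
```

Optimizing the logarithm of the length scale keeps the parameter positive without explicit constraints. In the multi-task setting the same loop would run over all length scales and the entries of A jointly.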

3.4. Adaptive Multi-Task Gaussian Process Regression for General Probabilistic Harmonic Modeling

The general probabilistic harmonic modeling method based on Adaptive Multi-Task Gaussian Process Regression (AMT-GPR) provides an efficient and robust solution for harmonic analysis by jointly modeling multiple harmonic signals and producing probabilistic predictions. Starting from the characteristics of harmonic signals, the method constructs a multi-task framework and employs an adaptive multi-output covariance function to dynamically capture the correlations among different harmonic current components, thereby modeling the complex harmonic relationships observed in real-world data. The specific procedure consists of the following steps:

Data Preparation: Preprocess the acquired harmonic signals, construct the multi-task output matrix, and partition the dataset into training and testing sets;
Model Design: Select a covariance function suitable for multi-task modeling and adopt the adaptive Matern kernel to capture the smoothness and periodic characteristics of the harmonic signals;
Hyperparameter Optimization: Optimize the task-specific length scales, covariance matrix parameters, and noise variances using maximum likelihood estimation or Bayesian optimization techniques;
Model Training: Compute the covariance matrix of the training set and derive the posterior distribution based on Equations (16) and (17) to obtain the predictive mean and variance;
Model Evaluation: Assess the prediction accuracy using test data, with performance metrics such as Mean Squared Error (MSE) and Mean Absolute Percentage Error (MAPE).
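The evaluation step can be illustrated with the two metrics named above. The definitions below are the standard ones and are assumed to match those used in the case study; the sample values are made up and are not measurements from the paper.

```python
def mse(y_true, y_pred):
    """Mean squared error."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def mape(y_true, y_pred):
    """Mean absolute percentage error in percent; assumes no true value is zero."""
    return 100.0 * sum(abs((t - p) / t) for t, p in zip(y_true, y_pred)) / len(y_true)

y_true = [10.0, 20.0, 40.0]   # hypothetical measured harmonic currents
y_pred = [11.0, 18.0, 40.0]   # hypothetical predictive means

error_mse = mse(y_true, y_pred)
error_mape = mape(y_true, y_pred)
```

Because MAPE divides by the true value, it weights errors on small-magnitude harmonics more heavily than MSE does, which is one reason the two metrics can rank harmonic orders differently.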

4. Case Study

4.1. Performance Analysis of the Improved Model

To investigate the impact of the proposed improvements, namely the adaptive kernel function and multi-output regression, on model accuracy and training time, this study compares three approaches: Single-Task GPR (ST-GPR), Multi-Task GPR (MT-GPR), and the proposed Adaptive Multi-Task GPR (AMT-GPR). These models are used to estimate the third-, fifth-, seventh-, ninth-, and eleventh-order harmonic currents in a high-voltage substation, as reported in Table 1. (Harmonics of still higher order typically exhibit low amplitudes and high randomness, making them less relevant for harmonic analysis in practical engineering applications, where more attention is paid to lower-order harmonics with higher amplitudes.)

The input features are constructed using the fundamental voltage, fundamental current, and the 3rd-, 5th-, 7th-, 9th-, and 11th-order harmonic voltages of phase A, extracted via fast Fourier transform (FFT) from high-frequency sampled data. A total of 1440 samples are used, and the corresponding harmonic currents serve as the output variables. A randomly selected 70% of the dataset is used as the training set, and the remaining samples are held out for model validation.
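A random split of this kind can be sketched as follows. The function name and the fixed seed are assumptions made for reproducibility of the example; the paper does not state how the random selection was implemented.

```python
import random

def split_indices(n_samples, train_fraction, seed=0):
    """Shuffle sample indices and split them into training and held-out sets."""
    rng = random.Random(seed)          # seed is a hypothetical choice
    indices = list(range(n_samples))
    rng.shuffle(indices)
    n_train = round(n_samples * train_fraction)
    return indices[:n_train], indices[n_train:]

# 1440 samples with a 70% training share, as in the case study.
train_idx, test_idx = split_indices(1440, 0.70)
```

With 1440 samples, a 70% share gives 1008 training samples and 432 held-out samples.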

Taking the third, fifth, and seventh harmonic currents as examples, the prediction results of the models are shown in Figure 7. The yellow line represents the ground truth, while the other colored lines represent the predictions of the three models, with shaded areas indicating the 95% confidence intervals of each method. Additionally, the performance metrics of the models are summarized in Table 1.

The simulation results indicate that MT-GPR, by introducing joint modeling across multiple tasks, effectively leverages the correlations among different harmonic components and demonstrates clear advantages in handling multi-task harmonic prediction problems. Taking the 3rd- and 11th-order harmonic current predictions as examples, the MT-GPR model reduces the MPE, MAPE, and MSE of the third-order harmonic current by 0.61%, 0.84%, and 1.37, respectively, compared to the ST-GPR model. For the 11th-order harmonic current, the reductions are 19.94%, 4.67%, and 0.65, respectively.
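The quoted reductions can be checked directly against Table 1. The short script below copies the relevant ST-GPR and MT-GPR entries and subtracts them; note that the MPE and MAPE reductions are differences in percentage points, not relative changes.

```python
# (MPE %, MAPE %, MSE) copied from Table 1.
table1 = {
    ("ST-GPR", 3): (7.16, 1.45, 2.45),
    ("MT-GPR", 3): (6.55, 0.61, 1.08),
    ("ST-GPR", 11): (45.02, 8.40, 1.33),
    ("MT-GPR", 11): (25.08, 3.73, 0.68),
}

def reduction(baseline, improved):
    """Element-wise difference baseline - improved, rounded to two decimals."""
    return tuple(round(b - i, 2) for b, i in zip(baseline, improved))

red_3rd = reduction(table1[("ST-GPR", 3)], table1[("MT-GPR", 3)])
red_11th = reduction(table1[("ST-GPR", 11)], table1[("MT-GPR", 11)])
```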

In addition, the coverage probability (CP) of MT-GPR is at or above 96% for four of the five harmonic orders (94.1% for the 7th order; see Table 1), and the average interval width (AIW) for the third-order harmonic current decreases significantly from 10.09 to 4.56. Moreover, the multi-task modeling framework reduces redundant computations, thereby improving training efficiency to some extent. Therefore, MT-GPR achieves notable improvements in prediction accuracy, uncertainty quantification, and computational efficiency.
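The two interval metrics can be computed from the predictive mean and standard deviation as sketched below. The definitions are the usual ones (share of true values inside the interval, and mean interval width) and are assumed to correspond to CP and AIW as used in Table 1. The 95% interval is taken as mean ± 1.96 standard deviations, and the data are made up.

```python
def interval_metrics(y_true, mean, std, z=1.96):
    """Coverage probability (%) and average width of the mean +/- z*std interval."""
    lower = [m - z * s for m, s in zip(mean, std)]
    upper = [m + z * s for m, s in zip(mean, std)]
    covered = sum(lo <= t <= hi for t, lo, hi in zip(y_true, lower, upper))
    cp = 100.0 * covered / len(y_true)
    aiw = sum(hi - lo for lo, hi in zip(lower, upper)) / len(y_true)
    return cp, aiw

y_true = [1.0, 2.0, 3.0, 4.0]      # hypothetical measured values
mean = [1.1, 2.0, 2.0, 4.2]        # hypothetical predictive means
std = [0.1, 0.2, 0.3, 0.2]         # hypothetical predictive standard deviations
cp, aiw = interval_metrics(y_true, mean, std)
```

A well-calibrated model keeps CP near the nominal 95% level while keeping AIW small, so the two metrics should be read together.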

However, in practical applications, different harmonic signals exhibit varying feature scales. For instance, lower-order harmonics with larger magnitudes (such as the 3rd, 5th, and 7th orders) tend to have larger feature scales, while higher-order harmonics with smaller magnitudes (such as the 9th and 11th orders) exhibit smaller feature scales. Although MT-GPR demonstrates advantages in multi-task harmonic modeling by exploiting inter-task correlations, it still falls short in accurately capturing complex harmonic characteristics and in quantifying prediction uncertainty. In contrast, the proposed Adaptive Multi-Task GPR (AMT-GPR) assigns an independent and learnable length-scale parameter to each task, introducing an adaptive mechanism that improves the kernel optimization efficiency. This enables the model to capture task-specific feature scale differences more effectively. As a result, AMT-GPR achieves further improvements over MT-GPR: for the 3rd-order harmonic current, the MPE, MAPE, and MSE are reduced by 3.09%, 0.12%, and 0.21, respectively; for the 11th-order harmonic current, the reductions are 5.96%, 0.03%, and 0.08, respectively. The coverage probability (CP) remains at or above 95.8% for all harmonic orders (Table 1), and the AIW for the 11th-order harmonic current decreases from 3.15 to 1.61, indicating that the AMT-GPR model maintains reliable performance even under challenging conditions. Furthermore, detailed analysis reveals that higher-order harmonics, such as the 11th order, still exhibit relatively larger prediction errors, with MPE and MAPE reaching 19.12% and 3.70%, respectively. This phenomenon can be attributed to two main factors: (1) the significantly lower magnitude of high-order harmonic currents leads to a reduced signal-to-noise ratio (SNR); (2) high-order harmonics often exhibit pronounced sawtooth-like waveform characteristics, which increase signal nonlinearity and pose greater challenges for accurate modeling.

These results show that higher-order harmonics, such as the eleventh, combine smaller amplitudes with larger relative errors. Possible remedies for this imbalance include data augmentation and weighted loss functions. A weighted loss function, for example, assigns larger loss weights to the higher-order components, which makes the model more sensitive to errors in these components and can improve their fitting accuracy. The drawback is increased training complexity, since computing and applying the weights requires additional computational resources.
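The weighting idea can be illustrated on a squared-error loss. This is only a sketch under stated assumptions: the source does not specify how weights would enter the GPR objective, the inverse mean-square-amplitude weights are one hypothetical choice, and all values are made up.

```python
def weighted_mse(y_true, y_pred, weights):
    """Weighted average of per-order MSEs; dicts map harmonic order -> values."""
    total, weight_sum = 0.0, 0.0
    for order, w in weights.items():
        errs = [(t - p) ** 2 for t, p in zip(y_true[order], y_pred[order])]
        total += w * sum(errs) / len(errs)
        weight_sum += w
    return total / weight_sum

# Hypothetical currents: the 3rd order is large, the 11th order is small.
y_true = {3: [10.0, 12.0], 11: [1.0, 1.2]}
y_pred = {3: [10.5, 11.5], 11: [1.3, 0.9]}

equal = weighted_mse(y_true, y_pred, {3: 1.0, 11: 1.0})

# Assumed weighting: inverse of each order's mean squared amplitude.
inv_power = {k: len(v) / sum(t * t for t in v) for k, v in y_true.items()}
rebalanced = weighted_mse(y_true, y_pred, inv_power)
```

With equal weights the loss is dominated by the large 3rd-order errors, whereas the rebalanced loss is driven mainly by the 11th-order error, which is the sensitivity shift described above.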

The simulation results demonstrate that compared with ST-GPR and MT-GPR, the proposed AMT-GPR model achieves significant improvements in prediction accuracy, uncertainty quantification, and computational efficiency, thereby validating the effectiveness of the proposed modeling approach and offering a practical solution for engineering applications.

The AMT-GPR model is suitable for real-time applications. As with other machine learning models, training is the time-consuming stage; once the model is trained and deployed, prediction is fast. However, when the system undergoes significant changes (e.g., sudden changes in the harmonic spectrum), the previously trained model may no longer be applicable, and retraining is necessary to restore accuracy. Retraining takes additional time, but this cost is necessary to guarantee the model’s accuracy and reliability.

Compared with real-time simulation or analytical solution methods, machine learning algorithms have practical advantages in use. When the system state is relatively stable, the AMT-GPR model achieves a balance between speed and accuracy. Because the system state is unlikely to remain unchanged, however, the accuracy of a previously trained model may decrease over time. Adjusting or retraining the model then sacrifices some real-time performance but can significantly improve accuracy.

In summary, the AMT-GPR model can effectively balance speed and accuracy when the system state remains unchanged. However, when the system state changes, retraining is necessary to ensure high accuracy, despite the additional time delay. This trade-off is reasonable in practical applications.
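One possible way to decide when retraining is needed is to monitor the recent prediction error online. The sketch below is a hypothetical illustration and is not part of the proposed method: the class name, the window length, and the 5% threshold are all assumptions.

```python
from collections import deque

class RetrainMonitor:
    """Flags retraining when the rolling mean absolute percentage error is too high."""

    def __init__(self, window=5, threshold_pct=5.0):
        self.errors = deque(maxlen=window)   # most recent percentage errors
        self.threshold_pct = threshold_pct

    def update(self, measured, predicted):
        """Record one new measurement and return True if retraining is advised."""
        self.errors.append(100.0 * abs((measured - predicted) / measured))
        window_full = len(self.errors) == self.errors.maxlen
        return window_full and sum(self.errors) / len(self.errors) > self.threshold_pct

monitor = RetrainMonitor(window=3, threshold_pct=5.0)
stable = [monitor.update(10.0, 10.1) for _ in range(3)]    # about 1% error
shifted = [monitor.update(10.0, 12.0) for _ in range(3)]   # 20% error after a change
```

While the system is stable the monitor stays silent; once the error level jumps, the rolling average crosses the threshold and retraining is flagged.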

4.2. Comparison with Other Modeling Methods

Based on the same set of measured data, the proposed AMT-GPR model is further compared with other existing harmonic source modeling methods reported in the literature. These methods include the Radial Basis Function Neural Network (RBF) proposed in [31], the Least Squares Support Vector Machine (LS-SVM) approach in [35], and the Long Short-Term Memory (LSTM) neural network method in [36], which incorporates temporal sequence characteristics of the data. All these approaches fall under the category of data-driven machine learning models.

A randomly selected 70% of the dataset is used as the training set for each model. The prediction results are illustrated in Figure 8, and the performance metrics are summarized in Table 2. Since the aforementioned comparison models do not provide uncertainty quantification (e.g., prediction intervals), the evaluation focuses solely on MPE, MAPE, and MSE metrics.

The results demonstrate that AMT-GPR significantly outperforms the other methods. Taking the third-order harmonic as an example, its MPE, MAPE, and MSE are 3.46%, 0.49%, and 0.87, respectively. Compared to the best-performing alternative model, LSTM, the MPE, MAPE, and MSE of AMT-GPR are reduced by 0.49%, 0.13%, and 0.19, respectively (see Table 2). Compared to the worst-performing method, LS-SVM, the reductions are 2.82%, 0.51%, and 0.89, respectively, highlighting the superior performance of AMT-GPR in point estimation tasks. Therefore, its predictive mean can be directly adopted as the final output.

In addition, by analyzing Figure 8 and Table 2 together, it is evident that the LS-SVM model exhibits relatively poor fitting accuracy. For the third-order harmonic current, its MPE, MAPE, and MSE are as high as 6.28%, 1.00%, and 1.76, respectively, with the predicted curve showing noticeable deviations from the measured values in certain cases. This result suggests that not all data-driven machine learning models perform well in fitting tasks; their effectiveness also depends on the complexity of the problem, the size of the training dataset, and data quality. For instance, LS-SVM may lack sufficient fitting capacity when dealing with harmonic source characteristics involving strong nonlinearity and multivariate coupling. In addition, limited data samples may prevent the model from adequately learning complex nonlinear features. Therefore, in engineering practice, it is essential to conduct multidimensional analyses based on problem characteristics and data properties to optimize model performance.

4.3. Analysis of the Impact of Data Conditions on Model Performance

To further analyze the impact of data conditions on model performance, 10% of the samples are randomly selected as the training set in this section. Based on the measured data from a high-voltage substation, four data-driven modeling methods, namely LS-SVM, RBF, LSTM, and the proposed AMT-GPR, as described in Section 4.2, are used to establish harmonic source models. The corresponding prediction results and performance metrics are presented in Figure 9 and Table 3.

The results show that AMT-GPR significantly outperforms the other methods. Taking the third-order harmonic as an example, its MPE, MAPE, and MSE are 3.77%, 0.59%, and 1.02, respectively. Compared to the best-performing alternative model (LSTM), the MPE, MAPE, and MSE are reduced by 1.33%, 0.19%, and 0.36, respectively (see Table 3). Compared to the worst-performing method (LS-SVM), the reductions are 8.24%, 2.49%, and 3.84, respectively.

A comparison between Table 2 and Table 3 reveals that when the size of the training dataset is small, the prediction error of the models generally increases significantly—particularly for LS-SVM. Further analysis shows that AMT-GPR exhibits low sensitivity to training data size and demonstrates strong robustness. This characteristic highlights its superiority under small-sample conditions and enhances its practical value in engineering applications. The main reasons are as follows:

Gaussian Process Regression (GPR) is a Bayesian non-parametric model that, unlike traditional machine learning methods such as neural networks, does not require a large amount of data to optimize parameters. Instead, GPR models the relationships between input data points through a kernel function. As a result, its predictions rely on data distribution rather than parameter fitting, allowing for reasonable prediction performance even with limited data.
AMT-GPR is a multi-task learning method that models inter-task relationships by sharing the kernel function. This enables information sharing across tasks, so that even when the data for a single task is sparse, the model can leverage data from other tasks for auxiliary learning, thereby improving robustness.
The introduction of an adaptive kernel function allows the model to maintain stable performance under varying data sizes, reducing its sensitivity to data quantity. The adaptive mechanism enables the model to reduce complexity and avoid overfitting with limited data, while enhancing accuracy and generalization capability when more data are available.

In contrast, other methods such as LS-SVM and RBF are more sensitive to dataset size, which is mainly reflected in their higher risk of overfitting when data is insufficient and degraded performance when the data distribution changes.

In summary, the influence of sample size on model prediction error is multifaceted:

From a statistical perspective, as the number of harmonic samples increases, the statistical characteristics of the sample (e.g., mean, variance) more closely approximate those of the population. This implies that larger datasets yield estimates that are closer to the true values, thereby reducing prediction errors;
From the perspective of model training, expanding the dataset enhances the model’s capacity to capture harmonic characteristics, thereby improving prediction accuracy;
In terms of generalization, increasing the sample size helps to mitigate overfitting and improves the model’s generalization ability.

Therefore, in practical engineering applications, constructing a sufficiently large and representative training dataset is a critical factor in ensuring high prediction performance.
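The statistical point can be made concrete with the standard error of a sample mean. The worked equation below assumes, purely for illustration, n independent samples with a common standard deviation σ; measured harmonic data are correlated in time, so this is an idealization. The sample counts are the training-set sizes implied by the 1440-sample dataset (10% gives 144, 70% gives 1008).

```latex
\mathrm{SE}(\bar{x}) = \frac{\sigma}{\sqrt{n}}, \qquad \frac{\mathrm{SE}_{n=144}}{\mathrm{SE}_{n=1008}} = \sqrt{\frac{1008}{144}} = \sqrt{7} \approx 2.65
```

Under this assumption, moving from the 10% to the 70% training set reduces the sampling uncertainty of an estimated mean by a factor of about 2.65.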

4.4. Probabilistic Modeling of Harmonic Sources

In this section, the model prediction results at a representative time instant are selected, and Gaussian distributions of each harmonic current are constructed using the predicted mean and standard deviation as parameters. This provides an intuitive comparison of the predictive performance of each model, as shown in Table 4.

Figure 10 illustrates the probability distributions of the predicted 3rd- and 11th-order harmonic currents at a specific time instant. By comparing the results of the ST-GPR, MT-GPR, and AMT-GPR models, it is observed that the AMT-GPR distribution exhibits a sharper peak and narrower tails, indicating both higher prediction accuracy and lower uncertainty. In contrast, the distributions produced by ST-GPR and MT-GPR are more dispersed, with lower peaks and wider tails. Among them, ST-GPR performs the worst, reflecting lower prediction accuracy and higher uncertainty. This comparison further validates the significant advantages of AMT-GPR in terms of prediction precision and robustness.
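The comparison of distributions can be reproduced numerically from Table 4. The sketch below uses the Ih3 row and evaluates each model's Gaussian density at the true value. The second parameter of each N(·, ·) entry is treated as a standard deviation, following the description in the text; the function and variable names are illustrative.

```python
import math

def gaussian_pdf(x, mean, std):
    """Density of a normal distribution with the given mean and standard deviation."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))

# Ih3 row of Table 4: (mean, standard deviation) per model.
ih3 = {
    "ST-GPR": (1.017e-1, 6.3e-3),
    "MT-GPR": (1.064e-1, 1.1e-3),
    "AMT-GPR": (1.064e-1, 9.4e-4),
}
true_value = 1.067e-1

density_at_truth = {
    model: gaussian_pdf(true_value, mu, sd) for model, (mu, sd) in ih3.items()
}
```

A higher density at the true value reflects a distribution that is both centered close to the measurement and narrow, which matches the sharper peak described for AMT-GPR.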

5. Conclusions

This paper proposes a general probabilistic harmonic modeling method based on Adaptive Multi-Task Gaussian Process Regression (AMT-GPR) to address the modeling of aggregated load harmonics in high-voltage substations. By integrating an adaptive kernel function with a multi-task learning framework, AMT-GPR effectively captures the correlations among different harmonic components and adaptively adjusts to their varying feature scales. This significantly enhances both prediction accuracy and model robustness. The main conclusions are as follows:

Harmonic Characteristic Analysis: Based on measured data from a high-voltage substation, the time-varying characteristics of harmonics generated by aggregated loads are analyzed. The results reveal a strong correlation between harmonic currents and system load variations, providing a solid data foundation for harmonic modeling.
Model Design: The proposed AMT-GPR method assigns independent and trainable length-scale parameters to each harmonic task, enabling adaptive feature scale learning. This design substantially improves model flexibility and adaptability across different harmonic orders.
Performance Validation: Through comparative experiments with ST-GPR, MT-GPR, and other data-driven approaches (including LS-SVM, RBF, and LSTM), AMT-GPR demonstrates superior performance in prediction accuracy, uncertainty quantification, and computational efficiency. The results show that AMT-GPR performs well for both low- and high-order harmonics and exhibits strong robustness under small training datasets.
Probabilistic Modeling: Based on the AMT-GPR prediction results, Gaussian probability distributions of harmonic currents are constructed, providing a reliable theoretical foundation for subsequent probabilistic harmonic power flow analysis.

Harmonic power flow analysis requires modeling harmonic sources in the power grid. AMT-GPR serves as a harmonic source modeling method that can replace previous approaches. Further developing probabilistic harmonic power flow calculation tools based on AMT-GPR represents one of the future research directions.

Author Contributions

Conceptualization, J.Z.; methodology, J.Z. and K.S.; software, J.D. and Y.W.; validation, K.S.; resources, K.S.; writing—original draft, J.Z. and K.S.; writing—review and editing, J.Z.; visualization, J.D.; supervision, Y.W. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

Data are unavailable due to privacy or ethical restrictions.

Conflicts of Interest

Author Kun Song was employed by Chengdu Electric Power Supply Company, State Grid Sichuan Power Supply Company. The remaining authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

References

1. Shao, Z.; Xu, H.; Xiao, S.; Wu, G.; Zhang, Y. Harmonic problems in a new energy power grid. Power Syst. Prot. Control 2021, 49, 178–187.
2. Valavi, M.; Devillers, E.; Besnerais, J.L.; Nysveen, A.; Nilsen, R. Influence of converter topology and carrier frequency on airgap field harmonics, magnetic forces, and vibrations in converter-fed hydropower generator. IEEE Trans. Ind. Appl. 2018, 54, 2202–2214.
3. Domínguez-Crespo, M.A.; Rodríguez, E.; Torres-Huerta, A.M.; Soni-Castro, I.J.; Brachetti-Sibaja, S.B.; Narro-García, R.; López-Oyama, A.B. Production of BN nanostructures by pulsed laser ablation in liquids: Influence of the applied Nd: YAG harmonics on the structural, optical and photoluminescence properties. Ceram. Int. 2020, 46, 21667–21680.
4. Zhang, W.; Xiong, Y.; Li, C.; Yao, W.; Wen, J.; Gao, D. Continuous commutation failure suppression and coordinated recovery of multi-infeed DC system based on improved VDCOL. Power Syst. Prot. Control 2020, 48, 63–72.
5. Chen, L.; He, H.; Wang, L.; Chen, H. Fault isolation method of a hybrid HVDC system based on the coordination of a fault current limiter and a DC circuit breaker. Power Syst. Prot. Control 2020, 48, 119–127.
6. Hiyama, T.; Hammam, M.S.A.A.; Ortmeyer, T.H. Distribution system modeling with distributed harmonic sources. IEEE Trans. Power Deliv. 1992, 4, 1297–1304.
7. Salles, D.; Jiang, C.; Xu, W.; Freitas, W.; Mazin, H.E. Assessing the collective harmonic impact of modern residential loads—Part I: Methodology. IEEE Trans. Power Deliv. 2012, 27, 1937–1946.
8. Jiang, C.; Salles, D.; Xu, W.; Freitas, W. Assessing the collective harmonic impact of modern residential loads—Part II: Applications. IEEE Trans. Power Deliv. 2012, 27, 1937–1946.
9. Thunberg, E.; Soder, L. Norton approach to distribution network modeling for harmonic studies. IEEE Trans. Power Deliv. 1999, 14, 272–277.
10. Fauri, M. Harmonic modelling of non-linear load by means of crossed frequency admittance matrix. IEEE Trans. Power Syst. 1997, 12, 1632–1638.
11. Sun, Y.; Zhang, G.; Xu, W.; Mayordomo, J.G. A harmonically coupled admittance matrix model for AC/DC converters. IEEE Trans. Power Syst. 2007, 22, 1574–1582.
12. Sun, Y.; Li, J.; Yin, Z.; Wang, G. A frequency domain harmonic model for uncontrolled rectifier in AC/DC/AC converter. Proc. CSEE 2015, 35, 5483–5491.
13. Sun, Y.; Liu, F.; Li, J.; Kun, Z. Unified harmonic models and operational mode determination for the three-phase uncontrolled voltage source converters. Proc. CSEE 2016, 36, 3413–3421+3360.
14. Sun, Y.; Zhang, L.; Xie, X.; Feng, Z.; Wang, S. The collective harmonic evaluation of residential load based on the harmonic coupling dominant component model. Proc. CSEE 2019, 39, 4775–4785+4979.
15. Xie, X.; Sun, Y.; Zhang, L.; He, J.; Zhang, Y. Self-adaptive harmonic modeling of the residential nonlinear load. Proc. CSEE 2020, 40, 2479–2489.
16. Liao, Y.; Hu, J.; Zhang, H. Time-varying parameter model of AC electrical arc furnace for power quality predictions and analysis. Electr. Eng. 2016, 17, 41–46.
17. Wang, J.; Shu, H.; Lin, M.; Xueyun, C. Modeling and simulation of AC arc furnace for dynamic power quality studies. Trans. China Electrotech. Soc. 2003, 18, 53–58.
18. King, P.E.; Ochs, T.L.; Hartman, A.D. Chaotic response in electric arc furnaces. J. Appl. Phys. 1994, 76, 2059–2065.
19. Wang, Y.; Jiang, J. A novel chaotic model of AC electric arc furnace for power quality study. In Proceedings of the 2007 International Conference on Electrical Machines and Systems (ICEMS), Seoul, Republic of Korea, 8–11 October 2007; pp. 106–110.
20. Lin, C.; Zhang, Y.; Shao, Z.; Lin, C.; Zhang, Y. An ultra-high-power electric arc furnace model for low-frequency non-stationary inter-harmonics studies. Electr. Power 2020, 53, 1–8.
21. Nian, H.; Zhou, Q.; Wu, C.; Zhu, Q. The modeling and characteristic analysis of harmonic current of DFIG based wind turbine in grid-connected mode. Proc. CSEE 2019, 39, 5037–5048+5285.
22. Zhang, B. Research on Harmonic Dynamic Model and Resonance Analysis of Wind Parks. Ph.D. Thesis, Shandong University, Jinan, China, 2019.
23. Tao, S.; Zhu, X.; Chen, H.; Xu, Y. Dead-time coupling harmonic source modeling of grid-connected Inverter. Proc. CSEE 2024, 1–14.
24. Che, Q.; Yang, H. Assessing the harmonic emission level based on robust regression method. Proc. CSEE 2004, 24, 43–46+53.
25. Ortega, J.M.M.; Exposito, A.G.; Garcia, A.L.T.; Payan, M.B. A state estimation approach to harmonic polluting load characterization in distribution systems. IEEE Trans. Power Syst. 2005, 20, 765–772.
26. Huang, S.; Xu, Y. Assessing harmonic impedance and the harmonic emission level based on partial least-squares regression method. Proc. CSEE 2007, 27, 93–97.
27. Feng, F.; Xie, X. Assessment of network harmonic emission level via enhanced partial least-squares regression method. Metrol. Meas. Tech. 2014, 41, 33–35.
28. Hua, H.C.; Jia, X.F.; An, H.Q.; Zhang, S.G. Determining harmonic contributions based on complex least squares method. Proc. CSEE 2013, 33, 149–155.
29. Yongle, A.; Yudong, W.; Jingjing, D.; Wang, W. A method for assessing harmonic emission level based on robust regression of LTS initial value. Power Syst. Prot. Control 2015, 43, 99–105.
30. Moreno, M.A.; Usaola, J. A new balanced harmonic load flow including nonlinear loads modeled with RBF networks. IEEE Trans. Power Deliv. 2004, 19, 686–693.
31. Yong, Z.; Haozhong, C.; Naicheng, G. Generalized growing and pruning RBF neural network based harmonic source modeling. Proc. CSEE 2005, 25, 42–46.
32. Maciej, K.; Dariusz, G. Application of shallow neural networks in electric arc furnace modeling. IEEE Trans. Ind. Appl. 2022, 58, 6814–6823.
33. Xie, K.; Yang, H.; Zhang, Y. Harmonic source modeling based on generalized regression neural network. Adv. Technol. Electr. Eng. Energy 2012, 31, 64–67.
34. Zhang, Y.; Ou, J.; Chen, S.; Jie, K.; Liu, S. A data-driven harmonic source modeling method based on joint time-frequency features extraction. Proc. CSEE 2025, 1–14.
35. Lianqing, Z.; Ping, W.; Xiaolong, L. Application of least squares support vector machine to harmonic sources modeling based on genetic algorithm. Power Syst. Prot. Control 2011, 39, 52–56.
36. Yang, P.; Wang, X.; Zhao, X. Research on harmonic prediction of the grid-connected photovoltaic system based on deep learning. Power Syst. Clean Energy 2022, 38, 71–80.

Figure 1. Measured voltage and current waveforms in high-voltage substation.

Figure 2. Residual plots of denoised voltage and current waveforms.

Figure 3. Variation of harmonic characteristics of aggregated loads in a high-voltage substation.

Figure 4. Harmonic current profiles of each order on two consecutive days at the same substation.

Figure 5. The statistical characteristics of the 3rd harmonic current.

Figure 6. Heatmaps of PCC and SCC among harmonic currents of different orders.

Figure 7. Model prediction results for the 3rd-, 5th-, and 7th-order harmonic currents.

Figure 8. Model prediction results for the 3rd-, 5th-, and 7th-order harmonic currents.

Figure 9. Model prediction results for the 3rd-, 5th-, and 7th-order harmonic currents.

Figure 10. Probability distributions of harmonic currents at a specific time instant (3rd and 11th orders).

Table 1. Performance evaluation results of the above models.

Model	Harmonic Order	MPE (%)	MAPE (%)	MSE	CP (%)	AIW	Training Time (s)
ST-GPR	3rd	7.16	1.45	2.45	96.0	10.09	21.7
	5th	11.35	1.08	1.86	87.4	7.32	22.6
	7th	13.98	2.65	1.12	88.7	3.82	19.2
	9th	14.29	1.83	0.61	99.4	3.50	25.6
	11th	45.02	8.40	1.33	99.2	6.71	26.5
MT-GPR	3rd	6.55	0.61	1.08	96.2	4.56	18.7
	5th	7.83	0.46	0.85	96.1	3.25
	7th	8.74	1.40	0.57	94.1	2.18
	9th	5.94	1.20	0.40	96.0	1.62
	11th	25.08	3.73	0.68	96.1	3.15
AMT-GPR	3rd	3.46	0.49	0.87	96.5	3.88	16.2
	5th	1.75	0.40	0.66	97.1	2.94
	7th	5.58	1.20	0.48	96.6	2.08
	9th	5.66	1.16	0.38	95.8	1.57
	11th	19.12	3.70	0.60	95.8	1.61

Table 2. Performance evaluation results of the models (70% training set).

Model	Harmonic Order	MPE (%)	MAPE (%)	MSE
LS-SVM	3rd	6.28	1.00	1.76
	5th	5.68	0.69	1.18
	7th	9.72	1.58	0.64
RBF	3rd	4.01	0.73	1.54
	5th	2.36	0.46	0.81
	7th	6.63	1.30	0.54
LSTM	3rd	3.95	0.62	1.06
	5th	2.64	0.40	0.67
	7th	6.31	1.20	0.49
AMT-GPR	3rd	3.46	0.49	0.87
	5th	1.75	0.40	0.66
	7th	5.58	1.20	0.48

Table 3. Performance evaluation results of the models (10% training set).

Model	Harmonic Order	MPE (%)	MAPE (%)	MSE
LS-SVM	3rd	12.01	3.08	4.86
	5th	7.57	1.21	2.11
	7th	16.86	2.93	1.25
RBF	3rd	5.36	1.41	2.35
	5th	7.38	0.98	1.72
	7th	13.89	2.61	1.08
LSTM	3rd	5.10	0.78	1.38
	5th	6.45	0.57	1.03
	7th	8.43	1.54	0.63
AMT-GPR	3rd	3.77	0.59	1.02
	5th	5.74	0.44	0.85
	7th	6.54	1.32	0.54

Table 4. Gaussian distributions of harmonic current outputs from different models at time t.

Model	ST-GPR	MT-GPR	AMT-GPR	True Value
Ih3	N (1.017 × 10⁻¹, 6.3 × 10⁻³)	N (1.064 × 10⁻¹, 1.1 × 10⁻³)	N (1.064 × 10⁻¹, 9.4 × 10⁻⁴)	1.067 × 10⁻¹
Ih5	N (1.391 × 10⁻¹, 4.7 × 10⁻³)	N (1.415 × 10⁻¹, 7.7 × 10⁻⁴)	N (1.414 × 10⁻¹, 7.1 × 10⁻⁴)	1.412 × 10⁻¹
Ih7	N (3.32 × 10⁻², 2.0 × 10⁻³)	N (3.54 × 10⁻², 5.29 × 10⁻⁴)	N (3.54 × 10⁻², 5.04 × 10⁻⁴)	3.55 × 10⁻²
Ih9	N (2.58 × 10⁻², 1.9 × 10⁻³)	N (2.60 × 10⁻², 4.05 × 10⁻⁴)	N (2.58 × 10⁻², 3.91 × 10⁻⁴)	2.65 × 10⁻²
Ih11	N (1.17 × 10⁻², 2.3 × 10⁻³)	N (9.10 × 10⁻³, 7.78 × 10⁻⁴)	N (8.70 × 10⁻⁴, 6.30 × 10⁻⁴)	8.80 × 10⁻³

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

