A Principal Component Estimation Algorithm with Adaptive Learning Rate and Its Convergence Analysis

Yingbin Gao; Haidi Dong; Yuan Zhou; Zhongying Xu; Jing Li; Kai Jin

doi:10.3390/app152111826

,

and

¹

The 54th Research Institute of China Electronics Technology Group Corporation, Shijiazhuang 050081, China

²

College of Weaponry Engineering, Naval University of Engineering, Wuhan 430033, China

³

Rocket Force University of Engineering, Xi’an 710025, China

^*

Author to whom correspondence should be addressed.

Appl. Sci.2025, 15(21), 11826;https://doi.org/10.3390/app152111826

This article belongs to the Special Issue Research and Application of Neural Networks

Version Notes

Order Reprints

Abstract

The deterministic discrete-time method is a dominant approach for analyzing neural network algorithms. To address the issue where conventional convergence conditions impose stringent restrictions on the range of learning factors, this paper proposes a principal component estimation algorithm with an adaptive learning factor, which guarantees global convergence. The convergence of the algorithm is analyzed using the deterministic discrete-time method, and conditions ensuring convergence are established. Unlike convergence conditions for other algorithms, the proposed algorithm’s convergence conditions eliminate restrictions on the learning factor, thereby extending its feasible range. Simulation results demonstrate that the proposed algorithm effectively resolves ill-conditioned matrix problems. When compared with existing algorithms, the proposed algorithm exhibits significantly faster convergence speed than several current methods.

Keywords:

adaptive learning factor; principal component; deterministic discrete-time; convergence analysis

1. Introduction

The principal component in signal processing and deep learning is identified by the direction of a signal’s greatest variability []. Principal component analysis (PCA) is widely adopted for data preprocessing and feature extraction, with the resulting features serving as inputs for downstream tasks. For instance, feature vectors from the signal subspace can be utilized with a flow matrix in weighted subspace fitting to achieve high-accuracy estimation of signal source directions []. The utility of PCA further encompasses domains such as image compression [], fault diagnosis [], and various other disciplines [,,,,,,,]. Consequently, the problem of estimating a signal’s principal component could be transformed into a matrix eigenvalue decomposition problem, which could then be solved using eigenvalue decomposition algorithms. However, in real-time signal processing scenarios [], sensors could only acquire the signal value at the current time instant, not the entire autocorrelation matrix. In such cases, batch processing algorithms become unsuitable for the online estimation of principal components. To enable real-time estimation of signal principal components, researchers have proposed Hebbian neural network-based algorithms [,,,]. This class of algorithms could directly estimate the principal component from the input signal stream, thus avoiding the computation of the autocorrelation matrix.

Based on distinct update rules, numerous neural network-based algorithms have been successively proposed [,,,]. Convergence analysis of neural network-based algorithms constitutes a critically important research topic. In the field of convergence analysis, three primary methodological frameworks have gained prominence []: Stochastic Discrete-Time (SDT), Deterministic Continuous-Time (DCT), and Deterministic Discrete-Time (DDT). Typically, algorithms are formulated as SDT systems, for which direct convergence analysis proves highly challenging []. Among indirect analysis methods, the DCT approach imposes excessively restrictive conditions that are often impractical in real-world applications []. In contrast, the DDT method imposes fewer constraints and has consequently become the dominant approach for convergence analysis in neural network-based algorithms [].

In neural networks, the learning rate stands as one of the most critical hyperparameters, directly governing the step size for parameter updates during training. The assumption of a constant learning rate constitutes a standard premise in traditional DDT-based convergence analysis []. To ensure algorithm convergence, restrictive constraints are usually imposed, confining the learning rate to a very narrow range. However, in practical applications, a larger learning rate can be appropriately selected during the initial iterations when far from the stationary point to accelerate convergence. Conversely, during later stages of operation, as the solution approaches the stationary point, a smaller learning rate should be chosen to produce more precise update steps []. Since restricting the learning rate inherently limits algorithm performance, exploring adaptive algorithms constitutes a meaningful research direction.

In this paper, an adaptive algorithm is proposed, which can directly extract the principal components of the input signal without the need to calculate the autocorrelation matrix of the algorithm. Compared with batch processing, the proposed algorithm has higher real-time performance and lower computational complexity. Consequently, the proposed algorithm is highly effective for online signal processing and time-varying systems, particularly in applications like real-time signal tracking and adaptive filtering. The paper is organized as follows: Section 2 details the novel adaptive learning rate algorithm. Section 3 examines its convergence properties via a DDT-based analysis. Section 4 validates the algorithm’s performance through numerical simulations and practical case studies. Finally, Section 5 summarizes the main conclusions.

2. Adaptive Algorithm Design

The theoretical framework of this study is structured around a standard Hebbian neural network, characterized by the input-output relationship below,

p (k) = w^{T} (k) q (k), k = 1, 2, 3, \dots

(1)

where

q (k) \in R^{n \times 1}

is the

n

-dimensional input signal,

w (k) \in R^{n \times 1}

the evolving weight vector, and

p (k)

the corresponding output. The development of this Hebbian algorithm is directed towards establishing an update rule that ensures convergence of

w (k)

to the first generalized eigenvector.

The principal component estimation algorithm proposed by Chen []. is derived from this neural network model.

w (k + 1) = w (k) + η [w^{T} (k) w (k) q (k) p (k) - p^{2} (k) w (k)]

(2)

By employing the DDT method, a convergence proof for the modified algorithm was established by Kong et al. [], contingent on the learning rate satisfying

η < 0.3

. Evidently, the permissible range for the learning rate is severely restricted. To develop an algorithm with an adaptive learning rate, the following method is proposed:

w (k + 1) = w (k) + \frac{1}{{‖w (k)‖}^{4}} [w^{T} (k) w (k) q (k) p (k) - p^{2} (k) w (k)]

(3)

By comparing the two equations, it appears that Equation (3) replaces the learning factor

η

in Equation (2) with variable

1 / {‖w (k)‖}^{4}

. However, there is a distinct difference between them. In Equation (2), the learning factor must satisfy

0 < η < 1

, whereas Equation (3) imposes no such requirement. When the norm of the weight vector reaches

‖w (k)‖ < 1

, we have

1 / {‖w (k)‖}^{4} > 1

, at which point it already violates the requirement

η < 1

. The learning rate is a parameter of paramount importance for algorithmic convergence. An inappropriately low value leads to prohibitively slow progress, whereas an excessively high value adversely induces oscillatory or divergent behavior. The subsequent analysis shows that the norm of the weight vector monotonically rises throughout the iterative process. This implies that, during iteration process, the learning factor in this algorithm is a monotonically decreasing sequence. Unlike a fixed learning rate, our algorithm employs a dynamically self-adjusting learning factor that optimizes the trade-off between convergence speed and stability throughout the iterative process, thereby ensuring robust performance and enhanced practicality.

This section examines the overall computational complexity of the online algorithm, with a detailed cost breakdown provided in Table 1 for clarity. The resulting requirement is 5n + 3 multiplications per update, placing the overall complexity on the same order as that of the algorithms in [,]. Relying solely on elementary vector addition and multiplication, the proposed algorithm is naturally suited for an efficient systolic array implementation.

Table 1. Computational complexity.

For a stationary stochastic input signal

q (k)

, the dynamics of (3) can be analyzed using the DDT approach. The DDT system is derived by applying the conditional expectation

E {w (k + 1) / w (0), p (i), i < k}

to (3) and defining the resultant expectation as the subsequent iterate.

w (k + 1) = w (k) + \frac{1}{{‖w (k)‖}^{4}} [w^{T} (k) w (k) R w (k) - w^{T} (k) R w (k) w (k)]

(4)

where

R = E [q (k) q^{T} (k)]

is the autocorrelation matrix of the input signal.

The norm of the weight vector at time

k + 1

could be calculated according to Equation (4).

\begin{array}{l} {‖w (k + 1)‖}^{2} \\ = w^{T} (k + 1) w (k + 1) \\ = {\{w (k) + \frac{1}{{‖w (k)‖}^{4}} [w^{T} (k) w (k) R w (k) - w^{T} (k) R w (k) w (k)]\}}^{T} \times \\ \{w (k) + \frac{1}{{‖w (k)‖}^{4}} [w^{T} (k) w (k) R w (k) - w^{T} (k) R w (k) w (k)]\} \\ = {‖w (k)‖}^{2} + \frac{{‖G (w (k))‖}^{2}}{{‖w (k)‖}^{8}} \end{array}

(5)

where

G (w (k)) = w^{T} (k) w (k) R w (k) - w^{T} (k) R w (k) w (k)

. Comparing the magnitudes of the weight vectors at two consecutive iteration steps yields:

\begin{matrix} \frac{{‖w (k + 1)‖}^{2}}{{‖w (k)‖}^{2}} & = \frac{{‖w (k)‖}^{2} + {‖G (w (k))‖}^{2} / {‖w (k)‖}^{8}}{{‖w (k)‖}^{2}} \\ = 1 + \frac{1}{{‖w (k)‖}^{10}} {‖G (w (k))‖}^{2} \\ > 1 \end{matrix}

(6)

From Equation (6), it could be concluded that the norm of the weight vector

‖w (k)‖

exhibits monotonic increase during the algorithm’s iteration. Consequently, the adaptive learning factor

1 / {‖w (k)‖}^{4}

decreases monotonically, which is consistent with the algorithm’s requirement for the variation in the learning factor.

3. Convergence of the Algorithm

3.1. Preliminaries

Assume that scalar

λ_{i} (i = 1, 2, \dots, n)

and vectors

v_{i} (i = 1, 2, \dots, n)

are the eigenvalues and mutually orthogonal eigenvectors, respectively, of the signal autocorrelation matrix

R

. For convenience of use, the eigenvalues are arranged here in descending order, namely:

λ_{1} > λ_{2} > \dots > λ_{n} > 0

(7)

The order of the eigenvectors is consistent with that of their respective eigenvalues. In accordance with signal processing theory [], the eigenvector

v_{1}

associated with the largest eigenvalue

λ_{1}

of a matrix

R

represents the first principal component of the signal. The properties of linear independence and mutual orthogonality allow the eigenvectors

v_{i} (i = 1, 2, \dots, n)

to serve as an orthogonal basis for the space

R^{n \times n}

. Any vector in the space

R^{n}

could be represented as a linear combination of this basis.

w (k) = \sum_{i = 1}^{n} z_{i} (k) v_{i}, i = 1, 2, \dots, n

(8)

where

z_{i} (k), i = 1, 2, \dots, n

represents the coefficient for each basis. Utilizing the properties of matrix eigendecomposition, we have:

R w (k) = \sum_{i = 1}^{n} λ_{i} z_{i} (k) v_{i}, i = 1, 2, \dots, n

(9)

Equation (8) is substituted into Equation (4) to obtain

z_{i} (k + 1) = z_{i} (k) + \frac{1}{{‖w (k)‖}^{4}} [λ_{i} {‖w (k)‖}^{2} - w^{T} (k) R w (k)] z_{i} (k)

(10)

By the property of the Rayleigh quotient, it follows that:

0 < λ_{n} w^{T} w \leq w^{T} R w \leq λ_{1} w^{T} w

(11)

where

R

is a symmetric positive definite matrix and

w^{T} (k) v_{1} \neq 0

. Equation (11) is substituted into Equation (10) to obtain

|z_{1} (k + 1)| = |1 + \frac{1}{{‖w (k)‖}^{4}} [λ_{1} {‖w (k)‖}^{2} - w^{T} (k) R w (k)]| |z_{1} (k)| \geq |z_{1} (k)|

(12)

The above relation indicates that

{|z_{1} (k)|}

forms a monotonically increasing sequence. That is, the norm

|z_{1} (k)|

of the projection of the weight vector

w (k)

onto the principal component

|z_{1} (k)|

exhibits monotonic increase throughout the algorithm’s iteration process.

The constant

γ = \sqrt{λ_{1} - λ_{n}}

is defined. Given that

‖w (k)‖

is monotonically increasing, the convergence analysis of the algorithm proceeds by considering the following two cases:

Case 1. For all

k \geq 0

,

‖w (k)‖ < γ

holds.

Case 2. There exists a constant

M

such that

‖w (M)‖ \geq γ

, where

M

is a positive integer.

3.2. Convergence Analysis of the Algorithm in Case 1

The convergence of

‖w (k)‖

in this scenario is governed by the following theorem.

Theorem 1.

Under Case 1 with the initialization

w^{T} (0) v_{1} \neq 0

, it follows that

\lim_{k \to \infty} w (k) / ‖w (k)‖ = \pm v_{1}

.

Proof.

|z_{1} (0)| \neq 0

could be obtained from

w^{T} (0) v_{1} \neq 0

.

‖w (k)‖

has an upper bound in Case 1, so

\{|z_{1} (k)|\}

also has an upper bound and the monotonically increasing sequence

\{|z_{1} (k)|\}

will converge to a constant value.

From

w^{T} (0) v_{1} \neq 0

, it follows that

|z_{1} (0)| \neq 0

. Under Case 1,

‖w (k)‖

is bounded above. Consequently,

|z_{1} (k)|

is also bounded above. Therefore, the monotonically increasing sequence

\{|z_{1} (k)|\}

must converge to a constant value. From Equations (10)–(12), it could be deduced that during the iteration process, the sign of

\{z_{1} (k)\}

remains unchanged. That is, if

z_{1} (0) > 0

, then

z_{1} (k) > 0

holds for all

k \geq 0

; conversely, if

z_{1} (0) < 0

holds for all

k \geq 0

, then

z_{1} (k) < 0

. Hence, when

\{|z_{1} (k)|\}

converges,

\{z_{1} (k)\}

must also converge. Supposing

\lim_{k \to \infty} z_{1} (k) = a

(13)

where

a

is a constant. According to

\lim_{k \to \infty} \frac{z_{1} (k + 1)}{z_{1} (k)} = \lim_{k \to \infty} \{1 + \frac{1}{{‖w (k)‖}^{4}} [λ_{1} {‖w (k)‖}^{2} - w^{T} (k) R w (k)]\} = 1

(14)

we get

\lim_{k \to \infty} \frac{1}{{‖w (k)‖}^{4}} [λ_{1} {‖w (k)‖}^{2} - w^{T} (k) R w (k)] = 0

(15)

From

‖w (0)‖ \neq 0

and Equation (8), we can deduce:

\begin{array}{l} \lim_{k \to \infty} [λ_{1} {‖w (k)‖}^{2} - w^{T} (k) R w (k)] \\ = \lim_{k \to \infty} [λ_{1} \sum_{i = 1}^{n} z_{i}^{2} (k) - \sum_{i = 1}^{n} λ_{i} z_{i}^{2} (k)] \\ = \lim_{k \to \infty} \sum_{i = 2}^{n} (λ_{1} - λ_{i}) z_{i}^{2} (k) \\ = 0 \end{array}

(16)

That is, when

i = 2, 3, \dots, n

, there is

\lim_{k \to \infty} z_{i} (k) = 0

(17)

It could be obtained by combining Equations (13) and (17).

\lim_{k \to \infty} w (k) = \lim_{k \to \infty} \sum_{i = 2}^{n} z_{i} (k) v_{i} + \lim_{k \to \infty} (z_{1} (k) v_{1}) = a v_{1}

(18)

Therefore,

\lim_{k \to \infty} w (k) / ‖w (k)‖ = \pm (a / |a|) v_{1} = \pm v_{1}

. □

3.3. Convergence Analysis of the Algorithm in Case 2

In this case, the convergence of

w (k)

will still be established by analyzing the convergence of

z_{i} (k)

. The following three lemmas will be proved for later use.

Lemma 1.

Under the premises of Case 2 with

w^{T} (0) v_{1} \neq 0

, the two inequalities below hold for all

k \geq M

.

\{\begin{cases} 1 + \frac{1}{{‖w (k)‖}^{4}} [λ_{i} {‖w (k)‖}^{2} - w^{T} (k) R w (k)] > 0 \\ ‖w (k)‖ < ‖w (M)‖ + (k - M) \sqrt{λ_{1} - λ_{n}} \end{cases}

(19)

Proof.

Under Case 2, when

k \geq M

, the condition

‖w (k)‖ \geq \sqrt{λ_{1} - λ_{n}}

is satisfied, we obtain

\begin{array}{l} 1 + \frac{1}{{‖w (k)‖}^{4}} [λ_{i} {‖w (k)‖}^{2} - w^{T} (k) R w (k)] \\ > 1 + \frac{1}{{‖w (k)‖}^{4}} [λ_{n} {‖w (k)‖}^{2} - λ_{1} {‖w (k)‖}^{2}] \\ = 1 - \frac{(λ_{1} - λ_{n})}{{‖w (k)‖}^{2}} \\ > 0 \end{array}

(20)

When

k \geq M

, according to Equations (10) and (8) have

\begin{array}{l} {‖w (k + 1)‖}^{2} \\ = \sum_{i = 1}^{n} z_{i}^{2} (k + 1) \\ = \sum_{i = 1}^{n} {\{1 + \frac{1}{{‖w (k)‖}^{4}} [λ_{i} {‖w (k)‖}^{2} - w^{T} (k) R w (k)]\}}^{2} z_{i}^{2} (k) \\ \leq \sum_{i = 1}^{n} {\{1 + \frac{1}{{‖w (k)‖}^{4}} [λ_{1} {‖w (k)‖}^{2} - λ_{n} {‖w (k)‖}^{2}]\}}^{2} z_{i}^{2} (k) \\ \leq \sum_{i = 1}^{n} {\{1 + \frac{1}{{‖w (k)‖}^{2}} (λ_{1} - λ_{n})\}}^{2} z_{i}^{2} (k) \\ = {\{1 + \frac{(λ_{1} - λ_{n})}{{‖w (k)‖}^{2}}\}}^{2} {‖w (k)‖}^{2} \\ = {\{‖w (k)‖ + \frac{(λ_{1} - λ_{n})}{‖w (k)‖}\}}^{2} \\ \leq {\{‖w (k)‖ + \sqrt{λ_{1} - λ_{n}}\}}^{2} \end{array}

(21)

From the above formula, for all

k \geq M

have:

‖w (k + 1)‖ < ‖w (k)‖ + \sqrt{λ_{1} - λ_{n}}

, and then we can get

‖w (k)‖ < ‖w (M)‖ + (k - M) \sqrt{λ_{1} - λ_{n}}

(22)

□

Lemma 2.

In Case 2, if the initialization condition satisfies

w^{T} (0) v_{1} \neq 0

, then

\lim_{k \to \infty} z_{n} (k) = 0

.

Proof.

According to Equation (10), for all

k \geq M

, the following holds.

|z_{n} (k + 1)| = |1 + \frac{1}{{‖w (k)‖}^{4}} [λ_{n} {‖w (k)‖}^{2} - w^{T} (k) R w (k)]| |z_{n} (k)| < |z_{n} (k)|

(23)

This equality indicates that

|z_{n} (k)|

is a monotonically decreasing sequence. Since

‖w (k)‖

is a monotonically increasing sequence, it follows under Case 2 that:

‖w (k)‖ > ‖w (M)‖ \geq \sqrt{λ_{1} - λ_{n}}

(24)

Let

θ (k) = \frac{(λ_{n - 1} - λ_{n}) ({‖w (M)‖}^{2} - z_{n}^{2} (M))}{{(‖w (M)‖ + (k - M) \sqrt{λ_{1} - λ_{n}})}^{4}}

(25)

According to Equations (10) and (8), we can get

\begin{array}{l} 1 + \frac{1}{{‖w (k)‖}^{4}} [λ_{n} {‖w (k)‖}^{2} - w^{T} (k) R w (k)] \\ = 1 + \frac{1}{{‖w (k)‖}^{4}} [λ_{n} \sum_{i = 1}^{n} z_{i}^{2} (k) - \sum_{i = 1}^{n} λ_{i} z_{i}^{2} (k)] \\ = 1 - \frac{1}{{‖w (k)‖}^{4}} \sum_{i = 1}^{n - 1} (λ_{i} - λ_{n}) z_{i}^{2} (k) \\ < 1 - \frac{(λ_{n - 1} - λ_{n})}{{‖w (k)‖}^{4}} \sum_{i = 1}^{n - 1} z_{i}^{2} (k) \\ < 1 - \frac{(λ_{n - 1} - λ_{n})}{{‖w (k)‖}^{4}} ({‖w (k)‖}^{2} - z_{n}^{2} (k)) \\ < 1 - \frac{(λ_{n - 1} - λ_{n})}{{‖w (k)‖}^{4}} ({‖w (M)‖}^{2} - z_{n}^{2} (M)) \\ \leq 1 - θ (k) \end{array}

(26)

According to Equations (23) and (26), we can get

\begin{array}{l} |z_{n} (k + 1)| \\ = |1 + \frac{1}{{‖w (k)‖}^{4}} [λ_{n} {‖w (k)‖}^{2} - w^{T} (k) R w (k)]| |z_{n} (k)| \\ < |z_{n} (k)| |1 - θ (k)| \\ \leq |z_{1} (M)| • \prod_{i = M}^{k} |1 - θ (i)| \end{array}

(27)

Since the sequence

\sum_{k = M}^{\infty} (1 - θ (k))

is divergent, we have

\prod_{i = M}^{\infty} |1 - θ (i)| = 0

, and, thus,

\lim_{k \to \infty} z_{n} (k) = 0

(28)

□

Lemma 3.

Under Case 2, if the initial condition

w^{T} (0) v_{1} \neq 0

is met, then we have

\lim_{k \to \infty} z_{i} (k) = 0

, where

i = 2, 3, \dots, n - 1

.

Proof.

Since

w^{T} (0) v_{1} \neq 0

, it necessarily follows that

|z_{1} (0)| > 0

. Given that

\{|z_{1} (k)|\}

is a monotonically increasing sequence, we necessarily have

|z_{1} (M)| > |z_{1} (0)| > 0

. We now turn to prove Lemma 3 using mathematical induction.

According to Lemma 2, we have

\lim_{k \to \infty} z_{n} (k) = 0

.

Assume that when

i = n, n - 1, \dots, m + 1

,

\lim_{k \to \infty} z_{i} (k) = 0

holds. Then there necessarily exists a positive integer

N

such that for all

k \geq N

, the following two inequalities hold:

\{\begin{cases} (λ_{1} - λ_{m}) z_{1}^{2} (M) > \sum_{i = m + 1}^{n} (λ_{m} - λ_{i}) z_{i}^{2} (k) \\ \sum_{i = m + 1}^{n} \frac{(λ_{m - 1} - λ_{i}) z_{i}^{2} (k)}{{‖w (M)‖}^{2}} < \frac{(λ_{m - 1} - λ_{m})}{2} [1 - \frac{z_{m}^{2} (M)}{{‖w (M)‖}^{2}}] \end{cases}

(29)

According to Equations (10) and (29), for all

k \geq \max {M, N}

, there are

\begin{matrix} 1 + \frac{1}{{‖w (k)‖}^{4}} [λ_{m} {‖w (k)‖}^{2} - w^{T} (k) R w (k)] \\ = 1 - \frac{1}{{‖w (k)‖}^{4}} [\sum_{i = 1}^{m} (λ_{i} - λ_{m}) z_{i}^{2} (k) - \sum_{i = m + 1}^{n} (λ_{m} - λ_{i}) z_{i}^{2} (k)] \\ < 1 - \frac{1}{{‖w (k)‖}^{4}} [(λ_{1} - λ_{m}) z_{1}^{2} (M) - \sum_{i = m + 1}^{n} (λ_{m} - λ_{i}) z_{i}^{2} (k)] \\ < 1 \end{matrix}

(30)

then

|z_{m} (k + 1)| = |1 + \frac{1}{{‖w (k)‖}^{4}} [λ_{m} {‖w (k)‖}^{2} - w^{T} (k) R w (k)]| |z_{m} (k)| < |z_{m} (k)|

(31)

The above equation shows that the sequence is monotonically decreasing. According to Lemma 2 and Equations (29)–(31), for all

k \geq \max {M, N}

, we have

\begin{matrix} 1 + \frac{1}{{‖w (k)‖}^{4}} [λ_{m} {‖w (k)‖}^{2} - w^{T} (k) R w (k)] \\ = 1 - \frac{1}{{‖w (k)‖}^{4}} [\sum_{i = 1}^{m - 1} (λ_{i} - λ_{m}) z_{i}^{2} (k) - \sum_{i = m + 1}^{n} (λ_{m} - λ_{i}) z_{i}^{2} (k)] \\ = 1 - \frac{1}{{‖w (k)‖}^{4}} [(λ_{m - 1} - λ_{m}) {‖w (k)‖}^{2} - (λ_{m - 1} - λ_{m}) z_{m}^{2} (k)] + \\ \frac{1}{{‖w (k)‖}^{4}} [(λ_{m - 1} - λ_{m}) \sum_{i = m + 1}^{n} z_{i}^{2} (k) + \sum_{i = m + 1}^{n} (λ_{m} - λ_{i}) z_{i}^{2} (k)] \\ < 1 - \frac{(λ_{m - 1} - λ_{m})}{{‖w (k)‖}^{2}} [1 - \frac{z_{m}^{2} (M)}{{‖w (M)‖}^{2}}] + \\ \frac{1}{{‖w (k)‖}^{2}} [\sum_{i = m + 1}^{n} \frac{(λ_{m - 1} - λ_{i}) z_{i}^{2} (k)}{{‖w (M)‖}^{2}}] \\ < 1 - \frac{(λ_{m - 1} - λ_{m})}{2 {[‖w (M)‖ + (k - M) \sqrt{λ_{1} - λ_{n}}]}^{2}} [1 - \frac{z_{m}^{2} (M)}{{‖w (M)‖}^{2}}] \end{matrix}

(32)

Let

θ^{'} (k) = \frac{(λ_{m - 1} - λ_{m})}{2 {[‖w (M)‖ + (k - M) \sqrt{λ_{1} - λ_{n}}]}^{2}} [1 - \frac{z_{m}^{2} (M)}{{‖w (M)‖}^{2}}]

(33)

According to Equations (12) and (32), for all

k \geq \max {M, N}

there are

\begin{matrix} |z_{m} (k + 1)| \\ = |1 + \frac{1}{{‖w (k)‖}^{4}} [λ_{m} {‖w (k)‖}^{2} - w^{T} (k) R w (k)]| |z_{m} (k)| \\ < |z_{m} (k)| |1 - θ^{'} (k)| \\ < |z_{m} (M)| {|1 - θ^{'}|}^{(k - M)} \end{matrix}

(34)

where

θ^{'} = \max {θ^{'} (i)}, i = M, M + 1, \dots, k

. Since the series

\sum_{k = M}^{\infty} θ^{'} (k)

is divergent, there is

\lim_{k \to \infty} {|1 - θ^{'}|}^{(k - M)} = 0

(35)

Then

\lim_{k \to \infty} z_{m} (k) = 0

(36)

□

Theorem 2.

In Case 2, if the initialization condition satisfies

w^{T} (0) v_{1} \neq 0

, then we can get

\lim_{k \to \infty} w (k) / ‖w (k)‖ = \pm v_{1}

.

Proof.

z_{1} (0) \neq 0

could be obtained from

w^{T} (0) v_{1} \neq 0

. Since

{|z_{1} (k)|}

is a monotonically increasing sequence, we can get

|z_{1} (k)| \geq |z_{1} (0)| > 0

. According to Equation (8) to get

\frac{z_{1} (k)}{‖w (k)‖} = \frac{1}{\sqrt{1 + \sum_{i = 2}^{n} \frac{z_{i}^{2} (k)}{z_{1}^{2} (k)}}} \geq \frac{1}{\sqrt{1 + \sum_{i = 2}^{n} \frac{z_{i}^{2} (k)}{z_{1}^{2} (0)}}}

(37)

The following equation is obtained according to Lemma 2 and Lemma 3.

\lim_{k \to \infty} (1 + \sum_{i = 2}^{n} \frac{z_{i}^{2} (k)}{z_{1}^{2} (0)}) = 1 + \frac{\sum_{i = 2}^{n} \lim_{k \to \infty} z_{i}^{2} (k)}{z_{1}^{2} (0)} = 1

(38)

Substituting

‖w (k)‖ > |z_{1} (k)|

into Equation (37) and according to Equation (38), we obtained

1 = \lim_{k \to \infty} \frac{1}{\sqrt{1 + \sum_{i = 2}^{n} \frac{z_{i}^{2} (k)}{z_{1}^{2} (0)}}} \leq \lim_{k \to \infty} \frac{|z_{1} (k)|}{‖w (k)‖} \leq 1

(39)

Since

z_{1} (k)

does not change the symbol during the iteration, there is

\lim_{k \to \infty} z_{1} (k) / ‖w (k)‖ = \pm 1

, and then

\lim_{k \to \infty} \frac{w (k)}{‖w (k)‖} = \lim_{k \to \infty} \frac{z_{1} (k)}{‖w (k)‖} v_{1} + \lim_{k \to \infty} \frac{1}{‖w (k)‖} \sum_{i = 2}^{n} z_{i} (k) v_{i} = \pm v_{1}

(40)

□

3.4. Convergence Analysis of the Algorithm

Synthesizing the results of Theorems 1 and 2, we derive the following convergence guarantee:

Theorem 3.

For the proposed algorithm, convergence is ensured provided that the initial weight vector satisfies

w^{T} (0) v_{1} \neq 0

, then there is

\lim_{k \to \infty} w (k) / ‖w (k)‖ = \pm v_{1}

.

Theorem 3 outlines the convergence conditions for our algorithm. This algorithm is closely related to the one in [], which, in contrast, has a convergence condition of

η \leq 0.3

and

η λ_{1} \leq 0.25

, while the convergence conditions of the proposed algorithm do not restrict the learning factor. Theorem 3 stipulates the initial condition

w^{T} (0) v_{1} \neq 0

. This condition is straightforward to satisfy in practice, for instance, through random initialization of the weight vector. In the extreme case, if

w^{T} (0) v_{1} = 0

, then we can directly calculate all the eigenvectors by constructing an orthogonal space, which has a very small amount of computation. Due to the wide range of changes in the weight vector modulus during the iteration process, the adaptive learning factor

1 / {‖w (k)‖}^{4}

has a large value space.

4. Simulation Experiment

Three simulation experiments are conducted in this section to validate the performance of the proposed algorithm. Experiment 1 verifies the algorithm’s capability to directly estimate principal components from input signals. Experiment 2 examines the conclusions drawn from the convergence analysis. Experiment 3 assesses the algorithm’s performance under extreme conditions.

4.1. Signal Principal Component Estimation Experiment

For this experimental setup, the input signal is synthesized using the following first-order sliding regression process.

x_{k} = 0.9 x_{k - 1} + e_{k}

(41)

where

e_{k}

is a Gaussian random process with zero mean and unit variance. Eight consecutive, non-overlapping data points form the input signal vector

x_{k}

.The first principal component of this signal is then extracted using Oja’s algorithm [], the PASTd algorithm [], and the proposed algorithm, respectively. To ensure operational stability, both the Oja and PASTd algorithms normalize the weight vector, a feature fundamental to their design. The initialization parameters of the algorithm are set as follows:

η = 0.999

(Oja’s algorithm);

β = 0.999

,

d = 1

(the PASTd algorithm). The proposed algorithm does not need to set parameters. For fair comparison, the three algorithms use the same initialization weight vector.

In the iterative process, the direction cosine between the weight vector and the principal component

v_{1}

is calculated continuously.

D C (k) = |w^{T} (k) v_{1}| / ‖w (k)‖ ‖v_{1}‖

(42)

Equation (42) shows that the weight vector aligns with the principal component direction

v_{1}

as

D C

converges to 1. That is to say, the

D C

represents the algorithm’s ability to extract principal components. In adaptive signal processing, where neural network algorithms are predominantly applied, the convergence speed emerges as the primary performance metric once the algorithm’s

D C

curve approaches 1.

Figure 1 displays the directional cosine curves computed by the three algorithms, representing the average of 30 independent trials. Simulation results illustrate the convergence trajectory of the weight vector

v_{1}

. It asymptotically approaches the principal component direction, reaching it after around 200 iterations, which confirms the algorithm’s principal component extraction capability.

Figure 1. Direction cosine of three algorithms.

A comparative evaluation confirmed that the proposed algorithm was observed to converge more rapidly than the two alternative methods by a significant margin.

4.2. Verification Experiment on Convergence Analysis

The following signal autocorrelation matrix is generated by using the method in [].

R = [\begin{matrix} 4.5620 & 2.1706 & - 1.9118 & - 0.9968 & 0.6339 \\ 2.1706 & 7.0649 & - 0.2731 & 0.3931 & 0.4540 \\ - 1.9118 & - 0.2731 & 2.7555 & - 1.5600 & - 2.5410 \\ - 0.9968 & 0.3931 & - 1.5600 & 4.1923 & - 0.0662 \\ 0.6339 & 0.4540 & - 2.5410 & - 0.0662 & 7.3028 \end{matrix}]

(43)

The maximum and minimum eigenvalues of the matrix

R

are

λ_{1} = 9.6638

and

λ_{5} = 0.0361

, respectively, and its principal component is

v_{1} = {[0.4309, 0.5191, - 0.3842, 0.0608, 0.6273]}^{T}

. The largest principal component of this autocorrelation matrix is estimated using the proposed algorithm (4). During the iteration process, the absolute value of the basis coefficients (i.e.,

|z_{i} (k)| i = 1, 2, \dots, n

), the norm of the weight vector

‖w (k)‖

and the direction cosine are calculated.

Figure 2 and Figure 3 present the simulation results obtained when the initial vector satisfies

w (0) = {[1 . 7717, - 3 . 2042, 0 . 6601, - 2 . 9562, 1 . 5555]}^{T}

. Since

w^{T} (0) v_{1} = - 0.3571 \neq 0

holds, this condition fulfills the convergence requirements specified in Theorem 3. As observed in Figure 2,

|z_{1} (k)|

increases monotonically and converges to a constant value, while

|z_{i} (k)| i = 2, 3, \dots, n

ultimately converges to zero. These findings are consistent with the conclusions derived from the convergence analysis. Figure 3 demonstrates that the directional cosine curve converges to 1, indicating that Algorithm (4) accurately estimates the direction of the principal component. Concurrently, the magnitude

‖w (k)‖

of the weight vector also exhibits monotonic increase, aligning with the analytical conclusions presented in Section 3. This further validates the correctness of Theorem 3.

Figure 2. The absolute value curve of the basis coefficient.

Figure 3. Modulus and direction cosine curve.

4.3. Algorithm Performance Experiment Under Pathological Conditions

The condition number serves as a critical metric for assessing whether a matrix is ill-conditioned. Typically, the larger the condition number, the more ill-conditioned the matrix becomes, imposing more stringent requirements on algorithms. To assess the performance of the proposed algorithm on ill-conditioned matrices, we employ the following symmetric positive definite matrix:

R = [\begin{matrix} 1.7424 & - 1.5653 & - 1.9404 & - 0.9906 & 0.5729 & 0.3557 \\ - 1.5653 & 7.1478 & - 0.5545 & 2.3617 & - 0.8033 & - 2.6692 \\ - 1.9404 & - 0.5545 & 7.0277 & 0.7769 & 0.8321 & 1.2436 \\ - 0.9906 & 2.3617 & 0.7769 & 2.2852 & 0.2421 & - 1.0731 \\ 0.5729 & - 0.8033 & 0.8321 & 0.2421 & 3.2834 & - 1.9452 \\ 0.3557 & - 2.6692 & 1.2436 & - 1.0731 & - 1.9452 & 3.4290 \end{matrix}]

(44)

Computations reveal that the condition number of this matrix is

c o n d (R) = 2.3768 \times 10^{5}

. When employing other algorithms to estimate the principal components of ill-conditioned matrices, a very small learning factor is typically chosen to ensure algorithm convergence; however, this significantly impacts the convergence speed. Here, the proposed algorithm is employed for principal component estimation of this matrix. Figure 4 illustrates the trajectory of the basis coefficient’s absolute value under the initial condition

w_{0} = {[0.1822, 0.0913, 0.3023, 0.2809, - 0.1817, 0.0718]}^{T}

. Similar to Figure 2, the proposed algorithm readily achieves convergence even when handling the ill-conditioned matrix.

Figure 4. Absolute value curve of basis coefficient under ill-conditioned.

5. Conclusions

The choice of learning rate significantly impacts the convergence speed of algorithms. To extend the feasible range of learning rates, a principal component estimation algorithm with an adaptive learning rate is proposed. The convergence analysis of the proposed algorithm is rigorously conducted by considering two distinct cases. The derived convergence conditions impose requirements solely on the initialization of the weight vector, with no restrictions applied to the learning rate. Simulation results demonstrate the algorithm’s functional capability for principal component extraction, while also corroborating the validity of the theoretical convergence analysis. In the future, the proposed algorithm can be used to solve image compression, fault diagnosis and other problems.

Author Contributions

Conceptualization, Y.G.; methodology, Y.G. and H.D.; software, H.D. and Y.Z.; validation, Y.G.; formal analysis, Z.X.; investigation, H.D. and Y.Z.; resources, J.L.; data curation, Y.G.; writing—original draft preparation, Y.G.; writing—review and editing, Y.G.; visualization, K.J.; supervision, Z.X.; project administration, Y.G. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported in part by the National Natural Science Foundation of China (62101579, 62106242).

Data Availability Statement

The data that support the findings of this study are available within the article.

Conflicts of Interest

The authors declare no conflict of interest.

References

Du, K.-L.; Swamy, M. Principal component analysis. In Neural Networks and Statistical Learning; Springer: Berlin/Heidelberg, Germany, 2019; pp. 373–425. [Google Scholar]
Zhou, C.; Gu, Y.; Fan, X.; Shi, Z.; Mao, G.; Zhang, Y.D. Direction-of-arrival estimation for coprime array via virtual array interpolation. IEEE Trans. Signal Process. 2018, 66, 5956–5971. [Google Scholar] [CrossRef]
Bartecki, K. Classical vs. neural network-based PCA approaches for lossy image compression: Similarities and differences. Appl. Soft Comput. 2024, 161, 111721. [Google Scholar] [CrossRef]
Zhang, Q.; Song, C.; Yuan, Y. Fault diagnosis of vehicle gearboxes based on adaptive wavelet threshold and LT-PCA-NGO-SVM. Appl. Sci. 2024, 14, 1212. [Google Scholar] [CrossRef]
Chen, P.; Wu, L.; Peng, D.; Gong, X.; De Albuquerque, V.H.C. An improved minor component analysis algorithm based on convergence analysis of 5G multi-dimensional signals. IEEE Access 2019, 7, 91860–91871. [Google Scholar] [CrossRef]
Qian, G.; Lin, B.; Mei, J.; Qian, J.; Wang, S. Adaptive Learning Algorithm and Its Convergence Analysis with Complex-Valued Error Loss Network. Neural Netw. 2025, 190, 107677. [Google Scholar] [CrossRef]
Cha, G.-W.; Choi, S.-H.; Hong, W.-H.; Park, C.-W. Developing a prediction model of demolition-waste generation-rate via principal component analysis. Int. J. Environ. Res. Public Health 2023, 20, 3159. [Google Scholar] [CrossRef]
Gao, Y.; Dong, H.; Xu, Z.; Li, H.; Li, J.; Yuan, S. Multiple Minor Components Extraction in Parallel Based on Möller Algorithm. Electronics 2025, 14, 4073. [Google Scholar] [CrossRef]
Lei, D.; Zhang, Y.; Lu, Z.; Lin, H.; Fang, B.; Jiang, Z. Slope stability prediction using principal component analysis and hybrid machine learning approaches. Appl. Sci. 2024, 14, 6526. [Google Scholar] [CrossRef]
Liu, Y.; Ruan, X. Parameter-Optimal-Gain-Arguable Iterative Learning Control for Linear Time-Invariant Systems with Quantized Error. Appl. Sci. 2023, 13, 9551. [Google Scholar] [CrossRef]
Zhang, F.; Chen, Z. A Novel Reinforcement Learning-Based Particle Swarm Optimization Algorithm for Better Symmetry between Convergence Speed and Diversity. Symmetry 2024, 16, 1290. [Google Scholar] [CrossRef]
Zhang, Y.; Wu, W.; He, W.; Zhao, N. Algorithm design and convergence analysis for coexistence of cognitive radio networks in unlicensed spectrum. Sensors 2023, 23, 9705. [Google Scholar] [CrossRef]
Naik, G.R.; Selvan, S.E.; Gobbo, M.; Acharyya, A.; Nguyen, H.T. Principal component analysis applied to surface electromyography: A comprehensive review. IEEE Access 2016, 4, 4025–4037. [Google Scholar] [CrossRef]
Kalla, U.K. Minor component analysis based anti-hebbian neural network scheme of decoupled voltage and frequency controller (DVFC) for nanohydro system. In Proceedings of the 2016 IEEE 7th Power India International Conference (PIICON), Bikaner, India, 25–27 November 2016; pp. 1–6. [Google Scholar]
Nguyen, T.D.; Yamada, I. A unified convergence analysis of normalized PAST algorithms for estimating principal and minor components. Signal Process. 2013, 93, 176–184. [Google Scholar] [CrossRef]
Nguyen, V.-D.; Abed-Meraim, K.; Linh-Trung, N.; Weber, R. Generalized minimum noise subspace for array processing. IEEE Trans. Signal Process. 2017, 65, 3789–3802. [Google Scholar] [CrossRef]
Pehlevan, C.; Hu, T.; Chklovskii, D.B. A hebbian/anti-hebbian neural network for linear subspace learning: A derivation from multidimensional scaling of streaming data. Neural Comput. 2015, 27, 1461–1495. [Google Scholar] [CrossRef]
Bartelmaos, S.; Abed-Meraim, K. Fast adaptive algorithms for minor component analysis using Householder transformation. Digit. Signal Process. 2011, 21, 667–678. [Google Scholar] [CrossRef]
Feng, X.; Kong, X.; Duan, Z.; Ma, H. Adaptive generalized eigen-pairs extraction algorithms and their convergence analysis. IEEE Trans. Signal Process. 2016, 64, 2976–2989. [Google Scholar] [CrossRef]
Feng, X.; Kong, X.; Ma, H.; Liu, H. Unified and coupled self-stabilizing algorithms for minor and principal eigen-pairs extraction. Neural Process. Lett. 2017, 45, 197–222. [Google Scholar] [CrossRef]
Zhang, W.-T.; Lou, S.-T.; Feng, D.-Z. Adaptive quasi-Newton algorithm for source extraction via CCA approach. IEEE Trans. Neural Netw. Learn. Syst. 2013, 25, 677–689. [Google Scholar] [CrossRef] [PubMed]
Kong, X.; Hu, C.; Duan, Z. Principal Component Analysis Networks and Algorithms; Springer: Durham, NC, USA, 2017. [Google Scholar]
Tan, K.K.; Lv, J.C.; Yi, Z.; Huang, S. Adaptive multiple minor directions extraction in parallel using a PCA neural network. Theor. Comput. Sci. 2010, 411, 4200–4215. [Google Scholar] [CrossRef]
Nguyen, T.D.; Yamada, I. Necessary and sufficient conditions for convergence of the DDT systems of the normalized PAST algorithms. Signal Process. 2014, 94, 288–299. [Google Scholar] [CrossRef]
Kong, X.; An, Q.; Ma, H.; Han, C.; Zhang, Q. Convergence analysis of deterministic discrete time system of a unified self-stabilizing algorithm for PCA and MCA. Neural Netw. 2012, 36, 64–72. [Google Scholar] [CrossRef] [PubMed]
Miao, Y.; Hua, Y. Fast subspace tracking and neural network learning by a novel information criterion. IEEE Trans. Signal Process. 2002, 46, 1967–1979. [Google Scholar] [CrossRef]
Chen, T.; Amari, S.-I. Amari. Unified stabilization approach to principal and minor components extraction algorithms. Neural Netw. 2001, 14, 1377–1387. [Google Scholar] [CrossRef] [PubMed]
Antoniou, A. Digital Signal Processing; McGraw-Hill: New York, NY, USA, 2016. [Google Scholar]
Wang, R.; Yao, M.; Zhang, D.; Zou, H. Stable and orthonormal OJA algorithm with low complexity. IEEE Signal Process. Lett. 2011, 18, 211–214. [Google Scholar] [CrossRef]
Chan, S.-C.; Tan, H.-J.; Lin, J.-Q.; Liao, B. A new local polynomial modeling based variable forgetting factor and variable regularized PAST algorithm for subspace tracking. IEEE Trans. Aerosp. Electron. Syst. 2018, 54, 1530–1544. [Google Scholar] [CrossRef]

Figure 1. Direction cosine of three algorithms.

Figure 2. The absolute value curve of the basis coefficient.

Figure 3. Modulus and direction cosine curve.

Figure 4. Absolute value curve of basis coefficient under ill-conditioned.

Table 1. Computational complexity.

Terms	Flops
$A_{1} = w^{T} (k) w (k)$	$n$
$p (k) = w^{T} (k) q (k)$	$n$
$A_{3} = A_{1} w (k) p (k)$	$n + 1$
$A_{4} = p^{2} (k) w (k)$	$n + 1$
$A_{5} = (A_{3} - A_{4}) / {‖w (k)‖}^{4}$	$n + 1$
Total	$5 n + 3$

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

A Principal Component Estimation Algorithm with Adaptive Learning Rate and Its Convergence Analysis

Abstract

1. Introduction

2. Adaptive Algorithm Design

3. Convergence of the Algorithm

3.1. Preliminaries

3.2. Convergence Analysis of the Algorithm in Case 1

3.3. Convergence Analysis of the Algorithm in Case 2

3.4. Convergence Analysis of the Algorithm

4. Simulation Experiment

4.1. Signal Principal Component Estimation Experiment

4.2. Verification Experiment on Convergence Analysis

4.3. Algorithm Performance Experiment Under Pathological Conditions

5. Conclusions

Author Contributions

Funding

Data Availability Statement

Conflicts of Interest

References

Article Metrics

Citations

Article Access Statistics