Article

Robust Momentum-Enhanced Non-Negative Tensor Factorization for Accurate Reconstruction of Incomplete Power Consumption Data

School of Computer Science and Information, Southwest University, Chongqing 400715, China
* Author to whom correspondence should be addressed.
Electronics 2025, 14(2), 351; https://doi.org/10.3390/electronics14020351
Submission received: 2 December 2024 / Revised: 7 January 2025 / Accepted: 13 January 2025 / Published: 17 January 2025
(This article belongs to the Special Issue Intelligent Data Analysis and Learning)

Abstract
Power consumption (PC) data are fundamental for optimizing energy use and managing industrial operations. However, with the widespread adoption of data-driven technologies in the energy sector, maintaining the integrity and quality of these data has become a significant challenge. Missing or incomplete data, often caused by equipment failures or communication disruptions, can severely affect the accuracy and reliability of data analyses, ultimately leading to poor decision-making and increased operational costs. To address this, we propose a Robust Momentum-Enhanced Non-Negative Tensor Factorization (RMNTF) model, which integrates three key innovations. First, the model utilizes adversarial loss and $L_2$ regularization to enhance its robustness and improve its performance when dealing with incomplete data. Second, a sigmoid function is employed to ensure that the results remain non-negative, aligning with the inherent characteristics of PC data and improving the quality of the analysis. Finally, momentum optimization is applied to accelerate the convergence process, significantly reducing computational time. Experiments conducted on two publicly available PC datasets, with data densities of 6.65% and 4.80%, show that RMNTF outperforms state-of-the-art methods, achieving an average reduction of 16.20% in imputation errors and an average improvement of 68.36% in computational efficiency. These results highlight the model's effectiveness in handling sparse and incomplete data, ensuring that the reconstructed data can support critical tasks like energy optimization, smart grid maintenance, and predictive analytics.

1. Introduction

Power consumption (PC) data play a crucial role in industrial operations and daily life, serving as the foundation for energy optimization and intelligent management. With the rapid development of Internet of Things (IoT) and smart grid technologies, the collection of PC data has become increasingly important, providing essential support for efficient energy management. High-quality data not only improve load forecasting accuracy [1], optimize demand response strategies [2], and facilitate in-depth analyses of user behavior patterns [3] but also enhance building energy efficiency, directly influencing the sustainability and economic performance of energy systems. However, in practical applications, issues such as sensor or smart meter failures, network interruptions, and data transmission delays often result in missing PC data [4]. Additionally, PC data typically exhibit high dimensionality and incompleteness (HDI) across temporal, spatial, and multi-parameter dimensions. These data gaps significantly affect the precision of load monitoring and energy analyses, posing major challenges for data-dependent applications such as non-intrusive load monitoring (NILM), user behavior modeling, and energy forecasting [5]. Therefore, effectively imputing missing portions of PC data to improve data integrity and quality has become a critical problem that needs to be addressed.
Current imputation methods face significant limitations when handling PC data, particularly in addressing high-dimensional sparsity and complex temporal dependencies [6]. Traditional approaches, such as mean imputation, linear interpolation, and K-nearest neighbor (KNN), can provide preliminary imputations in simple scenarios but fail to effectively capture the periodic features and inter-device interactions inherent in PC data [7]. While matrix completion methods can handle two-dimensional data, they are constrained by the dimensionality of the information that they can exploit [8]. In contrast, high-dimensional tensors better capture underlying data characteristics; however, tensor decomposition models based on nuclear norm minimization (NNM) require the generation of complete tensors for training, resulting in high computational costs and limiting their application to large-scale sparse datasets [9,10]. Hidden Markov Models (HMMs) offer advantages in capturing changes in power load states and are suitable for identifying and predicting device usage patterns under varying conditions [11]. However, HMMs rely on strong assumptions (e.g., stationarity) and require substantial historical data for training, restricting their practical applicability [12]. Neural network-based imputation methods, such as Autoencoders [13], Recurrent Neural Networks (RNNs) [14], Long Short-Term Memory networks (LSTMs) [15], and Generative Adversarial Networks (GANs) [16], can model complex nonlinear relationships but are highly dependent on large datasets and prone to overfitting when dealing with sparse or high-dimensional data, leading to model instability [17]. Additionally, metadata-based imputation methods improve accuracy and reliability by integrating auxiliary information, such as timestamps and geographic locations [18,19]. Probabilistic graphical models [20] and Bayesian methods [21] further enhance precision and robustness by modeling uncertainties and dependencies within the data. However, these methods typically require complex probabilistic inference, especially on high-dimensional datasets, resulting in significant computational overhead and time-consuming processes.
In recent years, tensor factorization models based on CANDECOMP/PARAFAC (CP) decomposition have gained significant attention for their efficiency in large-scale data imputation [22,23,24]. These models utilize multiple low-dimensional matrices to represent latent features, enabling an accurate reconstruction of the original target tensor [25]. Wu et al. proposed the Fused CP (FCP) decomposition model, which integrates priors such as low rank, sparsity, manifold information, and smoothness, and it was successfully applied to image restoration [26]. Luo et al. enhanced non-negative tensor factorization models by introducing bias terms for QoS data imputation [27]. Wu et al. further proposed a PID-controlled tensor factorization model for imputing dynamically weighted directed networks [28]. Zhu et al. developed the Multi-Task Neural Tensor Factorization (MTNTF) model, combining tensor factorization and neural networks to improve the accuracy and efficiency of traffic flow data imputation through multitask learning and attention mechanisms [29]. Ben Said et al. introduced the Spatiotemporal Tensor Completion Model, leveraging enhanced CP decomposition to repair missing values in urban traffic data by incorporating spatial and temporal features [30]. Jin et al. proposed a Graph-Aware Tensor Factorization Convolutional Network that utilizes Graph Convolutional Networks (GCNs) as the encoder and tensor factorization as the decoder to more effectively represent and complete knowledge graphs [31]. Deng et al. proposed a wireless network traffic prediction model that combines Bayesian Gaussian Tensor Factorization (BGCP) for imputing missing values and RNNs for traffic prediction [32]. Overall, tensor factorization models demonstrate promising potential and wide applicability in high-dimensional data completion.
Therefore, to address the issue of missing PC data, this study makes the following main contributions:
  • Third-Order PC Tensor Construction: This study develops a third-order tensor structure that preserves temporal patterns and effectively models inter-appliance relationships. For example, monitoring the PC of 10 devices in a building at a sampling rate of 1 Hz (86,400 samples per day) over 7 consecutive days results in a tensor with dimensions of $10 \times 86{,}400 \times 7$, representing the number of devices, the time steps per day, and the days.
  • Robust Momentum-Enhanced Non-Negative Tensor Factorization (RMNTF) Model: This study introduces the RMNTF model, enhancing robustness through the integration of adversarial loss and $L_2$ regularization. The sigmoid activation function enforces non-negativity constraints, thereby improving the interpretability of the imputation results. Additionally, momentum-based optimization accelerates the optimization process, significantly reducing convergence time.
  • Implementation and Performance Evaluation: This study completes detailed implementation and performance evaluations of the RMNTF algorithm on two publicly available PC datasets. The results demonstrate the model’s ability to provide high-accuracy imputation for missing PC data, outperforming existing methods in handling high-sparsity scenarios.
The structure of this paper is as follows: Section 2 outlines the theoretical foundations; Section 3 describes the proposed model in detail; Section 4 presents the experimental results and evaluates the model’s performance; and, finally, Section 5 summarizes the advantages of the RMNTF model in PC data imputation, discusses its key findings and model limitations, and suggests potential directions for future research.

2. Fundamentals

2.1. Symbol Conventions

The symbols adopted in this paper are summarized in Table 1.

2.2. PC Data Tensorization

This study employs two publicly available PC datasets for an empirical analysis: iAWE [33] and REDD [34]. The iAWE dataset encompasses energy usage data from multiple households, capturing device-level usage patterns. In contrast, the REDD dataset provides high-frequency consumption data from several residential buildings, emphasizing the differentiation of energy usage patterns across various devices. However, both datasets exhibit inherent missing value issues, including both continuous and random missing data, which pose significant challenges to the accuracy of data analyses. This issue is particularly critical in high-temporal-resolution scenarios, where the effective handling of missing data is essential.
In the analysis of PC data, intrinsic correlations may exist among different dimensions—namely, date, time samples, and devices. By representing the data as a low-rank tensor, high-dimensional data can be approximated using a limited number of latent patterns, thereby facilitating data compression and imputation.
  • Date Dimension: Captures energy consumption variations across different dates, revealing usage patterns on weekdays versus weekends and identifying periodic trends over time [35].
  • Time Sample Dimension: Encapsulates consumption characteristics at various times of the day, highlighting differences between morning and evening usage patterns and uncovering periodic consumption behaviors for specific devices [36].
  • Device Dimension: Represents the array of devices within the monitored environment, enabling an analysis of inter-device correlations. For example, the usage of certain devices (e.g., computers) may be highly correlated with the operation of others (e.g., air conditioners) [37].
To construct the tensor, this study selects three consecutive weeks of PC data from the iAWE and REDD datasets. The datasets contain a proportion of missing values, and only a subset of the known data is utilized in the tensor construction to simulate a high sparsity rate. The resulting tensor comprises three dimensions: device, time sample, and date. Specifically, the device dimension includes 12 meters from the iAWE dataset and 18 meters from the REDD dataset; the time sample dimension corresponds to 86,400 sampling points per day, indicating a one-second sampling rate; and the date dimension spans three weeks (21 days) of data. This multidimensional representation effectively captures inter-device correlations and periodic data characteristics, enabling the application of advanced tensor decomposition techniques to uncover latent patterns and relationships. Detailed information about the tensor, including the number of meters, data sampling time range, number of time samples, and other specific attributes for each dataset, is provided in Table 2.
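To make the construction concrete, the following minimal Python/NumPy sketch folds per-meter time series into the device × time-sample × date tensor described above. It uses synthetic readings rather than the actual iAWE data, and the ~5% density figure is an illustrative assumption.

```python
import numpy as np

# Illustrative tensorization: 12 meters, 86,400 one-second samples per day,
# 21 days, matching the iAWE setup described above. Readings are synthetic.
n_devices, n_samples, n_days = 12, 86_400, 21

# Flat per-device series of length n_samples * n_days; NaN marks missing data.
readings = np.random.rand(n_devices, n_samples * n_days)
readings[np.random.rand(*readings.shape) < 0.95] = np.nan  # keep ~5% known

# Fold the time axis into (time sample, date) to obtain the HDI tensor Y.
Y = readings.reshape(n_devices, n_days, n_samples).transpose(0, 2, 1)
print(Y.shape)  # (12, 86400, 21)
```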
Definition 1.
(HDI PC tensor): Consider three sets I, J, and K, representing time steps, appliances, and days, respectively. Let $\mathbf{Y}$ denote a tensor of dimensions $|I| \times |J| \times |K|$. Each element $y_{ijk}$ of $\mathbf{Y}$ represents the PC of appliance $j \in J$ at time $i \in I$ on day $k \in K$. The tensor $\mathbf{Y}$ is divided into two parts: known data $\Lambda$ and missing or unknown data $\Gamma$. When the amount of known data is much smaller than that of unknown data, i.e., $|\Lambda| \ll |\Gamma|$, and the tensor has more than two dimensions, $\mathbf{Y}$ is referred to as an HDI tensor. Figure 1 illustrates the process of modeling PC data as this tensor.
To better understand the structure of $\mathbf{Y}$, three types of slices can be considered (see the code sketch after this list):
  • Horizontal slice: Fixing $k \in K$, a horizontal slice $\mathbf{Y}(:,:,k)$ represents the PC of all appliances across all time steps on a specific day.
  • Vertical slice: Fixing $j \in J$, a vertical slice $\mathbf{Y}(:,j,:)$ represents the PC of a specific appliance across all time steps and days.
  • Frontal slice: Fixing $i \in I$, a frontal slice $\mathbf{Y}(i,:,:)$ represents the PC of all appliances across all days at a specific time step.
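A minimal NumPy sketch of the three slice types, using a small random tensor with the (time step, appliance, day) axis order of Definition 1:

```python
import numpy as np

Y = np.random.rand(4, 3, 2)  # |I| = 4 time steps, |J| = 3 appliances, |K| = 2 days

horizontal = Y[:, :, 0]  # fix day k: all appliances and time steps on one day
vertical = Y[:, 0, :]    # fix appliance j: one appliance over all time steps and days
frontal = Y[0, :, :]     # fix time step i: all appliances and days at one instant
```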

2.3. TF for Missing PC Data Imputation

Definition 2.
(Rank-one tensor): A three-dimensional tensor $\mathbf{H}$ of size $|I| \times |J| \times |K|$ is defined as a rank-one tensor if each element $h_{ijk}$ can be expressed as the product of three scalars:
$h_{ijk} = a_i b_j c_k,$ (1)
where $a_i$, $b_j$, and $c_k$ are elements of the vectors $\mathbf{a}$, $\mathbf{b}$, and $\mathbf{c}$, respectively, with lengths $|I|$, $|J|$, and $|K|$. In this case, $\mathbf{H}$ can be represented as the outer product of the three vectors:
$\mathbf{H} = \mathbf{a} \circ \mathbf{b} \circ \mathbf{c}.$ (2)
Definition 3.
(Tensor Factorization (TF)): In this paper, TF is constructed based on CP decomposition (CPD), approximating the target tensor $\mathbf{Y}$ as $\mathbf{X}$, which is the sum of R rank-one tensors, where R is the estimated rank of $\mathbf{X}$. As illustrated in Figure 2, TF approximates $\mathbf{Y}$ using latent factor matrices (LFs) to construct a rank-R tensor $\mathbf{X}$.
Thus, the approximation tensor $\mathbf{X}$ can be expressed as
$\mathbf{X} = \sum_{r=1}^{R} \mathbf{H}_r,$ (3)
where each $\mathbf{H}_r$ is a rank-one tensor, represented by the outer product of the r-th column vectors of the LFs $U \in \mathbb{R}^{|I| \times R}$, $O \in \mathbb{R}^{|J| \times R}$, and $W \in \mathbb{R}^{|K| \times R}$ as follows:
$\mathbf{H}_r = U_{:,r} \circ O_{:,r} \circ W_{:,r}.$ (4)
By Definitions 2 and 3, each entry $x_{ijk}$ in $\mathbf{X}$ can be written as
$x_{ijk} = \sum_{r=1}^{R} h^{(r)}_{ijk} = \sum_{r=1}^{R} u_{ir} o_{jr} w_{kr},$ (5)
where $u_{ir}$, $o_{jr}$, and $w_{kr}$ are the elements of the latent factor vectors $U_{:,r}$, $O_{:,r}$, and $W_{:,r}$, respectively. Note that determining the rank R is an NP-hard problem, so R is typically predefined to simplify computation.
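As an illustration of (5), the following sketch (with illustrative shapes and random factors) reconstructs the full rank-R approximation via einsum and checks it against the single-entry form used during training:

```python
import numpy as np

I, J, K, R = 4, 3, 2, 10
U, O, W = (np.random.rand(d, R) for d in (I, J, K))  # illustrative LFs

# Full approximation tensor X, per (3)-(5): x_ijk = sum_r u_ir * o_jr * w_kr.
X = np.einsum('ir,jr,kr->ijk', U, O, W)

# Single-entry form, as evaluated for each known entry during training.
i, j, k = 1, 2, 0
x_ijk = np.sum(U[i] * O[j] * W[k])
assert np.isclose(X[i, j, k], x_ijk)
```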
To quantify the difference between $\mathbf{Y}$ and $\mathbf{X}$, an objective function $\varepsilon$ is formulated based on the Euclidean distance. Given that $\mathbf{Y}$ contains only a limited set of known entries $\Lambda$, the objective function is defined as
$\varepsilon = \frac{1}{2} \sum_{y_{ijk} \in \Lambda} \left( y_{ijk} - x_{ijk} \right)^2.$ (6)
To accurately capture the non-negativity of PC data, this study incorporates non-negativity constraints on the LFs within the tensor factorization model. Specifically, (5) is substituted into (6), and non-negativity constraints are added to the objective function, thereby transforming (6) into
$\varepsilon = \frac{1}{2} \sum_{y_{ijk} \in \Lambda} \left( y_{ijk} - \sum_{r=1}^{R} u_{ir} o_{jr} w_{kr} \right)^2, \quad \text{s.t. } \forall i \in I, j \in J, k \in K, r \in \{1, 2, \ldots, R\}: \; u_{ir} \ge 0, \; o_{jr} \ge 0, \; w_{kr} \ge 0.$ (7)
To address the imbalance in the distribution of known entries in $\mathbf{Y}$, $L_2$ regularization is introduced (as suggested in [38,39,40]) to prevent overfitting by increasing the penalty on outliers, thereby enhancing the model's generalization ability. Consequently, the objective function $\varepsilon$ is updated as
$\varepsilon = \frac{1}{2} \sum_{y_{ijk} \in \Lambda} \left[ \left( y_{ijk} - \sum_{r=1}^{R} u_{ir} o_{jr} w_{kr} \right)^2 + \lambda_a \sum_{r=1}^{R} \left( u_{ir}^2 + o_{jr}^2 + w_{kr}^2 \right) \right], \quad \text{s.t. } \forall i \in I, j \in J, k \in K, r \in \{1, 2, \ldots, R\}: \; u_{ir} \ge 0, \; o_{jr} \ge 0, \; w_{kr} \ge 0,$ (8)
where $\lambda_a$ denotes the $L_2$ regularization coefficient.

3. RMNTF Model

3.1. Decoupling Non-Negativity Constraints in Tensor Factorization

In this paper, three decision parameter matrices (DPs) are introduced for the LFs U, O, and W, denoted as $Q^{(U)} \in \mathbb{R}^{|I| \times R}$, $Q^{(O)} \in \mathbb{R}^{|J| \times R}$, and $Q^{(W)} \in \mathbb{R}^{|K| \times R}$, respectively. By incorporating DPs, the model decouples the optimization process from the non-negativity constraints, enabling more efficient learning while maintaining flexibility in the optimization of the LFs.
The LFs are generated through a mapping function f applied to the elements of the DPs, defined as
$u_{ir} = f(Q^{(U)}_{ir}), \quad o_{jr} = f(Q^{(O)}_{jr}), \quad w_{kr} = f(Q^{(W)}_{kr}), \quad \text{s.t. } \forall i \in I, j \in J, k \in K, r \in \{1, 2, \ldots, R\},$ (9)
where $Q^{(U)}_{ir}$, $Q^{(O)}_{jr}$, and $Q^{(W)}_{kr}$ are elements of the DPs $Q^{(U)}$, $Q^{(O)}$, and $Q^{(W)}$, respectively. This element-wise mapping ensures that the generated LFs automatically satisfy the non-negativity constraints.
To ensure non-negativity, the mapping function f, based on [41], must satisfy the following conditions:
$\forall x \in \mathbb{R}: \quad f(x) \ge 0 \ \text{(non-negativity)}; \quad x = f^{-1}(y) \ \text{(invertibility)}; \quad f'(x) \ne 0 \ \text{(non-zero derivative)}.$ (10)
Based on these conditions, the sigmoid function is selected as the mapping function, defined as
$f(x) = \frac{1}{1 + e^{-x}}, \qquad f'(x) = f(x) \cdot (1 - f(x)).$ (11)
The derivative $f'(x)$ approaches zero as $|x| \to \infty$ but never equals zero, thus satisfying all necessary conditions.
The flexibility of the sigmoid function allows the DPs $Q^{(U)}$, $Q^{(O)}$, and $Q^{(W)}$ to take values freely in the real number domain while ensuring that the LFs are non-negative. This characteristic guarantees that the reconstructed tensor $\mathbf{X}$ remains non-negative, consistent with the physical significance of PC data. Furthermore, the nonlinear mapping enhances the model's ability to capture complex relationships and patterns, thereby improving overall representation performance.
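A minimal sketch of this DP-to-LF mapping, with illustrative matrix shapes:

```python
import numpy as np

def f(x):
    # Sigmoid mapping of (11): output always lies in (0, 1).
    return 1.0 / (1.0 + np.exp(-x))

def f_prime(x):
    s = f(x)
    return s * (1.0 - s)  # f'(x) = f(x)(1 - f(x)), used by the chain rule in (18)

Q_U = np.random.randn(4, 10)  # unconstrained decision parameters
U = f(Q_U)                    # latent factors, guaranteed non-negative
assert (U > 0).all()
```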
As a result, the objective function $\varepsilon$, which appears in (8), is reformulated as
$\varepsilon = \frac{1}{2} \sum_{y_{ijk} \in \Lambda} \left[ \left( y_{ijk} - x_{ijk} \right)^2 + \lambda_a \sum_{r=1}^{R} \left( f(Q^{(U)}_{ir})^2 + f(Q^{(O)}_{jr})^2 + f(Q^{(W)}_{kr})^2 \right) \right], \quad \text{s.t. } \forall i \in I, j \in J, k \in K, r \in \{1, 2, \ldots, R\}.$ (12)

3.2. Adversarial Loss Regularization for Enhanced Robustness

To enhance the robustness and generalization ability of the model, an adversarial loss regularization term is introduced into the objective function. By adding carefully designed perturbations to the true values, the model simulates worst-case input variations, forcing it to learn features that are robust to these disturbances. This approach improves the model’s performance in diverse and noisy environments [42,43].
The first step is to generate adversarial samples by maximizing the following objective function, which creates perturbations that enlarge the error between the observed and reconstructed samples:
$\max L = \max \frac{1}{2} \sum_{y_{ijk} \in \Lambda} \left( y_{ijk} - x_{ijk} \right)^2.$ (13)
Maximizing this objective function generates perturbations that increase the error between the perturbed samples and the model’s predictions.
For each observed value $y_{ijk}$, perturbations are generated using the Fast Gradient Sign Method (FGSM) [42]. The FGSM computes the gradient of the objective function with respect to the observed value to determine the direction of the perturbation. Specifically, the perturbed value $\tilde{y}_{ijk}$ is given by
$\tilde{y}_{ijk} = y_{ijk} + \delta \cdot \operatorname{sign}\!\left( \frac{\partial L}{\partial y_{ijk}} \right) = y_{ijk} + \delta \cdot \operatorname{sign}(e_{ijk}),$ (14)
where $\delta$ is a hyperparameter controlling the magnitude of the perturbation, and $e_{ijk} = y_{ijk} - x_{ijk}$ is the error between the observed value and the reconstructed value, indicating the deviation from the model's reconstruction.
The sign function $\operatorname{sign}(e_{ijk})$ plays a crucial role in perturbation generation, determining the direction of the perturbation. Specifically, it operates as follows:
  • If $e_{ijk} > 0$, i.e., $y_{ijk} > x_{ijk}$, then $\operatorname{sign}(e_{ijk}) = 1$, meaning that the perturbation is added to $y_{ijk}$ (i.e., $y_{ijk}$ is increased).
  • If $e_{ijk} < 0$, i.e., $y_{ijk} < x_{ijk}$, then $\operatorname{sign}(e_{ijk}) = -1$, meaning that the perturbation is subtracted from $y_{ijk}$ (i.e., $y_{ijk}$ is decreased).
  • If $e_{ijk} = 0$, then no perturbation is applied.
This method ensures that the perturbation is applied in the direction of the model’s prediction error, maximizing the model’s ability to adapt to worst-case input scenarios.
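The perturbation step of (14) reduces to a few lines, as the sketch below shows with illustrative values; note that sign(0) = 0 reproduces the no-perturbation case of the third bullet:

```python
import numpy as np

def perturb(y_obs, x_hat, delta):
    # FGSM-style perturbation of (14): push each observed value along the sign
    # of its reconstruction error e = y - x.
    e = y_obs - x_hat
    return y_obs + delta * np.sign(e)

y_obs = np.array([0.8, 0.2, 0.5])
x_hat = np.array([0.6, 0.3, 0.5])
print(perturb(y_obs, x_hat, delta=0.05))  # [0.85 0.15 0.5 ]
```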
Based on the perturbed value $\tilde{y}_{ijk}$, the adversarial regularization term $L_{\mathrm{adv}}$ is defined to incorporate the perturbation's effect into the model's training. This regularization term measures the difference between the perturbed sample and the reconstructed data, and it is given by
$L_{\mathrm{adv}} = \frac{1}{2} \sum_{y_{ijk} \in \Lambda} \left( \tilde{y}_{ijk} - x_{ijk} \right)^2 = \frac{1}{2} \sum_{y_{ijk} \in \Lambda} \left( e_{ijk} + \delta \cdot \operatorname{sign}(e_{ijk}) \right)^2.$ (15)
The purpose of this regularization term is to dynamically adjust the instance error through perturbations, thereby enhancing the model’s sensitivity to disturbances.
By adding the regularization term (15) to the objective function (12), the adversarially regularized objective function $\tilde{\varepsilon}$ becomes
$\tilde{\varepsilon} = \varepsilon + \lambda_b L_{\mathrm{adv}},$ (16)
where $\lambda_b$ is a hyperparameter controlling the weight of the adversarial regularization term, balancing the influence of the original error and the adversarial loss. By minimizing $\tilde{\varepsilon}$, the model not only reduces the prediction error on the original data but also strengthens its robustness to perturbed data. Figure 3 presents the procedure for perturbing the tensor $\mathbf{Y}$ to generate the perturbed tensor $\tilde{\mathbf{Y}}$.

3.3. Stochastic Gradient Descent (SGD)-Based Tensor Factorization Model

According to [27,28,44], SGD reduces computational complexity by iteratively optimizing the gradient of individual samples (or mini-batches), making it particularly suitable for high-dimensional and large-scale tensor factorization. Additionally, its simplicity and ease of implementation further enhance its applicability. Therefore, this study employs SGD to minimize the objective function in (16), with the update rules specified as follows:
$\arg\min_{Q^{(U)}, Q^{(O)}, Q^{(W)}} \tilde{\varepsilon} \xrightarrow{\ \text{SGD}\ } \forall i \in I, j \in J, k \in K, r \in \{1, \ldots, R\}:$
$Q^{(U)t+1}_{ir} \leftarrow Q^{(U)t}_{ir} - \eta \frac{\partial \tilde{\varepsilon}^t_{ijk}}{\partial Q^{(U)t}_{ir}}; \quad Q^{(O)t+1}_{jr} \leftarrow Q^{(O)t}_{jr} - \eta \frac{\partial \tilde{\varepsilon}^t_{ijk}}{\partial Q^{(O)t}_{jr}}; \quad Q^{(W)t+1}_{kr} \leftarrow Q^{(W)t}_{kr} - \eta \frac{\partial \tilde{\varepsilon}^t_{ijk}}{\partial Q^{(W)t}_{kr}},$ (17)
where $\tilde{\varepsilon}_{ijk}$ is the element-wise error term in the objective function $\tilde{\varepsilon}$, representing the contribution of the error at position $(i, j, k)$ to the entire objective function.
The detailed computation of the gradients of $\tilde{\varepsilon}_{ijk}$ with respect to the DPs $Q^{(U)}$, $Q^{(O)}$, and $Q^{(W)}$ in (17) is as follows:
$\frac{\partial \tilde{\varepsilon}^t_{ijk}}{\partial Q^{(U)t}_{ir}} = f'(Q^{(U)t}_{ir}) \left[ \lambda_a f(Q^{(U)t}_{ir}) - \left( e^t_{ijk} + \lambda_b (\tilde{y}_{ijk} - x_{ijk}) \right) f(Q^{(O)t}_{jr}) f(Q^{(W)t}_{kr}) \right];$
$\frac{\partial \tilde{\varepsilon}^t_{ijk}}{\partial Q^{(O)t}_{jr}} = f'(Q^{(O)t}_{jr}) \left[ \lambda_a f(Q^{(O)t}_{jr}) - \left( e^t_{ijk} + \lambda_b (\tilde{y}_{ijk} - x_{ijk}) \right) f(Q^{(U)t}_{ir}) f(Q^{(W)t}_{kr}) \right];$
$\frac{\partial \tilde{\varepsilon}^t_{ijk}}{\partial Q^{(W)t}_{kr}} = f'(Q^{(W)t}_{kr}) \left[ \lambda_a f(Q^{(W)t}_{kr}) - \left( e^t_{ijk} + \lambda_b (\tilde{y}_{ijk} - x_{ijk}) \right) f(Q^{(U)t}_{ir}) f(Q^{(O)t}_{jr}) \right].$ (18)

3.4. Incorporating Momentum in TF

The Momentum Method is a technique used to accelerate the optimization of gradient descent [45,46]. By introducing the concept of “momentum” in parameter updates, it reduces oscillations during training and speeds up convergence. The core idea of the Momentum Method is to apply an exponentially weighted moving average of past gradients to the current gradient update. This means that, when updating parameters, it not only considers the current gradient but also accumulates information from previous gradients, thus accelerating in steep directions and reducing oscillations in winding paths.
The update formulas of the Momentum Method are as follows:
$M^0 = 0; \quad M^t = \gamma M^{t-1} + \eta \nabla L(\theta^{t-1}); \quad \theta^t = \theta^{t-1} - M^t,$ (19)
where $\theta^t$ represents the parameter value at the t-th iteration, and $M^t$ represents the updated "momentum", i.e., the weighted sum of the current gradient and the historical gradients.
To accelerate convergence in solving the tensor factorization model, the SGD algorithm is improved. By combining (17) and (19), the update rule for the decision parameter matrix $Q^{(U)}_{ir}$ based on the Momentum Method is given as
$M^{Q(U),0}_{ir} = 0; \quad M^{Q(U),t+1}_{ir} = \gamma M^{Q(U),t}_{ir} + \eta \frac{\partial \tilde{\varepsilon}^t_{ijk}}{\partial Q^{(U)t}_{ir}}; \quad Q^{(U)t+1}_{ir} = Q^{(U)t}_{ir} - M^{Q(U),t+1}_{ir}.$ (20)
Since the momentum-based update rules of $Q^{(O)}_{jr}$ and $Q^{(W)}_{kr}$ are similar to those of $Q^{(U)}_{ir}$, they are not presented here. In this paper, three auxiliary matrices, $M^{Q(U)} \in \mathbb{R}^{|I| \times R}$, $M^{Q(O)} \in \mathbb{R}^{|J| \times R}$, and $M^{Q(W)} \in \mathbb{R}^{|K| \times R}$, are used to record the historical momenta of the DPs $Q^{(U)}$, $Q^{(O)}$, and $Q^{(W)}$. Based on (17) and (20), the detailed update rules of the DPs are as follows:
$M^{Q(U),t+1}_{ir} = \gamma M^{Q(U),t}_{ir} + \eta \frac{\partial \tilde{\varepsilon}^t_{ijk}}{\partial Q^{(U)t}_{ir}}, \quad Q^{(U)t+1}_{ir} = Q^{(U)t}_{ir} - M^{Q(U),t+1}_{ir};$
$M^{Q(O),t+1}_{jr} = \gamma M^{Q(O),t}_{jr} + \eta \frac{\partial \tilde{\varepsilon}^t_{ijk}}{\partial Q^{(O)t}_{jr}}, \quad Q^{(O)t+1}_{jr} = Q^{(O)t}_{jr} - M^{Q(O),t+1}_{jr};$
$M^{Q(W),t+1}_{kr} = \gamma M^{Q(W),t}_{kr} + \eta \frac{\partial \tilde{\varepsilon}^t_{ijk}}{\partial Q^{(W)t}_{kr}}, \quad Q^{(W)t+1}_{kr} = Q^{(W)t}_{kr} - M^{Q(W),t+1}_{kr}.$ (21)
This completes the presentation of the RMNTF model.
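To tie (14), (18), and (21) together, the following Python sketch performs one momentum-based update for a single known entry. The hyperparameter defaults are illustrative assumptions, and the loop over r is vectorized across each row of the DP matrices:

```python
import numpy as np

def f(x):
    return 1.0 / (1.0 + np.exp(-x))  # sigmoid mapping of (11)

def update_entry(i, j, k, y, Q_U, Q_O, Q_W, M_U, M_O, M_W,
                 eta=0.1, gamma=0.5, lam_a=5e-4, lam_b=1e-2, delta=0.05):
    u, o, w = f(Q_U[i]), f(Q_O[j]), f(Q_W[k])  # rank-R latent factor rows
    x = np.sum(u * o * w)                      # reconstruction, per (5)
    e = y - x                                  # instance error e_ijk
    y_tilde = y + delta * np.sign(e)           # FGSM perturbation, per (14)
    coeff = e + lam_b * (y_tilde - x)          # shared error term in (18)

    # Gradients per (18); f'(x) = f(x)(1 - f(x)) appears as u * (1 - u), etc.
    g_U = u * (1.0 - u) * (lam_a * u - coeff * o * w)
    g_O = o * (1.0 - o) * (lam_a * o - coeff * u * w)
    g_W = w * (1.0 - w) * (lam_a * w - coeff * u * o)

    # Momentum accumulation and parameter step, per (21).
    for M, Q, g, idx in ((M_U, Q_U, g_U, i), (M_O, Q_O, g_O, j), (M_W, Q_W, g_W, k)):
        M[idx] = gamma * M[idx] + eta * g
        Q[idx] -= M[idx]
```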

3.5. Algorithm Design and Analysis

Building on the above, this paper presents Algorithm 1, which aims to optimize model performance by updating the DPs. This process involves calculating historical momentum and applying dynamic error perturbations. The computational complexity is
$C = \Theta\left( (|I| + |J| + |K|) \times 2R + t \times |\Lambda| \times 2R \right) \approx \Theta\left( t \times |\Lambda| \times R \right).$ (22)
Algorithm 1 RMNTF
Input: $I$, $J$, $K$, $R$, $\Lambda$, $\eta$, $\gamma$, $\lambda_a$, $\lambda_b$, $\delta$
Output: $f(Q^{(U)})$, $f(Q^{(O)})$, $f(Q^{(W)})$
Operation (with cost per step)
1: Initialize $Q^{(U)} \in \mathbb{R}^{|I| \times R}$, $Q^{(O)} \in \mathbb{R}^{|J| \times R}$, $Q^{(W)} \in \mathbb{R}^{|K| \times R}$ with random numbers (cost: $\Theta((|I| + |J| + |K|) \times R)$)
2: Initialize $M^{Q(U)} \in \mathbb{R}^{|I| \times R}$, $M^{Q(O)} \in \mathbb{R}^{|J| \times R}$, $M^{Q(W)} \in \mathbb{R}^{|K| \times R}$ with zeros (cost: $\Theta((|I| + |J| + |K|) \times R)$)
3: Initialize $t = 1$, $T = \mathit{max\_iteration\_count}$ (cost: $\Theta(1)$)
4: while $t < T$ and not converged do (cost: $\times t$)
5:   for $y_{ijk} \in \Lambda$ do (cost: $\times \Theta(|\Lambda|)$)
6:     $x_{ijk} = \sum_{r=1}^{R} f(Q^{(U)}_{ir}) \cdot f(Q^{(O)}_{jr}) \cdot f(Q^{(W)}_{kr})$ (cost: $\Theta(R)$)
7:     Compute $\tilde{y}_{ijk}$ using (14) (cost: $\Theta(1)$)
8:     for $r = 1$ to $R$ do (cost: $\times R$)
9:       $M^{Q(U)}_{ir} \leftarrow \gamma M^{Q(U)}_{ir} + \eta \, \partial \tilde{\varepsilon}_{ijk} / \partial Q^{(U)}_{ir}$ (cost: $\Theta(1)$)
10:      $M^{Q(O)}_{jr} \leftarrow \gamma M^{Q(O)}_{jr} + \eta \, \partial \tilde{\varepsilon}_{ijk} / \partial Q^{(O)}_{jr}$ (cost: $\Theta(1)$)
11:      $M^{Q(W)}_{kr} \leftarrow \gamma M^{Q(W)}_{kr} + \eta \, \partial \tilde{\varepsilon}_{ijk} / \partial Q^{(W)}_{kr}$ (cost: $\Theta(1)$)
12:      $Q^{(U)}_{ir} \leftarrow Q^{(U)}_{ir} - M^{Q(U)}_{ir}$ (cost: $\Theta(1)$)
13:      $Q^{(O)}_{jr} \leftarrow Q^{(O)}_{jr} - M^{Q(O)}_{jr}$ (cost: $\Theta(1)$)
14:      $Q^{(W)}_{kr} \leftarrow Q^{(W)}_{kr} - M^{Q(W)}_{kr}$ (cost: $\Theta(1)$)
15:    end for
16:  end for
17: end while
In practical scenarios (as shown in Table 2), $|\Lambda| \gg \max\{|I|, |J|, |K|\}$. Thus, lower-order terms and coefficients are omitted to derive (22). Furthermore, given that t and R are small constants relative to $|\Lambda|$, the computational complexity of RMNTF can be inferred to be linear with respect to the number of known elements $|\Lambda|$.
The storage complexity is primarily determined by three factors: (1) the DPs $Q^{(U)}$, $Q^{(O)}$, and $Q^{(W)}$ and their corresponding LFs U, O, and W; (2) the auxiliary matrices $M^{Q(U)}$, $M^{Q(O)}$, and $M^{Q(W)}$ used to store the historical momentum information of the DPs; and (3) the entries in the set $\Lambda$, along with their corresponding reconstructed values. Therefore, the storage complexity of the model is given by
$S = \Theta\left( (|I| + |J| + |K|) \times 3R + 2 \times |\Lambda| \right) \approx \Theta\left( (|I| + |J| + |K|) \times R + |\Lambda| \right).$ (23)
As (23) shows, the storage complexity of the RMNTF model grows linearly with both the number of known tensor entries and the dimensionality of the latent factors.

4. Experimental Results and Discussion

4.1. Experimental Setup and Evaluation

4.1.1. Dataset Normalization

Excessively large PC values may lead to numerical instability or overflow during optimization. To ensure stable and robust updates, the datasets are normalized by rescaling the values to the range [0, 10] using the following formula:
$y' = 10 \times \frac{y - y_{\min}}{y_{\max} - y_{\min}}.$ (24)
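A direct translation of (24), NaN-aware so that unknown entries do not distort the bounds; the sample values are hypothetical:

```python
import numpy as np

def normalize(y):
    # Min-max rescaling of (24), mapping the known value range onto [0, 10].
    y = np.asarray(y, dtype=float)
    y_min, y_max = np.nanmin(y), np.nanmax(y)
    return 10.0 * (y - y_min) / (y_max - y_min)

print(normalize([0.0, 250.0, 500.0]))  # [ 0.  5. 10.]
```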

4.1.2. Training Settings

All experiments were conducted on a system with an Intel Core i7-10700 processor (2.9 GHz) and 16 GB of RAM, running Python 3.11.5 as the primary programming environment. For the comparison models, those using neural networks were accelerated with GPU support from a cloud platform, specifically an NVIDIA RTX 2080 Ti (11 GB VRAM).
To ensure the reliability of the experimental results, the dataset was divided into non-overlapping training ( Φ ), validation ( Ψ ), and testing ( Ω ) sets, as detailed in Table 3. The model optimized the LFs by minimizing the objective function on the training set, with convergence assessed based on validation metrics. Finally, the learned LFs were evaluated on the test set to measure model performance.
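The split itself can be as simple as the following sketch; the 70/10/20 ratios are illustrative assumptions rather than the actual proportions of Table 3:

```python
import numpy as np

rng = np.random.default_rng(42)
known_idx = np.arange(10_000)  # stand-in for the indices of the known entries
rng.shuffle(known_idx)

n_train = int(0.7 * len(known_idx))
n_val = int(0.1 * len(known_idx))
phi = known_idx[:n_train]                 # training set
psi = known_idx[n_train:n_train + n_val]  # validation set (convergence checks)
omega = known_idx[n_train + n_val:]       # test set (final evaluation)
```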
To further guarantee fairness and objectivity, the following experimental protocols were implemented:
  • To reduce random errors and enhance reliability, each experiment was repeated 20 times on each dataset.
  • Training was terminated under either of the following conditions: the change in the validation error over three consecutive iterations was less than $1 \times 10^{-6}$ (indicating convergence), or the maximum number of iterations reached 500.
  • The latent factor dimension R was fixed at 10 across all TF models to balance computational efficiency and ensure comparability.
  • All model hyperparameters were optimized using a grid search to achieve optimal performance.

4.1.3. Evaluation Metrics

The performance of PC data imputation reflects the model’s ability to capture inherent patterns in incomplete tensors and accurately fit the data structure and feature distribution. To comprehensively evaluate the model’s performance, two metrics are employed: the Root Mean Square Error (RMSE) and the Mean Absolute Error (MAE) [47,48,49]. The RMSE measures the deviation between predicted and true values, with sensitivity to larger errors, making it suitable for extreme data scenarios. In contrast, the MAE calculates the average absolute differences, providing a more intuitive assessment of overall error levels. Together, these metrics offer a complementary evaluation of the model’s imputation capability, defined as follows:
$\mathrm{RMSE} = \sqrt{ \frac{ \sum_{y_{ijk} \in D} \left( y_{ijk} - x_{ijk} \right)^2 }{ |D| } }; \qquad \mathrm{MAE} = \frac{ \sum_{y_{ijk} \in D} \left| y_{ijk} - x_{ijk} \right| }{ |D| },$ (25)
where D may be either Ψ or Ω .
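Both metrics of (25) are one-liners in NumPy; y_true holds the known entries of D and y_pred the corresponding reconstructed values:

```python
import numpy as np

def rmse(y_true, y_pred):
    return np.sqrt(np.mean((np.asarray(y_true) - np.asarray(y_pred)) ** 2))

def mae(y_true, y_pred):
    return np.mean(np.abs(np.asarray(y_true) - np.asarray(y_pred)))

print(rmse([1.0, 2.0], [1.1, 1.8]), mae([1.0, 2.0], [1.1, 1.8]))
```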

4.2. Comparison of TF and NTF Model Performance

To investigate the impact of the sigmoid activation function on model performance, a comparison is made between the TF and NTF models. Figure 4 illustrates the performance of both models across different datasets. The following conclusions can be drawn from the results:
  • Due to the effect of the sigmoid function, the TF and NTF models exhibit different sensitivities to the learning rate $\eta$. However, as $\eta$ increases, both models show a decrease in the iteration count, accompanied by a slight increase in imputation errors. For instance, in Figure 4a–d, on the D1.1 dataset, as $\eta$ increases from $1 \times 10^{-4}$ to $6 \times 10^{-4}$, TF's RMSE increases from 0.1279 to 0.1295, with the iteration count decreasing from 327 to 87; the MAE rises from 0.0617 to 0.0636, and the iteration count drops from 465 to 96. Similarly, on the D2.1 dataset, as $\eta$ increases from 0.1 to 0.6, NTF's RMSE increases from 0.2258 to 0.2274, with the iteration count decreasing from 109 to 21; the MAE increases from 0.0418 to 0.0426, and the iteration count drops from 132 to 28.
  • NTF achieves lower imputation errors than TF. As shown in Figure 4a,c, on the D1.2 dataset, when the learning rate $\eta$ is set to $1 \times 10^{-4}$, TF's RMSE and MAE are 0.1395 and 0.0678, respectively. When $\eta$ is set to 0.1, NTF's RMSE and MAE are 0.1356 and 0.0616, showing reductions of 2.80% and 9.14%, respectively (computed as $(\mathrm{Value}_{\mathrm{high}} - \mathrm{Value}_{\mathrm{low}}) / \mathrm{Value}_{\mathrm{high}}$). On the D2.2 dataset, TF's RMSE and MAE are 0.2346 and 0.0440, respectively. When $\eta$ is set to 0.1, NTF's RMSE and MAE are 0.2262 and 0.0424, showing improvements of 3.58% and 3.64%, respectively. Similar results are observed for other learning rates and datasets.

4.3. Hyperparameter Sensitivity Analysis

As discussed in Section 4.2, the choice of the hyperparameter $\eta$ is critical for model performance. Additionally, as outlined in Section 3, the model's performance is also influenced by the $L_2$ regularization coefficient $\lambda_a$, the adversarial regularization coefficient $\lambda_b$, the perturbation magnitude $\delta$, and the momentum coefficient $\gamma$. This section, therefore, provides an experimental analysis of the impact of these parameters on model performance.

4.3.1. Impacts of $\lambda_a$ and $\lambda_b$

With the other hyperparameters fixed, Figure 5a,b shows the effects of increasing $\lambda_a$ from $5 \times 10^{-5}$ to $5 \times 10^{-3}$ on the model imputation error and iteration count. Figure 5c,d illustrates the impact of increasing $\lambda_b$ from $1 \times 10^{-4}$ to $1 \times 10^{0}$. Based on these results, the following conclusions can be drawn:
  • The selection of $\lambda_a$ is critical to model performance. A smaller $\lambda_a$ can result in overfitting, while a larger $\lambda_a$ can degrade the model's ability to fit the data. As shown in Figure 5a, for D1.1, as $\lambda_a$ increases from $5 \times 10^{-5}$ to $5 \times 10^{-4}$, the RMSE decreases from 0.12546 to 0.1239. However, when $\lambda_a$ is further increased to $5 \times 10^{-3}$, the RMSE rises to 0.1317. Similarly, for D2.1, as $\lambda_a$ increases from $5 \times 10^{-5}$ to $1 \times 10^{-4}$ and then to $5 \times 10^{-3}$, the MAE decreases from 0.0437 to 0.0428 and then increases to 0.0468. Additionally, the number of iterations increases with $\lambda_a$. As illustrated in Figure 5b, in D1.2, the iteration count for the MAE rises from 138 to 500 as $\lambda_a$ increases from $5 \times 10^{-5}$ to $5 \times 10^{-3}$. In D2.2, for the RMSE, the iteration count increases from 55 to 121 as $\lambda_a$ increases from $5 \times 10^{-5}$ to $5 \times 10^{-3}$.
  • The appropriate selection of $\lambda_b$ is critical for reducing overfitting and enhancing the robustness of the model, thereby achieving lower imputation errors. As illustrated in Figure 5c, for dataset D1.1, when $\lambda_b = 1 \times 10^{-2}$, the RMSE and MAE reach their minimum values of 0.1237 and 0.0551, respectively. Similarly, for D2.1, when $\lambda_b = 1 \times 10^{-1}$, the RMSE and MAE are minimized to 0.2262 and 0.0396, respectively. The iteration count of the model decreases as $\lambda_b$ grows. As shown in Figure 5d, when $\lambda_b$ increases from $1 \times 10^{-4}$ to $1 \times 10^{0}$, the iteration counts for the RMSE and MAE decrease from 120 to 72 and from 209 to 97 in D1.2, respectively. In D2.2, the iteration counts for the RMSE and MAE decrease from 48 to 21 and from 37 to 18, respectively.

4.3.2. Impacts of γ and δ

As shown in Figure 6, changes in γ and δ significantly affect the model’s imputation error and the number of iterations. The key observations are as follows:
  • $\gamma$ controls the accumulation of historical gradients. As $\gamma$ increases, the number of iterations falls, while the imputation error shows no significant change. As shown in Figure 6a,b, for the D1.1 dataset, as $\gamma$ increases from 0.1 to 0.6, the RMSE changes from 0.1236 to 0.1242, and the MAE changes from 0.05784 to 0.0580. The iteration counts for the RMSE and MAE decrease from 78 to 34 and from 166 to 66, respectively. In the D2.1 dataset, the RMSE changes from 0.2268 to 0.2280, and the MAE changes from 0.04062 to 0.04046. The iteration counts for the RMSE and MAE decrease from 42 to 22 and from 33 to 14, respectively. This indicates that, although increasing $\gamma$ accelerates convergence, it has little impact on the imputation error.
  • The noise perturbation $\delta$ controls the model's robustness by simulating uncertainties or fluctuations in the data. An appropriate value of $\delta$ enhances the model's stability and adaptability when facing such uncertainties. As shown in Figure 6c, on the D1.2 dataset, when $\delta$ is set to 0.05 and 0.1, the RMSE and MAE reach their optimal values of 0.1241 and 0.0551, respectively. On the D2.2 dataset, the RMSE is minimized at $\delta = 0.05$ with a value of 0.0227, while the MAE decreases from 0.0445 to 0.0380 as $\delta$ increases from 0.005 to 1. As $\delta$ increases, the model's iteration count decreases. As shown in Figure 6d, on the D1.1 dataset, the iteration counts for the RMSE and MAE decrease from 105 and 189 to 58 and 67, respectively, as $\delta$ increases from 0.005 to 1. On the D2.1 dataset, the iteration counts for the RMSE and MAE decrease from 52 and 32 to 28 and 15, respectively.

4.4. Comparison with State-of-the-Art Models

4.4.1. Models for Comparison

The seven state-of-the-art models used for comparison, together with the proposed model, are described in detail below:
  • M1: Time-SVD++ [50], a model based on two factor matrices that incorporates time-dependent biases and dynamic feature vectors to effectively capture temporal dynamics during factorization.
  • M2: PLFT [28], a TF model integrating PID control principles with SGD to enhance optimization and convergence. It employs linear bias vectors for data fitting and $L_2$ regularization to prevent overfitting and improve stability.
  • M3: NeuLFT [44], a nonlinear TF model for HDI tensors. It integrates rank-one tensors as hidden neurons in a single-layer feedback network, uses an adaptive backward propagation (ABP) learning scheme with Momentum-SGD and Particle Swarm Optimization (PSO), applies a sigmoid activation function for nonlinear modeling, and incorporates linear bias tensors with $L_2$ regularization.
  • M4: HaLRTC [9], a low-rank tensor completion model that extends NNM to tensors using a novel tensor trace norm definition. It employs the Alternating Direction Method of Multipliers (ADMM) for efficient optimization, enabling accurate estimations of missing values in high-dimensional data.
  • M5: RTC-ASVD [51], a robust tensor completion model leveraging approximate singular value decomposition (SVD) with fast QR decomposition and tensor $\gamma$-nuclear norm regularization. It combines discrete Fourier transform (DFT) techniques and ADMM for optimization, effectively addressing missing and corrupted data in high-dimensional tensors.
  • M6: NT-DPTC [52], a tensor completion model using dimension-preserved (DP) tensor decomposition to maintain traffic data structure. It decomposes the tensor into three latent factor tensors, applies a sigmoid function for non-negativity, and uses AdamW for efficient optimization and convergence.
  • M7: LSTM [15], a specialized RNN model with memory cells and gating mechanisms to capture long-term dependencies in time series data, particularly suited for tasks like multi-step forecasting and imputation.
  • M8: RMNTF, the model proposed in this paper.

4.4.2. Theoretical Complexity Comparison

In this section, the theoretical time and space complexities of the proposed RMNTF (M8) model are compared with those of three baseline models: TF, HaLRTC (M4), and NT-DPTC (M6). The TF model, as the most basic tensor factorization method, serves as a baseline for comparison; M4 is a classical low-rank tensor completion model based on matrix NNM and is widely used for tensor completion tasks; M6 employs DP decomposition and trains solely on observed data, sharing similarities with M8 in terms of tensor factorization and optimization strategies. By comparing the theoretical complexities of these three models, the theoretical advantages of M8 can be highlighted. Table 4 presents the time and space complexities of the models, and the following conclusions can be drawn:
In the case of the HDI tensor, M8 demonstrates strong competitiveness in terms of complexity compared to the other models. As shown in Table 4, M8 and TF exhibit identical time and space complexities, indicating that M8 does not introduce significant additional complexity compared to TF. M4, due to its optimization process requiring matrix singular value decomposition and the reconstruction of the complete tensor, exhibits significantly higher time and space complexities. When the tensor is extremely sparse, i.e., $|I| \times |J| \times |K| \gg |\Lambda|$, the complexity of M4 becomes even more pronounced. Compared to M6, M8 exhibits superior time and space complexities, making it more efficient in handling large-scale sparse data.

4.4.3. Comparison Results

Table 5 presents the imputation errors of the various models across datasets, reflecting their performance under optimal parameters. Table 6 complements this by comparing the time consumption of each model, providing additional context on computational efficiency. From these results, the following findings can be observed:
  • M8 achieves lower imputation errors across all datasets, outperforming all other comparable models. As shown in Table 5, on the D1.1 dataset, M8 has an RMSE of 0.1247 and an MAE of 0.0568, which are 1.92% and 10.83% lower than those of the second-best model, M2, with RMSE and MAE values of 0.1271 and 0.0637, respectively. On the D2.1 dataset, M8 has an RMSE of 0.2262 and an MAE of 0.0376, which are 1.48% and 10.40% lower than those of M2, with RMSE and MAE values of 0.2296 and 0.0422, respectively. On the D1.2 dataset, M8 reduces the RMSE by an average of 14.59% and the MAE by 17.37% compared to all other models. On the D2.2 dataset, M8 reduces the RMSE by an average of 5.31% and the MAE by 25.53% compared to all other models. Overall, M8 demonstrates superior imputation accuracy across all datasets, further solidifying its potential for practical data imputation tasks.
  • M8 demonstrates significant efficiency advantages over the other models. As shown in Table 6, on the D2.1 dataset, the computation times of the RMSE and MAE with M8 are 11.28 and 11.56, respectively, lower than those of the second-best model M1, which takes 13.89 and 14.23, representing reductions of 18.79% and 18.76%, respectively. On the D2.2 dataset, M8 further reduces the RMSE and MAE computation times to 7.30 and 7.96, a decrease of 19.43% and 15.77% compared to those of M2. On the D1.1 dataset, M8 achieves an average reduction of 66.56% and 60.77% in computation times for the RMSE and MAE, respectively, compared to the other models. On the D1.2 dataset, M8 also shows superior performance, with average reductions of 64.34% and 63.46% for the RMSE and MAE computation times.
To assess the statistical significance of the performance differences between the models, the Wilcoxon signed-rank test is applied. This non-parametric paired comparison method evaluates three key metrics: $R^+$ (the sum of ranks for positive differences), $R^-$ (the sum of ranks for negative differences), and the p-value. A larger $R^+$ value indicates better performance, while the p-value determines statistical significance. When the p-value is below the significance level (typically 0.05), the difference is considered statistically significant. Table 7 and Table 8 summarize the Wilcoxon test results based on Table 5 and Table 6, where the RMSE and MAE are used as evaluation metrics across four datasets. Thus, each model corresponds to eight comparison cases. The following conclusions can be drawn from these statistical results:
M8 demonstrates significant improvements in both accuracy and efficiency compared to the other models. As shown in Table 7 and Table 8, M8 consistently achieves high $R^+$ values and low p-values. Moreover, M8 outperforms all other models in nearly all test scenarios, with the $R^-$ values being zero in 13 out of 14 comparisons, as indicated in the tables. The only exception is in the comparison of computational efficiency between M8 and M1, where M8's computation time slightly exceeds that of M1 when using the MAE metric on the D1.1 and D1.2 datasets, as recorded in Table 6.
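For reference, such a paired comparison can be run with SciPy as sketched below; the two error arrays are hypothetical per-case values for illustration only, not the paper's recorded results:

```python
from scipy.stats import wilcoxon

# Hypothetical paired errors for M8 and a competitor across the eight
# comparison cases (RMSE and MAE on four datasets); illustrative numbers only.
m8 = [0.1247, 0.0568, 0.2262, 0.0376, 0.1301, 0.0589, 0.2271, 0.0382]
rival = [0.1271, 0.0637, 0.2296, 0.0422, 0.1392, 0.0671, 0.2340, 0.0430]

# wilcoxon() ranks the paired differences; a small two-sided p-value indicates
# a statistically significant difference between the two models' errors.
stat, p_value = wilcoxon(rival, m8)
print(stat, p_value)
```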

4.5. Summary

The experimental results demonstrate that the RMNTF model excels in two key areas: (a) efficient computational performance and (b) accurate and stable imputation capability, particularly for handling high-missing-rate PC data. These strengths establish RMNTF as a reliable solution for PC data imputation tasks.

5. Discussion

The findings of this study provide valuable insights into PC data analyses, particularly in the context of high-dimensional and incomplete data. Our proposed RMNTF model demonstrates significant improvements in both accuracy and computational efficiency over existing methods. This section discusses the results in relation to those of previous studies, the working hypotheses, and their broader implications, and it suggests potential directions for future research.
Our results support the working hypothesis that integrating adversarial loss regularization and $L_2$ regularization enhances the robustness of tensor factorization models in the presence of incomplete data. RMNTF's superior performance, evidenced by lower RMSE and MAE values across all datasets, highlights the effectiveness of these regularization techniques. This is consistent with prior studies, which emphasize the benefits of regularization in improving generalization and stability in machine learning models [53,54]. Another key contribution of our model is the enforcement of non-negativity constraints via the sigmoid function, ensuring that imputed values are physically meaningful and interpretable. This aligns with the inherent characteristics of PC data, which cannot be negative, and represents a significant advancement over models that do not incorporate such constraints. The momentum-based optimization approach in RMNTF has proven to be an effective tool for accelerating convergence, reducing the computational burden compared to that of traditional tensor factorization methods. This is especially relevant for large-scale applications, where time and resource efficiency are critical.
The implications of our findings go beyond PC data analyses. RMNTF’s ability to accurately handle high-dimensional and incomplete data makes it applicable across various domains, including finance, healthcare, and social network analyses [55,56,57], where such data characteristics are common. The model’s robustness and interpretability make it a valuable tool for decision-making in these fields.
Despite the promising results demonstrated by RMNTF, several limitations must be addressed in future work:
  • Computational Resource Requirements: While RMNTF performs well in terms of computational efficiency, its resource demands remain high when processing large-scale datasets. In resource-constrained environments, such as embedded systems or mobile devices, scalability and parallelization are crucial. Future research should focus on adapting RMNTF for distributed computing and parallel frameworks to improve efficiency in large-scale applications [58,59].
  • Complexity of Parameter Selection: The model’s performance is highly sensitive to hyperparameters (e.g., regularization and momentum coefficients). Selecting appropriate parameters often requires extensive experimentation and tuning, which adds complexity. Future work should explore smarter hyperparameter optimization methods, such as Bayesian optimization or metaheuristic algorithms, to automate the selection of optimal parameters and enhance both usability and efficiency [60,61].
  • Real-World Validation: Although RMNTF has been tested on public datasets, these may not fully capture the complexities of real industrial environments. Validating the model in real-world settings is essential to assess its effectiveness, robustness, and stability, especially in environments with higher noise and data uncertainty. Future research should focus on applying the model to industrial applications and conducting field validation to evaluate its performance in real-time data processing [62,63].
  • Limitations of Data Sources: The datasets used in this study are primarily sourced from publicly available PC datasets, which may introduce biases and limit the model’s generalizability. Future research should consider more diverse data sources, including different building types, regions, and devices, to improve the model’s adaptability and generalization capability [64].
  • Data Quality Limitations: Although the datasets have been cleaned and preprocessed, noise and outliers may still persist, affecting model accuracy. Future work should adopt more advanced noise filtering and outlier detection methods to enhance data quality and ensure stable model training and prediction [65,66].
In conclusion, the RMNTF model represents a significant advancement in tensor factorization for handling missing PC data. Its effectiveness, robustness, and efficiency position it as a leading solution for data imputation tasks. Future research will further refine and expand the model’s capabilities, ensuring its continued relevance and impact in the era of big data.

Author Contributions

D.S. was responsible for code development, data processing, and writing—editing and review. T.X. was responsible for reviewing the manuscript. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

All data supporting the findings of this study are openly available in the Zenodo repository at https://zenodo.org/records/13917372 (accessed on 16 January 2025).

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Zabin, R.; Haque, K.F.; Abdelgawad, A. PredXGBR: A Machine Learning Framework for Short-Term Electrical Load Prediction. Electronics 2024, 13, 4521.
  2. Dong, A.; Lee, S.K. The Study of Scheduling Optimization for Multi-Microgrid Systems Based on an Improved Differential Algorithm. Electronics 2024, 13, 4517.
  3. Lin, R.; Ye, Z.; Zhao, Y. OPEC: Daily load data analysis based on optimized evolutionary clustering. Energies 2019, 12, 2668.
  4. Li, Z.H.; Cui, J.X.; Chen, H.Y.; Lu, H.P.; Zhou, F.; Rocha, P.R.F.; Yang, C.Y. Research Progress of All-Fiber Optic Current Transformers in Novel Power Systems: A Review. Microw. Opt. Technol. Lett. 2025, 67, e70061.
  5. Ramadan, R.; Huang, Q.; Zalhaf, A.S.; Bamisile, O.; Li, J.; Mansour, D.E.A.; Lin, X.; Yehia, D.M. Energy Management in Residential Microgrid Based on Non-Intrusive Load Monitoring and Internet of Things. Smart Cities 2024, 7, 1907–1935.
  6. Basu, S.; Matteson, D.S. A survey of estimation methods for sparse high-dimensional time series models. arXiv 2021, arXiv:2107.14754.
  7. Beretta, L.; Santaniello, A. Nearest neighbor imputation algorithms: A critical evaluation. BMC Med. Inform. Decis. Mak. 2016, 16, 197–208.
  8. Athey, S.; Bayati, M.; Doudchenko, N.; Imbens, G.; Khosravi, K. Matrix completion methods for causal panel data models. J. Am. Stat. Assoc. 2021, 116, 1716–1730.
  9. Liu, J.; Musialski, P.; Wonka, P.; Ye, J. Tensor completion for estimating missing values in visual data. IEEE Trans. Pattern Anal. Mach. Intell. 2012, 35, 208–220.
  10. Zhang, Z.; Aeron, S. Exact tensor completion using t-SVD. IEEE Trans. Signal Process. 2016, 65, 1511–1526.
  11. Ullah, I.; Ahmad, R.; Kim, D. A prediction mechanism of energy consumption in residential buildings using hidden Markov model. Energies 2018, 11, 358.
  12. Maruotti, A. Mixed hidden Markov models for longitudinal data: An overview. Int. Stat. Rev. 2011, 79, 427–454.
  13. Duarte, O.; Duarte, J.E.; Rosero-Garcia, J. Data Imputation in Electricity Consumption Profiles through Shape Modeling with Autoencoders. Mathematics 2024, 12, 3004.
  14. Kim, J.C.; Chung, K. Recurrent neural network-based multimodal deep learning for estimating missing values in healthcare. Appl. Sci. 2022, 12, 7477.
  15. Yuan, H.; Xu, G.; Yao, Z.; Jia, J.; Zhang, Y. Imputation of missing data in time series for air pollutants using long short-term memory recurrent neural networks. In Proceedings of the 2018 ACM International Joint Conference and 2018 International Symposium on Pervasive and Ubiquitous Computing and Wearable Computers, Singapore, 8–12 October 2018; pp. 1293–1300.
  16. Lee, D.; Kim, J.; Moon, W.J.; Ye, J.C. CollaGAN: Collaborative GAN for missing image data imputation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–20 June 2019; pp. 2487–2496. Available online: https://openaccess.thecvf.com/content_CVPR_2019/html/Lee_CollaGAN_Collaborative_GAN_for_Missing_Image_Data_Imputation_CVPR_2019_paper.html (accessed on 16 January 2025).
  17. Wong, L.Z.; Chen, H.; Lin, S.; Chen, D.C. Imputing missing values in sensor networks using sparse data representations. In Proceedings of the 17th ACM International Conference on Modeling, Analysis and Simulation of Wireless and Mobile Systems, Miami Beach, FL, USA, 21–25 November 2014; pp. 227–230.
  18. Song, S.; Sun, Y.; Zhang, A.; Chen, L.; Wang, J. Enriching data imputation under similarity rule constraints. IEEE Trans. Knowl. Data Eng. 2018, 32, 275–287.
  19. Breve, B.; Caruccio, L.; Deufemia, V.; Polese, G. RENUVER: A Missing Value Imputation Algorithm based on Relaxed Functional Dependencies. In Proceedings of the EDBT 2022, Edinburgh, UK, 29 March–1 April 2022; pp. 1–52.
  20. Mohan, K.; Pearl, J. Graphical Models for Recovering Probabilistic and Causal Queries from Missing Data. Adv. Neural Inf. Process. Syst. 2014, 27, 1520–1528.
  21. Qiao, L.; Cui, Y.; Jia, Z.; Xiao, K.; Su, H. Missing well logs prediction based on hybrid kernel extreme learning machine optimized by Bayesian optimization. Appl. Sci. 2022, 12, 7838.
  22. Acar, E.; Dunlavy, D.M.; Kolda, T.G.; Mørup, M. Scalable tensor factorizations for incomplete data. Chemom. Intell. Lab. Syst. 2011, 106, 41–56.
  23. Song, Q.; Ge, H.; Caverlee, J.; Hu, X. Tensor completion algorithms in big data analytics. ACM Trans. Knowl. Discov. Data 2019, 13, 1–48.
  24. Mørup, M. Applications of tensor (multiway array) factorizations and decompositions in data mining. Wiley Interdiscip. Rev. Data Min. Knowl. Discov. 2011, 1, 24–40.
  25. Kolda, T.G.; Bader, B.W. Tensor decompositions and applications. SIAM Rev. 2009, 51, 455–500.
  26. Wu, Y.; Tan, H.; Li, Y.; Zhang, J.; Chen, X. A fused CP factorization method for incomplete tensors. IEEE Trans. Neural Netw. Learn. Syst. 2018, 30, 751–764.
  27. Luo, X.; Wu, H.; Yuan, H.; Zhou, M. Temporal pattern-aware QoS prediction via biased non-negative latent factorization of tensors. IEEE Trans. Cybern. 2019, 50, 1798–1809.
  28. Wu, H.; Luo, X.; Zhou, M.; Rawa, M.J.; Sedraoui, K.; Albeshri, A. A PID-incorporated latent factorization of tensors approach to dynamically weighted directed network analysis. IEEE/CAA J. Autom. Sin. 2021, 9, 533–546.
  29. Zhu, Y.; Wang, J.; Wang, J.; He, Z. Multitask neural tensor factorization for road traffic speed-volume correlation pattern learning and joint imputation. IEEE Trans. Intell. Transp. Syst. 2022, 23, 24550–24560.
  30. Said, A.B.; Erradi, A. Spatiotemporal tensor completion for improved urban traffic imputation. IEEE Trans. Intell. Transp. Syst. 2021, 23, 6836–6849.
  31. Jin, Y.; Yang, L. Graph-aware tensor factorization convolutional network for knowledge graph completion. Int. J. Mach. Learn. Cybern. 2024, 15, 1755–1766.
  32. Deng, T.; Wan, M.; Shi, K.; Zhu, L.; Wang, X.; Jiang, X. Short term prediction of wireless traffic based on tensor decomposition and recurrent neural network. SN Appl. Sci. 2021, 3, 779.
  33. Batra, N.; Gulati, M.; Singh, A.; Srivastava, M.B. It's Different: Insights into home energy consumption in India. In Proceedings of the 5th ACM Workshop on Embedded Systems For Energy-Efficient Buildings, Roma, Italy, 11–15 November 2013; pp. 1–8.
  34. Kolter, J.Z.; Johnson, M.J. REDD: A public data set for energy disaggregation research. In Proceedings of the Workshop on Data Mining Applications in Sustainability (SIGKDD), San Diego, CA, USA, 21 August 2011; Volume 25, pp. 59–62. Available online: https://citeseerx.ist.psu.edu/document?repid=rep1&type=pdf&doi=d85a51e2978f4563ee74bf9a09d3219e03799819 (accessed on 16 January 2025).
  35. Pickering, E.M.; Hossain, M.A.; French, R.H.; Abramson, A.R. Building electricity consumption: Data analytics of building operations with classical time series decomposition and case based subsetting. Energy Build. 2018, 177, 184–196.
  36. Singh, S.; Yassine, A. Big data mining of energy time series for behavioral analytics and energy consumption forecasting. Energies 2018, 11, 452.
  37. Wilhelm, S.; Kasbauer, J. Exploiting smart meter power consumption measurements for human activity recognition (HAR) with a motif-detection-based non-intrusive load monitoring (NILM) approach. Sensors 2021, 21, 8036.
  38. Ma, H.; Zhou, D.; Liu, C.; Lyu, M.R.; King, I. Recommender systems with social regularization. In Proceedings of the Fourth ACM International Conference on Web Search and Data Mining, Hong Kong, China, 9–12 February 2011.
  39. Zhang, W.; Sun, H.; Liu, X.; Guo, X. Temporal QoS-aware web service recommendation via non-negative tensor factorization. In Proceedings of the WWW '14: 23rd International World Wide Web Conference, Seoul, Republic of Korea, 7–11 April 2014.
  40. Takács, G.; Pilászy, I.; Németh, B.; Tikk, D. Scalable collaborative filtering approaches for large recommender systems. J. Mach. Learn. Res. 2009, 10, 623–656.
  41. Boyd, S.; Vandenberghe, L. Convex Optimization; Cambridge University Press: Cambridge, UK, 2004.
  42. Goodfellow, I.J.; Shlens, J.; Szegedy, C. Explaining and harnessing adversarial examples. arXiv 2014, arXiv:1412.6572.
  43. Kurakin, A.; Goodfellow, I.J.; Bengio, S. Adversarial examples in the physical world. In Artificial Intelligence Safety and Security; Chapman and Hall/CRC: Boca Raton, FL, USA, 2018; pp. 99–112.
  44. Luo, X.; Wu, H.; Li, Z. NeuLFT: A novel approach to nonlinear canonical polyadic decomposition on high-dimensional incomplete tensors. IEEE Trans. Knowl. Data Eng. 2022, 35, 6148–6166.
  45. Qian, N. On the momentum term in gradient descent learning algorithms. Neural Netw. 1999, 12, 145–151.
  46. Ruder, S. An overview of gradient descent optimization algorithms. arXiv 2016, arXiv:1609.04747.
  47. Mørup, M.; Dunlavy, D.M.; Acar, E.; Kolda, T.G. Scalable Tensor Factorizations with Missing Data; Technical Report; Sandia National Laboratories (SNL): Albuquerque, NM, USA; Livermore, CA, USA, 2010.
  48. Zhang, F.; Gong, T.; Lee, V.E.; Zhao, G.; Rong, C.; Qu, G. Fast Algorithms to Evaluate Collaborative Filtering Recommender Systems; Elsevier: Amsterdam, The Netherlands, 2016; Volume 96.
  49. Bhargava, P.; Phan, T.; Zhou, J.; Lee, J. Who, What, When, and Where: Multi-Dimensional Collaborative Recommendations Using Tensor Factorization on Sparse User-Generated Data. In Proceedings of the 24th International Conference on World Wide Web, Florence, Italy, 18–22 May 2015.
  50. Koren, Y. Collaborative filtering with temporal dynamics. In Proceedings of the KDD '09: 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Paris, France, 28 June–1 July 2009; pp. 447–456.
  51. Wu, F.; Li, C.; Li, Y.; Tang, N. Robust low-rank tensor completion via new regularized model with approximate SVD. Inf. Sci. 2023, 629, 646–666.
  52. Chen, H.; Lin, M.; Liu, J.; Yang, H.; Zhang, C.; Xu, Z. NT-DPTC: A non-negative temporal dimension preserved tensor completion model for missing traffic data imputation. Inf. Sci. 2024, 653, 119797. [Google Scholar] [CrossRef]
  53. Bousquet, O.; Elisseeff, A. Stability and generalization. J. Mach. Learn. Res. 2002, 2, 499–526. [Google Scholar]
  54. Xu, H.; Caramanis, C.; Mannor, S. Robustness and Regularization of Support Vector Machines. J. Mach. Learn. Res. 2009, 10, 1485–1510. [Google Scholar]
  55. Tabassum, S.; Pereira, F.S.; Fernandes, S.; Gama, J. Social network analysis: An overview. Wiley Interdiscip. Rev. Data Min. Knowl. Discov. 2018, 8, e1256. [Google Scholar] [CrossRef]
  56. Ye, J.; Liu, J. Sparse methods for biomedical data. ACM Sigkdd Explor. Newsl. 2012, 14, 4–15. [Google Scholar] [CrossRef]
  57. Sorjamaa, A.; Corona, F.; Miche, Y.; Merlin, P.; Maillet, B.; Séverin, E.; Lendasse, A. Sparse linear combination of SOMs for data imputation: Application to financial database. In Proceedings of the Advances in Self-Organizing Maps: 7th International Workshop, WSOM 2009, St. Augustine, FL, USA, 8–10 June 2009; pp. 290–297. [Google Scholar] [CrossRef]
  58. Shin, K.; Sael, L.; Kang, U. Fully scalable methods for distributed tensor factorization. IEEE Trans. Knowl. Data Eng. 2016, 29, 100–113. [Google Scholar] [CrossRef]
  59. Choi, J.H.; Vishwanathan, S. DFacTo: Distributed factorization of tensors. Adv. Neural Inf. Process. Syst. 2014, 27, 1296–1304. [Google Scholar]
  60. Wu, J.; Chen, X.Y.; Zhang, H.; Xiong, L.D.; Lei, H.; Deng, S.H. Hyperparameter optimization for machine learning models based on Bayesian optimization. J. Electron. Sci. Technol. 2019, 17, 26–40. Available online: https://www.sciencedirect.com/science/article/pii/S1674862X19300047 (accessed on 16 January 2025).
  61. Joseph, S.B.; Dada, E.G.; Abidemi, A.; Oyewola, D.O.; Khammas, B.M. Metaheuristic algorithms for PID controller parameters tuning: Review, approaches and open problems. Heliyon 2022, 8, e09399. [Google Scholar] [CrossRef]
  62. Sun, B.; Xu, Y.; Gu, W.; Cai, H.; Lu, S.; Mili, L.; Yu, W.; Wu, Z. A Low-Rank Tensor Train Approach for Electric Vehicle Load Data Reconstruction Using Real Industrial Data. IEEE Trans. Smart Grid 2024. [Google Scholar] [CrossRef]
  63. Li, H.; Li, K.; An, J.; Li, K. An online and scalable model for generalized sparse nonnegative matrix factorization in industrial applications on multi-GPU. IEEE Trans. Ind. Inform. 2019, 18, 437–447. [Google Scholar] [CrossRef]
  64. Xu, X.; Cao, X.; Yu, L. Carbon emissions forecasting based on tensor decomposition with multi-source data fusion. Inf. Sci. 2024, 681, 121235. [Google Scholar] [CrossRef]
  65. Liu, H.; Chen, C. Data processing strategies in wind energy forecasting models and applications: A comprehensive review. Appl. Energy 2019, 249, 392–408. [Google Scholar] [CrossRef]
  66. Li, Z.h.; Cui, J.x.; Lu, H.p.; Zhou, F.; Diao, Y.l.; Li, Z.x. Prediction model of measurement errors in current transformers based on deep learning. Rev. Sci. Instrum. 2024, 95, 044704. [Google Scholar] [CrossRef] [PubMed]
Figure 1. A three-dimensional tensor representation of PC data.
Figure 2. Factorizing the target tensor Y.
Figure 3. Building the perturbed tensor Ỹ from the observed tensor Y.
Figure 4. Comparison of TF and NTF models’ performance.
Figure 5. Impacts of λ_a and λ_b.
Figure 6. Impacts of γ and δ.
Table 1. Adopted symbols and their descriptions.

Symbol | Description
I, J, K | Three entity sets
Y | Third-order target tensor
H | A rank-one tensor
H_r | The r-th rank-one tensor; the R rank-one tensors sum to form X
X | Approximation to Y
Ỹ | The tensor obtained by perturbing Y
y_ijk, h_ijk, h(r)_ijk, x_ijk | Single entries of Y, H, H_r, and X
ỹ_ijk | Single entry of Ỹ
R | Latent feature dimension, i.e., the number of rank-one tensors
U, O, W | Latent factor matrices
U_:,r, O_:,r, W_:,r | The r-th latent factor vectors in U, O, and W
u_ir, o_jr, w_kr | Single entries of U, O, and W
Q(U), Q(O), Q(W) | Decision parameter matrices
Q(U)_ir, Q(O)_jr, Q(W)_kr | Single entries of Q(U), Q(O), and Q(W)
M(U)_ir, M(O)_jr, M(W)_kr | The historical momentum of Q(U)_ir, Q(O)_jr, and Q(W)_kr
δ | Perturbation magnitude
λ_a, λ_b | L2 and adversarial regularization coefficients
η | Learning rate
γ | Momentum coefficient
∘ | Outer product of two vectors
|·| | Number of elements in a set
‖·‖_F | Tensor Frobenius norm
Λ, Γ | The observed and missing entries of tensor Y
Φ, Ψ, Ω | Training, validation, and testing subsets drawn from Λ
t | Iteration count
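To make the roles of these symbols concrete, the rank-R approximation they describe can be written out. The following is a standard CP-style formulation consistent with Table 1 and Figure 2, together with the generic momentum update implied by M, γ, and η; the exact RMNTF updates, including the sigmoid mapping and the adversarial and L2 regularization terms, are defined in the main text.

```latex
% Rank-R approximation of the target tensor Y (symbols as in Table 1):
X = \sum_{r=1}^{R} H_r
  = \sum_{r=1}^{R} U_{:,r} \circ O_{:,r} \circ W_{:,r},
\qquad
x_{ijk} = \sum_{r=1}^{R} u_{ir}\, o_{jr}\, w_{kr}.

% Generic momentum update for a decision parameter q with gradient g,
% historical momentum m, learning rate \eta, and momentum coefficient \gamma:
m^{(t+1)} = \gamma\, m^{(t)} + \eta\, g^{(t)},
\qquad
q^{(t+1)} = q^{(t)} - m^{(t+1)}.
```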
Table 2. Tensorization details.

Attribute | D1 (iAWE) | D2 (REDD)
Sampling Rate | 1 Hz | 1 Hz
Days | 21 | 21
Samples per Day | 86,400 | 86,400
Meter Quantity | 12 | 18
Tensor Dimensions | 21 × 86,400 × 12 | 21 × 86,400 × 18
Entries | 1,448,417 | 1,579,227
Density | 6.65% | 4.80%
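As a concrete illustration of Table 2, the sketch below arranges a flat 1 Hz meter log into a day × second-of-day × meter tensor. The DataFrame columns ("timestamp", "meter_id", "power") and the assumption of 0-based integer meter IDs are illustrative, not the actual schema of the iAWE or REDD loaders.

```python
import numpy as np
import pandas as pd

def tensorize(df: pd.DataFrame, n_days: int, n_meters: int) -> np.ndarray:
    """Arrange a flat 1 Hz meter log into a (day, second-of-day, meter) tensor."""
    Y = np.full((n_days, 86_400, n_meters), np.nan)  # NaN marks missing entries
    t0 = df["timestamp"].min().normalize()           # midnight of the first logged day
    day = (df["timestamp"].dt.normalize() - t0).dt.days.to_numpy()
    sec = (df["timestamp"] - df["timestamp"].dt.normalize()).dt.total_seconds().astype(int).to_numpy()
    keep = (day >= 0) & (day < n_days)               # discard samples outside the window
    Y[day[keep], sec[keep], df["meter_id"].to_numpy()[keep]] = df["power"].to_numpy()[keep]
    return Y
```

With this layout, `np.isfinite(Y).mean()` reproduces the densities reported above (about 6.65% for D1 and 4.80% for D2).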
Table 3. Training and testing data splits for D1 and D2.

Dataset No. | D1.1 | D1.2 | D2.1 | D2.2
Training–Testing | 60%:20% | 30%:40% | 60%:20% | 30%:40%
Training data | 869,050 | 434,525 | 947,536 | 473,768
Testing data | 289,683 | 579,366 | 315,845 | 631,691
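The splits in Table 3 operate on the observed entries Λ only. Below is a minimal sketch of such a split; the assumption that the unlisted remainder (20% for D1.1/D2.1, 30% for D1.2/D2.2) forms the validation subset Ψ of Table 1, and all function and variable names, are illustrative.

```python
import numpy as np

def split_observed(Y: np.ndarray, train: float = 0.6, test: float = 0.2, seed: int = 0):
    """Split observed entries (Lambda) into training (Phi), validation (Psi),
    and testing (Omega) index sets, e.g. 60%/20%/20% for D1.1."""
    idx = np.argwhere(np.isfinite(Y))      # Lambda: (i, j, k) indices of observed cells
    rng = np.random.default_rng(seed)
    rng.shuffle(idx)                       # shuffle rows in place
    n_train = int(train * len(idx))
    n_test = int(test * len(idx))
    phi = idx[:n_train]                    # training subset
    omega = idx[n_train:n_train + n_test]  # testing subset
    psi = idx[n_train + n_test:]           # validation subset (remainder)
    return phi, psi, omega
```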
Table 4. Computational and storage complexity of different models.

Model | Computational Complexity | Storage Complexity
TF | Θ(t × R × |Λ|) | Θ((I + J + K) × R + |Λ|)
M4 | Θ(t × I × J × K × min{I, J, K}) | Θ(I × J × K)
M6 | Θ(t × |Λ| × (K × I × R + K × J × R + I × J)) | Θ((K × I × R + K × R × J + I × J) + |Λ|)
M8 | Θ(t × R × |Λ|) | Θ((I + J + K) × R + |Λ|)
Table 5. Performance comparison of models across datasets using the MAE and RMSE metrics.

Dataset | Metric | M1 | M2 | M3 | M4 | M5 | M6 | M7 | M8
D1.1 | RMSE | 0.1288 ± 6×10⁻⁴ | 0.1271 ± 3×10⁻³ | 0.1268 ± 5×10⁻⁴ | 0.1560 ± 0 | 0.1964 ± 3×10⁻⁶ | 0.1468 ± 4×10⁻⁶ | 0.1403 ± 1×10⁻⁶ | 0.1247 ± 6×10⁻⁴ *
D1.1 | MAE | 0.0664 ± 5×10⁻⁴ | 0.0637 ± 2×10⁻³ | 0.0587 ± 2×10⁻³ | 0.0769 ± 0 | 0.0914 ± 2×10⁻⁶ | 0.0697 ± 6×10⁻⁶ | 0.0596 ± 1×10⁻⁶ | 0.0568 ± 1×10⁻³ *
D1.2 | RMSE | 0.1387 ± 4×10⁻⁴ | 0.1390 ± 4×10⁻³ | 0.1376 ± 4×10⁻⁴ | 0.1951 ± 0 | 0.2074 ± 3×10⁻⁶ | 0.1643 ± 5×10⁻⁶ | 0.1426 ± 1×10⁻⁶ | 0.1336 ± 8×10⁻⁴ *
D1.2 | MAE | 0.0711 ± 3×10⁻⁴ | 0.0697 ± 2×10⁻³ | 0.0676 ± 2×10⁻³ | 0.0963 ± 0 | 0.1023 ± 3×10⁻⁶ | 0.0818 ± 4×10⁻⁶ | 0.0687 ± 2×10⁻⁶ | 0.0633 ± 4×10⁻⁴ *
D2.1 | RMSE | 0.2336 ± 3×10⁻⁴ | 0.2296 ± 2×10⁻³ | 0.2316 ± 6×10⁻⁴ | 0.2557 ± 0 | 0.2560 ± 8×10⁻⁶ | 0.2338 ± 1×10⁻⁵ | 0.2541 ± 1×10⁻⁷ | 0.2262 ± 1×10⁻³ *
D2.1 | MAE | 0.0433 ± 5×10⁻⁴ | 0.0422 ± 1×10⁻³ | 0.0456 ± 8×10⁻⁴ | 0.0452 ± 0 | 0.0482 ± 1×10⁻⁵ | 0.0544 ± 9×10⁻⁶ | 0.0623 ± 5×10⁻⁶ | 0.0376 ± 2×10⁻⁴ *
D2.2 | RMSE | 0.2291 ± 3×10⁻⁴ | 0.2295 ± 1×10⁻³ | 0.2307 ± 5×10⁻⁴ | 0.2545 ± 0 | 0.2559 ± 7×10⁻⁶ | 0.2386 ± 5×10⁻⁶ | 0.2489 ± 3×10⁻⁷ | 0.2263 ± 3×10⁻³ *
D2.2 | MAE | 0.0442 ± 2×10⁻⁴ | 0.0431 ± 8×10⁻⁴ | 0.0469 ± 3×10⁻⁴ | 0.0473 ± 0 | 0.0477 ± 6×10⁻⁶ | 0.0635 ± 1×10⁻⁵ | 0.0646 ± 1×10⁻⁶ | 0.0378 ± 4×10⁻⁴ *
Values marked with * represent the best performance for each metric (bold in the original). The “±” values represent the standard deviation.
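For reference, the RMSE and MAE in Table 5 are evaluated only on observed testing entries (the subset Ω). A minimal sketch, with variable names illustrative:

```python
import numpy as np

def rmse_mae(Y: np.ndarray, X: np.ndarray, omega: np.ndarray):
    """RMSE and MAE over the testing index set Omega (rows of (i, j, k) indices).
    Y holds the ground-truth observations; X is the model's reconstruction."""
    y = Y[omega[:, 0], omega[:, 1], omega[:, 2]]
    x = X[omega[:, 0], omega[:, 1], omega[:, 2]]
    rmse = np.sqrt(np.mean((y - x) ** 2))
    mae = np.mean(np.abs(y - x))
    return rmse, mae
```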
Table 6. Time consumption of models across datasets (seconds).

Dataset | Metric | M1 | M2 | M3 | M4 | M5 | M6 | M7 | M8
D1.1 | RMSE | 32.36 ± 2.27 | 54.61 ± 3.03 | 335.89 ± 32.35 | 336.36 ± 1.20 | 87.62 ± 2.47 | 147.03 ± 1.40 | 229.24 ± 1.49 | 31.44 ± 2.6 *
D1.1 | MAE | 39.47 ± 2.55 * | 62.63 ± 3.38 | 423.74 ± 38.67 | 1054.46 ± 1.01 | 72.45 ± 2.41 | 167.64 ± 2.07 | 231.41 ± 1.33 | 42.63 ± 1.98
D1.2 | RMSE | 35.62 ± 2.37 | 56.52 ± 4.52 | 305.36 ± 29.13 | 334.64 ± 1.26 | 81.54 ± 1.96 | 118.91 ± 1.14 | 230.44 ± 1.10 | 32.26 ± 1.88 *
D1.2 | MAE | 35.41 ± 2.81 * | 82.43 ± 4.32 | 403.25 ± 27.66 | 1046.95 ± 1.96 | 67.53 ± 2.57 | 127.60 ± 2.41 | 236.21 ± 1.79 | 37.37 ± 1.54
D2.1 | RMSE | 13.89 ± 0.65 | 24.32 ± 1.20 | 178.65 ± 16.81 | 99.32 ± 0.96 | 36.80 ± 1.66 | 28.18 ± 2.04 | 235.74 ± 1.74 | 11.28 ± 0.68 *
D2.1 | MAE | 14.23 ± 0.74 | 31.86 ± 2.03 | 203.36 ± 17.61 | 107.74 ± 1.34 | 41.56 ± 1.68 | 36.20 ± 1.44 | 235.41 ± 2.04 | 11.56 ± 0.73 *
D2.2 | RMSE | 9.06 ± 0.61 | 21.74 ± 1.14 | 170.63 ± 14.69 | 102.32 ± 0.64 | 30.74 ± 1.25 | 20.98 ± 1.09 | 232.56 ± 1.41 | 7.30 ± 0.57 *
D2.2 | MAE | 9.45 ± 0.64 | 24.31 ± 1.56 | 190.57 ± 16.54 | 111.74 ± 1.54 | 43.20 ± 2.26 | 22.36 ± 1.12 | 237.69 ± 1.77 | 7.96 ± 0.89 *
Values marked with * represent the best performance for each metric (bold in the original). The “±” values represent the standard deviation.
Table 7. Statistical results of the Wilcoxon signed-rank test for the RMSE and MAE in Table 5.

Comparison | R+ | R− | p-Value
M8 vs. M1 | 36 | 0 | 0.0078
M8 vs. M2 | 36 | 0 | 0.0078
M8 vs. M3 | 36 | 0 | 0.0078
M8 vs. M4 | 36 | 0 | 0.0078
M8 vs. M5 | 36 | 0 | 0.0078
M8 vs. M6 | 36 | 0 | 0.0078
M8 vs. M7 | 36 | 0 | 0.0078
Table 8. Statistical results of the Wilcoxon signed-rank test for time consumption in Table 6.

Comparison | R+ | R− | p-Value
M8 vs. M1 | 26 | 10 | 0.3828
M8 vs. M2 | 36 | 0 | 0.0078
M8 vs. M3 | 36 | 0 | 0.0078
M8 vs. M4 | 36 | 0 | 0.0078
M8 vs. M5 | 36 | 0 | 0.0078
M8 vs. M6 | 36 | 0 | 0.0078
M8 vs. M7 | 36 | 0 | 0.0078
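Both tests pair M8 with each baseline over the eight dataset–metric rows of the corresponding table, so the maximum rank sum is 8 × 9 / 2 = 36 and the smallest attainable exact two-sided p-value is 2/2⁸ ≈ 0.0078. A minimal sketch with SciPy, using the M8 and M1 RMSE/MAE values from Table 5:

```python
import numpy as np
from scipy.stats import wilcoxon

# Paired Wilcoxon signed-rank test over the eight dataset-metric rows of
# Table 5 (RMSE and MAE for D1.1, D1.2, D2.1, D2.2).
m8 = np.array([0.1247, 0.0568, 0.1336, 0.0633, 0.2262, 0.0376, 0.2263, 0.0378])
m1 = np.array([0.1288, 0.0664, 0.1387, 0.0711, 0.2336, 0.0433, 0.2291, 0.0442])
stat, p = wilcoxon(m8, m1)  # exact test for n = 8 pairs without ties
print(f"W = {stat}, p = {p:.4f}")  # all eight differences favor M8 -> p = 0.0078
```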