Article

Design of Self-Optimizing Polynomial Neural Networks with Temporal Feature Enhancement for Time Series Classification

by Yuqi Tang, Zhilei Xu and Wei Huang
1 School of Cyberspace Science and Technology, Beijing Institute of Technology, Beijing 100081, China
2 School of Electrical and Information Engineering, Changsha University of Science and Technology, Changsha 410114, China
* Author to whom correspondence should be addressed.
Electronics 2025, 14(3), 465; https://doi.org/10.3390/electronics14030465
Submission received: 18 December 2024 / Revised: 11 January 2025 / Accepted: 22 January 2025 / Published: 23 January 2025
(This article belongs to the Special Issue Security and Privacy in Distributed Machine Learning)

Abstract

Time series classification is a significant and complex issue in data mining; it is prevalent across various fields and holds substantial research value. However, enhancing the classification rate of time series data remains a formidable challenge. Traditional time series classification methods often face difficulties related to insufficient feature extraction or excessive model complexity. In this study, we propose a self-optimizing polynomial neural network with temporal feature enhancement, which is referred to as OPNN-T. Existing classifiers based on polynomial neural networks (PNNs) struggle to achieve high-quality performance when dealing with time series data, primarily due to their inability to extract temporal information effectively. The goal of the proposed classifier is to enhance the nonlinear modeling capability for time series data, thereby improving the classification rate in practical applications. The key features of the proposed OPNN-T include the following: (1) A temporal feature module is employed to capture the dependencies in time series data, providing adaptability and flexibility in handling complex temporal patterns. (2) A polynomial neural network (PNN) is constructed using sub-datasets combined with three types of polynomial neurons, which enhances its nonlinear modeling capabilities across diverse scenarios. (3) A self-optimization mechanism is integrated to iteratively optimize the sub-datasets, features, and polynomial types, resulting in significant improvements in the classification rate. The experimental results demonstrate that the proposed method achieves superior performance across multiple standard time series datasets, exhibiting higher classification accuracy and greater robustness than existing classification models. Our research offers an effective solution for time series classification and highlights the potential of polynomial neural networks in this field.

1. Introduction

Machine learning-based classifiers have been successfully applied across various fields, and time series classification is a significant research area that has garnered increasing attention [1,2]. Numerous practical solutions rely on time series classification techniques, such as medical diagnosis [3], seismic wave analysis [4], road condition monitoring [5], and anomaly detection [6]. These applications have accelerated the rapid development of time series classification research. However, time series data are characterized by high data volumes, significant noise, and unknown spans of temporal dependencies [7]. Additionally, since time series data consist of continuous numerical sequences, extracting effective features from the raw sequences poses a considerable challenge. The complexity of the classification tasks [8] and the ever-increasing scale of the data [9] further exacerbate the difficulties in achieving accurate and efficient time series classification.
Researchers have proposed various techniques for implementing classifiers, including methods such as K-nearest neighbor (KNN) [10], support vector machines (SVMs) [11,12], naive Bayes [13], and decision trees [14]. These classifiers achieve classification by learning the fundamental features of each class, making the construction of classifiers closely linked to the nature of the samples. In time series classification, a variety of methods for feature extraction are typically employed in either the time domain or the frequency domain to map the original time series into a lower-dimensional feature space. This approach offers notable advantages. First, feature extraction effectively captures the essential characteristics of the original time series, which can significantly improve the accuracy of the classifier in most cases. Second, these methods often exhibit an inherent noise reduction capability, allowing them to filter out some of the noise present in the raw data. Finally, processing data points in a lower-dimensional feature space is more resource-efficient, saving both computational power and time. Therefore, the properties demonstrated by the feature extraction methods, such as dimensionality reduction and noise resilience, provide robust support for developing high-performance classifiers.
Neural networks [15,16] have opened new possibilities for machine learning and pattern recognition, showcasing their unique advantages such as their learning capabilities, generalization abilities, and robustness. Among them, the earliest neural network classifier was the well-known multilayer perceptron (MLP) [17]. Researchers have subsequently introduced various neural network classifiers, including backpropagation (BP) neural networks [18], radial basis function neural networks (RBFNNs) [19], and polynomial neural networks (PNNs) [20]. Although these classifiers have made significant advances, there is substantial room for improvement in the context of time series classification, particularly in addressing the complexity of temporal dependencies, nonlinear patterns, and high dimensionality in time series data. Real-world applications, such as anomaly detection in industrial equipment, financial forecasting, and healthcare monitoring, all require efficient and accurate time series classification models. However, the existing methods often struggle to achieve satisfactory performance due to their limitations in capturing intricate temporal relationships or adapting to diverse application scenarios. For instance, in healthcare, the accurate classification of time series data, such as electrocardiogram (ECG) signals, is critical for diagnosing heart conditions. Similarly, in industrial contexts, identifying irregular temporal patterns in sensor data can prevent equipment failure and reduce downtime. These challenges highlight the need for novel models that can effectively address these issues with improved accuracy and robustness.
In recent years, scholars have proposed several effective classification models specifically for time series data. For instance, TBOPE [21] integrates multiple TBOP single classifiers, combining sample averages and trend features for classification. RSFCF [22] reduces the search space by using randomly selected shapelets and embedding multiple typical time series features in the shapelet transformation, thus enhancing the model’s adaptability. AFFNet [23] improves the classification rate by adaptively fusing the multiscale temporal features and distance features of the time series data. These innovations reflect an ongoing effort to enhance the performance of classifiers in the challenging domain of time series classification.
The performance of recurrent neural networks (RNNs) [24] in time series forecasting has inspired us to explore their fusion with polynomial networks in classification tasks. RNNs are a class of deep learning algorithms commonly employed for modeling sequential data that are temporally correlated. Long short-term memory (LSTM) networks can be considered specialized extensions of RNNs, which are designed to address the shortcomings of traditional RNNs in capturing complicated dependencies. Consequently, employing LSTM networks as the primary component of the temporal feature module significantly enhances the model's adaptability to time series features. We chose the PNN as our research object due to its ability to model complex nonlinear relationships and its inherent advantages in function approximation. This combination aims to leverage LSTM's strength in sequential data processing while benefiting from the expressive power of PNNs, which could potentially lead to an improved classification rate.
The integration of LSTM and PNN provides complementary strengths that address the limitations inherent in each individual approach. LSTM networks specialize in learning temporal dependencies by leveraging memory cells and gating mechanisms, which allows them to retain the relevant temporal information and discard the irrelevant details. This makes them exceptionally effective at capturing the sequential patterns, trends, and dependencies in time series data. However, while LSTM excels at extracting temporal features, it is not explicitly designed to model the complex nonlinear interactions between the features, which are often critical for classification tasks. PNNs, on the other hand, are particularly effective for representing nonlinear relationships through their polynomial-based structure, which approximates complex functions by combining low-order and high-order terms. When applied to the temporal features extracted by LSTM, PNNs enhance the model’s ability to capture intricate feature relationships and construct nonlinear decision boundaries. This interaction ensures that the information extracted from the sequential data is effectively utilized for classification, allowing the combined framework to address both the temporal and nonlinear complexities in the data.
We also observed that employing various optimization algorithms within neural network models can significantly enhance their performance. Numerous optimized neural network classifiers have been developed, including optimized radial basis function neural networks [25,26], among others. Particle swarm optimization (PSO) [27], as a global optimization algorithm grounded in swarm intelligence, has garnered significant attention due to its superiority in addressing complex optimization problems. By simulating the collaborative behavior of biological populations, PSO effectively achieves a balance between global exploration and local exploitation. Compared with other optimization algorithms, PSO is characterized by its simplicity and ease of use, requiring minimal parameter tuning and exhibiting a reduced risk of overfitting. This makes it especially suitable for optimizing neural networks. Therefore, we adopted the PSO algorithm as our primary self-optimizing technique, leading to improved performances in complex scenarios.
This study introduces a novel self-optimizing polynomial neural network with temporal feature enhancement (OPNN-T) for time series classification. The main features of OPNN-T are as follows: First, OPNN-T uses a temporal enhancement module for feature extraction from time series data. Compared with other statistical methods such as principal component analysis (PCA), the temporal enhancement module has the advantage of being able to capture the temporal features of samples, making it more suitable for time series data. Second, OPNN-T constructs a polynomial neural network classifier (PNNC) via sub-datasets and three types of polynomial neurons, employing least squares estimation (LSE) [28,29] to converge on the parameters. The combination of various polynomial types effectively enhances the representational capacity of the classifier, compared to using separate linear or quadratic polynomials. Finally, to improve the model’s adaptability across different scenarios and reduce overfitting to training data, we optimize the combination of polynomial types and subsets via the self-optimization strategy.
The remainder of this paper is structured as follows: Section 2 presents the main architecture of self-optimizing polynomial neural networks with temporal feature enhancement. Section 3 discusses the learning and optimization methods for OPNN-T. Section 4 provides experimental results and an analysis, demonstrating the feasibility and superiority of our model for time series classification. Finally, Section 5 concludes the paper and offers future perspectives.

2. Architecture Design of the Proposed Model

The architecture of the OPNN-T consists of three main modules: temporal feature enhancement, polynomial neural network classifiers, and self-optimization strategy. The temporal feature enhancement module is designed to capture the time series feature information of the sequences, effectively extracting the dynamic changes and correlations inherent in the time series data. The PNNC is constructed via three different types of polynomials, ensuring the classifier’s flexibility in representing complex decision boundaries. The coefficients of the polynomial functions are estimated via LSE, while the self-optimizing strategy is employed to optimize the model parameters. This approach facilitates the convergence of the model while enhancing the classification performance. Finally, by combining binary classifiers with a discriminative function, the model can obtain classification results efficiently, thereby effectively accomplishing the time series classification task.

2.1. Structure of Temporal Feature Enhancement Module

As a fundamental component of the temporal feature enhancement module, an LSTM network is an improved variant of an RNN that can recognize nonlinear patterns in data and retain historical information over longer time steps. In our model, the LSTM is designed with 128 hidden layer nodes, which provides sufficient capacity to model the complex temporal dependencies in the data without introducing unnecessary computational complexity. Additionally, the output layer consists of 16 nodes, which are determined based on the specific requirements of the task. This balance ensures that the model remains efficient while achieving high predictive accuracy.
The unique memory and gating mechanisms of LSTM, combined with the use of nonlinear activation functions in each layer, make it well-suited for learning long-term dependencies. The cell state of the LSTM is responsible for the transmission of information, which is central to its functionality. A key feature of LSTM is the design of three types of gates: the input gate, the forget gate, and the output gate, as illustrated in Figure 1. These gates are specifically designed to effectively protect and manage the cell state.
Let $x_1, x_2, \ldots, x_T$ be a typical input sequence for the LSTM, where $x_t \in \mathbb{R}^k$ represents a $k$-dimensional real-valued vector at time step $t$. The function of the forget gate is to determine which information should be discarded from the cell state. Its operation can be described as follows:
$$ f_t = \sigma(W_f \times x_t + U_f \times h_{t-1} + b_f) \tag{1} $$
where $f_t$ represents the forget threshold, and $\sigma$ denotes the sigmoid activation function, which transforms input values into a range between 0 and 1. The input values are denoted as $x_t$, $W_f$ represents the input weights, $U_f$ refers to the recurrent weights, $h_{t-1}$ is the output value at time step $t-1$, and $b_f$ is the bias term. The function of the input gate is to determine which pieces of information will be stored in the cell state.
$$ i_t = \sigma(W_i \times x_t + U_i \times h_{t-1} + b_i) \tag{2} $$
$$ \bar{C}_t = \tanh(W_c \times x_t + U_c \times h_{t-1} + b_c) \tag{3} $$
where $i_t$ represents the input threshold, $W_i$ and $W_c$ are the corresponding input weights, and $U_i$ and $U_c$ are recurrent weights. $b_i$ and $b_c$ denote the bias terms. Equation (4) is utilized to update the cell state at time step $t$, where $C_{t-1}$ represents the memory content at time step $t-1$.
$$ C_t = f_t \times C_{t-1} + i_t \times \bar{C}_t \tag{4} $$
Output Gate: This gate is responsible for generating the output information, which can be described as follows:
$$ O_t = \sigma(W_o \times x_t + U_o \times h_{t-1} + b_o) \tag{5} $$
where $O_t$ represents the output threshold at time step $t$, $W_o$ denotes the input weights, $U_o$ refers to the recurrent weights, and $b_o$ is the bias term. The output value at time step $t$ can be expressed as follows:
$$ h_t = O_t \times \tanh(C_t) \tag{6} $$
The cell state is processed through the tanh function to normalize the values between −1 and 1, and then it is multiplied by $O_t$. $C_t$ represents the cell state at time step $t$. The values of $W$, $U$, and $b$ are all determined through the learning process during training. The gated structure of the LSTM enables it to effectively capture both the short-term and long-term temporal dependencies within the sequence.
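For illustration, Equations (1)–(6) can be written as a single NumPy time step as follows; the dictionary-based grouping of the weights is an illustrative convention rather than a detail of the original implementation.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, U, b):
    """One LSTM time step following Equations (1)-(6).
    W, U, b are dicts holding the input weights, recurrent weights, and biases
    for the forget (f), input (i), candidate (c), and output (o) gates."""
    f_t = sigmoid(W["f"] @ x_t + U["f"] @ h_prev + b["f"])      # Eq. (1): forget gate
    i_t = sigmoid(W["i"] @ x_t + U["i"] @ h_prev + b["i"])      # Eq. (2): input gate
    c_bar = np.tanh(W["c"] @ x_t + U["c"] @ h_prev + b["c"])    # Eq. (3): candidate state
    c_t = f_t * c_prev + i_t * c_bar                            # Eq. (4): cell state update
    o_t = sigmoid(W["o"] @ x_t + U["o"] @ h_prev + b["o"])      # Eq. (5): output gate
    h_t = o_t * np.tanh(c_t)                                    # Eq. (6): hidden output
    return h_t, c_t
```

Iterating this function over $t = 1, \ldots, T$ with a zero-initialized hidden and cell state reproduces the forward pass described above.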

2.2. Structure of the Polynomial Neural Network Classifier

Polynomials are utilized to estimate the multivariate higher-order relationships between inputs and outputs. In this paper, we construct a polynomial neural network classifier using the following three types of polynomials:
Linear polynomials (Type 1):
$$ f_j = a_{j0} + a_{j1}x_1 + \cdots + a_{jk}x_k \tag{7} $$
Quadratic polynomials (Type 2):
$$ f_j = a_{j0} + a_{j1}x_1 + \cdots + a_{jk}x_k + a_{j(k+1)}x_1^2 + \cdots + a_{j(2k)}x_k^2 + a_{j(2k+1)}x_1 x_2 + \cdots + a_{j((k+2)(k+1)/2)}x_{k-1}x_k \tag{8} $$
Modified quadratic polynomials (Type 3):
$$ f_j = a_{j0} + a_{j1}x_1 + \cdots + a_{jk}x_k + a_{j(k+1)}x_1 x_2 + \cdots + a_{j((k+2)(k+1)/2)}x_{k-1}x_k \tag{9} $$
In these expressions, the coefficients of the polynomials are represented as $a_{jn}$ for each $n = 0, 1, \ldots, k$. Meanwhile, $x_k$ denotes the input variables, and $y$ signifies the output data.
Figure 2 illustrates the basic structure of polynomial neurons and polynomial neural networks. In this figure, PN denotes the type of neurons, where $x_a$ and $x_b$ represent the input variables, and $a_0$, $a_1$, $a_2$, and $a_3$ indicate the coefficients of the polynomials.
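A minimal sketch of how the design matrix for each polynomial type (Equations (7)–(9)) can be assembled is given below; the function name and the use of itertools are illustrative choices rather than details of the original implementation.

```python
import numpy as np
from itertools import combinations

def polynomial_features(X, poly_type):
    """Build the design matrix for one polynomial neuron.
    X is an (N, k) array of inputs; poly_type is 1, 2, or 3 (Equations (7)-(9))."""
    N, k = X.shape
    cols = [np.ones(N)]                                 # constant term a_j0
    cols += [X[:, i] for i in range(k)]                 # linear terms
    if poly_type == 2:                                  # Type 2: squares plus cross terms
        cols += [X[:, i] ** 2 for i in range(k)]
        cols += [X[:, i] * X[:, j] for i, j in combinations(range(k), 2)]
    elif poly_type == 3:                                # Type 3: cross terms only
        cols += [X[:, i] * X[:, j] for i, j in combinations(range(k), 2)]
    return np.column_stack(cols)
```

Multiplying this design matrix by the coefficient vector of a neuron evaluates the corresponding polynomial for every sample at once.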

2.3. Structure of the Sub-Dataset Generator

Inspired by the random forest (RF) algorithm, we constructed a sub-dataset generator using two components, as illustrated in Figure 3. In the first component, random sampling is employed to handle the rows of the data, where M random sub-samples are obtained. The second component processes the columns of the data, using information gain (IG) [30] to evaluate the importance of each feature. The feature with the highest information gain is selected as the decision feature in the decision tree, as shown in the figure.
The IG quantifies how much information a feature contributes to the classification system. The greater the amount of information provided by a feature, the greater its importance. For a variable $X$ with $m$ possible values, each with a probability $p(x_i)$, the entropy of $X$ is defined as follows:
$$ H(X) = -\sum_{i=1}^{m} p(x_i)\log p(x_i) \tag{10} $$
The greater the number of possible values for $X$, the more information there is, and the higher the entropy it carries. In the case of multi-feature classification, the IG for a feature is calculated by subtracting the conditional entropy, given the known feature, from the original entropy. Consequently, the IG contributed by feature $N$ with respect to class $C$ is expressed as follows:
$$ IG(N) = H(C) - H(C \mid N) \tag{11} $$
Features are then ranked in descending order based on their IG to create a new feature sequence. From this new sequence, $n$ features are selected and input into the training model. Generally, $n \leq N$, where $N$ represents the total number of features in the original dataset.
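The two components of the sub-dataset generator can be sketched as follows, assuming a simple quartile-based discretization when computing the information gain of continuous features (Equations (10) and (11)); the helper names and the binning choice are illustrative assumptions, not part of the original implementation.

```python
import numpy as np

def entropy(labels):
    """Shannon entropy of a label vector, Eq. (10)."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def information_gain(feature, labels):
    """IG of one (discretized) feature with respect to the class, Eq. (11)."""
    h_c = entropy(labels)
    h_c_given_n = 0.0
    for v in np.unique(feature):
        mask = feature == v
        h_c_given_n += mask.mean() * entropy(labels[mask])
    return h_c - h_c_given_n

def make_sub_dataset(X, y, n_rows, n_features, rng):
    """Random row sampling plus IG-based feature selection (Section 2.3)."""
    rows = rng.choice(len(X), size=n_rows, replace=True)          # component 1: row sampling
    Xs, ys = X[rows], y[rows]
    gains = []
    for j in range(Xs.shape[1]):                                  # component 2: rank features by IG
        edges = np.unique(np.quantile(Xs[:, j], [0.25, 0.5, 0.75]))
        gains.append(information_gain(np.digitize(Xs[:, j], edges), ys))
    top = np.argsort(gains)[::-1][:n_features]                    # descending IG, keep n features
    return Xs[:, top], ys, top
```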

2.4. Structure of OPNN-T

The OPNN-T comprises two distinct types of neural networks: long short-term memory networks, which are based on deep learning methods, and polynomial neural networks. The proposed OPNN-T architecture is illustrated in Figure 4, where the temporal feature enhancement module is designed to capture the temporal features of time series data. The PNNC is employed to flexibly approximate the relationship between inputs and outputs and to capture the inherent uncertainties within the data. A generator produces sub-datasets of varying sizes and feature quantities, self-optimizing the dataset size, features, and polynomial types.
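To make the overall data flow concrete, a high-level sketch of the pipeline in Figure 4 is given below; it composes the illustrative helpers sketched alongside Sections 2.2, 2.3, 3.1, and 3.2, and the averaging-and-thresholding discriminant used for the final binary decision is an assumption rather than the exact combination rule of this paper.

```python
import numpy as np

def train_opnn_t(X_seq, y, extract_features, make_sub_dataset,
                 polynomial_features, fit_polynomial_neuron, params, rng):
    """End-to-end sketch of the OPNN-T pipeline (Figure 4).
    `extract_features` is the trained temporal feature enhancement module,
    `params` = (n_features, n_sub, poly_type) are the values chosen by the
    self-optimization strategy, and the remaining helpers are the illustrative
    sketches given in this paper's sections, not the released code."""
    n_features, n_sub, poly_type = params
    feats = extract_features(X_seq)                     # temporal features, e.g., shape (N, 16)
    neurons = []
    for _ in range(n_sub):                              # one polynomial neuron per sub-dataset
        Xs, ys, cols = make_sub_dataset(feats, y, n_rows=len(feats),
                                        n_features=n_features, rng=rng)
        A = fit_polynomial_neuron(polynomial_features(Xs, poly_type), ys)
        neurons.append((cols, A))
    return neurons

def predict_opnn_t(X_seq, extract_features, polynomial_features, neurons, poly_type):
    """Average the neuron outputs and threshold them for a binary decision
    (an assumed discriminant, used here only for illustration)."""
    feats = extract_features(X_seq)
    outs = [polynomial_features(feats[:, cols], poly_type) @ A for cols, A in neurons]
    return (np.mean(outs, axis=0) > 0.5).astype(int)
```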

3. Learning Techniques of OPNN-T

In the proposed model, the backpropagation through time (BPTT) algorithm is employed for learning the parameters of the LSTM layers in the temporal feature enhancement module. This algorithm serves as the core method for training LSTM layers and effectively captures both the long-term and short-term features present in time series data. The PNNC utilizes least squares estimation to adjust the weights. To enable the model's key parameters to flexibly adapt to different datasets, the particle swarm algorithm is used for self-optimization.

3.1. Backpropagation Through Time Algorithm for LSTM

The first layer of the OPNN-T consists of an LSTM network. A time-based backpropagation algorithm based on mean squared error is employed to train the LSTM network. During the forward propagation of the LSTM, each time step of the input data is processed sequentially; the gating mechanisms of the LSTM are utilized to update the hidden state and cell state, generating outputs at each step.
In the backpropagation phase, the model parameters are updated by calculating the gradients $\delta$ of the hidden state $h_t$ and cell state $C_t$. This process involves propagating the error back through time, allowing the model to learn from the temporal dynamics of the data and improving its ability to capture dependencies over time. By effectively adjusting the weights based on these gradients, the LSTM network can better fit the underlying patterns in the input time series data.
$$ \hat{y}_t = \sigma(V h_t + c) \tag{12} $$
$$ \delta_{h_t} = \frac{\partial L}{\partial h_t} \tag{13} $$
$$ \delta_{C_t} = \frac{\partial L}{\partial C_t} \tag{14} $$
In this context, $\hat{y}_t$ denotes the predicted output at time step $t$, $L$ is the loss function, $V$ represents the weights, and $c$ is the bias. The variable $L(t)$ satisfies the following recursive relationship:
$$ L(t) = \begin{cases} l(t) + L(t+1), & \text{if } t < \tau \\ l(t), & \text{if } t = \tau \end{cases} \tag{15} $$
When $t = \tau$:
$$ \delta_{h_\tau} = \frac{\partial L(\tau)}{\partial h_\tau} = V^{T}(\hat{y}_\tau - y_\tau) \tag{16} $$
$$ \delta_{C_\tau} = \left(\frac{\partial h_\tau}{\partial C_\tau}\right)^{T}\frac{\partial L(\tau)}{\partial h_\tau} = \delta_{h_\tau} \times O_\tau \times \left(1 - \tanh^2(C_\tau)\right) \tag{17} $$
From $\delta_{C_{t+1}}$ and $\delta_{h_{t+1}}$, we can derive $\delta_{C_t}$ and $\delta_{h_t}$:
$$ \delta_{h_t} = \frac{\partial L}{\partial h_t} = \frac{\partial l(t)}{\partial h_t} + \left(\frac{\partial h_{t+1}}{\partial h_t}\right)^{T}\frac{\partial L(t+1)}{\partial h_{t+1}} = V^{T}(\hat{y}_t - y_t) + \left(\frac{\partial h_{t+1}}{\partial h_t}\right)^{T}\delta_{h_{t+1}} \tag{18} $$
Based on Equations (4) and (6), we can derive the following:
$$ \frac{\partial h_{t+1}}{\partial h_t} = \mathrm{diag}\!\left[O_{t+1}(1-O_{t+1})\tanh(C_{t+1})\right]W_o + \mathrm{diag}\!\left[\Delta C \, f_{t+1}(1-f_{t+1})\,C_t\right]W_f + \mathrm{diag}\!\left[\Delta C \, i_{t+1}(1-\bar{C}_{t+1}^{\,2})\right]W_c + \mathrm{diag}\!\left[\Delta C \, \bar{C}_{t+1}\, i_{t+1}(1-i_{t+1})\right]W_i \tag{19} $$
$$ \Delta C = O_{t+1}\left(1 - \tanh^2(C_{t+1})\right) \tag{20} $$
From $\delta_{C_{t+1}}$ and $\delta_{h_t}$, we can derive the following:
$$ \delta_{C_t} = \left(\frac{\partial C_{t+1}}{\partial C_t}\right)^{T}\frac{\partial L}{\partial C_{t+1}} + \left(\frac{\partial h_t}{\partial C_t}\right)^{T}\frac{\partial L}{\partial h_t} = \left(\frac{\partial C_{t+1}}{\partial C_t}\right)^{T}\delta_{C_{t+1}} + \delta_{h_t} \times O_t \times (1-\tanh^2 C_t) = \delta_{C_{t+1}} \times f_{t+1} + \delta_{h_t} \times O_t \times (1-\tanh^2 C_t) \tag{21} $$
From $\delta_{C_t}$, the gradient of the loss with respect to $W_f$ and the corresponding weight update can be derived as follows:
$$ \frac{\partial L}{\partial W_f} = \sum_{t=1}^{\tau}\left[\delta_{C_t} \times C_{t-1} \times f_t(1-f_t)\right]h_{t-1}^{T} \tag{22} $$
$$ W_f^{\,new} = W_f - \eta\,\frac{\partial L}{\partial W_f} \tag{23} $$
where $\eta$ is the learning rate. The calculations for the other weights $W_i$, $W_o$, $W_c$, $U_f$, $U_i$, $U_o$, $U_c$, $b_f$, $b_i$, $b_o$, and $b_c$ are similar to those of $W_f$. Furthermore, the training data are divided into several small batches containing multiple samples. After each batch is processed, the weights are updated. During the training process, the cross-entropy loss function is employed to quantify the difference between predicted and true labels, making it highly suitable for classification tasks. The learning rate is set to 0.01, a value that strikes an effective balance between the convergence speed and training stability. To optimize the training process, we use the Adam optimizer, which combines the benefits of adaptive learning rates and momentum for efficient and stable convergence.
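The training configuration described above can be sketched as follows using PyTorch, where automatic differentiation performs the backpropagation through time; the framework choice, the separation into a 16-dimensional feature layer plus a classification head, and the data-loader interface are assumptions rather than details taken from the original implementation.

```python
import torch
import torch.nn as nn

class TemporalFeatureModule(nn.Module):
    """LSTM-based feature extractor: 128 hidden units, 16 output nodes (Section 2.1)."""
    def __init__(self, input_dim, hidden_dim=128, feature_dim=16, n_classes=2):
        super().__init__()
        self.lstm = nn.LSTM(input_dim, hidden_dim, batch_first=True)
        self.features = nn.Linear(hidden_dim, feature_dim)
        self.head = nn.Linear(feature_dim, n_classes)   # classification head used only for training

    def forward(self, x):                               # x: (batch, time, input_dim)
        _, (h_n, _) = self.lstm(x)
        feats = torch.tanh(self.features(h_n[-1]))      # 16-dimensional temporal features
        return self.head(feats), feats

model = TemporalFeatureModule(input_dim=1)
criterion = nn.CrossEntropyLoss()                       # loss described in Section 3.1
optimizer = torch.optim.Adam(model.parameters(), lr=0.01)

def train_epoch(loader):
    """One pass over mini-batches; gradients flow back through time automatically."""
    for x_batch, y_batch in loader:
        logits, _ = model(x_batch)
        loss = criterion(logits, y_batch)
        optimizer.zero_grad()
        loss.backward()                                 # BPTT via autograd
        optimizer.step()
```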

3.2. Least Square Estimation for PNNC

Our model employs the LSE to adjust the weights of the PNNC. The loss function is as follows:
$$ \mathrm{Loss}(\mathbf{A}) = \sum_{k=1}^{N}\left(y_k - \sum_{j=1}^{c} f_j(x_k)\right)^{2} = \left\lVert \mathbf{y} - \mathbf{X}\mathbf{A} \right\rVert^{2} \tag{24} $$
where $N$ represents the number of data points, and $f_j(x_k)$ is the output of the $j$-th polynomial from Equations (7)–(9) evaluated at $x_k$. $\mathbf{X}$, $\mathbf{A}$, and $\mathbf{y}$ represent the input features, the coefficients of the polynomials, and the target output, respectively. The connection weights of the linear polynomial are used as an example, with $\mathbf{X}$, $\mathbf{A}$, and $\mathbf{y}$ expressed as follows:
$$ \mathbf{X} = \begin{bmatrix} x_{11} & x_{21} & \cdots & x_{n1} \\ x_{12} & x_{22} & \cdots & x_{n2} \\ \vdots & \vdots & \ddots & \vdots \\ x_{1N} & x_{2N} & \cdots & x_{nN} \end{bmatrix} \tag{25} $$
$$ \mathbf{A} = \begin{bmatrix} a_{10} & \cdots & a_{c0} & a_{c1} & \cdots & a_{cn} \end{bmatrix}^{T} \tag{26} $$
$$ \mathbf{y} = \begin{bmatrix} y_1 & y_2 & \cdots & y_N \end{bmatrix}^{T} \tag{27} $$
To optimize $\mathbf{A}$, the loss function $\mathrm{Loss}(\mathbf{A})$ is minimized in the following manner:
$$ \min \mathrm{Loss}(\mathbf{A}) = (\mathbf{y} - \mathbf{X}\mathbf{A})^{T}(\mathbf{y} - \mathbf{X}\mathbf{A}) = \mathbf{y}^{T}\mathbf{y} - \mathbf{y}^{T}\mathbf{X}\mathbf{A} - \mathbf{A}^{T}\mathbf{X}^{T}\mathbf{y} + \mathbf{A}^{T}\mathbf{X}^{T}\mathbf{X}\mathbf{A} = \mathbf{y}^{T}\mathbf{y} - 2\mathbf{y}^{T}\mathbf{X}\mathbf{A} + \mathbf{A}^{T}\mathbf{X}^{T}\mathbf{X}\mathbf{A} \tag{28} $$
The condition for achieving the minimum is as follows:
$$ \frac{\partial \mathrm{Loss}(\mathbf{A})}{\partial \mathbf{A}} = -2\mathbf{X}^{T}\mathbf{y} + 2\mathbf{X}^{T}\mathbf{X}\mathbf{A} = 0 \tag{29} $$
That is
$$ \mathbf{X}^{T}\mathbf{X}\mathbf{A} = \mathbf{X}^{T}\mathbf{y} \tag{30} $$
Multiplying both sides of the equation by $(\mathbf{X}^{T}\mathbf{X})^{-1}$ yields the following:
$$ (\mathbf{X}^{T}\mathbf{X})^{-1}\mathbf{X}^{T}\mathbf{X}\mathbf{A} = (\mathbf{X}^{T}\mathbf{X})^{-1}\mathbf{X}^{T}\mathbf{y} \tag{31} $$
Thus, we can derive the expression for the weights when the loss function is at its minimum:
$$ \mathbf{A} = (\mathbf{X}^{T}\mathbf{X})^{-1}\mathbf{X}^{T}\mathbf{y} \tag{32} $$
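A minimal sketch of Equation (32) is shown below; using numpy.linalg.lstsq in place of the explicit inverse is our substitution for numerical robustness when $\mathbf{X}^{T}\mathbf{X}$ is ill-conditioned, and the function name is illustrative.

```python
import numpy as np

def fit_polynomial_neuron(X, y):
    """Least squares estimate of the polynomial coefficients, Eq. (32).
    X is the (N, p) design matrix produced by a polynomial expansion; y is the target vector.
    Mathematically A = (X^T X)^{-1} X^T y; lstsq solves the same normal equations robustly."""
    A, *_ = np.linalg.lstsq(X, y, rcond=None)
    return A

# usage: coefficients for a Type-2 (quadratic) neuron on a sub-dataset
# A = fit_polynomial_neuron(polynomial_features(X_sub, poly_type=2), y_sub)
```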

3.3. Strategies of Self-Optimization

In the process of constructing the OPNN-T model, certain parameters can be adjusted based on the actual conditions of the dataset to achieve better performance. Consequently, the model employs a self-optimizing approach, with the PSO algorithm constituting its core component. PSO is a heuristic search technique that falls under the category of evolutionary computation in artificial intelligence and is widely used to solve complex and nonlinear optimization problems. Unlike other heuristic algorithms, PSO is a flexible and balanced mechanism that enhances both global and local exploration capabilities.
As shown in Figure 5, the OPNN-T is constructed using the parameter values optimized by the PSO algorithm. The optimization of the model includes the selection of the number of features, the number of sub-datasets, and the type of polynomial.
[Step 1] Perform random sampling on the input data, obtaining a subsample of size N through iterations.
[Step 2] Calculate the information gain for each feature according to Equation (11), and rank the features based on their information gain values. A higher information gain value indicates greater importance for the feature.
[Step 3] Determine the selected features. The first particle is responsible for selecting the number of features, processed via the RF algorithm. The randomly selected value is then rounded to the nearest integer. The decoded integer value represents the optimal dimensionality of the input variables within the feature data obtained through random sampling.
[Step 4] Determine the size of the selected sub-dataset. The second particle decides the size of the sub-dataset and adjusts it as necessary to enhance the model’s fitting capability. The randomly assigned value is also rounded to the nearest integer, and the selected sub-datasets are used for polynomial calculations.
[Step 5] Determine the type of polynomial. The third particle is tasked with determining the type of polynomial to be used, with the selected value similarly rounded to the nearest integer, corresponding to three different types of polynomials.
[Step 6] Evaluate the actual performance of each subset. The classification accuracy (CR) is used as the fitness value for the PSO objective function and serves as the evaluation metric. The $CR$ can be expressed in the following form:
$$ CR = \left(1 - \frac{error}{N}\right)\times 100\% \tag{33} $$
where $error$ represents the total number of classification errors. $N$ denotes the number of samples in the training set when the training accuracy (TR) is calculated. When computing the testing accuracy (TE), $N$ represents the number of samples in the testing set. The training data are used to construct the model, whereas the test set is employed to validate the actual performance of the model.
[Step 7] Check for the termination condition: (1) the maximum number of iterations is reached, or (2) the global best fitness value shows no significant improvement over a predefined number of consecutive iterations. If the termination condition is not met, return to and repeat Steps 3 through 6. If the termination condition is satisfied, output the final results. These termination criteria ensure the stability and efficiency of the optimization process, making it suitable for the diverse datasets used in the experiments.
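A minimal sketch of this self-optimization loop is given below. The fitness_fn argument is a hypothetical helper that builds the PNNC for a rounded candidate and returns its training classification rate (Equation (33)); the swarm size, inertia weight, and velocity bounds follow Table 1, while c1 = c2 = 1.5 is an assumed value within the [1, 2] range listed there.

```python
import numpy as np

def pso_optimize(fitness_fn, bounds, swarm_size=20, iters=100, w=0.4, c1=1.5, c2=1.5, seed=0):
    """Minimal PSO over (no. of features, no. of sub-datasets, polynomial type).
    `fitness_fn` is a hypothetical helper returning the training CR for a rounded candidate."""
    rng = np.random.default_rng(seed)
    lo = np.array([b[0] for b in bounds], dtype=float)
    hi = np.array([b[1] for b in bounds], dtype=float)
    pos = rng.uniform(lo, hi, (swarm_size, len(bounds)))
    vel = rng.uniform(-1.0, 1.0, pos.shape)                       # Vmin, Vmax from Table 1
    pbest = pos.copy()
    pbest_fit = np.array([fitness_fn(np.rint(p)) for p in pos])
    gbest = pbest[np.argmax(pbest_fit)].copy()
    for _ in range(iters):
        r1, r2 = rng.random(pos.shape), rng.random(pos.shape)
        vel = w * vel + c1 * r1 * (pbest - pos) + c2 * r2 * (gbest - pos)
        pos = np.clip(pos + np.clip(vel, -1.0, 1.0), lo, hi)
        fit = np.array([fitness_fn(np.rint(p)) for p in pos])     # Step 6: evaluate CR
        improved = fit > pbest_fit
        pbest[improved], pbest_fit[improved] = pos[improved], fit[improved]
        gbest = pbest[np.argmax(pbest_fit)].copy()
    return np.rint(gbest)                                         # Steps 3-5: rounded to integers

# example bounds following Table 1: features, sub-datasets, polynomial type
# best = pso_optimize(fitness_fn, bounds=[(2, n_features), (10, 30), (1, 3)])
```

For brevity the sketch uses a fixed iteration budget; the stagnation-based termination criterion in Step 7 can be added by tracking the best fitness over consecutive iterations.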

3.4. Computational Complexity Analysis

For the temporal feature enhancement module, the LSTM network is local in both time and space, which means that the input sequence length does not affect the complexity [31]. For each time step, the complexity per weight is $O(1)$. Therefore, the complexity of the temporal feature enhancement module is $O(W)$, where $W$ represents the number of weights.
For the PNNC, the complexity of the LSE in each polynomial neuron (PN) is $O(n_k^3)$ [32], where $n_k$ denotes the dimension of the input values. Therefore, the complexity of the PNNC can be expressed as $O(n_k^3)$.
For the self-optimization algorithm, the complexity of a single PSO iteration is determined by the fitness function. In the proposed model, the fitness function is derived from the PNNC. Therefore, the complexity of a single iteration is $O(m_s \cdot n_k^3)$, where $m_s$ represents the swarm size. Considering that the algorithm requires $T$ iterations to converge or reach the predefined maximum number of iterations, the complexity of the self-optimization algorithm is $O(T \cdot m_s \cdot n_k^3)$. In summary, the overall complexity of the proposed model can be expressed as
$$ O\!\left(W + T \cdot m_s \cdot n_k^{3}\right) \tag{34} $$

4. Experiments and Results

4.1. Experimental Design

We compared the experimental results with those of traditional models and other related models proposed by researchers. To evaluate the performance of the proposed model, we selected eight public machine learning datasets from the UCI benchmark for comparison. The models compared against the PNN and the self-optimized PNN (OPNN) are standard machine learning classifiers implemented in scikit-learn [33]. The UCI datasets can be accessed at https://archive.ics.uci.edu/ml (accessed on 21 January 2025).
Furthermore, we chose 17 publicly available time series datasets to compare the differences between OPNN-T and some other models. Time series datasets can be accessed at https://timeseriesclassification.com/ (accessed on 21 January 2025). The datasets utilized in experiments span a wide range of characteristics, including number of features, number of classes, temporal complexity, and application domains. This diversity allows models to be tested under different conditions, ensuring that the performance of the models is evaluated comprehensively and potential limitations are identified across various scenarios.
In the PSO algorithm, different parameter values may yield varying results. The parameter values for our model were selected based on experiments and references from several existing models, as shown in Table 1. For the swarm size, we experimented with values of 10, 20, and 30. A smaller swarm size reduced computational cost but often resulted in suboptimal solutions, whereas a larger swarm size improved the quality of solutions at the expense of increased computation time. A swarm size of 20 provided a suitable balance between efficiency and solution quality. Similarly, for the inertia weight, values of 0.3, 0.4, and 0.5 were tested. A lower value encouraged exploitation by focusing on local search. However, a higher value promoted exploration at the cost of slower convergence. The inertia weight of 0.4 achieved consistent results in most cases and proved effective in balancing exploration and exploitation.
All the experiments used the classification rate as the evaluation metric, referring to Equation (33) for the calculation of the classification rate.
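As an illustration of how the baseline comparison can be reproduced, the sketch below evaluates the scikit-learn classifiers listed in Table 5 under the 60:40 split described in Section 4.2; the hyperparameters are library defaults and are assumptions rather than the exact settings used in this study.

```python
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.naive_bayes import GaussianNB
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier, AdaBoostClassifier
from sklearn.metrics import accuracy_score

baselines = {
    "KNN": KNeighborsClassifier(),
    "SVM": SVC(),
    "Bayes": GaussianNB(),
    "Logistic": LogisticRegression(max_iter=1000),
    "Decision Tree": DecisionTreeClassifier(),
    "RF": RandomForestClassifier(),
    "AdaBoost": AdaBoostClassifier(),
}

def evaluate_baselines(X, y, seed=0):
    """60:40 train/test split (Section 4.2); classification rate per Eq. (33)."""
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.4, random_state=seed)
    results = {}
    for name, clf in baselines.items():
        clf.fit(X_tr, y_tr)
        results[name] = 100.0 * accuracy_score(y_te, clf.predict(X_te))
    return results
```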

4.2. Experiments with Machine Learning Datasets

To validate the performance of the OPNN, we conducted experiments on eight different machine learning datasets obtained from the UCI database. The datasets were divided into a 60:40 ratio, with 60% of the data allocated for the training set and 40% for the testing set. Table 2 summarizes the relevant characteristics of these eight datasets, including the number of samples, the number of features, and the number of classes. The optimization includes selected features, the size of the chosen sub-dataset, and the type of polynomial. Table 3 presents the parameter results of the classifier after optimization. Figure 6 compares the number of original features with the number of features selected after optimization.
Multiple randomly sampled sub-datasets were fed into the polynomial neurons of the classifier, enabling the model to achieve good generalization capabilities. Table 4 displays the classification accuracy for both training and testing data. As described in Section 3, the fitness of the objective function refers to the classification rate of the training data, with TR and TE representing the classification rate of the training and testing sets, respectively. Table 5 and Figure 7 compare the classification rates of the proposed classifiers with those of classifiers from the literature and the traditional classification algorithms.
The experimental results in Table 5 demonstrate that OPNN consistently outperforms or matches the other classification algorithms across a range of datasets. Specifically, OPNN achieved the highest classification rate in five out of the eight datasets, including Fertility, Forest Fires, Seeds, Vehicle, and Zoo. On the remaining three datasets, OPNN either matched the best-performing algorithm, as seen in the Wine dataset, or performed slightly below the top methods, as observed in WDBC and Ionosphere. Notably, on the Seeds dataset, OPNN achieves a classification rate of 98.81%, surpassing the already strong 97.62% of the PNN, which highlights the effectiveness of the optimization process in enhancing classification accuracy. In datasets with well-separated and clearly distinguishable features, such as Zoo and Seeds, OPNN demonstrates notable advantages, outperforming even ensemble-based algorithms such as random forest and AdaBoost. However, in datasets with overlapping class boundaries or less distinct feature patterns, such as WDBC and Ionosphere, OPNN's performance remains competitive but does not show a substantial improvement over the best baseline methods.
Additionally, OPNN shows a significant improvement over PNN in datasets such as Vehicle and Forest Fires, further emphasizing the impact of the optimization process. These findings highlight OPNN's robustness and adaptability across diverse datasets, particularly for scenarios involving complex or nonlinear data patterns. However, it is also evident that in datasets where multiple algorithms already achieve near-perfect accuracy, such as WDBC and Zoo, the relative improvement offered by OPNN may appear less pronounced due to the inherently high baseline performance. Overall, the results underline OPNN's capacity to maintain consistently high levels of classification accuracy while demonstrating adaptability across a broad range of data conditions.

4.3. Experiments with Time Series Datasets

To further validate the capabilities of the proposed model, we conducted experiments on 17 different publicly available time series datasets. Figure 8 illustrates the time series features corresponding to different labels (0 and 1) in the earthquake dataset. Table 6 summarizes the relevant characteristics of these 17 datasets, including the number of samples, the number of features, and the number of classes.
Table 7 presents the classification accuracy for both the training and testing data, with TR and TE indicating the classification accuracy of the time series training and testing sets, respectively. Table 8 compares the classification rates of different proposed models for time series classification, including OPNN-T, OPNN (without temporal feature enhancement), OPNN-T (with only linear polynomial types), and OPNN-T (without the sub-dataset method). Table 9 compares the classification results of the proposed classifier with those of the classifiers referenced in the literature.
Figure 9 displays a comparison of accuracy results between OPNN-T and other classifiers, including RF, AdaBoost, TBOPE, RSFCF, AFFNet, and OPNN. Table 10 presents the results of the Wilcoxon signed-rank test conducted to compare the OPNN-T with six other models. Figure 10 provides the average ranking and critical difference diagram for the seven classifiers across the 17 datasets.
The experimental results indicate that the proposed model not only significantly outperforms traditional classification algorithms in terms of the classification rate, but also has advantages over recently developed time series classification models. Specifically, OPNN-T achieved the highest classification rate in 12 out of the 17 time series datasets, while maintaining competitive performance in the remaining five datasets. To further validate the statistical significance of OPNN-T’s performance improvements, we conducted a Wilcoxon signed-rank test to compare its classification rates with those of the baseline models across the 17 datasets. The results of the test revealed that the p values for all comparisons with the baseline models were consistently less than 0.05, indicating that the observed performance improvements of OPNN-T are statistically significant. This demonstrates that OPNN-T not only achieves superior performance in terms of average ranking but also provides consistent and reliable advantages over a variety of classification models, including both the traditional classifiers and state-of-the-art time series classification models.
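The statistical comparison described above can be reproduced with scipy.stats.wilcoxon; in the sketch below the accuracy vectors are placeholders for the per-dataset classification rates reported in Table 9, and the function name is illustrative.

```python
from scipy.stats import wilcoxon

def compare_models(opnn_t_acc, baseline_acc):
    """Paired Wilcoxon signed-rank test over the 17 per-dataset accuracies.
    Both arguments are lists of classification rates in the same dataset order."""
    stat, p_value = wilcoxon(opnn_t_acc, baseline_acc)
    return p_value

# e.g., p = compare_models(acc_opnn_t, acc_rf)  # p < 0.05 indicates a significant difference
```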
The excellent performance of OPNN-T can be attributed to its architectural design, which combines self-optimizing polynomial neural networks with a temporal feature enhancement module. As shown in the results of Table 8, temporal feature enhancement and the self-optimization approach have varying impacts on the model's performance. Temporal feature enhancement enables the model to capture more critical time series features. The diverse types of polynomials and the sub-dataset generation method further improve the classification rate of the model in different scenarios. Specifically, the best results for 4 of the 17 datasets were achieved via linear polynomials, which means that the performance on the remaining 13 datasets benefited from higher-order polynomials. Additionally, the sub-dataset generation method often outperforms training on the full dataset, effectively reducing the model's overreliance on the training data.
The design of OPNN-T provides a distinctive methodological perspective compared to existing models, addressing certain challenges they face under specific conditions. For instance, TBOPE uses the SAX symbolization method, which predominantly captures the average and trend features of time series data. While effective in certain scenarios, this method may overlook other critical feature information, limiting its ability to fully represent the complexity of the data. RSFCF enhances computational efficiency by randomly selecting shapelets and restricting their search range, effectively reducing computational overhead. However, this strategy can result in the loss of shapelet location information and insufficient discriminative features, which may ultimately impact the classification rate.
Although OPNN-T generally demonstrates superior adaptability in time series classification tasks compared to OPNN, its performance on the ItalyPowerDemand dataset is slightly suboptimal. This may be attributed to several factors. First, the dataset is characterized by a limited number of features and lacks the pronounced long-term temporal dependencies observed in other datasets, such as those with periodic or seasonal patterns. As the OPNN-T is specifically designed to capture complex temporal dynamics, its advantage becomes less pronounced when such characteristics are absent. Second, the relatively low variability within the dataset may reduce the model's ability to leverage its advanced feature extraction mechanisms, leading to a classification performance closer to that of the simpler models.

5. Conclusions

This study presents a novel self-optimizing polynomial neural network with temporal feature enhancement for time series classification, referred to as OPNN-T. The proposed model consists of a temporal feature enhancement module and PNNC, which uses three types of polynomial functions and employs self-optimization. Experimental results on 17 publicly available time series machine learning datasets indicate that the classification rate of the proposed model surpasses that of general models and several time series classification models reported in related literature.
OPNN-T offers significant potential for integration into a variety of practical applications where accurate time series classification is essential. For instance, in the domain of financial analysis, the model can be applied to detect anomalies in trading patterns by leveraging its ability to extract and enhance temporal dependencies from noisy and volatile time series data. In healthcare, it could support diagnostic systems by analyzing biomedical signals, such as ECG or electroencephalograms (EEGs), to identify irregularities indicative of diseases. Furthermore, in industrial settings it might be utilized in predictive maintenance systems to detect early warning signs of equipment failures, thereby reducing downtime and preventing costly repairs. These applications underscore the versatility of the model, as it is capable of adapting to diverse data characteristics while maintaining high predictive performances.
Despite the model’s promising applicability, its performance on certain datasets, especially those characterized by high noise levels or complex temporal dynamics, is not yet optimal, suggesting potential avenues for future refinements. To address these limitations, and ensure the broader applicability of the model, we plan to further optimize its core architecture. For example, replacing the current polynomial neurons with fuzzy polynomial neurons may enhance the model’s capability for nonlinear feature extraction and improve its robustness in handling noisy or irregular time series data. Furthermore, to facilitate deployment in real-world systems, future work will also focus on improving computational efficiency and scalability. By addressing these aspects, the proposed model holds promise as a more versatile and effective tool for diverse time series classification tasks, bridging the gap between theoretical innovation and practical implementation.

Author Contributions

Y.T. provided the idea, conducted the experiments and wrote the main manuscript text. Z.X. analyzed the data and edited the manuscript. W.H. provided critical insights and supervised the project. All authors reviewed the manuscript. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the National Major Scientific Instruments and Equipments Development Project of the National Natural Science Foundation of China, grant number 62227805.

Data Availability Statement

Data will be available on request. UCI datasets are available at https://archive.ics.uci.edu/ml (accessed on 21 January 2025). Time series datasets are available at https://timeseriesclassification.com/ (accessed on 21 January 2025).

Acknowledgments

The authors would like to thank the editors and reviewers for their valuable comments and suggestions.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Grabocka, J.; Schilling, N.; Wistuba, M.; Schmidt-Thieme, L. Learning time-series shapelets. In Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, New York, NY, USA, 24–27 August 2014; pp. 392–401.
  2. Wenhe, Y.; Guiling, L. Research on time series classification based on shapelet. Comput. Sci. 2019, 46, 29–35.
  3. Hadiyoso, S.; Aulia, S.; Rizal, A. One-lead electrocardiogram for biometric authentication using time series analysis and Support Vector Machine. Int. J. Adv. Comput. Sci. Appl. 2019, 10, 276–283.
  4. Tareen, A.D.K.; Asim, K.M.; Kearfott, K.J.; Rafique, M.; Nadeem, M.S.A.; Iqbal, T.; Rahman, S.U. Automated anomalous behaviour detection in soil radon gas prior to earthquakes using computational intelligence techniques. J. Environ. Radioact. 2019, 203, 48–54.
  5. Middlehurst, M.; Large, J.; Bagnall, A. The canonical interval forest (CIF) classifier for time series classification. In Proceedings of the 2020 IEEE International Conference on Big Data (Big Data), Atlanta, GA, USA, 10–13 December 2020; pp. 188–195.
  6. Zhou, Y.; Ren, H.; Li, Z.; Wu, N.; Al-Ahmari, A.M. Anomaly detection via a combination model in time series data. Appl. Intell. 2021, 51, 4874–4887.
  7. Ren, S.; Zhang, J.; Gu, X.; Xiong, Y.; Wang, H.; Xu, H. Overview of feature extraction algorithms for time series. J. Chin. Comput. Syst. 2021, 42, 271–278.
  8. Dau, H.A.; Bagnall, A.; Kamgar, K.; Yeh, C.C.M.; Zhu, Y.; Gharghabi, S.; Ratanamahatana, C.A.; Keogh, E. The UCR time series archive. IEEE/CAA J. Autom. Sin. 2019, 6, 1293–1305.
  9. Ismail Fawaz, H.; Forestier, G.; Weber, J.; Idoumghar, L.; Muller, P.A. Deep learning for time series classification: A review. Data Min. Knowl. Discov. 2019, 33, 917–963.
  10. Zhu, L.; Zhang, X. Time series data-driven online prognosis of wind turbine faults in presence of SCADA data loss. IEEE Trans. Sustain. Energy 2020, 12, 1289–1300.
  11. Huang, H.; Wang, Y.; Zong, H. Support vector machine classification over encrypted data. Appl. Intell. 2022, 52, 5938–5948.
  12. Zhang, L.; Zheng, X.; Pang, Q.; Zhou, W. Fast Gaussian kernel support vector machine recursive feature elimination algorithm. Appl. Intell. 2021, 51, 9001–9014.
  13. Zhang, H.; Cheng, N.; Zhang, Y.; Li, Z. Label flipping attacks against Naive Bayes on spam filtering systems. Appl. Intell. 2021, 51, 4503–4514.
  14. Wu, W.; Xia, Y.; Jin, W. Predicting bus passenger flow and prioritizing influential factors using multi-source data: Scaled stacking gradient boosting decision trees. IEEE Trans. Intell. Transp. Syst. 2020, 22, 2510–2523.
  15. Bui, K.H.N.; Cho, J.; Yi, H. Spatial-temporal graph neural network for traffic forecasting: An overview and open research issues. Appl. Intell. 2022, 52, 2763–2774.
  16. Qiao, J.; Wang, L. Nonlinear system modeling and application based on restricted Boltzmann machine and improved BP neural network. Appl. Intell. 2021, 51, 37–50.
  17. Lippmann, R.P. An introduction to computing with neural nets. ACM Sigarch Comput. Archit. News 1988, 16, 7–25.
  18. Misra, B.B.; Dehuri, S.; Dash, P.K.; Panda, G. A reduced and comprehensible polynomial neural network for classification. Pattern Recognit. Lett. 2008, 29, 1705–1712.
  19. Buhmann, M.D. Radial basis functions. Acta Numer. 2000, 9, 1–38.
  20. Xiao, Y.; Huang, W.; Oh, S.K.; Zhu, L. A polynomial kernel neural network classifier based on random sampling and information gain. Appl. Intell. 2022, 52, 6398–6412.
  21. Bai, B.; Li, G.; Wang, S.; Wu, Z.; Yan, W. Time series classification based on multi-feature dictionary representation and ensemble learning. Expert Syst. Appl. 2021, 169, 114162.
  22. Liu, H.Y.; Gao, Z.Z.; Wang, Z.H.; Deng, Y.H. Time series classification with shapelet and canonical features. Appl. Sci. 2022, 12, 8685.
  23. Wang, T.; Liu, Z.; Zhang, T.; Hussain, S.F.; Waqas, M.; Li, Y. Adaptive feature fusion for time series classification. Knowl.-Based Syst. 2022, 243, 108459.
  24. Wang, Z.; Oh, S.K.; Wang, Z.; Fu, Z.; Pedrycz, W.; Yoon, J.H. Design of progressive fuzzy polynomial neural networks through gated recurrent unit structure and correlation/probabilistic selection strategies. Fuzzy Sets Syst. 2023, 470, 108656.
  25. Yoo, S.H.; Oh, S.K.; Pedrycz, W. Optimized face recognition algorithm using radial basis function neural networks and its practical applications. Neural Netw. 2015, 69, 111–125.
  26. Roh, S.B.; Oh, S.K.; Pedrycz, W.; Seo, K.; Fu, Z. Design methodology for radial basis function neural networks classifier based on locally linear reconstruction and conditional fuzzy C-means clustering. Int. J. Approx. Reason. 2019, 106, 228–243.
  27. Laskar, N.M.; Guha, K.; Chatterjee, I.; Chanda, S.; Baishnab, K.L.; Paul, P.K. HWPSO: A new hybrid whale-particle swarm optimization algorithm and its application in electronic design optimization problems. Appl. Intell. 2019, 49, 265–291.
  28. Björck, Å. Least squares methods. Handb. Numer. Anal. 1990, 1, 465–652.
  29. Wu, C.; Zhang, H.; Hua, J.; Hua, S.; Zhang, Y.; Lu, X.; Tang, Y. A novel least square and image rotation based method for solving the inclination problem of license plate in its camera captured image. KSII Trans. Internet Inf. Syst. (TIIS) 2019, 13, 5990–6008.
  30. Jadhav, S.; He, H.; Jenkins, K. Information gain directed genetic algorithm wrapper feature selection for credit rating. Appl. Soft Comput. 2018, 69, 541–553.
  31. Tsironi, E.; Barros, P.; Weber, C.; Wermter, S. An analysis of convolutional long short-term memory recurrent neural networks for gesture recognition. Neurocomputing 2017, 268, 76–86.
  32. Zhou, K.; Oh, S.K.; Qiu, J.; Pedrycz, W.; Seo, K.; Yoon, J.H. Design of Hierarchical Neural Networks Using Deep LSTM and Self-organizing Dynamical Fuzzy-Neural Network Architecture. IEEE Trans. Fuzzy Syst. 2024, 32, 2915–2929.
  33. Abraham, A.; Pedregosa, F.; Eickenberg, M.; Gervais, P.; Mueller, A.; Kossaifi, J.; Gramfort, A.; Thirion, B.; Varoquaux, G. Machine learning for neuroimaging with scikit-learn. Front. Neuroinform. 2014, 8, 71792.
Figure 1. The gate structure of LSTM.
Figure 2. Basic structure of PNs and PNN.
Figure 3. Structure of the sub-dataset generator.
Figure 4. Structure of OPNN-T.
Figure 5. Overall procedure of the self-optimization strategy.
Figure 6. The number of original features after self-optimization.
Figure 7. Heat map of the classification rates of the classifiers.
Figure 8. Time series characteristics with different labels in the earthquake dataset. (a) A sample with label = 0 in the earthquake dataset. (b) A sample with label = 1 in the earthquake dataset.
Figure 9. Comparison of the classification rates between OPNN-T and RF, AdaBoost, OPNN, TBOPE, RSFCF, and AFFNet. (a) OPNN-T vs. RF. (b) OPNN-T vs. AdaBoost. (c) OPNN-T vs. TBOPE. (d) OPNN-T vs. RSFCF. (e) OPNN-T vs. AFFNet. (f) OPNN-T vs. OPNN.
Figure 10. Critical difference diagram of the average ranks on the accuracy for seven classifiers.
Table 1. Values of the parameters of the classifier.
Parameters | Values
Generation size | 100
Swarm size | 20
Vmin, Vmax | −1, 1
W | 0.4
c1, c2 | [1, 2]
No. of features | [2, no. of features]
No. of sub-datasets | [10, 30]
Types of polynomial | [1, 3]
Table 2. Characteristics of the datasets.
Dataset | Number of Samples | Number of Features | Number of Classes
Fertility | 100 | 10 | 2
Forest fires | 244 | 13 | 2
Ionosphere | 351 | 34 | 2
Seeds | 210 | 7 | 3
Vehicle | 846 | 18 | 4
WDBC | 569 | 30 | 2
Wine | 178 | 13 | 3
Zoo | 101 | 16 | 7
Table 3. Parameter results of the classifier after optimization.
Dataset | No. of Selected Features | No. of Sub-Dataset Selections | Type of the Polynomial
Fertility | 8 | 9 | 3
Forest fires | 10 | 2 | 3
Ionosphere | 17 | 8 | 1
Seeds | 6 | 17 | 3
Vehicle | 14 | 10 | 2
WDBC | 9 | 14 | 2
Wine | 8 | 6 | 2
Zoo | 11 | 13 | 3
Table 4. Classification rate of the OPNN.
Data | TR | TE
Fertility | 90.00 | 92.50
Forest fires | 98.63 | 98.98
Ionosphere | 98.58 | 91.43
Seeds | 97.62 | 98.81
Vehicle | 87.57 | 80.53
WDBC | 97.38 | 96.77
Wine | 99.12 | 98.44
Zoo | 98.55 | 100.00
Table 5. Comparison of the classification rates of the classifiers.
Data | KNN [33] | SVM [33] | Bayes [33] | Logistic [33] | Decision Tree [33] | RF [33] | AdaBoost [33] | PNN | OPNN
Fertility | 87.50 | 90.00 | 92.50 | 90.00 | 80.00 | 92.50 | 85.00 | 90.00 | 92.50
Forest fires | 90.28 | 98.98 | 92.86 | 97.96 | 97.96 | 97.96 | 97.96 | 95.92 | 98.98
Ionosphere | 82.86 | 90.71 | 87.86 | 89.29 | 87.14 | 92.86 | 91.43 | 90.71 | 91.43
Seeds | 85.71 | 91.67 | 90.48 | 92.86 | 90.48 | 91.67 | 86.90 | 97.62 | 98.81
Vehicle | 61.36 | 78.76 | 43.36 | 80.24 | 69.62 | 71.68 | 55.75 | 77.58 | 80.53
WDBC | 97.13 | 97.13 | 96.77 | 97.13 | 92.47 | 96.77 | 95.70 | 95.70 | 96.77
Wine | 64.06 | 93.75 | 96.88 | 96.88 | 87.50 | 98.44 | 84.38 | 92.19 | 98.44
Zoo | 92.31 | 97.44 | 97.44 | 97.44 | 97.44 | 97.44 | 79.49 | 97.44 | 100.00
The best results are shown in bold.
Table 6. Characteristics of the time series datasets.
Dataset | Number of Samples | Number of Features | Number of Classes
BeetleFly | 40 | 512 | 2
DistalPhalanxOAG | 539 | 80 | 3
DistalPhalanxOC | 876 | 80 | 2
DistalPhalanxTW | 539 | 80 | 6
Earthquakes | 461 | 512 | 2
ECG200 | 200 | 96 | 2
ECG5000 | 5000 | 140 | 5
Ham | 214 | 431 | 2
Herring | 128 | 512 | 2
ItalyPowerDemand | 1096 | 24 | 2
MiddlePhalanxOAG | 554 | 80 | 3
MiddlePhalanxOC | 891 | 80 | 2
MiddlePhalanxTW | 553 | 80 | 6
OliveOil | 60 | 570 | 4
ProximalPhalanxOAG | 605 | 80 | 3
ProximalPhalanxOC | 891 | 80 | 2
ProximalPhalanxTW | 605 | 80 | 6
Table 7. Classification rate of OPNN-T.
Data | TR | TE
BeetleFly | 100.00 | 95.00
DistalPhalanxOAG | 88.00 | 77.70
DistalPhalanxOC | 84.33 | 75.00
DistalPhalanxTW | 83.00 | 73.38
Earthquakes | 100.00 | 76.98
ECG200 | 100.00 | 95.00
ECG5000 | 98.20 | 94.16
Ham | 99.08 | 80.95
Herring | 85.94 | 75.00
ItalyPowerDemand | 100.00 | 96.40
MiddlePhalanxOAG | 80.50 | 66.88
MiddlePhalanxOC | 83.50 | 76.63
MiddlePhalanxTW | 79.70 | 61.04
OliveOil | 100.00 | 93.33
ProximalPhalanxOAG | 90.25 | 88.78
ProximalPhalanxOC | 87.50 | 86.94
ProximalPhalanxTW | 88.75 | 82.44
Table 8. Comparison of the time series classification rate with the proposed models.
Dataset | OPNN | OPNN-T (Linear Polynomial) | OPNN-T (Without Sub-Datasets) | OPNN-T
BeetleFly | 85.00 | 90.00 | 80.00 | 95.00
DistalPOAG | 71.22 | 74.82 | 67.63 | 77.70
DistalPOC | 67.03 | 71.01 | 73.19 | 75.00
DistalPTW | 60.43 | 70.50 | 58.27 | 73.38
Earthquakes | 67.63 | 71.22 | 75.54 | 76.98
ECG200 | 93.00 | 92.00 | 90.00 | 95.00
ECG5000 | 92.42 | 94.16 | 93.87 | 94.16
Ham | 66.67 | 79.04 | 76.19 | 80.95
Herring | 68.75 | 71.88 | 64.06 | 75.00
ItalyPD | 96.99 | 96.40 | 96.21 | 96.40
MiddlePOAG | 60.39 | 64.29 | 61.04 | 66.88
MiddlePOC | 64.95 | 61.17 | 74.91 | 76.63
MiddlePTW | 56.49 | 61.04 | 56.49 | 61.04
OliveOil | 80.00 | 90.00 | 93.33 | 93.33
ProximalPOAG | 86.34 | 87.32 | 84.39 | 88.78
ProximalPOC | 81.10 | 81.79 | 86.60 | 86.94
ProximalPTW | 78.05 | 82.44 | 77.07 | 82.44
The best results are shown in bold.
Table 9. Comparison of the time series classification rate with the classifiers.
Data | RF [33] | AdaBoost [33] | TBOPE [21] | RSFCF [22] | AFFNet [23] | OPNN | OPNN-T
BeetleFly | 90.00 | 85.00 | 85.00 | 85.00 | 85.00 | 85.00 | 95.00
DistalPOAG | 75.54 | 40.29 | 76.25 | 74.82 | 74.10 | 71.22 | 77.70
DistalPOC | 78.26 | 76.45 | 73.91 | 78.26 | 76.40 | 67.03 | 75.00
DistalPTW | 70.50 | 64.75 | 67.62 | 69.06 | 69.80 | 60.43 | 73.38
Earthquakes | 74.82 | 75.54 | 74.82 | 74.82 | 74.80 | 67.63 | 76.98
ECG200 | 81.00 | 85.00 | 88.00 | 86.00 | 92.00 | 93.00 | 95.00
ECG5000 | 93.53 | 87.76 | 94.37 | 94.53 | 94.10 | 92.42 | 94.16
Ham | 72.38 | 66.67 | 58.09 | 76.19 | 82.90 | 66.67 | 80.95
Herring | 57.81 | 70.31 | 64.06 | 67.19 | 67.20 | 68.75 | 75.00
ItalyPD | 95.63 | 95.24 | 94.07 | 96.02 | 97.50 | 96.99 | 96.40
MiddlePOAG | 61.04 | 59.09 | 58.44 | 62.99 | 54.50 | 60.39 | 66.88
MiddlePOC | 82.13 | 73.54 | 77.66 | 83.16 | 81.80 | 64.95 | 76.63
MiddlePTW | 57.14 | 52.60 | 56.49 | 58.44 | 50.00 | 56.49 | 61.04
OliveOil | 90.00 | 73.33 | 86.66 | 83.33 | 76.70 | 80.00 | 93.33
ProximalPOAG | 86.83 | 80.49 | 84.19 | 82.93 | 84.40 | 86.34 | 88.78
ProximalPOC | 86.25 | 83.16 | 83.41 | 87.63 | 91.80 | 81.10 | 86.94
ProximalPTW | 80.49 | 73.66 | 78.53 | 80.49 | 80.00 | 78.05 | 82.44
The best results are shown in bold.
Table 10. Results of the Wilcoxon signed-rank test comparing OPNN-T with the other models.
Comparison Model | p Value
RF | 0.007904
AdaBoost | 0.000076
TBOPE | 0.000076
RSFCF | 0.012863
AFFNet | 0.034790
OPNN | 0.000031