Time-Series Neural Network: A High-Accuracy Time-Series Forecasting Method Based on Kernel Filter and Time Attention

Abstract: This research introduces a novel high-accuracy time-series forecasting method, the Time-Series Neural Network (TNN), which is based on a kernel filter and a time attention mechanism. The TNN model is designed to account for the complex characteristics of time-series data, such as non-linearity, high dimensionality, and long-term dependence. Its key innovations are the time attention mechanism and the kernel filter, which allow the model to allocate different weights to features at each time point and to extract high-level features from the time-series data, thereby improving predictive accuracy. Additionally, an adaptive weight generator is integrated into the model, enabling it to adjust weights automatically based on the input features. Mainstream time-series forecasting models such as Recurrent Neural Networks (RNNs) and Long Short-Term Memory networks (LSTMs) are employed as baseline models, and comprehensive comparative experiments are conducted. The results indicate that the TNN model significantly outperforms the baseline models in both long-term and short-term prediction tasks. Specifically, the RMSE, MAE, and $R^2$ reach 0.05, 0.23, and 0.95, respectively. Remarkably, even for complex time-series data that contain a large amount of noise, the TNN model still maintains a high prediction accuracy.


Introduction
With the rapid development of global financial markets, the stock market has increasingly become a significant choice for investors. In the stock market, the accuracy of stock price prediction directly influences investors' decisions and is crucial for the health and stability of economic activities. However, stock price prediction poses a formidable challenge. Stock prices are influenced by numerous factors, including but not limited to macroeconomic conditions, company performance reports, market sentiment, and even global political dynamics. The interweaving of these factors causes stock prices to exhibit a high degree of uncertainty and non-linearity, which adds significant difficulty to forecasting.
In recent years, deep learning has made considerable contributions in fields such as agriculture [1,2], healthcare [3,4], energy usage [5], and finance [6]. This development provides a new solution for the time-series prediction problem. Neural network models have become widely used in stock price prediction because of their advantages in processing non-linear data and capturing long-distance dependencies. Nevertheless, most existing neural-network-based prediction models overlook a critical issue: the temporal attributes of stock prices and their importance. In reality, past price movements do not all affect future prices equally; recent price changes often have a more significant effect on future price predictions.
To address this problem, a time-series neural network method based on a Kernel Filter and Time Attention is proposed in this paper, both of which are novel applications developed by the authors for achieving higher accuracy in stock price prediction. Firstly, the Kernel Filter is incorporated into the neural network model to effectively extract the features of time-series data, especially when the data are noisy. This is a novel application aiming to improve upon existing filtering techniques in neural networks. By applying the Kernel Filter, it is possible to capture the underlying trends of stock prices more accurately and to suppress irrelevant noise, thereby enhancing the accuracy of predictions. Secondly, a novel Time Attention mechanism is designed that assigns higher weights to recent data, an approach developed to extend the capabilities of existing attention mechanisms in capturing the temporal characteristics of stock prices. The advantage of this approach is that it can more effectively capture the dynamics of recent prices, which are often a crucial factor in predicting future prices. With these two designs, the proposed model considers the characteristics of time-series data, effectively extracts data features, and pays more attention to recent data, thereby achieving higher accuracy in stock price prediction.
In addition to introducing these techniques, a series of carefully designed experiments was conducted to measure the model's performance against established benchmarks in the field, such as RNN and LSTM. Our findings indicate that the TNN performs well across various forecasting tasks, making it particularly suitable for predicting stock prices. Notably, the model's performance remains robust even when applied to noisy, complex time-series data. Detailed evaluations and comparisons are presented in the subsequent sections.
In the future, there are plans to further optimize the model and verify it on more financial datasets, with the aim of further enhancing the model's generalization ability and prediction accuracy.

Related Work
Time-series forecasting has long been a research hotspot in the field of finance, with its core premise being to decipher patterns from historical data in order to predict future price fluctuations. To tackle this issue, researchers have applied a variety of methods, including both traditional machine learning methods and deep learning techniques.

Traditional Machine Learning Methods
In early research, traditional machine learning methods were widely employed for stock price prediction. These methods include Linear Regression (LR) [22], Support Vector Machines (SVM) [23], and Decision Trees [24], among others. Linear Regression, a fundamental prediction model, extrapolates based on the linear relationship between inputs and outputs. Its basic form is as follows:

$$y = aX + b$$

where $X$ denotes the input variables, $y$ the output variables, and $a$ and $b$ the model parameters to be learned. However, as stock prices are influenced by a multitude of factors, the underlying relationships are often non-linear. Consequently, the linear regression model struggles to capture this complexity.
The Support Vector Machine, a common method for both classification and regression, operates by finding an optimal hyperplane in a feature space, thereby achieving the goal of prediction. For regression problems, the form of SVM is as follows:

$$f(X) = \langle w, \phi(X) \rangle + b$$

where $\phi(X)$ represents the feature mapping of the input variables $X$, $w$ and $b$ are the model parameters to be learned, and $\langle w, \phi(X) \rangle$ denotes the inner product of $w$ and $\phi(X)$. Although SVM can handle non-linear problems, its high computational complexity when applied to high-dimensional and large-scale datasets proves to be a substantial obstacle.
Traditional machine learning methods like SVM, Random Forests, and Decision Trees often encounter several limitations in the context of stock price prediction. SVMs [25], while effective for linearly separable problems, struggle with handling high dimensionality and require substantial tuning, including the choice of an appropriate kernel function for nonlinear financial time-series data. Random Forests [26], although they offer an improvement over Decision Trees by ensemble learning, still suffer from high computational complexity and can underperform when dealing with highly noisy and volatile markets. Decision Trees, on the other hand, are simple to implement and interpret but are prone to overfitting, especially when grappling with the complex, noisy, and erratic nature of stock markets. These methods often require manual feature engineering and generally fail to capture the intricate, non-linear patterns and long-term dependencies that are inherent to financial time-series data.

Deep Learning Methods
In recent years, deep learning methods, particularly RNN [27] and LSTM [28], have found extensive application in the field of stock price prediction. The RNN, a neural network with a memory function, is capable of capturing temporal relationships within sequence data. Its basic formula is as follows:

$$h_t = \sigma(W_{hh} h_{t-1} + W_{xh} x_t + b_h), \qquad y_t = W_{hy} h_t + b_y$$

where $x_t$ is the input, $h_t$ the hidden state, $y_t$ the output, $\sigma$ the activation function, $W_{hh}$, $W_{xh}$, and $W_{hy}$ the weight parameters, and $b_h$ and $b_y$ the bias parameters. Although RNNs can handle sequence data, they suffer from vanishing and exploding gradients in long sequences, making it challenging to capture long-term dependencies. LSTM, an improved RNN, introduces a gating mechanism to resolve the issue of long-term dependencies. The basic formula of LSTM, written here in its standard form, is as follows:

$$\begin{aligned}
f_t &= \sigma(W_f \cdot [h_{t-1}, x_t] + b_f) \\
i_t &= \sigma(W_i \cdot [h_{t-1}, x_t] + b_i) \\
\tilde{C}_t &= \tanh(W_C \cdot [h_{t-1}, x_t] + b_C) \\
C_t &= f_t * C_{t-1} + i_t * \tilde{C}_t \\
o_t &= \sigma(W_o \cdot [h_{t-1}, x_t] + b_o) \\
h_t &= o_t * \tanh(C_t)
\end{aligned}$$

where $f_t$, $i_t$, and $o_t$ are the forget gate, input gate, and output gate, respectively, $C_t$ is the cell state, $\tilde{C}_t$ is the candidate cell state, the $W$ and $b$ terms are the weights and biases of the corresponding gates, $\sigma$ is the sigmoid function, $\tanh$ is the hyperbolic tangent function, $*$ represents elementwise multiplication, and $[h_{t-1}, x_t]$ denotes the concatenation of $h_{t-1}$ and $x_t$. While LSTM exhibits commendable performance in certain tasks, it also encounters several issues, such as numerous parameters, high computational complexity, and difficulty in dealing with discontinuous and irregular time-series data.
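To make the RNN recurrence concrete, the following is a minimal pure-Python sketch of the update $h_t = \sigma(W_{hh} h_{t-1} + W_{xh} x_t + b_h)$ followed by the linear readout $y_t = W_{hy} h_t + b_y$. It is illustrative only: the function names `rnn_step` and `rnn_forward` are hypothetical, and tanh is assumed as the activation $\sigma$, which the text does not specify.

```python
import math


def rnn_step(x_t, h_prev, W_xh, W_hh, b_h):
    """One recurrent update: h_t = tanh(W_hh h_{t-1} + W_xh x_t + b_h).

    tanh is an assumed choice for the activation function.
    """
    h_t = []
    for i in range(len(b_h)):
        s = b_h[i]
        s += sum(W_hh[i][j] * h_prev[j] for j in range(len(h_prev)))
        s += sum(W_xh[i][k] * x_t[k] for k in range(len(x_t)))
        h_t.append(math.tanh(s))
    return h_t


def rnn_forward(xs, W_xh, W_hh, W_hy, b_h, b_y):
    """Run the recurrence over a sequence and emit y_t = W_hy h_t + b_y."""
    h = [0.0] * len(b_h)  # initial hidden state (assumed to be zero)
    ys = []
    for x_t in xs:
        h = rnn_step(x_t, h, W_xh, W_hh, b_h)
        y_t = [
            b_y[o] + sum(W_hy[o][i] * h[i] for i in range(len(h)))
            for o in range(len(b_y))
        ]
        ys.append(y_t)
    return ys, h
```

Because the same hidden state is multiplied by $W_{hh}$ at every step, gradients through long sequences shrink or grow geometrically, which is the vanishing and exploding gradient problem noted above.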
While RNNs and LSTMs have been popular for time-series forecasting, including stock price prediction, they also come with their own sets of challenges. RNNs [29,30], for example, are prone to issues like vanishing and exploding gradients when handling long sequences, making them less effective for capturing long-term dependencies in stock price data. LSTMs, designed to mitigate some of these issues, are computationally expensive and still might require substantial parameter tuning for optimal performance [31,32]. Additionally, both RNNs and LSTMs can be sensitive to hyperparameter settings, making them less robust when applied to the highly volatile and noisy nature of stock markets.
In summary, both traditional machine learning methods and deep learning techniques come with their respective advantages and drawbacks. In this work, a new time-series neural network is proposed, integrating Kernel Filter and Time Attention mechanisms, aiming to resolve the issues present in the aforementioned methods when applied to stock price prediction.

Dataset Collection
In this research, a decade of stock data from the S&P 500 index was selected as the experimental dataset. The 10-year span was chosen to offer a comprehensive yet computationally feasible dataset for modeling. This timeframe incorporates various market conditions, including both bull and bear phases, high- and low-volatility periods, and multiple economic cycles. While a decade may not capture the full complexity and cyclical nature of stock markets, it provides a rich set of data that allows for robust modeling and prediction. Additionally, using a decade-long data sample enables the evaluation of the model's performance across a variety of scenarios, thereby enhancing the generalizability of the study's findings. It should be noted that although the 10-year dataset is comprehensive in some respects, the scope of this research could be further expanded in future studies by incorporating a larger and more diverse set of data points.
The S&P 500 index, a stock market index released by the Standard & Poor's Financial Services company, encompasses the largest 500 listed companies in the U.S. market. The choice of the S&P 500 was based on two reasons. First, the S&P 500 index covers a broad range of U.S. stock market sectors, with constituent stocks originating from various industries such as technology, healthcare, finance, consumer, and industry. This extensive coverage implies that the collected dataset is representative and can reflect the overall status of the U.S. stock market. Second, the S&P 500 has a large volume of historical data, spanning a long timeframe. Nearly a decade of stock data was collected, providing ample training samples beneficial for the machine learning model's learning and generalization.
To obtain these data, a web scraper was developed using Python and the BeautifulSoup library. Python, with its concise syntax, extensive library functions, and wide community support, is extensively used in the field of data science. BeautifulSoup is a Python library that facilitates parsing the HTML of webpages and extracting the required information. The Yahoo Finance website was identified as the data source. Yahoo Finance offers abundant historical stock data and permits users to download data in CSV format. The scraper automatically downloaded historical data of the S&P 500 constituent stocks. The Robots Exclusion Protocol (robots.txt) was respected, and a reasonable access interval was set to avoid imposing unnecessary pressure on the server. The collected data include the opening price, highest price, lowest price, closing price, and trading volume of each trading day. The data span from 2013 to 2023, a total of ten years. This approach to data collection not only provided abundant, high-quality stock data but also ensured the data's timeliness and completeness.

Outlier Identification and Treatment
The first step was outlier identification and treatment. Outliers could potentially have a detrimental effect on model learning, leading to inaccurate prediction results. Hence, identifying and handling these outliers is crucial for ensuring model performance. In this work, the 3σ rule was applied as a general guideline for outlier identification, under the assumption of a normal distribution. Given the non-stationary nature of the data, as well as the presence of fat tails and skewness, this method serves as a heuristic rather than an absolute criterion for outlier detection. Under the assumption of a normal distribution, any value that diverges from the mean by more than three times the standard deviation is considered an outlier. The specific formula is as follows:

$$\mu = \frac{1}{N} \sum_{i=1}^{N} x_i, \qquad \sigma = \sqrt{\frac{1}{N} \sum_{i=1}^{N} (x_i - \mu)^2}, \qquad |x_i - \mu| > 3\sigma$$

Here, $N$ represents the number of samples, $x_i$ represents the value of a single sample, and $\mu$ and $\sigma$ are the sample mean and standard deviation, respectively. After identifying potential outliers, the median of the corresponding feature was used for replacement. The median is robust to outlier perturbation and thus serves as a reliable measure for this purpose. In our dataset, the outliers identified and replaced amounted to 1.38% of the total number of data points.
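The 3σ screening and median replacement described above can be sketched in a few lines of standard-library Python. This is a minimal illustration rather than the authors' implementation: the function name `replace_outliers_3sigma` is hypothetical, and the population standard deviation (dividing by $N$) is an assumption, since the text does not say which estimator was used.

```python
import statistics


def replace_outliers_3sigma(values):
    """Replace values more than 3 standard deviations from the mean
    with the median of the feature.

    Assumption: the population standard deviation (divide by N) is used.
    """
    mu = statistics.fmean(values)
    sigma = statistics.pstdev(values)
    median = statistics.median(values)
    return [median if abs(v - mu) > 3 * sigma else v for v in values]


# Example: one extreme value in an otherwise flat feature column.
column = [10.0] * 20 + [100.0]
cleaned = replace_outliers_3sigma(column)
```

Note that the mean and standard deviation are themselves pulled toward extreme values, which is one reason the rule is treated as a heuristic on fat-tailed price data.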

Missing Value Treatment
The collected raw data may contain missing values. Ignoring or simply deleting these missing values could result in information loss, subsequently affecting the model's learning performance. Therefore, these missing values needed treatment. In this study, interpolation was employed to fill in missing values. Specifically, linear interpolation was used, assuming that the series varies linearly between the observations on either side of the missing point. The formula is as follows:

$$x_{\text{miss}} = x_{\text{before}} + \frac{x_{\text{after}} - x_{\text{before}}}{2}$$

Here, $x_{\text{miss}}$ represents the missing value, while $x_{\text{before}}$ and $x_{\text{after}}$ are the observed values before and after the missing value, respectively.
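As a small illustration of this interpolation rule, the sketch below fills an isolated missing value with the midpoint of its two neighbors. The function name `fill_missing_midpoint` and the use of `None` to mark a missing observation are assumptions made for the example; runs of consecutive missing values, and gaps at either end of the series, are deliberately left untouched because the formula does not cover them.

```python
def fill_missing_midpoint(series):
    """Fill isolated gaps using x_before + (x_after - x_before) / 2.

    Missing values are represented by None (an assumption of this sketch).
    Gaps without an observed neighbor on both sides are left as None.
    """
    filled = list(series)
    for i, value in enumerate(series):
        if value is None and 0 < i < len(series) - 1:
            before, after = series[i - 1], series[i + 1]
            if before is not None and after is not None:
                filled[i] = before + (after - before) / 2
    return filled


closes = [101.0, None, 105.0, 104.0]
filled_closes = fill_missing_midpoint(closes)
```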

Normalization
In our dataset, a total of 218 missing values were observed across the "Open", "High", "Low", "Close", and "Vol" fields; these were filled by interpolation before normalization was applied. Given that the scales and value ranges of different features may vary, inputting them directly into the model might impair the model's learning performance. Through normalization, the value range of all features can be adjusted to a unified interval, avoiding the model's over-dependence on features with large values. Min-max normalization, also known as linear normalization, was adopted. The formula is as follows:

$$x_{\text{norm}} = \frac{x - x_{\min}}{x_{\max} - x_{\min}}$$

Here, $x_{\text{norm}}$ is the normalized value, $x$ is the original value, and $x_{\min}$ and $x_{\max}$ are the minimum and maximum values of the sample, respectively. By systematically addressing outliers and missing values, and by normalizing the feature scales, the data became more suitable for model learning. This contributes to improving the learning performance of the model, thereby enhancing the accuracy of stock price prediction.
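Min-max normalization maps each feature onto the interval [0, 1]. A minimal sketch follows; the function name `min_max_normalize` is hypothetical, and returning zeros for a constant feature is a choice made here to avoid division by zero, not something the text prescribes.

```python
def min_max_normalize(values):
    """Scale values to [0, 1] via (x - x_min) / (x_max - x_min)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Constant feature: the formula is undefined, so return zeros
        # (an assumption of this sketch).
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


volumes = [10.0, 20.0, 30.0]
scaled_volumes = min_max_normalize(volumes)
```

In a forecasting setting it is common to compute the minimum and maximum on the training portion only and reuse them for the validation portion, so that no information from later data leaks into training; whether this was done here is not stated.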

Overall
A novel time-series neural network model, termed the Time-series Neural Network (TNN), is proposed in this research. It is designed to handle multivariate time-series data such as stock market prices and trading volumes. Furthermore, the model possesses the ability to manage textual data, such as news and reports, and even video data, like tasks of behavior tracking. The essence of this model lies in its capability to capture complex patterns in time-series data and efficiently integrate diverse types of data, including linear, textual, and video data. This is made possible by the model's flexibility and scalability, allowing for easy fine-tuning across different tasks. The core of the TNN model is composed of two primary components: the Kernel Filter and Time Attention.
The function of the Kernel Filter is to extract useful features from the time-series data. Comparable to the convolutional layer in Convolutional Neural Networks (CNNs), the Kernel Filter convolves the input data along the time dimension, extracting local patterns and trends. However, unlike traditional convolutional layers, the Kernel Filter can process multivariate time-series data, with each element having its own convolution kernel to extract features individually. The introduction of the Kernel Filter enables the model to capture complex patterns in the data, such as the fluctuation rules of stock prices and the changing trends of trading volumes. These patterns and trends are crucial for predicting future stock price movements. Moreover, as the Kernel Filter can automatically learn these features from data, the burden of manual feature engineering is significantly reduced, easing the model development workload.
The role of Time Attention is to determine the importance of different time points. For time-series data, different time points have varying influences on future predictions: some may have a large impact, while others have a smaller one. Therefore, a mechanism is required to gauge the importance of each time point, and this is the Time Attention mechanism. Specifically, Time Attention assigns a weight to each time point to indicate its importance. This weight is learned by the model from the data, with larger weights denoting higher importance. During prediction, the model pays more attention to time points with larger weights and less to those with smaller ones. With Time Attention, the model can effectively distinguish which time points are more important for the prediction results and which are relatively less important. This enables the model to better capture key information when handling complex time-series data, thereby improving prediction accuracy.
In practice, the Kernel Filter and Time Attention work together. Initially, the input time-series data are sent into the Kernel Filter to extract the features. These features are then sent into Time Attention, where they are weighted according to the importance of each time point, resulting in the final prediction result. Through the cooperative work of Kernel Filter and Time Attention, the model can extract key information from complex time-series data and accurately predict future stock price trends. In experiments, remarkable results were achieved on multiple stock datasets, demonstrating the model's effectiveness in stock price prediction tasks. Overall, the proposed TNN model, by integrating Kernel Filter and Time Attention, can effectively handle various types of time-series data, including stock prices, trading volumes, news reports, and even video data. This equips the model with a wide range of application prospects in handling multivariate time-series prediction tasks, such as financial market prediction, weather forecasting, and pedestrian flow prediction.

Time-Series Neural Network
The Time-series Neural Network (TNN) proposed in this study is a deep learning model designed for time-series prediction tasks. The design of this model fully considers the characteristics of time-series data, including time order, continuity, and periodicity, as shown in Figure 1. The network structure of TNN primarily comprises an input layer, hidden layers, and an output layer. The input layer receives raw time-series data, the hidden layers process these data using the Kernel Filter and Time Attention mechanism, and the output layer produces prediction results. In the hidden layers of TNN, a multi-layer design is employed. This is because deep neural networks, through their multi-layer structure, can learn high-level features and abstract patterns from input data, which is beneficial for enhancing the model's predictive performance. Specifically, the Kernel Filter is first used in the hidden layers to extract local patterns from the time-series data. Then, the Time Attention mechanism assigns weights to these patterns. Finally, the weighted features are passed to the next layer for further processing.
TNN differs significantly from regular deep neural networks (DNNs) when handling time-series data. Firstly, TNN introduces the Kernel Filter and Time Attention operators, which can better handle the characteristics of time-series data, while regular DNNs often overlook these characteristics. Secondly, the number of layers and the network structure of TNN are optimized for time-series prediction tasks, while regular DNNs usually adopt a general network structure, which may not effectively handle time-series prediction tasks. Lastly, TNN can adaptively assign weights for each feature, while regular DNNs generally assume that all features are equally important. Furthermore, since the network structure of TNN is optimized for time-series prediction tasks, TNN can make more effective use of computational resources when handling large-scale time-series data, thereby enhancing prediction efficiency.

Kernel Filter
The Kernel Filter is an operator designed for the extraction of features from time-series data, with its primary objective being the isolation of key local patterns from the input time-series data. This section details the mathematical principles, design significance, conceptual source, and the specific application and advantages of the Kernel Filter within the Time-series Neural Network (TNN) model. The design of the Kernel Filter originates from the convolution operation within Convolutional Neural Networks (CNNs). In CNNs, the convolution kernel slides along the spatial dimensions to extract local features from the input image. Drawing inspiration from this idea, the Kernel Filter was designed to slide along the temporal dimension, thereby extracting local patterns from time-series data.
The design carries several significant implications. Firstly, by applying convolutions along the temporal dimension, local patterns within time-series data can be captured, such as short-term fluctuations in stock prices. Secondly, different kernels can extract diverse features, enabling the model to understand time-series data from multiple perspectives. Finally, the convolution operation shares its parameters across positions, implying that the same patterns can be sought throughout the entire time series, a property that traditional fully connected neural networks lack.
Within the TNN model, the Kernel Filter is responsible for the preliminary extraction of features from the input time-series data. Specifically, the model receives a segment of time-series data as input, which is initially processed by the Kernel Filter, resulting in a set of feature maps $\{h_1, h_2, \ldots, h_t\}$. These feature maps not only encapsulate local patterns of the time-series data but also preserve the temporal order of the data, providing abundant information for the subsequent Time Attention. Its advantages in these tasks are clear. Firstly, since it can extract local patterns from time-series data, it enables the model to capture short-term fluctuations in stock prices, which is not achievable by traditional fully connected neural networks. Secondly, due to the parameter sharing of the Kernel Filter, the model can seek the same patterns throughout the entire time series, which is of significant importance for understanding and predicting stock prices.

For a time series $\{x_1, x_2, \ldots, x_t\}$, a group of Kernel Filters $\{k_1, k_2, \ldots, k_n\}$ is defined; each kernel $k_i$ is a convolution kernel that performs convolution on the input data along the temporal dimension, extracting local patterns from it. Each kernel $k_i$ consists of a group of weights $\{w^i_1, w^i_2, \ldots, w^i_d\}$ and a bias term $b^i$, where $d$ is the size of the kernel. The operation of the Kernel Filter can be represented by the following formula:

$$h^i_t = f\left(\sum_{j=1}^{d} w^i_j \, x_{t+j-1} + b^i\right)$$

In this formula, $h^i_t$ represents the output of the $i$th kernel at time $t$, and $f$ is the activation function. When using the Kernel Filter, the size $d$ and quantity $n$ of the kernels need to be determined first. The size $d$ determines the time range of the patterns that can be captured, and the quantity $n$ determines the diversity of the features that can be extracted. Then, at each time point $t$, each kernel convolves the input data to obtain the corresponding feature map $h^i_t$. Finally, all feature maps are integrated to obtain the final output of the Kernel Filter. Overall, the Kernel Filter extracts local patterns in time-series data through convolution operations, providing rich information for subsequent time-series prediction. Its design originates from the convolution operation of CNNs, inherits the advantages of convolution in feature extraction, and overcomes the deficiencies of fully connected neural networks in handling time-series data.
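The sliding-window computation performed by the Kernel Filter can be illustrated with a short pure-Python sketch. It is a simplified stand-in, not the authors' implementation: it assumes a univariate series, kernels of equal size $d$, no padding (so each feature map has $T - d + 1$ entries), and ReLU as the activation $f$; the names `kernel_filter` and `relu` are hypothetical.

```python
def relu(value):
    """Rectified linear unit, assumed here as the activation f."""
    return max(0.0, value)


def kernel_filter(xs, kernels, biases, activation=relu):
    """Apply n temporal kernels of size d to a univariate series.

    Returns one feature map per kernel, where entry t is
    activation(sum_j w[j] * xs[t + j] + b), with 0-based indices.
    """
    d = len(kernels[0])
    feature_maps = []
    for weights, bias in zip(kernels, biases):
        feature_map = [
            activation(sum(weights[j] * xs[t + j] for j in range(d)) + bias)
            for t in range(len(xs) - d + 1)
        ]
        feature_maps.append(feature_map)
    return feature_maps


prices = [1.0, 2.0, 3.0, 4.0]
kernels = [[1.0, -1.0],   # responds to one-step decreases
           [0.5, 0.5]]    # two-step moving average
maps = kernel_filter(prices, kernels, biases=[0.0, 0.0])
```

The same weights are reused at every position $t$, which is the parameter sharing discussed above. In a PyTorch model this role would typically be played by a one-dimensional convolution layer with learned weights.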

Time Attention
Time Attention is an operator designed for the allocation of feature weights in time-series data, as shown in Figure 2. Its main goal is to assign different weights to features at different time points, enabling the model to pay more attention to the features that have a significant impact on the prediction results. The mathematical principles, design significance, conceptual source, and specific application and advantages of Time Attention within the TNN model are detailed in this section. The design of Time Attention originates from the idea of the Attention Mechanism. The primary concept of the Attention Mechanism is that, when dealing with a task, the model should pay more attention to the information that has a larger impact on the results and less attention to information that has a smaller impact. Borrowing from this idea, Time Attention was designed to enable the model to pay more attention to the time points that have a larger impact on the prediction results when processing time-series data. The design carries several important implications. Firstly, by assigning different weights to the features at each time point, the model can pay more attention to the features that have a larger impact on the prediction results, which is beneficial for improving the model's prediction accuracy. Secondly, by introducing a weight generator, the model can automatically adjust the weights based on the input features, giving the model better adaptability. Lastly, the introduction of Time Attention allows the model to better consider the sequence and continuity of time when processing time-series data, which is not achievable with traditional fully connected neural networks.
In the TNN model, the main task of Time Attention is to assign different weights to the input feature sequence $\{h_1, h_2, \ldots, h_t\}$. To achieve this goal, a weight generator is designed which generates a weight $\alpha_t$ for the feature $h_t$ at each time point $t$. This weight generator is a small neural network, with the feature $h_t$ as input and the weight $\alpha_t$ as output. The specific operation of the weight generator can be represented by the following formula:

$$\alpha_t = \sigma(g(h_t))$$

In this formula, $g$ is the weight generator, and $\sigma$ is a normalization function, which normalizes the generated weights so that the sum of all weights is 1. After obtaining the weight $\alpha_t$, the feature $h_t$ can be weighted to obtain the weighted feature $\tilde{h}_t$. The specific operation is as follows:

$$\tilde{h}_t = \alpha_t \cdot h_t \tag{11}$$

Through this approach, distinct weights can be allocated to the features at each time point, enabling the model to focus on the features having a significant impact on the prediction outcomes. When employing Time Attention, the structure and parameters of the weight generator are first determined. Subsequently, the weight generator is utilized to generate a weight for each time point feature. This weight is then used to weight the feature, resulting in a weighted feature. Finally, all the weighted features are integrated to attain the final output of Time Attention.
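The weighting procedure can be sketched as follows. Two assumptions are made for illustration: the normalization function $\sigma$ is taken to be a softmax over time points (the text only requires that the weights sum to 1), and the weight generator $g$, a small neural network in the paper, is replaced by an arbitrary scoring function passed in as `score_fn`. All function names are hypothetical.

```python
import math


def softmax(scores):
    """Normalize scores into positive weights that sum to 1."""
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def time_attention(features, score_fn):
    """Weight each time point's feature vector by its attention weight.

    features: list of feature vectors h_t, one per time point.
    score_fn: stand-in for the weight generator g; maps h_t to a scalar.
    Returns (alphas, weighted), where weighted[t] = alphas[t] * h_t.
    """
    scores = [score_fn(h) for h in features]
    alphas = softmax(scores)
    weighted = [[a * v for v in h] for a, h in zip(alphas, features)]
    return alphas, weighted


def sum_score(h):
    """Toy scorer used only for this example: the sum of the features."""
    return sum(h)


alphas, weighted = time_attention([[0.0], [2.0]], sum_score)
```

Here the second time point receives the larger weight because its score is higher; in the actual model the scores would come from a trained network rather than a fixed rule.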
Time Attention exhibits notable advantages in the tasks at hand. Firstly, by assigning different weights to the features at each time point, the model is better equipped to focus on the features that have a larger impact on the prediction results, contributing to enhanced prediction accuracy. Secondly, by introducing a weight generator, the model can automatically adjust the weights based on input features, thus granting the model improved adaptability. Finally, because Time Attention better accounts for the order and continuity of time, the model possesses clear advantages over traditional fully connected neural networks when dealing with time-series data. Compared to other attention mechanisms, Time Attention embodies important distinctions. To begin with, Time Attention is specifically designed for time-series data, providing a better consideration of the order and continuity of time, which is often overlooked by other attention mechanisms. Additionally, the weight generator in Time Attention is a small neural network capable of automatically adjusting weights based on the input features, granting Time Attention superior adaptability. In essence, by assigning distinct weights to the features at each time point, Time Attention enables the model to focus more effectively on the features impacting the prediction outcomes, as will be discussed in Section 4. The design originates from the concept of the Attention Mechanism, inheriting its advantages in weight distribution while overcoming the shortcomings of traditional attention mechanisms when handling time-series data. Within the TNN model, Time Attention enhances the model's prediction accuracy and provides superior adaptability.

Experiment Designs
The initial problem addressed in this experimental design is the delineation of training and validation sets. Ordinarily, the entire dataset is divided into training and validation sets according to a specific ratio. The training set is primarily utilized for model training, where parameters are continuously adjusted and optimized through iterative learning of data in the training set. On the other hand, the validation set is mainly employed for evaluating the predictive performance of the model. By utilizing the validation set, the performance of the model on unseen data can be tested, thereby helping to detect overfitting to the training data. In this work, the split ratio adopted is 70% for the training set and 30% for the validation set.
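A 70/30 split can be sketched as below. The text does not state how samples were assigned to each set; this sketch assumes a chronological split (earliest 70% for training, latest 30% for validation), which is a common choice for time series because it keeps future observations out of the training set. The function name `chronological_split` and its `train_percent` parameter are hypothetical.

```python
def chronological_split(samples, train_percent=70):
    """Split time-ordered samples into (train, validation) without shuffling.

    Integer arithmetic is used for the cut point to avoid
    floating-point rounding surprises.
    """
    cut = len(samples) * train_percent // 100
    return samples[:cut], samples[cut:]


days = list(range(10))  # stand-in for ten time-ordered samples
train_days, val_days = chronological_split(days)
```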
To comprehensively evaluate the model's performance, some baseline models are chosen for comparison. In this experiment, the following models are selected as baseline models: linear regression model [22], decision tree model [24], random forest model [33], support vector machine model [23], RNN [27], and LSTM [28]. These models are common in machine learning, have high practicality and wide applicability, and their predictive capabilities and model complexity can cover a wide range, making them good references to better evaluate the performance of the model.
Subsequently, validation experiments under different time slice spans are carried out. A time slice span is the time unit chosen when predicting future data; for example, days, weeks, or months. The time slice span affects both the prediction accuracy and the prediction range of the model, so choosing an appropriate span is crucial. In this experiment, the time slice spans are 1 day, 7 days, and 30 days. For each time slice span, all the above-mentioned models are used for prediction, and their prediction results are recorded.
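One way to build a dataset per time slice span is sketched below. It assumes a span is interpreted as a forecast horizon measured in steps of a daily series and that each sample uses a fixed lookback window; the lookback length of 10 and the function name `make_windows` are hypothetical choices for illustration.

```python
def make_windows(series, lookback, horizon):
    """Pair each lookback window with the value `horizon` steps after its end."""
    samples = []
    # Last start index for which the target still lies inside the series.
    last_start = len(series) - lookback - horizon
    for start in range(last_start + 1):
        window = series[start:start + lookback]
        target = series[start + lookback + horizon - 1]
        samples.append((window, target))
    return samples

# Toy daily series; one dataset per time slice span (1, 7, and 30 days).
series = list(range(50))
datasets = {h: make_windows(series, lookback=10, horizon=h) for h in (1, 7, 30)}
for h, samples in datasets.items():
    print(h, len(samples))
```

Longer horizons leave fewer usable samples from the same series, which is one practical reason results across spans are not directly comparable in sample size.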
The experiments were conducted on a computing platform equipped with an Intel Core i9 processor, 64 GB of RAM, and an NVIDIA RTX 3090 GPU. The software environment consisted of Python 3.8, PyTorch 1.9, and various other supporting Python libraries. For the TNN models, we employed three major filters, built on the libraries mentioned above: Squeeze-and-Excitation Networks (SENets), Convolutional Block Attention Module (CBAM), and Kalman Filters. Note that no "official" third-party libraries are available for these filters. Therefore, we implemented these algorithms from scratch in PyTorch by carefully following their respective original papers [34][35][36].

Evaluation Metric
Model evaluation metrics are important tools for measuring model prediction capabilities. Commonly used metrics include the Root Mean Square Error (RMSE), Mean Absolute Error (MAE), Mean Absolute Percentage Error (MAPE), and Coefficient of Determination ($R^2$). The calculation methods, the significance in this research task, and the model capabilities reflected by these metrics are explained in detail below.
Firstly, the Root Mean Square Error (RMSE) is a common model evaluation metric used to measure the deviation between the model's predictions and the actual values. Its formula is as follows:

$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2}$$

Here, $n$ is the total number of samples, $y_i$ is the actual value of the $i$-th sample, and $\hat{y}_i$ is the model's prediction for the $i$-th sample. In this work, RMSE evaluates the accuracy of the model's predictions on time-series data. The smaller the RMSE value, the smaller the deviation between the model's predictions and the actual results, indicating a higher prediction accuracy. Because the errors are squared before averaging, RMSE gives more weight to larger errors, so a few large deviations increase the RMSE value noticeably.
Secondly, the Mean Absolute Error (MAE) is another evaluation metric for model prediction capabilities. Unlike RMSE, MAE measures the average magnitude of the errors without squaring them. Its formula is as follows:

$$\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n}\left|y_i - \hat{y}_i\right|$$

In this formula, $n$ is the total number of samples, $y_i$ is the actual value of the $i$-th sample, and $\hat{y}_i$ is the model's prediction for the $i$-th sample. In this research task, MAE reflects more intuitively the average deviation between the model's predictions and the actual results. The smaller the MAE value, the smaller the deviation between the model's predictions and the actual results, indicating a higher prediction accuracy. Compared to RMSE, MAE gives equal weight to all errors, so a few large deviations inflate MAE less than they inflate RMSE.
The Mean Absolute Percentage Error (MAPE) expresses the errors between predicted and actual values as a percentage, offering an easy-to-interpret scale of accuracy. The formula for calculating MAPE is as follows:

$$\mathrm{MAPE} = \frac{100\%}{n}\sum_{i=1}^{n}\left|\frac{y_i - \hat{y}_i}{y_i}\right|$$

In this formula, $n$ is the total number of samples, $y_i$ is the actual value of the $i$-th sample, and $\hat{y}_i$ is the model's prediction for the $i$-th sample. The smaller the MAPE value, the higher the model's prediction accuracy. One advantage of MAPE is that it provides an easily interpretable percentage error, so the performance of the model can be understood in a straightforward manner. It is defined only when all actual values $y_i$ are non-zero.
Finally, the Coefficient of Determination ($R^2$) measures the proportion of the variance in the actual values that is explained by the model's predictions. The closer $R^2$ is to 1, the better the predictions agree with the actual results. Its formula is as follows:

$$R^2 = 1 - \frac{\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2}{\sum_{i=1}^{n}\left(y_i - \bar{y}\right)^2}$$

In this formula, $n$ is the total number of samples, $y_i$ is the actual value of the $i$-th sample, $\hat{y}_i$ is the model's prediction for the $i$-th sample, and $\bar{y}$ is the mean of the actual values. In this research task, $R^2$ is used to evaluate how well the model's predictions track the actual results. The higher the $R^2$ value, the closer the agreement between predictions and actual results, indicating a higher prediction accuracy.
In summary, these evaluation metrics assess the predictive capabilities of the model from different angles. RMSE emphasizes large errors because the residuals are squared, MAE reports the average absolute deviation between the model's predictions and the actual results, MAPE expresses the prediction errors as a percentage that is easy to interpret, and $R^2$ evaluates how much of the variation in the actual results the predictions explain. Together, these metrics help in understanding and improving the predictive performance of the model.
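The four metrics can be computed directly from their standard definitions, as in the following minimal sketch. The function names and the toy values of `y` and `y_hat` are illustrative assumptions and are not taken from the experiments.

```python
import math

def rmse(y, y_hat):
    """Root mean square error."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(y, y_hat)) / len(y))

def mae(y, y_hat):
    """Mean absolute error."""
    return sum(abs(a - p) for a, p in zip(y, y_hat)) / len(y)

def mape(y, y_hat):
    """Mean absolute percentage error, in percent; actual values must be non-zero."""
    return 100.0 * sum(abs((a - p) / a) for a, p in zip(y, y_hat)) / len(y)

def r2(y, y_hat):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y) / len(y)
    ss_res = sum((a - p) ** 2 for a, p in zip(y, y_hat))
    ss_tot = sum((a - mean_y) ** 2 for a in y)
    return 1.0 - ss_res / ss_tot

# Toy actual values and predictions (illustrative only).
y = [2.0, 4.0, 6.0, 8.0]
y_hat = [2.5, 3.5, 6.0, 9.0]
print(rmse(y, y_hat), mae(y, y_hat), mape(y, y_hat), r2(y, y_hat))
```

On this toy data the single error of 1.0 pulls RMSE above MAE, which illustrates the heavier weight RMSE places on large deviations.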

Results of Time Series Forecasting
The primary objective of this experiment is to conduct stock price prediction using various machine learning models and compare their predictive performance. To this end, different time spans (1 day, 7 days, 30 days) were set, and seven distinct models were used: TNN, linear regression, decision tree, random forest, SVM, RNN, and LSTM. The predictive performance of each model across all time spans was assessed by computing the RMSE, MAE, and coefficient of determination ($R^2$).
From the results in Table 1, the TNN model performed best across all time spans, with the lowest RMSE and MAE and the highest $R^2$, as also shown in Figure 3. At the sector level, the sectors with the highest prediction accuracy in the S&P 500 are Technology, Healthcare, Financials, and Communication Services, followed by Basic Materials and Consumer Staples. The sectors with the lowest accuracy are Consumer Discretionary, Industrials, and Energy. This disparity could potentially be attributed to the intrinsic volatility and complexity of certain sectors compared to others. Specifically, sectors like Consumer Discretionary, Industrials, and Energy tend to be more volatile and are influenced by a wide array of external factors, making them harder to predict accurately.

The low RMSE and MAE and the high $R^2$ indicate that the TNN model exhibited a high degree of precision in stock price prediction, with minimal deviation from the actual results and close agreement between the predicted and actual outcomes. This can potentially be attributed to the strong feature extraction capability of the TNN model, which can extract useful information from raw data and thereby enhance predictive accuracy. For the other models, performance generally decreased as the time span increased, i.e., RMSE and MAE increased while $R^2$ decreased. This could be due to the growth of future uncertainty as the time span extends. However, the RNN and LSTM models showed a smaller performance reduction with increased time spans than the other baselines. This could be attributed to the fact that both are sequence models capable of handling time-series data, and thus outperform the other baselines in long-term trend prediction.

A deeper analysis of the experimental results, based on the characteristics and mathematical theories of each model, was subsequently undertaken. Firstly, the Linear Regression model exhibited the poorest predictive performance among all models. This is likely because stock price fluctuations are influenced by numerous factors, including macroeconomic conditions, company financials, and market sentiment. The complex and non-linear relationships between these factors are difficult for linear models to capture. The Decision Tree and Random Forest models are both tree-structured models with good interpretability, but their predictive performance left room for improvement. While they can handle non-linear relationships, they struggle with time-series data because they cannot capture time dependency within the data. SVM, a model based on margin maximization, demonstrated better predictive performance than Linear Regression, Decision Tree, and Random Forest, but still lagged behind RNN, LSTM, and TNN. This might be because, although SVM can handle non-linear problems, it may suffer from the "curse of dimensionality" when dealing with high-dimensional, complex time-series data. Lastly, the RNN and LSTM models, both recurrent neural networks, are especially adept at handling time-series data. They can capture time dependency in data and hence perform better in stock price prediction than the other baselines. In particular, LSTM, owing to its gating design, mitigates the vanishing and exploding gradient problems faced by plain RNNs, and thereby demonstrates slightly better predictive performance.

Test on Different Attention Mechanisms
The objective of this experimental design was to evaluate the performance of various attention mechanisms in predicting the S&P 500 index prices.The aim is to understand the effectiveness and applicability of different attention mechanisms specifically in the domain of stock price forecasting.
According to the experimental results in Table 2, the Time Attention model outperformed the other attention mechanisms across all metrics. This adds empirical evidence to the model's theoretical advantages in capturing time-dependent features. The TNN model, which incorporates the Time Attention mechanism, yielded the lowest RMSE and MAE values and the highest $R^2$ score, demonstrating its superior predictive accuracy and stability in stock price prediction. Specifically, SENet and CBAM, despite being effective attention mechanisms, fell short in capturing the temporal complexities inherent in stock market data. This is reflected in their lower $R^2$ values compared to the Time Attention model. While SVM, RNN, and LSTM also performed better than traditional machine learning models like Linear Regression and Decision Trees, they were still outperformed by the TNN model. The data suggest that the ability to capture time-dependent features effectively is what separates TNN from the other models in our experimental setup.

In summary, the comparative evaluation of different models on our dataset provides evidence that attention mechanisms, particularly Time Attention, can significantly enhance the accuracy and reliability of stock price predictions. This suggests that future research in this area could benefit substantially from the integration of time-sensitive attention mechanisms.

Test on Kernel Filter
The principal objective of this experiment is to scrutinize the effectiveness of different filters applied within the TNN, and thus underscore the significance of the Kernel Filter in enhancing the precision of time-series forecasting.To be specific, three distinct model configurations were compared in the experiment: a TNN model devoid of filters, a TNN model equipped with a Kalman Filter [36], and a TNN model furnished with a Kernel Filter.
As illustrated in Table 3, the TNN model with no filter yielded RMSE, MAE, and $R^2$ values of 0.26, 0.51, and 0.82, respectively. These findings indicate that the TNN model can produce satisfactory predictions even without any filter, primarily due to the inherent advantages of the TNN model structure. This model can attribute different weights to features at each timestamp, thereby focusing on features with significant impacts on the predictions. However, in the absence of a filter, the model may encounter challenges when handling complex, noisy time-series data, leading to a potential compromise in predictive performance.

In contrast, the TNN model employing the Kalman Filter achieved RMSE, MAE, and $R^2$ values of 0.12, 0.34, and 0.91, respectively, revealing a marked enhancement in predictive performance. The Kalman Filter, a linear filter, can accurately estimate system states amidst noisy data, consequently boosting the model's predictive accuracy to a certain extent. However, being linear, the Kalman Filter might fall short when faced with complex non-linear time-series data.

Lastly, the TNN model incorporating the Kernel Filter achieved RMSE, MAE, and $R^2$ values of 0.05, 0.23, and 0.95, respectively. The introduction of the Kernel Filter further improved the predictive performance of the TNN model, outperforming the other two model configurations across all metrics. This superior performance primarily results from the capabilities of the Kernel Filter. Compared to the Kalman Filter, the Kernel Filter can handle not only noisy data but also non-linear time-series data effectively. Its proficiency in extracting higher-level features from time-series data surpasses that of the Kalman Filter, as shown in Figure 4.
Therefore, the introduction of the Kernel Filter enables the TNN model to achieve higher prediction accuracy when dealing with complex time-series data.
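To make the role of the linear baseline concrete, the following is a minimal sketch of a scalar Kalman filter. It assumes a random-walk state model observed with additive noise; the function name `kalman_1d` and the variances `process_var` and `obs_var` are hypothetical, and the sketch is not the PyTorch implementation used in the experiments.

```python
def kalman_1d(observations, process_var=1e-3, obs_var=0.1):
    """Scalar Kalman filter for a random-walk state observed with noise."""
    estimate = observations[0]   # initial state estimate
    error_var = 1.0              # initial estimate uncertainty (assumed)
    filtered = [estimate]
    for z in observations[1:]:
        # Predict: the random-walk model keeps the estimate; uncertainty grows.
        error_var += process_var
        # Update: blend prediction and observation using the Kalman gain.
        gain = error_var / (error_var + obs_var)
        estimate += gain * (z - estimate)
        error_var *= (1.0 - gain)
        filtered.append(estimate)
    return filtered

# Toy signal: constant level 1.0 with deterministic alternating noise of 0.3.
observations = [1.0 + (0.3 if i % 2 == 0 else -0.3) for i in range(50)]
filtered = kalman_1d(observations)
print(filtered[-5:])
```

The update is linear in the observation, which is why such a filter suppresses noise well but cannot by itself represent non-linear structure in the series.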

Limitations and Future Work
Despite the progress and achievements made in this research, the model does have some limitations that suggest avenues for future work. For example, while the model performs well on certain datasets, it can struggle with time-series data containing substantial noise. Additionally, its treatment of time slice spans lacks flexibility, which is an issue for varying real-world applications. Moreover, the model's current capability does not fully address the intercorrelation between features or offer a comprehensive set of evaluation metrics. These limitations outline the areas where improvements are needed:

1. Noise Handling: To mitigate the issue of noise, future research will incorporate various noise filtering and denoising techniques like Kalman filters and median filters.
2. Time Slice Flexibility: Future versions of the model will aim to allow dynamic adjustments of time slice spans to meet different application requirements, thereby increasing the model's adaptability.
3. Feature Correlations: Efforts will be undertaken to better uncover and incorporate the intercorrelations among features, with the goal of improving prediction accuracy.
4. Evaluation Criteria: Additional metrics such as stability, robustness, and computational efficiency will be introduced to provide a more rounded evaluation of the model's performance.
5. Extended Capabilities: The TNN model will be further developed to capture not just time dependencies but also more complex relationships dependent on the history of changes in incoming parameters, making it more versatile for different types of time-series prediction tasks.

Through targeted advancements in these areas, the model is expected to become more robust and versatile, better serving the complex and varying demands of practical applications.

Conclusions
This research centers on a high-accuracy time-series forecasting method known as the TNN, which is based on a Kernel Filter and Time Attention mechanism. Forecasting analysis of time-series data is a crucial task in various domains. Nevertheless, high-precision time-series forecasting remains a challenge due to inherent complexities such as non-linearity, high dimensionality, and long-term dependencies. To overcome these challenges, a novel Time Neural Network model has been designed and implemented in this study.
The major innovation of the TNN model is the introduction of a Time Attention mechanism and a Kernel Filter. The Time Attention mechanism allows the model to allocate different weights to the features at each time point, enabling the model to focus more on features that have a significant impact on the forecasting results. Meanwhile, the Kernel Filter is used to extract high-level features from time-series data, thereby improving the prediction accuracy of the model. In addition, an adaptive weight generator is incorporated into the model, allowing it to automatically adjust the weights according to the input features.
In the experimental section, several mainstream time-series forecasting models, including RNN and LSTM, were adopted as baseline models, and exhaustive comparative experiments were conducted. The results demonstrate that the TNN model significantly outperforms the baseline models, regardless of whether the forecasting tasks are short-term or long-term. Importantly, even for complex time-series data containing a large amount of noise, the TNN model is still capable of maintaining high prediction accuracy. Ablation experiments validated the crucial contribution of the Time Attention mechanism and Kernel Filter to the performance of the model. When either the Time Attention mechanism or the Kernel Filter is removed, a significant decline in the predictive performance of the model is evident, further underscoring the importance of these two components.
Despite the excellent performance of the TNN model in the experiments, certain limitations remain. These include the need for enhanced noise processing capabilities, flexibility in dealing with different time slice spans, and comprehensive handling of the interrelatedness among features. Future work will focus on improving and deepening the approach to address these issues.
In conclusion, the TNN model proposed in this study provides a novel solution for time-series forecasting. By incorporating a Time Attention mechanism and Kernel Filter, the model demonstrates superior forecasting performance and adaptability when dealing with complex time-series data. Despite some existing limitations, it is believed that through future improvements and in-depth exploration, the TNN model can play a greater role in the field of time-series forecasting, offering more accurate and reliable predictions for real-world problem solving.

Figure 1. Illustration of the time-series neural network structure.

Figure 2. Illustration of time attention structure.

Figure 3. Ground truth and the predicted values by TNN.

Figure 4. Comparison of different filters on RNN, LSTM, Transformer [37], and ours. The orange line denotes the performance of the different models with the Kernel Filter, while the blue line denotes the performance without the Kernel Filter.

Table 1. Results of S&P 500 price in different periods.

Table 2. Results of different attention mechanisms.

Table 3. Results of different filters.