Oil Commodity Movement Estimation: Analysis with Gaussian Process and Data Science

by Mulue Gebreslasie 1 and Indranil SenGupta 2,3,*

1 Department of Mathematics, North Dakota State University, Fargo, ND 58108, USA
2 Department of Mathematics and Statistics, Hunter College, City University of New York (CUNY), New York City, NY 10065, USA
3 Environmental Sciences Initiative, Advanced Science Research Center–CUNY, New York City, NY 10031, USA
* Author to whom correspondence should be addressed.
Commodities 2025, 4(2), 9; https://doi.org/10.3390/commodities4020009
Submission received: 20 May 2025 / Revised: 6 June 2025 / Accepted: 10 June 2025 / Published: 12 June 2025

Abstract

In this study, Gaussian process (GP) regression is used to normalize observed commodity data and produce predictions at densely interpolated time intervals. The methodology is applied to an empirical oil price dataset. A Gaussian kernel with data-dependent initialization is used to calculate prediction means and confidence intervals. This approach generates synthetic data points from the denoised dataset to improve prediction accuracy. From this augmented larger dataset, a procedure is developed for estimating an upcoming crash-like behavior of the commodity price. Finally, multiple data-science-driven algorithms are used to demonstrate how data densification using GP regression improves the detection of forthcoming large fluctuations in a particular commodity dataset.

1. Introduction

The goal of this paper is to forecast impending large movements in a commodity price dataset. It is evident from the literature that commodity markets, like the crude oil market, may not have received as much attention as stock markets in some areas of financial study, even though the gap is narrowing. Although there is research on the connections between commodity and equity markets, most of it focuses on developed equity market indices, which give commodity markets little consideration. Studies frequently draw attention to volatility spillovers from stock markets to commodity markets, indicating a better grasp of how stock markets affect commodities than the other way around.
In this paper, we offer an approach to denoising the commodity data. This process will generate synthetic data points that will augment the original data points to create a larger dataset. With the augmented dataset, a threshold-based analysis is implemented to predict a significant future fluctuation. This is applied to the futures contract for the West Texas Intermediate (WTI) Light Sweet Crude Oil on the CME Globex futures exchange. This data is commonly denoted as “CL=F”.
The matched field processing (MFP) method for identifying sound-emitting sources in the ocean serves as the driving force behind the analysis in this paper. For locating a source in the water, MFP is frequently utilized (see [1]). The standard beamformer, sometimes referred to as the Bartlett or linear processor, is the most fundamental MFP system. It examines the squared moduli of inner products between the acoustic input and normalized replica fields.
Numerous studies in the literature offer the MFP technique based on Gaussian processes (GPs) (see, for example, [2,3,4,5,6,7,8,9]). In order to forecast acoustic fields, GPs are frequently employed to interpolate direct observations of a function. However, a crucial difficulty is to select the correlation kernel between the observation points. The implementation of the Gaussian kernel, often referred to as the radial basis function kernel (see [5]), demonstrates that it reduces noise effects on damaged data and improves localization performance compared to a conventional Bartlett correlation.
Crude oil is a commodity of fundamental importance, and studying the dynamics of the time series of crude oil prices is therefore of great interest. Such a study makes it possible to determine how its shocks can affect other economies as well as other financial assets. The initial segment of this paper develops a GP-based method to analyze the daily recorded CL=F crude oil price data. The two goals are to denoise empirical commodity data and produce synthetic data points to improve forecast accuracy. In the latter segment of this paper, we use this enriched collection of synthetic data to anticipate a substantial future fluctuation in the CL=F crude oil price data using various data-science-driven methods.
In the pioneering work [10], machine learning methods are implemented to analyze commodity data. The dataset in that work is the West Texas Intermediate (WTI or NYMEX) crude oil prices dataset for the period 1 June 2009 to 30 May 2019. The paper proposes a simple method for upgrading the classical Barndorff-Nielsen and Shephard (BN-S) stochastic volatility model using machine learning algorithms. This modified BN-S model is more efficient and requires fewer parameters than other models used in practice as upgrades to the BN-S model. The approach and model demonstrate how data science can extract a “deterministic component” from a typical commodity dataset. Empirical applications confirm the effectiveness of the suggested model for long-range dependence. Similar analysis is explored in [11].
In more recent literature, stochastic models and data-science-driven techniques have been implemented to enhance the performance of a commodity market analysis. For example, the paper [12] provides a stochastic model and calculates the Value at Risk (VaR) for a diversified portfolio made up of numerous cash commodity positions driven by standard Brownian motion and jump processes. Following that, a detailed analytical estimation of the VaR is performed for the suggested model. The findings are then applied to two distinct commodities, corn and soybean, allowing for a detailed comparison of VaR values in the presence and absence of “jumps”. The paper [13] presents a general model for the dynamics of soybean export market share. Over an 8-year period, weekly time series data are studied to train, validate, and forecast US Gulf soybean market shares to China using machine and neural networks. In [14] an improved BN-S model is used to determine the best hedging strategy for commodities. The BN-S model is refined using a variety of data-science-driven algorithms. The methodology is applied to Bakken crude oil data. In [15] a generic mathematical model is developed for assessing yield data. The data used in that paper are from a typical cornfield in the upper midwestern United States. The statistical moment expressions are developed from the stochastic model. As a result, it is shown how a certain feature variable influences the statistical moments of the yield. The data is also analyzed using neural network techniques. Finally, the works [16,17] further enhance these studies by refining the underlying stochastic model through a sequential hypothesis testing approach driven by machine learning. The results are implemented in oil price analysis.
In this paper, with the aid of GPs, we conduct a data-science-driven analysis of the augmented set of CL=F crude oil price data. This study provides a novel way to generate synthetic data for a commodity market and, based on the augmented dataset, analyzes the dataset to predict a big upcoming fluctuation. The rest of the paper is structured as follows: Section 2 provides a detailed description of Gaussian process (GP) regression methods applied to commodity price data for denoising the empirical data and generating synthetic data points, including hyperparameter initialization and tuning. Section 3 presents a procedure for developing a crash indicator using the standard deviation of the daily percentage change for the GP-denoised CL=F crude oil price data. Section 4 presents the supervised learning outcomes from the dataframe generated in Section 3, providing classifier evaluations along with numerical examples, tables, and figures. These analyses are further illustrated using empirical CL=F crude oil price data. Finally, a brief conclusion is provided in Section 5.

2. Denoising and Synthetic Oil Data

2.1. Gaussian Process (GP) Regression

Given some data points, for a parametric model, we can predict the function value $f(x)$ for a new input $x$, where the observed value is modeled as $y = f(x) + \epsilon$, with $\epsilon$ representing noise. Suppose that we have the training dataset $D$ comprising $n$ observed points, $D = \{(x_i, y_i)\}_{i=1}^{n}$, to train the model. After the training process, all the information in the dataset is assumed to be encapsulated by the feature parameters $\beta$; thus, predictions are independent of the training dataset $D$. This can be expressed as $P(f_* \mid x_*, \beta, D) = P(f_* \mid x_*, \beta)$, in which $f_*$ are predictions made at unobserved data points $x_*$.
Gaussian process regression is a generic and adaptable class of nonlinear regression and classification models. This is used as a primary tool for our work. For details, see [7,18,19]. A Gaussian process is a stochastic process for which any finite collection of random variables has a multivariate normal (Gaussian) distribution. That is, $(f(x_1), f(x_2), \ldots, f(x_n)) \sim \mathcal{N}(m(\mathbf{x}), k(\mathbf{x}, \mathbf{x}'))$ for every finite $n$. We write $f(x) \sim \mathcal{GP}$ to denote that the function $f$ is a Gaussian process.
A Gaussian process is entirely specified by its mean and covariance functions. Those are given by
$$m(x) = \mathbb{E}[f(x)],$$
and
$$k(x, x') = \mathbb{E}\big[(f(x) - m(x))(f(x') - m(x'))\big].$$
Thus, we can write $f(x) \sim \mathcal{GP}(m(x), k(x, x'))$. For simplicity of notation and calculation, we assume $m(x) = 0$, and hence,
$$f(x) \sim \mathcal{GP}(0, k(x, x')).$$
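For concreteness, the following minimal Python sketch (illustrative code, not from the paper) draws sample paths from a zero-mean GP prior; the squared exponential kernel it uses is the one adopted later in this section, and the helper name sq_exp_kernel and all numerical settings are assumptions for illustration.

```python
# Minimal sketch: sample paths from a zero-mean GP prior with a squared
# exponential kernel. Function name and settings are illustrative.
import numpy as np

def sq_exp_kernel(x1, x2, sigma_f=1.0, ell=1.0):
    # k(x, x') = sigma_f^2 * exp(-(x - x')^2 / (2 * ell^2))
    d = x1[:, None] - x2[None, :]
    return sigma_f**2 * np.exp(-0.5 * (d / ell) ** 2)

x = np.linspace(0.0, 10.0, 200)
K = sq_exp_kernel(x, x) + 1e-8 * np.eye(len(x))  # small jitter for stability

rng = np.random.default_rng(0)
samples = rng.multivariate_normal(np.zeros(len(x)), K, size=3)
# Each row of `samples` is one draw of f ~ GP(0, k) evaluated on the grid x.
```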
For our analysis, we consider the closing price of a commodity (such as oil) as a noisy set of data. Suppose that we have a training dataset $D = \{(t_i, y_i)\}_{i=1}^{n}$, where $t_i$ represents the time in days when we evaluate the commodity price $y_i$ at the closing time. The observations $y_i$ are assumed to be noisy, and they are modeled as $y = f(t) + \epsilon$, such that $y_i = f(t_i) + \epsilon_i$, where $\epsilon_i \sim \mathcal{N}(0, \sigma_n^2)$ are independent and identically distributed (i.i.d.) Gaussian noise terms with noise variance $\sigma_n^2$. Since we do not observe $f(t)$ directly, we assume a Gaussian process prior distribution for $f(t)$ with zero mean, $f(\cdot) \sim \mathcal{GP}(0, K)$, where $K$ is a covariance matrix with entries $k(t_i, t_j)$.
The objective in GP regression is to estimate the underlying function f ( t ) by leveraging the assumed correlations among observations, effectively denoising the observed data y . For our analysis, the objective is to compute the posterior density at a denser set of time points.
Once we have the noisy closing price data $\mathbf{y}$, the covariance function of $\mathbf{y}$ is defined entrywise as
$$\operatorname{cov}(y_i, y_j) = k(t_i, t_j) + \sigma_n^2 \delta_{ij},$$
where cov represents the covariance values between different time points, incorporating the effect of noise on the data; $\delta_{ij}$ is the Kronecker delta, which equals 1 when $i = j$ and 0 otherwise; and $k(t_i, t_j)$ is the Gaussian kernel:
$$k(t_i, t_j) = \sigma_f^2 \exp\left( -\frac{(t_i - t_j)^2}{2\ell^2} \right).$$
The hyperparameters $\sigma_f$ (signal variance) and $\ell$ (length scale) determine the vertical and horizontal variations of the functions. Consequently, $\operatorname{cov}(\mathbf{y}) = K + \sigma_n^2 I$, and the probability density function for $\mathbf{y}$ under the prior (where $X = [t_1, \ldots, t_n]^T$) is
$$p(\mathbf{y} \mid X) = \mathcal{N}(0, K + \sigma_n^2 I).$$
In this paper, we consider 10 years of daily recorded CL=F crude oil price data (15 April 2015 to 15 April 2025). We employ GP regression to interpolate at denser time points: from the daily data, we construct data at finer (intraday-like) time steps with a separation of 1/4 day. With this, we obtain an augmented set of 10,057 time points, while maintaining the same start and end dates as the original dataset of 2515 time points.
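As an illustration of this densification for the 1-year case discussed later (252 daily closes mapped to 1005 quarter-day points), the sketch below uses scikit-learn's GP regressor in place of the paper's own implementation; the synthetic random walk is only a stand-in for the actual CL=F closing prices.

```python
# Sketch of quarter-day densification with scikit-learn; the random walk
# below stands in for the actual CL=F closing-price series.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel, ConstantKernel

t = np.arange(252, dtype=float).reshape(-1, 1)          # trading-day index
y = 80.0 + np.cumsum(np.random.default_rng(1).normal(0.0, 1.0, 252))

# Squared exponential (RBF) kernel with signal variance, length scale,
# and observation-noise variance as tunable hyperparameters.
kernel = ConstantKernel(1.0) * RBF(length_scale=1.0) + WhiteKernel(noise_level=1.0)
gp = GaussianProcessRegressor(kernel=kernel, normalize_y=True).fit(t, y)

# Quarter-day grid with the same start/end dates: 4 * 252 - 3 = 1005 points.
t_star = np.arange(0.0, 251.25, 0.25).reshape(-1, 1)
mean, std = gp.predict(t_star, return_std=True)  # denoised mean + uncertainty
```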

2.2. The Mathematical Construction

For the analysis we let
$$X = \begin{bmatrix} t_1 \\ \vdots \\ t_n \end{bmatrix}, \quad \mathbf{y} = \begin{bmatrix} y_1 \\ \vdots \\ y_n \end{bmatrix}, \quad \mathbf{f} = \begin{bmatrix} f_1 \\ \vdots \\ f_n \end{bmatrix}, \quad \text{and} \quad \boldsymbol{\epsilon} = \begin{bmatrix} \epsilon_1 \\ \vdots \\ \epsilon_n \end{bmatrix}.$$
Define the test points
$$X_* = \begin{bmatrix} t_{*1} \\ \vdots \\ t_{*N} \end{bmatrix} \quad \text{and} \quad \mathbf{f}_* = \begin{bmatrix} f_{*1} \\ \vdots \\ f_{*N} \end{bmatrix}.$$
The joint distribution of the observed data $\mathbf{y}$ and the unobserved function values $\mathbf{f}_*$, given $X$ and $X_*$, under the prior is
$$\begin{bmatrix} \mathbf{y} \\ \mathbf{f}_* \end{bmatrix} \sim \mathcal{N}\left( \begin{bmatrix} 0 \\ 0 \end{bmatrix}, \begin{bmatrix} K(X, X) + \sigma_n^2 I & K(X, X_*) \\ K(X, X_*)^T & K(X_*, X_*) \end{bmatrix} \right).$$
We provide a couple of results (see [20] for details) that are used in this paper. The first result is related to the marginalization and conditional distributions, and the second result is related to the Gaussian process regression (GPR) predictions and uncertainty.
Theorem 1. 
Let $\mathbf{y}$ be multivariate normal, $\mathbf{y} = \begin{bmatrix} \mathbf{y}_1 \\ \mathbf{y}_2 \end{bmatrix} \sim \mathcal{N}(\boldsymbol{\mu}, \Sigma)$, with mean vector $\boldsymbol{\mu}$ and covariance matrix $\Sigma$:
$$\boldsymbol{\mu} = \begin{bmatrix} \boldsymbol{\mu}_1 \\ \boldsymbol{\mu}_2 \end{bmatrix}, \quad \Sigma = \begin{bmatrix} \Sigma_{11} & \Sigma_{12} \\ \Sigma_{21} & \Sigma_{22} \end{bmatrix}, \quad \text{with} \quad \Sigma^{-1} = \Lambda = \begin{bmatrix} \Lambda_{11} & \Lambda_{12} \\ \Lambda_{21} & \Lambda_{22} \end{bmatrix}.$$
Then the posterior conditional distribution of $\mathbf{y}_2$ given $\mathbf{y}_1$ is also normal, i.e., $p(\mathbf{y}_2 \mid \mathbf{y}_1) = \mathcal{N}(\mathbf{y}_2; \boldsymbol{\mu}_{2|1}, \Sigma_{2|1})$, where $\Sigma_{2|1} = \Sigma_{22} - \Sigma_{21} \Sigma_{11}^{-1} \Sigma_{12}$ and $\boldsymbol{\mu}_{2|1} = \boldsymbol{\mu}_2 + \Sigma_{21} \Sigma_{11}^{-1} (\mathbf{y}_1 - \boldsymbol{\mu}_1)$.
Theorem 2. 
Let $f(x) \sim \mathcal{GP}(m(x), k(x, x'))$ be a Gaussian process with mean function $m(x)$ and covariance function $k(x, x')$. Given the training data $D = \{(x_i, y_i)\}_{i=1}^{n}$ and test points $X_* = \{x_{*1}, \ldots, x_{*m}\}$, the predictive distribution of $\mathbf{f}_*$ at the test points is Gaussian with mean $\bar{\mathbf{f}}_* = K_*^T (K + \sigma_n^2 I)^{-1} \mathbf{y}$ and covariance $\operatorname{Cov}(\mathbf{f}_*) = K_{**} - K_*^T (K + \sigma_n^2 I)^{-1} K_*$, where $K = K(X, X)$ ($n \times n$-dimensional matrix), $K_* = K(X, X_*)$ ($n \times m$-dimensional matrix), and $K_{**} = K(X_*, X_*)$ ($m \times m$-dimensional matrix).
In particular, the expressions for $\bar{\mathbf{f}}_*$ and $\operatorname{Cov}(\mathbf{f}_*)$ are given by
$$\bar{\mathbf{f}}_* = \mathbb{E}[\mathbf{f}_* \mid X_*, \mathbf{y}, X] = K(X_*, X) \left[ K(X, X) + \sigma_n^2 I \right]^{-1} \mathbf{y}, \tag{8}$$
$$\operatorname{Cov}(\mathbf{f}_*) = K(X_*, X_*) - K(X_*, X) \left[ K(X, X) + \sigma_n^2 I \right]^{-1} K(X, X_*). \tag{9}$$
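A direct numpy transcription of Equations (8) and (9) may be sketched as follows, reusing the sq_exp_kernel helper from the earlier sketch and assuming the hyperparameters are already known.

```python
# Posterior mean and covariance of Equations (8) and (9) in plain numpy.
import numpy as np

def gp_posterior(t_train, y_train, t_star, sigma_f, ell, sigma_n):
    K = sq_exp_kernel(t_train, t_train, sigma_f, ell)       # n x n
    K_star = sq_exp_kernel(t_train, t_star, sigma_f, ell)   # n x m
    K_ss = sq_exp_kernel(t_star, t_star, sigma_f, ell)      # m x m
    A = K + sigma_n**2 * np.eye(len(t_train))
    # Solve A x = y rather than forming the matrix inverse explicitly.
    mean = K_star.T @ np.linalg.solve(A, y_train)           # Eq. (8)
    cov = K_ss - K_star.T @ np.linalg.solve(A, K_star)      # Eq. (9)
    return mean, cov
```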
In this paper, we focus on the squared exponential kernel function. Hence, the kernel's parameters $\sigma_f$ and $\ell$, along with the noise variance $\sigma_n$, are unknown hyperparameters. To compute Equations (8) and (9), it is crucial to obtain optimized hyperparameters. This can be achieved by maximizing the marginal likelihood (or, equivalently, by minimizing the negative log marginal likelihood) (see [7]), as follows:
$$\theta^* = \arg\min_{\theta} \left( -\log P(\mathbf{y} \mid X, \theta) \right),$$
where $\theta = \{\sigma_f, \ell, \sigma_n\} \in \mathbb{R}^3$ and
$$-\log P(\mathbf{y} \mid X, \theta) = \frac{1}{2} \mathbf{y}^T (K + \sigma_n^2 I)^{-1} \mathbf{y} + \frac{1}{2} \log \left| K + \sigma_n^2 I \right| + \frac{n}{2} \log 2\pi.$$
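A minimal sketch of this optimization, again assuming the sq_exp_kernel helper above: the negative log marginal likelihood is evaluated through a Cholesky factorization and minimized with scipy over log-parameters, which keeps $\sigma_f$, $\ell$, and $\sigma_n$ positive.

```python
# Hyperparameter tuning by minimizing the negative log marginal likelihood.
import numpy as np
from scipy.optimize import minimize

def neg_log_marginal_likelihood(log_theta, t, y):
    sigma_f, ell, sigma_n = np.exp(log_theta)
    K = sq_exp_kernel(t, t, sigma_f, ell) + sigma_n**2 * np.eye(len(t))
    L = np.linalg.cholesky(K)                             # K = L L^T
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))   # K^{-1} y
    return (0.5 * y @ alpha
            + np.log(np.diag(L)).sum()                    # = 0.5 * log|K|
            + 0.5 * len(t) * np.log(2.0 * np.pi))

res = minimize(neg_log_marginal_likelihood, x0=np.log([1.0, 1.0, 0.1]),
               args=(t.ravel(), y), method="L-BFGS-B")
sigma_f_opt, ell_opt, sigma_n_opt = np.exp(res.x)
```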
Hyperparameter initialization specifically for the Gaussian kernel is introduced in [21]. We adopt this deterministic hyperparameter initialization method. Unlike conventional approaches that rely on random initialization or meta-learning-based strategies, this method directly utilizes the dataset to establish initial values for hyperparameters. Before initialization, the input data X and X * , as well as the output values y, are normalized using their respective means and standard deviations:
$$x_j^i \leftarrow \frac{x_j^i - \mu_j}{\sigma_j}, \qquad x_{*j}^i \leftarrow \frac{x_{*j}^i - \mu_j}{\sigma_j}, \qquad y^i \leftarrow \frac{y^i - \mu_y}{\sigma_y}.$$
For the initialization of hyperparameters, a sorting step is typically required. However, in our work, since we have a single input variable that is already in ascending order, sorting does not introduce any changes. This remains true even when the normalized values are used, as they preserve the original order. Hence, the length scale initialization η is computed as the mean absolute difference between adjacent training data points:
$$\eta_{\mathrm{ini}} = \frac{1}{n} \sum_{i=1}^{n-1} \left| x_{(j)}^{i} - x_{(j)}^{i+1} \right|.$$
The signal variance α and noise variance σ are initialized based on the sorted response values:
$$\alpha_{\mathrm{ini}} = \frac{1}{n} \sum_{i=1}^{n-1} \left| y^{i} - y^{i+1} \right|,$$
and
$$\sigma_{\mathrm{ini}} = \gamma \, \alpha_{\mathrm{ini}} + \delta \min(dy), \qquad \gamma, \delta \in (0, 1),$$
where $\gamma$ and $\delta$ are small, strictly positive constants, and $dy$ denotes the vector of absolute differences between adjacent response values. We use the same values $\gamma = \delta = 0.5$ as in [21].
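A sketch of this data-dependent initialization, under the paper's simplification of a single input variable that is already sorted; the function name and the interpretation of $dy$ as adjacent response gaps are our assumptions.

```python
# Deterministic hyperparameter initialization following [21], for one
# input variable already in ascending order.
import numpy as np

def init_hyperparameters(x, y, gamma=0.5, delta=0.5):
    x = (x - x.mean()) / x.std()                 # normalize inputs
    y = (y - y.mean()) / y.std()                 # normalize outputs
    n = len(x)
    eta_ini = np.abs(np.diff(x)).sum() / n       # length scale init
    alpha_ini = np.abs(np.diff(y)).sum() / n     # signal variance init
    dy = np.abs(np.diff(y))                      # adjacent response gaps
    sigma_ini = gamma * alpha_ini + delta * dy.min()   # noise variance init
    return eta_ini, alpha_ini, sigma_ini
```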

2.3. Experimental Evaluation

We begin our analysis using one year (15 April 2024 to 15 April 2025) of empirical data for CL=F; summary metrics showing some time series characteristics of the crude oil price are presented in Table 1. The table shows that, during that year, the daily average and median price changes are slightly negative. In addition, the minimum daily price change is bigger in magnitude than the maximum daily price change.
We first implement the analysis of one year of daily CL=F data (15 April 2024 to 15 April 2025) to assess how the optimized hyperparameters improve our visualizations. There are 252 data points for 1 year. After that, we extend the analysis to the proposed 10 years of CL=F data. The optimal normalized and denormalized hyperparameters (following the stated initialization procedure) are listed in Table 2.
The data-dependent initialization improved the log marginal likelihood of the predicted closing price from −1243.78768 to −182.99880, reflecting a better model fit, as a smaller-magnitude negative value (closer to 0) means the model better explains the observed data. The tuned length scale of 3.704892 suggests that the commodity price correlations decay over roughly 3–4 trading days, indicating short-term trends. For commodity price analysis, a high $\sigma_f^2$ (signal variance) indicates significant volatility in commodity price movements, allowing the model to capture large price swings. A high $\sigma_n^2$ (noise variance) suggests considerable observation noise, likely from market fluctuations, leading to a smoother predictive function with increased uncertainty. Together, these high values reflect a volatile and noisy commodity price dataset, requiring careful tuning to balance trend capture and noise filtering, as shown in Table 2. A similar interpretation holds for the other results as well.
Once the optimal hyperparameters are obtained, we compute the predicted mean and the uncertainty region with (8) and (9). Figure 1 provides a visualization of the observed data, the predicted mean for the closing price, and the shaded uncertainty region. More precisely, we obtain $N = 1005$ interpolated closing commodity price data points, four times the original 252 closing commodity price data points. The predicted mean obtained by the nonparametric GP regression model follows the general pattern of the noise-free (i.e., true) data at the dense time points. This analysis agrees with [5].

3. Daily Change and Threshold Calculations

After obtaining the predicted closing price data for one year of crude oil in the augmented larger dataset, we compute the daily change and its percentage change. These values are added to the original dataframe. We then define a crash indicator using the standard deviation of the daily percentage change. Specifically, we set a threshold
$$K_S = 1.5 \times \text{standard deviation}(\text{daily percentage change}).$$
If the daily percentage change is less than $-K_S$, we assign a value of 1 in the crash column; otherwise, we assign 0. This criterion helps us identify whether a significant drop in oil prices is likely in the upcoming days. In Table 3 and Table 4, the Crash column is shown using threshold values of $K_S = 0.3874$ and $K_S = 2.9089$, respectively.
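A pandas sketch of this crash indicator follows; close stands for either the predicted or the original closing-price series, and the column names are illustrative.

```python
# Crash indicator: flag days whose percentage change drops below -K_S.
import pandas as pd

def add_crash_column(close: pd.Series, scale: float = 1.5) -> pd.DataFrame:
    df = pd.DataFrame({"Close": close.to_numpy()})
    df["Daily Change"] = df["Close"].diff().fillna(0.0)
    df["Daily Percent Change"] = df["Close"].pct_change().fillna(0.0) * 100.0
    K_S = scale * df["Daily Percent Change"].std()
    df["Crash"] = (df["Daily Percent Change"] < -K_S).astype(int)
    return df
```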
We consider the 1-year crude oil dataset described in Section 2. We create a dataframe with 10 features from the predicted close price of the crude oil data (a similar procedure is applied to the original crude oil data), as follows: denote the daily change prices as $p_1, p_2, p_3, \ldots, p_{1004}, p_{1005}$. The new dataframe then has 986 rows, where each row (consisting of 10 "features") looks like the following:

First row: $p_1, p_2, \ldots, p_{10}$
Second row: $p_2, p_3, \ldots, p_{11}$
Third row: $p_3, p_4, \ldots, p_{12}$
⋮
Last row: $p_{986}, p_{987}, \ldots, p_{995}$.
Once we create the dataframe, we create a "target" variable Theta ($\theta$) that takes the value 1 or 0. This target variable depends on the values of the following 10 days, which are used to label each sample observation (i.e., each feature row designed above) based on the daily change percent calculation. For instance, for the first row (i.e., $p_1, p_2, \ldots, p_{10}$), the target variable is determined by whether a crash occurs (as detailed below) on $p_{11}, \ldots, p_{20}$, and so on.
Corresponding to the daily change prices $p_1, p_2, p_3, \ldots, p_{1004}, p_{1005}$, we calculate the crash indicator based on the daily percent change. To assign values to the target column of the dataframe, we count the number of crash values equal to 1 over the next 10 days. For example, to determine the target label for the row $p_1, \ldots, p_{10}$, we first count the crash values equal to 1 for the next 10 days, $p_{11}, \ldots, p_{20}$. If this crash count is strictly greater than one (i.e., the number of crash values equal to 1 in the next 10 days is $\geq 2$), we assign a value of 1 to the target column for the first row; otherwise, we assign 0. Similarly, for the row $p_2, \ldots, p_{11}$, we conduct the crash count over the next 10 days, $p_{12}, \ldots, p_{21}$, and if the crash count is greater than 1, we set the target value to 1; otherwise, we set it to 0. We apply the same procedure to the remaining rows, labeling the target column with 1s and 0s accordingly, as sketched below.
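The sketch below assumes changes and crash are arrays holding the daily changes and crash flags built above; for the 1005 predicted points it yields exactly 986 rows. Names are illustrative.

```python
# Build the 10-feature windows and the 10-day-ahead crash-count target.
import pandas as pd

def build_windows(changes, crash, window=10, horizon=10):
    rows, targets = [], []
    for i in range(len(changes) - window - horizon + 1):
        rows.append(changes[i:i + window])                 # p_{i+1}, ..., p_{i+10}
        future = crash[i + window:i + window + horizon]    # next 10 days
        targets.append(int(future.sum() >= 2))             # >= 2 crashes -> label 1
    df = pd.DataFrame(rows, columns=[f"Daily Change {j + 1}" for j in range(window)])
    df["Target"] = targets
    return df
```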
Finally, we apply machine learning algorithms where our sample observations comprise the daily change prices. For the analysis, we run the various classification algorithms. We provide the classification report and confusion matrix results in the next section.

4. Data Analysis Results

4.1. Data Analysis Procedures

We implement various supervised learning algorithms for dataset classification. To run them, we use the dataframes in Table 5 (augmented dataset with synthetic data) and Table 6 (original data), dividing each dataset with a 60%–20%–20% split for training, validation, and testing, respectively. The data consist of 986 samples (or rows, as described in Section 3) from the 1-year crude oil data. The training dataset is imbalanced across the class labels: 503 samples are classified as label 0, while the remaining 88 samples are classified as label 1. To address this imbalance, we implement an oversampling technique on the training dataset, increasing the number of instances in the minority class (i.e., class 1) to match the number of instances in the majority class (i.e., class 0). Essentially, oversampling works by repeatedly sampling from the minority class to increase its size, ensuring that it balances with the majority class in the dataset, as sketched below.
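A minimal sketch of this oversampling with scikit-learn's resample utility (dedicated packages such as imbalanced-learn would work equally well); the column name "Target" matches the dataframe built in Section 3.

```python
# Naive oversampling: resample the minority class (with replacement)
# up to the size of the majority class, then reshuffle.
import pandas as pd
from sklearn.utils import resample

def oversample(train: pd.DataFrame, target: str = "Target") -> pd.DataFrame:
    majority = train[train[target] == 0]
    minority = train[train[target] == 1]
    minority_up = resample(minority, replace=True,
                           n_samples=len(majority), random_state=42)
    return pd.concat([majority, minority_up]).sample(frac=1, random_state=42)
```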
Then, we run various algorithms: Logistic Regression (LR), K-Nearest Neighbors (KNN), Random Forest (RF), Support Vector Machine (SVM), Decision Tree (DT), Deep Learning (DL), LSTM, and LSTM with Batch Normalization (BN). The tabulated performance metrics follow in Section 4.2.

4.2. Performance Metrics of Machine Learning Models

4.2.1. Confusion Matrix and Performance Metrics

The following quantities are well-known:
  • True Positive (TP): The number of positive instances correctly predicted as positive.
  • True Negative (TN): The number of negative instances correctly predicted as negative.
  • False Positive (FP): The number of negative instances incorrectly predicted as positive.
  • False Negative (FN): The number of positive instances incorrectly predicted as negative.
For a binary classification problem with two class labels, positive (P) and negative (N), the confusion matrix is organized as follows:
| Actual \ Predicted | P | N |
|---|---|---|
| P | TP | FN |
| N | FP | TN |
Before performing any analysis, it is important to define evaluation metrics to assess the quality of predictions. They provide different perspectives on the model’s performance:
  • Recall (Sensitivity or True Positive Rate): This measures how well the model identifies actual positive instances. It is defined as
$$\text{Recall} = \frac{TP}{TP + FN}.$$
  • Precision (Positive Predictive Value): This measures how accurate the positive predictions are. It is defined as
$$\text{Precision} = \frac{TP}{TP + FP}.$$
  • Accuracy: This measures the overall correctness of the model. It is defined as
$$\text{Accuracy} = \frac{TP + TN}{TP + FP + TN + FN}.$$
Defining these metrics upfront is crucial for our machine learning analysis, enabling us to understand and select the most suitable models when working with both the original and synthetic data. In our analysis, we use classes 1 (big fluctuation in commodity) and 0 (regular fluctuation in commodity), as described earlier.
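For the traditional classifiers, these metrics can be produced with scikit-learn's built-in reports, as in the sketch below; X_train, y_train, X_test, and y_test are assumed to come from the split described in Section 4.1, and the neural models (DL, LSTM, BN) would be evaluated analogously in a deep learning framework.

```python
# Evaluation loop for the traditional classifiers listed in Section 4.1.
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import classification_report, confusion_matrix

models = {
    "LR": LogisticRegression(max_iter=1000),
    "KNN": KNeighborsClassifier(),
    "RF": RandomForestClassifier(random_state=42),
    "SVM": SVC(),
    "DT": DecisionTreeClassifier(random_state=42),
}
for name, model in models.items():
    model.fit(X_train, y_train)          # oversampled training set
    y_pred = model.predict(X_test)
    print(name)
    print(confusion_matrix(y_test, y_pred))
    print(classification_report(y_test, y_pred))
```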

4.2.2. Crude Oil (CL=F) Data for One Year (from 15 April 2024 to 15 April 2025)

After applying oversampling to address the data imbalance (503 samples for label 0 and 88 for label 1), the machine learning models show improved performance on the minority class when we compare them with the original data. We provide the results in Table 7, Table 8, Table 9 and Table 10. The results in Table 7 and Table 8 are for the augmented dataset.
For the original dataset (with no augmentation with synthetic data), there are 119 samples labeled as class 0 and 20 as class 1. After applying the oversampling technique, the results are provided in Table 9 and Table 10.
Oversampling also improves the neural networks’ performance on the minority class (label 1). The results displayed in Table 7 and Table 8 are better than the results in Table 9 and Table 10.
From the above tables, we observe the superiority in performance of both traditional machine learning (ML) and neural network (NN) models for the 1-year predicted data over the 1-year original data. This can be attributed to two major factors. First, the GP regression process smooths out the inherent noise and volatility present in commodity time series such as our crude oil price data. This smoothing reduces irregular fluctuations and high-frequency components, which allows models to learn more coherent and separable decision boundaries. Second, the predicted dataset shows lower variability in model performance because all models are trained on a more regular and noise-filtered version of the underlying dynamics. On the other hand, the original dataset contains abrupt jumps and microstructure noise, which can negatively affect learning and may cause overfitting or underfitting depending on the model.
As a result, there is a greater spread in model accuracy for the original data, reflecting differences in how each model responds to noise. For example, in the original dataset, accuracy ranges from 0.43 for KNN and 0.53 for the Decision Tree, both of which are sensitive to local noise, to 0.79 for Random Forest and 0.83 for SVM, which are more robust and able to capture complex patterns (i.e., nonlinear relationships). In contrast, the predicted dataset results in much more consistent performance, with accuracy variation limited to only 0.05 for ML models and 0.01 for NN models. This is because smoother and less noisy data allow all models, including simpler ones, to perform effectively.

4.2.3. Crude Oil (CL=F) Data for 10 Years (from 15 April 2015 to 15 April 2025)

We now apply machine learning techniques to 10 years of CL=F data. The original dataset is augmented at quarter-day intervals, while maintaining the original starting and ending time points; this augmentation will be shown to improve prediction accuracy.
The data-dependent initialization improved the log marginal likelihood from −113,541.1936 to −888.0075, reflecting a better model fit. This is described in Table 11. The tuned length scale of 7.448797 suggests that oil price correlations decay over about 7 trading days. Table 12 presents the predicted closing prices and other features for the 10 years. Table 13 provides the results for the predicted daily percentage change of crude oil (window size = 10) and the target labels for crash detection. On the other hand, Table 14 provides the original crude oil price data (without synthetic data), daily change, percentage change, and crash labels, and Table 15 shows the original crude oil daily percentage change features (window size = 10) and crash target labels.
The column "Crash" in Table 12 and Table 14 is calculated with thresholds of $K_S$ = 0.42714 and 7.21208, respectively, each based on one standard deviation of the daily percent change. As a result, the dataframes in Table 13 and Table 15 expand to dimensions 10,038 × 11 and 2496 × 11, respectively (including the target variable column). Figure 2 demonstrates the predicted mean along with its uncertainty region. The results obtained during this 10-year period are also promising, as shown in Table 16 and Table 17.
Finally, in the following analysis, we apply the aforementioned machine learning classifiers to the original CL=F 10-year commodity price data. By comparing the classifiers' accuracy results, we clearly observe that the denoised data computed at dense time points yield the highest accuracy. The methods previously employed for the synthetic data at dense time points are similarly applied here. We construct the dataframe in Table 15, comprising 2496 samples, each with 10 features and one target, consistent with our synthetic data procedure. Upon calculating the threshold, we obtain $K_S = 7.212088$. Subsequently, we apply the machine learning classifiers to evaluate the performance improvements achieved using densely sampled data points. For this dataset, we have 119 training samples labeled as class 0 and 20 as class 1. The same oversampling techniques used for the densified data are applied here as well. The results are summarized in Table 18 and Table 19.
Many of the classifiers fail to detect any crash signals on the original data with fewer points (giving zero precision, recall, and F1-scores for $\theta = 1$), as we can see in Table 18 and Table 19. We compare these with the classifiers obtained from the dense time points in Table 16 and Table 17. It is worth noting that the 10-year data already contain a good number of points, so the synthetic data only marginally improve the prediction results. However, as we have seen before, the improvement is much greater for the 1-year data: the number of data points in a 1-year dataset is small, and hence incorporating synthetic data improves the original results considerably.
Even for the 10-year data, despite the similar overall accuracies of the machine learning (ML) and neural network (NN) models on the original and predicted data, the models trained on the original data struggle to detect class 1, as evidenced by the low precision, recall, and F1-scores for $\theta = 1$. In contrast, the predicted data yield better class 1 detection, even with a limited dataset such as the 1-year data. This suggests that GP regression is particularly effective for ML and NN tasks when working with smaller datasets, as supported by [3]. Nevertheless, even with a larger dataset, such as 10 years of data, the ML and NN results on the predicted data still outperform those on the original data, particularly in terms of precision, recall, and F1-score for $\theta = 1$.
The above results provide the analysis for CL=F data for various time frames. It is very clear from the tables that for shorter time intervals, the synthetic data generation provides much more improvement to the prediction. The synthetic data provides an additional source of data that augments the original dataset. As observed above, for a shorter time frame, where there is a lack of sufficient data points in the original set, the presented method helps to boost the analysis and provide a much better prediction.

5. Conclusions

In this paper, we introduce Gaussian process (GP) regression as a method to denoise observed financial data and generate predictions at densely interpolated time points. Specifically, we apply our methodology to one-year and ten-year datasets of CL=F crude oil prices. Using a Gaussian kernel with data-dependent initialization to optimize hyperparameters, we compute predictive means and uncertainty regions (i.e., confidence intervals).
Subsequently, we construct a dataset that incorporates ten predictive features derived from the mean predictions, alongside the calculated daily changes and their percentages. We use the standard deviation of the daily percent change, scaled by a chosen multiplier, as the threshold that decides whether a daily percent change is categorized as a crash; we then apply the crash count over subsequent 10-day intervals to form the class labels, enabling the application of various machine learning algorithms. Our analysis reveals that the predictive means obtained via GP regression yield higher accuracy compared to the original sparse data within an identical time period. Thus, data densification using GP regression enhances the detection of potential market crashes or fluctuations over the subsequent 10-day period.
As shown in this paper, the improvement in prediction is significantly greater for one year’s worth of data (compared to a bigger dataset). Since there are only a few data points in a one-year dataset, adding synthetic data improves the initial findings even more. This suggests that the proposed method can be implemented for small-data analysis. There are many occasions when procuring a large dataset is difficult. In those cases, the generation of synthetic data and the corresponding analysis presented in this paper can be implemented. The predictions are of importance for practitioners and policymakers.

Author Contributions

Conceptualization, M.G. and I.S.; methodology, M.G. and I.S.; software, M.G.; validation, M.G.; formal analysis, M.G. and I.S.; investigation, M.G. and I.S.; resources, M.G. and I.S.; writing—original draft preparation, M.G. and I.S.; writing—review and editing, M.G. and I.S.; visualization, M.G. and I.S.; supervision, I.S.; project administration, I.S. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The original data presented in the study are available in Yahoo Finance at https://finance.yahoo.com/.

Acknowledgments

The authors would like to thank the anonymous reviewers for their careful reading of the manuscript and for suggesting points to improve the quality of the paper.

Conflicts of Interest

The authors declare no conflict of interest.

References

1. Tolstoy, A. Matched Field Processing for Underwater Acoustics; World Scientific: Singapore, 1993.
2. Caviedes-Nozal, D.; Riis, N.A.; Heuchel, F.M.; Brunskog, J.; Gerstoft, P.; Fernandez-Grande, E. Gaussian processes for sound field reconstruction. J. Acoust. Soc. Am. 2021, 149, 1107–1119.
3. Frederick, C.; Michalopoulou, Z. Seabed classification and source localization with Gaussian processes and machine learning. JASA Express Lett. 2022, 2, 084801.
4. Michalopoulou, Z.-H.; Gerstoft, P. Inversion in an uncertain ocean using Gaussian processes. J. Acoust. Soc. Am. 2023, 153, 1600–1611.
5. Michalopoulou, Z.-H.; Gerstoft, P.; Caviedes-Nozal, D. Matched field source localization with Gaussian processes. JASA Express Lett. 2021, 1, 064801.
6. Murphy, K.P. Machine Learning: A Probabilistic Perspective; MIT Press: Cambridge, MA, USA, 2012.
7. Rasmussen, C.E.; Williams, C.K. Gaussian Processes for Machine Learning; MIT Press: Cambridge, MA, USA, 2006.
8. Valentine, A.P.; Sambridge, M. Gaussian process models—I. A framework for probabilistic continuous inverse theory. Geophys. J. Int. 2020, 220, 1632–1647.
9. Valentine, A.P.; Sambridge, M. Gaussian process models—II. Lessons for discrete inversion. Geophys. J. Int. 2020, 220, 1648–1656.
10. SenGupta, I.; Nganje, W.; Hanson, E. Refinements of Barndorff-Nielsen and Shephard model: An analysis of crude oil price with machine learning. Ann. Data Sci. 2021, 8, 39–55.
11. Hui, X.; Sun, B.; Jiang, H.; SenGupta, I. Analysis of stock index with a generalized BN-S model: An approach based on machine learning and fuzzy parameters. Stoch. Anal. Appl. 2023, 41, 938–957.
12. Lin, M.; SenGupta, I.; Wilson, W. Estimation of VaR with jump process: Application in corn and soybean markets. Appl. Stoch. Model. Bus. Ind. 2024, 40, 1337–1354.
13. Awasthi, S.; SenGupta, I.; Wilson, W.; Lakkakula, P. Machine learning and neural network based model predictions of soybean export shares from US Gulf to China. Stat. Anal. Data Min. ASA Data Sci. J. 2022, 15, 707–721.
14. Shoshi, H.; SenGupta, I. Hedging and machine learning driven crude oil data analysis using a refined Barndorff-Nielsen and Shephard model. Int. J. Financ. Eng. 2021, 8, 2150015.
15. Shoshi, H.; Hanson, E.; Nganje, W.; SenGupta, I. Stochastic analysis and neural network-based yield prediction with precision agriculture. J. Risk Financ. Manag. 2021, 14, 397.
16. Roberts, M.; SenGupta, I. Infinitesimal generators for two-dimensional Lévy process-driven hypothesis testing. Ann. Financ. 2020, 16, 121–139.
17. Roberts, M.; SenGupta, I. Sequential hypothesis testing in machine learning, and crude oil price jump size detection. Appl. Math. Financ. 2020, 27, 374–395.
18. Bisht, A.; Chahar, A.; Kabthiyal, A.; Goel, A. Stock prediction using Gaussian process regression. In Proceedings of the 2022 6th International Conference on Computing Methodologies and Communication (ICCMC), Erode, India, 29–31 March 2022; IEEE: New York, NY, USA, 2022; pp. 693–699.
19. Chen, Z. Gaussian Process Regression Methods and Extensions for Stock Market Prediction. Ph.D. Dissertation, University of Leicester, Leicester, UK, 2017.
20. Gebreslasie, M.; SenGupta, I. Synthetic stocks and market movement estimation: An analysis with Gaussian process, drawdown, and data science. 2025; submitted.
21. Ulapane, N.; Thiyagarajan, K.; Kodagoda, D. Hyper-parameter initialization for squared exponential kernel-based Gaussian process regression. In Proceedings of the 2020 15th IEEE Conference on Industrial Electronics and Applications (ICIEA), Kristiansand, Norway, 21–25 June 2020; IEEE: New York, NY, USA, 2020; pp. 1154–1159.
Figure 1. Crude oil data trend (15 April 2024 to 15 April 2025) for the noisy data (blue dots) for one year and the predicted mean for the closing price $\bar{f}_*$ at quarter-day time steps obtained using GP regression (solid red line). The shaded area is the uncertainty of the mean prediction, obtained via the square root of the diagonal elements of $\operatorname{Cov}(\mathbf{f}_*)$, given by $\bar{f}_* \pm 1.96 \times \text{standard deviation}$.
Figure 2. CL=F oil price data trend (15 April 2015 to 15 April 2025) for the noisy data (blue dots) for ten years and the predicted mean $\bar{f}_*$ at dense points (virtual time steps) obtained for $\ell = 6.4$ (i.e., denormalized) using GP regression (solid red line). The shaded area is the uncertainty of the mean prediction, obtained via the square root of the diagonal elements of $\operatorname{Cov}(\mathbf{f}_*)$, given by $\bar{f}_* \pm 1.96 \times \text{standard deviation}$.
Table 1. Some properties of the empirical 1-year dataset.

| Metrics | Original Data—Daily Price Change | Original Data—Daily Price Change (%) |
|---|---|---|
| Mean | −0.09476 | −0.11129 |
| Median | −0.02999 | −0.038951 |
| Maximum | 3.61000 | 5.14979 |
| Minimum | −4.95999 | −7.40851 |
Table 2. Optimal hyperparameter values for predicted closing price for CL=F from 15 April 2024 to 15 April 2025.

| Hyperparameter | Initialization | Normalized Optimal Values | Denormalized Optimal Values |
|---|---|---|---|
| Noise Variance ($\sigma_n^2$) | 0.010446 | 0.056605 | 1.635761 |
| Signal Variance ($\sigma_f^2$) | 0.041027 | 1.494630 | 43.191622 |
| Length Scale ($\ell$) | 0.013747 | 0.050929 | 3.704892 |
Table 3. Predicted closing prices, daily change, daily percent change, and crash label, for $K_S = 0.3874$.

| Index | Predicted Close | Daily Change | Daily Percent Change | Crash |
|---|---|---|---|---|
| 0 | 85.0276 | 0.0000 | 0.0000 | 0 |
| 1 | 84.9473 | −0.0803 | −0.0944 | 0 |
| 2 | 84.8338 | −0.1134 | −0.1335 | 0 |
| 3 | 84.6918 | −0.1420 | −0.1674 | 0 |
| 4 | 84.5263 | −0.1655 | −0.1954 | 0 |
| … | … | … | … | … |
| 1000 | 61.3962 | 0.2089 | 0.3415 | 0 |
| 1001 | 61.6053 | 0.2091 | 0.3405 | 0 |
| 1002 | 61.8129 | 0.2077 | 0.3371 | 0 |
| 1003 | 62.0187 | 0.2057 | 0.3329 | 0 |
| 1004 | 62.2231 | 0.2044 | 0.3295 | 0 |
Table 4. Original crude oil data, daily change, daily percent change, and crash labels, for $K_S = 2.9089$.

| Index | Open | High | Low | Close | Volume | Daily Change | Daily Percent Change | Crash |
|---|---|---|---|---|---|---|---|---|
| 0 | 85.9300 | 86.1100 | 84.0500 | 85.4100 | 343894 | 0.0000 | 0.0000 | 0 |
| 1 | 85.7000 | 86.1800 | 84.7500 | 85.3600 | 241343 | −0.0500 | −0.0585 | 0 |
| 2 | 85.3600 | 85.5100 | 82.5500 | 82.6900 | 259540 | −2.6700 | −3.1279 | 1 |
| 3 | 82.7900 | 83.4700 | 81.5600 | 82.7300 | 84468 | 0.0400 | 0.0484 | 0 |
| 4 | 82.6200 | 86.2800 | 81.8000 | 83.1400 | 76901 | 0.4100 | 0.4956 | 0 |
| … | … | … | … | … | … | … | … | … |
| 247 | 61.0300 | 61.7500 | 57.8800 | 59.5800 | 557655 | −1.1200 | −1.8451 | 0 |
| 248 | 58.3200 | 62.9300 | 55.1200 | 62.3500 | 592250 | 2.7700 | 4.6492 | 0 |
| 249 | 62.7100 | 63.3400 | 58.7600 | 60.0700 | 391826 | −2.2800 | −3.6568 | 1 |
| 250 | 60.2000 | 61.8700 | 59.4300 | 61.5000 | 306231 | 1.4300 | 2.3806 | 0 |
| 251 | 61.7000 | 62.6800 | 60.5900 | 61.5300 | 238068 | 0.0300 | 0.0488 | 0 |
Table 5. Crude oil dataframe daily change for predicted closing price for augmented synthetic dataset.

| Index | Daily Change 1 | Daily Change 2 | Daily Change 3 | Daily Change 4 | Daily Change 5 | Daily Change 6 | Daily Change 7 | Daily Change 8 | Daily Change 9 | Daily Change 10 | Target |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0.000000 | −0.080275 | −0.113435 | −0.142010 | −0.165518 | −0.183608 | −0.196062 | −0.202804 | −0.203898 | −0.199547 | 0 |
| 1 | −0.080275 | −0.113435 | −0.142010 | −0.165518 | −0.183608 | −0.196062 | −0.202804 | −0.203898 | −0.199547 | −0.190081 | 0 |
| 2 | −0.113435 | −0.142010 | −0.165518 | −0.183608 | −0.196062 | −0.202804 | −0.203898 | −0.199547 | −0.190081 | −0.175953 | 0 |
| 3 | −0.142010 | −0.165518 | −0.183608 | −0.196062 | −0.202804 | −0.203898 | −0.199547 | −0.190081 | −0.175953 | −0.157719 | 0 |
| 4 | −0.165518 | −0.183608 | −0.196062 | −0.202804 | −0.203898 | −0.199547 | −0.190081 | −0.175953 | −0.157719 | −0.136027 | 0 |
| … | … | … | … | … | … | … | … | … | … | … | … |
| 981 | −0.659798 | −0.614577 | −0.559984 | −0.497851 | −0.430186 | −0.359083 | −0.286640 | −0.214869 | −0.145620 | −0.080514 | 0 |
| 982 | −0.614577 | −0.559984 | −0.497851 | −0.430186 | −0.359083 | −0.286640 | −0.214869 | −0.145620 | −0.080514 | −0.020892 | 0 |
| 983 | −0.559984 | −0.497851 | −0.430186 | −0.359083 | −0.286640 | −0.214869 | −0.145620 | −0.080514 | −0.020892 | 0.032228 | 0 |
| 984 | −0.497851 | −0.430186 | −0.359083 | −0.286640 | −0.214869 | −0.145620 | −0.080514 | −0.020892 | 0.032228 | 0.078170 | 0 |
| 985 | −0.430186 | −0.359083 | −0.286640 | −0.214869 | −0.145620 | −0.080514 | −0.020892 | 0.032228 | 0.078170 | 0.116601 | 0 |
Table 6. Crude oil dataframe daily change for original data (i.e., no data augmentation).

| Index | Daily Change 1 | Daily Change 2 | Daily Change 3 | Daily Change 4 | Daily Change 5 | Daily Change 6 | Daily Change 7 | Daily Change 8 | Daily Change 9 | Daily Change 10 | Target |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0.000000 | −0.050003 | −2.669998 | 0.040001 | 0.409996 | −0.290001 | 0.510002 | −0.550003 | 0.760002 | 0.279999 | 0 |
| 1 | −0.050003 | −2.669998 | 0.040001 | 0.409996 | −0.290001 | 0.510002 | −0.550003 | 0.760002 | 0.279999 | −1.220001 | 0 |
| 2 | −2.669998 | 0.040001 | 0.409996 | −0.290001 | 0.510002 | −0.550003 | 0.760002 | 0.279999 | −1.220001 | −0.699997 | 0 |
| 3 | 0.040001 | 0.409996 | −0.290001 | 0.510002 | −0.550003 | 0.760002 | 0.279999 | −1.220001 | −0.699997 | −2.930000 | 0 |
| 4 | 0.409996 | −0.290001 | 0.510002 | −0.550003 | 0.760002 | 0.279999 | −1.220001 | −0.699997 | −2.930000 | −0.050003 | 0 |
| … | … | … | … | … | … | … | … | … | … | … | … |
| 228 | 1.430000 | −1.129997 | 0.629997 | 0.400002 | −0.680000 | 0.260002 | 1.099998 | 0.019997 | 0.830002 | −0.110001 | 1 |
| 229 | −1.129997 | 0.629997 | 0.400002 | −0.680000 | 0.260002 | 1.099998 | 0.019997 | 0.830002 | −0.110001 | 0.650002 | 1 |
| 230 | 0.629997 | 0.400002 | −0.680000 | 0.260002 | 1.099998 | 0.019997 | 0.830002 | −0.110001 | 0.650002 | 0.269997 | 1 |
| 231 | 0.400002 | −0.680000 | 0.260002 | 1.099998 | 0.019997 | 0.830002 | −0.110001 | 0.650002 | 0.269997 | −0.559998 | 1 |
| 232 | −0.680000 | 0.260002 | 1.099998 | 0.019997 | 0.830002 | −0.110001 | 0.650002 | 0.269997 | −0.559998 | 2.120003 | 1 |
Table 7. Performance metrics for traditional machine learning models for CL=F 1-year predicted data.

| Models | θ | Precision | Recall | F1-Score | Support | Accuracy |
|---|---|---|---|---|---|---|
| LR | 0 | 1.00 | 0.94 | 0.97 | 172 | 0.94 |
|  | 1 | 0.70 | 1.00 | 0.83 | 26 |  |
| KNN | 0 | 0.97 | 0.92 | 0.95 | 172 | 0.91 |
|  | 1 | 0.62 | 0.81 | 0.70 | 26 |  |
| RF | 0 | 0.97 | 0.97 | 0.97 | 172 | 0.94 |
|  | 1 | 0.78 | 0.81 | 0.81 | 26 |  |
| SVM | 0 | 1.00 | 0.96 | 0.98 | 172 | 0.96 |
|  | 1 | 0.79 | 1.00 | 0.88 | 26 |  |
| DT | 0 | 0.96 | 0.94 | 0.95 | 172 | 0.92 |
|  | 1 | 0.67 | 0.77 | 0.71 | 26 |  |
Table 8. Performance metrics for neural networks for CL=F 1-year predicted data.

| Models | θ | Precision | Recall | F1-Score | Support | Accuracy |
|---|---|---|---|---|---|---|
| DL | 0 | 0.99 | 0.95 | 0.97 | 172 | 0.95 |
|  | 1 | 0.74 | 0.96 | 0.83 | 26 |  |
| LSTM | 0 | 1.00 | 0.95 | 0.97 | 172 | 0.95 |
|  | 1 | 0.74 | 1.00 | 0.85 | 26 |  |
| BN | 0 | 1.00 | 0.94 | 0.97 | 172 | 0.94 |
|  | 1 | 0.70 | 1.00 | 0.83 | 26 |  |
Table 9. Performance metrics for traditional machine learning models for CL=F 1-year original data.

| Models | θ | Precision | Recall | F1-Score | Support | Accuracy |
|---|---|---|---|---|---|---|
| LR | 0 | 0.78 | 0.64 | 0.70 | 39 | 0.55 |
|  | 1 | 0.07 | 0.12 | 0.09 | 8 |  |
| KNN | 0 | 0.75 | 0.46 | 0.57 | 39 | 0.43 |
|  | 1 | 0.09 | 0.25 | 0.13 | 8 |  |
| RF | 0 | 0.82 | 0.95 | 0.88 | 39 | 0.79 |
|  | 1 | 0.00 | 0.00 | 0.00 | 8 |  |
| SVM | 0 | 0.83 | 1.00 | 0.91 | 39 | 0.83 |
|  | 1 | 0.00 | 0.00 | 0.00 | 8 |  |
| DT | 0 | 0.76 | 0.64 | 0.69 | 39 | 0.53 |
|  | 1 | 0.00 | 0.00 | 0.00 | 8 |  |
Table 10. Performance metrics for neural networks for CL=F 1-year original data.

| Models | θ | Precision | Recall | F1-Score | Support | Accuracy |
|---|---|---|---|---|---|---|
| DL | 0 | 0.78 | 0.72 | 0.75 | 39 | 0.60 |
|  | 1 | 0.00 | 0.00 | 0.00 | 8 |  |
| LSTM | 0 | 0.80 | 0.62 | 0.70 | 39 | 0.55 |
|  | 1 | 0.12 | 0.25 | 0.16 | 8 |  |
| BN | 0 | 0.89 | 0.82 | 0.85 | 39 | 0.77 |
|  | 1 | 0.36 | 0.50 | 0.42 | 8 |  |
Table 11. Optimal hyperparameter values from 15 April 2015 to 15 April 2025.

| Hyperparameter | Initialization | Normalized Optimal Values | Denormalized Optimal Values |
|---|---|---|---|
| Noise Variance ($\sigma_n^2$) | 0.001012 | 0.017585 | 5.718099 |
| Signal Variance ($\sigma_f^2$) | 0.004048 | 1.100225 | 357.762116 |
| Length Scale ($\ell$) | 0.001377 | 0.010260 | 7.448797 |
Table 12. Predicted closing prices, daily change, percentage change, and crash label for 10 yr.

| Index | Predicted Close | Daily Change | Daily Change Percent | Crash |
|---|---|---|---|---|
| 0 | 56.553343 | 0.000000 | 0.000000 | 0 |
| 1 | 56.468885 | −0.084458 | −0.149342 | 0 |
| 2 | 56.391173 | −0.077712 | −0.137619 | 0 |
| 3 | 56.320490 | −0.070683 | −0.125344 | 0 |
| 4 | 56.257094 | −0.063396 | −0.112563 | 0 |
| … | … | … | … | … |
| 10,052 | 60.494617 | 0.097117 | 0.160796 | 0 |
| 10,053 | 60.626566 | 0.131949 | 0.218117 | 0 |
| 10,054 | 60.792239 | 0.165673 | 0.273267 | 0 |
| 10,055 | 60.990280 | 0.198041 | 0.325767 | 0 |
| 10,056 | 61.219103 | 0.228823 | 0.375180 | 0 |
Table 13. Predicted crude oil daily percentage change features (window size = 10) and target labels for crash detection.

| Index | Daily Change 1 | Daily Change 2 | Daily Change 3 | Daily Change 4 | Daily Change 5 | Daily Change 6 | Daily Change 7 | Daily Change 8 | Daily Change 9 | Daily Change 10 | Target |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0.000000 | −0.084458 | −0.077712 | −0.070683 | −0.063396 | −0.055880 | −0.048163 | −0.040279 | −0.032258 | −0.024136 | 0 |
| 1 | −0.084458 | −0.077712 | −0.070683 | −0.063396 | −0.055880 | −0.048163 | −0.040279 | −0.032258 | −0.024136 | −0.015946 | 0 |
| 2 | −0.077712 | −0.070683 | −0.063396 | −0.055880 | −0.048163 | −0.040279 | −0.032258 | −0.024136 | −0.015946 | −0.007726 | 0 |
| 3 | −0.070683 | −0.063396 | −0.055880 | −0.048163 | −0.040279 | −0.032258 | −0.024136 | −0.015946 | −0.007726 | 0.000490 | 0 |
| 4 | −0.063396 | −0.055880 | −0.048163 | −0.040279 | −0.032258 | −0.024136 | −0.015946 | −0.007726 | 0.000490 | 0.008665 | 0 |
| … | … | … | … | … | … | … | … | … | … | … | … |
| 10,033 | −0.421399 | −0.413875 | −0.403301 | −0.389730 | −0.373247 | −0.353962 | −0.332011 | −0.307557 | −0.280786 | −0.251903 | 0 |
| 10,034 | −0.413875 | −0.403301 | −0.389730 | −0.373247 | −0.353962 | −0.332011 | −0.307557 | −0.280786 | −0.251903 | −0.221135 | 0 |
| 10,035 | −0.403301 | −0.389730 | −0.373247 | −0.353962 | −0.332011 | −0.307557 | −0.280786 | −0.251903 | −0.221135 | −0.188726 | 0 |
| 10,036 | −0.389730 | −0.373247 | −0.353962 | −0.332011 | −0.307557 | −0.280786 | −0.251903 | −0.221135 | −0.188726 | −0.154931 | 0 |
| 10,037 | −0.373247 | −0.353962 | −0.332011 | −0.307557 | −0.280786 | −0.251903 | −0.221135 | −0.188726 | −0.154931 | −0.120019 | 0 |
Table 14. Original crude oil price data (without synthetic data), daily change, percentage change, and crash labels.

| Index | Open | High | Low | Close | Volume | Daily Change | Daily Change Percent | Crash |
|---|---|---|---|---|---|---|---|---|
| 0 | 53.549999 | 56.689999 | 53.389999 | 56.389999 | 508904 | 0.000000 | 0.000000 | 0 |
| 1 | 55.919998 | 57.419998 | 55.070000 | 56.709999 | 413134 | 0.320000 | 0.567476 | 0 |
| 2 | 56.560001 | 56.880001 | 55.310001 | 55.740002 | 230623 | −0.969997 | −1.710452 | 0 |
| 3 | 56.160000 | 57.169998 | 54.849998 | 56.380001 | 112382 | 0.639999 | 1.148187 | 0 |
| 4 | 56.410000 | 56.910000 | 55.009998 | 55.259998 | 354244 | −1.120003 | −1.986525 | 0 |
| … | … | … | … | … | … | … | … | … |
| 2510 | 61.029999 | 61.750000 | 57.880001 | 59.580002 | 557655 | −1.119999 | −1.845138 | 0 |
| 2511 | 58.320000 | 62.930000 | 55.119999 | 62.349998 | 592250 | 2.769997 | 4.649205 | 0 |
| 2512 | 62.709999 | 63.340000 | 58.759998 | 60.070000 | 391826 | −2.279999 | −3.656774 | 0 |
| 2513 | 60.200001 | 61.869999 | 59.430000 | 61.500000 | 306231 | 1.430000 | 2.380557 | 0 |
| 2514 | 61.700001 | 62.680000 | 60.590000 | 61.529999 | 238068 | 0.029999 | 0.048779 | 0 |
Table 15. Original crude oil daily percentage change features (window size = 10) and crash target labels.

| Index | Daily Change 1 | Daily Change 2 | Daily Change 3 | Daily Change 4 | Daily Change 5 | Daily Change 6 | Daily Change 7 | Daily Change 8 | Daily Change 9 | Daily Change 10 | Target |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0.000000 | 0.320000 | −0.969997 | 0.639999 | −1.120003 | 0.900002 | 1.580002 | −0.590000 | −0.160000 | 0.070000 | 0 |
| 1 | 0.320000 | −0.969997 | 0.639999 | −1.120003 | 0.900002 | 1.580002 | −0.590000 | −0.160000 | 0.070000 | 1.520000 | 0 |
| 2 | −0.969997 | 0.639999 | −1.120003 | 0.900002 | 1.580002 | −0.590000 | −0.160000 | 0.070000 | 1.520000 | 1.049999 | 0 |
| 3 | 0.639999 | −1.120003 | 0.900002 | 1.580002 | −0.590000 | −0.160000 | 0.070000 | 1.520000 | 1.049999 | −0.480000 | 0 |
| 4 | −1.120003 | 0.900002 | 1.580002 | −0.590000 | −0.160000 | 0.070000 | 1.520000 | 1.049999 | −0.480000 | −0.220001 | 0 |
| … | … | … | … | … | … | … | … | … | … | … | … |
| 2491 | 1.430000 | −1.129997 | 0.629997 | 0.400002 | −0.680000 | 0.260002 | 1.099998 | 0.019997 | 0.830002 | −0.110001 | 0 |
| 2492 | −1.129997 | 0.629997 | 0.400002 | −0.680000 | 0.260002 | 1.099998 | 0.019997 | 0.830002 | −0.110001 | 0.650002 | 0 |
| 2493 | 0.629997 | 0.400002 | −0.680000 | 0.260002 | 1.099998 | 0.019997 | 0.830002 | −0.110001 | 0.650002 | 0.269997 | 0 |
| 2494 | 0.400002 | −0.680000 | 0.260002 | 1.099998 | 0.019997 | 0.830002 | −0.110001 | 0.650002 | 0.269997 | −0.559998 | 0 |
| 2495 | −0.680000 | 0.260002 | 1.099998 | 0.019997 | 0.830002 | −0.110001 | 0.650002 | 0.269997 | −0.559998 | 2.120003 | 0 |
Table 16. Performance metrics for traditional machine learning models for CL=F 10-year predicted data (augmented synthetic dataset).

| Models | θ | Precision | Recall | F1-Score | Support | Accuracy |
|---|---|---|---|---|---|---|
| LR | 0 | 1.00 | 0.90 | 0.94 | 1898 | 0.90 |
|  | 1 | 0.35 | 0.93 | 0.50 | 110 |  |
| KNN | 0 | 0.99 | 0.95 | 0.97 | 1898 | 0.94 |
|  | 1 | 0.48 | 0.76 | 0.59 | 110 |  |
| RF | 0 | 0.98 | 0.98 | 0.98 | 1898 | 0.96 |
|  | 1 | 0.67 | 0.61 | 0.64 | 110 |  |
| SVM | 0 | 0.98 | 0.95 | 0.97 | 1898 | 0.94 |
|  | 1 | 0.44 | 0.65 | 0.53 | 110 |  |
| DT | 0 | 0.98 | 0.94 | 0.96 | 1898 | 0.93 |
|  | 1 | 0.43 | 0.75 | 0.55 | 110 |  |
Table 17. Performance metrics for neural networks for CL=F 10-year predicted data (augmented synthetic dataset).

| Models | θ | Precision | Recall | F1-Score | Support | Accuracy |
|---|---|---|---|---|---|---|
| DL | 0 | 1.00 | 0.91 | 0.95 | 1898 | 0.91 |
|  | 1 | 0.37 | 0.94 | 0.54 | 110 |  |
| LSTM | 0 | 1.00 | 0.92 | 0.96 | 1898 | 0.93 |
|  | 1 | 0.42 | 0.95 | 0.58 | 110 |  |
| BN | 0 | 1.00 | 0.92 | 0.96 | 1898 | 0.92 |
|  | 1 | 0.41 | 0.95 | 0.57 | 110 |  |
Table 18. Performance metrics for traditional machine learning models on the original CL=F 10-year data (i.e., no augmented synthetic data).

| Models | θ | Precision | Recall | F1-Score | Support | Accuracy |
|---|---|---|---|---|---|---|
| LR | 0 | 0.99 | 0.85 | 0.91 | 489 | 0.84 |
|  | 1 | 0.06 | 0.45 | 0.11 | 11 |  |
| KNN | 0 | 0.98 | 0.99 | 0.98 | 489 | 0.97 |
|  | 1 | 0.12 | 0.09 | 0.11 | 11 |  |
| RF | 0 | 0.98 | 1.00 | 0.99 | 489 | 0.98 |
|  | 1 | 0.00 | 0.00 | 0.00 | 11 |  |
| SVM | 0 | 0.98 | 1.00 | 0.99 | 489 | 0.98 |
|  | 1 | 0.00 | 0.00 | 0.00 | 11 |  |
| DT | 0 | 0.98 | 0.99 | 0.98 | 489 | 0.97 |
|  | 1 | 0.00 | 0.00 | 0.00 | 11 |  |
Table 19. Performance metrics for neural networks for original CL=F 10-year data (i.e., no augmented synthetic data).

| Models | θ | Precision | Recall | F1-Score | Support | Accuracy |
|---|---|---|---|---|---|---|
| DL | 0 | 0.98 | 0.98 | 0.98 | 489 | 0.91 |
|  | 1 | 0.25 | 0.27 | 0.26 | 11 |  |
| LSTM | 0 | 0.99 | 0.99 | 0.99 | 489 | 0.97 |
|  | 1 | 0.42 | 0.45 | 0.43 | 11 |  |
| BN | 0 | 0.99 | 0.96 | 0.97 | 489 | 0.94 |
|  | 1 | 0.17 | 0.36 | 0.24 | 11 |  |