Article

From Feature Selection to Forecasting: A Two-Stage Hybrid Framework for Food Price Prediction Using Economic Indicators in Türkiye

1 Department of Management Information Systems, Graduate School of Informatics, Gazi University, Ankara 06170, Türkiye
2 Department of Management Information Systems, Faculty of Applied Sciences, Gazi University, Ankara 06170, Türkiye
3 Department of Economics, Faculty of Economics and Administrative Sciences, Ankara Hacı Bayram Veli University, Ankara 06560, Türkiye
4 Department of Computer Engineering, Faculty of Technology, Gazi University, Ankara 06170, Türkiye
* Author to whom correspondence should be addressed.
Sustainability 2026, 18(1), 503; https://doi.org/10.3390/su18010503
Submission received: 10 November 2025 / Revised: 24 December 2025 / Accepted: 24 December 2025 / Published: 4 January 2026

Abstract

This study develops a comprehensive two-stage hybrid framework to forecast food prices in Türkiye, addressing inflation prediction challenges in volatile emerging markets where sample sizes are limited. In the first stage, systematic relationship analyses—comprising correlation, ARDL, cointegration, and Granger causality tests—identified ten key macroeconomic predictors from Central Bank datasets. In the second stage, we evaluated diverse predictive models, including XGBoost, Gradient Boosting, Ridge, LSTM, and SVR, using rice prices as a pilot case. A critical methodological contribution is the empirical comparison of feature engineering strategies; results demonstrate that traditional “smoothing” techniques dilute volatility signals, whereas the “Log-Return Transformation” strategy significantly improves accuracy. XGBoost emerged as the champion model, achieving a remarkable R2 of 0.932 (MAE: 1.68 TL) on the test set. To strictly validate this performance against small-sample limitations, a Recursive Walk-Forward Validation was conducted, confirming the model’s robustness with a strong R2 of 0.870 over a 31-month rolling simulation. Furthermore, Robust Rolling SHAP analysis identified Insurance and Transportation costs as primary drivers, evidencing a strong cost-push mechanism and inflation inertia. These findings integrate econometric rigor with machine learning transparency, offering resilient early warning tools for sustainable inflation management.

1. Introduction

Food price volatility has emerged as a critical challenge for emerging economies, with significant implications for both macroeconomic stability and social welfare. The interconnected nature of modern economic systems implies that food prices are influenced not only by traditional agricultural factors but also by complex interactions with diverse economic sectors including healthcare, communication services, and financial markets. Understanding and predicting these price movements has become essential for effective economic policy formulation, particularly in emerging economies facing high inflation volatility [1]. This is increasingly recognized as a cornerstone of economic and social sustainability, where food security underpins equitable development and resilience against systemic shocks.
In Türkiye, food price dynamics hold particular significance as the Food and Non-Alcoholic Beverages Consumer Price Index (CPI) group represents approximately 25% of the national CPI basket as of 2024. Recent empirical evidence demonstrates that food price fluctuations create cascading effects throughout economic systems, influencing macroeconomic variables such as economic growth, employment levels, and income distribution patterns [2]. The Turkish Statistical Institute’s classification of food prices within the CPI framework underscores the systemic importance of developing accurate forecasting models for these price movements [3].
Traditional econometric approaches to food price analysis have predominantly focused on supply-side factors and historical price patterns, often failing to capture the complex cross-sector interdependencies that characterize modern economies. While machine learning techniques offer promising alternatives for capturing non-linear relationships and complex interactions among economic variables [4], limited research has systematically compared traditional econometric methods with modern machine learning approaches within a unified analytical framework or explored optimal prediction windows for timely policy interventions.
This research addresses these critical gaps by developing a comprehensive two-stage methodology that first systematically identifies key economic drivers through multiple econometric techniques, then evaluates ten different approaches for forecasting accuracy. Using monthly data spanning 2012–2024 from the Central Bank of Türkiye and 2017–2024 from the World Food Programme database, this study provides both theoretical insights into inflation transmission mechanisms and practical tools for economic forecasting [5,6].
The primary innovation of this research lies in its two-stage hybrid architecture, which harmonizes econometric rigor with machine learning interpretability. Unlike existing ‘black-box’ forecasting approaches, our framework utilizes a union-based ensemble strategy for feature selection and subsequently decodes model predictions through Robust Rolling SHAP analysis. Furthermore, this study introduces a novel ‘Log-Return Transformation Strategy’ that preserves high-frequency volatility signals, offering a significant methodological departure from traditional smoothing techniques that often dilute critical information in volatile emerging markets. Building upon this methodological foundation, the primary contributions of this study include demonstrating the predictive superiority of ensemble machine learning methods over traditional baselines in high-volatility environments, establishing a robust validation framework to address sample size limitations common in emerging market data, and providing empirical evidence on the structural cost-push factors driving food price dynamics.

2. Literature Review

The literature on food price prediction using economic indicators and machine learning techniques has expanded significantly in recent years. The reviewed studies can be grouped under two headings: (i) relationship and causality analyses and (ii) prediction models.
Table 1 summarizes key findings from the literature review across these categories:
These studies provide valuable insights into the application of various statistical methods and machine learning techniques for analyzing economic data and predicting food prices. They highlight different approaches to feature selection, model development, and evaluation metrics that have guided the current research.

3. Methodology

3.1. Research Framework

This study employs a two-stage analytical framework: Stage 1 systematically identifies economic drivers affecting overall food prices through CPI analysis; Stage 2 validates these drivers through pilot application to rice price prediction, providing a scalable methodology for broader food price forecasting applications.

3.2. Data Collection and Preprocessing

3.2.1. Dataset Creation

The dataset was created by including Dollar Sale, Euro Sale, Gold Bullion Sale prices, and the entire CPI group from the Electronic Data Delivery System (EVDS) of the Central Bank of the Republic of Türkiye. The data spans from 2012 to 2024, providing a comprehensive time period for analysis [5,6].
Rice was selected as the pilot case due to its status as a staple food for over half of the world’s population [33] and the extensive documentation of its price volatility relationships with macroeconomic indicators [34]. Rice prices were restricted to January 2017 to October 2024 (94 months) because of data availability in the World Food Programme database. Ankara, Türkiye, was selected as the reference city to ensure data consistency.
The proposed analytical framework and all predictive models were implemented using the Python 3.12.12 programming language within the Google Colab cloud environment, leveraging specialized libraries.

3.2.2. Data Preprocessing

In the first stage of the study, missing value analysis revealed that only the variable ‘103.POST-SECONDARY AND PRE-UNIVERSITY EDUCATION’ had missing values (106 observations, a 71.1% missing rate) across the 2012–2024 EVDS dataset. This variable was removed because its high missing proportion would require imputation that introduces bias, education costs have minimal direct impact on food prices, and preliminary analysis showed negligible correlation (r < 0.10). All remaining variables had complete data.
Following the data cleaning process for Stage 1, a standard Min–Max Normalization technique was applied to all independent variables. This transformation scaled the data to a fixed range of [0, 1], eliminating scale discrepancies between diverse macroeconomic indicators and ensuring that the feature selection algorithms (e.g., Correlation, ARDL) were not biased by the magnitude of the variables.
In the second stage of the study (Prediction), a distinct and strictly temporal preprocessing pipeline was implemented to prevent look-ahead bias and ensure realistic forecasting conditions. The dataset was split chronologically into a training set (January 2017–December 2023, 80%) and a test set (January 2024–October 2024, 20%).
Unlike the global normalization in Stage 1, the normalization in Stage 2 was conducted in a temporally aware manner. The Min-Max Scaler was fitted exclusively on the training set, and these learned parameters were subsequently applied to transform the test set:
X_normalized = (X − X_min)/(X_max − X_min)
This approach ensures that no future information contaminates the training phase. Additionally, prior to scaling, the non-stationary price series were transformed into stationary momentum signals using logarithmic differencing, as detailed in the Feature Engineering Section 3.4.1.
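For illustration, the following Python sketch shows one way to implement this temporally aware pipeline (log-differencing, chronological split, and train-only Min–Max fitting); it is not the authors’ implementation, and the split date and DataFrame layout are assumptions.

```python
# Minimal sketch of the Stage 2 preprocessing described above (not the paper's exact code).
# Assumes a pandas DataFrame of monthly price levels indexed by date.
import numpy as np
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

def temporal_split_and_scale(prices: pd.DataFrame, split_date: str = "2024-01-01"):
    """Chronological split, log-differencing, and train-only Min-Max scaling."""
    # Transform non-stationary price levels into stationary log-returns.
    log_returns = np.log(prices).diff().dropna()

    # Chronological split: everything before split_date is training data.
    train = log_returns.loc[log_returns.index < split_date]
    test = log_returns.loc[log_returns.index >= split_date]

    # Fit the scaler on the training window only, then apply it to the test set,
    # so no future information leaks into the scaling parameters.
    scaler = MinMaxScaler()
    train_scaled = pd.DataFrame(scaler.fit_transform(train),
                                index=train.index, columns=train.columns)
    test_scaled = pd.DataFrame(scaler.transform(test),
                               index=test.index, columns=test.columns)
    return train_scaled, test_scaled, scaler
```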

3.3. Relationship and Causality Analysis

The relationship and causality analysis phase aims to systematically identify which economic indicators have significant relationships with food prices, following the methodological approaches established in recent econometric literature.

3.3.1. Time Series Analysis and Stationarity Testing

Partial Autocorrelation Function (PACF) analysis was conducted to measure direct effects of the relationship between an observation and its lagged values in the time series [35]. This technique helps determine the appropriate lag structure for autoregressive models by identifying the direct correlation between current and past values:
PACF(k) = corr(Xt, Xt−k | Xt−1, Xt−2, …, Xt−k+1)
Augmented Dickey–Fuller (ADF) Test was used to check for stationarity in the data series, as required for causality testing [36]. The test examines the presence of unit roots in time series:
Δyt = α + βt + γyt−1 + δ1Δyt−1 + … + δp−1Δyt−p+1 + εt
where the null hypothesis H0: γ = 0 (unit root exists) versus H1: γ < 0 (stationary). This test is crucial for ensuring the validity of subsequent causality analyses, as non-stationary series can lead to spurious regression results.
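A minimal Python sketch of this stationarity workflow, using the `adfuller` and `pacf` functions from statsmodels, is given below; the significance level, maximum differencing order, and threshold for reading the PACF are illustrative assumptions rather than the authors’ settings.

```python
# Illustrative stationarity check and lag selection in the spirit of Section 3.3.1.
import pandas as pd
from statsmodels.tsa.stattools import adfuller, pacf

def difference_until_stationary(series: pd.Series, alpha: float = 0.05, max_d: int = 4):
    """Apply successive differencing until the ADF test rejects a unit root."""
    current = series.dropna()
    for d in range(max_d + 1):
        p_value = adfuller(current, autolag="AIC")[1]
        if p_value < alpha:
            return current, d          # stationary after d differences
        current = current.diff().dropna()
    return current, max_d              # return the most-differenced series

def suggested_ar_lags(series: pd.Series, nlags: int = 12, threshold: float = 0.2):
    """Lags whose partial autocorrelation exceeds a simple magnitude threshold."""
    values = pacf(series.dropna(), nlags=nlags)
    return [lag for lag, v in enumerate(values) if lag > 0 and abs(v) > threshold]
```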

3.3.2. Correlation Analysis

Three different correlation methods were employed to identify variables influencing the food CPI, providing complementary perspectives on variable relationships [37]:
Pearson Correlation measures linear relationships between variables:
r = Σ(xi − x̄)(yi − ȳ)/√[Σ(xi − x̄)2 Σ(yi − ȳ)2]
This parametric measure is most appropriate when variables follow normal distributions and relationships are linear.
Spearman Rank Correlation captures monotonic relationships regardless of linearity:
ρ = 1 − (6Σdi2)/[n(n2 − 1)]
This non-parametric approach is robust to outliers and suitable for variables that may not meet normality assumptions.
Kendall’s Tau provides robust association measure based on concordant and discordant pairs:
τ = (C − D)/[n(n − 1)/2]
where C and D represent concordant and discordant pairs, respectively. This measure is particularly useful for small sample sizes and provides more reliable inference.
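The three correlation screens can be reproduced compactly with pandas, as in the hedged sketch below; the DataFrame `df`, the target column name, and the top-n cutoff are assumptions for illustration only.

```python
# Sketch of the Pearson/Spearman/Kendall correlation screens described above.
import pandas as pd

def correlation_screen(df: pd.DataFrame, target: str = "TP FG J011", top_n: int = 3):
    """Return the top-n predictors (by absolute correlation) for each method."""
    results = {}
    for method in ("pearson", "spearman", "kendall"):
        corr = df.corr(method=method)[target].drop(target).abs()
        results[method] = corr.sort_values(ascending=False).head(top_n)
    return results
```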

3.3.3. Lag Features Analysis

Lag correlation analysis was performed for previous months, creating lagged features for all columns and calculating correlations with the target variable using all three correlation methods. This approach helps identify optimal temporal relationships and leads/lags in the data, which is crucial for understanding how economic indicators influence food prices over time [4].

3.3.4. Autoregressive Distributed Lag (ARDL) Model

The ARDL analysis examined relationships from both short-term and long-term perspectives, following the methodology established by Pesaran et al. [38]. This approach has been particularly effective in food price analysis, as demonstrated by Özçelik and Uslu [2]:
yt = α + Σ(i = 1 to p) βiyt−i + Σ(i = 0 to q) γixt−i + εt
The model considered 12-month lags for the food CPI and 1-month lags for other items. ARDL models are advantageous because they can handle variables with different orders of integration and provide both short-run dynamics and long-run equilibrium relationships.
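As a hedged illustration, an ARDL(p, q) screen of this kind can be fitted with the `ARDL` class in statsmodels; the specification below is a sketch under the lag structure described above, not the authors’ estimation code.

```python
# Illustrative ARDL(12, 1) screen for significant distributed-lag coefficients.
import pandas as pd
from statsmodels.tsa.ardl import ARDL

def ardl_screen(y: pd.Series, X: pd.DataFrame, p: int = 12, q: int = 1, alpha: float = 0.05):
    """Fit ARDL(p, q) and return coefficients significant at level alpha."""
    res = ARDL(endog=y, lags=p, exog=X, order=q, trend="c").fit()
    coefs = pd.DataFrame({"coef": res.params, "p_value": res.pvalues})
    significant = coefs[coefs["p_value"] < alpha]
    # Rank surviving terms by coefficient magnitude (substantive economic impact).
    return significant.sort_values("coef", key=lambda s: s.abs(), ascending=False)
```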

3.3.5. Cointegration Test (Engle-Granger)

The cointegration analysis tested for long-term relationships between non-stationary series using the two-step Engle-Granger method [39]. This technique is essential for identifying stable long-run relationships between economic variables, as applied in similar studies examining price relationships [9]:
Step 1:
yt = α + βxt + ut (estimate long-run relationship)
Step 2: Test residuals for stationarity to confirm cointegration
Cointegration testing helps distinguish between spurious and genuine long-term relationships among economic variables.
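For reference, the two-step procedure is available directly through `coint` in statsmodels, which runs the OLS long-run regression and the ADF test on its residuals; the sketch below is illustrative and the 5% decision rule mirrors the description above.

```python
# Illustrative Engle-Granger cointegration check (not the authors' code).
from statsmodels.tsa.stattools import coint

def engle_granger(y, x, alpha: float = 0.05):
    """Two-step Engle-Granger test: OLS long-run fit + ADF test on residuals."""
    t_stat, p_value, crit_values = coint(y, x, trend="c")
    # crit_values holds the 1%, 5%, and 10% critical values for the ADF statistic.
    return {"t_stat": t_stat, "p_value": p_value, "cointegrated": p_value < alpha}
```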

3.3.6. Random Forest Feature Importance

Random Forest feature importance scores were used to identify variables with the strongest relationship to the target variable, employing both random 80%/20% split and chronological split approaches [40]. This machine learning-based approach complements traditional econometric methods and has shown effectiveness in feature selection for economic forecasting applications.
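A minimal sketch of this importance screen, covering both the random and the chronological 80%/20% split, is shown below; the hyperparameters and helper names are assumptions rather than the authors’ configuration.

```python
# Sketch of the Random Forest feature-importance screen with two split modes.
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

def rf_importances(X: pd.DataFrame, y: pd.Series, chronological: bool = True) -> pd.Series:
    if chronological:
        cut = int(len(X) * 0.8)                      # first 80% of the time series
        X_train, y_train = X.iloc[:cut], y.iloc[:cut]
    else:
        X_train, _, y_train, _ = train_test_split(X, y, test_size=0.2, random_state=42)
    model = RandomForestRegressor(n_estimators=500, random_state=42)
    model.fit(X_train, y_train)
    return pd.Series(model.feature_importances_, index=X.columns).sort_values(ascending=False)
```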

3.3.7. Attribute Shortlist

To construct a robust and representative subset of predictors, this study employed a union-based ensemble feature selection strategy, following the methodology proposed by Saeys et al. [41]. Economic data often exhibits heterogeneity in relationships; for instance, some indicators may show strong linear correlations (captured by Pearson), while others may have non-linear dependencies (captured by Random Forest) or long-term equilibrium relationships (captured by Cointegration/ARDL) that simple correlations might miss.
Relying on a single feature selection technique carries the risk of method-specific bias and instability, particularly in small-sample domains. Therefore, we adopted a consensus approach by selecting the top two performing variables from each of the six distinct analysis methods. This strategy ensures the inclusion of diverse economic drivers while controlling the dimensionality of the feature space to prevent overfitting in the predictive models.
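The consensus step itself is a simple union operation over the per-method rankings, as in the hypothetical sketch below (the dictionary of rankings is assumed to be produced by the analyses described above).

```python
# Minimal sketch of the union-based ensemble shortlist: top two variables per method.
def build_shortlist(method_rankings: dict, top_k: int = 2) -> list:
    """method_rankings maps a method name to a list of features ordered by strength."""
    shortlist = []
    for method, ranked_features in method_rankings.items():
        for feature in ranked_features[:top_k]:
            if feature not in shortlist:          # union without duplicates
                shortlist.append(feature)
    return shortlist

# Hypothetical usage:
# build_shortlist({"pearson": ["J053", "J012"], "ardl": ["J063", "J083"]})
```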

3.3.8. Granger Causality Test

After stabilizing the dataset, Granger causality tests were applied to identify causal relationships between variables [42]. This approach has been successfully used in food price studies to establish predictive relationships [8]:
yt = α0 + Σ(i = 1 to p) αiyt−i + Σ(i = 1 to p) βixt−i + εt
where rejection of H0: β1 = β2 = … = βp = 0 indicates x Granger-causes y. This test determines whether past values of economic indicators contain information useful for predicting food prices beyond what is contained in past values of food prices alone.
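As an illustrative sketch, the pairwise test can be run with `grangercausalitytests` from statsmodels; the maximum lag and column pairing below follow the description above and are not taken from the authors’ code.

```python
# Hedged sketch of the Granger causality screen for a single candidate predictor.
import pandas as pd
from statsmodels.tsa.stattools import grangercausalitytests

def granger_pvalues(target: pd.Series, candidate: pd.Series, max_lag: int = 12) -> dict:
    """p-values of 'candidate Granger-causes target' for lags 1..max_lag."""
    # grangercausalitytests expects a two-column array ordered as [effect, cause].
    data = pd.concat([target, candidate], axis=1).dropna()
    results = grangercausalitytests(data, maxlag=max_lag)
    return {lag: res[0]["ssr_ftest"][1] for lag, res in results.items()}
```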

3.4. Predictive Model Development

Based on the relationship analysis results, ten key predictor variables were identified for food price prediction. Multiple prediction models were then developed and compared for their forecasting performance, following the comparative modeling approach established in recent food price prediction literature [18].

3.4.1. Feature Engineering Strategy

To address the non-stationary nature of food price series and capture the rapid transmission of economic shocks in an emerging market context, we adopted a specific feature engineering strategy, termed the “Log-Return Transformation Strategy”, which prioritizes the preservation of market volatility signals over noise reduction.
The strategy involves two critical transformations applied to the raw dataset:
1.
Logarithmic Differencing for Stationarity:
Consistent with standard econometric protocols for volatile commodity markets, we applied logarithmic differencing (log-returns) to all price series. This transformation is defined as:
Rt = ln(Pt) − ln(Pt−1)
where Pt is the price at time t [43].
2.
Lagged Features vs. Smoothing:
While traditional approaches often employ rolling window averages to smooth out noise, some studies indicate that such smoothing can obscure high-frequency signals required to detect sudden price spikes. To capture the immediate autoregressive structure of market shocks, we constructed the input vector using raw lagged observations (Lag-1 and Lag-2) of the log-returns. This method ensures that the models—particularly non-linear ones like XGBoost and LSTM—have access to the immediate momentum of the market without the signal dilution associated with moving averages.
Consequently, the predictive models in this study were trained using a feature set consisting of the log-returns of the identified economic indicators and the target variable at lags t − 1 and t − 2 [44].
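A minimal sketch of this feature construction (log-returns plus their t − 1 and t − 2 lags) is given below; column names and the helper function are placeholders, assuming a DataFrame of monthly price levels.

```python
# Sketch of the Log-Return Transformation feature set described above.
import numpy as np
import pandas as pd

def build_momentum_features(prices: pd.DataFrame, target_col: str, lags=(1, 2)) -> pd.DataFrame:
    """Design matrix of lagged log-returns plus the aligned current-month target return."""
    returns = np.log(prices).diff()                      # Rt = ln(Pt) - ln(Pt-1)
    features = pd.DataFrame(index=returns.index)
    for col in returns.columns:
        for lag in lags:
            features[f"{col}_lag{lag}"] = returns[col].shift(lag)
    features["target"] = returns[target_col]             # value the models learn to predict
    return features.dropna()
```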

3.4.2. Linear Regression

Linear regression serves as the baseline model for comparison, testing the fundamental assumption of linear relationships between economic indicators and food prices [45]:
Y = β0 + β1X1 + β2X2 + … + βpXp + ε
β̂ = (X′X)−1X′Y
Despite its simplicity, linear regression often provides competitive performance in economic forecasting and offers excellent interpretability for policymakers.

3.4.3. Random Forest

Random Forest combines multiple decision trees using bootstrap aggregating to capture non-linear relationships and interactions between variables [41]. This ensemble method has demonstrated superior performance in agricultural price prediction, as shown by Atalan [18] for Turkish milk price forecasting:
ĥβ(x) = Σi=1n wi(x)yi
ĥ_RF(x) = (1/B) Σβ=1B ĥβ(x)
VI(Xj) = (1/B) Σβ=1B Σt p(t)[ΔI(t)]I(v(t) = j)
where B is the total number of trees, and VI represents variable importance. The model’s ability to provide feature importance rankings makes it valuable for understanding which economic indicators most strongly influence food prices.

3.4.4. Gradient Boosting

Gradient Boosting sequentially combines weak learners to minimize residual errors, building upon the boosting framework developed by Friedman [46]. This method excels at capturing complex patterns in economic data:
F(x) = Σm=1ᴹ γmhm(x)
Fm(x) = Fm−1(x) + γmhm(x)
rim = −[∂L(yi, F(xi))/∂F(xi)]_{F = Fm−1}
where γm is the step size and rim are pseudo-residuals. The sequential error correction mechanism makes gradient boosting particularly effective for economic time series with complex underlying patterns.

3.4.5. XGBoost

XGBoost implements an optimized gradient boosting framework with regularization techniques to prevent overfitting [47]. This algorithm has gained widespread adoption in economic forecasting due to its robust performance:
L(θ) = Σi l(yi, ŷi) + Σk Ω(fk)
Ω(f) = γT + (1/2)λ||w||2
L̃(t) = Σi=1n [gi ft(xi) + (1/2)hi ft2(xi)] + Ω(ft)
where Ω(f) represents regularization terms. The built-in regularization and efficient implementation make XGBoost suitable for economic datasets with high dimensionality.
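For illustration, a regularized XGBoost regressor can be tuned with a time-series-aware grid search as sketched below; the parameter grid is an assumption and does not reproduce the tuned values reported later in Table 13.

```python
# Illustrative XGBoost setup with L2 regularization and temporal cross-validation.
from sklearn.model_selection import GridSearchCV, TimeSeriesSplit
from xgboost import XGBRegressor

def tune_xgboost(X_train, y_train):
    param_grid = {
        "n_estimators": [100, 300],
        "max_depth": [3, 5],
        "learning_rate": [0.05, 0.1],
        "reg_lambda": [1.0, 5.0],        # L2 penalty to curb overfitting on small samples
    }
    search = GridSearchCV(
        XGBRegressor(objective="reg:squarederror", random_state=42),
        param_grid,
        cv=TimeSeriesSplit(n_splits=5),  # preserves temporal ordering in the CV folds
        scoring="neg_mean_absolute_error",
    )
    search.fit(X_train, y_train)
    return search.best_estimator_
```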

3.4.6. Support Vector Regression (SVR)

SVR uses ε-insensitive loss function for robust predictions against outliers, which is particularly valuable in economic data that often contains extreme values [48]:
f(x) = Σi=1n (αi − αi*)K(xi, x) + b
K(xi, xj) = exp(-γ||xi − xj||2)
min (1/2)||w||2 + C Σi=1n (ξi + ξi*)
The kernel approach allows SVR to capture non-linear relationships while maintaining robustness to outliers, making it suitable for volatile economic indicators.

3.4.7. Long Short-Term Memory (LSTM)

LSTM addresses vanishing gradient problems in RNNs and is designed to capture long-term dependencies in time series data. Deep learning approaches like LSTM have shown promise in agricultural price prediction [49]:
ft = σ(Wf × [ht−1, xt] + bf)
it = σ(Wi × [ht−1, xt] + bi)
C̃t = tanh(WC × [ht−1, xt] + bC)
Ct = ft × Ct−1 + it × C̃t
ot = σ(Wo × [ht−1, xt] + bo)
ht = ot × tanh(Ct)
The gate mechanisms allow LSTM to selectively remember and forget information, potentially capturing complex temporal patterns in economic relationships.

3.4.8. Artificial Neural Networks (ANN)

ANNs use backpropagation for learning complex non-linear patterns between economic indicators and food prices [50]:
yj = f(Σi wijxi + bj)
wij(t + 1) = wij(t) − η ∂E/∂wij
Multi-layer perceptrons can approximate complex non-linear functions, making them suitable for modeling intricate economic relationships.

3.4.9. NARX-RNN

NARX networks model non-linear dynamic systems with exogenous inputs, combining temporal dependencies with external economic factors [51]:
y(t) = f(y(t − 1), y(t − 2), …, y(t − ny), u(t − 1), u(t − 2), …, u(t − nu))
h(t) = tanh(Whh(t − 1) + Wuu(t) + Wyy(t − 1) + bh)
y(t) = Woh(t) + bo
The model was tested with different lag periods (3, 6, 9, and 12 months) to find optimal prediction windows. This approach bridges traditional econometric modeling with neural network capabilities.

3.4.10. ANFIS

ANFIS combines fuzzy logic with neural networks, providing interpretable non-linear modeling [52]:
If x is A and y is B, then z = px + qy + r
μAi(x) = exp(−(x − ci)2/2σi2)
wi = μAi(x) × μBi(y)
w̄i = wi/Σi wi
w̄ifi = w̄i(pix + qiy + ri)
Output = Σi w̄ifi
Due to architectural constraints, ANFIS was limited to the two highest-ranked features from the feature selection process. The fuzzy rule-based approach offers transparency in decision-making processes.

3.4.11. SHAP-Based Ensemble Interpretability

To ensure the interpretability of the predictive framework and address potential instability arising from the limited sample size, we employed SHAP (Shapley Additive exPlanations) values based on game theory. Unlike standard static analysis which relies on a single test set, we implemented a Rolling Window SHAP framework to guarantee the robustness of feature importance [53,54]. The specific SHAP estimator is selected contingent upon the architecture of the optimal model identified in the evaluation phase (e.g., TreeExplainer for tree-based ensembles or Kernel Explainer for model-agnostic approximation). To validate that the identified economic drivers are not artifacts of a specific time period, SHAP values were computed across 5 distinct temporal validation folds. The final importance score for each feature (j) was derived by averaging the absolute SHAP values across all folds (K = 5) and time steps (T):
Ij = (1/(K × T)) Σk=1K Σt=1T |φt(k)(j)|
We also calculated the standard deviation of these values to quantify the volatility of each feature’s impact over time. This approach allows us to distinguish between structural economic drivers (low variance) and conjuncture-specific factors (high variance).
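The sketch below illustrates one way to implement this rolling-window aggregation with the `shap` TreeExplainer; the fold construction, model settings, and function names are simplifying assumptions rather than the authors’ implementation.

```python
# Hedged sketch of Rolling Window SHAP: mean |SHAP| and its std across K temporal folds.
import numpy as np
import pandas as pd
import shap
from xgboost import XGBRegressor

def rolling_shap_importance(X: pd.DataFrame, y: pd.Series, n_folds: int = 5) -> pd.DataFrame:
    fold_means = []
    fold_size = len(X) // (n_folds + 1)
    for k in range(1, n_folds + 1):
        train_end, test_end = fold_size * k, fold_size * (k + 1)
        model = XGBRegressor(objective="reg:squarederror", random_state=42)
        model.fit(X.iloc[:train_end], y.iloc[:train_end])
        shap_values = shap.TreeExplainer(model).shap_values(X.iloc[train_end:test_end])
        fold_means.append(np.abs(shap_values).mean(axis=0))   # mean |SHAP| per feature
    fold_means = np.vstack(fold_means)
    return pd.DataFrame({"mean_abs_shap": fold_means.mean(axis=0),
                         "std_across_folds": fold_means.std(axis=0)},
                        index=X.columns).sort_values("mean_abs_shap", ascending=False)
```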

3.5. Model Evaluation

3.5.1. Evaluation Metrics

To ensure temporally valid evaluation, models were assessed using a chronological 80–20 train–test split, preserving the temporal ordering of observations. Consistent with the preprocessing pipeline in Section 3.2.2, the training set encompasses data from January 2017 to December 2023, while the test set covers January 2024 to October 2024.
Three performance metrics provided complementary perspectives on prediction accuracy:
  • Mean Absolute Error (MAE):
MAE = (1/n) Σ|yi − ŷi|
  • Root Mean Squared Error (RMSE):
RMSE = √[(1/n) Σ(yi − ŷi)2]
  • Coefficient of Determination (R2):
R2 = 1 − (SSres/SStot)
These metrics enable comprehensive assessment of model performance, with MAE providing interpretable average error, RMSE penalizing larger deviations, and R2 indicating explained variance.
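For completeness, a minimal sketch of the three metrics using scikit-learn and NumPy is shown below; it is a generic implementation, not the authors’ evaluation script.

```python
# Minimal sketch of the MAE, RMSE, and R2 metrics defined above.
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

def evaluate(y_true, y_pred) -> dict:
    return {
        "MAE": mean_absolute_error(y_true, y_pred),
        "RMSE": float(np.sqrt(mean_squared_error(y_true, y_pred))),
        "R2": r2_score(y_true, y_pred),
    }
```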

3.5.2. Comparative Feature Engineering Strategy

To determine the optimal representation of market volatility, we designed and tested two distinct feature engineering strategies:
  • Log-Return Transformation Strategy: Focuses on immediate logarithmic returns (rt) and short-term autoregressive lags (t − 1, t − 2). This strategy aims to capture high-frequency volatility without signal dilution.
  • Rolling Statistics Strategy: Incorporates rolling means and standard deviations (window size = 3) to test whether smoothing short-term fluctuations improves predictive stability.

3.5.3. Recursive Walk-Forward Validation

Recursive Walk-Forward Validation addresses the limitations of static train–test splits in small-sample regimes. Unlike standard k-fold cross-validation, which can suffer from data leakage in time series, this method respects temporal order.
For each month t in the validation set (covering the period from 2022 to 2024), the model is trained on all available historical data up to month t − 1. Normalization parameters (Min-Max) are then recalculated based solely on this training window to prevent look-ahead bias, the model produces a one-step-ahead forecast for month t, and the actual value of month t is added to the training set for the next iteration.
This rigorous procedure simulates a real-world operational environment where the model is continuously updated as new economic data becomes available.
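A minimal sketch of this loop is given below; the model choice, scaler handling, and starting index are illustrative assumptions that mirror the procedure described above rather than the authors’ exact code.

```python
# Sketch of the Recursive Walk-Forward loop: refit scaler and model at each step,
# then produce a one-step-ahead forecast for month t.
import pandas as pd
from sklearn.preprocessing import MinMaxScaler
from xgboost import XGBRegressor

def walk_forward_forecast(X: pd.DataFrame, y: pd.Series, start_index: int) -> pd.Series:
    predictions = {}
    for t in range(start_index, len(X)):
        X_train, y_train = X.iloc[:t], y.iloc[:t]
        scaler = MinMaxScaler().fit(X_train)              # refit scaler: no look-ahead bias
        model = XGBRegressor(objective="reg:squarederror", random_state=42)
        model.fit(scaler.transform(X_train), y_train)
        predictions[X.index[t]] = float(model.predict(scaler.transform(X.iloc[[t]]))[0])
    return pd.Series(predictions)
```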

4. Relationship and Causality Analyses and Results

4.1. Time Series Analysis

Time series analysis was conducted to observe the changes in the ‘TP FG J011’ (CPI food index) value over the last 10 years. The analysis revealed a significant increase, especially during the pandemic period. However, this increase follows a trend, with a notable difference between the trend until the end of 2021 and the trend afterward. This provides a suitable basis for examining the precursor causes.

4.2. Partial Autocorrelation Function (PACF)

PACF analysis was used to measure the direct effects of the relationship between an observation and its lagged values in the time series. Applying this test to the “TP FG J011” column showed that the series has a direct relationship only with the previous month’s value.

4.3. Correlation Analysis

To identify other items that influence the “TP FG J011” dependent variable, three different correlation methods were used: Pearson, Spearman, and Kendall methods. Table 2, Table 3 and Table 4 show the top three items with the highest correlation values for each method.
As shown in Table 2, household-related CPI components demonstrate the strongest Pearson correlation with food prices, suggesting shared underlying economic factors.
Table 3 reveals that service-oriented CPI components show strong rank-based correlations with food prices, indicating similar monotonic relationships with underlying economic conditions.
Kendall’s tau correlation results in Table 4 confirm the findings from the Spearman correlation analysis, with service-related CPI items showing the strongest associations with food prices.
The combined correlation results in Table 5 provide a more comprehensive view, highlighting both household maintenance services and non-alcoholic beverages as having the strongest overall correlation with food prices.

4.4. Lag Features Analysis

A lag correlation analysis was performed for previous months. The analysis created lagged features for all columns and calculated correlations with the target variable using all three correlation methods.
As shown in Table 6, a one-month lag of household appliances prices demonstrates the strongest correlation with current food prices, suggesting a potential leading indicator relationship.

4.5. Stationarity Test

The Augmented Dickey–Fuller (ADF) test was used to check for stationarity in the data series, which is required for some causality relationship tests. The test showed that all data series were non-stationary, necessitating a data transformation process to achieve stationarity.
To achieve stationarity required for causality tests, sequential differencing was applied until ADF tests confirmed stationarity (p < 0.05) for all 59 variables. The transformation varied by variable: food CPI (TP FG J011) required 2 differences, healthcare services (TP FG J063) required 4 differences, while most variables achieved stationarity with 1–2 differences. For ARDL analysis, both levels and differences were incorporated as this framework accommodates mixed I(0)/I(1) variables.

4.6. Autoregressive Distributed Lag (ARDL) Model

The ARDL analysis examined the relationship between the dependent variable and multiple independent variables from both short-term and long-term perspectives. The model considered 12-month lags for ‘TP FG J011’ and 1-month lags for other items. Initial screening of all 58 CPI variables revealed 17 significant coefficients (p < 0.05). After excluding aggregate indices (J01, J06, J08) to prevent multicollinearity with their constituent components, 13 significant coefficients from 12 unique variables remained. Table 7 presents the top 3 by coefficient magnitude, reflecting substantive economic impact.
The ARDL results in Table 7 highlight that healthcare-related CPI components and telecommunication services have significant statistical relationships with food prices, controlling for other factors.
Due to the high parameter-to-observation ratio when including all 58 variables (129 parameters for 137 observations), model performance metrics from the full specification were unreliable. Therefore, we re-estimated the ARDL(12,1) model using only the 12 selected variables (35 parameters for 137 observations) to obtain valid diagnostic metrics. Table 8 shows the model performance for this reduced specification with the 12 selected variables (13 significant coefficients).

4.7. Cointegration Test (Engle-Granger)

The cointegration analysis tested for long-term relationships between non-stationary series.
The cointegration test results in Table 9 indicate strong long-term relationships between food prices and education, social protection, and transportation costs. The Engle-Granger two-step method first estimates the long-run relationship via OLS, then tests residual stationarity using the Augmented Dickey–Fuller test. The cointegration statistic represents the ADF test on OLS residuals; values more negative than the 5% critical value (−3.38) indicate rejection of the no-cointegration null hypothesis, confirming stable long-run equilibrium [10].

4.8. Random Forest Feature Importance Analysis

Random Forest feature importance scores were used to identify variables with the strongest relationship to the target variable. Two approaches were used: random 80%/20% split and chronological split.
The Random Forest analysis in Table 10 highlights insurance, medical products, and transport services as having the most predictive importance for food prices.

4.9. Analysis Results and Attribute Shortlist

The analyses identified ten key factors, obtained by taking the two best-performing items from each relationship and causality analysis, that have the most significant influence on the “TP FG J011 FOOD” CPI item, along with their most influential monthly lag:
As summarized in Table 11, healthcare-related items, communication services, house items, and education costs emerged as the most influential predictors for food prices in Türkiye.
Based on the optimal lag periods identified in the relationship and causality analysis (Table 11), temporal features were constructed for each predictor. Specifically:
  • TP FG J053 (Household Appliances): 1-month lag
  • TP FG J073 (Transportation Services): 6-month lag
  • TP FG J011 (Food CPI): 1-month lag (based on PACF analysis)
  • All other predictors: Current period (0-month lag)
These lagged features were consistently applied across all predictive models except NARX-RNN, which internally optimizes temporal dependencies. For ARIMA, which does not accommodate exogenous variables with mixed lag structures, only the autoregressive component of TP FG J011 was utilized.

4.10. Granger Causality Test

After stabilizing the dataset, Granger causality tests were applied to identify causal relationships between variables. Items with p-values less than 0.05 indicated significant causal relationships with food prices at specific lag months.
It is important to note that Granger causality testing served as a validation mechanism rather than the primary selection criterion. The ten features in our shortlist were identified through consensus across multiple analytical approaches (correlation, PACF, ARDL, Cointegration, Random Forest), each addressing different dimensions of economic relationships. Granger testing subsequently confirmed that all selected features exhibited predictive causality (p < 0.05 across relevant lag periods), validating the robustness of the multi-method selection approach.

5. Prediction Model Development and Results

5.1. Dataset Preparation

Using the identified features from previous analyses, a combined dataset was constructed for the price prediction model. Rice prices from the World Food Programme database were selected as a pilot case due to rice being a staple food item with price volatility that can be analyzed in relation to economic indicators. The dataset included both Turkish Lira (TL) and US Dollar (USD) prices, though only TL-based prices were considered for modeling purposes.
The target variable (Rice TL prices) and features were separated, and Min–Max normalization was applied to ensure that all variables were on a comparable scale. This preprocessing step was particularly important when using economic indicators with different value ranges, preventing any single variable from dominating the model based solely on its scale.
Critical temporal considerations guided the dataset construction. To eliminate look-ahead bias, all features were lagged by one month relative to the target variable. For instance, when predicting rice prices for March 2024, the model utilizes economic indicators from February 2024 and earlier. This temporal structure reflects realistic forecasting constraints where contemporaneous economic data are not yet published at the prediction time. The final dataset after temporal alignment and missing value removal comprised 91 monthly observations spanning 2017–2024.

5.2. Model Selection and Implementation

Based on the literature review, several models were selected and implemented to predict rice prices. Each model was evaluated using a chronological 80–20 train–test split. Performance was assessed using Mean Absolute Error (MAE), Root Mean Squared Error (RMSE), and the coefficient of determination (R2). The following models were implemented: Linear Regression, Ridge Regression, Random Forest, XGBoost, Support Vector Regressor (SVR), Long Short-Term Memory (LSTM), Artificial Neural Network (ANN), Gradient Boosting Regressor, ARIMA, NARX-RNN (with varying lag periods), and ANFIS (Adaptive Neuro-Fuzzy Inference System).

5.3. Model Performance Results

The comparative performance of the predictive models, evaluated using the chronological test set (January 2024–October 2024), is summarized in Table 12. The models are ranked by their Coefficient of Determination (R2) on the unseen test data.
As shown in Table 12, XGBoost significantly outperformed other architectures, achieving the highest explanatory power (R2 = 0.9324) and the lowest error rates (MAE = 1.68). Notably, the transition to stationary log-return features dramatically improved the stability of deep learning models compared to preliminary trials; LSTM achieved a robust R2 of 0.8487. Conversely, the traditional Linear Regression model failed to capture the highly volatile momentum dynamics (R2 < 0), underscoring the necessity of non-linear modeling techniques for food price forecasting in emerging markets. In addition to these results, the optimized hyperparameters are reported in Table 13.

5.4. Model-Specific Findings

5.4.1. XGBoost and Gradient Boosting

The tree-based boosting ensemble methods demonstrated superior capability in capturing the high-frequency volatility of food prices. XGBoost emerged as the top-performing model (R2 = 0.9324), effectively utilizing the momentum features (lagged log-returns) to anticipate price shifts. Its regularization parameters (optimized via GridSearch) prevented the overfitting often seen in small datasets. Similarly, standard Gradient Boosting performed robustly (R2 = 0.8585), confirming that sequential error correction is highly effective for this domain.

5.4.2. NARX-RNN

The NARX-RNN model, which integrates exogenous economic indicators with autoregressive delays, achieved the second-best performance (R2 = 0.8902). Testing various lag configurations (3, 6, 9, 12 months) revealed that a 6-month lag provides the optimal prediction window, suggesting that economic shocks (e.g., transport cost increases) take approximately half a year to fully manifest in consumer rice prices.

5.4.3. LSTM (Long Short-Term Memory)

In contrast to preliminary trials where non-stationary data led to poor convergence, the implementation of the log-return strategy significantly stabilized the LSTM model. It achieved a very good performance (R2 = 0.8487), proving that deep learning models can be effective in small-sample economic forecasting when data stationarity is strictly enforced. The model successfully learned the temporal dependencies in the volatility clusters.

5.4.4. Artificial Neural Network (ANN)

The standard feed-forward ANN (Multilayer Perceptron) exhibited poor performance (R2 = 0.2684), contrasting sharply with the success of recurrent architectures like LSTM and NARX-RNN. This result empirically validates that the temporal dependency in food price formation cannot be adequately captured by static feed-forward layers alone. Without the “memory” mechanisms (gates or feedback loops) present in LSTM and NARX, the standard ANN treated observations as independent instances, failing to grasp the sequential nature of economic shocks.

5.4.5. Random Forest

The Random Forest model demonstrated “Good” performance (R2 = 0.6590) but lagged significantly behind boosting-based ensembles (XGBoost, Gradient Boosting). While the bagging approach successfully reduced variance, it struggled to capture the sharp directional shifts in the log-return series as effectively as the sequential error-correction mechanism of boosting algorithms. This indicates that for high-momentum volatility, refining errors sequentially (boosting) yields better accuracy than averaging independent learners (bagging).

5.4.6. ANFIS

The ANFIS model, previously limited by dimensionality issues, was optimized using a “Conservative” configuration (2 Gaussian membership functions, Learning Rate = 0.005) with the top-3 influential features. This focused approach yielded a strong generalization capability (R2 = 0.8577), demonstrating that neuro-fuzzy systems can offer high accuracy and interpretability when feature selection is rigorously applied to prevent rule explosion.

5.4.7. Linear Regression

The Linear Regression model yielded a negative R2 value (−0.1238) on the test set. This failure is significant as it empirically validates the non-linear nature of the problem. While linear models may fit the general trend of raw prices, they fail to capture the complex, non-linear fluctuations present in the stationary log-return series used in this stage. This confirms that food price volatility in emerging markets is driven by complex interactions that simple linear functions cannot approximate.

5.4.8. Ridge Regression

A notable finding is the divergent performance between standard Linear Regression (R2 = −0.1238) and Ridge Regression, which achieved an excellent R2 of 0.8729. While standard OLS failed due to the high volatility of the momentum features, Ridge Regression’s L2 regularization (α = 1.0) effectively managed multicollinearity and penalized extreme coefficients. This suggests that linear assumptions can hold in short-term forecasting only when strictly regularized against noise, providing a computationally efficient alternative to complex non-linear models.

5.4.9. Support Vector Regressor (SVR)

Despite optimization via Grid Search CV with an RBF kernel, SVR showed only moderate performance (R2 = 0.3906). The model struggled to generalize the decision boundaries in the high-dimensional momentum space. Unlike tree-based models which naturally handle non-linear interactions through splits, SVR appeared more sensitive to the noise inherent in the log-return features, failing to delineate a robust regression hyperplane for the test period.

5.4.10. SHAP Interpretability Insights

Following the comparative evaluation where XGBoost was identified as the champion model, we utilized the TreeExplainer algorithm to derive exact Shapley values. To address the stability concerns inherent in small-sample forecasting, we implemented a rolling window analysis across 5 temporal folds. Table 14 presents the consolidated Mean Absolute SHAP values and their standard deviations, offering a robust hierarchy of price determinants.
The analysis highlights three critical mechanisms driving rice price volatility:
  • Dominance of Service-Based Cost Push: The variable TP FG J125 (Insurance) emerged as the primary determinant (Rank 1, Mean SHAP: 0.0072), closely followed by TP FG J073 (Transportation Services) (Rank 2, Mean SHAP: 0.0066). The high standard deviations associated with these features (σ > μ) indicate that their influence is highly dynamic; they likely act as shock transmitters during periods of economic turbulence (e.g., policy rate changes or fuel price hikes) rather than providing a constant baseline effect.
  • Autoregressive Dynamics and Food Inflation Inertia: TP FG J011 (Food) ranked third (Mean SHAP: 0.0052), serving as a proxy for the broader momentum in the food market. This confirms that while the intrinsic inertia of food prices (autoregression) is significant, it is outweighed by external cost pressures from the services and logistics sectors (Insurance and Transport). This finding validates the hybrid modeling approach: predicting rice prices requires monitoring non-food macroeconomic indicators rather than relying solely on historical price trends.
  • Secondary Structural Drivers: TP FG J053 (Household Appliances) and TP FG J062 (Outpatient Services) rounded out the top five. These variables exhibited relatively lower standard deviations compared to the top two factors, suggesting they provide a more stable, albeit smaller, contribution to the price formation process, likely reflecting the general purchasing power parity and labor cost rigidities in the economy.

5.5. Model Robustness Assessment

5.5.1. Impact of Feature Engineering on Performance

The comparative analysis reveals that smoothing techniques are detrimental to forecasting accuracy in this specific domain. The Log-Return Transformation strategy significantly outperformed the Rolling Statistics approach. The lower performance of the Rolling Features strategy indicates that in high-inflation emerging markets, applying moving averages tends to obscure critical high-frequency price signals. The market memory is “short and sharp,” responding more to immediate lags than to smoothed trends.

5.5.2. Walk-Forward Validation Stability

To validate the best-performing model’s reliability in a production-like environment, we conducted Recursive Walk-Forward Validation over the last 31 months (2022–2024). The XGBoost model achieved a Walk-Forward R2 of 0.8703 and a Mean Absolute Error (MAE) of 1.82 TL.
The results presented in Table 15 confirm the model’s robustness. Although the model faced challenges during the extreme volatility shocks of mid-2022 (as reflected in the RMSE), it demonstrated exceptional adaptation capability in the recent period (2024), maintaining low error rates. This validates that the model is not overfitting to a specific static window but is capable of adapting to evolving market dynamics.

6. Discussion

This study develops a comprehensive two-stage hybrid framework to forecast food prices in Türkiye. By integrating econometric feature selection with machine learning forecasting, the research addresses the specific challenges of modeling inflation in emerging markets, where volatility is high and sample sizes are often limited.

6.1. Drivers of Price Volatility: Services and Logistics

The relationship and causality analyses in Phase 1 revealed that food prices are not isolated agricultural phenomena but are deeply entrenched in cross-sectoral economic dynamics. The identification of Transportation Services (TP FG J073) and Insurance (TP FG J125) as top predictors via both ARDL and Random Forest selection challenges demand-side inflation theories. Instead, the data points to a strong cost-push mechanism driven by logistics and operational overheads.
The prominence of Insurance expenses—often a proxy for generalized financial risk and service inflation—as the top predictor in the SHAP analysis (Section 5.4.10) is particularly telling. It suggests that in high-inflation environments, pricing behavior is heavily influenced by anticipated risks and overhead costs rather than just raw material supply. Similarly, the structural link with Transportation confirms that logistics costs are immediately passed through to consumer food prices, a finding consistent with transmission mechanisms observed in import-dependent economies.

6.2. Market Memory and Price Inertia

The SHAP analysis identified the Lag-1 Food Price (autoregressive term) as the third most critical predictor. This finding provides empirical evidence of price inertia in the Turkish food market. It indicates that the current month’s price is significantly conditioned by the previous month’s momentum. However, the fact that external cost factors (Insurance and Transport) outranked this autoregressive term suggests that while the market has a “memory,” it is ultimately driven by structural cost shocks rather than purely historical trends.

6.3. Methodological Implications: Momentum vs. Smoothing

A critical methodological contribution of this study is the empirical comparison of feature engineering strategies. Contrary to common practices in time-series forecasting that employ moving averages to reduce noise, our results demonstrate that the Log-Return Transformation strategy (using raw log-returns and lags) significantly outperforms the Rolling Statistics approach.
The lower performance of models using rolling features suggests that in volatile emerging markets, “smoothing” is effectively “signal dilution.” Economic shocks in Türkiye (e.g., currency spikes or fuel price hikes) are sharp and immediate. By averaging these shocks over a 3-month window, critical information regarding the onset of a price spike is lost. This finding advocates for the use of stationary, high-frequency momentum features (rt, rt−1) over smoothed trends when modeling financial variables in unstable economies.

6.4. Predictive Robustness and Generalizability

Addressing the concerns regarding small sample sizes in macroeconomic modeling, this study moved beyond static validation. The Recursive Walk-Forward Validation results (Section 5.5.2) provide strong evidence of the model’s generalizability.
  • Operational Stability: The XGBoost model achieved a Walk-Forward R2 of 0.8703 and an MAE of 1.82 TL over a 31-month simulation (2022–2024). This confirms that the model maintains high accuracy even when retrained monthly with new data, making it suitable for continuous monitoring.
  • Line of Defense against Overfitting: The strong performance of the regularized linear baseline (Ridge Regression, R2 ≈ 0.87 in static tests) confirms that the selected economic indicators carry a strong, genuine signal. However, the superior performance of XGBoost confirms that non-linear interactions (e.g., threshold effects between transport costs and food prices) are critical for minimizing error.
  • Data Leakage Prevention: By strictly retraining the scaler and model at each step of the walk-forward loop, we ensured that the reported accuracy reflects realistic, real-time forecasting capabilities, free from look-ahead bias.

6.5. Policy Implications

The findings offer a practical “early warning” framework for policymakers. The dominance of Insurance and Transportation costs suggests that interventions aimed at stabilizing food prices should not be limited to agricultural subsidies. Instead, stabilizing the service sector’s cost structure and managing exchange rate volatility (which drives transport costs) are prerequisites for controlling food inflation. Furthermore, the 6-month lag identified in the NARX-RNN analysis provides a tangible window for monetary or fiscal intervention before supply chain shocks fully materialize in consumer prices.

7. Conclusions

This study bridges econometric rigor with machine learning innovation to address a pressing challenge in emerging economies: accurate and interpretable food price forecasting under high volatility. By developing a two-stage framework that first identifies causally relevant economic indicators and then evaluates a diverse set of predictive models, we demonstrate that food inflation in Türkiye is structurally driven by non-food sectors, specifically Transportation and Insurance/Services.
Our findings yield three key contributions to the literature. First, we provide empirical evidence of cost-push transmission mechanisms. The dominance of Insurance and Transportation costs as the top predictors in the SHAP analysis reveals that food prices are highly sensitive to logistics and operational overheads. Furthermore, the significance of the autoregressive (Lag-1) term confirms the presence of inflation inertia, suggesting that pricing behavior is partly adaptive and backward-looking. Second, regarding methodological selection, we establish the superiority of the XGBoost model within a Log-Return engineering framework. A critical finding of this study is that traditional “smoothing” techniques (rolling means) dilute valuable volatility signals in high-inflation markets. The Log-Return Transformation strategy, which preserves high-frequency log-return dynamics, proved essential for capturing sudden price shocks. Third, we addressed the “small sample size” challenge through rigorous Recursive Walk-Forward Validation. Instead of relying on static splits, we demonstrated that the champion model maintains high predictive accuracy (Walk-Forward R2 = 0.8703, MAE = 1.82 TL) even when retrained monthly over a 31-month simulation period (2022–2024). This validates the model’s operational robustness and lack of overfitting, satisfying the stringent requirements for real-world policy deployment.
These results carry direct implications for inflation targeting. They suggest that stabilizing the service sector’s cost structure is a prerequisite for controlling food inflation. Additionally, the six-month lag identified in the NARX-RNN analysis offers a concrete temporal window for pre-emptive fiscal or monetary intervention.
Beyond empirical performance, these findings have profound implications for economic and social sustainability, as food security underpins equitable development and resilience. Food price stability remains a cornerstone of this security; however, our results suggest that ensuring stability requires a multi-sectoral policy approach. Specifically, the dominance of transportation and insurance costs indicates that sustainable agri-food policies should transition from traditional supply-side subsidies toward integrated risk management and the optimization of logistics infrastructures to manage cost-push mechanisms effectively.
From a broader perspective, the high sensitivity of food prices to transport costs underscores the potential benefits of ‘green logistics’ and reducing dependency on volatile energy inputs. Such a transition not only addresses logistics-driven inflation but also buffers food markets against systemic energy shocks. Consequently, this study provides a resilient early-warning tool for policymakers to implement preemptive interventions, effectively safeguarding market stability and the long-term sustainability of the food supply chain. Future research could extend this framework to other commodities or integrate real-time high-frequency data streams to further refine the early warning capabilities.

Author Contributions

Conceptualization, U.T.Ş., N.A., M.N. and H.P.; Methodology, U.T.Ş., N.A., M.N. and H.P.; Formal Analysis, U.T.Ş.; Investigation, U.T.Ş., N.A., M.N. and H.P.; Data Curation, U.T.Ş.; Writing—Original Draft Preparation, U.T.Ş., N.A., M.N. and H.P.; Writing—Review and Editing, U.T.Ş., N.A., M.N. and H.P.; Visualization U.T.Ş., N.A., M.N. and H.P.; Supervision, N.A., M.N. and H.P.; Project Administration, U.T.Ş., N.A., M.N. and H.P. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The data that support the findings of this study are publicly available from the Central Bank of the Republic of Türkiye (Electronic Data Delivery System—EVDS) at https://evds2.tcmb.gov.tr (accessed on 1 December 2024) and from the World Food Programme Price Database at https://data.world/wfp/7d7224ed-eff6-421f-9f96-9c8d43905f3c (accessed on 1 December 2024).

Acknowledgments

During the preparation of this manuscript, the authors used large language models (LLMs) exclusively for improving English language clarity and grammar. These tools were not involved in study design, data analysis, interpretation, or content generation. The authors have reviewed and edited all outputs and take full responsibility for the final content of the publication.

Conflicts of Interest

The authors declare no conflicts of interest.

References

1. Karagöl, V. The effect of economic policy uncertainty on food prices: Time-varying causality analysis for selected countries. J. Econ. Policy Res. 2023, 10, 409–433. (In Turkish)
2. Özçelik, Ö.; Uslu, N. An analysis on the determinants of food inflation: The case of Türkiye. Dumlupınar Univ. J. Soc. Sci. 2024, 79, 289–309. (In Turkish)
3. Eştürk, Ö.; Albayrak, N. Investigation of the relationship between agricultural products-food price increases and inflation. Int. J. Econ. Adm. Inq. 2018, 18, 147–158. (In Turkish)
4. Baumeister, C.; Kilian, L. Do oil price increases cause higher food prices? Econ. Policy 2014, 29, 691–747.
5. CBRT (Central Bank of the Republic of Türkiye). Electronic Data Delivery System. Available online: https://evds2.tcmb.gov.tr/index.php?/evds/serieMarket (accessed on 1 December 2024).
6. World Food Programme. WFP Price Database. Available online: https://data.world/wfp/7d7224ed-eff6-421f-9f96-9c8d43905f3c (accessed on 1 December 2024).
7. Fan, X.; Xu, Z.; Qin, Y.; Škare, M. Quantifying the short- and long-run impact of inflation-related price volatility on knowledge asset investment. J. Bus. Res. 2023, 165, 114048.
8. Cerveny, D. PPI and CPI: What Is the Relationship? Bachelor's Thesis, Charles University, Faculty of Social Sciences, Prague, Czech Republic, 2023.
9. Ozpolat, A. Causal link between consumer prices index and producer prices index: An evidence from Central and Eastern European Countries (CEECs). Adam Acad. J. Soc. Sci. 2020, 10, 319–332.
10. Oyeleke, O.J.; Ojediran, S. Exploring the relationship between consumer price index (CPI) and producer price index (PPI) in Nigeria. Int. J. Stat. Appl. 2018, 8, 42–46.
11. Akmercan, T. Estimation of Household Consumption Expenditures with Non-Parametric Regression Method: The Case of Turkey. Master's Thesis, Dumlupınar University Institute of Social Sciences, Kütahya, Turkey, 2016. (In Turkish)
12. Oktay, D.E. Comparison of Ordered and Unordered Restricted Choice Models: An Application on Fuel Type Choices of Households in Turkey. Ph.D. Thesis, Pamukkale University Institute of Social Sciences, Denizli, Turkey, 2016. (In Turkish)
13. Yu, C.P. Why are there always inconsistent answers to the relation between the PPI and CPI? Re-examination using panel data analysis. Int. Rev. Account. Bank. Financ. 2016, 8, 14–31.
14. Galodikwe, I.K. Exploring the Relationship Between Producer Price Index and Consumer Price Index in South Africa. Ph.D. Thesis, North-West University, Potchefstroom, South Africa, 2014.
15. Emeç, H. Ordered Logit and Tobit Models for Different Expenditure Groups: Inter-Regional Comparison. Ph.D. Thesis, Dokuz Eylül University Institute of Social Sciences, İzmir, Turkey, 2001. (In Turkish)
16. Özden, K. The dynamics affecting the export import ratio in Turkey: A hybrid model proposal with econometrics and machine learning approach. J. Econ. Policy Res. 2022, 9, 261–286.
17. Selim, S.; Balyaner, İ. Investigation of factors determining the number of IT products owned by households in Turkey. Pamukkale Univ. J. Soc. Sci. Inst. 2017, 26, 333–356. (In Turkish)
18. Atalan, A. Forecasting drinking milk price based on economic, social, and environmental factors using machine learning algorithms. Agribusiness 2023, 39, 214–241.
19. Katsumbe, T.I. A Systems Dynamics Model for Utilities Optimization in the Food and Beverage Industry. Ph.D. Thesis, University of Johannesburg, Johannesburg, South Africa, 2022.
20. Wanjuki, T.M.; Wagala, A.; Muriithi, D.K. Evaluating the predictive ability of seasonal autoregressive integrated moving average (SARIMA) models using food and beverages price index in Kenya. Eur. J. Math. Stat. 2022, 3, 28–38.
21. Warren-Vega, W.M.; Aguilar-Hernández, D.E.; Zárate-Guzmán, A.I.; Campos-Rodríguez, A.; Romero-Cano, L.A. Development of a predictive model for agave prices employing environmental, economic, and social factors: Towards a planned supply chain for agave-tequila industry. Foods 2022, 11, 1138.
22. Ji, M.; Liu, P.; Deng, Z.; Wu, Q. Prediction of national agricultural products wholesale price index in China using deep learning. Prog. Artif. Intell. 2022, 11, 121–129.
23. Venkateswara Rao, K.; Srilatha, D.; Jagan Mohan Reddy, D.; Desanamukula, V.S.; Kejela, M.L. Regression based price prediction of staple food materials using multivariate models. Sci. Program. 2022, 2022, 9547039.
24. Kresova, S.; Hess, S. Identifying the determinants of regional raw milk prices in Russia using machine learning. Agriculture 2022, 12, 1006.
25. Lutoslawski, K.; Hernes, M.; Radomska, J.; Hajdas, M.; Walaszczyk, E.; Kozina, A. Food demand prediction using the nonlinear autoregressive exogenous neural network. IEEE Access 2021, 9, 146123–146136.
26. Sarangi, P.K.; Gena, D.; Gena, S.; Vittal, N. Machine learning approach for the prediction of consumer food price index. In Proceedings of the 2021 6th International Conference on Reliability, Infocom Technologies and Optimization (ICRITO), Noida, India, 14–15 April 2021; pp. 1–5.
27. Tosun, N. Predictions of OECD Countries Fresh Fruit and Vegetable Imports with Data Mining Techniques and Machine Learning Models. Unpublished Doctoral Thesis, Marmara University, Istanbul, Türkiye, 2020. (In Turkish)
28. Strader, T.J.; Rozycki, J.J.; Roots, T.H.; Huang, Y. Machine learning stock market prediction studies: Review and research directions. J. Int. Technol. Inf. Manag. 2020, 28, 63–83.
29. Selim, S.; Demirkıran, E. Socio-economic factors affecting household food expenditures in Türkiye: A comparative analysis. Hacet. Univ. J. Econ. Adm. Sci. 2019, 37, 147–172. (In Turkish)
30. Abidoye, R.B.; Chan, A.P.; Abidoye, F.A.; Oshodi, O.S. Predicting property price index using artificial intelligence techniques: Evidence from Hong Kong. Int. J. Hous. Mark. Anal. 2019, 12, 1072–1092.
31. Soltani-Fesaghandis, G.; Pooya, A. Design of an artificial intelligence system for predicting success of new product development and selecting proper market-entry strategy. Neural Comput. Appl. 2018, 30, 2465–2484.
32. Makridakis, S.; Spiliotis, E.; Assimakopoulos, V. Statistical and machine learning forecasting methods: Concerns and ways forward. PLoS ONE 2018, 13, e0194889.
33. Muthayya, S.; Sugimoto, J.D.; Montgomery, S.; Maberly, G.F. An overview of global rice production, supply, trade, and consumption. Ann. N. Y. Acad. Sci. 2014, 1324, 7–14.
34. Valera, H.G.A. Is rice price a major source of inflation in the Philippines? A panel data analysis. Appl. Econ. Lett. 2022, 29, 1528–1532.
35. Hastie, T.; Tibshirani, R.; Friedman, J. The Elements of Statistical Learning: Data Mining, Inference, and Prediction, 2nd ed.; Springer: New York, NY, USA, 2009.
36. Box, G.E.P.; Jenkins, G.M.; Reinsel, G.C.; Ljung, G.M. Time Series Analysis: Forecasting and Control, 5th ed.; John Wiley & Sons: Hoboken, NJ, USA, 2016.
37. Dickey, D.A.; Fuller, W.A. Distribution of the estimators for autoregressive time series with a unit root. J. Am. Stat. Assoc. 1979, 74, 427–431.
38. Cohen, J.; Cohen, P.; West, S.G.; Aiken, L.S. Applied Multiple Regression/Correlation Analysis for the Behavioral Sciences, 3rd ed.; Lawrence Erlbaum Associates: Mahwah, NJ, USA, 2003.
39. Pesaran, M.H.; Shin, Y.; Smith, R.J. Bounds testing approaches to the analysis of level relationships. J. Appl. Econom. 2001, 16, 289–326.
40. Engle, R.F.; Granger, C.W.J. Co-integration and error correction: Representation, estimation, and testing. Econometrica 1987, 55, 251–276.
41. Breiman, L. Random forests. Mach. Learn. 2001, 45, 5–32.
42. Saeys, Y.; Abeel, T.; Van de Peer, Y. Robust feature selection using ensemble feature selection techniques. In Joint European Conference on Machine Learning and Knowledge Discovery in Databases; Springer: Berlin/Heidelberg, Germany, 2008; pp. 313–325.
43. Granger, C.W.J. Investigating causal relations by econometric models and cross-spectral methods. Econometrica 1969, 37, 424–438.
44. Capitanio, F.; Rivieccio, G.; Adinolfi, F. Food price volatility and asymmetries in rural areas of South Mediterranean countries: A copula-based GARCH model. Int. J. Environ. Res. Public Health 2020, 17, 5855.
45. Pal, A.; Wong, W.-K. Financial time series forecasting: A comprehensive review of signal processing and optimization-driven intelligent models. Comput. Econ. 2025, 1–27.
46. Friedman, J.H. Greedy function approximation: A gradient boosting machine. Ann. Stat. 2001, 29, 1189–1232.
47. Chen, T.; Guestrin, C. XGBoost: A scalable tree boosting system. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Francisco, CA, USA, 13–17 August 2016; pp. 785–794.
48. Vapnik, V.N. The Nature of Statistical Learning Theory; Springer: New York, NY, USA, 1995.
49. Hochreiter, S.; Schmidhuber, J. Long short-term memory. Neural Comput. 1997, 9, 1735–1780.
50. Rumelhart, D.E.; Hinton, G.E.; Williams, R.J. Learning representations by back-propagating errors. Nature 1986, 323, 533–536.
51. Lin, T.; Horne, B.G.; Tino, P.; Giles, C.L. Learning long-term dependencies in NARX recurrent neural networks. IEEE Trans. Neural Netw. 1996, 7, 1329–1338.
52. Jang, J.S.R. ANFIS: Adaptive-network-based fuzzy inference system. IEEE Trans. Syst. Man Cybern. 1993, 23, 665–685.
53. Lundberg, S.M.; Erion, G.; Chen, H.; DeGrave, A.; Prutkin, J.M.; Nair, B.; Lee, S.I. From local explanations to global understanding with explainable AI for trees. Nat. Mach. Intell. 2020, 2, 56–67.
54. Lundberg, S.M.; Lee, S.I. A unified approach to interpreting model predictions. Adv. Neural Inf. Process. Syst. NeurIPS 2017, 30, 4765–4774.
Table 1. Key Findings from Literature Review.
Year | Category | Methodological Approach | Content
2024 | Relationship and Causality | ARDL (Autoregressive Distributed Lag) model | Özçelik and Uslu investigate the determinants of food inflation within the Turkish economy [2]. Based on ARDL modeling, the study finds that the Consumer Price Index for Food and Non-Alcoholic Beverages is positively influenced by the Domestic Producer Price Index for Agriculture, Forestry, and Fishing (UFET) and the Consumer Price Index for Electricity, Gas, and Other Fuels (TUFEE), while the Real Effective Exchange Rate based on CPI (REDK) exerts a negative impact. Furthermore, results from the ARDL Error Correction Model, which examines short-term dynamics, indicate that short-term imbalances are corrected in the long run.
2023 | Relationship and Causality | Panel Structural Vector Autoregression (PSVAR) technique to assess inflation effects | Fan et al. [7] explore the relationship between information asset investments and inflation. Utilizing the PSVAR method, they analyze both short- and long-term dynamics. Their findings suggest that low to moderate inflation levels are positively correlated with the market value of R&D firms, whereas high inflation has a negative effect.
2023 | Relationship and Causality | Granger causality test to examine the PPI-CPI relationship | Cerveny [8] investigates the link between the Producer Price Index (PPI) and the Consumer Price Index (CPI) in the Czech Republic and the Eurozone. Applying the Granger causality test, the study reveals that PPI influences CPI in the Czech Republic, whereas no such causal relationship is observed in the Eurozone.
2020 | Relationship and Causality | Panel cointegration and panel causality tests | Ozpolat [9] analyzes the causal relationship between CPI and PPI in Central and Eastern European Countries (CEECs), using panel cointegration and panel causality tests. The results indicate a long-term, bidirectional causality between CPI and PPI in these countries.
2018 | Relationship and Causality | Econometric methods: DF-GLS unit root test, Johansen and Engle-Granger cointegration approaches, VAR model | Oyeleke and Ojediran [10] examine the relationship between PPI and CPI in Nigeria using various econometric techniques. The DF-GLS unit root test is applied to assess stationarity, Johansen and Engle-Granger methods are used for long-run cointegration, and a VAR model is employed to analyze interactions. The study concludes that the PPI-CPI relationship in Nigeria does not follow a simple cause-effect pattern and lacks a long-term equilibrium relationship.
2016 | Relationship and Causality | Non-parametric regression using the LOESS technique | Akmercan [11] investigates the relationships among household expenditures, income, and OECD household size data using the LOESS (Locally Estimated Scatterplot Smoothing) non-parametric regression method. Essential consumption items are aggregated into a single expenditure category for analysis.
2016 | Relationship and Causality | Comparison of ordered and unordered discrete choice models (LOGIT and PROBIT) | Oktay [12] analyzes factors influencing household fuel choices for heating in Türkiye using TÜİK data. The study compares ordered and unordered discrete choice models, particularly LOGIT and PROBIT variants. Model performance is assessed using OLOGIT, GOLOGIT, PPO, HOLOGIT, AIC, BIC, and MNL statistics to determine the most suitable approach.
2016 | Relationship and Causality | Panel data analysis and Dumitrescu-Hurlin panel causality test | Yu [13] first applies panel data analysis to explore the general dynamics between CPI and PPI, then uses the Dumitrescu-Hurlin panel causality test for a deeper investigation into the causal nature of this relationship. This dual approach allows for a more nuanced understanding of inconsistencies in CPI-PPI transmission across countries.
2014 | Relationship and Causality | Correlation, regression, ANOVA, and coefficient of determination (R2) | Galodikwe [14] investigates the PPI-CPI relationship using correlation analysis, regression models, ANOVA, and the coefficient of determination. The findings confirm that PPI indices significantly influence CPI indices.
2001 | Relationship and Causality | Limitations of OLS and use of Tobit models | Emeç [15] examines household consumption expenditures, highlighting the limitations of the Ordinary Least Squares (OLS) method when applied to continuous or ordinal dependent variables across regions. As a solution, Tobit models are suggested, where zero expenditures are bounded at zero, and certain continuous variables are categorized to fit ordered logit models. Results are interpreted in the context of Engel curves.
2022 | Relationship and Causality | Combined econometric (ARDL) and machine learning (Support Vector Machine) approach; hybrid model proposed; VIF test used to avoid multicollinearity. Evaluation metrics: RMSE, MAE, R2 | Özden [16] investigates macroeconomic and financial determinants of Türkiye's export-import ratio using both econometric and machine learning methods. The ARDL model is applied to monthly data (2010–2021) on normalized GDP, exchange rate, CPI, PPI, crude oil prices, and the trade ratio. Trends of each variable are presented, and a VIF test confirms no multicollinearity issues. Subsequently, a Support Vector Machine (SVM) is used to capture complex patterns. Results from ARDL, SVM, and a hybrid ARDL-SVM model are compared using RMSE, MAE, and R2. The hybrid model, supported by machine learning, demonstrates superior performance in capturing variable interactions.
2016 | Prediction Model | Poisson Quasi Maximum Likelihood estimation; bootstrap validation test | Selim and Balyaner [17] estimate the number of information technology devices owned by households using the Poisson Quasi Maximum Likelihood (PQML) estimation method. The validity of the model is assessed through bootstrap resampling techniques.
2023 | Prediction Model | Comparison of Random Forest, Gradient Boosting, SVM, Neural Networks, and AdaBoost. Evaluation metrics: MSE, RMSE, MAE, R2 | Atalan [18] evaluates economic, social, and environmental factors affecting unit prices of milk in Türkiye. Five machine learning algorithms, namely Random Forest, Gradient Boosting, Support Vector Machine (SVM), Artificial Neural Network, and AdaBoost, are used for price prediction. Performance is assessed using MSE, RMSE, MAE, and R2, with Random Forest yielding the best results. Additionally, Random Forest performance is reported across tree counts ranging from 10 to 2000.
2022 | Prediction Model | System dynamics model for energy efficiency and resource optimization in the food and beverage industry | Katsumbe [19] proposes a system dynamics model to optimize energy efficiency and resource use in the food and beverage sector. Separate sub-models are developed for water, electricity, and production lines, with input variables defined for each. Total consumption is formulated and compared against a baseline, and the model is used to simulate one-year forecasts.
2022 | Prediction Model | SARIMA model for forecasting food and beverage prices in Kenya, accounting for seasonality. Evaluation metrics: MSE, MAE, MAPE, Theil's U statistic | Wanjuki et al. [20] propose a model for forecasting food and beverage prices in Kenya. Given seasonal fluctuations, the Seasonal Autoregressive Integrated Moving Average (SARIMA) model is employed. Model accuracy is evaluated using MSE, MAE, MAPE, and Theil's U statistic. High predictive accuracy is achieved, and the model is recommended for short-term price forecasting in the food and beverage sector.
2022 | Prediction Model | Multiple regression model | Warren-Vega et al. [21] develop a multiple regression model to forecast agave (a key input in tequila production) prices. Variables include rainfall, harvest volume, tequila production, costs, exchange rates, and export volumes. The model shows strong predictive performance (R = 0.86).
2022 | Prediction Model | Comparison of deep learning models: DA-RNN, NARX-RNN, MV-LSTM. Evaluation metrics: RMSE, MAE, MAPE | Ji et al. [22] investigate deep learning approaches for forecasting wholesale agricultural prices in China. The Dual-Stage Attention-Based Recurrent Neural Network (DA-RNN) outperforms the NARX-RNN and MV-LSTM models. Performance is evaluated using RMSE, MAE, and MAPE.
2022 | Prediction Model | ARCH and GARCH models for forecasting prices of food items (tomato, garlic, okra, pepper) | Venkateswara Rao et al. [23] present a regression-based multivariate approach to forecast prices of key food commodities. Emphasizing the importance of price volatility for governments, producers, and consumers, they apply ARCH (Autoregressive Conditional Heteroskedasticity) and GARCH (Generalized ARCH) models. While ARCH generally yields more consistent results, GARCH performs better for certain items.
2022 | Prediction Model | Random Forest with three cross-validation techniques: temporal, spatial, spatiotemporal | Kresova and Hess [24] analyze factors influencing raw milk prices in Russia using 17 variables. Feature selection is performed using Boruta analysis, confirming all variables as relevant. The Random Forest model is tested with three cross-validation strategies: temporal (for time series), spatial (for geographical data), and spatiotemporal (combined). The spatiotemporal approach is found to be the most effective.
2021 | Prediction Model | NARXNN model for forecasting food demand | Lutoslawski et al. [25] employ the Nonlinear Autoregressive Exogenous Neural Network (NARXNN) model to forecast food demand. The study highlights that NARXNN, commonly used in time series forecasting, provides more accurate predictions than traditional regression models.
2021 | Prediction Model | Backpropagation-trained ANN model for CPI forecasting. Evaluation: MAPE | Sarangi et al. [26] aim to forecast the Consumer Food Price Index (CFPI) in India using a machine learning approach. A backpropagation-trained Artificial Neural Network (ANN) is implemented using the Zaitun statistical software. MAPE values are used to validate model accuracy, which is reported to be very high, indicating strong predictive performance.
2020 | Prediction Model | ANN, Random Forest, and XGBoost models. Evaluation metrics: R2, MAE, RMSE | Tosun [27] forecasts fresh fruit and vegetable imports for OECD countries using data mining and machine learning techniques. ANN, Random Forest, and XGBoost models are applied and compared using R2, RMSE, and MAE. XGBoost demonstrates the best overall performance.
2020 | Prediction Model | Applicability of ANN, SVM, genetic algorithms, and hybrid techniques in stock price forecasting | Strader et al. [28] conduct a study on stock price forecasting. Their findings suggest that Artificial Neural Networks (ANN) are best suited for predicting numerical stock index values; Support Vector Machines (SVM) perform well in classification tasks, such as predicting market direction; and hybrid machine learning techniques may overcome the limitations of single-method approaches.
2019 | Prediction Model | Superiority of ANN over logarithmic regression | Selim and Demirkıran [29] analyze household budget survey data from TÜİK to identify factors affecting food expenditures and track temporal changes. They develop predictive models using logarithmic regression and Artificial Neural Networks (ANN). Results show that the ANN model outperforms the semi-logarithmic regression model in forecasting accuracy.
2019 | Prediction Model | Comparison of ANN, SVM, and ARIMA models | Abidoye et al. [30] collect data on factors influencing real estate prices in Hong Kong and apply ARIMA, ANN, and SVM models for out-of-sample forecasting. The ANN model outperforms both SVM and ARIMA in predictive accuracy.
2018 | Prediction Model | ANFIS (Adaptive Neuro-Fuzzy Inference System) combining fuzzy logic and neural networks | Soltani-Fesaghandis and Pooya [31] design an AI system to predict the success of new food products. The ANFIS algorithm integrates fuzzy logic and neural networks, processing data from diverse sources such as market research and social media to forecast product performance.
2018 | Prediction Model | Evaluation of machine learning as an alternative to statistical methods in time series forecasting | Makridakis et al. [32] assess machine learning methods as alternatives to traditional statistical approaches in time series forecasting. Eight classical statistical methods and ten machine learning techniques are compared using sMAPE. The results show that statistical methods generally outperform machine learning models. However, the authors note that recent advancements may soon close this gap.
Table 2. Top 3 Items with Highest Pearson Correlation to Food CPI.
Item Code | Description | Rank | Correlation Value
TP FG J053 | 053. Household Appliances | 1 | 0.99891
TP FG J051 | 051. Furniture, Furnishings, Carpets And Other Floor Coverings | 2 | 0.998844
TP FG J056 | 056. Goods And Services For Household Maintenance | 3 | 0.997916

Table 3. Top 3 Items with Highest Spearman Correlation to Food CPI.
Item Code | Description | Rank | Correlation Value
TP FG J127 | 127. Other Services N.E.C. | 1 | 0.99776
TP FG J124 | 124. Social Protection | 2 | 0.997718
TP FG J062 | 062. Outpatient Services | 3 | 0.997687

Table 4. Top 3 Items with Highest Kendall Tau Correlation to Food CPI.
Item Code | Description | Rank | Correlation Value
TP FG J127 | 127. Other Services N.E.C. | 1 | 0.970884
TP FG J124 | 124. Social Protection | 2 | 0.970381
TP FG J062 | 062. Outpatient Services | 3 | 0.97034

Table 5. Combined Correlation Results (Average of Three Methods).
Item Code | Description | Rank | Average Correlation
TP FG J056 | 056. Goods And Services For Household Maintenance | 1 | 0.988253667
TP FG J012 | 012. Non-Alcoholic Beverages | 2 | 0.988152
TP FG J062 | 062. Outpatient Services | 3 | 0.988128
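For transparency about how rankings such as those in Tables 2–5 can be produced, the sketch below computes Pearson, Spearman, and Kendall correlations of each CPI sub-index against the Food CPI and averages them. It is a minimal illustration rather than the paper's exact pipeline: the `cpi_items` and `food_cpi` objects are placeholders introduced here, and the combined score is assumed to be a simple mean of the three coefficients.

```python
import pandas as pd


def rank_correlations(cpi_items: pd.DataFrame, food_cpi: pd.Series, top_n: int = 3) -> pd.DataFrame:
    """Rank CPI sub-indices by Pearson, Spearman, and Kendall correlation with the Food CPI."""
    results = {}
    for method in ("pearson", "spearman", "kendall"):
        # corrwith aligns on the shared monthly index and ignores missing pairs.
        results[method] = cpi_items.corrwith(food_cpi, method=method)
    table = pd.DataFrame(results)
    # Combined score as in Table 5: simple average of the three coefficients.
    table["average"] = table.mean(axis=1)
    return table.sort_values("average", ascending=False).head(top_n)


# Example usage with hypothetical inputs loaded elsewhere from EVDS:
# top_items = rank_correlations(cpi_items, food_cpi)
```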
Table 6. Items with Highest Lagged Pearson Correlation to Food CPI.
Item Code | Description | Lag Period | Correlation Value
TP FG J053 | 053. Household Appliances | 1 | 0.999222306
TP FG J051 | 051. Furniture, Furnishings, Carpets And Other Floor Coverings | 1 | 0.998482096
TP FG J012 | 012. Non-Alcoholic Beverages | 1 | 0.997243669
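The lagged screening behind Table 6 can be approximated by shifting each candidate series before correlating it with the Food CPI. The sketch below is illustrative only; `cpi_items`, `food_cpi`, and the choice of a six-month maximum lag are assumptions of this example.

```python
import pandas as pd


def lagged_pearson(cpi_items: pd.DataFrame, food_cpi: pd.Series, max_lag: int = 6) -> pd.DataFrame:
    """For each item, find the lag (in months) with the highest Pearson correlation to the Food CPI."""
    rows = []
    for lag in range(1, max_lag + 1):
        # Value of each item `lag` months earlier versus the current Food CPI.
        corr = cpi_items.shift(lag).corrwith(food_cpi)
        rows.extend({"item": item, "lag": lag, "correlation": value} for item, value in corr.items())
    out = pd.DataFrame(rows).dropna()
    # Keep the best lag per item, then rank items by that correlation.
    best = out.loc[out.groupby("item")["correlation"].idxmax()]
    return best.sort_values("correlation", ascending=False)
```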
Table 7. Significant Items in ARDL Analysis (p < 0.05).
Item Code | Description | Coefficient | Std. Error | p-Value
TP FG J062.L0 | 062. Outpatient Services | 0.0092 | 0.004 | 0.034
TP FG J061.L0 | 061. Medical Products, Appliances And Equipment | 0.0081 | 0.003 | 0.028
TP FG J083.L1 | 083. Telephone And Telefax Services | 0.0079 | 0.003 | 0.015

Table 8. Model Performance Results.
Metric | Value | Interpretation
R2 | 0.7867 | Strong explanatory power
Adjusted R2 | 0.7128 | Confirms model parsimony
F-statistic | 10.65 (p < 0.001) | Highly significant
Bounds Test F-statistic | 25.14 | Strong cointegration evidence (I(1) bound = 4.35)
Durbin-Watson | 2.47 | No severe autocorrelation
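Tables 7 and 8 report an ARDL specification. A minimal sketch of such an estimation with statsmodels (version 0.13 or later, which provides `statsmodels.tsa.ardl.ARDL`) is shown below; the variable names, lag orders, and deterministic terms are assumptions and may differ from the exact configuration used in the study.

```python
import pandas as pd
from statsmodels.stats.stattools import durbin_watson
from statsmodels.tsa.ardl import ARDL

# `food_cpi` is the dependent series; `exog` is a DataFrame of candidate sub-indices.
# One own lag and one lag of each regressor are assumed purely for illustration.
model = ARDL(endog=food_cpi, lags=1, exog=exog, order=1, trend="c")
res = model.fit()

# Contemporaneous (L0) and lagged (L1) coefficients with p-values, as in Table 7.
report = pd.DataFrame({"coef": res.params, "p_value": res.pvalues})
print(report[report["p_value"] < 0.05].round(4))

# Residual autocorrelation check comparable to the Durbin-Watson entry in Table 8.
print("Durbin-Watson:", round(durbin_watson(res.resid), 2))
```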
Table 9. Significant Items in Cointegration Test (p < 0.05).
Item Code | Description | Cointegration Statistic | p-Value | Critical Value
TP FG J105 | 105. Education Programmes Of Unspecified Level | −5.59518832 | 0.0000115 | −3.3777
TP FG J124 | 124. Social Protection | −5.239523044 | 0.0000583 | −3.3777
TP FG J072 | 072. Operation Of Personal Transport Equipment | −4.565730017 | 0.000953 | −3.3777
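The item-by-item screen in Table 9 can be illustrated with the Engle-Granger style test available as `statsmodels.tsa.stattools.coint`, which returns the test statistic, p-value, and critical values. The sketch below is an approximation under stated assumptions; `cpi_items` and `food_cpi` are placeholder names and the exact test settings in the study may differ.

```python
import pandas as pd
from statsmodels.tsa.stattools import coint


def cointegration_screen(cpi_items: pd.DataFrame, food_cpi: pd.Series, alpha: float = 0.05) -> pd.DataFrame:
    """Flag items whose levels appear cointegrated with the Food CPI (Engle-Granger style test)."""
    rows = []
    for item in cpi_items.columns:
        t_stat, p_value, crit = coint(food_cpi, cpi_items[item])
        rows.append({"item": item, "statistic": t_stat, "p_value": p_value, "crit_5pct": crit[1]})
    out = pd.DataFrame(rows)
    # Keep significant items, sorted by how strongly the null of no cointegration is rejected.
    return out[out["p_value"] < alpha].sort_values("statistic")
```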
Table 10. Top 3 Features by Importance (Chronological Split).
Item Code | Description | Rank | Importance Score
TP FG J125 | 125. Insurance | 1 | 0.068466
TP FG J061 | 061. Medical Products, Appliances And Equipment | 2 | 0.039167
TP FG J073 | 073. Transport Services | 3 | 0.036515
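Table 10 refers to feature importances from a Random Forest fitted on a chronological split. A minimal, hypothetical sketch follows; `X`, `y`, the 80/20 split ratio, and the random seed are illustrative assumptions rather than details taken from the table.

```python
import pandas as pd
from sklearn.ensemble import RandomForestRegressor

# Chronological (non-shuffled) split: the first 80% of months form the training window.
split = int(len(X) * 0.8)
X_train, y_train = X.iloc[:split], y.iloc[:split]

rf = RandomForestRegressor(n_estimators=100, random_state=42)
rf.fit(X_train, y_train)

# Impurity-based importances, ranked as in Table 10.
importances = pd.Series(rf.feature_importances_, index=X.columns)
print(importances.sort_values(ascending=False).head(3).round(6))
```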
Table 11. Key Predictor Variables for Food Price Prediction.
Item Code | Description | Method | Delay (Months)
TP FG J053 | 053. Household Appliances | Pearson | 1
TP FG J051 | 051. Furniture, Fixtures, Carpets And Other Floor Coverings | Pearson | 0
TP FG J073 | 073. Transportation Services | Spearman | 6
TP FG J127 | 127. Other Unclassified Services | Spearman and Kendall Tau | 0
TP FG J124 | 124. Social Protection | Spearman, Kendall Tau, Cointegration | 0
TP FG J062 | 062. Outpatient Services | Kendall Tau, ARDL | 0
TP FG J105 | 105. Educational Programs Not Determined By Level | Cointegration Test | 0
TP FG J061 | 061. Medical Products, Instruments And Equipment | ARDL, Random Forest | 0
TP FG J125 | 125. Insurance | Random Forest | 0
TP FG J011 | 011. Food | PACF | 1
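The predictor set in Table 11 combines items selected by different methods, some entering with a delay. The sketch below shows one way such a lagged feature matrix could be assembled with pandas; the column naming convention and the `cpi_items` frame are assumptions of this illustration, not the study's implementation.

```python
import pandas as pd

# Delay (in months) identified for each selected item in Table 11.
selected_lags = {
    "TP FG J053": 1, "TP FG J051": 0, "TP FG J073": 6, "TP FG J127": 0, "TP FG J124": 0,
    "TP FG J062": 0, "TP FG J105": 0, "TP FG J061": 0, "TP FG J125": 0, "TP FG J011": 1,
}

# Shift each series by its delay so that past values explain the current target,
# then drop the initial rows lost to shifting.
features = pd.DataFrame(
    {f"{code}_lag{lag}": cpi_items[code].shift(lag) for code, lag in selected_lags.items()}
).dropna()
```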
Table 12. Performance Comparison of Prediction Models.
Rank | Model | MAE | RMSE | R2 | Performance Status
1 | XGBoost | 1.6834 | 2.0684 | 0.9324 | Excellent
2 | NARX-RNN (6-Lag) | 1.8363 | 2.6521 | 0.8902 | Excellent
3 | Ridge Regression | 2.3353 | 2.8348 | 0.8729 | Excellent
4 | Gradient Boosting | 2.2860 | 2.9920 | 0.8585 | Excellent
5 | ANFIS | 2.3823 | 2.9998 | 0.8577 | Very Good
6 | LSTM | 2.1275 | 3.0936 | 0.8487 | Very Good
7 | Random Forest | 3.7044 | 4.6439 | 0.6590 | Good
8 | SVR (RBF) | 5.0378 | 6.2084 | 0.3906 | Moderate
9 | ANN (MLP) | 6.0903 | 6.8023 | 0.2684 | Poor
10 | Linear Regression | 7.1239 | 8.4310 | −0.1238 | Poor
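The MAE, RMSE, and R2 figures in Table 12 correspond to standard scikit-learn metrics. The helper below shows how they can be computed for any model's test-set predictions; `y_test` and `y_pred` are placeholder names for aligned arrays of actual and predicted prices.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score


def evaluate(y_test, y_pred) -> dict:
    """Return the MAE, RMSE, and R2 reported in Table 12 for one model's test predictions."""
    return {
        "MAE": mean_absolute_error(y_test, y_pred),
        "RMSE": float(np.sqrt(mean_squared_error(y_test, y_pred))),
        "R2": r2_score(y_test, y_pred),
    }
```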
Table 13. Optimized Hyperparameters.
Model | Optimized Hyperparameters
XGBoost | n_estimators: 100, learning_rate: 0.1, max_depth: 5, reg_lambda: 1, subsample: 1.0
Ridge Regression | alpha: 1.0
LSTM | units: 64, learning_rate: 0.001, epochs: 50, batch_size: 16
Gradient Boosting | n_estimators: 200, learning_rate: 0.01, max_depth: 3
SVR (RBF) | C: 1, epsilon: 0.01, gamma: 'scale', kernel: 'rbf'
Random Forest | n_estimators: 100, max_depth: None, min_samples_split: 2
NARX-RNN | hidden_size: 32, learning_rate: 0.001, epochs: 100, n_lags: 6
ANN (MLP) | hidden_layer_sizes: (50, 50), activation: 'relu', alpha: 0.0001, learning_rate: 'adaptive'
Linear Regression | Default parameters (no regularization)
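For reference, the champion configuration in Table 13 maps directly onto the `xgboost.XGBRegressor` constructor. The snippet below is a sketch of that instantiation only; the tuning search itself is not shown, and the fixed random seed is an added assumption.

```python
from xgboost import XGBRegressor

# Champion configuration from Table 13; the random seed is an added assumption.
xgb_model = XGBRegressor(
    n_estimators=100,
    learning_rate=0.1,
    max_depth=5,
    reg_lambda=1,
    subsample=1.0,
    random_state=42,
)
# xgb_model.fit(X_train, y_train)
# y_pred = xgb_model.predict(X_test)
```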
Table 14. SHAP Analysis Results for the Top 5 Items.
Rank | Feature | Description | Mean SHAP Importance | Std. Dev.
1 | TP FG J125 | 125. Insurance | 0.0072 | 0.0087
2 | TP FG J073 | 073. Transportation Services | 0.0066 | 0.0071
3 | TP FG J011 | 011. Food (Lag 1) | 0.0052 | 0.0038
4 | TP FG J053 | 053. Household Appliances | 0.0039 | 0.0022
5 | TP FG J062 | 062. Outpatient Services | 0.0034 | 0.0025
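The attributions in Table 14 are mean absolute SHAP values. A minimal single-window sketch using the `shap` package's TreeExplainer is given below; the fitted model object and `X_test` frame are assumptions, and the rolling-window procedure used in the study is not reproduced here.

```python
import numpy as np
import pandas as pd
import shap

# Attribute the fitted tree model's predictions on a held-out window.
explainer = shap.TreeExplainer(xgb_model)
shap_values = explainer.shap_values(X_test)  # array of shape (n_samples, n_features)

# Mean absolute SHAP value per feature, ranked as in Table 14.
mean_abs = pd.Series(np.abs(shap_values).mean(axis=0), index=X_test.columns)
print(mean_abs.sort_values(ascending=False).head(5).round(4))
```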
Table 15. Walk-Forward Validation Results.
Metric | Value | Interpretation
R2 Score | 0.8703 | High variance explanation despite volatility
MAE | 1.8158 TL | Low average deviation from actual prices
RMSE | 4.1519 TL | Penalizes large errors during shock periods (e.g., 2022)
MAPE | 5.75% | Excellent relative accuracy (<10%)
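Table 15 summarizes a recursive walk-forward validation. The sketch below illustrates the general mechanism: the model is refit on an expanding window and produces one-step-ahead forecasts over the final 31 months. Function and variable names, and the reuse of the Table 13 hyperparameters at every step, are assumptions of this illustration rather than the study's exact protocol.

```python
import numpy as np
import pandas as pd
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from xgboost import XGBRegressor


def walk_forward(features: pd.DataFrame, target: pd.Series, n_test: int = 31) -> dict:
    """Expanding-window, one-step-ahead simulation over the last `n_test` months."""
    preds, actuals = [], []
    for i in range(len(target) - n_test, len(target)):
        model = XGBRegressor(n_estimators=100, learning_rate=0.1, max_depth=5,
                             reg_lambda=1, subsample=1.0)
        model.fit(features.iloc[:i], target.iloc[:i])        # refit on all data up to month i
        preds.append(float(model.predict(features.iloc[[i]])[0]))
        actuals.append(float(target.iloc[i]))
    actuals, preds = np.array(actuals), np.array(preds)
    return {
        "R2": r2_score(actuals, preds),
        "MAE": mean_absolute_error(actuals, preds),
        "RMSE": float(np.sqrt(mean_squared_error(actuals, preds))),
        "MAPE": float(np.mean(np.abs((actuals - preds) / actuals)) * 100),
    }
```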
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
