From Feature Selection to Forecasting: A Two-Stage Hybrid Framework for Food Price Prediction Using Economic Indicators in Türkiye
Abstract
1. Introduction
2. Literature Review
3. Methodology
3.1. Research Framework
3.2. Data Collection and Preprocessing
3.2.1. Dataset Creation
3.2.2. Data Preprocessing
3.3. Relationship and Causality Analysis
3.3.1. Time Series Analysis and Stationarity Testing
3.3.2. Correlation Analysis
3.3.3. Lag Features Analysis
3.3.4. Autoregressive Distributed Lag (ARDL) Model
3.3.5. Cointegration Test (Engle-Granger)
3.3.6. Random Forest Feature Importance
3.3.7. Attribute Shortlist
3.3.8. Granger Causality Test
3.4. Predictive Model Development
3.4.1. Feature Engineering Strategy
- 1. Logarithmic Differencing for Stationarity:
- 2. Lagged Features vs. Smoothing:
3.4.2. Linear Regression
3.4.3. Random Forest
3.4.4. Gradient Boosting
3.4.5. XGBoost
3.4.6. Support Vector Regression (SVR)
3.4.7. Long Short-Term Memory (LSTM)
3.4.8. Artificial Neural Networks (ANN)
3.4.9. NARX-RNN
3.4.10. ANFIS
3.4.11. SHAP-Based Ensemble Interpretability
3.5. Model Evaluation
3.5.1. Evaluation Metrics
- Mean Absolute Error (MAE):
- Root Mean Squared Error (RMSE):
- Coefficient of Determination (R2):
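For reference, the three metrics are conventionally defined as below. The notation is an assumption introduced here, not taken from the original text: $y_i$ is the observed price, $\hat{y}_i$ the predicted price, $\bar{y}$ the mean of the observed prices, and $n$ the number of test observations.

```latex
\begin{align}
\mathrm{MAE}  &= \frac{1}{n}\sum_{i=1}^{n} \lvert y_i - \hat{y}_i \rvert \\
\mathrm{RMSE} &= \sqrt{\frac{1}{n}\sum_{i=1}^{n} \left(y_i - \hat{y}_i\right)^2} \\
R^2           &= 1 - \frac{\sum_{i=1}^{n} \left(y_i - \hat{y}_i\right)^2}{\sum_{i=1}^{n} \left(y_i - \bar{y}\right)^2}
\end{align}
```

Under this definition R2 is negative whenever a model's squared error exceeds that of always predicting the mean $\bar{y}$, which is how a negative R2 value on a test set should be read.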
3.5.2. Comparative Feature Engineering Strategy
- Log-Return Transformation Strategy: Focuses on immediate logarithmic returns (r_t) and short-term autoregressive lags (t − 1, t − 2). This strategy aims to capture high-frequency volatility without signal dilution.
- Rolling Statistics Strategy: Incorporates rolling means and standard deviations (window size = 3) to test whether smoothing short-term fluctuations improves predictive stability.
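The two strategies can be illustrated with a minimal Python sketch. It uses only the standard library; the helper names (`log_returns`, `lagged`, `rolling`) and the price series are hypothetical and are not taken from the study's code.

```python
import math
from statistics import mean, stdev


def log_returns(prices):
    """r_t = ln(P_t) - ln(P_{t-1}); the first observation has no return."""
    return [math.log(b) - math.log(a) for a, b in zip(prices, prices[1:])]


def lagged(series, k):
    """Shift a series back by k steps; the first k entries are undefined (None)."""
    return [None] * k + series[:len(series) - k]


def rolling(series, window, fn):
    """Apply fn over a trailing window that ends at t (inclusive)."""
    out = []
    for t in range(len(series)):
        if t + 1 < window:
            out.append(None)  # not enough history yet
        else:
            out.append(fn(series[t + 1 - window:t + 1]))
    return out


prices = [100.0, 110.0, 121.0, 115.0, 120.0, 132.0]  # hypothetical monthly prices
r = log_returns(prices)

# Strategy 1: log-return with short autoregressive lags (t - 1, t - 2)
strategy_1 = list(zip(r, lagged(r, 1), lagged(r, 2)))

# Strategy 2: rolling mean and standard deviation, window size = 3
strategy_2 = list(zip(rolling(r, 3, mean), rolling(r, 3, stdev)))
```

In this sketch the rolling window includes the current period. If the target is the return at time t, the rolling columns would need to be shifted by one period so that they use past values only.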
3.5.3. Recursive Walk-Forward Validation
4. Relationship and Causality Analyses and Results
4.1. Time Series Analysis
4.2. Partial Autocorrelation Function (PACF)
4.3. Correlation Analysis
4.4. Lag Features Analysis
4.5. Stationarity Test
4.6. Autoregressive Distributed Lag (ARDL) Model
4.7. Cointegration Test (Engle-Granger)
4.8. Random Forest Feature Importance Analysis
4.9. Analysis Results and Attribute Shortlist
- TP FG J053 (Household Appliances): 1-month lag
- TP FG J073 (Transportation Services): 6-month lag
- TP FG J011 (Food CPI): 1-month lag (based on PACF analysis)
- All other predictors: Current period (0-month lag)
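The lag assignments listed above can be applied to a set of monthly series as in the following sketch. The function `apply_lags`, the dictionary layout, and the numeric values are hypothetical illustrations; only the series codes and lag lengths come from the list.

```python
# Lag (in months) per predictor; any series not listed defaults to lag 0.
LAGS = {"TP FG J053": 1, "TP FG J073": 6, "TP FG J011": 1}


def apply_lags(columns, lags):
    """Return rows in which each predictor is shifted by its own lag.

    columns: dict mapping series code -> list of monthly values (equal length).
    Rows before the largest lag are dropped because they would be incomplete.
    """
    n = len(next(iter(columns.values())))
    max_lag = max((lags.get(code, 0) for code in columns), default=0)
    rows = []
    for t in range(max_lag, n):
        rows.append({code: values[t - lags.get(code, 0)]
                     for code, values in columns.items()})
    return rows


# Hypothetical index values for 8 consecutive months
data = {
    "TP FG J053": [10, 11, 12, 13, 14, 15, 16, 17],
    "TP FG J073": [20, 21, 22, 23, 24, 25, 26, 27],
    "TP FG J125": [30, 31, 32, 33, 34, 35, 36, 37],
}
rows = apply_lags(data, LAGS)
```

With a 6-month lag on one series, the first six months cannot form a complete row, so eight months of data yield two usable rows here.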
4.10. Granger Causality Test
5. Prediction Model Development and Results
5.1. Dataset Preparation
5.2. Model Selection and Implementation
5.3. Model Performance Results
5.4. Model-Specific Findings
5.4.1. XGBoost and Gradient Boosting
5.4.2. NARX-RNN
5.4.3. LSTM (Long Short-Term Memory)
5.4.4. Artificial Neural Network (ANN)
5.4.5. Random Forest
5.4.6. ANFIS
5.4.7. Linear Regression
5.4.8. Ridge Regression
5.4.9. Support Vector Regressor (SVR)
5.4.10. SHAP Interpretability Insights
- Dominance of Service-Based Cost Push: The variable TP FG J125 (Insurance) emerged as the primary determinant (Rank 1, Mean SHAP: 0.0072), closely followed by TP FG J073 (Transportation Services) (Rank 2, Mean SHAP: 0.0066). The high standard deviations associated with these features (σ > μ) indicate that their influence is highly dynamic; they likely act as shock transmitters during periods of economic turbulence (e.g., policy rate changes or fuel price hikes) rather than providing a constant baseline effect.
- Autoregressive Dynamics and Food Inflation Inertia: TP FG J011 (Food) ranked third (Mean SHAP: 0.0052), serving as a proxy for the broader momentum in the food market. This confirms that while the intrinsic inertia of food prices (autoregression) is significant, it is outweighed by external cost pressures from the services and logistics sectors (Insurance and Transport). This finding validates the hybrid modeling approach: predicting rice prices requires monitoring non-food macroeconomic indicators rather than relying solely on historical price trends.
- Secondary Structural Drivers: TP FG J053 (Household Appliances) and TP FG J062 (Outpatient Services) rounded out the top five. These variables exhibited lower standard deviations than the top two factors, suggesting that they make a more stable, albeit smaller, contribution to the price formation process, likely reflecting general purchasing power and labor cost rigidities in the economy.
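The ranking logic behind these observations can be sketched as follows. The sketch assumes that "Mean SHAP" refers to the mean of absolute per-observation SHAP values and that the standard deviation is taken over those same magnitudes; both are assumptions, as are the function name `shap_summary` and the example numbers, which are not results from the study.

```python
from statistics import mean, pstdev


def shap_summary(shap_values):
    """Rank features by mean |SHAP| and flag those whose spread exceeds the mean.

    shap_values: dict mapping feature name -> list of per-observation SHAP values.
    """
    summary = []
    for name, values in shap_values.items():
        magnitudes = [abs(v) for v in values]
        mu, sigma = mean(magnitudes), pstdev(magnitudes)
        summary.append({"feature": name, "mean_abs": mu, "std": sigma,
                        "volatile": sigma > mu})  # the sigma > mu condition
    return sorted(summary, key=lambda row: row["mean_abs"], reverse=True)


# Hypothetical SHAP values, not taken from the study
example = {
    "shock_like": [0.0, 0.0, 0.0, 0.03, -0.001],
    "steady": [0.004, -0.005, 0.005, -0.004, 0.005],
}
ranking = shap_summary(example)
```

A feature that contributes almost nothing in most months and a great deal in a few months ends up with a spread larger than its mean, which is the pattern described above as shock transmission.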
5.5. Model Robustness Assessment
5.5.1. Impact of Feature Engineering on Performance
5.5.2. Walk-Forward Validation Stability
6. Discussion
6.1. Drivers of Price Volatility: Services and Logistics
6.2. Market Memory and Price Inertia
6.3. Methodological Implications: Momentum vs. Smoothing
6.4. Predictive Robustness and Generalizability
- Operational Stability: The XGBoost model achieved a Walk-Forward R2 of 0.8703 and an MAE of 1.82 TL over a 31-month simulation (2022–2024). This confirms that the model maintains high accuracy even when retrained monthly with new data, making it suitable for continuous monitoring.
- Line of Defense against Overfitting: The strong performance of the regularized linear baseline (Ridge Regression, R2 ≈ 0.87 in static tests) indicates that the selected economic indicators carry a genuine signal. The lower error achieved by XGBoost in turn suggests that non-linear interactions (e.g., threshold effects between transport costs and food prices) are important for minimizing error.
- Data Leakage Prevention: By strictly retraining the scaler and model at each step of the walk-forward loop, we ensured that the reported accuracy reflects realistic, real-time forecasting capabilities, free from look-ahead bias.
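The leakage-free loop described in the last point can be sketched in a few lines. This is a simplified illustration, not the study's implementation: it assumes an expanding training window, uses a single predictor, and substitutes a closed-form least-squares line for XGBoost so that the example stays self-contained. All names and data are hypothetical.

```python
from statistics import mean, pstdev


def fit_scaler(xs):
    """Standardisation parameters estimated from the training window only."""
    sd = pstdev(xs)
    return mean(xs), sd if sd > 0 else 1.0


def fit_line(xs, ys):
    """Closed-form simple least squares: y = a + b * x (stand-in model)."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx if sxx else 0.0
    return my - b * mx, b


def walk_forward(xs, ys, initial_train):
    """Expanding-window one-step-ahead forecasts, refitting at every step."""
    preds = []
    for t in range(initial_train, len(xs)):
        train_x, train_y = xs[:t], ys[:t]        # data available before month t
        mu, sd = fit_scaler(train_x)             # scaler refit on the past only
        a, b = fit_line([(x - mu) / sd for x in train_x], train_y)
        preds.append(a + b * (xs[t] - mu) / sd)  # forecast for month t
    return preds


xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]    # hypothetical predictor
ys = [3.0, 5.0, 7.0, 9.0, 11.0, 13.0]  # hypothetical target (y = 2x + 1)
preds = walk_forward(xs, ys, initial_train=3)
mae = mean(abs(p - y) for p, y in zip(preds, ys[3:]))
```

The key design choice is that both the scaler and the model see only observations before month t. Fitting the scaler once on the full sample would pass information about future months into every forecast.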
6.5. Policy Implications
7. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
References
- Karagöl, V. The effect of economic policy uncertainty on food prices: Time-varying causality analysis for selected countries. J. Econ. Policy Res. 2023, 10, 409–433. (In Turkish)
- Özçelik, Ö.; Uslu, N. An analysis on the determinants of food inflation: The case of Türkiye. Dumlupınar Univ. J. Soc. Sci. 2024, 79, 289–309. (In Turkish)
- Eştürk, Ö.; Albayrak, N. Investigation of the relationship between agricultural products-food price increases and inflation. Int. J. Econ. Adm. Inq. 2018, 18, 147–158. (In Turkish)
- Baumeister, C.; Kilian, L. Do oil price increases cause higher food prices? Econ. Policy 2014, 29, 691–747.
- CBRT (Central Bank of the Republic of Türkiye). Electronic Data Delivery System. Available online: https://evds2.tcmb.gov.tr/index.php?/evds/serieMarket (accessed on 1 December 2024).
- World Food Programme. WFP Price Database. Available online: https://data.world/wfp/7d7224ed-eff6-421f-9f96-9c8d43905f3c (accessed on 1 December 2024).
- Fan, X.; Xu, Z.; Qin, Y.; Škare, M. Quantifying the short- and long-run impact of inflation-related price volatility on knowledge asset investment. J. Bus. Res. 2023, 165, 114048.
- Cerveny, D. PPI and CPI: What Is the Relationship? Bachelor’s Thesis, Charles University, Faculty of Social Sciences, Prague, Czech Republic, 2023.
- Ozpolat, A. Causal link between consumer prices index and producer prices index: An evidence from Central and Eastern European Countries (CEECs). Adam Acad. J. Soc. Sci. 2020, 10, 319–332.
- Oyeleke, O.J.; Ojediran, S. Exploring the relationship between consumer price index (CPI) and producer price index (PPI) in Nigeria. Int. J. Stat. Appl. 2018, 8, 42–46.
- Akmercan, T. Estimation of Household Consumption Expenditures with Non-Parametric Regression Method: The Case of Turkey. Master’s Thesis, Dumlupınar University Institute of Social Sciences, Kütahya, Turkey, 2016. (In Turkish).
- Oktay, D.E. Comparison of Ordered and Unordered Restricted Choice Models: An Application on Fuel Type Choices of Households in Turkey. Ph.D. Thesis, Pamukkale University Institute of Social Sciences, Denizli, Turkey, 2016. (In Turkish).
- Yu, C.P. Why are there always inconsistent answers to the relation between the PPI and CPI? Re-examination using panel data analysis. Int. Rev. Account. Bank. Financ. 2016, 8, 14–31.
- Galodikwe, I.K. Exploring the Relationship Between Producer Price Index and Consumer Price Index in South Africa. Ph.D. Thesis, North-West University, Potchefstroom, South Africa, 2014.
- Emeç, H. Ordered Logit and Tobit Models for Different Expenditure Groups: Inter-Regional Comparison. Ph.D. Thesis, Dokuz Eylül University Institute of Social Sciences, İzmir, Turkey, 2001. (In Turkish).
- Özden, K. The dynamics affecting the export import ratio in Turkey: A hybrid model proposal with econometrics and machine learning approach. J. Econ. Policy Res. 2022, 9, 261–286.
- Selim, S.; Balyaner, İ. Investigation of factors determining the number of IT products owned by households in Turkey. Pamukkale Univ. J. Soc. Sci. Inst. 2017, 26, 333–356. (In Turkish)
- Atalan, A. Forecasting drinking milk price based on economic, social, and environmental factors using machine learning algorithms. Agribusiness 2023, 39, 214–241.
- Katsumbe, T.I. A Systems Dynamics Model for Utilities Optimization in the Food and Beverage Industry. Ph.D. Thesis, University of Johannesburg, Johannesburg, South Africa, 2022.
- Wanjuki, T.M.; Wagala, A.; Muriithi, D.K. Evaluating the predictive ability of seasonal autoregressive integrated moving average (SARIMA) models using food and beverages price index in Kenya. Eur. J. Math. Stat. 2022, 3, 28–38.
- Warren-Vega, W.M.; Aguilar-Hernández, D.E.; Zárate-Guzmán, A.I.; Campos-Rodríguez, A.; Romero-Cano, L.A. Development of a predictive model for agave prices employing environmental, economic, and social factors: Towards a planned supply chain for agave-tequila industry. Foods 2022, 11, 1138.
- Ji, M.; Liu, P.; Deng, Z.; Wu, Q. Prediction of national agricultural products wholesale price index in China using deep learning. Prog. Artif. Intell. 2022, 11, 121–129.
- Venkateswara Rao, K.; Srilatha, D.; Jagan Mohan Reddy, D.; Desanamukula, V.S.; Kejela, M.L. Regression based price prediction of staple food materials using multivariate models. Sci. Program. 2022, 2022, 9547039.
- Kresova, S.; Hess, S. Identifying the determinants of regional raw milk prices in Russia using machine learning. Agriculture 2022, 12, 1006.
- Lutoslawski, K.; Hernes, M.; Radomska, J.; Hajdas, M.; Walaszczyk, E.; Kozina, A. Food demand prediction using the nonlinear autoregressive exogenous neural network. IEEE Access 2021, 9, 146123–146136.
- Sarangi, P.K.; Gena, D.; Gena, S.; Vittal, N. Machine learning approach for the prediction of consumer food price index. In Proceedings of the 2021 6th International Conference on Reliability, Infocom Technologies and Optimization (ICRITO), Noida, India, 14–15 April 2021; pp. 1–5.
- Tosun, N. Predictions of OECD Countries Fresh Fruit and Vegetable Imports with Data Mining Techniques and Machine Learning Models. Unpublished Doctoral Thesis, Marmara University, Istanbul, Türkiye, 2020. (In Turkish).
- Strader, T.J.; Rozycki, J.J.; Roots, T.H.; Huang, Y. Machine learning stock market prediction studies: Review and research directions. J. Int. Technol. Inf. Manag. 2020, 28, 63–83.
- Selim, S.; Demirkıran, E. Socio-economic factors affecting household food expenditures in Türkiye: A comparative analysis. Hacet. Univ. J. Econ. Adm. Sci. 2019, 37, 147–172. (In Turkish)
- Abidoye, R.B.; Chan, A.P.; Abidoye, F.A.; Oshodi, O.S. Predicting property price index using artificial intelligence techniques: Evidence from Hong Kong. Int. J. Hous. Mark. Anal. 2019, 12, 1072–1092.
- Soltani-Fesaghandis, G.; Pooya, A. Design of an artificial intelligence system for predicting success of new product development and selecting proper market-entry strategy. Neural Comput. Appl. 2018, 30, 2465–2484.
- Makridakis, S.; Spiliotis, E.; Assimakopoulos, V. Statistical and machine learning forecasting methods: Concerns and ways forward. PLoS ONE 2018, 13, e0194889.
- Muthayya, S.; Sugimoto, J.D.; Montgomery, S.; Maberly, G.F. An overview of global rice production, supply, trade, and consumption. Ann. N. Y. Acad. Sci. 2014, 1324, 7–14.
- Valera, H.G.A. Is rice price a major source of inflation in the Philippines? A panel data analysis. Appl. Econ. Lett. 2022, 29, 1528–1532.
- Hastie, T.; Tibshirani, R.; Friedman, J. The Elements of Statistical Learning: Data Mining, Inference, and Prediction, 2nd ed.; Springer: New York, NY, USA, 2009.
- Box, G.E.P.; Jenkins, G.M.; Reinsel, G.C.; Ljung, G.M. Time Series Analysis: Forecasting and Control, 5th ed.; John Wiley & Sons: Hoboken, NJ, USA, 2016.
- Dickey, D.A.; Fuller, W.A. Distribution of the estimators for autoregressive time series with a unit root. J. Am. Stat. Assoc. 1979, 74, 427–431.
- Cohen, J.; Cohen, P.; West, S.G.; Aiken, L.S. Applied Multiple Regression/Correlation Analysis for the Behavioral Sciences, 3rd ed.; Lawrence Erlbaum Associates: Mahwah, NJ, USA, 2003.
- Pesaran, M.H.; Shin, Y.; Smith, R.J. Bounds testing approaches to the analysis of level relationships. J. Appl. Econom. 2001, 16, 289–326.
- Engle, R.F.; Granger, C.W.J. Co-integration and error correction: Representation, estimation, and testing. Econometrica 1987, 55, 251–276.
- Breiman, L. Random forests. Mach. Learn. 2001, 45, 5–32.
- Saeys, Y.; Abeel, T.; Van de Peer, Y. Robust feature selection using ensemble feature selection techniques. In Joint European Conference on Machine Learning and Knowledge Discovery in Databases; Springer: Berlin/Heidelberg, Germany, 2008; pp. 313–325.
- Granger, C.W.J. Investigating causal relations by econometric models and cross-spectral methods. Econometrica 1969, 37, 424–438.
- Capitanio, F.; Rivieccio, G.; Adinolfi, F. Food Price Volatility and Asymmetries in Rural Areas of South Mediterranean Countries: A Copula-Based GARCH Model. Int. J. Environ. Res. Public Health 2020, 17, 5855.
- Pal, A.; Wong, W.-K. Financial time series forecasting: A comprehensive review of signal processing and optimization-driven intelligent models. Comput. Econ. 2025, 1–27.
- Friedman, J.H. Greedy function approximation: A gradient boosting machine. Ann. Stat. 2001, 29, 1189–1232.
- Chen, T.; Guestrin, C. XGBoost: A scalable tree boosting system. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Francisco, CA, USA, 13–17 August 2016; pp. 785–794.
- Vapnik, V.N. The Nature of Statistical Learning Theory; Springer: New York, NY, USA, 1995.
- Hochreiter, S.; Schmidhuber, J. Long short-term memory. Neural Comput. 1997, 9, 1735–1780.
- Rumelhart, D.E.; Hinton, G.E.; Williams, R.J. Learning representations by back-propagating errors. Nature 1986, 323, 533–536.
- Lin, T.; Horne, B.G.; Tino, P.; Giles, C.L. Learning long-term dependencies in NARX recurrent neural networks. IEEE Trans. Neural Netw. 1996, 7, 1329–1338.
- Jang, J.S.R. ANFIS: Adaptive-network-based fuzzy inference system. IEEE Trans. Syst. Man Cybern. 1993, 23, 665–685.
- Lundberg, S.M.; Erion, G.; Chen, H.; DeGrave, A.; Prutkin, J.M.; Nair, B.; Lee, S.I. From local explanations to global understanding with explainable AI for trees. Nat. Mach. Intell. 2020, 2, 56–67.
- Lundberg, S.M.; Lee, S.I. A unified approach to interpreting model predictions. Adv. Neural Inf. Process. Syst. NeurIPS 2017, 30, 4765–4774.
| Year | Category | Methodological Approach | Content |
|---|---|---|---|
| 2024 | Relationship and Causality | The ARDL (Autoregressive Distributed Lag) method was employed. | Özçelik and Uslu investigate the determinants of food inflation within the Turkish economy [2]. Based on ARDL modeling, the study finds that the Consumer Price Index for Food and Non-Alcoholic Beverages is positively influenced by the Domestic Producer Price Index for Agriculture, Forestry, and Fishing (UFET) and the Consumer Price Index for Electricity, Gas, and Other Fuels (TUFEE), while the Real Effective Exchange Rate based on CPI (REDK) exerts a negative impact. Furthermore, results from the ARDL Error Correction Model, which examines short-term dynamics, indicate that short-term imbalances are corrected in the long run. |
| 2023 | Relationship and Causality | Panel Structural Vector Autoregression (PSVAR) technique to assess inflation effects | Fan et al. [7] explore the relationship between information asset investments and inflation. Utilizing the PSVAR method, they analyze both short- and long-term dynamics. Their findings suggest that low to moderate inflation levels are positively correlated with the market value of R&D firms, whereas high inflation has a negative effect. |
| 2023 | Relationship and Causality | Granger causality test to examine the PPI-CPI relationship | Cerveny [8] investigates the link between Producer Price Index (PPI) and Consumer Price Index (CPI) in the Czech Republic and the Eurozone. Applying the Granger causality test, the study reveals that PPI influences CPI in the Czech Republic, whereas no such causal relationship is observed in the Eurozone. |
| 2020 | Relationship and Causality | Panel cointegration and panel causality tests | Ozpolat [9] analyzes the causal relationship between CPI and PPI in Central and Eastern European Countries (CEECs), using panel cointegration and panel causality tests. The results indicate a long-term, bidirectional causality between CPI and PPI in these countries. |
| 2018 | Relationship and Causality | Econometric methods: DF-GLS unit root test, Johansen and Engle-Granger cointegration approaches, VAR model | Oyeleke and Ojediran [10] examine the relationship between PPI and CPI in Nigeria using various econometric techniques. The DF-GLS unit root test is applied to assess stationarity, Johansen and Engle-Granger methods are used for long-run cointegration, and a VAR model is employed to analyze interactions. The study concludes that the PPI-CPI relationship in Nigeria does not follow a simple cause-effect pattern and lacks a long-term equilibrium relationship. |
| 2016 | Relationship and Causality | Non-parametric regression using the LOESS technique | Akmercan [11] investigates the relationships among household expenditures, income, and OECD household size data using the LOESS (Locally Estimated Scatterplot Smoothing) non-parametric regression method. Essential consumption items are aggregated into a single expenditure category for analysis. |
| 2016 | Relationship and Causality | Comparison of ordered and unordered discrete choice models (LOGIT and PROBIT) | Oktay [12] analyzes factors influencing household fuel choices for heating in Türkiye using TÜİK data. The study compares ordered and unordered discrete choice models, particularly LOGIT and PROBIT variants. Model performance is assessed using OLOGIT, GOLOGIT, PPO, HOLOGIT, AIC, BIC, and MNL statistics to determine the most suitable approach. |
| 2016 | Relationship and Causality | Panel data analysis and Dumitrescu-Hurlin panel causality test | Chih-Ping Yu [13] first applies panel data analysis to explore the general dynamics between CPI and PPI, then uses the Dumitrescu-Hurlin panel causality test for a deeper investigation into the causal nature of this relationship. This dual approach allows for a more nuanced understanding of inconsistencies in CPI-PPI transmission across countries. |
| 2014 | Relationship and Causality | Correlation, regression, ANOVA, and coefficient of determination (R2) | Galodikwe [14] investigates the PPI-CPI relationship using correlation analysis, regression models, ANOVA, and the coefficient of determination. The findings confirm that PPI indices significantly influence CPI indices. |
| 2001 | Relationship and Causality | Limitations of OLS and use of Tobit models | Emeç [15] examines household consumption expenditures, highlighting the limitations of the Ordinary Least Squares (OLS) method when applied to continuous or ordinal dependent variables across regions. As a solution, Tobit models are suggested, where zero expenditures are bounded at zero, and certain continuous variables are categorized to fit ordered logit models. Results are interpreted in the context of Engel curves. |
| 2022 | Relationship and Causality | Combined econometric (ARDL) and machine learning (Support Vector Machine) approach; hybrid model proposed. VIF test used to avoid multicollinearity. Evaluation Metrics: RMSE, MAE, R2 | Özden [16] investigates macroeconomic and financial determinants of Türkiye’s export-import ratio using both econometric and machine learning methods. The ARDL model is applied to monthly data (2010–2021) on normalized GDP, exchange rate, CPI, PPI, crude oil prices, and trade ratio. Trends of each variable are presented. A VIF test confirms no multicollinearity issues. Subsequently, Support Vector Machine (SVM) is used to capture complex patterns. Results from ARDL, SVM, and a hybrid ARDL-SVM model are compared using RMSE, MAE, and R2. The hybrid model, supported by machine learning, demonstrates superior performance in capturing variable interactions. |
| 2017 | Prediction Model | Poisson Quasi Maximum Likelihood estimation; Bootstrap validation test | Selim and Balyaner [17] estimate the number of information technology devices owned by households using the Poisson Quasi Maximum Likelihood (PQML) estimation method. The validity of the model is assessed through bootstrap resampling techniques. |
| 2023 | Prediction Model | Comparison of Random Forest, Gradient Boosting, SVM, Neural Networks, and AdaBoost. Evaluation Metrics: MSE, RMSE, MAE, R2 | Atalan [18] evaluates economic, social, and environmental factors affecting unit prices of milk in Türkiye. Five machine learning algorithms—Random Forest, Gradient Boosting, Support Vector Machine (SVM), Artificial Neural Network, and AdaBoost—are used for price prediction. Performance is assessed using MSE, RMSE, MAE, and R2. Random Forest yields the best results. Additionally, Random Forest performance is reported across tree counts ranging from 10 to 2000. |
| 2022 | Prediction Model | System dynamics model for energy efficiency and resource optimization in the food and beverage industry | Katsumbe [19] proposes a system dynamics model to optimize energy efficiency and resource use in the food and beverage sector. Separate sub-models are developed for water, electricity, and production lines, with input variables defined for each. Total consumption is formulated and compared against a baseline. The model is used to simulate one-year forecasts. |
| 2022 | Prediction Model | SARIMA model for forecasting food and beverage prices in Kenya, accounting for seasonality. Evaluation Metrics: MSE, MAE, MAPE, Theil’s U statistic | Wanjuki et al. [20] propose a model for forecasting food and beverage prices in Kenya. Given seasonal fluctuations, the Seasonal Autoregressive Integrated Moving Average (SARIMA) model is employed. Model accuracy is evaluated using MSE, MAE, MAPE, and Theil’s U statistic. High predictive accuracy is achieved, and the model is recommended for short-term price forecasting in the food and beverage sector. |
| 2022 | Prediction Model | Multiple regression model implemented. | Warren-Vega et al. [21] develop a multiple regression model to forecast agave (a key input in tequila production) prices. Variables include rainfall, harvest volume, tequila production, costs, exchange rates, and export volumes. The model shows strong predictive performance (R = 0.86). |
| 2022 | Prediction Model | Comparison of deep learning models: DA-RNN, NARX-RNN, MV-LSTM. Evaluation Metrics: RMSE, MAE, MAPE | Ji et al. [22] investigate deep learning approaches for forecasting wholesale agricultural prices in China. The Dual-Stage Attention-Based Recurrent Neural Network (DA-RNN) outperforms NARX-RNN and MV-LSTM models. Performance is evaluated using RMSE, MAE, and MAPE. |
| 2022 | Prediction Model | ARCH and GARCH models for forecasting prices of food items (tomato, garlic, okra, pepper) | Venkateswara Rao et al. [23] present a regression-based multivariate approach to forecast prices of key food commodities. Emphasizing the importance of price volatility for governments, producers, and consumers, they apply ARCH (Autoregressive Conditional Heteroskedasticity) and GARCH (Generalized ARCH) models. While ARCH generally yields more consistent results, GARCH performs better for certain items. |
| 2022 | Prediction Model | Random Forest with three cross-validation techniques: temporal, spatial, spatiotemporal | Kresova and Hess [24] analyze factors influencing raw milk prices in Russia using 17 variables. Feature selection is performed using Boruta analysis, confirming all variables as relevant. The Random Forest model is tested with three cross-validation strategies: temporal (for time-series), spatial (for geographical), and spatiotemporal (combined). The spatiotemporal approach is found to be the most effective. |
| 2021 | Prediction Model | NARXNN model for forecasting food demand | Lutoslawski et al. [25] employ the Nonlinear Autoregressive Exogenous Neural Network (NARXNN) model to forecast food demand. The study highlights that NARXNN, commonly used in time series forecasting, provides more accurate predictions than traditional regression models. |
| 2021 | Prediction Model | Backpropagation-trained ANN model for CPI forecasting. Evaluation: MAPE | Sarangi et al. [26] aim to forecast the Consumer Food Price Index (CFPI) in India using a machine learning approach. A backpropagation-trained Artificial Neural Network (ANN) is implemented using the Zaitun statistical software. MAPE values are used to validate model accuracy, which is reported to be very high, indicating strong predictive performance. |
| 2020 | Prediction Model | ANN, Random Forest, and XGBoost models. Evaluation Metrics: R2, MAE, RMSE | Tosun [27] forecasts fresh fruit and vegetable imports for OECD countries using data mining and machine learning techniques. ANN, Random Forest, and XGBoost models are applied and compared using R2, RMSE, and MAE. XGBoost demonstrates the best overall performance. |
| 2020 | Prediction Model | Applicability of ANN, SVM, genetic algorithms, and hybrid techniques in stock price forecasting | Strader et al. [28] conduct a study on stock price forecasting. Their findings suggest that: Artificial Neural Networks (ANN) are best suited for predicting numerical stock index values; Support Vector Machines (SVM) perform well in classification tasks, such as predicting market direction; Hybrid machine learning techniques may overcome limitations of single-method approaches. |
| 2019 | Prediction Model | Superiority of ANN over logarithmic regression | Selim and Demirkıran [29] analyze household budget survey data from TÜİK to identify factors affecting food expenditures and track temporal changes. They develop predictive models using logarithmic regression and Artificial Neural Networks (ANN). Results show that the ANN model outperforms the semi-logarithmic regression model in forecasting accuracy. |
| 2019 | Prediction Model | Comparison of ANN, SVM, and ARIMA models | Abidoye et al. [30] collect data on factors influencing real estate prices in Hong Kong and apply ARIMA, ANN, and SVM models. The models are used for out-of-sample forecasting. The ANN model outperforms both SVM and ARIMA in predictive accuracy. |
| 2018 | Prediction Model | ANFIS (Adaptive Neuro-Fuzzy Inference System) combining fuzzy logic and neural networks | Soltani-Fesaghandis and Pooya [31] design an AI system to predict the success of new food products. The ANFIS algorithm integrates fuzzy logic and neural networks, processing data from diverse sources such as market research and social media to forecast product performance. |
| 2018 | Prediction Model | Evaluation of machine learning as an alternative to statistical methods in time series forecasting | Makridakis et al. [32] assess machine learning methods as alternatives to traditional statistical approaches in time series forecasting. Eight classical statistical methods and ten machine learning techniques are compared using sMAPE. The results show that statistical methods generally outperform machine learning models. However, the authors note that recent advancements may soon close this gap. |
| Item Code | Description | Rank | Correlation Value |
|---|---|---|---|
| TP FG J053 | 053.Household Appliances | 1 | 0.99891 |
| TP FG J051 | 051.Furniture, Furnishings, Carpets And Other Floor Coverings | 2 | 0.998844 |
| TP FG J056 | 056.Goods And Services For Household Maintenance | 3 | 0.997916 |
| Item Code | Description | Rank | Correlation Value |
|---|---|---|---|
| TP FG J127 | 127.Other Services N.E.C. | 1 | 0.99776 |
| TP FG J124 | 124.Social Protection | 2 | 0.997718 |
| TP FG J062 | 062.Outpatient Services | 3 | 0.997687 |
| Item Code | Description | Rank | Correlation Value |
|---|---|---|---|
| TP FG J127 | 127.Other Services N.E.C. | 1 | 0.970884 |
| TP FG J124 | 124.Social Protection | 2 | 0.970381 |
| TP FG J062 | 062.Outpatient Services | 3 | 0.97034 |
| Item Code | Description | Rank | Average Correlation |
|---|---|---|---|
| TP FG J056 | 056.Goods And Services For Household Maintenance | 1 | 0.988253667 |
| TP FG J012 | 012.Non-Alcoholic Beverages | 2 | 0.988152 |
| TP FG J062 | 062.Outpatient Services | 3 | 0.988128 |
| Item Code | Description | Lag Period | Correlation Value |
|---|---|---|---|
| TP FG J053 | 053.Household Appliances | 1 | 0.999222306 |
| TP FG J051 | 051.Furniture, Furnishings, Carpets And Other Floor Coverings | 1 | 0.998482096 |
| TP FG J012 | 012.Non-Alcoholic Beverages | 1 | 0.997243669 |
| Item Code | Description | Coefficient | Std Error | p-Value |
|---|---|---|---|---|
| TP FG J062.L0 | 062.Outpatient Services | 0.0092 | 0.004 | 0.034 |
| TP FG J061.L0 | 061.Medical Products, Appliances And Equipment | 0.0081 | 0.003 | 0.028 |
| TP FG J083.L1 | 083.Telephone And Telefax Services | 0.0079 | 0.003 | 0.015 |
| Metric | Value | Interpretation |
|---|---|---|
| R2 | 0.7867 | Strong explanatory power |
| Adjusted R2 | 0.7128 | Confirms model parsimony |
| F-statistic | 10.65 (p < 0.001) | Highly significant |
| Bounds Test F-statistic | 25.14 | Strong cointegration evidence (I(1) bound = 4.35) |
| Durbin-Watson | 2.47 | No severe autocorrelation |
| Item Code | Description | Cointegration Statistic | p-Value | Critical Value |
|---|---|---|---|---|
| TP FG J105 | 105.Education Programmes Of Unspecified Level | −5.59518832 | 0.0000115 | −3.3777 |
| TP FG J124 | 124.Social Protection | −5.239523044 | 0.0000583 | −3.3777 |
| TP FG J072 | 072.Operation Of Personal Transport Equipment | −4.565730017 | 0.000953 | −3.3777 |
| Item Code | Description | Rank | Importance Score |
|---|---|---|---|
| TP FG J125 | 125.Insurance | 1 | 0.068466 |
| TP FG J061 | 061.Medical Products, Appliances And Equipment | 2 | 0.039167 |
| TP FG J073 | 073.Transport Services | 3 | 0.036515 |

| Item Code | Description | Method | Lag (Months) |
|---|---|---|---|
| TP FG J053 | 053. Household Appliances | Pearson | 1 |
| TP FG J051 | 051. Furniture, Furnishings, Carpets And Other Floor Coverings | Pearson | 0 |
| TP FG J073 | 073. Transport Services | Spearman | 6 |
| TP FG J127 | 127. Other Services N.E.C. | Spearman, Kendall Tau | 0 |
| TP FG J124 | 124. Social Protection | Spearman, Kendall Tau, Cointegration | 0 |
| TP FG J062 | 062. Outpatient Services | Kendall Tau, ARDL | 0 |
| TP FG J105 | 105. Education Programmes Of Unspecified Level | Cointegration | 0 |
| TP FG J061 | 061. Medical Products, Appliances And Equipment | ARDL, Random Forest | 0 |
| TP FG J125 | 125. Insurance | Random Forest | 0 |
| TP FG J011 | 011. Food | PACF | 1 |

| Rank | Model | MAE | RMSE | R² | Performance Status |
|---|---|---|---|---|---|
| 1 | XGBoost | 1.6834 | 2.0684 | 0.9324 | Excellent |
| 2 | NARX-RNN (6-Lag) | 1.8363 | 2.6521 | 0.8902 | Excellent |
| 3 | Ridge Regression | 2.3353 | 2.8348 | 0.8729 | Excellent |
| 4 | Gradient Boosting | 2.2860 | 2.9920 | 0.8585 | Excellent |
| 5 | ANFIS | 2.3823 | 2.9998 | 0.8577 | Very Good |
| 6 | LSTM | 2.1275 | 3.0936 | 0.8487 | Very Good |
| 7 | Random Forest | 3.7044 | 4.6439 | 0.6590 | Good |
| 8 | SVR (RBF) | 5.0378 | 6.2084 | 0.3906 | Moderate |
| 9 | ANN (MLP) | 6.0903 | 6.8023 | 0.2684 | Poor |
| 10 | Linear Regression | 7.1239 | 8.4310 | −0.1238 | Poor |
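
The negative R² reported for Linear Regression in the ranking above is not an error: R² falls below zero whenever a model's squared error exceeds that of simply predicting the mean of the test targets. The sketch below illustrates the three metrics in plain Python; the toy values and function names are hypothetical and are not data from the study.

```python
import math


def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root mean squared error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Hypothetical toy targets and two sets of predictions.
y_true = [10.0, 12.0, 14.0, 16.0]
good_pred = [11.0, 12.0, 13.0, 16.0]   # tracks the targets closely
bad_pred = [16.0, 14.0, 12.0, 10.0]    # worse than predicting the mean (13.0)

print(mae(y_true, good_pred), rmse(y_true, good_pred), r2(y_true, good_pred))
print(r2(y_true, bad_pred))
```
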
| Model | Optimized Hyperparameters |
|---|---|
| XGBoost | n_estimators: 100, learning_rate: 0.1, max_depth: 5, reg_lambda: 1, subsample: 1.0 |
| Ridge Regression | alpha: 1.0 |
| LSTM | units: 64, learning_rate: 0.001, epochs: 50, batch_size: 16 |
| Gradient Boosting | n_estimators: 200, learning_rate: 0.01, max_depth: 3 |
| SVR (RBF) | C: 1, epsilon: 0.01, gamma: 'scale', kernel: 'rbf' |
| Random Forest | n_estimators: 100, max_depth: None, min_samples_split: 2 |
| NARX-RNN | hidden_size: 32, learning_rate: 0.001, epochs: 100, n_lags: 6 |
| ANN (MLP) | hidden_layer_sizes: (50, 50), activation: 'relu', alpha: 0.0001, learning_rate: 'adaptive' |
| Linear Regression | Default parameters (No regularization) |

| Rank | Feature | Description | Mean SHAP Importance | Std. Dev. |
|---|---|---|---|---|
| 1 | TP FG J125 | 125. Insurance | 0.0072 | 0.0087 |
| 2 | TP FG J073 | 073. Transport Services | 0.0066 | 0.0071 |
| 3 | TP FG J011 | 011. Food (Lag 1) | 0.0052 | 0.0038 |
| 4 | TP FG J053 | 053. Household Appliances | 0.0039 | 0.0022 |
| 5 | TP FG J062 | 062. Outpatient Services | 0.0034 | 0.0025 |

| Metric | Value | Interpretation |
|---|---|---|
| R² Score | 0.8703 | High variance explanation despite volatility |
| MAE | 1.8158 TL | Low average deviation from actual prices |
| RMSE | 4.1519 TL | Penalizes large errors during shock periods (e.g., 2022) |
| MAPE | 5.75% | Excellent relative accuracy (<10%) |
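
MAPE, reported in the last row above, is the only scale-free metric in the table. The sketch below shows how it can be computed inside a walk-forward loop in which each forecast uses only the observations that precede it. It assumes an expanding training window and uses a naive last-value forecaster purely as a stand-in for the fitted model; the function names and the price series are hypothetical.

```python
def mape(actual, predicted):
    """Mean absolute percentage error, in percent (actual values must be non-zero)."""
    return 100.0 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)


def walk_forward(series, min_train, fit_predict):
    """Expanding-window validation: forecast series[t] from series[:t] only."""
    actual, predicted = [], []
    for t in range(min_train, len(series)):
        history = series[:t]                 # no information from t onwards
        predicted.append(fit_predict(history))
        actual.append(series[t])
    return actual, predicted


def naive_last_value(history):
    """Stand-in forecaster (assumption): repeat the most recent observation."""
    return history[-1]


# Hypothetical monthly price index values.
prices = [100.0, 104.0, 110.0, 121.0, 115.0, 126.5]

actual, predicted = walk_forward(prices, min_train=3, fit_predict=naive_last_value)
score = mape(actual, predicted)
print(predicted, round(score, 2))
```

In a real run, `fit_predict` would refit or update the chosen model on `history` at every step, which is what keeps future observations from leaking into each forecast.
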
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2026 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license.
Şenel, U.T.; Arıcı, N.; Narin, M.; Polat, H. From Feature Selection to Forecasting: A Two-Stage Hybrid Framework for Food Price Prediction Using Economic Indicators in Türkiye. Sustainability 2026, 18, 503. https://doi.org/10.3390/su18010503

