Analyzing Internal and External Factors in Livestock Supply Forecasting Using Machine Learning: Sustainable Insights from South Korea
Abstract
1. Introduction
- We propose a methodology to forecast pork supply using both internal and external factors. Internal factors include breeding count, feeding cost, shipment cost, and production size, covering the entire process from breeding to market delivery. External factors include market price, weather, exchange rates, gasoline prices, and disease outbreaks. Unlike previous methods, we consider a broader range of factors to create a more comprehensive method for forecasting livestock indexes.
- We propose SFE-NET, a stacked forest ensemble neural network, for more accurate and robust pork supply forecasting. SFE-NET uses a two-layer training process. The first layer trains member methods, specifically Random Forest (RFR), Gradient Boosting (GBR), and XGBoost (XGBR), on the same data. The second layer trains a neural network on the output of these methods. SFE-NET combines the strengths of different methods and reduces their weaknesses, resulting in more reliable forecasts for both short and long periods.
- We conducted experiments to compare the effectiveness of the proposed method with various state-of-the-art and well-established methods. These experiments evaluated how forecasting accuracy changes when using only internal factors, both internal and external factors, and selected factors. SFE-NET was tested on daily, weekly, and monthly forecasting scenarios, achieving R² scores of 91%, 84%, and 88%, respectively. Adding external factors improved R² by 2% to 12% compared to using only internal factors. SFE-NET outperformed the comparative methods by 1% to 18% in R².
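The two-layer idea behind SFE-NET can be illustrated with a minimal sketch, assuming scikit-learn's `StackingRegressor` as the stacking mechanism. This is not the authors' implementation: the data are synthetic stand-ins, XGBoost is left out so the example depends only on scikit-learn, and `StackingRegressor` trains the second layer on cross-validated member predictions, which may differ from the procedure used in the paper. The hyperparameter values are borrowed from the "Selected" column of the tuning table.

```python
import numpy as np
from sklearn.ensemble import (
    GradientBoostingRegressor,
    RandomForestRegressor,
    StackingRegressor,
)
from sklearn.neural_network import MLPRegressor

# Synthetic stand-in data (NOT the pork supply dataset): 200 samples, 5 features.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(scale=0.1, size=200)

# First layer: tree-ensemble members, all trained on the same data.
# An XGBoost regressor could be appended as a third member (assumption:
# the xgboost package is installed); it is omitted here on purpose.
members = [
    ("rfr", RandomForestRegressor(n_estimators=50, max_depth=15, random_state=0)),
    ("gbr", GradientBoostingRegressor(
        n_estimators=60, max_depth=14, min_samples_leaf=4, random_state=0)),
]

# Second layer: a small neural network trained on the members' predictions.
meta = MLPRegressor(
    hidden_layer_sizes=(30,), learning_rate_init=5e-4, max_iter=2000, random_state=0)

stack = StackingRegressor(estimators=members, final_estimator=meta, cv=5)
stack.fit(X[:160], y[:160])          # train on the first 160 samples
predictions = stack.predict(X[160:])  # forecast the remaining 40
```

For time-ordered data such as daily supply, a chronological split (as used here, with the last samples held out) is usually preferable to a shuffled one.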
2. Related Studies
2.1. Methods Based on Internal Factors
2.2. Methods Based on External Factors
2.3. Methods Based on Internal and External Factors
3. Materials and Methods
3.1. Overview
3.2. Dataset
3.3. Correlation Analysis
3.4. Data Preprocessing
3.4.1. Mitigating Outliers
- The mean daily pork supply is about 3500 tons. However, there are days when supplies drop to nearly 3 tons.
- The mean feeding cost is approximately KRW 368 million. However, there are a few cases where the cost is around KRW 110 billion.
- The mean market sales weight is about 940,000 kg. However, there are a few days when it exceeds 18 million kg.
- Sort the data points.
- Set the minimum and maximum percentages of values to replace.
- Replace the values that fall within the minimum percentage with the smallest remaining value.
- Replace the values that fall within the maximum percentage with the largest remaining value.
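The replacement steps above correspond to what is commonly called winsorization. The following is a minimal sketch in plain Python; the function name `winsorize`, the fraction parameters `lower_pct` and `upper_pct`, and the sample values are illustrative assumptions, not taken from the paper.

```python
def winsorize(values, lower_pct, upper_pct):
    """Replace the lowest and highest fractions of values with the
    nearest retained value (fractions are given in the range 0 to 1)."""
    n = len(values)
    k_low = int(n * lower_pct)    # how many of the smallest values to replace
    k_high = int(n * upper_pct)   # how many of the largest values to replace
    if k_low + k_high >= n:
        raise ValueError("percentages leave no values to keep")
    ordered = sorted(values)
    low_bound = ordered[k_low]             # smallest value that is kept
    high_bound = ordered[n - 1 - k_high]   # largest value that is kept
    # Clip every value into [low_bound, high_bound], preserving input order.
    return [min(max(v, low_bound), high_bound) for v in values]


# Made-up daily supply values (tons) with one very low and one very high day.
supply = [3, 3400, 3500, 3550, 3600, 3450, 3520, 3480, 3510, 18000]
cleaned = winsorize(supply, lower_pct=0.1, upper_pct=0.1)
```

With ten values and 10% on each side, only the single smallest and single largest values are replaced; all other values pass through unchanged.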
3.4.2. Filling Missing Values
3.4.3. Data Normalization
3.5. SFE-NET: Stacked Forest Ensemble Neural Network
4. Results
4.1. Experimental Setup
- scikit-learn (version 1.1.3) for implementing RFR, GBR, and SFE-NET;
- xgboost (version 1.7.1) for implementing XGBR;
- lightgbm (version 4.4.0) for implementing LGBMR;
- gplearn (version 0.4.2) for implementing SYMR;
- scikit-elm (version 0.21) for implementing ELMR;
- scipy (version 1.9.3) for implementing statistical tests.
4.1.1. Dataset
4.1.2. Comparative Methods
- RFR combines multiple decision trees to perform robust and accurate predictions. It can handle high-dimensional data and provide feature importance insights.
- GBR is considered highly accurate for non-linear data with various characteristics. It is also effective for handling noisy data, including outliers.
- XGBR includes several regularization techniques to avoid overfitting, which makes it efficient for complex datasets.
- LGBMR is widely used in classification and regression due to its high efficiency and scalability in handling large-scale and complex datasets.
- SYMR searches the space of mathematical expressions to find the one that best fits a given dataset. It discovers both the structure and the parameters of the method.
- ELMR is a neural network in which only the weights between the hidden and output layers are learned. It offers a fast and efficient training process compared to traditional neural networks.
4.1.3. Evaluation Metrics
- The Coefficient of Determination (R²) measures the proportion of the variance in the actual values that the forecasted values explain. We report R² on a percentage scale by multiplying the original value by 100. A value of 0 means the method explains nothing (it is no better than predicting the mean), while 100 means the method explains the actual values perfectly.
- Root Mean Squared Error (RMSE) is the square root of the mean squared prediction error, expressed in the same units as the forecasted quantity. It penalizes large errors more heavily than small ones. Lower RMSE values indicate better method performance.
- Mean Absolute Error (MAE) measures the average magnitude of the errors and is less sensitive to outliers than RMSE. Lower MAE values indicate better method performance.
- Symmetric Mean Absolute Percentage Error (sMAPE) measures forecast error as a percentage relative to the magnitudes of the actual and forecasted values, which makes it comparable across time series of different scales. Lower sMAPE values indicate better method performance.
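The four metrics can be written out as a short sketch in plain Python. The function names and sample numbers are illustrative. The source does not give the sMAPE formula, so the common variant that divides by the mean of the absolute actual and forecasted values is assumed here.

```python
import math


def r2_percent(actual, forecast):
    """Coefficient of determination on a percentage scale (can be negative)."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - f) ** 2 for a, f in zip(actual, forecast))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 100.0 * (1.0 - ss_res / ss_tot)


def rmse(actual, forecast):
    """Square root of the mean squared error."""
    return math.sqrt(sum((a - f) ** 2 for a, f in zip(actual, forecast)) / len(actual))


def mae(actual, forecast):
    """Mean of the absolute errors."""
    return sum(abs(a - f) for a, f in zip(actual, forecast)) / len(actual)


def smape(actual, forecast):
    """Symmetric MAPE in percent (assumed variant: mean-of-magnitudes denominator)."""
    terms = [abs(a - f) / ((abs(a) + abs(f)) / 2.0) for a, f in zip(actual, forecast)]
    return 100.0 * sum(terms) / len(terms)


# Made-up values for illustration only.
actual = [100.0, 200.0, 300.0]
forecast = [110.0, 190.0, 310.0]
scores = {
    "R2": r2_percent(actual, forecast),
    "RMSE": rmse(actual, forecast),
    "MAE": mae(actual, forecast),
    "sMAPE": smape(actual, forecast),
}
```

Because every error in this example has the same magnitude, RMSE and MAE coincide; they diverge as soon as some errors are much larger than others.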
4.2. Experimental Results
4.2.1. Results of Hyperparameter Tuning
4.2.2. Results of Daily Pork Supply Forecasting
4.2.3. Results of Weekly Pork Supply Forecasting
4.2.4. Results of Monthly Pork Supply Forecasting
4.2.5. Ablation Study on SFE-NET
4.2.6. Comparison of All Results and Discussion
- External features can significantly improve pork supply forecasting performance (see Experiments 2 and 6).
- Carefully selected relevant features can improve the performance of pork supply forecasting (see Experiments 3, 4, and 7).
- Selecting optimal hyperparameters is an essential step to achieving good forecasting performance (see Experiment 4).
- The proposed SFE-NET method matches or outperforms all base methods on R² and RMSE in every experiment, and on MAE and sMAPE in all experiments except Experiment 7, where XGBR is marginally better.
- The SFE-NET method provides accurate and robust forecasting performance over both short and long periods.
5. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
References
1. Agricultural Output—Meat Consumption—OECD Data. Available online: https://data.oecd.org/agroutput/meat-consumption.htm (accessed on 13 May 2024).
2. Park, S.-H.; Moon, K.-M. The Economic Effects of Research-led Agricultural Development Assistance: The Case of Korean Programs on International Agriculture. Sustainability 2019, 11, 5224.
3. Manitoba. Pork Market in South Korea, 2024. Available online: https://www.gov.mb.ca/agriculture/markets-and-statistics/trade-statistics/pubs/pork-market-in-south-korea.pdf (accessed on 28 May 2024).
4. Lim, H.; Ahn, B.-I. Asymmetric price transmission in the distribution channels of pork: Focusing on the effect of policy regulation of Sunday sales by hypermarkets in Korea. Agric. Econ. 2020, 66, 499–509.
5. Kim, J.; Han, H.-D.; Lee, W.Y.; Wakholi, C.; Lee, J.; Jeong, Y.-B.; Bae, J.H.; Cho, B.-K. Economic Analysis of the Use of VCS2000 for Pork Carcass Meat Yield Grading in Korea. Animals 2021, 11, 1297.
6. Cho, K.; Kim, H.; Kim, Y.; Kang, H.; Martínez-López, B.; Lee, J. Quantitative risk assessment of the African Swine Fever Introduction into the Republic of Korea via legal import of live pigs and pig products. Transbound. Emerg. Dis. 2020, 68, 385–396.
7. Zhang, F.; Wang, F.; Wang, F. Forecasting Model and Related Index of Pig Population in China. Symmetry 2021, 13, 114.
8. Gauthier, R.; Largouët, C.; Rozé, L.; Dourmad, J.-Y. Online forecasting of daily feed intake in lactating sows supported by offline time-series clustering, for precision livestock farming. Comput. Electron. Agric. 2021, 188, 106329.
9. Zhang, F.; Wang, F. Prediction of pork supply via the calculation of pig population based on population prediction model. Int. J. Agric. Eng. 2020, 13, 208–217.
10. Emediegwu, L.E.; Ubabukoh, C.L. Re-examining the impact of annual weather fluctuations on global livestock production. Ecol. Econ. 2023, 204, 107662.
11. Kim, H.N.; Choi, I.-C. The Economic Impact of Government Policy on Market Prices of Low-Fat Pork in South Korea: A Quasi-Experimental Hedonic Price Approach. Sustainability 2018, 10, 892.
12. Šrédl, K.; Prášilová, M.; Severová, L.; Svoboda, R.; Štěbeták, M. Social and Economic Aspects of Sustainable Development of Livestock Production and Meat Consumption in the Czech Republic. Agriculture 2021, 11, 102.
13. Ryu, G.-A.; Nasridinov, A.; Rah, H.; Yoo, K.-H. Forecasts of the Amount Purchase Pork Meat by Using Structured and Unstructured Big Data. Agriculture 2020, 10, 21.
14. Vu, T.N.; Ho, C.M.; Nguyen, T.C.; Vo, D.H. The Determinants of Risk Transmission between Oil and Agricultural Prices: An IPVAR Approach. Agriculture 2020, 10, 120.
15. Zafeiriou, E.; Arabatzis, G.; Karanikola, P.; Tampakis, S.; Tsiantikoudis, S. Agricultural Commodities and Crude Oil Prices: An Empirical Investigation of Their Relationship. Sustainability 2018, 10, 1199.
16. Rah, H.; Kim, H.-W.; Nasridinov, A.; Cho, W.-S.; Choi, S.; Yoo, K.-H. Threshold Effects of Infectious Disease Outbreaks on Livestock Prices: Cases of African Swine Fever and Avian Influenza in South Korea. Appl. Sci. 2021, 11, 5114.
17. Ma, Z.; Chen, Z.; Chen, T.; Du, M. Application of machine learning methods in pork price forecast. In Proceedings of the 2019 11th International Conference on Machine Learning and Computing, Zhuhai, China, 22–24 February 2019.
18. Song, H.; Zhang, H.; Yang, J.; Wang, J. Forecasting model for the number of breeding sows based on pig’s months of age transfer and improved flower pollination algorithm-back propagation neural network. Appl. Intell. 2024, 54, 5826–5858.
19. Yu, X.; Liu, B.; Lai, Y. Monthly Pork Price Prediction Applying Projection Pursuit Regression: Modeling, Empirical Research, Comparison, and Sustainability Implications. Sustainability 2024, 16, 1466.
20. Li, K.; Shen, N.; Kang, Y.; Chen, H.; Wang, Y.; He, S. Livestock product price forecasting method based on heterogeneous GRU neural network and energy decomposition. IEEE Access 2021, 9, 158322–158330.
21. Kim, I.; Kim, K.-S. Estimation of Water Footprint for Major Agricultural and Livestock Products in Korea. Sustainability 2019, 11, 2980.
22. Pang, J.; Yin, J.; Lu, G.; Li, S. Supply and Demand Changes, Pig Epidemic Shocks, and Pork Price Fluctuations: An Empirical Study Based on an SVAR Model. Sustainability 2023, 15, 13130.
23. Yuan, J.; Hao, J.; Liu, M.; Wu, D.; Li, J. A dynamic ensemble learning approach with spectral clustering for beef and lamb prices prediction. Procedia Comput. Sci. 2022, 214, 1190–1197.
24. Ko, E.; Jeong, K.; Oh, H.; Park, Y.; Choi, J.; Lee, E. A deep learning-based framework for predicting pork preference. J. Curr. Res. Food Sci. 2023, 6, 100495.
25. Taylor, C.; Guy, J.; Bacardit, J. Prediction of growth in grower-finisher pigs using recurrent neural networks. Biosyst. Eng. 2022, 220, 114–134.
26. Korea Institute for Animal Products Quality Evaluation. Available online: https://www.ekape.or.kr (accessed on 15 May 2024).
27. Korea Meteorological Administration. Available online: https://www.kma.go.kr (accessed on 15 May 2024).
28. Investing. Available online: https://kr.investing.com (accessed on 15 May 2024).
29. Korea National Oil Corporation. Available online: https://www.opinet.co.kr (accessed on 15 May 2024).
30. Ministry of Interior and Safety. Available online: https://www.data.go.kr (accessed on 15 May 2024).
31. Drachal, K.; Pawłowski, M. Forecasting Selected Commodities’ Prices with the Bayesian Symbolic Regression. Int. J. Financ. Stud. 2024, 12, 34.
32. Drachal, K. Forecasting the Crude Oil Spot Price with Bayesian Symbolic Regression. Energies 2023, 16, 4.
33. Ali, M.; Deo, R.C.; Xiang, Y.; Prasad, R.; Li, J.; Farooque, A.; Yaseen, Z.M. Coupled online sequential extreme learning machine model with ant colony optimization algorithm for wheat yield prediction. Sci. Rep. 2022, 12, 5488.
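Factor | Source | Name | Features | Samples | Date |
---|---|---|---|---|---|
Internal | EKAPE [26] | Supply | 2 | 2060 | 1 January 2016–10 August 2022 |
Internal | EKAPE [26] | Breeding | 11 | 1887 | 1 January 2016–10 August 2022 |
Internal | EKAPE [26] | Process | 10 | 2414 | 1 January 2016–10 August 2022 |
Internal | EKAPE [26] | Price | 5 | 1686 | 1 January 2016–10 August 2022 |
External | KMA [27] | Weather | 5 | 2354 | 1 January 2016–10 August 2022 |
External | INVESTING [28] | Exchange rate | 4 | 1782 | 1 January 2016–10 August 2022 |
External | OPINET [29] | Gasoline | 5 | 2414 | 1 January 2016–10 August 2022 |
External | DATAGO [30] | Disease | 2 | 568 | 1 January 2016–10 August 2022 |
Total (Excluding DateTime) | | | 35 | 2414 | |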
Feature | Count | Zeros | Mean | Min | 25% | 75% | Max |
---|---|---|---|---|---|---|---|
Packaged Purchase Weight | 2060 | 0 | 6,621,952.2 | 514.5 | 3,450,553.65 | 9,200,377.25 | 27,269,761.02 |
Packaged Purchase Count | 2060 | 1 | 66,655.46 | 0.0 | 34,445.75 | 92,929.5 | 122,017.0 |
Packaged Shipment Weight | 2060 | 0 | 4,526,624.91 | 13,185.46 | 2,503,470.91 | 6,231,381.2 | 11,654,205.66 |
Packaged Shipment Count | 2060 | 0 | 126,730.31 | 1050.0 | 69,540.5 | 169,443.5 | 238,523.0 |
Sales Purchase Weight | 2060 | 0 | 1,498,249.56 | 14,710.62 | 922,222.99 | 2,007,284.41 | 3,522,359.17 |
Sales Purchase Count | 2060 | 0 | 17,158.64 | 620.0 | 11,853.75 | 22,043.25 | 35,995.0 |
Sales Shipment Weight | 2060 | 0 | 906,030.35 | 1057.34 | 487,615.53 | 1,235,821.39 | 2,065,810.64 |
Sales Shipment Count | 2060 | 0 | 19,950.11 | 59.0 | 13,119.75 | 26,520.75 | 46,251.0 |
Wholesale Price | 2060 | 0 | 5322.05 | 2707.0 | 4350.5 | 6157.5 | 7528.0 |
Large Mart Price | 2060 | 0 | 20,256.35 | 12,033.0 | 17,645.17 | 22,819.25 | 29,830.0 |
Supermarket Price | 2060 | 0 | 21,432.71 | 18,100.0 | 19,806.0 | 23,230.0 | 26,140.0 |
Retail Price | 2060 | 0 | 20,899.57 | 14,476.0 | 18,581.75 | 23,022.0 | 29,590.0 |
Name | Type | Features | Training | Testing |
---|---|---|---|---|
Experiment 1 | Daily | Internal (22) | 1895 (2016–2021) | 165 (2022) |
Experiment 2 | Daily | Internal + External (35) | 1895 (2016–2021) | 165 (2022) |
Experiment 3 | Daily | Feature Selection (13) | 1895 (2016–2021) | 165 (2022) |
Experiment 4 | Daily | Parameter tuning (13) | 1895 (2016–2021) | 165 (2022) |
Experiment 5 | Weekly | Internal (22) | 313 (2016–2021) | 33 (2022) |
Experiment 6 | Weekly | Internal + External (35) | 313 (2016–2021) | 33 (2022) |
Experiment 7 | Monthly | Feature Selection (7) | 72 (2016–2021) | 10 (2022) |
Method | Parameter | Options | Selected | Note |
---|---|---|---|---|
RFR | n_estimators | 40, 50, 60, 100 | 50 | Number of trees in the forest |
RFR | max_depth | 10, 15, 20, 25 | 15 | Maximum depth of each tree |
RFR | min_samples_leaf | 1, 2, 3, 4 | 1 | Minimum number of samples required at a leaf node |
GBR | n_estimators | 50, 60, 70, 100 | 60 | Number of boosted trees |
GBR | max_depth | 12, 14, 16, 18 | 14 | Maximum depth of each tree |
GBR | min_samples_leaf | 1, 2, 3, 4 | 4 | Minimum number of samples required at a leaf node |
XGBR | n_estimators | 150, 180, 200 | 180 | Number of boosted trees |
XGBR | max_depth | 2, 4, 6, 8 | 6 | Maximum depth of each tree |
XGBR | learning_rate | 0.01, 0.03, 0.05 | 0.03 | Tuning parameter of optimization |
LGBMR | n_estimators | 100, 200, 500 | 500 | Number of boosted trees |
LGBMR | learning_rate | 0.1, 0.01, 0.001 | 0.01 | Tuning parameter of optimization |
LGBMR | max_depth | −1, 10, 20, 30 | −1 | Maximum depth of each tree (−1 means no limit) |
SYMR | generations | 5, 10, 20 | 5 | Number of generations to evolve |
SYMR | population_size | 100, 1000, 1500 | 1000 | Number of programs in each generation |
SYMR | init_depth | 2, 4, 6, 8, 10 | 2, 6 | Range of initial tree depths |
ELMR | n_neurons | 10, 13, 16, 20 | 13 | Number of hidden layer neurons |
ELMR | density | 0.1, 0.3, 0.5, 0.7 | 0.3 | Proportion of connections in the NN layer |
ELMR | ufunc | relu, tanh, sigm | relu | Transformation function of hidden layer neurons |
NN | hidden_layer_size | 20, 30, 40, 100 | 30 | Number of neurons in the hidden layer |
NN | learning_rate_init | 1 × 10⁻³, 1 × 10⁻⁴, 5 × 10⁻⁴ | 5 × 10⁻⁴ | Tuning parameter of optimization |
Method | Evaluation | Exp 1 | Exp 2 | Exp 3 | Exp 4 | Exp 5 | Exp 6 | Exp 7 |
---|---|---|---|---|---|---|---|---|
RFR | R² | 78.0 | 83.0 | 86.0 | 87.0 | 68.0 | 77.0 | 84.0 |
RFR | RMSE | 494.8 | 429.3 | 398.6 | 381.9 | 2044.1 | 1738.9 | 2674.5 |
RFR | MAE | 347.4 | 288.2 | 282.5 | 246.2 | 1609.8 | 1317.2 | 1568.7 |
RFR | sMAPE | 10.3 | 8.1 | 7.8 | 6.6 | 8.6 | 7.1 | 1.8 |
GBR | R² | 75.0 | 79.0 | 85.0 | 88.0 | 70.0 | 72.0 | 77.0 |
GBR | RMSE | 525.5 | 485.0 | 407.2 | 360.6 | 2001.1 | 1939.2 | 3266.3 |
GBR | MAE | 377.4 | 349.4 | 287.7 | 255.3 | 1556.4 | 1457.9 | 2174.0 |
GBR | sMAPE | 11.2 | 10.3 | 8.7 | 6.7 | 8.1 | 7.7 | 2.5 |
XGBR | R² | 78.0 | 80.0 | 87.0 | 88.0 | 63.0 | 72.0 | 87.0 |
XGBR | RMSE | 489.0 | 467.2 | 385.1 | 360.7 | 2211.4 | 1928.7 | 2444.4 |
XGBR | MAE | 343.0 | 325.0 | 276.5 | 251.7 | 1448.0 | 1475.3 | 1461.3 |
XGBR | sMAPE | 9.6 | 9.2 | 7.5 | 6.8 | 7.8 | 7.8 | 1.6 |
LGBMR | R² | 79.0 | 83.0 | 86.0 | 86.0 | 66.0 | 68.0 | 71.0 |
LGBMR | RMSE | 480.0 | 431.3 | 394.0 | 388.6 | 2119.9 | 2071.6 | 3654.0 |
LGBMR | MAE | 333.8 | 341.3 | 256.1 | 244.4 | 1565.2 | 1640.7 | 3465.8 |
LGBMR | sMAPE | 9.6 | 9.1 | 7.4 | 6.9 | 8.4 | 8.6 | 3.8 |
SYMR | R² | 76.0 | 79.0 | 80.0 | 83.0 | 63.0 | 76.0 | 78.0 |
SYMR | RMSE | 512.2 | 482.6 | 468.9 | 430.5 | 2203.7 | 1795.6 | 3152.9 |
SYMR | MAE | 397.4 | 357.0 | 332.8 | 326.0 | 1848.5 | 1508.1 | 2213.0 |
SYMR | sMAPE | 10.9 | 10.0 | 9.36 | 8.1 | 9.43 | 7.8 | 2.51 |
ELMR | R² | 71.0 | 75.0 | 77.0 | 78.0 | 66.0 | 78.0 | 84.0 |
ELMR | RMSE | 566.36 | 523.2 | 510.5 | 497.8 | 2108.9 | 1711.3 | 2710.6 |
ELMR | MAE | 441.16 | 399.1 | 355.6 | 330.1 | 1756.0 | 1404.1 | 2149.4 |
ELMR | sMAPE | 13.31 | 12.0 | 12.5 | 10.6 | 8.8 | 7.2 | 2.38 |
SFE-NET | R² | 81.0 | 84.0 | 87.0 | 91.0 | 72.0 | 84.0 | 88.0 |
SFE-NET | RMSE | 457.7 | 416.3 | 384.4 | 323.5 | 1924.3 | 1455.0 | 2289.3 |
SFE-NET | MAE | 318.5 | 274.4 | 269.8 | 235.9 | 1390.8 | 1144.0 | 1494.3 |
SFE-NET | sMAPE | 8.8 | 7.8 | 7.2 | 6.4 | 7.5 | 6.1 | 1.7 |
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Chuluunsaikhan, T.; Kim, J.-H.; Park, S.-H.; Nasridinov, A. Analyzing Internal and External Factors in Livestock Supply Forecasting Using Machine Learning: Sustainable Insights from South Korea. Sustainability 2024, 16, 6907. https://doi.org/10.3390/su16166907