Cotton Yield Prediction with Gaussian Distribution Sampling and Variational AutoEncoder
Abstract
1. Introduction
2. Materials and Methods
2.1. Study Areas
2.2. Datasets
2.3. Methods
2.3.1. Gaussian Distribution Data Augmentation
2.3.2. Variational AutoEncoder
2.3.3. Experiment Setup
- Dataset division settings. To verify the generalization ability of the model, we designed three data partitioning schemes. The first is the cross-year setting, which is applied to the 23 years of data from Bahawalnagar: the data from 1999 to 2016 form the training set and the data from 2017 to 2021 form the test set. As shown in Figure 1, the distribution of the Bahawalnagar cotton data fluctuates significantly over the 23 years, so this setting tests how well the model generalizes to future cotton data. The second is the cross-district setting, which is applied to the 81 districts in Turkey. The training districts and the test districts are disjoint, so this setting tests whether the yield prediction model can be extended to unseen districts. The third is the cross-district and cross-year setting, which is also applied to the Turkey data: the data from 2019 to 2021 form the training set and the data from 2022 to 2023 form the test set. This is the most demanding setting because the model must predict future yields in districts it has not seen. In both Turkey settings, the districts in the training set and the test set are completely disjoint; the cross-year setting uses the single Bahawalnagar district for both training and testing. All evaluation results are reported on the test set, and the training set is used only to fit the model. The statistical information of the dataset is shown in Table 2.
- Comparison benchmarks and assessment methods for cotton yield prediction. To verify the effectiveness of the proposed method, we compared it with nine baseline methods: five machine learning algorithms, namely linear regression (LR), random forest regression (RFR), support vector regression (SVR), ridge regression (RR), and logistic regression (La); and four deep learning models, namely multi-layer perceptron (MLP), multi-scale convolutional neural network (Multi-scale CNN), bidirectional long short-term memory network (Bi-LSTM), and Transformer [24]. To evaluate and compare the methods, we calculated three yield prediction metrics: root mean square error (RMSE), mean absolute error (MAE), and the coefficient of determination (R²). In standard notation, these three metrics are calculated as follows:

$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2}$$

$$\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n}\left|y_i - \hat{y}_i\right|$$

$$R^2 = 1 - \frac{\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2}{\sum_{i=1}^{n}\left(y_i - \bar{y}\right)^2}$$

where $y_i$ is the observed yield of test sample $i$, $\hat{y}_i$ is the predicted yield, $\bar{y}$ is the mean observed yield, and $n$ is the number of test samples.
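The three metrics named above (RMSE, MAE, and the coefficient of determination) can be computed in a few lines of Python. This is a minimal sketch that uses their standard definitions; it is not the authors' evaluation code, and the function names and sample numbers are illustrative assumptions.

```python
import math


def rmse(y_true, y_pred):
    """Root mean square error between observed and predicted yields."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)


def mae(y_true, y_pred):
    """Mean absolute error between observed and predicted yields."""
    n = len(y_true)
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / n


def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Hypothetical yields, chosen only to illustrate the arithmetic.
observed = [100.0, 200.0, 300.0]
predicted = [110.0, 190.0, 330.0]

print(f"RMSE = {rmse(observed, predicted):.2f}")
print(f"MAE  = {mae(observed, predicted):.2f}")
print(f"R2   = {r2(observed, predicted):.3f}")
```

R² is not bounded below by zero: it becomes negative whenever a model's squared error exceeds that of always predicting the mean observed yield, which is how negative R² values in a results table should be read.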
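The three partitioning schemes described in the dataset division item can be sketched as simple filters over (district, year) records. This is a hedged illustration, not the authors' pipeline: the record layout, the function names, and the choice of held-out district are assumptions, and the sketch assumes that in the combined setting any record belonging to a training district in a test year (or a test district in a training year) is left out of both sets.

```python
def cross_year_split(records, last_train_year):
    """Train on years up to last_train_year, test on later years."""
    train = [r for r in records if r["year"] <= last_train_year]
    test = [r for r in records if r["year"] > last_train_year]
    return train, test


def cross_district_split(records, test_districts):
    """Hold out whole districts so train and test districts are disjoint."""
    held_out = set(test_districts)
    train = [r for r in records if r["district"] not in held_out]
    test = [r for r in records if r["district"] in held_out]
    return train, test


def cross_district_and_year_split(records, test_districts, last_train_year):
    """Train on earlier years of training districts; test on later years of held-out districts."""
    held_out = set(test_districts)
    train = [r for r in records
             if r["district"] not in held_out and r["year"] <= last_train_year]
    test = [r for r in records
            if r["district"] in held_out and r["year"] > last_train_year]
    return train, test


# Toy records: one district over 1999-2021, and three districts over 2019-2023.
bahawalnagar = [{"district": "Bahawalnagar", "year": y} for y in range(1999, 2022)]
turkey = [{"district": d, "year": y}
          for d in ("Ceyhan", "Soke", "Harran")
          for y in range(2019, 2024)]

by_train, by_test = cross_year_split(bahawalnagar, last_train_year=2016)
cd_train, cd_test = cross_district_split(turkey, test_districts={"Harran"})
cdy_train, cdy_test = cross_district_and_year_split(
    turkey, test_districts={"Harran"}, last_train_year=2021)

print(len(by_train), len(by_test))
print(len(cd_train), len(cd_test))
print(len(cdy_train), len(cdy_test))
```

Splitting by whole district, rather than by row, is what keeps information about a test district from leaking into training.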
3. Results and Discussion
3.1. Contrastive Analysis
3.2. Importance Feature Analysis
3.3. Data Augmentation Effectiveness Analysis
4. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
References
- Khan, H.; Khan, N.; Khan, Z.; Han, Y.; Yang, B.; Lei, Y.; Zhi, X.; Xiong, S.; Shang, S.; Ma, Y.; et al. Water and heat resource utilization influence cotton yield through sowing date optimization under varied climate. Agric. Water Manag. 2025, 313, 109491.
- Chen, X.; Qi, Z.; Gui, D.; Gu, Z.; Ma, L.; Zeng, F.; Li, L. Simulating impacts of climate change on cotton yield and water requirement using RZWQM2. Agric. Water Manag. 2019, 222, 231–241.
- Liu, S.; Zhang, W.; Shi, T.; Li, T.; Li, H.; Zhou, G.; Wang, Z.; Ma, X. Increasing exposure of cotton growing areas to compound drought and heat events in a warming climate. Agric. Water Manag. 2025, 308, 109307.
- Subramanian, K.; Sarkar, M.K.; Wang, H.; Qin, Z.H.; Chopra, S.S.; Jin, M.; Kumar, V.; Chen, C.; Tsang, C.W.; Lin, C.S.K. An overview of cotton and polyester, and their blended waste textile valorisation to value-added products: A circular economy approach–research trends, opportunities and challenges. Crit. Rev. Environ. Sci. Technol. 2022, 52, 3921–3942.
- Zhang, Z.; Huang, J.; Yao, Y.; Peters, G.; Macdonald, B.; La Rosa, A.D.; Wang, Z.; Scherer, L. Environmental impacts of cotton and opportunities for improvement. Nat. Rev. Earth Environ. 2023, 4, 703–715.
- Ahmad, S.; Ahmad, I.; Ahmad, B.; Ahmad, A.; Wajid, A.; Khaliq, T.; Abbas, G.; Wilkerson, C.J.; Hoogenboom, G. Regional integrated assessment of climate change impact on cotton production in a semi-arid environment. Clim. Res. 2023, 89, 113–132.
- Rajput, P.K. Machine learning approach for Forest Biomass Modelling with In-Situ and Remote Sensing Data in Narmadapuram central India. Model. Earth Syst. Environ. 2025, 11, 350.
- Chen, W.; Xiang, X.; Liu, S.; Guo, J.; Li, T.; Zhou, X.; Peng, D.; Deng, Z.; Wang, B.; Wang, H.; et al. An integrated exergy efficiency and machine learning method for optimizing organic solid waste gasification process. Eng. Appl. Artif. Intell. 2025, 159, 111805.
- Ogbonna, C.; Ohabuka, C.; Bartholomew, D.; Anyiam, K.; Adamu, I. Optimizing Nigerian Bank Lending Systems: The Power of Discrete Wavelet Transform (DWT) in Denoising and Regression Analysis. Ann. Data Sci. 2025, 1–37.
- Menghani, G. Efficient deep learning: A survey on making deep learning models smaller, faster, and better. ACM Comput. Surv. 2023, 55, 1–37.
- Yang, S.; Du, Y.; Zheng, X.; Li, X.; Chen, X.; Li, Y.; Xie, C. Few-shot intent detection with self-supervised pretraining and prototype-aware attention. Pattern Recognit. 2024, 155, 110641.
- Archana, R.; Jeevaraj, P.E. Deep learning models for digital image processing: A review. Artif. Intell. Rev. 2024, 57, 11.
- Wang, H.; Wang, H. Research on Microseismic Magnitude Prediction Method Based on Improved Residual Network and Transfer Learning. Appl. Sci. 2025, 15, 8246.
- Ma, T.; Yu, J.; Wang, B.; Gao, M.; Yang, Z.; Li, Y.; Fan, M. A Power Monitor System Cybersecurity Alarm-Tracing Method Based on Knowledge Graph and GCNN. Appl. Sci. 2025, 15, 8188.
- Deng, G.; Zhou, F.; Dong, H.; Xu, Z.; Li, Y. Accurate Sugarcane Detection and Row Fitting Using SugarRow-YOLO and Clustering-Based Spline Methods for Autonomous Agricultural Operations. Appl. Sci. 2025, 15, 7789.
- Zhang, Y.; Zhang, L.; Yu, H.; Guo, Z.; Zhang, R.; Zhou, X. Research on the Strawberry Recognition Algorithm Based on Deep Learning. Appl. Sci. 2023, 13, 11298.
- Wang, H.; Dai, Y.; Yao, Q.; Ma, L.; Zhang, Z.; Lv, X. Multi-task learning model driven by climate and remote sensing data collaboration for mid-season cotton yield prediction. Field Crops Res. 2025, 333, 110070.
- Yu, S.H.; Kang, Y.; Lee, C.G. Comparison of the Spray Effects of Air Induction Nozzles and Flat Fan Nozzles Installed on Agricultural Drones. Appl. Sci. 2023, 13, 11552.
- Li, N.; Li, Y.; Yang, Q.; Biswas, A.; Dong, H. Simulating climate change impacts on cotton using AquaCrop model in China. Agric. Syst. 2024, 216, 103897.
- Shin, H.J.; Kim, S.; Kang, H.; Lee, A.G. Novel Instrument for Clinical Evaluations of Active Extraocular Muscle Tension. Appl. Sci. 2023, 13, 11431.
- Istipliler, D.; Ekizoğlu, M.; Çakaloğulları, U.; Tatar, Ö. The impact of environmental variability on cotton fiber quality: A comparative analysis of primary cotton-producing regions in Türkiye. Agronomy 2024, 14, 1276.
- Alawneh, L.; Alsarhan, T.; Al-Zinati, M.; Al-Ayyoub, M.; Jararweh, Y.; Lu, H. Enhancing human activity recognition using deep learning and time series augmented data. J. Ambient. Intell. Humaniz. Comput. 2021, 12, 10565–10580.
- Gao, T.; Yao, X.; Chen, D. SimCSE: Simple Contrastive Learning of Sentence Embeddings. In Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, Online, Punta Cana, Dominican Republic, 7–11 November 2021.
- Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, L.; Polosukhin, I. Attention is All you Need. In Proceedings of the Advances in Neural Information Processing Systems 30: Annual Conference on Neural Information Processing Systems, Long Beach, CA, USA, 4–9 December 2017; pp. 5998–6008.
- Xu, W.; Chen, P.; Zhan, Y.; Chen, S.; Zhang, L.; Lan, Y. Cotton yield estimation model based on machine learning using time series UAV remote sensing data. Int. J. Appl. Earth Obs. Geoinf. 2021, 104, 102511.
- Knutti, R.; Rugenstein, M.A. Feedbacks, climate sensitivity and the limits of linear models. Philos. Trans. R. Soc. A Math. Phys. Eng. Sci. 2015, 373, 20150146.
- Pabuayon, I.L.B.; Kelly, B.R.; Mitchell-McCallister, D.; Coldren, C.L.; Ritchie, G.L. Cotton boll distribution: A review. Agron. J. 2021, 113, 956–970.
- Krichen, M. Generative Adversarial Networks. In Proceedings of the 2023 14th International Conference on Computing Communication and Networking Technologies (ICCCNT), Delhi, India, 6–8 July 2023; pp. 1–7.
- Grundy, P.R.; Yeates, S.J.; Bell, K.L. Cotton production during the tropical monsoon season. I—The influence of variable radiation on boll loss, compensation and yield. Field Crops Res. 2020, 254, 107790.
- Huete, A. A soil-adjusted vegetation index (SAVI). Remote Sens. Environ. 1988, 25, 295–309.
- Khan, M.A.; Anwar, S.; Abbas, M.; Aneeq, M.; de Jong, F.; Ayaz, M.; Wei, Y.; Zhang, R. Impacts of climate change on cotton production and advancements in genomic approaches for stress resilience enhancement. J. Cotton Res. 2025, 8, 17.
- Xu, W.; Yang, W.; Chen, P.; Zhan, Y.; Zhang, L.; Lan, Y. Cotton Fiber Quality Estimation Based on Machine Learning Using Time Series UAV Remote Sensing Data. Remote Sens. 2023, 15, 586.
- Liu, Q.; Wang, C.; Jiang, J.; Wu, J.; Wang, X.; Cao, Q.; Tian, Y.; Zhu, Y.; Cao, W.; Liu, X. Multi-source data fusion improved the potential of proximal fluorescence sensors in predicting nitrogen nutrition status across winter wheat growth stages. Comput. Electron. Agric. 2024, 219, 108786.
- Yu, T.; Wang, B.; Li, X.; Yu, Y. A Tensor Decomposition-Based Censored Regression Adaptive Filtering Algorithm. Circuits Syst. Signal Process. 2025, 44, 6151–6166.
- Chang, W.; Yang, S.; Xi, X.; Wang, H.; Liu, Z.; Zhang, X.; Li, S.; Zhao, Y. Classification of seed maize using deep learning and transfer learning based on times series spectral feature reconstruction of remote sensing. Comput. Electron. Agric. 2025, 237, 110738.
- Kamangir, H.; Hajiesmaeeli, M.; Earles, J.M. California Crop Yield Benchmark: Combining Satellite Image, Climate, Evapotranspiration, and Soil Data Layers for County-Level Yield Forecasting of Over 70 Crops. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, Nashville, TN, USA, 11–15 June 2025; pp. 5491–5500.
- Lu, W.; Chen, S.B.; Shu, Q.L.; Tang, J.; Luo, B. Decouplenet: A lightweight backbone network with efficient feature decoupling for remote sensing visual tasks. IEEE Trans. Geosci. Remote Sens. 2024, 62.
- Cai, Z.; An, X.; Xie, D.; Xue, Y.; Liu, X.; Wang, Q.; Chen, L.; Liu, L.; Zhang, C.; Xue, C. An attitude control method with model-aided estimation and parameter-adaptive optimization for high clearance sprayers. Comput. Electron. Agric. 2025, 237, 110572.
Country | Districts |
---|---|
Pakistan | Bahawalpur |
Turkey | Ceyhan, Karatas, Yuregir, Incirliova, Germencik, Kocarli, Nazilli, Soke, Yenipazar, Bismil, Cinar, Sur, Yenisehir, Antakya, Kirikhan, Kumlu, Reyhanli, Bergama, Kinik, Akhisar, Saruhanli, Derik, Kiziltepe, Akcakale, Bozova, Ceylanpinar, Harran, Karakopru, Suruc, Menderes, Dikili, Turgutlu, Altinozu, Yumurtalik, Sultanhisar, Cine, Efeler, Kosk, Tarsus, Foca, Menemen, Tire, Torbali, Sehzadeler, Golmarmara, Yunusemre, Eyyubiye, Hilvan, Siverek, Viransehir, Bayindir, Ahmetli, Seyhan, Kayapinar, Selcuk, Didim, Kuyucak, Salihli, Kirkagac, Haliliye, Aliaga, Baglar, Buharkent, Osmaniye, Hassa, Kadirli, Ergani, Artuklu, Imamoglu, Soma, Kozan, Saricam, Cermik, Cigli, Odemis, Savur, Bozdogan, Mazidagi, Silvan, Egil, Alasehir |
Field | Setting | Characteristic, Index or Indicator | Train Year | Test Year | Train District | Test District | Total Year |
---|---|---|---|---|---|---|---|
Bahawalnagar | Cross-year | 45d | 18 (1999–2016) | 5 (2017–2021) | 1 | 1 | 23 (1999–2021) |
Turkey (81 districts) | Cross-district | 20d | 5 (2019–2023) | 5 (2019–2023) | 50 | 31 | 5 (2019–2023) |
Turkey (81 districts) | Cross-district and -year | 20d | 3 (2019–2021) | 2 (2022–2023) | 50 | 31 | 5 (2019–2023) |
Method | Cross-Year RMSE | Cross-Year MAE | Cross-Year R² | Cross-District RMSE | Cross-District MAE | Cross-District R² | Cross-District and -Year RMSE | Cross-District and -Year MAE | Cross-District and -Year R² |
---|---|---|---|---|---|---|---|---|---|
GAN | 98.94 | 72.83 | −0.01 | 57.13 | 45.83 | −0.20 | 47.84 | 40.92 | 0.09 |
GD-VAE | 58.40 | 38.19 | 0.65 | 38.29 | 30.34 | 0.49 | 46.46 | 37.74 | 0.14 |
Method | All Bahawalnagar | Bahawalnagar | All Turkey | Turkey |
---|---|---|---|---|
RFR | 0.45 | 0.43 | −1.40 | −1.50 |
GD-VAE | 0.65 | 0.55 | 0.14 | 0.07 |
Method | Cross-Year RMSE | Cross-Year MAE | Cross-Year R² | Cross-District RMSE | Cross-District MAE | Cross-District R² | Cross-District and -Year RMSE | Cross-District and -Year MAE | Cross-District and -Year R² |
---|---|---|---|---|---|---|---|---|---|
w/o GD | 116.05 | 95.79 | −0.39 | 70.04 | 53.66 | −0.66 | 80.69 | 63.67 | −1.58 |
GD-VAE | 58.40 | 38.19 | 0.65 | 38.59 | 30.34 | 0.49 | 46.46 | 37.74 | 0.14 |
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Lan, Y.; Wang, X.; Gao, L.; Chen, X. Cotton Yield Prediction with Gaussian Distribution Sampling and Variational AutoEncoder. Appl. Sci. 2025, 15, 9947. https://doi.org/10.3390/app15189947