Experimental Modelling of Sunflower Seed Moisture Content During Controlled Drying Using Machine Learning Methods

Matin, Ana; Brandić, Ivan; Špelić, Karlo; Tomić, Ivana; Pavlović, Aleksandra; Matin, Božidar; Krička, Tajana; Galić, Ante

doi:10.3390/agriculture16060695

by Ana Matin ¹, Ivan Brandić ¹,*, Karlo Špelić ¹, Ivana Tomić ¹, Aleksandra Pavlović ¹, Božidar Matin ², Tajana Krička ¹ and Ante Galić ¹

¹ Faculty of Agriculture, University of Zagreb, Svetošimunska Cesta 25, 10000 Zagreb, Croatia

² Faculty of Forestry and Wood Technology, University of Zagreb, Svetošimunska Cesta 23, 10000 Zagreb, Croatia

* Author to whom correspondence should be addressed.

Agriculture 2026, 16(6), 695; https://doi.org/10.3390/agriculture16060695

Submission received: 12 February 2026 / Revised: 12 March 2026 / Accepted: 19 March 2026 / Published: 20 March 2026

(This article belongs to the Section Seed Science and Technology)

Abstract

The aim of this research was to experimentally analyze the influence of drying method, temperature, and drying time on moisture content (MC), elemental composition (percentages of C, H, N, S, and O), and protein and fat content in sunflower seeds, as well as to apply and compare different existing machine learning regression models for moisture content prediction. The study was conducted on three sunflower hybrids (Sumiko, Pioneer, and Agromatic Lidea) using conduction, vacuum, and fluidized bed drying at temperatures from 50 to 80 °C and durations from 15 to 60 min. The results showed that temperature and time are the main controllable parameters of drying, while drying method and hybrid also significantly influence the process. In moisture content modelling, artificial neural networks (ANN) achieved the best predictive performance (R² = 0.97; RMSE = 0.46), while support vector regression (SVR) models showed slightly weaker but still high accuracy. The results indicate that machine learning models can be useful tools for predicting moisture content based on drying parameters and may support improved monitoring and management of the sunflower seed drying process.

Keywords: drying; regression modeling; mathematical modeling; prediction

1. Introduction

Sunflower is a major crop and ranks among the four most important and widely consumed oilseeds globally [1]. Sunflower seeds are highly nutritious, containing fiber, unsaturated fatty acids, antioxidants, proteins, amino acids, and vitamins [2]. Due to their favorable nutritional composition, they are an important component of a balanced diet and can contribute to the prevention and mitigation of certain chronic diseases [3,4]. Further processing of sunflower seeds yields various food and intermediate products [5]. To preserve product quality and extend shelf life, various post-harvest operations are necessary, with drying being one of the most important [6]. The preservation of nutritional values is also influenced by the choice of drying technique [7]. Consequently, newer drying technologies focus on achieving higher moisture removal performance, reducing drying time, increasing energy efficiency, and preserving nutrients [8]. Drying is one of the most energy-intensive industrial processes, making it essential to introduce efficient assessment, measurement, and optimization methods to reduce energy consumption [9]. For these reasons, there is growing interest in applying machine learning approaches to drying processes, as these methods can model complex, nonlinear relationships between process parameters and moisture dynamics that are difficult to describe using conventional methods [10]. Machine learning has emerged as a powerful tool for modelling complex nonlinear relationships in process data [11]. Modelling and monitoring the drying process are crucial for planning and developing controlled drying strategies, where machine learning algorithms can serve as useful tools for analyzing complex relationships between process parameters and drying behavior [12]. Moisture content assessment therefore plays a key role in quality control, ensuring safe storage conditions and the preservation of product quality [13]. 
Machine learning models can be used to describe and predict changes in moisture content based on drying process parameters within a defined experimental domain [14]. Huang et al. [15] combined machine learning methods with hyperspectral imaging data, demonstrating the potential for assessing the vitality and moisture content of sunflower seeds. Similarly, Yang et al. [16] determined grain moisture content using a machine learning model and achieved a high coefficient of determination (R² = 0.87–0.91). Dmitriev et al. [17] showed that the moisture content of sunflower seeds can be estimated quickly, remotely, and non-invasively using a hyperspectral camera in the VNIR region (450–950 nm) and random forest regression, with results indicating very high model accuracy. Despite the growing use of machine learning models in agricultural process modelling [18] and in the analysis of agricultural product drying processes [19], studies systematically analysing the potential for predicting moisture content in sunflower seeds during drying remain limited. Most studies to date have focused on specific sensor technologies, such as hyperspectral analysis [17], or on individual machine learning models. However, comparative analyses of multiple regression and machine learning models under controlled experimental drying conditions, including different drying methods, temperatures, and process durations, can provide further insight into the applicability of these approaches. Therefore, a systematic evaluation of different machine learning models in assessing changes in moisture content during the drying of sunflower seeds is needed.

The main contribution of this study is the experimental evaluation and comparison of several machine learning regression models for predicting moisture content during sunflower seed drying under controlled process conditions. The aim is to evaluate various regression and machine learning models for estimating the moisture content of sunflower seeds based on drying process parameters, including drying method, temperature, and process duration. The drying experiments were conducted to generate a representative data set for model development. The initial hypothesis was that machine learning algorithms can achieve high accuracy (i.e., low error levels) in moisture content modelling despite the complex and nonlinear relationships between the input variables. This approach enables a more reliable description and prediction of sunflower seed drying dynamics, supporting improved monitoring and management of the drying process.

2. Materials and Methods

2.1. Laboratory Analysis

Samples underwent various laboratory analyses to determine elemental composition (percentages of C, H, N, S, and O), protein content, oil content, and moisture content. Laboratory research was conducted in 2025. Hybrids Sumiko, Pioneer, and Agromatic Lidea were selected for their commercial relevance and availability within the regional production system, representing commonly cultivated sunflower genotypes under local agronomic conditions. Therefore, the findings primarily apply to these hybrids within the defined experimental domain, and extrapolation to broader global diversity requires further validation. For each treatment combination, representative samples were collected after the drying process and prepared for laboratory analysis. All analyses were performed on homogenized samples, with measurements conducted in triplicate to ensure reliability. The sample mass used for the determinations complied with the requirements of the respective ISO standard methods. All procedures and protocols are presented in Table 1.

Elemental composition (C, H, N, and S) was determined using a Macro CHNS analyser (Elementar Analysensysteme GmbH, Langenselbold, Germany) according to the specified ISO standards. Fat content was determined using a Soxhlet extraction system (R304 Soxhlet extractor, Düsseldorf, Germany) following ISO 659:2009. Protein content was determined using the Kjeldahl method, while moisture content was measured by oven drying in accordance with ISO 665:2020. All instruments were operated according to the manufacturers’ protocols and standard laboratory procedures.
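As a rough illustration of the gravimetric principle behind oven-drying moisture determination, the sketch below computes moisture content as the percentage mass loss of the test portion (wet basis). The function name, argument names, and example masses are hypothetical and are not taken from the study; the complete procedure (oven settings, drying to constant mass, replicate tolerances) is defined in ISO 665:2020 and is not reproduced here.

```python
def moisture_content_percent(m_dish: float, m_before: float, m_after: float) -> float:
    """Moisture content as a percentage of the wet test portion.

    m_dish   -- mass of the empty dish (g)
    m_before -- mass of dish + test portion before drying (g)
    m_after  -- mass of dish + test portion after drying (g)
    """
    wet_mass = m_before - m_dish
    if wet_mass <= 0:
        raise ValueError("test portion mass must be positive")
    # Mass lost on drying, relative to the mass of the undried test portion.
    return (m_before - m_after) / wet_mass * 100.0


# Hypothetical weighings: a 5.00 g test portion that loses 0.80 g on drying.
mc = moisture_content_percent(m_dish=10.0, m_before=15.0, m_after=14.2)
print(f"MC = {mc:.2f} %")
```

With triplicate determinations, as described above, the three values would typically be averaged and reported with their standard deviation.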

2.2. Drying Process

The research was conducted on three sunflower hybrids: Sumiko, Pioneer, and Agromatic Lidea. Analyses were performed on both raw and dried samples. The samples were dried at four temperatures (50, 60, 70, and 80 °C) and four drying times (15, 30, 45, and 60 min). Three drying methods were applied: conduction drying, fluidized bed drying, and vacuum drying. Before the drying experiments, samples of each hybrid were mixed to achieve a uniform moisture distribution at the start of the experiment. Conduction drying was carried out using a laboratory device designed for conduction drying (Setting, Delnice, Croatia), with temperature and drying time controlled during the process. Fluidized bed drying was performed using a laboratory fluidized bed dryer (Retsch TG 200, Retsch GmbH, Haan, Germany), in which hot air passes through the sample layer to ensure uniform drying conditions. Vacuum drying was conducted in a laboratory vacuum oven (Memmert VO101, Memmert GmbH, Büchenbach, Germany) under reduced pressure.

2.3. Data Processing

Statistical analyses were conducted using the Python programming language (Python 3.10) [25] in the Jupyter notebook (v 7.5.4) environment with associated packages. The results of the statistical analysis are presented as means and standard deviations. To analyze differences between the observed samples, ANOVA (analysis of variance) and Tukey’s post hoc HSD test were used. In this context, the observed samples refer to combinations of experimental factors such as drying method, hybrid, drying temperature, and drying duration. Statistically significant differences are indicated by different letters in the column.
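To make the ANOVA step concrete, the following minimal sketch computes the one-way F statistic from its definition (between-group mean square divided by within-group mean square) using only the Python standard library. The function name and the three triplicate groups are hypothetical illustration values, not data from this study, and the sketch does not cover the p-value or the Tukey HSD post hoc comparison, for which a statistics package would normally be used.

```python
from statistics import mean


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a one-way ANOVA."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = mean(v for g in groups for v in g)

    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of the observations around their own group mean.
    ss_within = sum((v - mean(g)) ** 2 for g in groups for v in g)

    df_between, df_within = k - 1, n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical triplicate measurements for three treatments.
treatments = [
    [10.0, 10.2, 9.8],
    [12.0, 12.1, 11.9],
    [14.0, 13.8, 14.2],
]
f_stat, df_b, df_w = one_way_anova_f(treatments)
print(f"F({df_b}, {df_w}) = {f_stat:.1f}")
```

A large F value relative to the F distribution with the reported degrees of freedom indicates that at least one treatment mean differs; the post hoc test then identifies which pairs differ.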

2.4. Data Cleaning and Encoding

Before the regression models were built, the data were cleaned so that their format and structure were suitable for further analysis and modelling. Since some of the model input variables (drying method and hybrid) were categorical, they were converted into numerical form suitable for machine learning models [26,27].

Table 2 presents a representative subset of the experimental dataset used for developing and evaluating machine learning models. The complete dataset comprised 144 experimental observations obtained from combinations of drying method, sample type, drying temperature, and drying time.
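The structure of such a dataset can be sketched as follows: the full factorial design is enumerated and the categorical inputs are converted to numbers. One-hot encoding is used here as an assumption, since the encoding scheme is not specified in the text; the variable and function names, and the lower-case method labels, are likewise hypothetical.

```python
from itertools import product

METHODS = ["conduction", "vacuum", "fluidized_bed"]
HYBRIDS = ["Sumiko", "Pioneer", "Agromatic Lidea"]
TEMPERATURES_C = [50, 60, 70, 80]
TIMES_MIN = [15, 30, 45, 60]


def one_hot(value, categories):
    """Return a 0/1 indicator vector with a single 1 at the matching category."""
    return [1.0 if value == c else 0.0 for c in categories]


def encode(method, hybrid, temp_c, time_min):
    """Encode one treatment as a numeric feature vector (assumed scheme)."""
    return (
        one_hot(method, METHODS)
        + one_hot(hybrid, HYBRIDS)
        + [float(temp_c), float(time_min)]
    )


# Every combination of method, hybrid, temperature, and time.
design = list(product(METHODS, HYBRIDS, TEMPERATURES_C, TIMES_MIN))
X = [encode(*row) for row in design]

print(len(design), "treatment combinations,", len(X[0]), "features each")
```

The enumeration yields 3 × 3 × 4 × 4 = 144 rows, matching the number of experimental observations reported for the complete dataset.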

2.5. Evaluation of Existing Machine Learning Models

Several machine learning models were evaluated in this study: artificial neural networks (ANN), random forest regression (RFR), boosted tree regression (BTR), support vector regression (SVR), a linear model (LM), and multivariate adaptive regression splines (MARS). These models were chosen mainly for their computational efficiency and ability to model nonlinear relationships [9,28]. The total dataset comprised 144 experimental observations (3 hybrids × 3 drying methods × 4 drying temperatures × 4 drying times). The data were randomly divided into training, validation, and testing sets in a 70:15:15 ratio. To reduce the impact of random data partitioning, the splitting and training procedure was repeated three times.

The hyperparameters of the individual models were determined through an iterative tuning procedure. For each model, several combinations of relevant hyperparameters were tested, with performance evaluated on the validation set. The coefficient of determination (R²) and model errors (RMSE and MAE) were used as criteria for selecting the optimal combination: the combination that produced the lowest error and highest R² value was chosen, and the final hyperparameter values were selected based on the stability of results across the three repetitions. For the ANN model, different network architectures were tested, including the number of neurons in the hidden layer (5–20), the learning rate (0.001–0.05), and the number of learning cycles (50,000–150,000). For the RFR model, the effects of the number of trees and node size were analyzed, while for the BTR model, different numbers of trees, tree depths, and learning rates were tested. For the SVR model, different values of C, epsilon, and the kernel parameters (RBF) were tested. For the MARS model, the maximum number of basis functions and the degree of interaction were analyzed. The final hyperparameter values shown in Table 3 were those that demonstrated the best predictive ability on the validation set. The reported model performance metrics (R², RMSE, MAE, and MAPE) therefore represent evaluation results obtained across repeated runs rather than a single data split.
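The repeated random partitioning described above can be sketched as follows. This is a minimal illustration, not the procedure used in the study: the function name, the seeds, and the rounding rule are assumptions, because 70% and 15% of 144 are not whole numbers and the exact subset sizes are not stated in the text.

```python
import random


def split_indices(n, seed, fractions=(0.70, 0.15, 0.15)):
    """Shuffle row indices and cut them into train/validation/test parts."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)  # seeded, so each repetition is reproducible
    n_train = round(fractions[0] * n)
    n_val = round(fractions[1] * n)
    # The test set takes whatever remains, so every row is used exactly once.
    return idx[:n_train], idx[n_train:n_train + n_val], idx[n_train + n_val:]


# Three repetitions with different (hypothetical) seeds.
for seed in (0, 1, 2):
    train, val, test = split_indices(144, seed)
    print(seed, len(train), len(val), len(test))
```

With this rounding rule, 144 observations give 101 training, 22 validation, and 21 testing rows per repetition; metrics from the three repetitions would then be aggregated.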

Table 3 provides an overview of the models used in this study, including their settings, general regression equation, and their application to the analyzed problem.

To determine the optimal configuration of the evaluated machine learning models, various combinations of relevant hyperparameters were examined during model development. The tested hyperparameter configurations for each model are presented in Table 4. Based on these tests, the corresponding hyperparameter search ranges and the final selected values used for model training are summarized in Table 5.
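For readers unfamiliar with the ANN hyperparameters mentioned above (hidden-layer size, learning rate, number of learning cycles), the sketch below trains a one-hidden-layer network with plain per-sample gradient descent in pure Python. It is an illustration only: the implementation, the tanh activation, the initialization, and the response surface `synthetic_mc` are assumptions made for this example, the targets are synthetic rather than measured, and the fit is checked on the training points without a validation set.

```python
import math
import random


def train_mlp(X, y, hidden=8, lr=0.01, epochs=2000, seed=0):
    """Fit a one-hidden-layer tanh network; loss is 0.5 * (output - target)^2."""
    rng = random.Random(seed)
    n_in = len(X[0])
    w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(hidden)]
    b1 = [0.0] * hidden
    w2 = [rng.uniform(-0.5, 0.5) for _ in range(hidden)]
    b2 = sum(y) / len(y)  # start the output bias at the mean target

    def forward(x):
        h = [math.tanh(sum(w * v for w, v in zip(w1[j], x)) + b1[j])
             for j in range(hidden)]
        return h, sum(w2[j] * h[j] for j in range(hidden)) + b2

    for _ in range(epochs):
        for x, target in zip(X, y):
            h, out = forward(x)
            err = out - target  # derivative of the loss w.r.t. the output
            for j in range(hidden):
                # Backpropagate through tanh before w2[j] is updated.
                grad_pre = err * w2[j] * (1.0 - h[j] ** 2)
                w2[j] -= lr * err * h[j]
                b1[j] -= lr * grad_pre
                for k in range(n_in):
                    w1[j][k] -= lr * grad_pre * x[k]
            b2 -= lr * err

    return lambda x: forward(x)[1]


def scale(value, lo, hi):
    """Min-max scale a value to the range [0, 1]."""
    return (value - lo) / (hi - lo)


def synthetic_mc(temp_s, time_s):
    """Hypothetical smooth response surface, NOT measured data."""
    return 17.0 - 5.0 * temp_s - 4.0 * time_s + 2.0 * temp_s * time_s


X = [[scale(T, 50, 80), scale(t, 15, 60)]
     for T in (50, 60, 70, 80) for t in (15, 30, 45, 60)]
y = [synthetic_mc(*row) for row in X]

predict = train_mlp(X, y)
mse_model = sum((predict(x) - t) ** 2 for x, t in zip(X, y)) / len(y)
y_mean = sum(y) / len(y)
mse_baseline = sum((y_mean - t) ** 2 for t in y) / len(y)
print(f"MSE network: {mse_model:.4f}, MSE mean-only baseline: {mse_baseline:.4f}")
```

In a tuning procedure of the kind described, `hidden`, `lr`, and `epochs` would be varied within their search ranges and each configuration scored on the validation set rather than on the training points.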

2.6. Performance of Evaluated Machine Learning Models

All machine learning models used in this research were evaluated using error metrics and a regression performance indicator. The error metrics were the root mean squared error (RMSE) (1), mean absolute error (MAE) (2), and mean absolute percentage error (MAPE) (3), while the coefficient of determination (R²) (4) was used as the indicator of regression performance. The metrics were computed using the following formulas [35,36,37]:

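\mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{i = 1}^{n} (y_{i} - \hat{y}_{i})^{2}}

(1)

\mathrm{MAE} = \frac{1}{n} \sum_{i = 1}^{n} \left| y_{i} - \hat{y}_{i} \right|

(2)

\mathrm{MAPE} = \frac{1}{n} \sum_{i = 1}^{n} \left| \frac{y_{i} - \hat{y}_{i}}{y_{i}} \right| \cdot 100

(3)

R^{2} = 1 - \frac{\sum_{i = 1}^{n} (y_{i} - \hat{y}_{i})^{2}}{\sum_{i = 1}^{n} (y_{i} - \bar{y})^{2}}

(4)

where y_{i} is the measured (target) value, \hat{y}_{i} is the value predicted by the model, \bar{y} is the mean of the measured values, and n is the number of observations.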
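The four metrics can be computed directly from paired measured and predicted values, as in the following standard-library sketch. The function names and the four example values are hypothetical and serve only to show the arithmetic; they are not results from this study.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / len(y_true))


def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(a - b) for a, b in zip(y_true, y_pred)) / len(y_true)


def mape(y_true, y_pred):
    """Mean absolute percentage error (%); undefined if any measured value is 0."""
    return sum(abs((a - b) / a) for a, b in zip(y_true, y_pred)) / len(y_true) * 100.0


def r2(y_true, y_pred):
    """Coefficient of determination."""
    y_bar = sum(y_true) / len(y_true)
    ss_res = sum((a - b) ** 2 for a, b in zip(y_true, y_pred))
    ss_tot = sum((a - y_bar) ** 2 for a in y_true)
    return 1.0 - ss_res / ss_tot


# Hypothetical measured and predicted moisture contents (%).
measured = [10.0, 12.0, 14.0, 16.0]
predicted = [10.5, 11.5, 14.5, 15.5]

print(f"RMSE = {rmse(measured, predicted):.3f}")
print(f"MAE  = {mae(measured, predicted):.3f}")
print(f"MAPE = {mape(measured, predicted):.3f} %")
print(f"R2   = {r2(measured, predicted):.3f}")
```

RMSE and MAE carry the unit of the target variable (here percentage points of moisture), whereas MAPE and R² are dimensionless, which is why they are reported side by side.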

3. Results

The results for the initial moisture content of the different samples are shown in Table 6.

Table 6 shows that the initial MC differed between hybrids: it was highest for Sumiko (18.35%) and lowest for Pioneer (15.4%).

Descriptive statistics of all measured variables obtained during the drying experiments are presented in Table 7, including the range (min–max), mean values, and standard deviations.

Table 8 presents the univariate analysis to determine the influence of the observed research parameters on the variables under investigation. Univariate analysis refers to separate ANOVA tests conducted for each dependent variable (C, H, N, S, O, protein, and fat content) to evaluate the effects of drying method, hybrid, temperature, drying time, and interactions.

The results of the univariate analysis indicate that the drying method, sample, temperature, and time have a statistically significant effect on the proportions of oxygen, nitrogen, and carbon, as well as on the protein and fat content. Interactions between two factors were most often significant for the elemental composition of C, N, and O, and for the protein and fat content, while they were mostly absent for sulfur and hydrogen. Interactions involving three and four factors remained statistically significant primarily for C, N, O, proteins, and fats, indicating their combined response to changes in process conditions.

Figure 1 shows surface contour plots of moisture content (MC, %) as a function of drying temperature and drying time for different drying methods and sunflower hybrids.

In all drying methods (Figure 1), a trend of decreasing moisture with increasing temperature and drying time is evident. At lower temperatures (50–60 °C) and shorter drying times, the highest moisture content values were recorded. The shape and layout of the contours confirm the interactions between temperature and drying time, showing the intensity of moisture removal during the process.

Table 9 presents a summary of the model performance analysis results, including error metrics and regression indicators.

Table 9 shows that the ANN achieves the best overall predictive performance, with the highest R² (0.97) and the lowest errors (RMSE, MAE, and MAPE), indicating very good agreement between the model and the measurements. SVR also demonstrates high accuracy (R² = 0.94) but is somewhat weaker than the ANN in all error measures. RFR, BTR, and MARS show intermediate performance, while the linear model produces the weakest results, suggesting that the relationship between the variables is not linear and that nonlinear models describe the system much better.

Figure 2 shows scatter plots of target versus predicted moisture content (MC) for the evaluated machine learning models.

Figure 2 shows that the ANN and SVR models achieve the highest coefficients of determination (R² = 0.97 and 0.94, respectively) as well as the lowest modelling errors, making them the most suitable models for predicting MC.

4. Discussion

The moisture content in sunflower seeds changes primarily with variations in temperature and the duration of the drying process [38], as shown in the surface contour plot (Figure 1). In all drying methods, temperature and time are the main control parameters. Conduction drying exhibits an almost linear response, while fluidized bed and vacuum drying deviate from it, indicating a more complex, non-linear process [39]. Kabutey et al. [40] state that temperatures in the range of 40 to 80 °C yield the highest oil content. Detailed means (±SD) and Tukey HSD test results for ultimate analysis (CHNSO), protein, and fat, for all hybrids (Sumiko, Pioneer, Agromatic Lidea), four drying temperatures (50, 60, 70, and 80 °C), and four durations (15, 30, 45, and 60 min) are shown in Table S1 (Supplementary Materials). Considering the influence of all research parameters, including drying method, hybrid, temperature, process duration, and their interactions, it is evident that in most cases the parameters have a statistically significant effect on the content of the examined variables. The interaction between drying method and process duration had the most significant influence on protein content (Table 8). Since moisture content is the most important variable in drying and exhibits non-linear dynamics [41], it was modelled using various machine learning models, which are considered suitable tools for estimating drying parameters [42,43,44]. It is important to emphasize that the models were developed using experimental data within an initial moisture content range of approximately 15–18%, depending on the hybrid. Therefore, their application is considered reliable primarily within this range, while use at significantly higher initial moisture contents would require further experimental measurements and model validation. The models were evaluated with categorical and continuous input data [45], with different settings as detailed in Table 3. 
The most effective model for MC modelling with respect to the specific input variables was the ANN model, as it achieved the highest regression indicator (R² = 0.97) and a low level of error (RMSE = 0.46; MAE = 0.32; MAPE = 2.97%). The ANN model achieves high performance due to its ability to process and summarize complex and non-linear data [46,47]. Simonič et al. [48] modelled the moisture content of maize in continuous drying systems using neural network models. The authors report that the model is highly efficient for such tasks, as indicated by the low error levels (RMSE = 0.645, MAE = 0.352, MAPE = 2.555). The application of ANN models as successful and accurate high-performance tools in MC estimation has been confirmed by several studies [49,50,51,52]. In addition to ANN models, SVR models also showed high accuracy, with R² = 0.94 and a slightly higher error (RMSE = 0.66; MAE = 0.51; MAPE = 4.60%). Other models showed lower predictive performance, reflected in reduced R² values and higher prediction errors compared to the ANN model. Although the ANN model demonstrated very high accuracy in predicting moisture content, it is important to note that the results were obtained within a clearly defined experimental domain of temperature, time, drying method, and hybrid. Therefore, the model can be considered a reliable tool for MC estimation and process optimization within the tested conditions, while its application outside this domain would require additional validation. The integration of experimental data, multivariate statistical methods, and machine learning in this study confirms that the complex and non-linear dynamics of sunflower seed drying can be effectively described by data-driven approaches. Such an approach forms the basis for developing advanced drying management systems aimed at preserving raw material quality and process energy efficiency. 
In this study, the energy consumption of individual drying methods was not directly measured; therefore, the method comparison is based solely on the dynamics of moisture content changes during the drying process. The results thus relate to the efficiency of moisture removal, while an assessment of overall energy efficiency would require additional experimental measurements. Future research could explore the application of advanced machine learning approaches, including deep reinforcement learning [53], as such data-driven methods have the potential to capture complex nonlinear relationships and improve prediction, monitoring, and optimization of complex processes.

5. Conclusions

The results of this study confirm that the moisture content in sunflower seeds is primarily determined by the temperature and duration of the drying process, with the drying method and hybrid significantly influencing the dynamics of moisture removal. Conduction drying exhibited an almost linear response, while fluidized bed and vacuum drying resulted in a pronounced nonlinear dependence of moisture content on process parameters. Multivariate statistical analysis revealed significant interactions between drying method, temperature, and duration, particularly for elemental composition, protein, and fat content. The evaluated machine learning models demonstrated that nonlinear drying dynamics can be described very successfully by data-driven approaches, with artificial neural networks achieving the highest accuracy in predicting moisture content. Although the models were validated within a limited experimental domain, the results indicate their potential for assessing and optimizing the drying process. The integration of experimental measurements, multivariate statistics, and machine learning represents an effective approach to developing advanced drying management systems aimed at preserving product quality and increasing energy efficiency. The main contribution of this research is the systematic evaluation and comparison of multiple regression and machine learning models for predicting the moisture content of sunflower seeds during the drying process under controlled experimental conditions. Future research should focus on expanding the dataset with additional drying conditions and a larger number of sunflower hybrids, as well as applying advanced machine learning models to improve prediction accuracy and develop intelligent systems for managing the drying process.

Supplementary Materials

The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/agriculture16060695/s1, Table S1: Descriptive statistics (mean ± SD) and Tukey HSD results for CHNSO, protein and fat.

Author Contributions

Conceptualization, I.B. and A.M.; methodology, I.B.; software, K.Š.; validation, I.T., A.P. and T.K.; formal analysis, B.M.; investigation, A.G.; data curation, A.G.; writing—original draft preparation, A.M.; writing—review and editing, A.G.; visualization, I.B. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

All data related to the research are included in this scientific paper and the available Supplementary Materials.

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:

ANN	Artificial neural networks
ANOVA	Analysis of variance
BTR	Boosted tree regression
C	Carbon
DM	Drying method
F	Fat content
GCV	Generalized cross validation
H	Hydrogen
HSD	Honestly significant difference
LM	Linear model
MAE	Mean absolute error
MAPE	Mean absolute percentage error
MARS	Multivariate Adaptive Regression Splines
MC	Moisture content
N	Nitrogen
OLS	Ordinary least squares
P	Protein content
RBF	Radial basis function
RFR	Random forest regression
RMSE	Root mean squared error
R²	Coefficient of determination
S	Sulfur
SD	Standard deviation
Smp.	Sample
T	Temperature
t	Time

References

Li, Z.; Xiang, F.; Huang, X.; Liang, M.; Ma, S.; Gafurov, K.; Gu, F.; Guo, Q.; Wang, Q. Properties and Characterization of Sunflower Seeds from Different Varieties of Edible and Oil Sunflower Seeds. Foods 2024, 13, 1188.
Petraru, A.; Ursachi, F.; Amariei, S. Nutritional Characteristics Assessment of Sunflower Seeds, Oil and Cake. Perspective of Using Sunflower Oilcakes as a Functional Ingredient. Plants 2021, 10, 2487.
Puraikalan, Y.; Scott, M. Sunflower Seeds (Helianthus annuus) and Health Benefits: A Review. Recent Prog. Nutr. 2023, 3, 010.
Rehman, A.; Saeed, A.; Kanwal, R.; Ahmad, S.; Changazi, S.H. Therapeutic Effect of Sunflower Seeds and Flax Seeds on Diabetes. Cureus 2021, 13, e17256.
De Oliveira Filho, J.G.; Egea, M.B. Sunflower Seed Byproduct and Its Fractions for Food Application: An Attempt to Improve the Sustainability of the Oil Process. J. Food Sci. 2021, 86, 1497–1510.
Coradi, P.C.; Dubal, Í.T.P.; Bilhalva, N.D.S.; Fontoura, C.N.; Teodoro, P.E. Correlation Using Multivariate Analysis and Control of Drying and Storage Conditions of Sunflower Grains on the Quality of the Extracted Vegetable Oil. J. Food Process. Preserv. 2020, 44, e14961.
Dabbour, M.; Sami, R.; Mintah, B.K.; He, R.; Wahia, H.; Khojah, E.; Petkoska, A.T.; Fikry, M. Effect of Drying Techniques on the Physical, Functional, and Rheological Attributes of Isolated Sunflower Protein and Its Hydrolysate. Processes 2021, 10, 13.
Jimoh, K.A.; Hashim, N.; Shamsudin, R.; Man, H.C.; Jahari, M.; Onwude, D.I. Recent Advances in the Drying Process of Grains. Food Eng. Rev. 2023, 15, 548–576.
Levent, İ.; Şahin, G.; Işık, G.; Van Sark, W.G.J.H.M. Comparative Analysis of Advanced Machine Learning Regression Models with Advanced Artificial Intelligence Techniques to Predict Rooftop PV Solar Power Plant Efficiency Using Indoor Solar Panel Parameters. Appl. Sci. 2025, 15, 3320.
Ashtiani, S.-H.M.; Martynenko, A. Nature-Inspired Approaches for Optimizing Food Drying Processes: A Critical Review. Food Eng. Rev. 2025, 17, 270–290.
Sharifani, K.; Amini, M. Machine Learning and Deep Learning: A Review of Methods and Applications. World Inf. Technol. Eng. J. 2023, 10, 3897–3904.
Zuo, Y.; Jibril, A.N.; Yan, J.; Xia, Y.; Liu, R.; Chen, K. Optimization of Online Moisture Prediction Model for Paddy in Low-Temperature Circulating Heat Pump Drying System with Artificial Neural Network. Sensors 2025, 25, 2308.
Uyeh, D.D.; Kim, J.; Lohumi, S.; Park, T.; Cho, B.-K.; Woo, S.; Lee, W.S.; Ha, Y. Rapid and Non-Destructive Monitoring of Moisture Content in Livestock Feed Using a Global Hyperspectral Model. Animals 2021, 11, 1299.
LeBlanc, A.P.; Trabelsi, S.; Rasheed, K.; Miller, J.A. Machine Learning Algorithms for Nondestructive Sensing of Moisture Content in Grain and Seed. IEEE Open J. Instrum. Meas. 2025, 4, 2500214.
Huang, P.; Yuan, J.; Yang, P.; Xiao, F.; Zhao, Y. Nondestructive Detection of Sunflower Seed Vigor and Moisture Content Based on Hyperspectral Imaging and Chemometrics. Foods 2024, 13, 1320.
Yang, M.-D.; Hsu, Y.-C.; Tseng, W.-C.; Lu, C.-Y.; Yang, C.-Y.; Lai, M.-H.; Wu, D.-H. Assessment of Grain Harvest Moisture Content Using Machine Learning on Smartphone Images for Optimal Harvest Timing. Sensors 2021, 21, 5875.
Dmitriev, P.A.; Dmitrieva, A.A.; Kozlovsky, B.L. Evaluation of Sunflower Seed Moisture Content by Spectral Characteristics of Inflorescences in the VNIR. Seeds 2025, 4, 55.
Botero-Valencia, J.; García-Pineda, V.; Valencia-Arias, A.; Valencia, J.; Reyes-Vera, E.; Mejia-Herrera, M.; Hernández-García, R. Machine Learning in Sustainable Agriculture: Systematic Review and Research Perspectives. Agriculture 2025, 15, 377.
Fan, L.; Pei, Y.; Zhang, L.; Kong, J.; Xu, W. Applications of Machine Learning Models in Agricultural Product Drying: A Comprehensive Review of Advances, Challenges, and Prospects. Food Bioprocess Technol. 2025, 18, 10047–10085.
ISO 16948:2015; Solid Biofuels—Determination of Total Content of Carbon, Hydrogen and Nitrogen. International Organization for Standardization: Geneva, Switzerland, 2015.
ISO 15178:2000; Soil Quality—Determination of Total Sulfur by Dry Combustion. International Organization for Standardization: Geneva, Switzerland, 2000.
Aguirre, J. The Kjeldahl Method: 140 Years; Springer: Cham, Switzerland, 2023; ISBN 978-3-031-31458-2.
ISO 659:2009; Oilseeds—Determination of Oil Content (Reference Method). International Organization for Standardization: Geneva, Switzerland, 2009.
ISO 665:2020; Oilseeds—Determination of Moisture and Volatile Matter Content. International Organization for Standardization: Geneva, Switzerland, 2020.
van Rossum, G.; Python Development Team. Python Tutorial Release 3.7.0; Python Software Foundation: Wilmington, DE, USA, 2018; pp. 1–155.
Bolikulov, F.; Nasimov, R.; Rashidov, A.; Akhmedov, F.; Cho, Y.-I. Effective Methods of Categorical Data Encoding for Artificial Intelligence Algorithms. Mathematics 2024, 12, 2553.
Koukaras, P.; Tjortjis, C. Data Preprocessing and Feature Engineering for Data Mining: Techniques, Tools, and Best Practices. AI 2025, 6, 257.
Singgih, I.K.; Singgih, M.L. Regression Machine Learning Models for the Short-Time Prediction of Genetic Algorithm Results in a Vehicle Routing Problem. World Electr. Veh. J. 2024, 15, 308.
Brandić, I.; Pezo, L.; Bilandžija, N.; Peter, A.; Šurić, J.; Voća, N. Artificial Neural Network as a Tool for Estimation of the Higher Heating Value of Miscanthus Based on Ultimate Analysis. Mathematics 2022, 10, 3732.
N., G.; Jain, P.; Choudhury, A.; Dutta, P.; Kalita, K.; Barsocchi, P. Random Forest Regression-Based Machine Learning Model for Accurate Estimation of Fluid Flow in Curved Pipes. Processes 2021, 9, 2095.
Shehab, E.Q.; Taha, F.F.; Muhodir, S.H.; Imran, H.; Ostrowski, K.A.; Piechaczek, M. Gradient Boosting Regression Tree Optimized with Slime Mould Algorithm to Predict the Higher Heating Value of Municipal Solid Waste. Energies 2024, 17, 4213.
Tufail, S.; Riggs, H.; Tariq, M.; Sarwat, A.I. Advancements and Challenges in Machine Learning: A Comprehensive Review of Models, Libraries, Applications, and Algorithms. Electronics 2023, 12, 1789.
Kim, S.-J.; Bae, S.-J.; Jang, M.-W. Linear Regression Machine Learning Algorithms for Estimating Reference Evapotranspiration Using Limited Climate Data. Sustainability 2022, 14, 11674.
Abed, M.S.; Kadhim, F.J.; Almusawi, J.K.; Imran, H.; Bernardo, L.F.A.; Henedy, S.N. Utilizing Multivariate Adaptive Regression SPLines (MARS) for Precise Estimation of Soil Compaction Parameters. Appl. Sci. 2023, 13, 11634.
Chicco, D.; Warrens, M.J.; Jurman, G. The Coefficient of Determination R-Squared Is More Informative than SMAPE, MAE, MAPE, MSE and RMSE in Regression Analysis Evaluation. PeerJ Comput. Sci. 2021, 7, e623.
Das, B.K.; Kader, M.A.; Hoque, S.M.N. Energy Recovery Potential from Municipal Solid Waste in Rajshahi City by Landfill Technique. Int. J. Renew. Energy Res. 2014, 4, 349–354.
Vivas, E.; Allende-Cid, H.; Salas, R. A Systematic Review of Statistical and Machine Learning Methods for Electrical Power Forecasting with Reported MAPE Score. Entropy 2020, 22, 1412.
Abasi, S.; Mousavi, S.M.; Mohebi, M.; Kiani, S. Effect of Time and Temperature on Moisture Content, Shrinkage, and Rehydration of Dried Onion. Iran. J. Chem. Eng. 2009, 6, 57–70.
Khan, M.I.H.; Batuwatta-Gamage, C.P.; Karim, M.A.; Gu, Y. Fundamental Understanding of Heat and Mass Transfer Processes for Physics-Informed Machine Learning-Based Drying Modelling. Energies 2022, 15, 9347. [Google Scholar] [CrossRef]
Kabutey, A.; Herák, D.; Mizera, Č. Determination of Maximum Oil Yield, Quality Indicators and Absorbance Spectra of Hulled Sunflower Seeds Oil Extraction under Axial Loading. Foods 2022, 11, 2866. [Google Scholar] [CrossRef]
Zhilin, A.; Fedorov, A.; Grebenshchikov, D. Dynamics of Acousto-Convective Drying of Sunflower Cake Compared with Drying by a Traditional Thermo-Convective Method. Foods Raw Mater. 2018, 6, 370–378.
Çetin, N. Prediction of Moisture Ratio and Drying Rate of Orange Slices Using Machine Learning Approaches. J. Food Process. Preserv. 2022, 46, e17011.
Khan, M.I.H.; Sablani, S.S.; Joardder, M.U.H.; Karim, M.A. Application of Machine Learning-Based Approach in Food Drying: Opportunities and Challenges. Dry. Technol. 2020, 40, 1051–1067.
Martynenko, A.; Misra, N.N. Machine Learning in Drying. Dry. Technol. 2019, 38, 596–609.
Cang, S.; Yu, H. A Probability Neural Network for Continuous and Categorical Data. IFAC Proc. Vol. 2005, 38, 203–208.
Brandić, I.; Pezo, L.; Bilandžija, N.; Peter, A.; Šurić, J.; Voća, N. Comparison of Different Machine Learning Models for Modelling the Higher Heating Value of Biomass. Mathematics 2023, 11, 2098.
Zhou, Z.; Qiu, C.; Zhang, Y. A Comparative Analysis of Linear Regression, Neural Networks and Random Forest Regression for Predicting Air Ozone Employing Soft Sensor Models. Sci. Rep. 2023, 13, 22420.
Simonič, M.; Ficko, M.; Klančnik, S. Predicting Corn Moisture Content in Continuous Drying Systems Using LSTM Neural Networks. Foods 2025, 14, 1051.
Aghbashlo, M.; Hosseinpour, S.; Mujumdar, A.S. Application of Artificial Neural Networks (ANNs) in Drying Technology: A Comprehensive Review. Dry. Technol. 2015, 33, 1397–1462.
Chai, H.; Chen, X.; Cai, Y.; Zhao, J. Artificial Neural Network Modeling for Predicting Wood Moisture Content in High Frequency Vacuum Drying Process. Forests 2018, 10, 16.
Liu, X.; Chen, X.; Wu, W.; Peng, G. A Neural Network for Predicting Moisture Content of Grain Drying Process Using Genetic Algorithm. Food Control 2006, 18, 928–933.
Martínez-Martínez, V.; Gomez-Gil, J.; Stombaugh, T.S.; Montross, M.D.; Aguiar, J.M. Moisture Content Prediction in the Switchgrass (Panicum virgatum) Drying Process Using Artificial Neural Networks. Dry. Technol. 2015, 33, 1708–1719.
Hazem, Z.B.; Saidi, F.; Guler, N.; Altaif, A.H. A Hybrid Reinforcement Learning Framework Combining TD3 and PID Control for Robust Trajectory Tracking of a 5-DOF Robotic Arm. Automation 2025, 6, 56.

Figure 1. Surface contour plots of moisture content (MC, %) as a function of drying temperature and drying time for different sunflower hybrids.

Figure 2. Target vs. predicted MC for (a) ANN, (b) RFR, (c) BTR, (d) SVR, (e) Linear, (f) MARS.

Table 1. Summary of laboratory analyses performed, including devices and protocols used.

Laboratory Analysis	Protocol/Reference
Elemental analysis (determination of C, H, N, S and O)	ISO 16948:2015 [20]; ISO 15178:2000 [21]
Determination of protein content	Kjeldahl Method [22]
Determination of fat content	ISO 659:2009 [23]
Moisture content determination	ISO 665:2020 [24]

Table 2. Representative subset of the experimental dataset used as input for machine learning models.

Input Values					Output Value
No.	Drying Method	Sample	Temperature (°C)	Time (Minutes)	MC (%)
1	1	1	50	15	16.82
2	1	1	50	30	15.59
…
21	1	2	60	15	14.87
22	1	2	60	30	13.91
23	1	2	60	45	11.16
24	1	2	60	60	11.12
25	1	2	70	15	13.68
26	1	2	70	30	12.87
…
86	2	3	60	30	12.40
87	2	3	60	45	9.96
88	2	3	60	60	9.70
89	2	3	70	15	13.79
…
120	3	2	60	60	8.09
121	3	2	70	15	10.36
122	3	2	70	30	9.66
…
141	3	3	80	15	11.73
142	3	3	80	30	9.76
143	3	3	80	45	8.98
144	3	3	80	60	7.23

Drying method: 1—fluidized bed drying, 2—vacuum drying, 3—conduction drying; Sample: 1—Sumiko, 2—Pioneer, 3—Agromatic Lidea.
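
The integer codes in the footnote can be mapped back to labels, and the row numbers visible in Table 2 are consistent with a full factorial layout of 3 drying methods × 3 hybrids × 4 temperatures × 4 times = 144 rows. The following is a minimal Python sketch of that reading. The level lists (50, 60, 70, 80 °C and 15, 30, 45, 60 min) and the row ordering are inferred from the rows shown and from the ranges given in the abstract, and the names `DRYING_METHODS`, `HYBRIDS`, `design`, `row_number` and `decode` are illustrative, not taken from the study.

```python
from itertools import product

# Codes from the Table 2 footnote.
DRYING_METHODS = {1: "fluidized bed", 2: "vacuum", 3: "conduction"}
HYBRIDS = {1: "Sumiko", 2: "Pioneer", 3: "Agromatic Lidea"}

# Assumed factor levels, inferred from the rows shown in Table 2.
TEMPERATURES_C = (50, 60, 70, 80)
TIMES_MIN = (15, 30, 45, 60)

# Full factorial design in the order method, sample, temperature, time.
design = list(product(DRYING_METHODS, HYBRIDS, TEMPERATURES_C, TIMES_MIN))


def row_number(method, sample, temp_c, time_min):
    """1-based row number of a factor combination under the assumed ordering."""
    return design.index((method, sample, temp_c, time_min)) + 1


def decode(method, sample, temp_c, time_min):
    """Readable description of one coded input row."""
    return f"{DRYING_METHODS[method]} drying, {HYBRIDS[sample]}, {temp_c} °C, {time_min} min"


print(len(design))
print(row_number(2, 3, 60, 60), decode(2, 3, 60, 60))
```

Under these assumptions the combination (2, 3, 60, 60) lands on row 88, which matches the row listed in Table 2 with MC = 9.70%.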

Table 3. Settings of the regression models used in this study.

Abb.	Model Settings	Equation	Explanation	Ref.
ANN	– learning cycles: 100,000 – data split: 70/15/15 – hidden neurons: 10 – learning rate: 0.01	$Y = f_{1} (W_{2} \cdot f_{2} (W_{1} \cdot X + B_{1}) + B_{2})$	X denotes a vector of input variables. W1 is the weight matrix between the input and hidden layers. B1 is the bias of the hidden layer. f2 is the activation function of the hidden layer. W2 is the weight matrix between the hidden and output layers. B2 is the bias of the output layer. f1 is the output activation function. Y is the output value of the model.	[29]
RFR	– number of trees: 500 – feature subset size: p/3 – bootstrap: enabled – node size: 5	$Y = \frac{1}{K} \sum_{k = 1}^{K} h_{k} (x)$	Y is the final prediction of the model. K is the total number of regression trees in the ensemble. h_k(x) is the output of the kth regression tree for the input x. The final value is obtained by averaging all individual predictions.	[30]
BTR	– number of trees: 1000 – learning rate: 0.05 – tree depth: 3 – subsample: 0.7	$f (x) = f_{0} (x) + \sum_{m = 1}^{M} \sum_{j = 1}^{J} c_{m j} I (x \in R_{m j})$	f(x) is the final prediction of the model. f0(x) is the initial baseline estimate. M is the number of iterations or trees. J is the number of terminal regions in each tree. cmj is the contribution of the jth region in the mth tree. Rmj is the corresponding region of the input variable space. The indicator function shows whether the input belongs to that region.	[31]
SVR	– kernel: RBF – C: 10 – epsilon: 0.1 – gamma: 1/p	$f (X) = W^{T} \varphi (X) + b$	f(X) is the output regression function. W is the weight vector. W^T is the transposed weight vector. φ(X) is the mapping of the input variables into the feature space. b is the free term of the model. X is the input vector.	[32]
LM	– method: OLS – predictor scaling: yes – validation: 10-fold – significance level: 0.05	$y = w_{1} x_{1} + b$	y is the dependent variable, x1 is the independent variable, w1 is the regression coefficient, and b is the constant term.	[33]
MARS	– max basis functions: 30 – interaction degree: 2 – pruning: GCV – knot penalty: default	$y = f (x) + e$	y is the dependent variable. f(x) is the estimated nonlinear function composed of the basis functions. x is the predictor vector. e is the residual error of the model.	[34]

ANN—Artificial neural networks; RFR—Random Forest regression; BTR—Boosted tree regression; SVR—Support vector regression; LM—Linear model; MARS—Multivariate adaptive regression splines.
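
As an illustration of the ANN equation in Table 3, the following Python sketch evaluates Y = f1(W2 · f2(W1 · X + B1) + B2) for a single input vector with 4 inputs and 10 hidden neurons. Several choices here are assumptions that the table does not specify: a tanh hidden activation, an identity output activation, min-max scaling of the inputs, and random untrained placeholder weights. The function names `forward` and `min_max` are hypothetical.

```python
import math
import random


def forward(x, W1, B1, W2, B2, f2=math.tanh, f1=lambda z: z):
    """Evaluate Y = f1(W2 . f2(W1 . X + B1) + B2) for one input vector x."""
    hidden = [f2(sum(w * xi for w, xi in zip(row, x)) + b) for row, b in zip(W1, B1)]
    return f1(sum(w * h for w, h in zip(W2, hidden)) + B2)


def min_max(value, low, high):
    """Scale a value to [0, 1]; the scaling scheme is an assumption."""
    return (value - low) / (high - low)


# Placeholder weights: random and untrained, so the output has no physical meaning.
rng = random.Random(0)
N_INPUTS, N_HIDDEN = 4, 10  # 4 inputs from Table 2, 10 hidden neurons from Table 3
W1 = [[rng.uniform(-0.5, 0.5) for _ in range(N_INPUTS)] for _ in range(N_HIDDEN)]
B1 = [rng.uniform(-0.5, 0.5) for _ in range(N_HIDDEN)]
W2 = [rng.uniform(-0.5, 0.5) for _ in range(N_HIDDEN)]
B2 = 0.0

# Row 23 of Table 2: method 1, sample 2, 60 °C, 45 min.
x = [min_max(1, 1, 3), min_max(2, 1, 3), min_max(60, 50, 80), min_max(45, 15, 60)]
y = forward(x, W1, B1, W2, B2)
print(y)
```

Training would replace the random weights with fitted ones; the sketch only shows how the two layers compose.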

Table 4. Hyperparameter tuning configurations for the evaluated machine learning models.

Model	Test No.	Learning Rate	Max Iter	Hidden Neurons	Trees	Tree Depth	Node Size	C	Epsilon	Max Basis Functions	Interaction Degree
ANN	Test 1	0.001	50,000	5	–
	Test 2	0.01	100,000	10
	Test 3	0.01	100,000	15
	Test 4	0.05	150,000	10
	Test 5	0.01	100,000	20
	Selected	0.01	100,000	10
RFR	Test 1	–			100	–	3	–
	Test 2				300		5
	Test 3				500		5
	Test 4				300		10
	Test 5				500		10
	Selected	–	–		500		5
BTR	Test 1	0.01			500	2	–
	Test 2	0.05			800	3
	Test 3	0.05			1000	3
	Test 4	0.10			800	3
	Test 5	0.05			1000	4
	Selected	0.05			1000	3
SVR	Test 1	–						1	0.01	–
	Test 2							5	0.1
	Test 3							10	0.1
	Test 4							10	0.2
	Test 5							20	0.1
	Selected							10	0.1
MARS	Test 1							–		20	1
	Test 2									30	1
	Test 3									30	2
	Test 4									40	2
	Test 5									40	3
	Selected									30	2

Table 5. Hyperparameter search ranges and selected values for the evaluated machine learning models.

Model	Hyperparameter	Search Range	Selected Value
ANN	Hidden neurons	[5, 10, 15, 20]	10
ANN	Learning rate	[0.001, 0.01, 0.05]	0.01
ANN	Max iterations	[50,000, 100,000, 150,000]	100,000
RFR	Number of trees	[100, 300, 500]	500
RFR	Node size (min samples leaf)	[3, 5, 10]	5
BTR	Number of trees	[500, 800, 1000]	1000
BTR	Learning rate	[0.01, 0.05, 0.10]	0.05
BTR	Tree depth	[2, 3, 4]	3
SVR	C	[1, 5, 10, 20]	10
SVR	Epsilon	[0.01, 0.1, 0.2]	0.1
MARS	Max basis functions	[20, 30, 40]	30
MARS	Interaction degree	[1, 2, 3]	2
Linear model	Method	OLS	OLS
Linear model	Predictor scaling	[No, Yes]	Yes
Linear model	Validation	[5-fold, 10-fold]	10-fold
Linear model	Significance level	[0.01, 0.05]	0.05
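
To make the size of these search ranges concrete, the sketch below enumerates every combination of the Table 5 values per model and checks that each selected configuration lies inside its grid. It is only an illustration: the dictionary keys and the helper `grid` are hypothetical names, and an exhaustive grid is an assumption, not necessarily the search strategy used in the study.

```python
from itertools import product

# Search ranges copied from Table 5 (key names are illustrative).
SEARCH_SPACE = {
    "ANN": {"hidden_neurons": [5, 10, 15, 20],
            "learning_rate": [0.001, 0.01, 0.05],
            "max_iter": [50_000, 100_000, 150_000]},
    "RFR": {"n_trees": [100, 300, 500], "node_size": [3, 5, 10]},
    "BTR": {"n_trees": [500, 800, 1000],
            "learning_rate": [0.01, 0.05, 0.10],
            "tree_depth": [2, 3, 4]},
    "SVR": {"C": [1, 5, 10, 20], "epsilon": [0.01, 0.1, 0.2]},
    "MARS": {"max_basis_functions": [20, 30, 40], "interaction_degree": [1, 2, 3]},
}

# Selected values copied from Table 5.
SELECTED = {
    "ANN": {"hidden_neurons": 10, "learning_rate": 0.01, "max_iter": 100_000},
    "RFR": {"n_trees": 500, "node_size": 5},
    "BTR": {"n_trees": 1000, "learning_rate": 0.05, "tree_depth": 3},
    "SVR": {"C": 10, "epsilon": 0.1},
    "MARS": {"max_basis_functions": 30, "interaction_degree": 2},
}


def grid(space):
    """List every combination of the values in one model's search space."""
    names = list(space)
    return [dict(zip(names, combo)) for combo in product(*(space[n] for n in names))]


counts = {model: len(grid(space)) for model, space in SEARCH_SPACE.items()}
for model, n in counts.items():
    status = "selected in grid" if SELECTED[model] in grid(SEARCH_SPACE[model]) else "not in grid"
    print(f"{model}: {n} combinations, {status}")
```

A full grid would contain 36 ANN, 9 RFR, 27 BTR, 12 SVR and 9 MARS combinations, whereas Table 4 reports five tested configurations per model.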

Table 6. Initial moisture content of different sunflower seed samples.

Sample	MC (%)
Sumiko	18.35 ± 0.06 ^c
Pioneer	15.40 ± 0.18 ^a
Agromatic Lidea	16.10 ± 0.10 ^b
Statistical significance	*

MC—Moisture content; Statistical significance: * p < 0.01. Different letters in the MC column indicate a statistically significant difference according to the post hoc Tukey HSD test (p < 0.05).

Table 7. Descriptive statistics of measured variables obtained during sunflower seed drying experiments.

Variable	Minimum	Maximum	Mean	SD	Mean ± SD
O (%)	21.70	54.66	47.03	7.62	47.03 ± 7.62
N (%)	1.75	3.12	2.37	0.30	2.37 ± 0.30
C (%)	36.10	65.19	42.95	6.88	42.95 ± 6.88
S (%)	0.12	0.52	0.20	0.05	0.20 ± 0.05
H (%)	6.67	9.74	7.50	0.62	7.50 ± 0.62
Proteins (%)	9.26	16.51	12.56	1.62	12.56 ± 1.62
Fat (%)	38.71	51.22	47.12	2.85	47.12 ± 2.85

SD—Standard deviation.

Table 8. Univariate analysis of the influence of categorical factors and their interactions on the variables examined.

Effect	O (%)	N (%)	C (%)	S (%)	H (%)	Proteins (%)	Fat (%)
DM	<0.001	<0.001	<0.001	<0.001	<0.001	<0.001	<0.001
Smp.	<0.001	<0.001	<0.001	0.27	0.67	<0.001	<0.001
T	<0.001	<0.001	<0.001	<0.001	<0.001	<0.001	<0.001
t	<0.001	<0.001	<0.001	0.04	0.16	<0.001	<0.001
DM × Smp.	<0.001	<0.001	<0.001	0.62	0.05	<0.001	<0.001
DM × T	<0.001	<0.001	<0.001	<0.001	<0.001	<0.001	<0.001
Smp. × T	<0.001	<0.001	0.01	0.32	0.03	<0.001	<0.001
DM × t	0.01	0.02	<0.001	0.36	0.17	0.02	<0.001
Smp. × t	0.12	0.15	0.07	0.29	0.80	0.15	<0.001
T × t	<0.001	0.32	<0.001	0.10	0.06	0.32	<0.001
DM × Smp. × T	<0.001	<0.001	<0.001	0.24	<0.001	<0.001	<0.001
DM × Smp. × t	0.03	0.02	0.02	0.37	0.56	0.02	<0.001
DM × T × t	<0.001	0.11	<0.001	0.15	0.08	0.11	<0.001
Smp. × T × t	0.04	0.08	0.14	0.09	0.27	0.08	<0.001
DM × Smp. × T × t	0.02	<0.001	0.03	0.76	0.37	<0.001	<0.001

DM—Drying method; Smp.—Sample; T—Temperature; t—Time.

Table 9. Performance of evaluated machine learning models for predicting the output variable moisture content.

Model	R²	RMSE (%)	MAE (%)	MAPE (%)
ANN	0.97	0.46	0.32	2.97
RFR	0.76	1.28	1.03	9.47
BTR	0.85	1.00	0.82	7.46
SVR	0.94	0.66	0.51	4.60
Linear	0.69	1.45	1.11	10.13
MARS	0.85	1.00	0.80	7.33

ANN—Artificial neural networks; RFR—Random Forest regression; BTR—Boosted tree regression; SVR—Support vector regression; MARS—Multivariate adaptive regression splines.
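
The four metrics in Table 9 can be computed from paired measured and predicted values as in the following Python sketch. It assumes the common textbook definitions (R² as one minus the ratio of residual to total sum of squares, MAPE relative to the measured value); the article's exact formulas are not restated in this part of the text. The measured values are rows 21 to 24 of Table 2, while the predictions are hypothetical numbers invented only to exercise the function.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return R2, RMSE, MAE and MAPE (%) for paired observations."""
    n = len(y_true)
    mean_true = sum(y_true) / n
    errors = [t - p for t, p in zip(y_true, y_pred)]
    ss_res = sum(e * e for e in errors)                  # residual sum of squares
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)   # total sum of squares
    return {
        "R2": 1.0 - ss_res / ss_tot,
        "RMSE": math.sqrt(ss_res / n),
        "MAE": sum(abs(e) for e in errors) / n,
        "MAPE": 100.0 / n * sum(abs(e / t) for e, t in zip(errors, y_true)),
    }


# Measured MC (%) for rows 21-24 of Table 2.
measured = [14.87, 13.91, 11.16, 11.12]
# Hypothetical predictions, not results from the study.
predicted = [14.50, 13.60, 11.60, 10.90]

metrics = regression_metrics(measured, predicted)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

RMSE and MAE carry the unit of the target, here percentage points of moisture content, while R² is dimensionless and MAPE is a relative error in percent.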
