Article

Exploring Risk Factors of Mycotoxin Contamination in Fresh Eggs Using Machine Learning Techniques

1
Department of Information Technology, Faculty of Prince Al-Hussein Bin Abdallah II for Information Technology, The Hashemite University, Zarqa 13133, Jordan
2
Department of Autonomous Systems, Al-Balqa Applied University, Al-Salt 19117, Jordan
3
Department of Data Science, King Hussein School of Computing Sciences, Princess Sumaya University for Technology, Amman 11941, Jordan
4
Department of Nutrition and Food Processing, Al-Balqa Applied University, Al-Salt 19117, Jordan
5
Department of Computer Science, Philadelphia University, Jerash 19392, Jordan
*
Authors to whom correspondence should be addressed.
Computers 2026, 15(1), 34; https://doi.org/10.3390/computers15010034
Submission received: 24 October 2025 / Revised: 25 December 2025 / Accepted: 25 December 2025 / Published: 7 January 2026

Abstract

Mycotoxins are toxic compounds produced by certain fungi, whose health effects may be significant when they contaminate fresh eggs. Conventional methods of mycotoxin analysis, while accurate, are labor-intensive, time-consuming, and impractical for large-scale screening applications. This study uses machine learning techniques to predict the concentration and presence of deoxynivalenol (DON), aflatoxin B1 (AFB1), and ochratoxin A (OTA) in fresh eggs from Jordan. Rather than replacing analytical detection methods, the proposed approach can enable a risk-based prioritization of samples for laboratory testing by identifying high-risk samples based on environmental and production factors. A dataset of 1250 poultry egg samples collected between January and July 2024, covering environmental conditions and chemical assay results for mycotoxin content in eggs, was used. Several machine learning algorithms were employed to build predictive models, including decision trees, support vector machines, and neural networks. The results indicate that machine learning can accurately and reliably predict mycotoxin contamination, which demonstrates the potential for integrating machine learning into food safety protocols. This study contributes toward developing predictive analytics for food safety and lays the groundwork for future research aimed at improving contamination monitoring systems.

1. Introduction

Small-molecule fungal metabolites known as mycotoxins are generated by a diverse array of fungal strains [1,2]. These toxic fungal by-products contaminate a wide range of agricultural products worldwide (including grain, fruit, and fresh eggs) when storage conditions favor fungal growth. Such widespread contamination, often alongside other contaminants in products like beverages and animal feed, poses significant risks to human and animal health, potentially causing immediate or long-term ailments such as a weakened immune system, cancer, and neurological abnormalities. Highly toxic and widely distributed mycotoxins, such as aflatoxins and ochratoxins, require special attention and stringent food safety measures, especially for essential nutritional sources like fresh eggs [3,4,5].
Conditions that permit feedstuff contamination with fungi and their toxins range from high humidity to certain temperatures. In tropical regions, it is unavoidable and predictable that mycotoxins will be present during harvest, transit, and storage [6]. Of the more than 300 mycotoxins that have been identified, aflatoxins (AFs), deoxynivalenol (DON), ochratoxins, zearalenone (ZEA), fumonisins, and T-2 toxin are consistently regarded as the most critical based on their demonstrated toxicity and high rates of occurrence [7]. However, only a small number of them are of practical interest in public health, based on their effects on animal health, transmissivity, and human toxicity; these include aflatoxins, ochratoxins, trichothecenes such as deoxynivalenol, as well as zearalenone and fumonisins [8].
Studies on broilers and layer chickens show that layers can tolerate higher amounts of aflatoxin than broilers, but not beyond 50 parts per billion [6]. Aflatoxin poisoning can reduce the birds’ ability to withstand stress through a diminished immune response, leading to reduced egg size as well as lower egg production. It is also crucial to carefully consider the inclusion of contaminated feed, given the direct consumption of eggs as food by humans and the presence of aflatoxin metabolites in egg yolks [9].
Safety is a main pillar of production. Since poultry and its by-products are major high-protein elements of the food chain, they are of great concern in the current emphasis on safety. It is essential to note that mycotoxins represent a possible contaminant of food and feed, since they develop as a consequence of fungal contamination [10].
Food and feed contaminated with mycotoxins can pose a potential danger to human and animal health, mainly through their carcinogenic and toxic effects [11]. International studies have been carried out on several mycotoxins, particularly those found in human food at high occurrence. These substances are classified into several groups based on their effects: aflatoxins are classified as group 1 carcinogens, ochratoxin as group 2B, and zearalenone as group 3 [12].
Mycotoxins, toxic secondary metabolites produced by fungi, pose a significant threat to food safety and public health. These hazardous compounds are often generated when storage conditions favor fungal growth, specifically temperatures between 10 °C and 40 °C, a pH range of 4–8, and water activity exceeding 0.70 [13]. The resulting mycotoxin contamination incurs substantial economic losses, primarily due to compromised food quality and decreased production performance in livestock. Crucially, mycotoxins are associated with serious health risks, including hepatic carcinomas, esophageal cancers, immunosuppression, neurotoxicity, mutagenicity, and oestrogenic effects [14,15,16]. The International Agency for Research on Cancer (IARC) classifies aflatoxins (AFs) as group 1 carcinogens and zearalenone (ZEA) as group 3 carcinogens [17]. The degree of contamination in animal feeds is not constant, as it is heavily influenced by factors such as the duration of exposure, the feed consumption rate, the concentration of contaminants, and environmental conditions (e.g., geography and season) [18]. Numerous studies have highlighted mycotoxin presence in poultry feed. For instance, recent research detected aflatoxins AFB1 (1.0 mg/kg), AFB2 (0.30 mg/kg), AFG1 (0.14 mg/kg), AFG2 (0.05 mg/kg), and total AFs (1.30 mg/kg) in bird samples [19], while another study reported AFB1 (0.81 mg/kg) and AFB2 (0.83 mg/kg) in hen and duck samples [20]. Maintaining food safety is paramount along the entire supply chain, particularly for high-protein animal products like poultry and its by-products [20]. Mycotoxins can be carried over from contaminated feed into the bird, negatively impacting key parameters of poultry production, such as growth, laying performance, egg production, and overall quality [21]. Moreover, the animal’s sensitivity to mycotoxin-laden feed is modulated by intrinsic factors, including its gender, physiological state, and age [22].
Chicken eggs are extremely valuable commodities in international trade. They provide a well-rounded meal suitable for all age groups and are considered beneficial for adults due to their high nutrient content, low calorie count, and ease of digestion [23]. Mycotoxin absorption into food, such as eggs, takes place when the bird consumes contaminated feed. According to a review carried out by Blank [8], the transfer of mycotoxins into edible tissues depends on the particular type of mycotoxin and the animal species involved [24].
In Jordan, eggs are considered a major source of nutrients. However, the presence of food-borne residues, such as mycotoxins, in eggs is considered a health hazard to consumers. This is particularly alarming when considering children, who are more vulnerable than adults and consume more eggs [10]. After consumption, mycotoxins are metabolized in the liver and kidneys. The products of this metabolism depend on the parent molecules, which makes tracking their potential effects very difficult.
Common methods for detecting mycotoxins in eggs and other foods rely on chemical assays, such as high-performance liquid chromatography (HPLC) and enzyme-linked immunosorbent assays (ELISA). While these methods work well, they can be expensive and time-consuming, and they require special equipment and expertise. Their sample-by-sample nature may also limit their scalability.
Machine learning (ML) has emerged as a promising predictive tool in many disciplines, including the domain of food safety [25]. Models based on decision trees, support vector machines, and neural networks can improve the prediction accuracy of mycotoxin contamination based on various input variables (such as environmental factors, feed ingredients, and historical contamination data) without the need for laborious chemical analyses. Integrating ML into food safety practice can make the monitoring and regulation of mycotoxin levels more efficient and improve risk forecasting, thus reducing the health risks associated with the presence of mycotoxins.
This work examines the suitability of machine learning techniques for determining mycotoxin residue levels (for example, fumonisin B, aflatoxin B1, deoxynivalenol, and zearalenone) in locally produced hen’s table eggs, aiming to investigate the viability and effectiveness of ML-based models for predicting contamination levels in freshly laid eggs. We build and compare several predictive models to establish whether, and to what extent, machine learning can be leveraged for the detection of mycotoxin residues, providing insights into the use of ML to improve food safety and a basis for future progress in this domain.
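To make the modeling idea concrete, the sketch below trains a scikit-learn decision tree on synthetic data with hypothetical predictors (feed storage temperature, feed moisture, and storage duration). The feature names, value ranges, and labels are illustrative assumptions, not the study’s data.

```python
# Minimal sketch (synthetic data, hypothetical feature names): predicting
# mycotoxin presence from environmental/feed predictors with scikit-learn.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(42)
n = 500
# Hypothetical predictors: feed storage temperature (°C), feed moisture (%),
# storage duration (days).
X = np.column_stack([
    rng.uniform(10, 40, n),   # temperature
    rng.uniform(8, 20, n),    # moisture
    rng.uniform(1, 90, n),    # storage days
])
# Synthetic label: contamination more likely under warm, moist, long storage.
risk = 0.03 * X[:, 0] + 0.06 * X[:, 1] + 0.01 * X[:, 2]
y = (risk + rng.normal(0, 0.2, n) > np.median(risk)).astype(int)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)
clf = DecisionTreeClassifier(max_depth=4, random_state=0).fit(X_train, y_train)
print(f"held-out accuracy: {accuracy_score(y_test, clf.predict(X_test)):.2f}")
```

The same pattern extends directly to the SVM and neural network models discussed later; only the estimator class changes.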

2. Related Work

Machine learning (ML) offers a significant breakthrough in mycotoxin detection by overcoming the limitations of conventional methods. Utilizing its unique capability to analyze vast datasets and discern complex, subtle patterns that are often imperceptible to human analysis, ML algorithms deliver rapid, highly accurate, and automated identification of mycotoxins in both crops and processed food products [26].
The most common category of ML used in mycotoxin prediction is supervised learning, in which the algorithm is trained on a labeled dataset, making use of both input features (for example, weather data and crop data) and a target feature such as mycotoxin levels. Several machine learning approaches have been successfully applied to mycotoxin prediction and detection. Among traditional supervised learning methods, Support Vector Machines (SVM), Random Forests (RF), and neural networks appear promising for processing the complex datasets produced by various analytical techniques [27]. Neural Networks (NNs) and Deep Neural Networks (DNNs) have been widely used for this purpose, with studies demonstrating high accuracy (>75%) in predicting aflatoxin and fumonisin contamination in maize [28]. Bayesian networks have successfully predicted mycotoxin occurrence in fruits and vegetables across geographically dispersed countries [29].
Inglis et al. [30] present a systematic review depicting the broad application of ML to mycotoxin detection in food. It highlights major benefits of ML approaches, such as efficiency and lower cost, while maintaining reasonable accuracy for mycotoxin detection compared to traditional methods.
Numerous studies have successfully applied ML to predict mycotoxin levels in cereal grains and other agricultural commodities [28,31,32]. Others have examined carry-over into animal tissues or milk [33,34], and even into breast milk [35], but there has been little direct application of these predictive techniques to eggs. This is a significant oversight, since eggs are one of the world’s most important dietary staples and can be readily contaminated by carry-over of mycotoxins from feed. Because the transfer of mycotoxins from feed to eggs is complex, governed by factors such as the type and concentration of mycotoxin in the feed, the hen’s breed, age, health status, and metabolic processes, and the dynamics of egg formation, predicting mycotoxin levels in eggs is likely to require a different set of input features, or even different modeling approaches, than predicting contamination in crops. Typical relevant predictors include feed composition and its mycotoxin profile, hen health parameters (e.g., liver function biomarkers, immune status indicators), production data (egg laying rate, feed conversion ratio), and the environmental conditions inside the poultry house. The few studies that even tangentially address mycotoxins in eggs generally focus on detection methods or general impacts on poultry health and egg quality rather than on developing ML-based predictive models [17,20,34]. If robust ML models for predicting mycotoxin risk in eggs could be developed, they would provide the poultry industry and food safety regulators with a powerful tool for acting proactively against contamination risk, ensuring egg safety and protecting public health. Such models could flag high-risk flocks and inform feed management strategies, optimizing targeted testing protocols and making existing control measures more efficient and effective.
Developing such models calls for particular effort in collecting data specific to the poultry-egg system, exploring relevant features, and carefully selecting and validating ML algorithms.

3. Materials and Methods

The feasibility study comprises the subsequent phases:
1. In order to correctly prepare the data for the ML models, it is important to understand the existing data by analyzing its correlations and geographical features. This involves examining as many predictors as possible while ensuring that there are no missing entries and that the data is of adequate quality.
2. Implementing the algorithm involves selecting predictors, tuning hyperparameters, and fitting the final prediction model.
3. Analysis of outcomes, such as interpreting variable importance plots that illustrate the impact of each predictor on the statistical measures, and partial dependence plots that visually illustrate the individual effect of selected predictors on the outcome.
4. Validate the results by applying the predictive model to a separate dataset and comparing the model derived from ML with a mechanistic model.
5. Conduct a thorough evaluation of the benefits and drawbacks of the technique and draw conclusions about the data and expert knowledge necessary to develop a predictive model.
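Phases 2 and 4 of this workflow can be sketched as follows, using synthetic data in place of the study’s dataset (the estimator, split sizes, and fold count are illustrative assumptions):

```python
# Sketch of phases 2 and 4: fit a model with cross-validation on a
# development set, then validate on a held-out set (synthetic data).
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import KFold, cross_val_score, train_test_split

X, y = make_regression(n_samples=300, n_features=5, noise=5.0, random_state=0)
X_dev, X_holdout, y_dev, y_holdout = train_test_split(
    X, y, test_size=0.25, random_state=0)

model = RandomForestRegressor(n_estimators=100, random_state=0)
cv_r2 = cross_val_score(model, X_dev, y_dev,
                        cv=KFold(n_splits=5, shuffle=True, random_state=0),
                        scoring="r2")
model.fit(X_dev, y_dev)
print(f"mean CV R2: {cv_r2.mean():.3f}, "
      f"held-out R2: {model.score(X_holdout, y_holdout):.3f}")
```

Keeping a genuinely untouched holdout set, as in phase 4, guards against the optimistic bias that cross-validation alone can leave after model selection.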

3.1. Data Description

A total of 1250 fresh egg samples were collected from grocery stores in several Jordanian governorates, including Amman, Zarqa’a, Balqa’a, Madaba, Mafraq, Irbid, Jarash, Karak, Ajloun, Aqaba, Al Tafelah and Ma’an, as part of the current cross-sectional study. The random collection of poultry eggs occurred from January to July 2024. Poultry eggs that were 7–10 days old and stored in a clean icebox at a temperature below 4 °C were delivered to the laboratory. The eggs were divided into yolk and white. The detection of TAF, OTA and DON involved preparation and extraction procedures comparable to previous research, with minor adjustments. The concentration of mycotoxins in the samples was determined using different ELISA kits.
Two types of ELISA kits were utilized: AgraQuant Aflatoxin M1fast (range: 100–2000 ng/kg) and AgraQuant Aflatoxin M1sensitive (range: 25–500 ng/kg), both supplied by Romer Labs, Singapore. The kits were stored at a temperature of 2–8 °C until use. Prior to analysis, they were equilibrated to room temperature for 1 h. All procedures were carried out in accordance with the manufacturer’s instructions provided by Romer Labs.
To give more insight into our collected data, Table 1 presents the mean and median values of the factors that affect mycotoxin formation, along with the mean and median concentrations of the different mycotoxin types.
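A summary of the kind shown in Table 1 can be produced with pandas; the column names and values below are hypothetical placeholders, not the study’s measurements.

```python
# Illustrative sketch: mean/median summary table with pandas
# (hypothetical columns and values, not the study's data).
import pandas as pd

df = pd.DataFrame({
    "feed_storage_temp_C": [18.0, 22.5, 25.0, 30.0, 27.5],
    "feed_moisture_pct":   [10.2, 12.8, 11.5, 14.0, 13.1],
    "AFM1_ng_per_kg":      [40.0, 55.0, 120.0, 90.0, 60.0],
})
summary = df.agg(["mean", "median"]).round(2)
print(summary)
```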

3.2. Sample Preparation

To optimize the extraction of mycotoxins (TAF, DON, OTA) from eggs, we weighed 1 g of homogenized fresh egg and mixed it with 5 mL of various extraction media:
  • Water to which 0.05% Tween 20 had been added (0.05% Tween 20);
  • Water: Methanol 30:70 (Methanol 70%);
  • Water: Methanol 70:30, in which 0.3 M NaCl had been added to water (Methanol 70%/0.3 M NaCl);
  • Water: Methanol 70:30 followed by the addition of hexane to remove fatty components.
After 2 min of vigorous stirring at room temperature, the samples were centrifuged at 3200× g to reduce foam and remove denatured proteins. Supernatants were diluted 1 + 1 with water and analyzed by direct competitive ELISA. Each sub-sample was extracted in duplicate and analyzed in quadruplicate. The optimal protocol involved extraction with aqueous methanol (70%) followed by defatting with hexane.

3.2.1. Mold Counts in Poultry Feed

Random samples (10 to 30 representative samples of 1 kg) were collected from different poultry farms in several governorates of Jordan, including Amman, Zarqa’a, Balqa’a, Madaba, Mafraq, Irbid, Jarash, Karak, Ajloun, Aqaba, Al Tafelah, and Ma’an. These samples were used for mycological examination. Feed samples intended for mycological examination were usually analyzed immediately upon arrival or, if necessary, stored for 2–3 days in paper bags at room temperature (22–25 °C).

3.2.2. Isolation of Poultry Feed Fungi

The dilute plate technique was used for the isolation of fungi from the samples. General mold counts were carried out by weighing 20 g of poultry mixed feed samples and mixing them with 180 mL of saline solution (0.85% sodium chloride) containing 0.05% Tween 80 on a horizontal shaker for approximately 30 min. Then 0.1 mL of the appropriate decimal dilutions (up to 10⁻⁵) was applied to Dichloran Rose Bengal Chloramphenicol agar (DRBC). Plates were incubated at 25 °C for 5–7 days.
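Colony counts from dilution plates are conventionally converted to counts per gram of feed as colonies / (volume plated × overall dilution); the helper below sketches this standard plate-count arithmetic, which the text implies but does not state explicitly.

```python
# Illustrative sketch of the standard plate-count conversion
# (the exact counting rules used in the study are not stated).
def cfu_per_gram(colonies: int, overall_dilution: float,
                 volume_ml: float = 0.1) -> float:
    """CFU per gram of feed; overall_dilution is the dilution of the plated
    suspension relative to the original feed (e.g. 1e-3)."""
    return colonies / (volume_ml * overall_dilution)

# e.g. 45 colonies on a plate from the 1e-3 overall dilution, 0.1 mL plated:
print(f"{cfu_per_gram(45, 1e-3):.0f} CFU/g")
```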

3.2.3. Moisture Analysis

Approximately 100 g of poultry feed were placed in an oven at 105 °C for 15 min. The samples were weighed before and after drying, and the moisture percentage was calculated according to the following equation.
Moisture % = ((A − B) / A) × 100
where:
  • B = weight of the sample after drying,
  • A = weight of the sample before drying.
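The moisture equation translates directly into a small helper (with A the weight of the sample before drying and B the weight after drying):

```python
# Moisture percentage on a wet-weight basis, per the equation above.
def moisture_percent(before_g: float, after_g: float) -> float:
    """A = before_g (weight before drying), B = after_g (weight after)."""
    return (before_g - after_g) / before_g * 100.0

# e.g. 100 g of feed drying down to 88 g:
print(f"{moisture_percent(100.0, 88.0):.1f}%")  # prints 12.0%
```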

4. Data Visualization

We analyzed and visualized the data using various visualization methods to reveal distribution patterns, statistical characteristics, and the relationship between environmental storage conditions and mycotoxin formation in fresh eggs.
Several studies have confirmed a positive correlation between the levels of mycotoxins—such as aflatoxins, ochratoxins, and deoxynivalenol (DON)—in poultry feed and their residues in eggs. This relationship is primarily due to the absorption of these toxins in the gastrointestinal tract and their subsequent transfer to reproductive tissues. The carry-over rate, however, varies significantly among different mycotoxins. For example, aflatoxins exhibit a relatively high absorption rate (≈90%) and a carry-over into eggs of about 0.55%, while ochratoxin A shows ≈40% absorption with ≈0.15% carry-over, and DON has a much lower transfer rate (≈0.001%). These differences are influenced by factors such as toxin type, concentration, bird age, health status, and feed composition.
Although earlier reports suggested a generalized conversion rate of 1–2% of ingested mycotoxins into egg residues, recent findings indicate that this assumption oversimplifies the process. Modern studies emphasize that aflatoxins and zearalenone tend to accumulate more readily in eggs compared to fumonisins and trichothecenes, which show negligible transfer. Moreover, co-contamination of feed with multiple mycotoxins can lead to synergistic effects, increasing toxicity and potentially altering carry-over dynamics. This highlights the need for precise monitoring and mitigation strategies, as even low-level contamination can compromise egg quality, shell integrity, and hatchability.
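As a rough illustration of the carry-over rates quoted above, the snippet below treats carry-over as the egg-to-feed concentration ratio, which is a simplification, and uses an assumed feed contamination level of 10 µg/kg that is not from the study.

```python
# Worked example of the quoted carry-over rates (aflatoxin ~0.55%,
# ochratoxin A ~0.15%, DON ~0.001%). The 10 ug/kg feed level is an
# assumed illustrative value, and treating carry-over as a simple
# egg/feed concentration ratio is a simplification.
CARRYOVER = {"aflatoxin": 0.0055, "ochratoxin_A": 0.0015, "DON": 0.00001}

feed_ug_per_kg = 10.0  # assumed feed contamination level
for toxin, rate in CARRYOVER.items():
    residue_ug = feed_ug_per_kg * rate  # ug of toxin per kg of egg
    print(f"{toxin}: {residue_ug * 1000:.3f} ng/kg in eggs")
```

Even at identical feed levels, the three-orders-of-magnitude spread in carry-over explains why aflatoxin residues dominate the risk picture for eggs.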
Visualization methods—including histograms, boxplots, pairwise scatter plots, violin plots, and Pearson correlation heatmaps—were used to examine variable distributions, detect outliers, and identify multicollinearity. These visual insights supported the selection of relevant features and helped validate the consistency of laboratory-measured toxin levels with known biological and environmental trends.
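A minimal sketch of the Pearson correlation heatmap step, using synthetic columns with hypothetical names in place of the study’s dataset:

```python
# Illustrative Pearson correlation heatmap (synthetic data, hypothetical
# column names; not the study's measurements).
import matplotlib
matplotlib.use("Agg")  # render without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
temp = rng.uniform(10, 40, 200)
df = pd.DataFrame({
    "storage_temp_C": temp,
    "feed_moisture_pct": rng.uniform(8, 20, 200),
    "AFM1_ng_per_kg": 3.0 * temp + rng.normal(0, 10, 200),  # correlated
})
corr = df.corr(method="pearson")

fig, ax = plt.subplots()
im = ax.imshow(corr, vmin=-1, vmax=1, cmap="coolwarm")
ax.set_xticks(range(len(corr)), corr.columns, rotation=45, ha="right")
ax.set_yticks(range(len(corr)), corr.columns)
fig.colorbar(im)
fig.savefig("corr_heatmap.png", dpi=120, bbox_inches="tight")
print(corr.round(2))
```

Histograms, boxplots, and pairwise scatter plots follow the same pattern with `df.hist()`, `df.boxplot()`, and `pd.plotting.scatter_matrix(df)`.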
We used SHAP (Shapley Additive Explanations) to investigate feature contributions to predictions. SHAP visualization enhances the interpretability of machine learning models and provides a robust explanation of each feature’s effect on the final prediction. SHAP analysis revealed that ochratoxin A levels, DON levels, and poultry-feed storage conditions were the most dominant factors across the Extra Trees, Random Forest, and Decision Tree models. SHAP beeswarm plots enabled a detailed examination of feature interactions, showing, for example, that high ochratoxin A values consistently increased predicted aflatoxin concentrations. LIME explanations provided case-specific interpretations by approximating the model locally with an interpretable surrogate, helping to validate model outputs for samples with borderline or unexpected contamination levels. These explainability techniques strengthen the scientific credibility of the predictive framework, ensuring that the models’ decision pathways align with known toxicological and environmental mechanisms.
Finally, the feasibility of deploying the proposed predictive system was assessed from both practical and technological perspectives. The model can be deployed as an API-based service, integrated into laboratory information systems, poultry farm monitoring platforms, or existing farm management systems (FMS). Through lightweight REST interfaces, sensor-to-cloud IoT integration, or batch-processing pipelines, the model can automatically ingest real-time feed storage data, environmental readings, and historical contamination records. This enables proactive identification of high-risk batches, supports early interventions (e.g., adjusting feed storage conditions), and enhances regulatory compliance. The low computational overhead of the best-performing models (Extra Trees and Random Forest) further improves their suitability for on-farm or edge deployment. Overall, the integration of explainable machine learning with farm management systems provides a practical, scalable, and scientifically robust solution for mitigating mycotoxin contamination risks in the poultry-egg supply chain.

5. Machine Learning-Based Prediction of Mycotoxins in Jordanian Eggs

In this study, we investigated the application of various machine learning methods to predict different levels of mycotoxins, including aflatoxins and ochratoxin A, in eggs. We employed both regression and classification analyses.

5.1. Regression-Based Analysis of AFM1 in Jordanian Eggs

The storage temperature of poultry feed, the moisture content of poultry feed, the duration of poultry feed storage (in days), ochratoxin A in eggs, and DON in eggs are fed as inputs to different regression models, with AFM1 as the output. We train models to predict the level of AFM1 toxin in eggs based on feed features and other toxins already present in the eggs, which aids in the early detection and prevention of contaminated eggs.
Table 2 lists the regression algorithms explored in this study. We use the Scikit-Learn (a Python library for machine learning) implementations of these algorithms to discover which machine learning algorithm best predicts AFM1 levels, so that it can be used for monitoring and preventing contaminated eggs.

5.2. Regression Metrics

We apply k-fold cross-validation to evaluate the regression models, increasing the robustness of the evaluation and reducing the risk of overfitting. Different measures were employed to evaluate the performance of the various regression models and assess their reliability. We rely on the following commonly used measures:
  • Mean Squared Error (MSE): This metric measures the average of the squared differences between the predicted values and the actual ones. A lower MSE implies better prediction accuracy, since the model’s predictions are closer to the real values. Equation (1a) gives the formula for MSE.
  • Root Mean Squared Error (RMSE): This is the square root of the MSE. A lower RMSE likewise implies better prediction accuracy and a more interpretable model, since it is measured in the same units as the target values. Equation (1b) gives the formula for RMSE.
  • Mean Absolute Error (MAE): This metric measures the average absolute difference between predicted and actual values. It is less sensitive to outliers than MSE. Equation (1c) gives the formula for MAE.
  • R-squared (R²): This metric indicates the proportion of variance in the target variable explained by the model. An R² of 1 indicates perfect model prediction, and a value of 0 indicates that none of the variance is explained. Equation (1d) gives the formula for R².
  • Mean Absolute Percentage Error (MAPE): This metric represents the average percentage of prediction error. Equation (1e) gives the formula for MAPE.
\mathrm{MSE} = \frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 \quad (1a)
\mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2} \quad (1b)
\mathrm{MAE} = \frac{1}{n} \sum_{i=1}^{n} |y_i - \hat{y}_i| \quad (1c)
R^2 = 1 - \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2} \quad (1d)
\mathrm{MAPE} = \frac{1}{n} \sum_{i=1}^{n} \frac{|y_i - \hat{y}_i|}{|y_i|} \quad (1e)
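These five metrics can be implemented directly in NumPy (the MAPE variant here assumes no zero targets):

```python
# Direct NumPy implementation of the five regression metrics above.
import numpy as np

def regression_metrics(y, y_hat):
    y, y_hat = np.asarray(y, float), np.asarray(y_hat, float)
    err = y - y_hat
    mse = np.mean(err ** 2)
    return {
        "MSE": mse,
        "RMSE": np.sqrt(mse),
        "MAE": np.mean(np.abs(err)),
        "R2": 1.0 - np.sum(err ** 2) / np.sum((y - y.mean()) ** 2),
        "MAPE": np.mean(np.abs(err) / np.abs(y)),  # assumes no zero targets
    }

m = regression_metrics([2.0, 4.0, 6.0, 8.0], [2.5, 3.5, 6.5, 7.5])
print({k: round(v, 4) for k, v in m.items()})
```

Scikit-learn provides equivalent functions (`mean_squared_error`, `mean_absolute_error`, `r2_score`, `mean_absolute_percentage_error`); the explicit version makes the formulas auditable.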

5.3. Discussion

Table 3 shows the performance of different regression machine learning algorithms for predicting AFM1 levels in Jordanian eggs. The top five models are Linear Regression, Ridge Regression, Bayesian Ridge, Huber Regressor, and Random Forest Regressor, ranked by R2. These models point to a strong linear relationship between the input features and the target variable; they also have very low RMSLE and MAPE, indicating negligible prediction errors. In addition, they have short training times, highlighting their efficiency. Applying a t-test to the R2 values from 10-fold cross-validation indicated that there is no significant difference among the five regressor models shown in Table 4, Table 5, Table 6, Table 7 and Table 8.
The worst-performing model is the Least Angle Regressor, which has the highest errors and the lowest R2, despite a decent training time.
Residual plots for the top five regressors are shown in Figure 1, Figure 2 and Figure 3. A residual plot is used to evaluate how well the model’s predictions conform to the actual data. A perfect model should have residuals (errors) that are randomly scattered around the horizontal line, with no observable patterns. The plots show such random scattering, indicating that the models capture the relationship in the data well; together with the high training and test scores, this confirms well-performing models.
We used SHAP values to reveal how input features affect the top five regressors. By comparing SHAP plots across the regressors, we can investigate which features influence the models and how their effects vary between them. The SHAP (Shapley Additive Explanations) method explains how features affect a machine learning model’s predictions: a SHAP value is assigned to each feature representing its importance in the prediction, and features are arranged along the y-axis based on their impact, as measured by the mean of their absolute Shapley values. The higher a feature appears in the plot, the more critical it is to the model’s prediction. This provides insight, as a form of sensitivity analysis, into how the regressors made predictions from the input features. Figure 1 presents the SHAP beeswarm plot of the Extra Trees Regressor model. As shown in the figure, Ochra A in eggs is the most critical factor in determining model predictions. The other dominant features, ranked by their influence, are: DON in eggs, storage temperature of poultry feed, storage duration of poultry feed (days), and moisture of poultry feed. The beeswarm plot of the Extra Trees Regressor indicates that a low level of ochratoxin A in eggs decreases the prediction, while a high level leads to a higher aflatoxin prediction. DON in eggs behaves similarly to the Ochra A feature in the prediction process. Figure 4 presents the feature importance plot of Linear Regression. The most influential factors for LR are total aflatoxins in eggs, Ochra A in feed, and Ochra A in eggs. Figure 4 also presents the beeswarm plots of LR; as indicated, the most dominant features are Ochra A in eggs and the duration of poultry feed storage (days). Higher total aflatoxins in eggs lead to higher aflatoxin predictions. Additionally, Ochra A in feed plays the same role as Ochra A in eggs during the prediction process.
Instances with high Ochra A in the feed lead to a high aflatoxin prediction during LR training. The most influential features in predicting total aflatoxins in eggs using a Ridge regressor are total aflatoxins in feed and the storage temperature of poultry feed. Higher aflatoxin values in feed lead to a higher contamination risk, and higher storage temperatures increase predictions, indicating that storage conditions influence contamination risk, as shown in Figure 5. In the Bayesian Ridge regressor, total aflatoxins in feed, DON in eggs, and DON in feed have the most influence on model predictions, as indicated in Figure 6. The most crucial features of the Huber regressor, the Random Forest regressor, and the Extra Trees regressor, along with their corresponding beeswarm plots, are shown in Figure 7, Figure 8 and Figure 9, respectively.
To highlight the performance of the top three regressors, we used the t-SNE technique to visualize the structure of the data. Figure 10 shows the t-SNE projections for ET, DT, and RF, respectively, projecting the dataset into a two-dimensional space and revealing distinct clusters in which instances with similar values group together. The color gradient indicates the aflatoxin feature, with lower values in red and higher values in blue. The figure indicates a firm underlying structure in the data, supporting effective model learning.
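A small sketch of the t-SNE projection used for Figure 10, on synthetic two-cluster data standing in for low- and high-aflatoxin groups (the perplexity value is an illustrative choice for this small sample):

```python
# Illustrative t-SNE projection to 2D (synthetic two-cluster data).
import numpy as np
from sklearn.manifold import TSNE

rng = np.random.default_rng(0)
# Two synthetic clusters standing in for low/high-aflatoxin groups.
X = np.vstack([rng.normal(0.0, 0.5, (40, 5)),
               rng.normal(3.0, 0.5, (40, 5))])
emb = TSNE(n_components=2, perplexity=10, random_state=0).fit_transform(X)
print(emb.shape)  # (80, 2)
```

Coloring `emb` by the target value (here, cluster membership; in the study, aflatoxin level) reproduces the kind of structure-revealing plot shown in Figure 10.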

5.4. Hyperparameter Tuning

Hyperparameters are external variables that are set before the training process. Their best values cannot be known in advance; they are found by search or based on previous experiments. Examples of hyperparameter tuning include choosing a deep learning architecture, setting the maximum depth and minimum samples per split in DT, and setting the number of trees and maximum features in RF. Hyperparameter tuning methods fall into two main types: manual search and automatic search. Manual search relies on prior domain knowledge and expert experience, and it becomes tedious with high-dimensional data, so automatic search techniques are used instead. These include grid search, random search, Bayesian optimization, genetic algorithms, and gradient-based optimization. A summary of optimization algorithms is presented in Table 9. We used the following optimization algorithms:
  • Grid search, which evaluates all possible combinations of hyperparameters in the grid and is guaranteed to find the best combination among them.
  • Bayesian optimization, which is efficient for large and complex hyperparameter spaces.
  • Tree-structured Parzen Estimator (TPE) search.
  • Random search, which can find good solutions quickly in high-dimensional spaces.
We then analyzed their performance through voting and stacking.
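The search strategies above can be sketched with scikit-learn's `GridSearchCV` and `RandomizedSearchCV`; the estimator, parameter ranges, and synthetic data here are illustrative, not our tuned configuration:

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV, RandomizedSearchCV

X, y = make_regression(n_samples=200, n_features=8, noise=0.1, random_state=0)

# Grid search: exhaustively evaluates every combination in the grid
grid = GridSearchCV(
    RandomForestRegressor(random_state=0),
    param_grid={"n_estimators": [50, 100], "max_features": ["sqrt", 1.0]},
    cv=3, scoring="r2",
).fit(X, y)

# Random search: samples a fixed budget of combinations from the space
rand = RandomizedSearchCV(
    RandomForestRegressor(random_state=0),
    param_distributions={"n_estimators": range(50, 201),
                         "max_depth": [None, 5, 10, 20]},
    n_iter=5, cv=3, scoring="r2", random_state=0,
).fit(X, y)

print(grid.best_params_, round(grid.best_score_, 4))
print(rand.best_params_, round(rand.best_score_, 4))
```

Bayesian/TPE search follows the same fit-and-compare pattern but chooses each new candidate from a probabilistic model of past trials (e.g., via the `optuna` or `hyperopt` packages).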
Performance metric scores for ADA after hyperparameter tuning using grid search and Bayesian optimization (with a tree-structured Parzen Estimator, TPE) are presented in Table 10 and Table 11, respectively. Bayesian optimization of ADA gives more stable performance across folds with a higher R², and it reaches better model accuracy in fewer iterations. Nevertheless, the original model outperformed the tuned models.
Performance metric scores for KNN after hyperparameter tuning using random grid search and Bayesian optimization are presented in Table 12 and Table 13, respectively. Bayesian optimization captures the data variance best, with the highest R² value and the lowest MAPE, RMSE, and MAE; it also has the lowest standard deviation across folds, making it the most consistent across runs. Random grid search has the highest errors, indicating that it is less efficient for hyperparameter tuning. Even so, the original model outperformed the tuned models.
  • Uniform Averaging is simple: each model contributes equally to the final prediction, but weak models degrade performance because they receive the same weight as strong ones. If we have $n$ models, each producing a prediction $\hat{y}_i$, then the ensemble prediction is
    $$\hat{y} = \frac{1}{n}\sum_{i=1}^{n}\hat{y}_i,$$
    where $\hat{y}$ is the final ensemble prediction, $n$ is the number of models, and $\hat{y}_i$ is the prediction of the $i$-th model.
  • Weighted Averaging assigns each model a weight based on its performance, so better models get higher weights. If $w_i$ is the weight assigned to the $i$-th model, then the ensemble prediction is
    $$\hat{y} = \frac{\sum_{i=1}^{n} w_i\,\hat{y}_i}{\sum_{i=1}^{n} w_i},$$
    where $\hat{y}$ is the final ensemble prediction, $n$ is the number of models, $\hat{y}_i$ is the prediction of the $i$-th model, and $w_i$ is its weight.
The results of combining the top three ML regressors using voting with simple averaging are tabulated in Table 14.
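The two averaging rules above can be sketched in a few lines; the per-model predictions and weights below are hypothetical:

```python
import numpy as np

# Predictions from three hypothetical regressors for one sample
preds = np.array([1.0, 2.0, 3.0])

# Uniform averaging: every model contributes equally
y_uniform = preds.mean()                   # (1 + 2 + 3) / 3 = 2.0

# Weighted averaging: better models get larger weights
# (weights here sum to 1, so no extra normalization is needed)
w = np.array([0.5, 0.3, 0.2])
y_weighted = np.average(preds, weights=w)  # 1*0.5 + 2*0.3 + 3*0.2 = 1.7

print(y_uniform, y_weighted)
```

In practice the weights can be derived from each model's cross-validation R², so the best regressor dominates the vote.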

6. Classification-Based Analysis of AFB1 in Jordanian Fresh Eggs

Aflatoxin B1 (AFB1) is a poisonous compound produced by fungi such as Aspergillus parasiticus and Aspergillus flavus. It can occur in eggs under several circumstances and is rigorously controlled. When classifying fresh eggs with respect to AFB1 contamination:
  • Safe eggs contain a concentration of AFB1 below the allowable safe limit.
  • Contaminated eggs contain a concentration of AFB1 above the allowable safe limit. Such eggs are unsuitable for human consumption, as they present health risks such as liver damage, increased cancer risk, and immune suppression.
Table 15 lists several well-known classification algorithms that we explored to predict AFB1 toxicity in eggs, using the Scikit-Learn implementations through the PyCaret wrapper.
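The safe/contaminated split amounts to a threshold rule on the measured AFB1 level. In the sketch below, the 2 ppb default limit is an illustrative placeholder only, not the statutory limit applied in this study:

```python
def label_egg(afb1_ppb, limit_ppb=2.0):
    """Label a sample by comparing its AFB1 level to a regulatory limit.

    The 2 ppb default is a hypothetical placeholder; substitute the
    applicable national or EU limit when using this in practice.
    """
    return "contaminated" if afb1_ppb > limit_ppb else "safe"

# Hypothetical measured AFB1 concentrations (ppb) for four eggs
labels = [label_egg(v) for v in [0.5, 1.9, 2.4, 8.0]]
print(labels)  # ['safe', 'safe', 'contaminated', 'contaminated']
```

Applying such a rule to the assay results yields the binary target that the classifiers in Table 15 are trained to predict.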

Classification Results Discussion

The performance of the classifiers was assessed using metrics spanning the following three complementary dimensions; the results are presented in Table 16.
  • Basic Performance Indicators
Accuracy (Equation (2a)) measures overall correctness but may obscure class-specific errors. Precision (Equation (2b)) and Recall (Equation (2c)) decompose performance by quantifying, respectively, the reliability of positive predictions and the completeness of positive detection. Their harmonic mean, the F1-score (Equation (2d)), balances both considerations.
  • Threshold-Independent Evaluation
AUC-ROC integrates classifier performance across all operating points, with values near unity indicating strong discrimination regardless of threshold selection.
  • Chance-Corrected and Balanced Metrics
Cohen’s Kappa (Equation (2f)) quantifies agreement beyond chance by normalizing observed agreement P ( a ) against expected agreement P ( e ) , with κ = 1 denoting perfect concordance. MCC (Equation (2e)) correlates predicted and actual labels using all confusion matrix entries, offering robustness to class imbalance.
$$\text{Accuracy} = \frac{TP + TN}{TP + FP + TN + FN}$$
$$\text{Precision} = \frac{TP}{TP + FP}$$
$$\text{Recall (Sensitivity)} = \frac{TP}{TP + FN}$$
$$F\text{-score} = \frac{2 \times \text{Recall} \times \text{Precision}}{\text{Recall} + \text{Precision}}$$
$$\text{MCC} = \frac{TP \times TN - FP \times FN}{\sqrt{(TP + FP)(TP + FN)(TN + FP)(TN + FN)}}$$
$$\kappa = \frac{P(a) - P(e)}{1 - P(e)}$$
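All of these metrics follow directly from the four confusion-matrix counts; the sketch below implements them, with hypothetical counts in the usage example:

```python
import math

def metrics(tp, fp, tn, fn):
    """Compute accuracy, precision, recall, F1, MCC, and Cohen's kappa
    from binary confusion-matrix counts."""
    n = tp + fp + tn + fn
    acc = (tp + tn) / n
    prec = tp / (tp + fp)
    rec = tp / (tp + fn)
    f1 = 2 * prec * rec / (prec + rec)
    mcc = (tp * tn - fp * fn) / math.sqrt(
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Cohen's kappa: observed agreement P(a) corrected by chance agreement P(e)
    p_a = acc
    p_e = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / n ** 2
    kappa = (p_a - p_e) / (1 - p_e)
    return acc, prec, rec, f1, mcc, kappa

# Hypothetical counts: 90 true positives, 5 false positives,
# 95 true negatives, 10 false negatives
acc, prec, rec, f1, mcc, kappa = metrics(90, 5, 95, 10)
print(round(acc, 3), round(kappa, 2))  # 0.925 0.85
```

Note how kappa (0.85) is lower than raw accuracy (0.925): the chance-agreement correction discounts the accuracy a trivial classifier could reach on imbalanced data.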
According to the obtained results, the Random Forest classifier achieved the highest overall performance, with an accuracy of 97.14%, an F1-score of 97.21%, and a Kappa of 0.8848, while maintaining a low computational cost (0.04 s). The AdaBoost and Gradient Boosting classifiers achieved competitive results, with accuracies above 96% and high AUC values, confirming the robustness of these classification models. Logistic Regression and K-Nearest Neighbors reached accuracies of approximately 96%, slightly below the previous classifiers. The Naïve Bayes and Support Vector Machine (linear kernel) classifiers performed worse in terms of Kappa and MCC, and the Dummy classifier performed worst across all metrics. Ensemble-based models such as Random Forest, AdaBoost, and Gradient Boosting outperformed single classifiers, confirming that combining multiple weak learners increases predictive performance and stability. Furthermore, all classifiers required little computational time; the longest was Logistic Regression at 0.368 s, which is still low.
Figure 11 shows the learning curves for Random Forest, AdaBoost, and Gradient Boosting. The Gradient Boosting classifier achieves training and cross-validation accuracies of approximately 98%, indicating a good balance between bias and variance and thus a robust classifier that generalizes well. The AdaBoost classifier achieved a training accuracy of 97%, with a larger gap between training and validation scores than Gradient Boosting, which indicates a degree of overfitting. Random Forest improves as the data size increases, indicating that it needs large datasets to fully exploit its structure.
Figure 12 shows the ROC curves for Random Forest, AdaBoost, and Gradient Boosting. The ROC curve illustrates a model's ability to classify instances as positive or negative across all threshold values: the closer the curve is to the top-left corner, the higher the AUC and the better the model's performance. As can be seen, all curves are close to the top-left corner, indicating well-performing models, and all areas under the curves are close to 1. The AdaBoost classifier has the highest AUC value of 0.9735, very close to a perfect model; Random Forest is close behind with 0.971, and Gradient Boosting scores 0.9633. To provide more insight into how features affect the classifiers' decisions and to emphasize the crucial features in the decision-making process, Figure 13 presents the feature importance plots of the top three classifiers. The feature importance plot for Random Forest indicates that total aflatoxins is the most dominant feature during its training, and the same feature dominates the AdaBoost and Gradient Boosting classifiers.
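The ROC/AUC computation described above can be reproduced with scikit-learn; the labels and classifier scores here are illustrative:

```python
import numpy as np
from sklearn.metrics import roc_curve, roc_auc_score

# Hypothetical probability scores for four eggs (1 = contaminated)
y_true = np.array([0, 0, 1, 1])
scores = np.array([0.1, 0.4, 0.35, 0.8])

# roc_curve sweeps every threshold; each (fpr, tpr) pair is one operating point
fpr, tpr, thresholds = roc_curve(y_true, scores)
auc = roc_auc_score(y_true, scores)
print(auc)  # 0.75
```

Because the AUC integrates over every threshold, it rewards a model that ranks contaminated eggs above safe ones regardless of where the final decision cutoff is placed.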

7. Conclusions

This research successfully predicted the concentration and presence of deoxynivalenol (DON), aflatoxin B1 (AFB1), and ochratoxin A (OTA) in fresh eggs from Jordan using machine learning techniques, enhancing both the accuracy and effectiveness of detection, which can improve public health and safety. Different algorithms were employed to build AFB1 predictive models, and ET performed well for AFB1 regression compared to the other models. We also employed various classification algorithms to categorize fresh eggs, among which RF outperformed the other classifiers in terms of accuracy. This study emphasizes the crucial role of advanced machine learning methods in monitoring mycotoxin contamination, implying that regression and classification models could support future predictive analytics and reduce the cost and time required to physically examine individual fresh egg samples. Further research should employ an ensemble of regression and classification models to enhance performance and build a robust model that can serve as an alternative to time-consuming physical testing. This will help in developing effective strategies for monitoring the risks associated with aflatoxin exposure.

Author Contributions

Methodology, E.O., E.A. and S.O.; software, E.O., E.A. and W.B.M.; validation, S.O. and E.A.; investigation, A.A., E.O., E.A. and S.O.; writing—original draft, E.O., E.A., W.B.M. and S.O.; writing—review and editing, H.M. and A.A.; Quality Check, H.M.; visualization, E.A.; supervision, S.O.; project administration, S.O. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

Dataset available on request from the authors.

Conflicts of Interest

The authors declare no conflicts of interest.

Figure 1. (a) Residuals for the Linear Regression model; (b) residuals for the Ridge model.
Figure 2. (a) Residuals for Bayesian Ridge; (b) residuals for the Huber Regressor.
Figure 3. Residuals for the Random Forest Regressor.
Figure 4. Feature importance plot (top), beeswarm plot (middle), and PDP plot (bottom) of Linear Regression.
Figure 5. Feature importance plot (top), beeswarm plot (middle), and PDP plot (bottom) of Ridge Regression.
Figure 6. Feature importance plot (top), beeswarm plot (middle), and PDP plot (bottom) of Bayesian Ridge.
Figure 7. Feature importance plot (top), beeswarm plot (middle), and PDP plot (bottom) of the Huber Regressor.
Figure 8. Feature importance plot (top), beeswarm plot (right), and PDP plot (centered below) of the Random Forest Regressor.
Figure 9. Feature importance plot (top), beeswarm plot (right), and PDP plot (centered below) of the Extra Trees Regressor.
Figure 10. t-SNE visualizations for the top three models: (a) LR, (b) Ridge, and (c) BR.
Figure 11. Learning curves for the top three models: (a) RF, (b) ADA, and (c) GBC.
Figure 12. ROC curves for the top three models: (a) RF, (b) ADA, and (c) GBC.
Figure 13. Feature importance plots for the top three models: (a) RF, (b) GBC, and (c) DT.
Table 1. Basic statistics on the dataset.

| | Mean | Median |
|---|---|---|
| Storage Temp of poultry feed | 28.67234989 | 29 |
| Relative Humidity in store | 63 | 64 |
| Moisture of poultry feed | 61.35137139 | 62 |
| Duration of poultry feed storage (days) | 66.58932543 | 58 |
| Total aflatoxins in eggs | 24.72053373 | 26 |
| Ochra A in eggs | 15.55522609 | 15 |
| DON in eggs | 157.0882135 | 147 |
| Total aflatoxins in feed | 39.60293529 | 42 |
| Ochra A in feed | 18.71487658 | 18 |
| DON in feed | 173.4096064 | 163 |
Table 2. Regression algorithms categorized by model family.

| Primary Family | Algorithm | Abbreviation |
|---|---|---|
| Tree-Based | Decision Tree Regressor | DT |
| | Random Forest Regressor | RF |
| | Extra Trees Regressor | ET |
| | Gradient Boosting Regressor | GBR |
| | Extreme Gradient Boosting | XGB |
| | Light GBM | LGBM |
| | CatBoost | CAT |
| Linear | Ridge Regression | RIDGE |
| | Lasso Regression | LASSO |
| | Elastic Net | EN |
| | Least Angle Regression | LAR |
| | Orthogonal Matching Pursuit | OMP |
| | Bayesian Ridge | BR |
| | Automatic Relevance Determination | ARD |
| | Huber Regressor | HUBER |
| | Theil-Sen Regressor | TS |
| | RANSAC Regressor | RANSAC |
| | Passive Aggressive Regressor | PAR |
| Kernel | Support Vector Regression | SVR |
| | Kernel Ridge Regression | KRR |
| Instance-Based | K-Neighbors Regressor | KNN |
| Neural Networks | Multi-layer Perceptron | MLP |
| Ensemble Meta-Algorithm | AdaBoost Regressor | ADA |
Table 3. Comparison results for regression learning models.

| Model | MAE | MSE | RMSE | R² | RMSLE | MAPE | TT (s) |
|---|---|---|---|---|---|---|---|
| Linear Regression | 0 | 0 | 0 | 1 | 0.0005 | 0 | 0.02 |
| Ridge Regression | 0.0001 | 0 | 0.0001 | 1 | 0.0005 | 0 | 0.266 |
| Bayesian Ridge | 0 | 0 | 0 | 1 | 0.0005 | 0 | 0.005 |
| Huber Regressor | 0.0001 | 0 | 0.0001 | 1 | 0.0005 | 0 | 0.01 |
| Random Forest Regressor | 0.0063 | 0.0034 | 0.0414 | 0.9999 | 0.0018 | 0.0003 | 0.031 |
| Extra Trees Regressor | 0.0035 | 0.0023 | 0.0249 | 0.9999 | 0.0011 | 0.0001 | 0.025 |
| Gradient Boosting Regressor | 0.0023 | 0.0021 | 0.015 | 0.9999 | 0.0009 | 0.0001 | 0.018 |
| Decision Tree Regressor | 0.0053 | 0.0053 | 0.0394 | 0.9998 | 0.0017 | 0.0002 | 0.008 |
| Passive Aggressive Regressor | 0.0554 | 0.0048 | 0.0693 | 0.9998 | 0.0031 | 0.0025 | 0.006 |
| Elastic Net | 0.0876 | 0.0126 | 0.1119 | 0.9995 | 0.0061 | 0.0043 | 0.005 |
| Light Gradient Boosting Machine | 0.0247 | 0.0146 | 0.1055 | 0.9994 | 0.0034 | 0.0008 | 0.058 |
| Lasso Regression | 0.1062 | 0.0183 | 0.135 | 0.9993 | 0.0075 | 0.0052 | 0.282 |
| Lasso Least Angle Regression | 0.1063 | 0.0184 | 0.1351 | 0.9993 | 0.0075 | 0.0052 | 0.005 |
| AdaBoost Regressor | 0.4194 | 0.2519 | 0.4969 | 0.99 | 0.0205 | 0.0176 | 0.01 |
| K Neighbors Regressor | 0.2154 | 0.5892 | 0.7294 | 0.9767 | 0.0339 | 0.0099 | 0.009 |
| Orthogonal Matching Pursuit | 3.3282 | 18.2539 | 4.2634 | 0.2938 | 0.1963 | 0.162 | 0.005 |
| Dummy Regressor | 4.1132 | 6.1581 | 5.1045 | −0.0051 | 0.231 | 0.2011 | 0.006 |
| Least Angle Regression | 931.4694 | 13,560,887.39 | 1164.52 | −559,283.9353 | 0.5616 | 37.632 | 0.006 |
Table 4. 10-Fold CV of Linear Regression.

| Fold | MAE | MSE | RMSE | R² | RMSLE | MAPE |
|---|---|---|---|---|---|---|
| 0 | 0 | 0 | 0 | 1 | 0.0005 | 0 |
| 1 | 0 | 0 | 0 | 1 | 0.0005 | 0 |
| 2 | 0 | 0 | 0 | 1 | 0.0005 | 0 |
| 3 | 0 | 0 | 0 | 1 | 0.0005 | 0 |
| 4 | 0 | 0 | 0 | 1 | 0.0005 | 0 |
| 5 | 0 | 0 | 0 | 1 | 0.0005 | 0 |
| 6 | 0 | 0 | 0 | 1 | 0.0005 | 0 |
| 7 | 0 | 0 | 0 | 1 | 0.0005 | 0 |
| 8 | 0 | 0 | 0 | 1 | 0.0005 | 0 |
| 9 | 0 | 0 | 0 | 1 | 0.0005 | 0 |
| Mean | 0 | 0 | 0 | 1 | 0.0005 | 0 |
| Std | 0 | 0 | 0 | 0 | 0 | 0 |
Table 5. 10-Fold CV of Ridge Regressor.

| Fold | MAE | MSE | RMSE | R² | RMSLE | MAPE |
|---|---|---|---|---|---|---|
| 0 | 0.0001 | 0 | 0.0001 | 1 | 0.0005 | 0 |
| 1 | 0.0001 | 0 | 0.0001 | 1 | 0.0005 | 0 |
| 2 | 0.0001 | 0 | 0.0001 | 1 | 0.0005 | 0 |
| 3 | 0.0001 | 0 | 0.0001 | 1 | 0.0005 | 0 |
| 4 | 0.0001 | 0 | 0.0002 | 1 | 0.0005 | 0 |
| 5 | 0.0001 | 0 | 0.0001 | 1 | 0.0005 | 0 |
| 6 | 0.0001 | 0 | 0.0002 | 1 | 0.0005 | 0 |
| 7 | 0.0001 | 0 | 0.0001 | 1 | 0.0005 | 0 |
| 8 | 0.0001 | 0 | 0.0002 | 1 | 0.0005 | 0 |
| 9 | 0.0001 | 0 | 0.0002 | 1 | 0.0005 | 0 |
| Mean | 0.0001 | 0 | 0.0001 | 1 | 0.0005 | 0 |
| Std | 0 | 0 | 0 | 0 | 0 | 0 |
Table 6. 10-Fold CV of Bayesian Ridge.

| Fold | MAE | MSE | RMSE | R² | RMSLE | MAPE |
|---|---|---|---|---|---|---|
| 0 | 0 | 0 | 0 | 1 | 0.0005 | 0 |
| 1 | 0 | 0 | 0 | 1 | 0.0005 | 0 |
| 2 | 0 | 0 | 0 | 1 | 0.0005 | 0 |
| 3 | 0 | 0 | 0 | 1 | 0.0005 | 0 |
| 4 | 0 | 0 | 0 | 1 | 0.0005 | 0 |
| 5 | 0 | 0 | 0 | 1 | 0.0005 | 0 |
| 6 | 0 | 0 | 0 | 1 | 0.0005 | 0 |
| 7 | 0 | 0 | 0 | 1 | 0.0005 | 0 |
| 8 | 0 | 0 | 0 | 1 | 0.0005 | 0 |
| 9 | 0 | 0 | 0 | 1 | 0.0005 | 0 |
| Mean | 0 | 0 | 0 | 1 | 0.0005 | 0 |
| Std | 0 | 0 | 0 | 0 | 0 | 0 |
Table 7. 10-Fold CV of Huber Regressor.

| Fold | MAE | MSE | RMSE | R² | RMSLE | MAPE |
|---|---|---|---|---|---|---|
| 0 | 0.0001 | 0 | 0.0001 | 1 | 0.0005 | 0 |
| 1 | 0 | 0 | 0.0001 | 1 | 0.0005 | 0 |
| 2 | 0.0001 | 0 | 0.0002 | 1 | 0.0005 | 0 |
| 3 | 0.0001 | 0 | 0.0001 | 1 | 0.0005 | 0 |
| 4 | 0 | 0 | 0 | 1 | 0.0005 | 0 |
| 5 | 0 | 0 | 0 | 1 | 0.0005 | 0 |
| 6 | 0.0001 | 0 | 0.0001 | 1 | 0.0005 | 0 |
| 7 | 0.0001 | 0 | 0.0002 | 1 | 0.0005 | 0 |
| 8 | 0 | 0 | 0.0001 | 1 | 0.0005 | 0 |
| 9 | 0.0001 | 0 | 0.0001 | 1 | 0.0005 | 0 |
| Mean | 0.0001 | 0 | 0.0001 | 1 | 0.0005 | 0 |
| Std | 0 | 0 | 0 | 0 | 0 | 0 |
Table 8. 10-Fold CV of Random Forest Regressor.

| Fold | MAE | MSE | RMSE | R² | RMSLE | MAPE |
|---|---|---|---|---|---|---|
| 0 | 0.0082 | 0.003 | 0.0552 | 0.9999 | 0.0017 | 0.0003 |
| 1 | 0.0029 | 0.0005 | 0.0218 | 1 | 0.0008 | 0.0001 |
| 2 | 0.0007 | 0.0001 | 0.0072 | 1 | 0.0007 | 0.0001 |
| 3 | 0.004 | 0.0007 | 0.0273 | 1 | 0.0013 | 0.0002 |
| 4 | 0.0109 | 0.0053 | 0.073 | 0.9998 | 0.0062 | 0.001 |
| 5 | 0.0222 | 0.0222 | 0.1488 | 0.9992 | 0.0042 | 0.0006 |
| 6 | 0.0016 | 0.0001 | 0.0121 | 1 | 0.0007 | 0.0001 |
| 7 | 0.0068 | 0.0013 | 0.0358 | 0.9999 | 0.0011 | 0.0002 |
| 8 | 0.0015 | 0.0001 | 0.0073 | 1 | 0.0006 | 0.0001 |
| 9 | 0.0037 | 0.0007 | 0.026 | 1 | 0.001 | 0.0001 |
| Mean | 0.0063 | 0.0034 | 0.0414 | 0.9999 | 0.0018 | 0.0003 |
| Std | 0.0061 | 0.0064 | 0.041 | 0.0002 | 0.0018 | 0.0003 |
Table 9. Summary of optimization algorithms with advantages and disadvantages.

| Algorithm | Advantages | Disadvantages |
|---|---|---|
| Grid Search | Simple; guarantees optimal | High-dimensional cost |
| Random Search | High-dimensional efficiency | No optimality guarantee |
| Bayesian Opt. | Fewer evaluations needed | Complex; high-dim. inefficient |
| Gradient-Based | High-dimensional suitability | Computationally expensive |
| Genetic Alg. | Complex space handling; flexible | Expensive; many evaluations |
Table 10. Performance of ADA after hyperparameter tuning using random grid search.

| Fold | MAE | MSE | RMSE | R² | RMSLE | MAPE |
|---|---|---|---|---|---|---|
| 0 | 0.2347 | 0.0849 | 0.2914 | 0.9967 | 0.0126 | 0.0102 |
| 1 | 0.2701 | 0.1269 | 0.3562 | 0.9953 | 0.0146 | 0.0115 |
| 2 | 0.2316 | 0.1003 | 0.3166 | 0.9965 | 0.0149 | 0.0109 |
| 3 | 0.2141 | 0.0777 | 0.2787 | 0.9971 | 0.0128 | 0.0097 |
| 4 | 0.1739 | 0.0733 | 0.2708 | 0.9974 | 0.0139 | 0.0082 |
| 5 | 0.2728 | 0.1575 | 0.3969 | 0.9942 | 0.0151 | 0.0115 |
| 6 | 0.2765 | 0.1381 | 0.3716 | 0.9932 | 0.016 | 0.012 |
| 7 | 0.2483 | 0.1162 | 0.3409 | 0.9946 | 0.0141 | 0.0105 |
| 8 | 0.2561 | 0.1209 | 0.3477 | 0.995 | 0.0151 | 0.0114 |
| 9 | 0.3222 | 0.1621 | 0.4026 | 0.9949 | 0.0171 | 0.0144 |
| Mean | 0.25 | 0.1158 | 0.3373 | 0.9955 | 0.0146 | 0.011 |
| Std | 0.0381 | 0.03 | 0.0446 | 0.0013 | 0.0013 | 0.0015 |
Table 11. Performance of ADA after hyperparameter tuning using Bayesian optimization.

| Fold | MAE | MSE | RMSE | R² | RMSLE | MAPE |
|---|---|---|---|---|---|---|
| 0 | 0.2347 | 0.0849 | 0.2914 | 0.9967 | 0.0126 | 0.0102 |
| 1 | 0.2701 | 0.1269 | 0.3562 | 0.9953 | 0.0146 | 0.0115 |
| 2 | 0.2316 | 0.1003 | 0.3166 | 0.9965 | 0.0149 | 0.0109 |
| 3 | 0.2141 | 0.0777 | 0.2787 | 0.9971 | 0.0128 | 0.0097 |
| 4 | 0.1739 | 0.0733 | 0.2708 | 0.9974 | 0.0139 | 0.0082 |
| 5 | 0.2728 | 0.1575 | 0.3969 | 0.9942 | 0.0151 | 0.0115 |
| 6 | 0.2765 | 0.1381 | 0.3716 | 0.9932 | 0.016 | 0.012 |
| 7 | 0.2483 | 0.1162 | 0.3409 | 0.9946 | 0.0141 | 0.0105 |
| 8 | 0.2561 | 0.1209 | 0.3477 | 0.995 | 0.0151 | 0.0114 |
| 9 | 0.3222 | 0.1621 | 0.4026 | 0.9949 | 0.0171 | 0.0144 |
| Mean | 0.25 | 0.1158 | 0.3373 | 0.9955 | 0.0146 | 0.011 |
| Std | 0.0381 | 0.03 | 0.0446 | 0.0013 | 0.0013 | 0.0015 |
Table 12. Performance of KNN after hyperparameter tuning using random grid search.

| Fold | MAE | MSE | RMSE | R² | RMSLE | MAPE |
|---|---|---|---|---|---|---|
| 0 | 0.0266 | 0.0234 | 0.1531 | 0.9991 | 0.0048 | 0.0009 |
| 1 | 0 | 0 | 0 | 1 | 0.0005 | 0 |
| 2 | 0.0211 | 0.0421 | 0.2052 | 0.9985 | 0.0071 | 0.0008 |
| 3 | 0.0105 | 0.0105 | 0.1026 | 0.9996 | 0.0045 | 0.0005 |
| 4 | 0.0228 | 0.0215 | 0.1466 | 0.9992 | 0.0048 | 0.0008 |
| 5 | 0.0213 | 0.0213 | 0.1459 | 0.9992 | 0.0041 | 0.0006 |
| 6 | 0.0074 | 0.0052 | 0.0721 | 0.9997 | 0.0023 | 0.0002 |
| 7 | 0.0017 | 0.0003 | 0.0163 | 1 | 0.0006 | 0.0001 |
| 8 | 0 | 0 | 0 | 1 | 0.0005 | 0 |
| 9 | 0.0216 | 0.0219 | 0.1482 | 0.9993 | 0.0065 | 0.001 |
| Mean | 0.0133 | 0.0146 | 0.099 | 0.9995 | 0.0036 | 0.0005 |
| Std | 0.0099 | 0.0131 | 0.0695 | 0.0005 | 0.0023 | 0.0004 |
Table 13. Performance of KNN after hyperparameter tuning using Bayesian optimization.

| Fold | MAE | MSE | RMSE | R² | RMSLE | MAPE |
|---|---|---|---|---|---|---|
| 0 | 0.0211 | 0.0211 | 0.1451 | 0.9992 | 0.0045 | 0.0007 |
| 1 | 0 | 0 | 0 | 1 | 0.0005 | 0 |
| 2 | 0.0211 | 0.0421 | 0.2052 | 0.9985 | 0.0071 | 0.0008 |
| 3 | 0.0105 | 0.0105 | 0.1026 | 0.9996 | 0.0045 | 0.0005 |
| 4 | 0.0213 | 0.0213 | 0.1459 | 0.9992 | 0.0048 | 0.0007 |
| 5 | 0.0213 | 0.0213 | 0.1459 | 0.9992 | 0.0041 | 0.0006 |
| 6 | 0.0007 | 0 | 0.0066 | 1 | 0.0005 | 0 |
| 7 | 0.0006 | 0 | 0.0056 | 1 | 0.0005 | 0 |
| 8 | 0 | 0 | 0 | 1 | 0.0005 | 0 |
| 9 | 0.0213 | 0.0213 | 0.1459 | 0.9993 | 0.0064 | 0.001 |
| Mean | 0.0118 | 0.0138 | 0.0903 | 0.9995 | 0.0033 | 0.0004 |
| Std | 0.0098 | 0.0134 | 0.0749 | 0.0005 | 0.0025 | 0.0004 |
Table 14. Combining the top three ML regressors using voting.

| | MAE | MSE | RMSE | R² | RMSLE | MAPE |
|---|---|---|---|---|---|---|
| Voting of 3 ML regressors | 0.0423 | 0.1723 | 0.245 | 0.9971 | 0.1599 | 0.0915 |
Table 15. Classification algorithms categorized by model family.

| Primary Family | Algorithm | Abbreviation |
|---|---|---|
| Tree-Based | Decision Tree Classifier | DT |
| | Random Forest Classifier | RF |
| | Extra Trees Classifier | ET |
| | Gradient Boosting Classifier | GBC |
| | Extreme Gradient Boosting | XGB |
| | Light GBM | LGBM |
| | CatBoost | CAT |
| Linear | Logistic Regression | LR |
| | Ridge Classifier | RIDGE |
| | Linear Discriminant Analysis | LDA |
| | SVM (Linear Kernel) | SVM |
| Discriminant Analysis | Quadratic Discriminant Analysis | QDA |
| Kernel | SVM (Radial Basis Function) | RBFSVM |
| Probabilistic | Naive Bayes | NB |
| | Gaussian Process Classifier | GPC |
| Instance-Based | K-Neighbors Classifier | KNN |
| Neural Networks | Multi-layer Perceptron | MLP |
| Ensemble Meta-Algorithm | AdaBoost Classifier | ADA |
| Baseline | Dummy Classifier | DUMMY |
Table 16. Performance comparison of different classification models across multiple metrics.

| Model | Accuracy | AUC | Recall | Prec. | F1 | Kappa | MCC | TT (s) |
|---|---|---|---|---|---|---|---|---|
| Random Forest Classifier | 0.9714 | 0.971 | 0.9714 | 0.9737 | 0.9721 | 0.8848 | 0.8864 | 0.04 |
| Ada Boost Classifier | 0.966 | 0.9735 | 0.966 | 0.9698 | 0.9669 | 0.8646 | 0.8684 | 0.025 |
| Gradient Boosting Classifier | 0.9633 | 0.9646 | 0.9633 | 0.9659 | 0.9636 | 0.8479 | 0.8518 | 0.025 |
| Decision Tree Classifier | 0.962 | 0.9515 | 0.962 | 0.9647 | 0.9618 | 0.8398 | 0.8455 | 0.011 |
| Extra Trees Classifier | 0.962 | 0.9504 | 0.962 | 0.9647 | 0.9618 | 0.8398 | 0.8455 | 0.035 |
| K Neighbors Classifier | 0.9606 | 0.9603 | 0.9606 | 0.9637 | 0.9607 | 0.8355 | 0.8408 | 0.246 |
| Logistic Regression | 0.9605 | 0.962 | 0.9605 | 0.9609 | 0.9597 | 0.8269 | 0.8309 | 0.368 |
| Quadratic Discriminant Analysis | 0.9578 | 0.9721 | 0.9578 | 0.964 | 0.9594 | 0.8362 | 0.8419 | 0.012 |
| Linear Discriminant Analysis | 0.9373 | 0.9489 | 0.9373 | 0.9477 | 0.9402 | 0.7617 | 0.7702 | 0.015 |
| Ridge Classifier | 0.9277 | 0 | 0.9277 | 0.9276 | 0.9255 | 0.6782 | 0.6861 | 0.012 |
| Naive Bayes | 0.9086 | 0.9608 | 0.9086 | 0.9288 | 0.9145 | 0.6675 | 0.6826 | 0.013 |
| SVM-Linear Kernel | 0.906 | 0 | 0.906 | 0.895 | 0.8913 | 0.5075 | 0.5336 | 0.011 |
| Dummy Classifier | 0.8624 | 0.5 | 0.8624 | 0.7438 | 0.7987 | 0 | 0 | 0.014 |