Food Safety Risk Prediction and Regulatory Policy Enlightenment Based on Machine Learning
Abstract
1. Introduction
2. Review of Relevant Literature
2.1. Factors Influencing Food Safety Risks
2.2. Prediction of Food Safety Risks
3. Data Sources and Analysis
3.1. LDA-Based Topic Text Mining
- (1)
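- Sample the topic distribution $\theta_d$ for document $d$ from a Dirichlet distribution with parameter $\alpha$: $\theta_d \sim \mathrm{Dir}(\alpha)$
- (2)
- For the $n$th word in document $d$:
- Sample a topic $z_{d,n}$ from the multinomial distribution over topics: $z_{d,n} \sim \mathrm{Mult}(\theta_d)$
- Sample the word $w_{d,n}$ from the word distribution $\varphi_{z_{d,n}}$ associated with topic $z_{d,n}$: $w_{d,n} \sim \mathrm{Mult}(\varphi_{z_{d,n}})$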
- (3)
- To address the characteristics of Shanghai’s food safety sampling data, this study implements the following specialized design:
- Corpus Construction: Key fields are fused to construct analysis documents:
- Parameter Optimization: The optimal number of topics is determined via coherence score:
- Topic Interpretation: Based on the topic–word distribution $\varphi_k$, the semantic weight of topic $k$ is calculated:
- (4)
- Seven core risk topics are ultimately identified, and for each nonconforming sample $i$, its document–topic distribution vector $\theta_i$ is extracted as a new feature:
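The generative process described above can be illustrated with a minimal sketch. It uses only the Python standard library and a hypothetical toy vocabulary with two topics; the names `VOCAB`, `TOPIC_WORD`, `sample_dirichlet`, and `generate_document` are assumptions for illustration and are not the study's fitted model or its seven topics.

```python
import random


def sample_dirichlet(alpha, rng):
    """Draw one sample from Dir(alpha) by normalizing Gamma(alpha_k, 1) draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]


def generate_document(n_words, alpha, topic_word, vocab, rng):
    """Generate one document by following the LDA generative process."""
    theta = sample_dirichlet(alpha, rng)  # theta_d ~ Dir(alpha)
    topics = list(range(len(alpha)))
    assignments, words = [], []
    for _ in range(n_words):
        # z_{d,n} ~ Mult(theta_d): pick a topic for this word position
        z = rng.choices(topics, weights=theta)[0]
        # w_{d,n} ~ Mult(phi_z): pick a word from that topic's distribution
        w = rng.choices(vocab, weights=topic_word[z])[0]
        assignments.append(z)
        words.append(w)
    return theta, assignments, words


# Hypothetical toy vocabulary and topic-word distributions (illustration only).
VOCAB = ["cold_chain", "temperature", "pesticide", "vegetable"]
TOPIC_WORD = [
    [0.50, 0.40, 0.05, 0.05],  # a "cold chain"-like topic
    [0.05, 0.05, 0.50, 0.40],  # a "pesticide residue"-like topic
]

rng = random.Random(0)
theta, assignments, words = generate_document(8, [0.5, 0.5], TOPIC_WORD, VOCAB, rng)
```

In this sketch, `theta` plays the role of the document–topic distribution vector that the study extracts as a feature; in practice it would be inferred from the corpus rather than sampled.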
3.2. Food Safety Risk Status in Shanghai
- (1)
- Temporal Distribution of Food Risks. Annual trend analysis shows that the noncompliance rate of food safety sampling in Shanghai (2023–2025) fluctuates significantly from year to year. The overall noncompliance rate shows an upward trend: the average rate was 0.451% in 2023, increased to 0.535% in 2024 (a 13% year-on-year rise), and further surged to 0.630% in Q1 2025 (a 28.6% year-on-year increase). This indicates escalating food safety risk pressure in Shanghai. Notably, the noncompliance rate peaked at 1.311% in April 2024 (the highest observed value), likely linked to cold chain disruptions caused by post-Spring Festival supply chain restructuring and accelerated warehouse turnover. In contrast, September 2023 marked a trough (the lowest observed value), potentially benefiting from intensified regulatory campaigns prior to major events such as the China International Import Expo, Mid-Autumn Festival, and National Day holidays. By quarter, the second quarter has the highest risk level (0.658%), exceeding the annual average by 37.8%. The third quarter has the lowest risk (0.439%), contradicting theoretical expectations of heightened microbial contamination during summer heatwaves. This may reflect the effectiveness of cold chain infrastructure in megacities like Shanghai. Monthly data reveal key risk nodes: the monthly noncompliance rate exhibits a bimodal seasonal pattern. A primary peak occurs from April to July, forming a continuous high-risk period, while a secondary peak spans November to January, potentially driven by New Year and Spring Festival consumption surges. Critical turning points occur in March–April (possibly influenced by supply chain adaptability risks during the winter-spring transition) and September (affected by major holidays, emerging as the optimal annual regulatory window).
Overall, Shanghai’s food safety risks exhibit significant temporal clustering, characterized by “annual growth, quarterly divergence, and monthly bimodal patterns.” Priority prevention should focus on Q2’s high-temperature risk period and year-end consumption peaks, achieved through a temporal response mechanism for targeted regulatory resource allocation. The anomalous noncompliance rate surge in 2024 warns of the need for enhanced oversight of imported foods and e-commerce channels, alongside establishing a time-dimension-based risk prevention paradigm. Monthly trends are illustrated in Figure 1.
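The yearly, quarterly, and monthly rates discussed above are all group-wise ratios of nonconforming samples to total samples. A minimal sketch of that aggregation follows, assuming each sampling record is a `((year, month), failed)` pair; the record layout, the function names, and the toy records are hypothetical and do not reproduce the study's data.

```python
from collections import defaultdict


def noncompliance_rate_by(records, key):
    """Return {group: noncompliance rate in percent} for ((year, month), failed) records."""
    totals = defaultdict(int)
    failures = defaultdict(int)
    for (year, month), failed in records:
        group = key(year, month)
        totals[group] += 1
        failures[group] += failed  # failed is 1 if nonconforming, else 0
    return {g: 100.0 * failures[g] / totals[g] for g in totals}


def quarter_of(year, month):
    """Map a calendar month (1-12) to its quarter (1-4)."""
    return (month - 1) // 3 + 1


# Hypothetical toy records: four samples in April 2024, two in September 2024.
records = [
    ((2024, 4), 1), ((2024, 4), 0), ((2024, 4), 0), ((2024, 4), 0),
    ((2024, 9), 0), ((2024, 9), 0),
]

by_quarter = noncompliance_rate_by(records, quarter_of)
by_month = noncompliance_rate_by(records, lambda year, month: (year, month))
```

Passing a different `key` function (for example one that returns only the year) gives the annual view with the same code path.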
- (2)
- Category Distribution of Food Risks. In-depth analysis of 34 food categories in Shanghai yields three findings. (1) Three categories stand out as high-risk. First, honey products (1.327%) exhibited the highest risk level, with pesticide residue exceedance as the main issue (accounting for 67.3% of nonconforming samples), peaking in the second quarter. This aligns with the peak period of pesticide application in nectar plants, indicating direct contamination from raw honey sources and reflecting weak source control. Second, vegetable products (1.150%) faced dominant risks from additive misuse, such as benzoic acid excess, particularly in pickled subcategories. This corresponded to the surge in winter pickled food consumption during the fourth quarter. Third, agricultural products (1.102%) were vulnerable to microbial contamination and pesticide residue exceedance. Their summer noncompliance rate was over twice that of winter, highlighting temperature-dependent risks and exposing source control gaps. (2) Medium-to-low-risk categories exhibited significant heterogeneity in noncompliance characteristics. For instance, roasted nut products were primarily affected by aflatoxin and acid value exceedance, while meat products showed risks from peroxide value and nitrite excess, largely due to substandard storage conditions. Infant formula and some other categories recorded zero noncompliance throughout the study period, reflecting regulatory emphasis on these products and underscoring the importance of full-chain production control. (3) Category risks exhibited seasonal migration. In the first, second, third, and fourth quarters, the most prominent risky categories were meat products, honey products, aquatic products, and vegetable products, respectively. Special dietary foods and roasted nuts showed rising annual noncompliance trends, while dairy products exhibited a decline.
Food category sampling results revealed significant category heterogeneity and temporal dynamics. Therefore, a dual-track governance mechanism combining “risk-stratification-based category-specific precision control” and “seasonal fluctuation-adaptive temporal prevention” is required to address core issues: honey product pesticide residues, additive misuse in vegetable products, and pesticide residue in agricultural products. Enhancing source control and process intervention synergies will improve food safety governance efficiency. Category-specific noncompliance rate changes are illustrated in Figure 2.
- (3)
- Cause Distribution of Food Risks. Food safety risks are categorized into six primary contamination types: pesticide and veterinary drug residues, additive misuse, microbial contamination, indicator anomalies, toxic substances, and illegal additives. This study further classifies these into 17 subcategories (see Figure 3). Among them, pesticide and veterinary drug residues are the most severe issues, dominated by chlorpyrifos and thiamethoxam, concentrated in agricultural products, with the highest detection rate in summer. Second is additive misuse, primarily benzoic acid—a key ingredient in pickled vegetables. Excessive sulfur dioxide in starch products also warrants attention. Third is microbial contamination, mainly colony count exceedance and mold pollution, often linked to cold chain infrastructure and hygiene conditions. Contamination causes exhibit category specificity: pesticide/veterinary drug residues and illegal additives predominantly occur in agricultural products; peroxide value, sulfur dioxide, and microbial overruns mainly affect processed foods like meat products, pastries, and vegetable products. In summary, contamination issues display a “three-dimensional pattern”: pesticide/veterinary drug residues as the primary driver, additive misuse as secondary, and microbial contamination fluctuating with temperature. A three-track strategy is required to block risks across the farm-to-table chain: source control of pesticides/veterinary drugs, process upgrades for additives, and cold chain reinforcement against microbial threats.
4. Food Safety Risk Prediction in Shanghai
4.1. Construction of a Food Safety Prediction Model Based on XGBoost
- (1)
- Core Formula.
- (2)
- Key Feature Engineering.
- (3)
- Risk Analysis Methods.
- (4)
- Regulatory Decision-making Model.
4.2. XGBoost-Based Food Safety Prediction Results
- (1)
- ROC Curve Analysis
- (2)
- Identification of Key Features in the Model
- (3)
- Distribution Patterns of Key Features. In this analysis, feature engineering was used to extract and transform the original variables. The two most important variables, “Category” and “Monthly Total Inspections for the Category”, were selected and processed. One-hot encoding and PCA were applied to the “Category” variable to create the “Category_encoded” feature. Z-score normalization was performed on the numeric variable “Monthly Total Inspections for the Category” to unify its scale for model computation. The distribution of the core features was visually verified, with results as follows:
- (4)
- Feature Correlation Analysis. As shown in Figure 7, the feature correlation matrix based on Pearson’s r supports the above results. Specifically, the strong positive correlation (r ≈ 0.89) between monthly and quarterly retail chain inspections reflects the temporal nesting of regulatory frequency. The correlation (r = 0.62) between the number of holidays and supermarket inspection frequency aligns with the logic of allocating regulatory resources to consumption peaks. Strong correlations (e.g., more inspections in a quarter leading to more inspected categories, or higher quarterly failure rates corresponding to higher annual rates) remain logically sound and help interpret the model’s predictions.
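For reference, Pearson's r between two features can be computed as in the following minimal sketch. The function name `pearson_r` and the toy inspection counts are assumptions for illustration; they are not the study's feature values.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length numeric sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical monthly inspection counts and the quarterly totals they fall into.
monthly_inspections = [10, 12, 14, 20, 22, 24]
quarterly_inspections = [36, 36, 36, 66, 66, 66]
r = pearson_r(monthly_inspections, quarterly_inspections)
```

Because each monthly count is contained in its quarterly total, the two toy series move together and `r` is close to 1, which is the kind of temporal nesting the correlation matrix reflects.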
- (5)
- Result Comparison. As shown in Figure 8, after LDA text mining, the XGBoost-based prediction model achieved a precision (positive predictive value) of 93.8%, meaning that only about 6.2 of every 100 alerts are false alarms that lead to an unnecessary re-inspection, which significantly reduces re-inspection costs for regulatory agencies. The confusion matrix shows FP = 9, indicating strong resistance to noise. Meanwhile, the model captured 88.3% of noncompliant samples (recall), a substantial improvement in regulatory coverage compared to traditional sampling strategies. With an F1-score of 0.850, the model balances precision and recall at a leading level. Therefore, its application could drive further improvements in Shanghai’s food safety inspection system.
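As a reminder of how these metrics relate to the confusion matrix, the sketch below computes precision, recall, and F1 from raw counts. The counts used are hypothetical round numbers chosen for readability, not the study's confusion matrix, and the function name is an assumption.

```python
def classification_metrics(tp, fp, fn):
    """Return (precision, recall, f1) from confusion-matrix counts."""
    precision = tp / (tp + fp)  # share of alerts that are truly noncompliant
    recall = tp / (tp + fn)     # share of noncompliant samples that are caught
    f1 = 2 * precision * recall / (precision + recall)  # harmonic mean
    return precision, recall, f1


# Hypothetical counts: 80 true alerts, 20 false alarms, 20 missed cases.
precision, recall, f1 = classification_metrics(tp=80, fp=20, fn=20)
false_alarms_per_100_alerts = 100 * (1 - precision)
```

With these toy counts, precision, recall, and F1 all equal 0.8, and 20 of every 100 alerts would be false alarms.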
- (6)
- Future Trend Prediction of Noncompliance Rates. Finally, this paper conducts predictive analysis based on the constructed XGBoost food safety prediction model (as shown in Figure 9). To ensure the rigor and robustness of the prediction results, this study adopts a three-stage dynamic calibration strategy that integrates historical trends into the prediction framework. First, time series decomposition and covariate alignment are performed. Specifically, the historical noncompliance rates from 2023 to Q1 2025 are decomposed using STL (Seasonal-Trend decomposition using LOESS), and the decomposed trend component is input into the XGBoost model as an explicit covariate. This forces the algorithm to learn historical growth patterns and prevents short-term fluctuations from obscuring long-term trends. Second, forward-chaining validation is used to dynamically calibrate hyperparameters: a 12-month training window is adopted to predict the subsequent 3 months (e.g., predicting January–March 2024 based on January–December 2023), and key parameters are adjusted according to prediction errors (MAE, MAPE). Third, when predicting values for 2025, key covariates are dynamically extrapolated according to historical patterns, considering seasonal characteristics of factors such as regulatory intensity, major events and holidays, and climate.
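The forward-chaining scheme in the second stage can be sketched as follows. This is a minimal illustration under stated assumptions: months are indexed 0 to 26 (January 2023 to March 2025), the window slides forward by 3 months per fold (the step size is not specified in the text), and the function names are hypothetical.

```python
def forward_chaining_splits(n_months, train_window=12, horizon=3, step=3):
    """Yield (train_indices, test_indices); training months always precede test months."""
    start = 0
    while start + train_window + horizon <= n_months:
        train = list(range(start, start + train_window))
        test = list(range(start + train_window, start + train_window + horizon))
        yield train, test
        start += step  # assumed step; the source does not state it


def mae(actual, predicted):
    """Mean absolute error."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def mape(actual, predicted):
    """Mean absolute percentage error, in percent (actual values must be nonzero)."""
    return 100.0 * sum(abs(a - p) / abs(a) for a, p in zip(actual, predicted)) / len(actual)


# 27 months: January 2023 (index 0) through March 2025 (index 26).
splits = list(forward_chaining_splits(27))
first_train, first_test = splits[0]
```

The first fold trains on indices 0 to 11 (January–December 2023) and tests on indices 12 to 14 (January–March 2024), matching the example in the text; hyperparameters would then be adjusted based on `mae` and `mape` across folds.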
5. Conclusions and Policy Implications
5.1. Policy Implications
- (1)
- Establish a dynamic hierarchical regulatory mechanism. In accordance with relevant laws and regulations, establish a risk-driven mechanism for allocating sampling inspection resources. In terms of categories, combine historical noncompliance rates to establish a red-yellow-green three-level risk classification system. Red-level products (bee products, vegetable products, edible agricultural products), with noncompliance rates exceeding 15% and high harm levels, will have their sampling frequency increased to three times the conventional level. Yellow-level categories (meat products, roasted seeds and nuts, etc.), with noncompliance rates between 5% and 15%, will maintain the conventional sampling frequency. Green-level categories (infant food, etc.), with noncompliance rates below 5% and controllable risks, will have their sampling frequency reduced to 50% of the conventional level, thereby directing regulatory resources toward high-risk categories. In terms of the time dimension, establish an adjustment mechanism based on seasonal risk characteristics. During the April–July high-risk period, which covers the second quarter and July, high temperature and humidity increase the risks of cold chain food spoilage and agricultural product mildew, so the total sampling volume will be increased by 40%, with a focus on strengthening microbial monitoring in cold chain logistics links and rapid detection of pesticide residues in fresh agricultural products. From November to January of the following year, during the peak period of excessive additive use in processed foods, the proportion of tests for indicators such as preservatives and sweeteners will be increased to 30%, forming a closed loop for seasonal risk prevention and control. In other aspects, implement hierarchical supervision for different entities based on business scale, supply chain complexity, and historical compliance records.
For farmers’ markets, due to scattered stalls and difficulty in traceability, implement a “weekly inspection system” and establish a dynamic clearing mechanism for problem records. For chain supermarkets, which have strong self-inspection capabilities, adopt “double-random” monthly inspections (randomly selecting time periods and categories) while synchronously connecting with enterprises’ self-inspection data for cross-validation. For small food businesses, combine “Internet + supervision” technologies to achieve full quarterly coverage through mobile sampling terminals, minimizing regulatory blind spots.
- (2)
- Strengthen whole-chain risk prevention and control. First, tackle source governance. For pesticide and veterinary drug residues, collaborate with agricultural and rural departments to establish a “pesticide monitoring network for nectar plants,” and conduct quarterly soil pesticide residue surveys in major bee product-producing areas (such as Chongming and Fengxian). Meanwhile, reduce environmental pollution by promoting subsidy policies for low-toxicity biological pesticides, providing 30% cost subsidies for compliant growers. Second, upgrade process control. For cold chain logistics, mandate that imported cold chain enterprises access a temperature-controlled blockchain system; for cases of noncompliance caused by cold chain breakdowns, impose the maximum penalty in accordance with the Food Safety Law. For production hygiene, implement a “hygiene certification for reduced sampling frequency” policy for small and medium-sized food factories—enterprises certified under ISO22000 will have their sampling frequency reduced by 50%.
- (3)
- Promote technology-empowered smart supervision. First, promote the in-depth application of digital models. For example, develop a Shanghai food safety risk map based on this model to generate real-time risk heat maps, guiding grassroots offices in precise deployment. Second, promote data integration and sharing among group-buying platforms, online shopping platforms, and offline stores; connect the databases of agriculture, customs, and market supervision departments; and build a traceability chain “from farmland to port” to identify the contamination link of noncompliant products within 1 h.
- (4)
- Deepen the pattern of social co-governance. Further improve public participation mechanisms, encourage participation in “Ni Dian Wo Jian” (You Order, We Inspect), allow online voting for high-risk categories, and reward citizens who report label fraud or illegal additives. Meanwhile, strengthen corporate responsibilities by implementing “risk liability insurance,” requiring enterprises producing high-risk categories to purchase food safety insurance, and establishing a “red-black list” for food safety sampling inspections.
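The red-yellow-green allocation rule in item (1) above can be expressed as a small lookup. The sketch below assumes the thresholds and multipliers stated there (above 15%: three times the conventional frequency; 5% to 15%: unchanged; below 5%: half); the function names and the base sampling count are hypothetical.

```python
# Sampling-frequency multipliers per tier, as stated in item (1).
SAMPLING_MULTIPLIER = {"red": 3.0, "yellow": 1.0, "green": 0.5}


def risk_tier(noncompliance_rate_percent):
    """Classify a category by its historical noncompliance rate (in percent)."""
    if noncompliance_rate_percent > 15.0:
        return "red"
    if noncompliance_rate_percent >= 5.0:
        return "yellow"
    return "green"


def adjusted_sampling_count(base_count, noncompliance_rate_percent):
    """Scale a conventional sampling count by the tier's multiplier."""
    tier = risk_tier(noncompliance_rate_percent)
    return base_count * SAMPLING_MULTIPLIER[tier]


# Hypothetical example: a category with a conventional plan of 100 samples.
planned = adjusted_sampling_count(100, 20.0)
```

Treating exactly 15% as yellow is an interpretation of "exceeding 15%" for red; a regulator could draw the boundary differently.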
5.2. Discussion
- (1)
- Food risk characteristics are significant. In terms of time, the noncompliance rate increases year by year, showing the characteristics of “quarterly differentiation and monthly bimodality.” The second quarter is associated with supply chain restructuring and cold chain breakdowns; the third quarter reflects the effectiveness of cold chain infrastructure. In terms of categories, high-risk categories are concentrated in bee products (excessive pesticide residues), vegetable products (abuse of additives), and edible agricultural products (microbial contamination). Medium and low-risk categories mostly have abnormal indicators due to substandard storage conditions. In terms of pollution causes, pesticide and veterinary drug residues, abuse of additives, and microbial contamination constitute the three main causes, with category-specific characteristics.
- (2)
- Based on LDA text mining, the risk early warning model constructed using the XGBoost algorithm exhibits excellent performance. Through LDA, seven risk topics are mined (e.g., “loss of control in imported cold chains,” “pesticide residue contamination in agricultural products”) and combined with XGBoost to build a prediction model that performs well under highly imbalanced data. Its high precision, recall, and AUC values indicate that this model is significantly superior to comparison models such as Random Forest and Logistic Regression. In terms of relevant features, supply chain links, regulatory intensity, and consumption scenarios are the core predictive variables.
- (3)
- There is room for optimization in the allocation of regulatory resources. Empirical results show that the traditional “full-coverage, equalization” sampling model is inefficient, and targeted strengthening of supervision is needed for high-risk categories and high-risk periods. Meanwhile, public attention and public opinion can drive regulatory responses, indirectly improving the compliance rate.
- (4)
- Research limitations and future recommendations. Although this study has constructed a dynamic prediction and early warning model for food safety risks in megacities, it still has certain limitations. First, at the data level, the study still over-relies on sampling data, which may miss unreported data, leading the model to underestimate local risks. Second, text fields are insufficiently standardized, and semantic noise remains despite preprocessing. At the model level, XGBoost still has room for improvement; meanwhile, the current fixed number of topics in LDA is seven, making it difficult to dynamically respond to new risks (e.g., the emerging “synthetic protein” issue in March 2025). To address the above limitations, future research is suggested to attempt data bias correction, such as constructing compensation factors; dynamic topic evolution, such as introducing DTM (Dynamic Topic Model) for splitting new topics; and establishing an emergency response mechanism, such as integrating Reinforcement Learning (RL) agents to perform online model fine-tuning when SHAP values detect that the contribution of unknown factors reaches a certain level. Additionally, future work is recommended to construct a food safety risk meta-learning framework (Meta-RiskNet), respond to data heterogeneity and sudden disturbances through adaptive module switching, and promote the upgrade of the prediction paradigm from “static response” to “autonomous evolution”.
Author Contributions
Funding
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
Appendix A
Topic ID | Top 5 Feature Words | Topic Name |
---|---|---|
1 | Cold chain, Import, Temperature, Colony, Broken chain | Import cold chain out-of-control risk |
2 | Pesticide residue, Vegetable, Organic phosphorus, Source, Planting | Agricultural product pesticide pollution |
3 | Additive, Preservative, Exceeding standards, Pigment | Processed food additive abuse |
4 | You Order We Inspect, Internet celebrity, Public opinion | High-concern product risk for citizens |
5 | Label, Forgery, Place of origin, Expired | Food label fraud |
6 | Heavy metal, Cadmium, Aquatic product, Soil, Offshore | Environmental pollution-induced heavy metal excess |
7 | Disinfection, Hygiene, Workshop, Equipment, Operation | Production hygiene management deficiency |
Feature (Feature Group) | Type | Definition |
---|---|---|
Product Manufacturer Name Identifier | Original Field | LDA input material: manufacturer or agent manufacturer name |
Manufacturer Address Identifier | Original Field | LDA core corpus: manufacturer or agent manufacturer address |
Sampled Unit Name | Original Field | LDA input material: name of the sampled unit |
Province of Location | Manually Annotated | Province (or, for Shanghai, district) where the sampled unit is located |
Product Name | Original Field | LDA input material: name of the sampled product |
Specification | Original Field | Packaging types (e.g., piece-counted, weight-measured, bagged with varying weights) |
Production Date/Batch Number | Original Field | Production date or batch number of the sampled product |
Category | Original Field | 34 food categories |
Notes | Text Field | LDA core corpus: additional information (e.g., special classifications, inspection status) |
Sampling Date | Original Field | Publication date of the sampling report, used as a proxy for the sampling date, which is not disclosed but closely aligns with the report date |
Noncompliance Status | Manually Annotated | 1 if nonconforming, 0 if conforming |
Noncompliance Cause | Text Field | Detailed reasons for noncompliance (e.g., excessive additives, illegal additives, label fraud) |
Excessive Additives | Manually Annotated | 1 if additive levels exceed limits, 0 otherwise |
Excessive Microbial Count | Manually Annotated | 1 if pathogenic microbial counts exceed limits, 0 otherwise |
Major Holidays/Events in Month | Statistically Derived | 1 if major holidays/events occurred in the month, 0 otherwise |
Number of Major Holidays/Events in Quarter | Statistically Derived | Frequency of major holidays/events in the quarter |
Total Sampling Counts in Month | Statistically Derived | Total food safety sampling events conducted by authorities in the month |
Sampling Counts per Category in Month | Statistically Derived | Total sampling events for a specific category in the month |
Total Sampling Counts in Quarter | Statistically Derived | Total food safety sampling events conducted by authorities in the quarter |
Sampling Counts per Category in Quarter | Statistically Derived | Total sampling events for a specific category in the quarter |
Number of Sampled Categories in Month | Statistically Derived | Total distinct food categories sampled in the month |
Number of Sampled Categories in Quarter | Statistically Derived | Total distinct food categories sampled in the quarter |
Noncompliance Counts in Month | Statistically Derived | Total nonconforming cases across all categories in the month |
Noncompliance Counts in Quarter | Statistically Derived | Total nonconforming cases across all categories in the quarter |
Number of Nonconforming Categories in Month | Statistically Derived | Total categories with nonconforming cases in the month |
Number of Nonconforming Categories in Quarter | Statistically Derived | Total categories with nonconforming cases in the quarter |
Quarterly Noncompliance Rate | Statistically Derived | Noncompliance counts in quarter / total sampling counts in quarter |
Annual Noncompliance Rate | Statistically Derived | Noncompliance counts in year / total sampling counts in year |
Quarter | Manually Annotated | Quarter (Q1–Q4) in which the sampling date falls |
Special Classification/Citizen-Requested Inspection | Manually Annotated | LDA Prior Knowledge: Whether the item is a citizen-prioritized focus (e.g., “Citizen-Requested Inspection” project) |
Historical Noncompliance Rate for Category | Statistically Derived | Historical noncompliance rate for the same category within a time window |
Annual Noncompliance Rate for Category | Statistically Derived | Annual noncompliance rate for the same category |
Manufacturer Address in Shanghai | Manually Annotated | 1 if manufacturer address is in Shanghai, 0 otherwise |
Sales Entity: Farmers’ Market | Manually Annotated | 1 if the sales entity is a farmers’ market, 0 otherwise |
Sales Entity: Supermarket | Manually Annotated | 1 if the sales entity is a supermarket or chain supermarket, 0 otherwise |
Sales Entity: Retail Store | Manually Annotated | 1 if the sales entity is a retail store or small shop, 0 otherwise |
Packaged/Bulk | Manually Annotated | 1 if packaged, 0 if bulk retail |
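
The statistically derived fields in the table above are simple period-level aggregates, and a minimal sketch of how a few of them could be computed is given here. This is an illustration only, not the authors' pipeline: the record layout, the category names, and the helper names (`quarter_key`, `month_key`, `derive_features`) are all hypothetical.

```python
from collections import Counter
from datetime import date

# Hypothetical records: (sampling report date, food category, nonconforming flag).
# The category names are placeholders, not the 34 categories used in the study.
records = [
    (date(2024, 4, 3), "Vegetables", 1),
    (date(2024, 4, 17), "Vegetables", 0),
    (date(2024, 5, 9), "Catering Food", 0),
    (date(2024, 6, 21), "Catering Food", 1),
    (date(2024, 7, 2), "Beverages", 0),
]


def quarter_key(d):
    """Map a date to a (year, quarter) pair."""
    return (d.year, (d.month - 1) // 3 + 1)


def month_key(d):
    """Map a date to a (year, month) pair."""
    return (d.year, d.month)


def derive_features(rows):
    """Attach month- and quarter-level aggregates to every record."""
    month_total = Counter(month_key(d) for d, _, _ in rows)
    month_bad = Counter(month_key(d) for d, _, bad in rows if bad)
    quarter_total = Counter(quarter_key(d) for d, _, _ in rows)
    quarter_bad = Counter(quarter_key(d) for d, _, bad in rows if bad)
    month_cat_total = Counter((month_key(d), cat) for d, cat, _ in rows)

    features = []
    for d, cat, bad in rows:
        m, q = month_key(d), quarter_key(d)
        features.append({
            "quarter": q[1],
            "total_sampling_month": month_total[m],
            "category_sampling_month": month_cat_total[(m, cat)],
            "noncompliance_month": month_bad[m],
            "noncompliance_quarter": quarter_bad[q],
            # Noncompliance counts in quarter / total sampling counts in quarter
            "quarterly_noncompliance_rate": quarter_bad[q] / quarter_total[q],
            "noncompliant": bad,
        })
    return features


features = derive_features(records)
print(features[0])
```

In a predictive setting, aggregates that count nonconforming cases would normally be computed from earlier periods only, so that the target label does not leak into its own features; the sketch ignores this for brevity.
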
Method | Accuracy | Precision | Recall | F1 Score | AUC |
---|---|---|---|---|---|
XGBoost | 0.794 | 0.938 | 0.883 | 0.850 | 0.739 |
Random Forest | 0.739 | 0.715 | 0.817 | 0.850 | 0.777 |
Logistic Regression | 0.705 | 0.742 | 0.808 | 0.753 | 0.745 |
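
For readers who want to reproduce the five columns in the comparison table, the following is a minimal sketch of how accuracy, precision, recall, F1 score, and AUC are conventionally computed for a binary classifier. The labels and scores are toy values, the 0.5 threshold is an assumption, and the function names are hypothetical; the source does not state which implementation or averaging scheme was used.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, fp, fn, tn) for binary labels, with 1 = nonconforming."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return tp, fp, fn, tn


def roc_auc(y_true, y_score):
    """AUC as the probability that a random positive outscores a random
    negative, counting ties as one half."""
    pos = [s for t, s in zip(y_true, y_score) if t == 1]
    neg = [s for t, s in zip(y_true, y_score) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def classification_metrics(y_true, y_score, threshold=0.5):
    """Compute the five metrics from true labels and predicted scores."""
    y_pred = [1 if s >= threshold else 0 for s in y_score]
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {
        "accuracy": (tp + tn) / len(y_true),
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "auc": roc_auc(y_true, y_score),
    }


# Toy data, not taken from the study.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_score = [0.9, 0.8, 0.4, 0.6, 0.3, 0.2, 0.7, 0.1]
metrics = classification_metrics(y_true, y_score)
print(metrics)
```

Under this single-class, single-threshold definition, F1 is the harmonic mean of precision and recall; figures averaged across classes or cross-validation folds need not satisfy that identity exactly.
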
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Wu, D.; Cai, H.; Li, T. Food Safety Risk Prediction and Regulatory Policy Enlightenment Based on Machine Learning. Systems 2025, 13, 715. https://doi.org/10.3390/systems13080715