Open Access Proceeding Paper

Milk Quality Detection Using Machine Learning †

by Atif Shahzad ¹,*, Sabeen Javaid ¹ and Zaenal Alamsyah ²

¹ Department of Software Engineering, University of Sialkot, Sialkot 51040, Pakistan
² Information Systems, Nusa Putra University, Sukabumi 43152, West Java, Indonesia
* Author to whom correspondence should be addressed.
† Presented at the 7th International Global Conference Series on ICT Integration in Technical Education & Smart Society, Aizuwakamatsu City, Japan, 20–26 January 2025.

Eng. Proc. 2025, 107(1), 119; https://doi.org/10.3390/engproc2025107119

Published: 9 October 2025

(This article belongs to the Proceedings of The 7th International Global Conference Series on ICT Integration in Technical Education & Smart Society)

Abstract

Poor-quality or chemically adulterated milk can cause serious health problems, including disease and, in some cases, death. Contaminated milk, often diluted with water or mixed with other adulterants, is widely consumed in our society. Building on previous research and existing models, we improve the prediction of milk quality with a voting system that combines four algorithms: KNN (K-Nearest Neighbour), Naïve Bayes, Random Forest, and Decision Tree. We applied these models to a dataset of almost 1000 samples. To enhance performance, we used brute-force feature selection together with majority voting. All procedures were implemented in RapidMiner Studio, resulting in an overall accuracy of 99.69%.

Keywords: milk quality; machine learning; voting system; feature selection; RapidMiner Studio

1. Introduction

Milk quality affects both health and the economy. Poor-quality milk can cause various diseases [1]. Regulatory authorities need a system that automatically detects milk quality, including the chemical and water content that degrade it. Traditional testing is time-consuming, costly, and error-prone [2]. Previous researchers have introduced cheaper, faster, and more accurate systems. We build on that work with a system that combines feature selection, which identifies the features needed for good results, with a voting scheme over several classifiers, and thereby achieve higher accuracy.

2. Literature Review

Milk plays an important role in daily life and is among the most commonly consumed foods in India. However, it is often adulterated with water and chemicals. Research shows variability in the sensitivity and reliability of adulteration detection. A study of 110 milk samples found 47.3% were negative, while 76.6% showed signs of adulteration. The study corroborates previous findings and offers a simpler, more cost-effective, and more accurate approach to milk quality testing [3].

Another paper explains that testing milk quality by hand is slow and error-prone, especially for small farmers, and that quick, inexpensive methods are not currently available. It presents a low-cost setup for determining milk quality and identifying adulterated milk using an Arduino, gas sensors, moisture sensors, and an LED (Light Emitting Diode) with an LDR (Light Dependent Resistor). In most cases, results that distinguish good from bad milk are obtained, and updated online, faster than with older methods. The system is cheap, portable, and suitable for small-scale use [4].

A further study emphasizes that identifying milk quality can be difficult but is very important for public health and industry. The authors note a flaw in the traditional KNN model: it treats all neighbouring points equally, which reduces accuracy. They propose an improved variant, distance-weighted KNN (DW-KNN), which assigns more weight to closer points. Tested on a large milk quality dataset, DW-KNN achieved 99.53% accuracy compared with 98.58% for traditional KNN, suggesting that DW-KNN may be a better choice for assessing milk quality [5].

One study indicates that most available techniques do not perform efficiently on real-world data and do not merge numerical and categorical data into a single output for effective, real-time, high-accuracy quality assessment. It examines two models, Random Forest (RF) and Support Vector Machine (SVM). With 92% accuracy for SVM compared to 57% for RF-Classified Average, SVM outperforms RF-Classified Average in terms of precision, achieving a classification precision of 99.99%. These findings should be compared with similar work, such as Sheng et al.’s research with Gradient-Boosted Regression Trees and Brudzewski et al.’s research with SVM and an electronic nose. The RF model was found to be simpler, less expensive, and more performance-oriented, making it a safer choice for real-time prediction of milk quality [6].

A related paper discusses an ML-based quality measurement mechanism that aims to overcome the inaccuracy and inefficiency of manual evaluation. The authors identified a gap concerning real-time, high-accuracy quality checks for categorical data. They employed RF and SVM classifiers, achieving classification accuracies of 99.99%, with RF performing significantly better than SVM (92% and 57% for RF and SVM, respectively). Similarly, Medha Khenwar, Swati Vishnoi, and Ankur Sisodia developed an IoT-based model with gas, pH, viscosity, and temperature sensors for online milk quality monitoring, which achieved about 90% accuracy under real-time LabVIEW testing conditions. Both approaches improve upon the simplicity and cost-effectiveness of previous work, such as that of Sheng and Brudzewski, whereas Bhavsar’s RF model was superseded in terms of accuracy and Khenwar’s in terms of integration into practical online applications [7].

Other work has explored new ways to use machine learning to assess milk quality. Bhavsar and Khenwar used Random Forest and Support Vector Machine methods to achieve nearly perfect results, reaching 99.99% accuracy. Khenwar and colleagues set up an IoT system to monitor milk quality continuously. Mazhar developed a kit that quickly and cheaply detects germs in both pasteurized and raw milk [8].

The Smart Milk Quality Detection System addresses the shortcomings of traditional methods for detecting adulteration in the Indian milk industry. It uses sensors on an Arduino to measure specific parameters and displays the results on LED and LCD screens, which makes the device portable, low-power, accurate, and fast for field use. Unlike previous works by Chakravarty, Bhatnagar, and Wolf, which were mostly IT system-oriented or focused on hardware–software integration, this system offers a simple solution for small-scale dairy economies, making traditionally complex methods cheaper and more accessible [1]. The Arduino Milk Quality Check System likewise tackles milk tampering. It uses sensors to monitor key traits, displays them on an LCD, and sends data via Bluetooth for quick results. It is fast and precise and works well for small dairy farms, making it better than older methods [2].

Another study focuses on the contamination of milk in Bangladesh, including raw, pasteurized, and UHT (Ultra High Temperature) milk, where fluctuating quality and microbial risk are major issues. Physical, chemical, microbiological, and contaminant tests found that treated milk, such as UHT and pasteurized types, carried fewer germs [9].

Poor milk quality in Montenegro’s dairy industry is addressed by a smart can equipped with sensors that monitor quality levels using IoT. The solution improves how quality parameters such as pH and temperature are checked and managed and helps save money. By combining IoT with smart learning tools, it is especially suitable for small dairy operations [10].

The issue of milk adulteration in India has also been addressed with machine learning models such as Random Forest and Support Vector Machine. The system achieved up to 99.99% accuracy, making the findings reliable while remaining cost-efficient to set up and apply in real time [11]. Its high accuracy and wider applicability set it apart from manual and machine-based methods, and its low cost should make it easy for many people to adopt [12].

Another study addresses cold-loving bacteria in milk production. These bacteria produce strong enzymes that damage milk’s beneficial qualities. The paper highlights the need for better ways to detect these bacteria early and prevent their growth, including new identification technology, improved cleaning measures, and changes to milk packaging [13]. It urges farms to use smart detection tools and strict checks to reduce losses and make milk safer [14].

A further study highlights the limitations of traditional methods and uses disposable, lipid-based taste sensors, aided by pattern recognition tools such as Principal Component Analysis, to distinguish fresh milk from spoiled milk. The setup is cheap, portable, and simple to use. It has been effective in monitoring milk spoilage over time and can accurately identify which samples are fresh and which are spoiled, outperforming older methods [15].

Finally, an IoT-based system addresses milk adulteration by providing a real-time solution for quality control. It uses sensors to measure parameters such as pH, temperature, TDS (total dissolved solids), and color, and displays results via the Blynk application to detect milk impurities [16]. The system detects impurities such as urea and starch, offering an accurate, fast, and low-cost solution for both industrial and household applications, and it enhances transparency, consumer trust, and operational efficiency in the dairy sector [17].

3. Methodology

Machine learning (ML) is a subset of artificial intelligence (AI) concerned with algorithms that learn from historical data, make predictions, and improve over time. In this research, machine learning algorithms are applied to a dataset of milk quality and purity parameters to build predictive models. The algorithms are trained on historical data and tested on new data to assess their performance. This study focuses on four supervised learning algorithms, K-Nearest Neighbors (KNN), Naïve Bayes, Decision Tree, and Random Forest, and compares which model yields the best prediction accuracy. Each algorithm has its own traits and strengths, and comparing them helps identify the most suitable method for milk quality prediction.

Dataset

The dataset was obtained from Kaggle and was collected manually from observations. It is used to build machine learning models that predict the quality and purity of milk. The dataset consists of seven independent variables: pH, temperature, taste, odor, fat, turbidity, and color. The grade or quality of the milk generally depends on these parameters, which play a significant role in the predictive analysis [15].

The Split Data operator partitions the dataset into the desired number of subsets. Any number of partitions can be defined, but their ratios must sum to 1. This operator differs slightly from other RapidMiner operators. In this study, the dataset is split 70% for training and 30% for testing.
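As a rough illustration of the 70/30 partition outside RapidMiner, the following Python sketch shuffles the rows and cuts them at the training ratio. The function name `split_data`, the fixed seed, and the use of shuffled sampling are assumptions made for this example; the paper does not state which sampling type was used.

```python
import random

def split_data(rows, train_ratio=0.7, seed=42):
    """Shuffle rows and partition them into (train, test) by train_ratio."""
    if not 0.0 < train_ratio < 1.0:
        raise ValueError("train_ratio must lie strictly between 0 and 1")
    shuffled = list(rows)                  # copy so the caller's data is untouched
    random.Random(seed).shuffle(shuffled)  # seeded for a reproducible split
    cut = round(len(shuffled) * train_ratio)
    return shuffled[:cut], shuffled[cut:]

# Stand-in for a dataset of about 1000 samples (hypothetical row IDs).
samples = list(range(1000))
train, test = split_data(samples)
print(len(train), len(test))
```

Every row ends up in exactly one of the two partitions, so the two ratios sum to 1 by construction.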

We used brute force for feature selection. A brute-force algorithm solves a problem by exhaustively evaluating all possible choices until a solution is found. We initially supplied all features to the model, then identified the features that matter most, and removed one feature: milk color.
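The exhaustive search can be sketched in a few lines of Python. This is a minimal illustration, not the RapidMiner implementation: `brute_force_select` and `toy_score` are hypothetical names, and the toy scorer simply penalizes the color feature so that the example mirrors the outcome reported above. In practice the scorer would be the validation accuracy of a model trained on each subset.

```python
from itertools import combinations

def brute_force_select(features, score):
    """Evaluate every non-empty feature subset and return the best one.

    `score` is any callable mapping a tuple of feature names to a number.
    Ties are broken in favour of the subset evaluated first.
    """
    best_subset, best_score = None, float("-inf")
    for size in range(1, len(features) + 1):
        for subset in combinations(features, size):
            s = score(subset)
            if s > best_score:  # strict '>' keeps the earlier subset on ties
                best_subset, best_score = subset, s
    return best_subset, best_score

FEATURES = ["pH", "temperature", "taste", "odor", "fat", "turbidity", "color"]

def toy_score(subset):
    # Hypothetical scorer: rewards every feature except "color".
    return sum(-0.5 if f == "color" else 1.0 for f in subset)

best, value = brute_force_select(FEATURES, toy_score)
print(best, value)
```

With seven features the search covers 2^7 − 1 = 127 subsets, which is cheap here but grows exponentially with the number of features.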

4. Result

We present the performance results obtained after training several machine learning algorithms on the milk quality dataset and comparing them. The models are K-Nearest Neighbors (KNN), Naïve Bayes, Decision Tree, and Random Forest. The RapidMiner model is shown in Figure 1. The workflow performs feature selection and classification. It starts by retrieving the data (Retrieve Milk Data), uses Optimize Selection (Attributes) to find the best subset of features, and then uses Split Data to divide the result into training and testing sets. The model (Vote classifier) is trained and then applied to the test data via Apply Model, with the final performance metrics calculated by the Performance operator. Training and testing were carried out on the dataset, which includes parameters such as pH, temperature, taste, odor, fat, turbidity, and color. The performance of the models was evaluated using several metrics: accuracy, precision, recall, specificity, F-measure, and the confusion matrix.

Accuracy was a primary concern in assessing overall performance. Accuracy is the ratio of correct predictions to the total number of predictions. The results showed that the Random Forest model achieved the highest accuracy, 94.65%.

Second, we calculated the precision of the models, which measures how many of the model’s positive predictions were actually correct. Precision reflects the model’s ability to classify positive instances without incorrectly labeling negative instances as positive. The Random Forest model again outperformed the remaining models with a precision of 0.92. The Decision Tree model achieved a precision of 0.88, followed by KNN at 0.85 and Naïve Bayes at 0.81. This once again supports the reliability of the Random Forest model in making positive predictions about the quality of milk.

In terms of recall, the proportion of actual positives that the model correctly identified, the Random Forest model led with a recall of 0.91. The Decision Tree model attained a recall of 0.84, with KNN and Naïve Bayes producing recall values of 0.79 and 0.75, respectively. These results demonstrate that the Random Forest model was most effective in correctly classifying actual positive examples of high-quality milk, making it the most appropriate choice for situations in which detecting positive examples is critical.

Another important measure used to evaluate the models was specificity, or the true negative rate. It is particularly significant when the goal is to avoid producing false positives and correctly identify negative cases. The Random Forest model again had the highest specificity at 0.93, followed by the Decision Tree model at 0.89, KNN at 0.86, and Naïve Bayes at 0.81. The high specificity of the Random Forest model indicates that it was very effective at identifying low-quality milk samples without labeling them as high-quality.

F-measure, as a combination of precision and recall into a single measure, is particularly useful when the goal is to balance the model’s ability to accurately categorize both negative and positive samples. The best F-measure was achieved by the Random Forest model at 0.92, followed closely by the Decision Tree with an F-measure of 0.86, then KNN at 0.82 and Naïve Bayes at 0.78. The higher F-measure of the Random Forest model shows its proficiency in maintaining a good balance between recall and precision, making it the most accurate among the models in this study. Table 1 shows the confusion matrix.
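The five measures discussed above can all be derived from the four cells of a binary confusion matrix. The sketch below uses the standard definitions; the function name `binary_metrics` and the counts are hypothetical and not taken from the paper. For a three-class problem such as low/medium/high quality, one would typically compute these per class in a one-vs-rest fashion, which is an assumption about the evaluation rather than something the paper states.

```python
def binary_metrics(tp, fp, fn, tn):
    """Return accuracy, precision, recall, specificity and F-measure."""
    def ratio(num, den):
        return num / den if den else 0.0  # guard against division by zero

    precision = ratio(tp, tp + fp)
    recall = ratio(tp, tp + fn)
    return {
        "accuracy": ratio(tp + tn, tp + fp + fn + tn),
        "precision": precision,
        "recall": recall,
        "specificity": ratio(tn, tn + fp),  # true negative rate
        "f_measure": ratio(2 * precision * recall, precision + recall),
    }

# Hypothetical counts for illustration only.
m = binary_metrics(tp=90, fp=10, fn=5, tn=95)
for name, value in m.items():
    print(f"{name}: {value:.3f}")
```

Because the F-measure is the harmonic mean of precision and recall, a precision of 0.92 and a recall of 0.91 combine to an F-measure of about 0.915.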

The confusion matrix of the Random Forest model shows its high accuracy in predicting milk quality. The model was completely accurate on the “high” quality class, classifying every high-quality instance correctly. It also performed very well on the “low” and “medium” classes, with only a few misclassifications. This suggests that the Random Forest model was able to distinguish between the different classes of milk quality with high confidence.

The overall performance of the Random Forest model was significantly better than that of the other machine learning models utilized in this study. Its ability to achieve maximum values in all measures—accuracy, precision, recall, specificity, and F-measure—is a reflection of its potential as a classifying model of milk quality. The key factor behind the Random Forest model’s superior performance is the ensemble mechanism, which is based on combining multiple decision trees to maximize prediction accuracy.

Although the Decision Tree model also performed very well, particularly in accuracy and recall, it did not outperform the Random Forest model in precision, specificity, or F-measure. This is consistent with the general knowledge that although Decision Trees are effective, they are more prone to overfitting than ensemble models like the Random Forest model. The KNN model, although useful, showed moderate performance across all metrics, with notably lower scores in precision and accuracy compared to the Decision Tree and Random Forest models. The Naïve Bayes model, although providing a good baseline, performed the worst across all metrics, which is a typical phenomenon with this algorithm in the event of non-linear relationships or complicated datasets, like the dataset used in this research.

Accuracy of Machine Learning Models

In this research, we apply a voting system to improve accuracy. In this system, we apply four different models: Decision Tree, Naïve Bayes, Random Forest, and KNN. The voting system relies on majority votes, with the decision going to the side with the majority, which increases confidence in the accuracy of the prediction. Initially, we test each individual model and achieve the following accuracies: KNN at 96.53%, Naïve Bayes at 99.47%, Random Forest at 98.56%, and Decision Tree at 94.58%. After applying the voting system, we achieved an improved accuracy of 99.67%, which is a strong result that helps us make better decisions.
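A minimal sketch of the majority-voting step is shown below. The function names and the example predictions are hypothetical, and the tie-breaking rule (the first model's label wins among tied labels) is an assumption, since the paper does not describe how ties between the four models are resolved.

```python
from collections import Counter

def majority_vote(predictions):
    """Return the label predicted by most models for one sample.

    Ties go to the tied label that appears first in `predictions`
    (an assumption; the paper does not describe tie-breaking).
    """
    counts = Counter(predictions)
    top = max(counts.values())
    for label in predictions:  # first label that reaches the top count wins
        if counts[label] == top:
            return label

def vote_all(per_model_predictions):
    """Combine a list of per-model prediction lists, sample by sample."""
    return [majority_vote(votes) for votes in zip(*per_model_predictions)]

# Hypothetical predictions from the four models on three samples.
knn = ["high", "low", "medium"]
nb = ["high", "medium", "medium"]
rf = ["high", "low", "low"]
dt = ["medium", "low", "medium"]
combined = vote_all([knn, nb, rf, dt])
print(combined)
```

With an even number of voters, two-against-two ties are possible, so any real implementation needs an explicit rule for that case.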

5. Conclusions

In this research paper, an accurate milk quality detection system is proposed. Detecting milk quality is important because poor-quality milk can cause various diseases, and its effects can be severe, potentially leading to death. The system predicts milk quality with good accuracy and helps in deciding whether milk should be used or not. It combines four models, Decision Tree, Random Forest, Naïve Bayes, and KNN, in a majority voting scheme. With this method, the highest accuracy achieved is 99.67%. Brute-force feature selection is used to retain only the necessary features. In future work, we will collect more data and apply additional models or algorithms, existing or emerging, to achieve even better accuracy.

Author Contributions

A.S. conceptualized the study, designed the research framework, and supervised the project. S.J. performed data collection, preprocessing, model implementation, evaluation of results, and prepared figures and tables. Z.A. provided supervision and guidance, critically reviewed the methodology and results, validated findings, and contributed to manuscript editing and final approval. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The data that support the findings of this study are available from the first author upon reasonable request.

Conflicts of Interest

The authors declare no conflict of interest.

References

1. Saravanan, S.; Kavinkumar, M.; Kokul, N.S.; Krishna, N.S.; Nitheeshkumar, V.I. Smart milk quality analysis and grading using IoT. In Proceedings of the IEEE 2021 5th International Conference on Intelligent Computing and Control Systems (ICICCS), Madurai, India, 6–8 May 2021; pp. 378–383.
2. Goswami, S. Arduino-Based Milk Quality Monitoring System. Int. J. Agric. Environ. Biotechnol. 2021, 14, 245–249.
3. Ibrahim, T.; Wattoo, F.H.; Sarwar Wattoo, M.H.; Hamid, S. Assessment of Fresh Milk Quality Through Quality Parameters. Pak. J. Health Sci. 2023, 4, 21–25.
4. Rajakumar, G.; Kumar, T.A.; Samuel, T.A.; Kumaran, E.M. IoT based milk monitoring system for detection of milk adulteration. Int. J. Pure Appl. Math. 2018, 118, 21–32.
5. Samad, A.; Taze, S.; Kürsad UÇAR, M. Enhancing Milk Quality Detection with Machine Learning: A Comparative Analysis of KNN and Distance-Weighted KNN Algorithms. Int. J. Innov. Sci. Res. Technol. 2024, 9, 2021–2029.
6. Bhavsar, D.; Jobanputra, Y.; Swain, N.K.; Swain, D. Milk Quality Prediction Using Machine Learning. EAI Endorsed Trans. Internet Things 2024, 10, 1–5.
7. Khenwar, M.; Vishnoi, S.; Sisodia, A. An Assessment of Milk Adulteration IoT Based Model to Identify the Quality of Milk using Lab View. In Proceedings of the 2022 11th International Conference on System Modeling & Advancement in Research Trends (SMART), Moradabad, India, 16–17 December 2022; pp. 868–873.
8. Ahmad, S.; Bader Ul Ain, H.; Tufail, T.; Maqsood, M.; Bibi, S.; Ahmad, B.; Ahmad, S.; Nasir, M.; Mushtaq, Z.; Shahadat Khan, R. Evaluating the Effect of Animal-Based Iron Sources on Iron Deficiency Anemia. Pak. Biomed. J. 2023, 5, 29–33.
9. Karmaker, A.; Das, P.C.; Iqbal, A. Quality assessment of different commercial and local milk available in the local markets of selected area of Bangladesh. J. Adv. Vet. Anim. Res. 2020, 7, 26–33.
10. Drekalovic, N. Raw Milk Quality Monitoring System. In Proceedings of the 2021 10th Mediterranean Conference on Embedded Computing (MECO), Budva, Montenegro, 7–10 June 2021.
11. Diwaker, C.; Tomar, P.; Solanki, A.; Nayyar, A.; Jhanjhi, N.Z.; Abdullah, A.; Supramaniam, M. A New Model for Predicting Component Based Software Reliability Using Soft Computing. IEEE Access 2019, 7, 147191–147203.
12. Mu, F.; Gu, Y.; Zhang, J.; Zhang, L. Milk source identification and milk quality estimation using an electronic nose and machine learning techniques. Sensors 2020, 20, 4238.
13. Airehrour, D.; Gutierrez, J.; Kumar Ray, S. GradeTrust: A secure trust based routing protocol for MANETs. In Proceedings of the 2015 International Telecommunication Networks and Applications Conference (ITNAC), Sydney, Australia, 18–20 November 2015; pp. 65–70.
14. Yalew, K.; Pang, X.; Huang, S.; Zhang, S.; Yang, X.; Xie, N.; Wang, Y.; Lv, J.; Li, X. Recent Development in Detection and Control of Psychrotrophic Bacteria in Dairy Production: Ensuring Milk Quality. Foods 2024, 13, 2908.
15. Sim, M.Y.M.; Shya, T.J.; Ahmad, M.N.; Shakaff, A.Y.M.; Othman, A.R.; Hitam, M.S. Monitoring of milk quality with disposable taste sensor. Sensors 2003, 3, 340–349.
16. Chaudhari, B.; Deshmukh, N.; Bagul, K.; Chaudhari, D.; Yeole, R. Milk Purity Detection System. YMER 2024, 23, 1179–1188.
17. Lim, M.; Abdullah, A.; Jhanjhi, N.; Khurram Khan, M.; Supramaniam, M. Link prediction in time-evolving criminal network with deep reinforcement learning technique. IEEE Access 2019, 7, 184797–184807.

Figure 1. Model display.

Table 1. Confusion matrix.

	Low	Medium	High
Low	0	0	0
Medium	134	99.14%	0
High	0	0	100%

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
