1. Introduction
The increasing global demand for high-quality agricultural produce has necessitated the adoption of innovative methods to ensure product integrity, reduce waste, and optimize supply chains. Mangoes, recognized as the “king of fruits,” hold a prominent position in the global fruit market due to their rich flavor, nutritional benefits, and economic value. However, maintaining mango quality throughout the supply chain remains a significant challenge, primarily because of their perishable nature and sensitivity to environmental conditions. Traditional methods for assessing mango quality rely heavily on destructive testing techniques, including physical slicing, chemical analysis, and visual inspection. These methods, while accurate, are not scalable for large-scale operations and often lead to the significant wastage of produce. Additionally, manual inspections are prone to human error and subjectivity, resulting in inconsistencies in quality grading. This necessitates the development of non-destructive, objective, and automated systems for fruit quality assessment. Combining machine learning (ML) with sensor technology has become a viable way to tackle these issues in recent years. Nondestructive sensors, such as electronic noses (e-nose) for detecting sweetness, pH meters for measuring acidity, and color sensors for analyzing visual attributes, provide a wealth of data that can be leveraged to predict fruit quality accurately. Machine learning models have demonstrated their ability to process this data, uncover hidden patterns, and classify fruits based on quality attributes without causing physical damage. This research focuses on utilizing machine learning approaches to predict mango quality using sensor-based data. 
A dataset containing features such as sweetness, acidity, ripeness, and juiciness was used to train and evaluate four machine learning models: Decision Tree, K-Nearest Neighbors (KNN), Naive Bayes, and an Automated Multi-Layer Perceptron (AutoMLP). Among these models, AutoMLP achieved the highest accuracy, 98.46%, which suggests that it handles intricate, non-linear relationships in the data well.
2. Literature Review
The quality assessment of fruits, particularly mangoes, has traditionally relied on destructive testing methods, including chemical analysis and physical sampling. However, these methods are labor-intensive, time-consuming, and unsuitable for large-scale applications. This has led to a growing interest in non-destructive techniques, which aim to preserve the fruit’s integrity while providing accurate quality measurements. Ref. [1] overcame the drawbacks of conventional HPLC techniques by using DRIFT-FTIR spectroscopy to predict fructose and glucose concentrations in Kenyan mango cultivars. The study demonstrated significant predictive performance for these sugars and observed increasing sugar content in fruits exposed to the sun; however, the predictions for sucrose and maltose were less accurate.
In 2023, Paiva-Peredo and associates used near-infrared spectroscopy to evaluate the dry matter (DM) of mangoes using conventional methods. Their best model, PLSR with MSC pre-processing, achieved an RMSE of 1.6142% DM [2]. In 2022, Capela and colleagues addressed the shortcomings of current molecular theories by using deep learning to predict sweetness from the chemical structures of compounds. Using the largest library of bitter and sweet substances, they created deep learning models that improved accuracy and scalability and identified 67,724 possible sweeteners from PubChem [3]. Srisungsittisunti [4] used near-infrared spectroscopy to predict Brix values in mangoes, improving accuracy through forward feature selection and ensemble models. The study used 120-day data for training and highlighted the advantages of ripe-fruit data. The 2013 study by Pornprasit and Natwichai showed that reliable and scalable models are required when using near-infrared spectroscopy to predict mango fruit quality. They proposed an ensemble classification method that uses weighted sub-classifiers to achieve better scalability and accuracy in dynamic agricultural settings [4].
Ref. [5] developed a hybrid system for classifying mango ripeness that combines image processing and odor sensing. The system achieved 94.69% accuracy, outperforming standalone techniques and previous methods and significantly improving reliability and scalability for diverse agricultural applications. Ref. [6] developed a non-destructive method for detecting total soluble solids (TSS) in mangoes using machine learning and reflectance spectroscopy. They compared transformations and preprocessing techniques to determine which model performed better than PCR and to demonstrate the superiority of specific preprocessing techniques. Ref. [7] used near-infrared spectroscopy and regression modeling to assess mango quality. With the lowest error rates and the highest R² values, MLPR was found to be more accurate at predicting pH and TSS, supporting its dependability for accurate, long-term assessments of agricultural quality. Multiple studies highlight the effectiveness of non-destructive approaches such as NIR spectroscopy and machine learning for assessing mango quality parameters, including TSS, pH, and ripeness. Handheld spectrometers coupled with advanced algorithms have shown promising results for rapid and accurate evaluation [8]. Machine learning models such as SVM, Random Forest, and FANN have been increasingly adopted for mango grading. These methods provide significant accuracy improvements over traditional classification, addressing inconsistencies and inefficiencies [9]. Technologies such as low-cost multispectral sensors and portable spectrometers are paving the way for practical applications in agriculture, offering capable, low-cost solutions for grading and sorting fruit by quality [10].
3. Methodology
The proposed methodology for this research study entails using machine learning (ML) classifiers inside an integrated framework to detect mango quality.
3.1. Models
3.1.1. K-Nearest Neighbors
K-Nearest Neighbors (KNN) is a simple instance-based learning method for classification and regression problems. It predicts the class or value of a new data point from the majority class or average value of its closest neighbors. The “K” in KNN indicates the number of neighbors to consider. To generate a prediction, KNN measures the distance between the features of the new sample and those of the existing samples, usually using Euclidean distance, and assigns the new sample to the group it most closely resembles.
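The voting procedure described above can be illustrated with a minimal sketch. The function name `knn_predict`, the two-feature samples, and the grade labels are hypothetical and chosen only for illustration; they are not taken from the study’s dataset.

```python
import math
from collections import Counter

def knn_predict(train_X, train_y, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    # Euclidean distance from the query to every training sample.
    distances = [(math.dist(x, query), label) for x, label in zip(train_X, train_y)]
    # Keep the k closest samples and count their labels.
    nearest = sorted(distances, key=lambda pair: pair[0])[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Hypothetical normalized (sweetness, acidity) readings with made-up grades.
train_X = [(0.9, 0.2), (0.8, 0.3), (0.85, 0.25), (0.2, 0.9), (0.3, 0.8), (0.25, 0.7)]
train_y = ["A", "A", "A", "C", "C", "C"]
predicted = knn_predict(train_X, train_y, (0.75, 0.35), k=3)
```

In this toy example the three nearest neighbors of the query all carry the label "A", so the vote is unanimous. With real data, the choice of K and the feature scaling both affect the result.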
3.1.2. Naïve Bayes
Naive Bayes is a machine learning technique that applies Bayes’ theorem under the “naive” assumption of feature independence. It is frequently used for classification, especially with high-dimensional data. The technique computes the probability that a data point belongs to each class from the likelihood of its features given that class, assuming that the features are conditionally independent given the class.
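As an illustration of this computation, the following sketch implements a Gaussian variant of Naive Bayes for numerical features. The Gaussian likelihood is an assumption made here for the example; the text does not state which likelihood model was used. All function names and sample values are hypothetical.

```python
import math
from collections import defaultdict

def fit_gaussian_nb(X, y):
    """Estimate a prior and per-feature mean/variance for every class."""
    by_class = defaultdict(list)
    for row, label in zip(X, y):
        by_class[label].append(row)
    model = {}
    for label, rows in by_class.items():
        columns = list(zip(*rows))
        means = [sum(col) / len(col) for col in columns]
        # A small constant keeps every variance strictly positive.
        variances = [sum((v - m) ** 2 for v in col) / len(col) + 1e-9
                     for col, m in zip(columns, means)]
        model[label] = (len(rows) / len(X), means, variances)
    return model

def predict_gaussian_nb(model, row):
    """Return the class with the highest log posterior (up to a constant)."""
    def log_posterior(params):
        prior, means, variances = params
        score = math.log(prior)
        # Independence assumption: per-feature log-likelihoods simply add up.
        for v, m, var in zip(row, means, variances):
            score += -0.5 * math.log(2 * math.pi * var) - (v - m) ** 2 / (2 * var)
        return score
    return max(model, key=lambda label: log_posterior(model[label]))

# Hypothetical normalized (sweetness, acidity) readings with made-up grades.
X = [(0.9, 0.2), (0.8, 0.3), (0.85, 0.25), (0.2, 0.9), (0.3, 0.8), (0.25, 0.7)]
y = ["A", "A", "A", "C", "C", "C"]
nb_model = fit_gaussian_nb(X, y)
```

Working in log space avoids numerical underflow when many small likelihoods are multiplied together.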
3.1.3. Decision Tree
The decision tree is a supervised machine learning technique for classification and regression. Each internal node represents a test on an attribute, each branch represents an outcome of that test, and each leaf node holds a class label or numerical value. Because this tree structure is straightforward and easy to visualize, decision trees are helpful for understanding decision-making processes. When a tree is built with an entropy-based criterion, entropy is calculated first by examining the distribution of class labels to determine the dataset’s level of uncertainty.
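The entropy calculation mentioned above can be sketched as follows. The helper `information_gain` is added here as an assumption about how entropy would typically be used to compare candidate splits; the labels are made up for illustration.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())

def information_gain(labels, groups):
    """Entropy reduction obtained by splitting `labels` into `groups`."""
    total = len(labels)
    weighted = sum(len(g) / total * entropy(g) for g in groups)
    return entropy(labels) - weighted

# Hypothetical labels: an evenly mixed node and a split that separates it perfectly.
mixed = ["A", "A", "B", "B"]
gain = information_gain(mixed, [["A", "A"], ["B", "B"]])
```

An evenly mixed two-class node has an entropy of one bit, and a split that produces pure child nodes recovers all of it as information gain.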
3.1.4. AutoMLP
The term “AutoMLP” describes a method for automatically creating and training Multi-Layer Perceptrons (MLPs) for machine learning applications. It combines the strength of MLP architectures with automated procedures that maximize performance, reducing manual involvement in model configuration and hyperparameter tuning.
3.1.5. Neural Network
A neural network is a computational model inspired by the structure and function of the human brain. It is made up of interconnected layers of nodes (neurons) that process input and extract patterns in order to solve problems such as classification, regression, and clustering. Neural networks are a key element of deep learning, giving computers the ability to recognize intricate patterns in data.
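The layered computation described above can be illustrated with a minimal forward pass. The network shape, weights, and activation functions below are hypothetical and hand-picked for the example; they do not correspond to any model trained in this study.

```python
import math

def relu(z):
    return max(0.0, z)

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def dense(inputs, weights, biases, activation):
    """One fully connected layer: activation(w . x + b) for each neuron."""
    return [activation(sum(w * x for w, x in zip(row, inputs)) + b)
            for row, b in zip(weights, biases)]

def forward(inputs, layers):
    """Propagate `inputs` through a list of (weights, biases, activation) layers."""
    for weights, biases, activation in layers:
        inputs = dense(inputs, weights, biases, activation)
    return inputs

# Hypothetical network: 2 inputs, 2 ReLU hidden units, 1 sigmoid output.
layers = [
    ([[1.0, -1.0], [-1.0, 1.0]], [0.0, 0.0], relu),
    ([[1.0, 1.0]], [-0.5], sigmoid),
]
output = forward([0.8, 0.3], layers)
```

Each layer transforms the output of the previous one, which is how stacked neurons build up progressively more complex patterns from the raw features.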
4. Framework
A machine learning framework is an interface that enables developers to design and deploy machine learning models more rapidly and easily. It usually consists of a sequence of processes that starts with data collection and preprocessing, continues through model selection and training, and concludes with model evaluation and deployment. The framework for this study involves several key steps for developing a mango quality prediction system using machine learning models. The first step was dataset preparation, where mango samples were collected and categorized as Fresh, Ripe, or Rotten based on expert evaluations. Sensor data, including e-nose readings for aroma and pH readings for acidity, along with physical attributes such as color, weight, and size, were recorded. The dataset was then preprocessed by cleaning noisy data, normalizing numerical features, and creating derived features such as a sweetness index and ripeness score. It was split into training (80%) and testing (20%) subsets while keeping class representation balanced. Four machine learning models were applied. Decision Tree provided interpretable results, Naive Bayes was efficient for mixed data, and KNN captured local relationships. AutoMLP, a deep learning model with automated hyperparameter tuning, achieved the highest accuracy of 98.46%, outperforming the other models by effectively capturing complex patterns in the data. The models were assessed using metrics such as accuracy, precision, recall, and F1-score. AutoMLP emerged as the best-performing model and was deployed as the core of a real-time mango quality prediction system. This system uses sensor data and physical attributes as input to classify mangoes non-destructively, making it suitable for agricultural and retail applications.
5. Dataset Description
The mango dataset used in this study is a structured and comprehensive collection designed to predict mango quality accurately and non-destructively. It contains 8000 samples, each characterized by seven numerical features and one categorical label. The dataset includes the physical and chemical attributes critical for determining mango quality. Key features include MangoSize, MangoWeight and MangoSoftness, which provide insights into the physical properties of the fruit. MangoSweetness and MangoAcidity capture the fruit’s chemical composition, offering a direct correlation with taste and freshness. MangoHarvestTime and MangoRipeness are additional indicators that assess the fruit’s maturity and readiness for consumption. The target variable MangoQuality categorizes the mangoes into quality grades such as A-Grade, serving as the benchmark for model predictions. The dataset is well-prepared and has no missing values, ensuring consistency and reliability for machine learning applications. The numerical features are normalized, which enhances the performance of the predictive models by ensuring that all features contribute proportionately to the output. This dataset is an ideal resource for training and evaluating machine learning algorithms due to its rich combination of physical, chemical, and categorical data. It supports the development of a robust, non-destructive mango quality prediction system that can be applied in agricultural and retail settings for efficient quality control.
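The normalization of numerical features can be sketched as below. Min-max scaling to the range [0, 1] is assumed here purely for illustration, since the text does not state which normalization method was applied; the raw values are hypothetical.

```python
def min_max_normalize(rows):
    """Rescale each column of `rows` to the [0, 1] range."""
    columns = list(zip(*rows))
    lows = [min(col) for col in columns]
    highs = [max(col) for col in columns]
    return [
        # A constant column has no spread, so it is mapped to 0.0.
        [(v - lo) / (hi - lo) if hi > lo else 0.0
         for v, lo, hi in zip(row, lows, highs)]
        for row in rows
    ]

# Hypothetical raw (weight in grams, pH) readings.
raw = [[200.0, 3.5], [300.0, 4.5], [400.0, 5.5]]
scaled = min_max_normalize(raw)
```

Scaling matters most for distance-based models such as KNN, where a feature with a large numeric range would otherwise dominate the distance.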
5.1. Retrieve Mango Data
This is the initial step where the mango dataset is imported into the system. The dataset contains key features such as size, weight, sweetness, softness, acidity, ripeness, and quality labels. The input data is loaded for preprocessing and model training.
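A minimal sketch of this loading step is given below, assuming the dataset is stored as a CSV file whose columns carry the feature names listed in the dataset description. The file layout, the sample rows, and the "B-Grade" label are hypothetical; an in-memory string stands in for the real file.

```python
import csv
import io

FEATURES = ["MangoSize", "MangoWeight", "MangoSoftness", "MangoSweetness",
            "MangoAcidity", "MangoHarvestTime", "MangoRipeness"]
TARGET = "MangoQuality"

def load_mango_rows(handle):
    """Parse CSV text into (feature_vector, label) pairs."""
    samples = []
    for record in csv.DictReader(handle):
        features = [float(record[name]) for name in FEATURES]
        samples.append((features, record[TARGET]))
    return samples

# Hypothetical two-row file standing in for the real dataset.
sample_csv = io.StringIO(
    "MangoSize,MangoWeight,MangoSoftness,MangoSweetness,"
    "MangoAcidity,MangoHarvestTime,MangoRipeness,MangoQuality\n"
    "0.4,0.6,0.3,0.8,0.2,0.5,0.7,A-Grade\n"
    "0.7,0.5,0.9,0.3,0.6,0.9,0.95,B-Grade\n"
)
loaded = load_mango_rows(sample_csv)
```

For a file on disk, the same function can be called with a handle obtained from `open(path, newline="")`.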
5.2. Replace Missing Values
Any incomplete or missing values in the dataset are handled in this stage. Missing entries, which might impair the performance of machine learning models, are replaced with suitable values (such as the column’s mean, median, or mode), which protects data integrity.
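The replacement of missing entries can be sketched as follows, using the column mean as the substitute value. Representing a missing entry as `None` and the function name `impute_with_mean` are assumptions made for this example.

```python
def impute_with_mean(rows):
    """Replace None entries with the mean of the observed values in that column."""
    columns = list(zip(*rows))
    means = []
    for col in columns:
        observed = [v for v in col if v is not None]
        # Fall back to 0.0 if a column has no observed values at all.
        means.append(sum(observed) / len(observed) if observed else 0.0)
    return [[m if v is None else v for v, m in zip(row, means)] for row in rows]

# Hypothetical feature rows with two missing readings.
incomplete = [[1.0, 4.0], [None, 6.0], [3.0, None]]
filled = impute_with_mean(incomplete)
```

The median or mode could be substituted in the same way; the median is often preferred when a column contains outliers.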
5.3. Split Data
The dataset is divided into training and testing subsets. The split is usually carried out in an 80–20 or 70–30 ratio, which leaves sufficient data for model training while setting aside some for assessing the model’s performance on unseen data.
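An 80–20 split can be sketched as shown below. The split is stratified by class label here so that each class keeps roughly the same proportion in both subsets; this stratification, the fixed random seed, and the toy samples are assumptions made for illustration.

```python
import random
from collections import defaultdict

def stratified_split(samples, test_fraction=0.2, seed=42):
    """Split (features, label) pairs so each class keeps roughly the same ratio."""
    by_label = defaultdict(list)
    for sample in samples:
        by_label[sample[1]].append(sample)
    rng = random.Random(seed)
    train, test = [], []
    for group in by_label.values():
        # Shuffle within each class, then carve off the test share.
        rng.shuffle(group)
        n_test = round(len(group) * test_fraction)
        test.extend(group[:n_test])
        train.extend(group[n_test:])
    return train, test

# Hypothetical dataset: ten samples for each of two classes.
toy_samples = [([i], "A") for i in range(10)] + [([i], "B") for i in range(10, 20)]
train_set, test_set = stratified_split(toy_samples)
```

Passing `test_fraction=0.3` yields the 70–30 variant.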
5.4. Training
This is the main model training step. AutoMLP (Automated Multi-Layer Perceptron) is a neural network-based method that automatically chooses the architecture and hyperparameters, including the number of layers, neurons, activation functions, and learning rate. AutoMLP learns patterns and correlations between the input features and the target labels by using the training data from the previous stage.
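The idea of automated configuration can be illustrated with a deliberately small sketch. It is not the AutoMLP implementation used in this study: it assumes a plain grid search over the hidden-layer size and learning rate of a one-hidden-layer network, a binary target, and a held-out validation set, and all names and data values are hypothetical.

```python
import math
import random

def sigmoid(z):
    # Clamp the input so math.exp cannot overflow.
    z = max(-30.0, min(30.0, z))
    return 1.0 / (1.0 + math.exp(-z))

def train_mlp(data, hidden, lr, epochs=200, seed=0):
    """Train a one-hidden-layer MLP (tanh hidden units, sigmoid output) with SGD."""
    rng = random.Random(seed)
    n_in = len(data[0][0])
    w1 = [[rng.uniform(-1, 1) for _ in range(n_in)] for _ in range(hidden)]
    b1 = [0.0] * hidden
    w2 = [rng.uniform(-1, 1) for _ in range(hidden)]
    b2 = 0.0

    def hidden_layer(x):
        return [math.tanh(sum(w * v for w, v in zip(row, x)) + b)
                for row, b in zip(w1, b1)]

    for _ in range(epochs):
        for x, target in data:
            h = hidden_layer(x)
            out = sigmoid(sum(w * v for w, v in zip(w2, h)) + b2)
            delta_out = out - target  # cross-entropy gradient at the output unit
            for j in range(hidden):
                # Backpropagate through tanh before updating the output weight.
                delta_h = delta_out * w2[j] * (1.0 - h[j] ** 2)
                w2[j] -= lr * delta_out * h[j]
                b1[j] -= lr * delta_h
                for i in range(n_in):
                    w1[j][i] -= lr * delta_h * x[i]
            b2 -= lr * delta_out

    def predict(x):
        z = sum(w * v for w, v in zip(w2, hidden_layer(x))) + b2
        return 1 if sigmoid(z) >= 0.5 else 0
    return predict

def accuracy(predict, data):
    return sum(predict(x) == y for x, y in data) / len(data)

def auto_mlp_search(train, valid, hidden_sizes=(2, 4, 8), learning_rates=(0.05, 0.5)):
    """Try every candidate configuration and keep the best on validation data."""
    best = None
    for hidden in hidden_sizes:
        for lr in learning_rates:
            score = accuracy(train_mlp(train, hidden, lr), valid)
            if best is None or score > best[0]:
                best = (score, {"hidden": hidden, "lr": lr})
    return best

# Hypothetical (sweetness, acidity) samples with a made-up binary quality label.
mlp_train = [([0.9, 0.2], 1), ([0.8, 0.3], 1), ([0.85, 0.25], 1),
             ([0.2, 0.9], 0), ([0.3, 0.8], 0), ([0.25, 0.7], 0)]
mlp_valid = [([0.75, 0.35], 1), ([0.3, 0.75], 0)]
best_score, best_config = auto_mlp_search(mlp_train, mlp_valid)
```

Selecting the configuration on a validation set rather than on the test set keeps the final test accuracy an honest estimate of performance on unseen data.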
5.5. Apply Model
After it has been trained, the AutoMLP model is applied to the testing data. This stage uses the trained model to predict mango quality for the unseen testing subset, generating the predicted labels.
5.6. Performance Evaluation
The model’s performance is assessed in the last stage as shown in Figure 1. The model’s ability to predict mango quality is evaluated using metrics including accuracy, precision, recall, F1-score, and confusion matrix. This phase offers a numerical assessment of the model’s efficacy and identifies areas for improvement.
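The evaluation metrics listed above can be computed from the true and predicted labels as in the following sketch. The function name `classification_report` and the label lists are hypothetical and serve only to illustrate the calculation.

```python
from collections import Counter

def classification_report(y_true, y_pred):
    """Compute accuracy plus per-class precision, recall, and F1 from label lists."""
    confusion = Counter(zip(y_true, y_pred))  # (actual, predicted) -> count
    labels = sorted(set(y_true) | set(y_pred))
    accuracy = sum(confusion[(c, c)] for c in labels) / len(y_true)
    per_class = {}
    for c in labels:
        tp = confusion[(c, c)]
        fp = sum(confusion[(other, c)] for other in labels if other != c)
        fn = sum(confusion[(c, other)] for other in labels if other != c)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        per_class[c] = {"precision": precision, "recall": recall, "f1": f1}
    return accuracy, per_class

# Hypothetical true and predicted grades for six mangoes.
y_true = ["A", "A", "A", "B", "B", "B"]
y_pred = ["A", "A", "B", "B", "B", "A"]
overall_accuracy, per_class = classification_report(y_true, y_pred)
```

Reporting per-class precision and recall alongside accuracy helps reveal whether errors are concentrated in a particular quality grade.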
6. Results
This research introduces an innovative, non-destructive approach to assessing mango quality using machine learning and advanced sensor technologies. By leveraging e-nose sensors to measure sweetness and aroma, color sensors to detect surface hues, and pH sensors to evaluate acidity, the study extracts the critical features that reflect mango freshness, ripeness, and overall quality.
Figure 2 shows that the dataset was analyzed using four machine learning models: Decision Tree, KNN, Naive Bayes, and AutoMLP. Among these, AutoMLP demonstrated superior performance, with an accuracy of 98.46%, underscoring its ability to handle complex feature interactions effectively. The proposed system not only predicts the overall quality of mangoes but also identifies and highlights rotten areas, making it a comprehensive solution for fruit quality assessment. This non-invasive technique preserves the integrity of the fruit, offering significant advantages over traditional destructive methods. With potential applications in agriculture, food processing, and supply chain management, this framework paves the way for sustainable and precise quality evaluation. Future advancements could include integrating additional sensor technologies, such as hyperspectral imaging, and exploring ensemble and transfer learning models to further enhance performance and adaptability.
7. Conclusions
In conclusion, this research successfully demonstrates a non-destructive method for assessing mango quality using machine learning and sensor-based technologies. By analyzing features such as sweetness, color, and acidity, the system accurately evaluates mango freshness, ripeness, and overall quality while identifying and marking rotten areas. Among the models applied, AutoMLP achieved the highest accuracy of 98.46%, highlighting its robustness in handling complex data. This approach offers a practical, efficient, and sustainable solution for agricultural and food industry applications, eliminating the need for invasive testing. For future work, the integration of advanced sensors like hyperspectral imaging, the exploration of ensemble or transfer learning techniques, and the extension of the framework to other fruits and agricultural products are recommended. Additionally, the development of a real-time mobile or web-based application, coupled with IoT-enabled devices, could facilitate automated and scalable quality monitoring across the supply chain, paving the way for enhanced precision agriculture.