This section presents a detailed analysis of the performance of the convolutional neural network models used for waste classification. First, the classification metrics used to evaluate the models are described, including accuracy, precision, recall, F1-score, and the Matthews correlation coefficient (MCC). The evolution of each model's accuracy throughout training is then analyzed, providing an overview of its ability to generalize to the test data.
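As an illustrative sketch (not the authors' implementation), the metrics listed above can be computed with scikit-learn as follows; the label arrays here are hypothetical placeholders standing in for the real test labels and model predictions.

```python
# Minimal sketch: computing the reported classification metrics with
# scikit-learn. y_true and y_pred are hypothetical placeholder arrays.
from sklearn.metrics import (accuracy_score, precision_score,
                             recall_score, f1_score, matthews_corrcoef)

y_true = [0, 1, 2, 2, 1, 0]   # ground-truth class indices (e.g., metal/paper/plastic)
y_pred = [0, 1, 2, 1, 1, 0]   # model predictions

print("Accuracy :", accuracy_score(y_true, y_pred))
print("Precision:", precision_score(y_true, y_pred, average="macro"))
print("Recall   :", recall_score(y_true, y_pred, average="macro"))
print("F1-score :", f1_score(y_true, y_pred, average="macro"))
print("MCC      :", matthews_corrcoef(y_true, y_pred))
```

Macro averaging, assumed here, weights the three waste classes equally regardless of their sample counts.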
4.2. Performance Analysis of CNN Models
Figure 3 illustrates the evolution of accuracy, a key indicator of the percentage of correct predictions relative to the total samples evaluated, for the different neural network models over 12 epochs. All models experienced a significant increase in accuracy during the early epochs, reflecting an accelerated learning phase. This initial trend suggests that the models captured the most distinctive features of the classes at an early stage of training. From epochs 4 to 6, however, the accuracy of most models stabilized, indicating that they had achieved an adequate representation of the training data with no apparent overfitting.
In terms of overall performance, the Xception and InceptionV3 (FT) models stood out, achieving over 98% accuracy in the final epochs, demonstrating their exceptional ability to classify images accurately and consistently. ResNet50V2 (FT) also exhibited strong performance, nearing 98%, although it showed slight variations in the final epochs. These fluctuations may indicate some sensitivity to configuration changes or lower stability in convergence compared with Xception and InceptionV3. Although these models are freely available in terms of licensing, the time required to train them varies significantly. Deeper architectures such as Xception and ResNet50V2 require more computational resources and longer training times than lighter models such as MobileNetV2, which train faster but achieve lower accuracy. Furthermore, during classification, Xception and InceptionV3 (FT) required slightly longer processing times than MobileNetV2 due to their higher complexity. These results indicate a trade-off between classification accuracy, training time, and processing speed, highlighting the importance of selecting a model based on the specific needs of the application.
In contrast, the DenseNet121 (FT) and MobileNetV2 models showed comparatively lower performance, stabilizing around 94% and 92%, respectively. This suggests that these architectures may be less suitable for this specific task or may require further hyperparameter tuning to optimize their performance. Their inability to reach the accuracy levels of Xception and InceptionV3 highlights their limitations in extracting the critical features necessary for accurate classification on this dataset.
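Accuracy curves like those in Figure 3 can be recorded directly from a Keras training run. The self-contained sketch below (not the authors' code) uses a tiny random dataset and a deliberately trivial model as placeholders for the waste images and CNN architectures.

```python
# Minimal, self-contained sketch: recording and plotting training/validation
# accuracy over 12 epochs with Keras. The random data and trivial model are
# placeholders, not the study's dataset or architectures.
import numpy as np
import tensorflow as tf
import matplotlib.pyplot as plt

x = np.random.rand(120, 224, 224, 3).astype("float32")          # placeholder images
y = tf.keras.utils.to_categorical(np.random.randint(0, 3, 120), 3)  # 3 classes

model = tf.keras.Sequential([
    tf.keras.Input(shape=(224, 224, 3)),
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(3, activation="softmax"),
])
model.compile(optimizer="rmsprop", loss="categorical_crossentropy",
              metrics=["accuracy"])

history = model.fit(x, y, validation_split=0.2, epochs=12, verbose=0)

plt.plot(history.history["accuracy"], label="training accuracy")
plt.plot(history.history["val_accuracy"], label="validation accuracy")
plt.xlabel("Epoch")
plt.ylabel("Accuracy")
plt.legend()
plt.show()
```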
Figure 4 illustrates the evolution of loss across the different classification models over 12 epochs, reflecting the quality of each model’s fit to the correct labels. Loss is a crucial indicator of learning quality, with its reduction signifying progress toward improved predictions. In this case, the loss function used is categorical cross-entropy, which is suitable for multiclass classification problems.
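For reference, the categorical cross-entropy for a one-hot label vector $y$ and a predicted probability vector $\hat{y}$ over $C$ classes is

$$\mathcal{L}(y, \hat{y}) = -\sum_{c=1}^{C} y_c \log \hat{y}_c,$$

so minimizing the loss pushes the predicted probability of the correct class toward 1.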
Across the first two epochs, all models exhibited a notable reduction in loss, indicating that the initial parameters quickly adjusted to the dataset’s fundamental characteristics. This sharp decline suggests that the optimization process effectively captured the key relationships between features and target labels early on, rapidly improving the learning performance.
As training progressed, from the third epoch onward, the models reached a stabilization point where the loss decreased to values close to zero and remained constant. This indicates that the models achieved stable convergence, fitting the training data well without showing signs of overfitting, as the loss values did not fluctuate significantly in the final epochs.
Notably, although some models began with slightly higher initial loss levels, they all converged toward similar loss values by the end of training. The stability observed in the final epochs indicates that hyperparameter tuning and optimization techniques, such as fine-tuning, were applied effectively. These adjustments maximized the models’ predictive capacity while avoiding oscillation or divergence issues.
4.3. Training Results
Table 5 presents a detailed evaluation of the performance of the different neural network models when using the RMSprop optimizer with variations in the hyperparameters: learning rate, epochs, and batch size.
The results obtained in the evaluation of different neural network models under specific hyperparameter configurations revealed significant differences in performance. Using the RMSprop optimizer, the Xception, DenseNet121, and MobileNetV2 models performed favorably with a learning rate of 0.02, achieving high levels of accuracy in validation and testing. Xception stood out with a validation accuracy of 98.78% and a test accuracy of 98.12%, along with efficient training times (2820 s). DenseNet121 also demonstrated excellent performance, achieving a test accuracy of 96.60% with a training time of 823 s, highlighting a strong balance between accuracy and computational efficiency.
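A minimal sketch of this configuration, assuming a standard Keras/TensorFlow transfer-learning workflow with three waste classes, is shown below; the head architecture and input size are assumptions, while the optimizer and learning rate follow Table 5.

```python
# Minimal sketch: Xception with the RMSprop optimizer and a learning rate
# of 0.02, as reported in Table 5. The classification head and input size
# are assumptions, not the authors' exact configuration.
import tensorflow as tf

base = tf.keras.applications.Xception(
    weights="imagenet", include_top=False, input_shape=(299, 299, 3))
base.trainable = False  # transfer learning: keep ImageNet features frozen

model = tf.keras.Sequential([
    base,
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(3, activation="softmax"),  # metal, paper, plastic
])

model.compile(
    optimizer=tf.keras.optimizers.RMSprop(learning_rate=0.02),
    loss="categorical_crossentropy",
    metrics=["accuracy"],
)
# model.fit(train_ds, validation_data=val_ds, epochs=12)
# (batch size is set when the tf.data datasets are built)
```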
MobileNetV2, while slightly less accurate than Xception and DenseNet121, showed solid validation and test performance with a learning rate of 0.02, achieving an adequate balance between accuracy and processing times. Although effective for classification tasks, its accuracy remained lower compared with Xception and DenseNet121.
The InceptionV3 and ResNet50V2 models achieved their best results with a learning rate of 0.01, recording validation accuracies above 97% and shorter training times, indicating that both models converged efficiently with a lower learning rate. In contrast, MobileNetV3 exhibited the most limited performance, achieving its maximum validation accuracy with a learning rate of 0.005, but with significantly lower results compared with the other architectures. This suggests considerable room for improvement for application in this context.
The difference in accuracy between the Xception and ResNet50V2 models can be attributed to ResNet50V2's inability to fully adapt to this classification task with high efficiency. Factors such as the nature of the data may introduce variations that ResNet50V2 does not handle as well as Xception, and despite its architecture having more parameters (23.6 M vs. 20.8 M), it is less efficient at extracting relevant features. The difference was therefore not due to training speed, but to each model's ability to generalize efficiently over the validation and test data.
The accuracy comparison between image classification models highlighted the outstanding performance of Xception, which achieved a perfect accuracy of 1.0.
Figure 5 shows the rate of correct predictions across all samples in the test set. InceptionV3 also showed excellent performance with fine-tuning (FT), without data augmentation, and with data augmentation (DA), with accuracies greater than 0.98, suggesting that both techniques are effective in improving the generalizability of the model, although fine-tuning offers a slight advantage in this case.
The ResNet50V2 and DenseNet121 models also achieved high levels of accuracy in both configurations (FT and no data augmentation), while fine-tuning provided a small additional improvement. In contrast, MobileNetV2 showed lower accuracy, peaking at 0.9497. This result suggests that although MobileNetV2 is suitable for applications with computational processing constraints, it has limitations compared with more robust models such as Xception or InceptionV3 in terms of ultimate accuracy.
Figure 5 shows that the Xception, InceptionV3 FT, and ResNet50V2 architectures achieved the highest levels of accuracy, standing out as the most effective options for image classification tasks in this context. Based on the accuracy values, none of the CNN models proposed for the automatic waste classification system improved in accuracy when data augmentation was applied; for this type of system, the data augmentation technique is therefore not considered necessary.
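For context, data augmentation of the kind evaluated here is typically implemented with random geometric transforms applied only during training. The sketch below uses Keras preprocessing layers; the specific transforms and their parameters are assumptions, as the exact augmentation pipeline is not specified in the text.

```python
# Minimal sketch of a data augmentation (DA) pipeline with Keras
# preprocessing layers; the transforms chosen here are assumptions.
import tensorflow as tf

data_augmentation = tf.keras.Sequential([
    tf.keras.layers.RandomFlip("horizontal"),
    tf.keras.layers.RandomRotation(0.1),
    tf.keras.layers.RandomZoom(0.1),
])

# Placed in front of the base network, e.g.:
# inputs = tf.keras.Input(shape=(299, 299, 3))
# x = data_augmentation(inputs)   # active only during training
```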
The comparative results for the image classification models emphasize the superior performance of Xception, which achieved a test accuracy of 98.13% and high scores across metrics, including precision (98.17%), recall (98.12%), F1-score (98.13%), and MCC (0.9722). These results position Xception as the most balanced and robust model for this task. InceptionV3 also ranked among the high-performing models, with a test accuracy of 97.43% and consistent values across all metrics, including an MCC of 0.9614, underscoring its reliability in classification tasks.
DenseNet121 and ResNet50V2 delivered solid performances, with test accuracies around 96% and precision, recall, and F1-score values near 0.961, making them reliable options, albeit slightly behind Xception and InceptionV3. Conversely, MobileNetV2, despite achieving a test accuracy of 94.97%, fell short in accuracy and generalizability compared with the more complex models. MobileNetV3's results reflected its significantly lower performance, with a test accuracy of 55.20% and an MCC of 0.3616. This drop in performance is mainly attributable to its limited ability to handle the complexities of the dataset: it is designed for lightweight applications and lacks the depth required to model complex patterns effectively. Furthermore, the reduced number of parameters and layers in MobileNet models, while advantageous for reducing computational cost, compromises their ability to capture and process fine-grained features, leading to decreased accuracy in tasks requiring high precision, such as waste classification. These findings underscore the trade-off between computational efficiency and classification performance and emphasize the need to align model selection with the specific requirements and complexity of the application. These results, detailed in Table 6, underscore that Xception, InceptionV3, and DenseNet121 were the most effective models in terms of accuracy and consistency; they are therefore highly recommended for classification tasks in contexts where accuracy is critical.
The Xception and InceptionV3 models demonstrated superior performance in waste categorization, achieving accuracies close to 100% in classifying metal, paper, and plastic, which highlights their exceptional effectiveness and consistency in this task. MobileNetV2, DenseNet121, and ResNet50V2 also performed well overall, though they exhibited slight decreases in accuracy, particularly in the plastics category. This suggests that, despite their robustness, these models may struggle with certain materials due to variations in visual features [12].
Conversely, MobileNetV3 showed significantly lower accuracy across all categories, underscoring its limitations in the context of this specific waste sorting application.
Figure 6 illustrates how the model selection significantly impacted the ability to distinguish between material categories, with Xception and InceptionV3 emerging as the most reliable and accurate architectures for this task.
After applying data augmentation (DA) and fine-tuning (FT) techniques, the results revealed significant differences in the accuracy achieved by each model. Xception, even without data augmentation or fine-tuning, achieved optimal performance with a perfect accuracy of 1.0, establishing itself as the most effective model in terms of pure accuracy. InceptionV3’s accuracy reached 0.993 when fine-tuning was applied, highlighting how this technique enhances its generalizability. Similarly, ResNet50V2 showed a substantial improvement in accuracy, achieving 0.989 with fine-tuning, demonstrating its positive adaptation to the adjusted data.
These findings, illustrated in Figure 7, emphasize the effectiveness of fine-tuning in improving the accuracy of models like InceptionV3 and ResNet50V2, while Xception maintained its peak performance without requiring additional modifications.
The fine-tuning technique demonstrated a positive impact on models such as InceptionV3, DenseNet121, and ResNet50V2 (see Figure 8a). For instance, InceptionV3 achieved an accuracy of 0.993 with fine-tuning (see Figure 8b), compared with 0.990 without it. Similarly, DenseNet121 and ResNet50V2 improved their performance with fine-tuning, reaching accuracies of 0.974 and 0.989, respectively, compared with their baseline values.
However, Xception and MobileNetV2 exhibited decreased accuracy with fine-tuning, suggesting that this technique is not universally beneficial, particularly for architectures designed to be lighter and more computationally efficient. These results highlight the importance of tailoring optimization techniques to the specific structural characteristics of each model.
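As an illustrative sketch of the fine-tuning (FT) step, assuming the Keras transfer-learning workflow sketched earlier, the top of the frozen base network is unfrozen after the new classification head has converged, and training continues at a much lower learning rate; the layer cutoff and learning rate below are assumptions.

```python
# Minimal sketch of fine-tuning (FT), reusing `base` and `model` from the
# earlier transfer-learning sketch. The cutoff of 30 layers and the reduced
# learning rate are assumptions, not the authors' reported settings.
base.trainable = True
for layer in base.layers[:-30]:
    layer.trainable = False   # keep the earliest, most generic layers frozen

model.compile(
    optimizer=tf.keras.optimizers.RMSprop(learning_rate=1e-4),  # assumed FT rate
    loss="categorical_crossentropy",
    metrics=["accuracy"],
)
# model.fit(train_ds, validation_data=val_ds, epochs=12)
```

A low learning rate matters here: large updates would destroy the pretrained features that transfer learning is meant to preserve, which may partly explain why fine-tuning hurt the lighter architectures.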
The results of the study demonstrated that, based on accuracy and overall performance metrics, the Xception model emerged as the most robust and effective option for the image classification task, achieving perfect accuracy without fine-tuning. InceptionV3 with fine-tuning (FT) and ResNet50V2 also exhibited outstanding performance, with near-perfect accuracy values, particularly when fine-tuning was applied.
These findings indicate that both InceptionV3 FT and ResNet50V2 FT can achieve high generalizability and are reliable for tasks requiring high precision. However, Xception distinguishes itself as the most efficient and accurate model, maximizing performance without the need for additional adjustments. This makes it the optimal choice for high-demand sorting applications where accuracy and stability are critical.
4.4. Confusion Matrix
The results shown along the diagonal of each matrix represent the correct predictions made during the testing phase, using a set of images from the “test” folder [30]. Analysis of the confusion matrices (see Figure 9) reveals key differences in the performance of the Xception, InceptionV3 FT, and ResNet50 FT models.
Xception demonstrated the highest overall accuracy, with only 12 errors in the “metal” class (9 misclassified as paper and 3 as plastic) and perfect performance in the “paper” class, with no classification errors. In the “plastic” class, Xception recorded 4 errors (2 misclassified as metal and 2 as paper), establishing itself as the most robust model.
InceptionV3 FT, while accurate, produced more errors than Xception: 31 in the “metal” class (17 misclassified as paper and 14 as plastic), 4 in “paper” (3 as metal and 1 as plastic), and 14 in “plastic” (11 as metal and 3 as paper). These results suggest lower accuracy, particularly in the “metal” and “plastic” classes.
ResNet50 FT delivered an acceptable performance, with 19 errors in the “metal” class (10 misclassified as paper and 9 as plastic), 1 error in the “paper” class (misclassified as metal), and 6 in the “plastic” class (3 as metal and 3 as paper). Although ResNet50 FT demonstrated solid accuracy, it fell short of the performance achieved by Xception.
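Confusion matrices such as those in Figure 9 can be obtained as in the sketch below, assuming the Keras workflow from the earlier sketches and an unshuffled test dataset (`model` and `test_ds` are assumed names).

```python
# Minimal sketch: building a confusion matrix for the test set.
# `model` and `test_ds` are assumed to come from the earlier sketches;
# test_ds must not be shuffled so labels stay aligned with predictions.
import numpy as np
from sklearn.metrics import confusion_matrix

y_true = np.concatenate([np.argmax(y, axis=1) for _, y in test_ds])
y_pred = np.argmax(model.predict(test_ds), axis=1)

cm = confusion_matrix(y_true, y_pred)  # rows: true class, cols: predicted class
print(cm)  # the diagonal holds the correct predictions described above
```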