ECO-HYBRID: Sustainable Waste Classification Using Transfer Learning with Hybrid and Enhanced CNN Models
Round 1
Reviewer 1 Report
Comments and Suggestions for Authors
Summary:
- This paper aims to improve automated waste classification for sustainable waste management by developing deep learning models that are both accurate and efficient. The authors combine transfer learning with custom lightweight CNNs, proposing two new architectures, EcoMobileNet and EcoDenseNet, enhanced with techniques such as Mish activation, SE blocks, CBAM attention, and PolyFocal loss. The authors further explore hybrid and ensemble strategies to maximize performance. The models are benchmarked against 11 pretrained CNNs on the authors' Extended Garbage dataset (10 classes, 4,691 images) plus an additional dataset collected by the authors, evaluated with ablation studies and statistical testing, and validated on the external TrashNet dataset (6 classes). The results show that the proposed models achieve state-of-the-art accuracy (approximately 98%) while remaining computationally lightweight, highlighting their potential for real-world deployment in smart waste management systems.
Strengths:
- A wide variety of models (11 pretrained CNNs) were benchmarked.
- The authors address a real-world issue (waste sorting) with significant practical implications.
- Confusion matrices for different methods have been shown.
- Cross-dataset testing on the TrashNet dataset demonstrates generalization capacity.
- The inclusion of an ablation study (Section 4.5, Table 8) is a notable strength. It demonstrates the contribution of each architectural enhancement, including Mish activation, SE blocks, CBAM, and PolyFocal loss, to the performance of EcoMobileNet.
Weaknesses:
- While the authors mention learning rates only in EcoMobileNet and optimizers, e.g., Adam, AdamW, Nadam, most hyperparameters, including batch size, momentum, and weight decay, are either missing or not consistently reported. Furthermore, although hyperparameter tuning is briefly mentioned, no systematic methodology, e.g., grid search, random search, is described. For reproducibility, the authors should provide a comprehensive table of hyperparameters and detail their tuning approach, including search ranges and validation metrics.
- While training epochs are visible in Figures 7 and 8, several curves end well before the nominal maximum epochs, e.g., around 30–35, which suggests the use of early stopping. However, the authors do not specify the early stopping parameters, e.g., patience, monitoring metric, or report the actual number of epochs each model trained before stopping. Since early stopping can significantly affect model performance, especially in comparisons between training from scratch and fine-tuning, I recommend that the authors provide explicit details of their early stopping strategy.
- The reported differences between models trained from scratch and those fine-tuned via transfer learning (Table 4, Section 4.2) are unusually large for some architectures, particularly MobileNetV3-Large (12.79% vs. 97.01%). While the general trend is expected, the magnitude suggests possible inconsistencies in training setup, hyperparameter choices, or early stopping criteria. The authors should clarify whether identical augmentation, optimizer settings, epoch counts, and learning rates were used across both conditions to ensure the comparison is fair and reproducible.
- While the class distribution of the primary Extended Garbage dataset is reported in Table 2, the TrashNet dataset used for external validation (Section 4.7, Table 11) is not described with similar detail. Providing the class distribution of the external dataset would strengthen the validity and transparency of the evaluation.
- While the authors have developed a solid Extended Garbage dataset, it remains relatively small. There are larger, publicly available alternatives, such as HGI-30 (18k images, 30 classes), Waste Classification datasets (15k–25k images, 10–12 classes), and the Garbage Classification dataset (15k images), that could reinforce model generalization and reduce overfitting. It would have been better if the authors had experimented with or compared their models on one or more of these datasets to validate robustness and enhance the paper's broader applicability.
- Some mathematical notations in the attention mechanism section, including d, dj, are introduced directly in equations without verbal clarification. For readability, it would have been better if the authors had accompanied such symbols with short, intuitive definitions, e.g., d denotes the number of channels.
- The "RELATED WORKS" section is comprehensive and covers a wide range of prior studies. However, it is more descriptive (summarizing datasets and reported accuracies) than analytical. The paper would benefit from a stronger critical synthesis in terms of highlighting systematic gaps, such as reliance on small 6-class datasets, lack of efficiency analysis, or limited generalization testing, and positioning the current study more clearly in relation to those gaps.
- In Table 9, the accuracies for the proposed models are listed in the "Architecture" column rather than in the Accuracy column, which breaks the formatting consistency with the other rows. For clarity, these values should be placed in the "Accuracy" column to maintain comparability across studies.
Author Response
Response to Reviewer 1
We thank the reviewer for their constructive comments and valuable feedback. Below, we provide detailed responses to each point raised.
We sincerely appreciate the reviewer’s recognition of the strengths of our work, including benchmarking against a wide variety of models, addressing real-world implications, and performing ablation and cross-dataset testing.
Concern #1:
Reviewer Comment: While the authors mention learning rates only in EcoMobileNet and optimizers, e.g., Adam, AdamW, Nadam, most hyperparameters, including batch size, momentum, and weight decay, are either missing or not consistently reported. Furthermore, although hyperparameter tuning is briefly mentioned, no systematic methodology, e.g., grid search, random search, is described. For reproducibility, the authors should provide a comprehensive table of hyperparameters and detail their tuning approach, including search ranges and validation metrics.
Author Response: We thank the reviewer for this helpful observation. A complete table of hyperparameters—including batch size, dropout rates, and weight decay—is now included (Table 3). Section 3.5 details our tuning methods: EcoMobileNet used grid search on discrete-valued ranges; EcoDenseNet and Hybrid models used random search. Selection was based on validation accuracy/loss, with EarlyStopping and scheduler callbacks as described.
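For illustration, the grid-search procedure mentioned in this response could be sketched as follows. This is a minimal, hedged sketch: the hyperparameter names and ranges below are hypothetical placeholders (the actual search space is in Table 3 of the manuscript), and `evaluate` stands in for training a model and reading off its validation accuracy.

```python
from itertools import product

# Hypothetical search space; the actual ranges appear in Table 3 of the paper.
search_space = {
    "learning_rate": [1e-3, 1e-4, 1e-5],
    "batch_size": [16, 32, 64],
    "dropout": [0.2, 0.3, 0.5],
}

def evaluate(config):
    """Placeholder for training a model and returning validation accuracy.

    In practice this would train the network with `config` and return the
    best validation accuracy observed under early stopping. Here a dummy
    score favors one learning rate so the loop has something to find.
    """
    return 0.90 + 0.01 * (config["learning_rate"] == 1e-4)

def grid_search(space, eval_fn):
    """Exhaustively evaluate every combination and keep the best one."""
    keys = list(space)
    best_cfg, best_acc = None, -1.0
    for values in product(*(space[k] for k in keys)):
        cfg = dict(zip(keys, values))
        acc = eval_fn(cfg)
        if acc > best_acc:
            best_cfg, best_acc = cfg, acc
    return best_cfg, best_acc

best_cfg, best_acc = grid_search(search_space, evaluate)
```

A random search (as used for EcoDenseNet and the Hybrid models) replaces the exhaustive `product` loop with a fixed number of draws from the same space.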
Concern #2:
Reviewer Comment: While training epochs are visible in Figures 7 and 8, several curves end well before the nominal maximum epochs, e.g., around 30–35, which suggests the use of early stopping. However, the authors do not specify the early stopping parameters, e.g., patience, monitoring metric, or report the actual number of epochs each model trained before stopping. Since early stopping can significantly affect model performance, especially in comparisons between training from scratch and fine-tuning, I recommend that the authors provide explicit details of their early stopping strategy.
Author Response: We thank the reviewer for pointing this out. Details of EarlyStopping strategies are now explicit in Section 3.5. We used validation loss as the monitoring metric, patience set to 20 (EcoDenseNet) or 15 (EcoMobileNet and Hybrid), and always restored best weights. Actual epochs to convergence—typically 30–40—are reported.
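The early-stopping behavior described above (monitor validation loss, wait up to `patience` epochs without improvement, restore the best weights) can be sketched as a minimal re-implementation. This is illustrative only, not the manuscript's code, and the simulated loss curve is invented.

```python
class EarlyStopping:
    """Minimal Keras-style EarlyStopping monitoring validation loss."""

    def __init__(self, patience=15, restore_best_weights=True):
        self.patience = patience
        self.restore_best_weights = restore_best_weights
        self.best_loss = float("inf")
        self.best_weights = None
        self.wait = 0
        self.stopped_epoch = None

    def step(self, epoch, val_loss, weights):
        """Record one epoch; return True when training should stop."""
        if val_loss < self.best_loss:
            self.best_loss = val_loss
            self.best_weights = weights  # snapshot of the best model
            self.wait = 0
            return False
        self.wait += 1
        if self.wait >= self.patience:
            self.stopped_epoch = epoch
            return True
        return False

# Simulated validation-loss curve: improves for a few epochs, then plateaus.
losses = [0.9, 0.6, 0.5, 0.45] + [0.46] * 20
stopper = EarlyStopping(patience=15)
for epoch, loss in enumerate(losses):
    if stopper.step(epoch, loss, weights=f"weights@{epoch}"):
        break
```

With `restore_best_weights`, the final model corresponds to the epoch with the lowest validation loss rather than the last epoch trained, which is why reporting the actual epochs to convergence matters for reproducibility.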
Concern #3:
Reviewer Comment: The reported differences between models trained from scratch and those fine-tuned via transfer learning (Table 4, Section 4.2) are unusually large for some architectures, particularly MobileNetV3-Large (12.79% vs. 97.01%). While the general trend is expected, the magnitude suggests possible inconsistencies in training setup, hyperparameter choices, or early stopping criteria. The authors should clarify whether identical augmentation, optimizer settings, epoch counts, and learning rates were used across both conditions to ensure the comparison is fair and reproducible.
Author Response: We thank the reviewer for this observation. For fairness, all models—trained from scratch or fine-tuned—used identical augmentation, the AdamW optimizer, the same learning rate (0.0001) and batch size, and a maximum of 50 epochs with early stopping (patience = 7). The stark performance gap for MobileNetV3-Large arises from optimization collapse on limited data: deep models without pretraining tend to underfit or become stuck in poor minima. Transfer learning from ImageNet mitigates this, leading to rapid, accurate convergence and high reproducibility.
Concern #4:
Reviewer Comment: While the class distribution of the primary Extended Garbage dataset is reported in Table 2, the TrashNet dataset used for external validation (Section 4.7, Table 11) is not described with similar detail. Providing the class distribution of the external dataset would strengthen the validity and transparency of the evaluation.
Author Response: This is a helpful suggestion to strengthen transparency. A detailed class distribution for TrashNet is now reported in Table 12: Cardboard 403, Glass 501, Metal 410, Paper 594, Plastic 482, Trash 137.
Concern #5:
Reviewer Comment: While the authors have developed a solid Extended Garbage dataset, it remains relatively small. There are larger, publicly available alternatives, such as the HGI-30 (18k images, 30 classes), Waste Classification datasets (15k–25k images,10-12 classes), and the Garbage Classification (15k images), that could reinforce model generalization and reduce overfitting. It would have been better if the authors had considered experimenting with or comparing their models on one or more of these datasets to validate robustness and enhance the paper's broader applicability.
Author Response: We agree that leveraging larger public datasets is desirable. While our Extended Garbage dataset ensures high-quality, balanced labels, our cross-dataset validation on TrashNet demonstrates generalizability. Future work will expand experiments to alternatives such as HGI-30, Waste Classification, and Garbage Classification for broader benchmarking.
Concern #6:
Reviewer Comment: Some mathematical notations in the attention mechanism section, including d, dj, are introduced directly in equations without verbal clarification. For readability, it would have been better if the authors had accompanied such symbols with short, intuitive definitions, e.g., d denotes the number of channels.
Author Response: Definitions for the mathematical symbols are now included in the text, directly before the equations in the attention section, for clarity (see Section 3.3.3).
Concern #7:
Reviewer Comment: The "RELATED WORKS" section is comprehensive and covers a wide range of prior studies. However, it is more descriptive (summarizing datasets and reported accuracies) than analytical. The paper would benefit from a stronger critical synthesis in terms of highlighting systematic gaps, such as reliance on small 6-class datasets, lack of efficiency analysis, or limited generalization testing, and positioning the current study more clearly in relation to those gaps.
Author Response: Section 2 now emphasizes systematic gaps in prior studies—such as reliance on small datasets, lack of efficiency analysis, and limited edge testing—and clarifies how our work addresses these, positioning it more critically within the literature.
Concern #8:
Reviewer Comment: In Table 9, the accuracies for the proposed models are listed in the "Architecture" column rather than in the Accuracy column, which breaks the formatting consistency with the other rows. For clarity, these values should be placed in the "Accuracy" column to maintain comparability across studies.
Author Response: The formatting of Table 9 (now Table 10 in the revised version) has been corrected. All accuracy values for the proposed models are now listed under the "Accuracy" column for clarity.
Author Response File:
Author Response.pdf
Reviewer 2 Report
Comments and Suggestions for Authors
The authors present an approach for waste classification based on image analysis using transfer learning with different CNNs.
The results are convincing and valid. The authors introduce the reader to the topic very well and describe their data handling in depth, yet at an appropriate length. In general, the article is well structured, easy to read, and has high scientific significance. With their research, the authors address an important problem: reducing environmental pollution.
In lines 91–97, some links to the named sections are missing. After these are included, the article can be accepted in its present form.
Author Response
We thank the reviewer for their positive feedback and thoughtful suggestions.
Concern #1:
Reviewer Comment: The results are convincing and valid. The authors introduce the reader to the topic very well and describe their data handling in depth, yet at an appropriate length. In general, the article is well structured, easy to read, and has high scientific significance. With their research, the authors address an important problem: reducing environmental pollution.
Author Response: Thank you for the positive remarks. We are pleased that our approach to explaining data handling and scientific significance was clear and impactful.
Concern #2:
Reviewer Comment: In lines 91–97, some links to the named sections are missing. After these are included, the article can be accepted in its present form.
Author Response: We have fixed cross-references (lines 91–97); all links now point precisely to the named sections as requested.
Author Response File:
Author Response.docx
Reviewer 3 Report
Comments and Suggestions for Authors
The paper presents a deep learning–based approach to automated waste classification, a task with important implications for sustainability and waste management. The authors evaluate eleven pre-trained convolutional neural network (CNN) architectures (e.g., ResNet-50, EfficientNet variants, DenseNet201, VGG16, InceptionV3, Xception) using transfer learning, and propose two novel lightweight models, EcoMobileNet (MobileNetV3 with Squeeze-and-Excitation and Mish activation) and EcoDenseNet (DenseNet201 with SE and CBAM attention). They further design a hybrid ensemble model combining ResNet50, EfficientNetV2-M, and DenseNet201 through weighted feature fusion, as well as ensemble strategies such as soft voting, weighted voting, logit averaging, stacking, and test-time augmentation.
Strengths:
- The paper addresses a critical challenge of sustainable waste management, a globally important problem with real-world applications in smart cities and recycling.
- The authors systematically compare 11 pre-trained CNN architectures, providing a useful reference for the field.
- The introduction of EcoMobileNet and EcoDenseNet, tailored for mobile and edge deployment, is a significant contribution, especially given the practical need for lightweight solutions.
Weaknesses:
- The paper emphasizes sustainability benefits but does not provide quantitative analysis of real-world impact (e.g., reduction in misclassified waste, potential recycling efficiency improvements, or energy savings from lightweight models). Quantify how improved classification accuracy translates into real-world recycling efficiency or waste reduction metrics.
- While the paper claims edge-device suitability, actual deployment tests on mobile/embedded hardware were not conducted.
- Clarify methodological details: provide more detail about the weighted fusion process, such as how the weights (0.3, 0.4, 0.3) were determined (empirical testing, grid search, or a heuristic?).
- The authors should consider including more papers like “An ultra-low-power embedded AI fire detection and crowd counting system for indoor areas” in the SOA review. This work is relevant because it addresses similar challenges in sustainable and efficient AI deployment on resource-constrained devices, and it could also support the discussion on power consumption analysis of individual components.
- Improve Writing & Formatting: Fix line break issues and ensure compliance with the intended journal’s formatting guidelines. Add clearer figure captions and cross-references (some are marked as “Section ??”).
Author Response
We greatly appreciate the reviewer’s constructive comments and recognition of the significance of our contributions.
We are grateful for the reviewer’s acknowledgment of the importance of sustainability, systematic comparisons, and the novelty of EcoMobileNet and EcoDenseNet.
Concern #1:
Reviewer Comment: The paper emphasizes sustainability benefits but does not provide quantitative analysis of real-world impact (e.g., reduction in misclassified waste, potential recycling efficiency improvements, or energy savings from lightweight models). Quantify how improved classification accuracy translates into real-world recycling efficiency or waste reduction metrics.
Author Response: We now include an estimated real-world impact: for example, a 3% accuracy improvement could recover roughly 135 tonnes of recyclables annually in a typical facility (see Section 4.5 and the Conclusion and Future Work, Section 5).
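The estimate in this response can be checked with back-of-envelope arithmetic. Note the annual throughput figure below is an assumption chosen to be consistent with the stated 135-tonne result; it is not a number taken from the manuscript.

```python
# Back-of-envelope check of the recycling-efficiency estimate.
# Assumption (not stated in this response): the hypothetical facility
# processes about 4,500 tonnes of recyclable-bearing waste per year.
annual_throughput_tonnes = 4_500
accuracy_gain = 0.03  # 3 percentage-point improvement in classification

# Recyclables recovered per year thanks to the accuracy gain (~135 tonnes).
recovered_tonnes = annual_throughput_tonnes * accuracy_gain
```

This linear model implicitly assumes every correctly reclassified item is actually recovered downstream; a real impact analysis would also need contamination rates and per-class throughput.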
Concern #2:
Reviewer Comment: While the paper claims edge-device suitability, actual deployment tests on mobile/embedded hardware were not conducted.
Author Response: While direct hardware deployments were not performed, our models' compact parameter counts and low-latency inference on Kaggle GPU servers suggest suitability for mobile/embedded devices. Future work will extend this to actual deployment on hardware.
Concern #3:
Reviewer Comment: Clarify methodological details: provide more detail about the weighted fusion process, such as how the weights (0.3, 0.4, 0.3) were determined (empirical testing, grid search, or a heuristic?).
Author Response: The fusion weights (0.3, 0.4, 0.3) were determined empirically via a grid search on the validation set, selecting the combination that maximized ensemble accuracy while limiting overfitting (see Section 3.3.3).
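A hedged sketch of what such weighted probability fusion and a validation-set weight search could look like: the toy predictions and labels below are invented for illustration, and this is not the manuscript's implementation.

```python
import numpy as np

def fuse(probs_list, weights):
    """Weighted average of per-model softmax probabilities."""
    fused = sum(w * p for w, p in zip(weights, probs_list))
    return fused / sum(weights)

# Toy softmax outputs from three backbones (2 samples, 3 classes).
p_resnet   = np.array([[0.7, 0.2, 0.1], [0.1, 0.8, 0.1]])
p_effnet   = np.array([[0.6, 0.3, 0.1], [0.2, 0.7, 0.1]])
p_densenet = np.array([[0.8, 0.1, 0.1], [0.1, 0.6, 0.3]])

fused = fuse([p_resnet, p_effnet, p_densenet], weights=(0.3, 0.4, 0.3))
preds = fused.argmax(axis=1)

def search_weights(probs_list, y_true, step=0.1):
    """Grid-search fusion weights summing to 1 on a validation set."""
    best_w, best_acc = None, -1.0
    grid = np.arange(0.0, 1.0 + 1e-9, step)
    for w1 in grid:
        for w2 in grid:
            w3 = 1.0 - w1 - w2
            if w3 < -1e-9:
                continue  # weights must stay on the simplex
            w = (w1, w2, max(w3, 0.0))
            acc = (fuse(probs_list, w).argmax(axis=1) == y_true).mean()
            if acc > best_acc:
                best_w, best_acc = w, acc
    return best_w, best_acc

y_val = np.array([0, 1])  # invented validation labels
best_w, best_val_acc = search_weights([p_resnet, p_effnet, p_densenet], y_val)
```

Because only two of the three weights are free (they sum to 1), a coarse two-dimensional grid like this is cheap to evaluate on held-out predictions.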
Concern #4:
Reviewer Comment: The authors should consider including more papers like “An ultra-low-power embedded AI fire detection and crowd counting system for indoor areas” in the SOA review. This work is relevant because it addresses similar challenges in sustainable and efficient AI deployment on resource-constrained devices, and it could also support the discussion on power consumption analysis of individual components.
Author Response: We appreciate this helpful suggestion. The recommended article is particularly relevant to the discussion of power-consumption analysis on resource-constrained devices.
Concern #5:
Reviewer Comment: Improve Writing & Formatting: Fix line break issues and ensure compliance with the intended journal’s formatting guidelines. Add clearer figure captions and cross-references (some are marked as “Section ??”).
Author Response: We thank the reviewer for their careful attention to presentation. We have thoroughly revised the manuscript to fix line break issues and ensure adherence to the journal’s formatting guidelines. Figure captions have been clarified for readability, and all missing cross-references have been corrected to point to the appropriate sections.
Author Response File:
Author Response.docx
Round 2
Reviewer 3 Report
Comments and Suggestions for Authors
My comments have been addressed; thank you.

