A Systematic Literature Review of Waste Identification in Automatic Separation Systems

Abstract: Proper waste separation is essential for recycling. However, it can be challenging to identify waste materials accurately, especially in real-world settings. In this study, a systematic literature review (SLR) was carried out to identify the physical enablers (sensors and computing devices), datasets, and machine learning (ML) algorithms used for waste identification in indirect separation systems. This review analyzed 55 studies, following the Kitchenham guidelines. The SLR identified three levels of autonomy in waste segregation systems: full, moderate, and low. Edge computing devices are the most widely used for data processing (9 of 17 studies). Five types of sensors are used for waste identification: inductive, capacitive, image-based, sound-based, and weight-based sensors. Visible-image-based sensors are the most common in the literature. Single classification is the most popular dataset type (65%), followed by bounding box detection (22.5%). Convolutional neural networks (CNNs) are the most commonly used ML technique for waste identification (24 out of 26 articles). One of the main conclusions is that waste identification faces challenges with real-world complexity, limited data in datasets, and a lack of detailed waste categorization. Future work in waste identification should focus on deployment and testing in non-controlled environments, expanding system functionalities, and exploring sensor fusion.


Introduction
Waste generation is a by-product of population growth, urbanization, and economic development. Experts predict that, by 2050, annual waste generation will increase by 70% from 2016 levels, reaching 3.40 billion tons [1]. Thus, solid waste management (SWM) is a global issue that affects every individual and government. Poorly managed waste has a direct impact on public health and the environment. At least 21% of the municipal waste (i.e., the waste from residential, commercial, or institutional sources) produced annually (2.1 billion tonnes in 2016) is not managed in an environmentally safe manner, that is, collected, transported, treated, and disposed of in a way that minimizes its impact on the environment [1].
Recycling is one of the main SWM strategies included in most environmental regulations [2]. Implementing proper source separation at the generation site (i.e., before transport) increases the recovery rate and material quality [2] while also enhancing the performance of SWM, thereby reducing environmental impacts and economic expenses [3]. However, waste separation at the source is difficult for citizens with only basic knowledge and experience in garbage classification. It is also affected by multiple factors, such as physical and socio-economic barriers, human behaviors, policy constraints, public engagement, and a lack of specialized garbage collection services, among other reasons [4].
Automatic systems have been proposed to perform the source segregation of solid waste. These systems usually apply two sorting techniques [5]: (i) direct sorting, which uses materials' properties for separation, like magnetic susceptibility, electrical conductivity, or material density, and (ii) indirect sorting, which uses sensors to detect and classify objects by recycling categories.
The segregation process can be performed in two locations: (i) At the point of waste generation [6], where the user directly disposes of their waste. The sensing process is usually conducted in a chamber [6], and the collected information is processed to send signals to motors [7] or actuators [8], directing the waste to the appropriate bin. (ii) In a centralized place [9], where sequential sensing processes are performed as the waste is transported on a conveyor belt. Multiple technologies, such as inductive sensors, color identification sensors [8], or image-based sensors [6], are used to capture information from the waste. These data are processed to identify and place the waste into the corresponding bin [10].
Machine learning (ML) is commonly used in waste management for classifying and/or locating waste. One of the most used techniques is the convolutional neural network (CNN) for image analysis [11]. Other techniques, such as the support vector machine (SVM) and classification trees, have been used with sound recordings [12] and infrared (IR) cameras [13], respectively. Classically, ML models are trained on datasets containing many observations, each labeled with the desired response, so that the model learns to recognize when that response applies. The labels describe the class of the object [14] or its material composition [15]. In general, the ML models belong to one of three categories: classification [16], detection [17], or segmentation [18].
Although some commercial waste classification systems are available, their autonomy is typically limited (the user is responsible for feeding and separation). In contrast, current research on waste identification in autonomous separation is still in the laboratory phase: these research systems have been tested with controlled variables but have not yet been deployed in a real-world production environment (Section 3.1). This review provides an overview of current developments and technological limitations in the main areas of waste identification, which will be helpful to develop a more comprehensive and autonomous waste classification system.
The scope of this study is limited to the indirect source segregation of municipal solid waste. Source segregation refers to the separation of waste performed at the generation place before waste transportation. Indirect sorting is a set of separation techniques in which automated machines sense and separate the waste. Municipal solid waste comprises household, institutional, and commercial waste and excludes hazardous waste, sewage, medical waste, and e-waste.
Literature reviews on the automatic sorting of municipal solid waste (MSW) have been conducted (discussed in Section 2.1.1), covering different technologies, such as computer vision [19,20], ML algorithms [21-23], SWM general technologies [24-27], and hardware devices [5,28]. However, to the authors' knowledge, no systematic literature review (SLR) has been conducted on waste identification in indirect separation systems that simultaneously details three main subjects: physical enablers, ML models, and datasets, which are required to develop and deploy automatic waste sorting solutions.
Therefore, this review aims to identify the current implementation state, technology requirements, and challenges of the physical infrastructure, the current ML models that perform the waste identification task, and the datasets required to build the ML models.
The main contributions of this paper are as follows:
A. The identification of the sensors, processing devices, complementary functionalities, and implementation context of indirect segregation machines (Section 3.1).
B. A characterization of the datasets used by waste separation systems with sorting categories, environments for collecting the observations, and geographical locations, among other elements (Section 3.2).
C. The identification of public datasets for developing ML models for waste identification (Section 3.2).
D. The identification of ML algorithms used for waste identification, their model architecture, and feature extractors, including analysis of the performance metrics used by the models and the objects and materials identified (Section 3.3).
E. The compilation of ML algorithms' benchmark on public datasets for waste identification (Section 3.3).
F. A holistic view of relationships between hardware, ML algorithms, and datasets (Section 3.4).
G. The definition of challenges and limitations of waste identification systems (Section 4).
The article's structure is as follows: The next section (Section 2) describes the search methodology and related work. Section 3 reports on the three areas reviewed: physical enablers, datasets, and machine learning (ML) models. This section also highlights the interconnections among the three areas. Section 4 summarizes the challenges for each area. Finally, Section 5 presents the conclusions.

Methodology
The presented work follows the Kitchenham guidelines for an SLR [29], which define three stages: (i) Planning the review: the identification of the need for a review, definition of the research questions, and search protocol. (ii) Conducting the review: document retrieval, data synthesis, and analysis. (iii) Reporting the review.

Related Work
Table 1 presents the literature reviews conducted between 2017 and 2023 that are related to automatic waste segregation.

Table 1. Literature reviews related to automatic waste segregation.
Ref.  Subject
[30]  A review of state-of-the-art hyperspectral imaging-based plastic waste detection
[19]  Computer vision (CV) for waste classification
[24]  Trends in household waste recycling
[20]  Critical review of CV-enabled MSW sorting
[25]  Critical review of MSW management strategies
[21]  Review on ML for solid organic waste treatment
[22]  ML algorithms used in recycling systems
[28]  Effectiveness, advantages, and disadvantages of automated waste segregation systems
[26]  Monitoring methods, garbage disposal techniques, and technologies
[27]  SLR on forecasting of waste characteristics, waste bin level detection, process parameters prediction, vehicle routing, and SWM planning
[23]  Strengths and weaknesses of waste segregation algorithms
[5]   Physical processes, sensors, actuators, control, and autonomy

Zhang et al. reviewed the SWM process from a waste separation perspective. They studied four main subjects: (i) generation and source separation, (ii) collection and transportation, (iii) pretreatment, and (iv) resource recovery and valorization. Concerning source separation, they found that further sorting after transportation is still needed to achieve finer separation. Although source separation is necessary to reduce waste cross-contamination, its importance in recycling will depend on future technological advances. They concluded that artificial intelligence (AI), Internet of Things (IoT), and 5G technologies will be widely adopted in SWM [25].
Abdallah et al. studied the application of AI in different SWM fields. Although AI-backed solutions have gained attention, most developments remain in the laboratory phase. Some of the identified challenges were insufficient data, the field's rapid evolution, the black-box nature of AI, the scarcity of tailored solutions, and the slow adoption of technology [27]. In 2022, Alcaraz-Londoño et al. reported a general growing trend in SWM systems, especially in the topics of automatic solid waste separation with the IoT and psychology-guided research [24].
Regarding the application of ML to SWM, Ozdemir et al. analyzed the algorithms used in recycling systems in 2021. Four main techniques were reviewed: the CNN, SVM, K-nearest neighbor, and artificial neural network (ANN). Their findings revealed that the large amount of data required for training, the black-box nature of the models, and the high variability in the garbage appearance are some of the obstacles present in the field [22].
Computer vision (CV) is also a popular field of research in waste segregation and analysis. Lu and Chen conducted a critical review on CV-based MSW sorting [20]. They found that simplified datasets could not represent the complexities of real-world applications and that public datasets are scarce. Another limitation was that visible-image-based solutions could not distinguish different materials with a similar appearance. Likewise, in 2019, Flores et al. studied the application of CV to support waste segregation in schools. One of the main drawbacks of the existing solutions is that most systems only classify one piece of waste per image and use only a few classification categories [23]. Tamin et al. reviewed hyperspectral imaging (HSI) and ML in plastic waste detection. They found that most studies focused on the detection of general waste using visible imaging due to the cost of HSI devices. Unlike visible image solutions, HSI could overcome limitations in identifying materials with similar appearances when used as a feature in ML models. Nevertheless, black plastics are still challenging given that carbon black absorbs the ultraviolet and infrared spectral light range and may not be detected by the near-infrared sensors [30].
Sorting machines have also been surveyed. For instance, Gundupalli et al. reviewed physical processes, sensors, and actuators from a hardware perspective. Three technical fields of future development were identified: sensor fusion, energy efficiency, and robustness in adverse environments [5]. The usability of automated waste segregation systems has also been studied by Carpenteros et al. They evaluated 40 studies considering efficiency, effectiveness, usability, sustainability, and learnability variables. Effectiveness had the highest rating, while the main disadvantage was found in learnability [28]. Lubongo et al. analyzed the equipment used for sorting plastics and the challenges of material recovery facilities. They found that current technology still has problems identifying black plastics, tanglers, films, and multipolymer objects [31].
Finally, technology-supported waste management has also been explored in other fields: Mohamed et al. performed a systematic review of the use of the Internet of Things (IoT) in medical waste management, identifying that current technology covers only some aspects of waste tracking and there is an absence of standardized platforms [32].Jagtap et al. studied the use of CV and the IoT to measure the reasons for waste generation in real time in food manufacturing.The authors found that using CV achieves a performance higher than human monitoring [33].
There is a growing interest in applying new technologies in the SWM field, which still has several challenges to overcome. While several surveys have been presented (see Table 1), the main areas (physical enablers, ML models, and datasets) have not been reviewed together in depth with a view to implementing automatic segregation systems. The hardware-software interactions, the restrictions imposed by the different technologies, and their applications in the different fields investigated in SWM have not been analyzed. Implementing real-life SWM systems without considering the hardware-software interactions and the constraints imposed by different technologies is unfeasible. The proposed SLR reviews the sensors, computing devices, ML algorithms, and datasets required to develop automatic waste segregation systems.

Search Protocol
The PICOC method (Population, Intervention, Comparison, Outcome, Context) [29] was used to frame the proposed study. The generation of the PICOC consists of assigning keywords for each criterion (Table 2). Table 3 presents three research questions with the corresponding query strings (e.g., A and (B W/5 E) and F) based on the goal of this work and the PICOC keywords. The concrete strings queried on the databases were adapted to each database's search operators. Once the query strings were defined, the retrieval, filtering, and quality assessment were performed as described in the next section.
Three scientific databases were used: Scopus, Web of Science (WoS), and Google Scholar. In addition to the databases, Kaggle and GitHub were consulted to complement the dataset search. The results (studies and datasets) were limited to those published after 2015.
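To illustrate how a query template such as the one quoted above might expand into a concrete search string, the following minimal sketch substitutes each placeholder letter with an OR-group of keywords. It assumes that the letters stand for keyword groups; the keyword lists and the helper names (`or_group`, `expand`) are hypothetical and are not taken from Table 2. The proximity operator `W/5` follows Scopus syntax and would need to be adapted for other databases.

```python
import re

# Hypothetical keyword groups; the actual PICOC keywords are listed in Table 2.
KEYWORDS = {
    "A": ["waste", "garbage"],
    "B": ["sorting", "separation"],
    "E": ["automatic"],
    "F": ["sensor"],
}


def or_group(terms):
    """Join synonyms into a parenthesized OR-group, quoting multiword terms."""
    quoted = [f'"{t}"' if " " in t else t for t in terms]
    return "(" + " OR ".join(quoted) + ")"


def expand(template, keywords):
    """Replace each placeholder letter that is a key of `keywords` by its OR-group.

    Only known keys are substituted, so the "W" of the proximity operator
    "W/5" and the lowercase connective "and" are left untouched.
    """
    pattern = r"\b(" + "|".join(map(re.escape, keywords)) + r")\b"
    return re.sub(pattern, lambda m: or_group(keywords[m.group(1)]), template)


query = expand("A and (B W/5 E) and F", KEYWORDS)
print(query)
```

Keeping the template separate from the keyword groups makes it straightforward to rewrite only the operators when a database uses a different proximity syntax.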

Conducting the Review
Figure 1 shows the data retrieval process. It consisted of an initial search (raw results) followed by a series of sequential filters (F1 to F4) and a quality assessment to obtain the studies to be analyzed in depth. The initial results comprised 1239 studies distributed as follows: 36% came from Google Scholar, 33% from Scopus, and 20% from WoS. Regarding public datasets, GitHub contributed 7% and Kaggle 4%.
Finally, a quality assessment was performed, and only studies that met the quality criteria (Appendix A) were considered for in-depth review. After the quality assessment, 55 articles were selected for in-depth review: 17 were related to Q1, 39 corresponded to datasets and Q2-related articles, and 37 corresponded to Q3. Each group of articles was carefully analyzed to answer these research questions.
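The retrieval process can be read as a chain of filters applied to the raw results, with the number of surviving studies recorded after each stage. The sketch below is only an illustration of that idea: the two filters shown (duplicate removal and the publication-year limit mentioned in the search protocol) and the sample records are assumptions, since the actual filters F1 to F4 are defined in Figure 1.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Study:
    title: str
    year: int
    source: str


def remove_duplicates(studies):
    """Keep the first occurrence of each title (case and whitespace insensitive)."""
    seen, unique = set(), []
    for s in studies:
        key = s.title.strip().lower()
        if key not in seen:
            seen.add(key)
            unique.append(s)
    return unique


def published_after_2015(studies):
    """Keep only studies published after 2015, as stated in the search protocol."""
    return [s for s in studies if s.year > 2015]


def apply_filters(studies, filters):
    """Apply filters in order and record how many studies survive each stage."""
    counts = [len(studies)]
    for f in filters:
        studies = f(studies)
        counts.append(len(studies))
    return studies, counts


# Hypothetical raw results, for illustration only.
raw = [
    Study("Smart bin", 2019, "Scopus"),
    Study("smart bin ", 2019, "WoS"),
    Study("Old sorter", 2012, "Scopus"),
    Study("Waste CNN", 2021, "Google Scholar"),
]
selected, counts = apply_filters(raw, [remove_duplicates, published_after_2015])
print(counts)
```

Recording the count after every stage yields the kind of numbers usually reported in a retrieval flow diagram.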

Results
This section provides a review of the collected studies and datasets. It begins by presenting the responses to the three research questions, followed by an overview of the results.

Physical Enablers
Automatic waste sorting machines are devices that feed, classify, and separate waste automatically. They can be classified according to their level of autonomy:
(i) Full autonomy: The system automatically seeks, classifies, and separates waste. To this end, a robot equipped with IR cameras, proximity sensors, and robotic arms identifies objects on the ground [34].
(ii) Moderate autonomy: The system classifies and separates the waste, but the feeding is performed by the user. Two different layouts can be observed:
(a) Continuous feeding: A conveyor belt ensures the waste is always sensed in the same spot. Sensing is performed using visible-image-based sensors (most common) [9,35-39], inductive and capacitive sensors [8], near-infrared (NIR) sensors [35], and weight sensors [40]. Subsequently, the waste is classified and segregated towards the corresponding container using sorting arms [9,35], pneumatic actuators [36], servomotors [8,38,40], or by falling on an inclined platform [39]. This is the most popular system layout, proposed in 8 of 17 articles.
(b) Manual feeding: The user deposits the pieces of waste, one at a time, to be sensed by the device. Visible-image-based [6,7,41] and sound-based sensors [12], as well as inductive and capacitive sensors [42], are used for sensing. Afterwards, a gravity-based mechanism is used to perform the separation.
(iii) Low autonomy: The user is responsible for the feeding and separation of the waste. The system identifies the waste and guides the user by opening the lid of the correct container [10,43]. Waste identification is performed with image classification [10], radio-frequency identification (RFID) [43], or the sound generated by the trash bags [44].
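The three levels above differ only in which steps the machine performs itself. A minimal sketch of that decision rule follows, assuming that every system performs the identification step; the names `Autonomy` and `autonomy_level` are illustrative and do not come from the reviewed studies.

```python
from enum import Enum


class Autonomy(Enum):
    FULL = "full"
    MODERATE = "moderate"
    LOW = "low"


def autonomy_level(system_feeds: bool, system_separates: bool) -> Autonomy:
    """Map the steps performed by the machine to an autonomy level.

    Assumes the machine always identifies the waste. A machine that feeds
    itself but does not separate is not covered by the taxonomy; this
    sketch reports it as LOW.
    """
    if system_feeds and system_separates:
        return Autonomy.FULL      # seeks, classifies, and separates
    if system_separates:
        return Autonomy.MODERATE  # the user feeds; the system separates
    return Autonomy.LOW           # the user feeds and separates


print(autonomy_level(system_feeds=False, system_separates=True))
```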
Figure 2 shows the sensors used. The first eight sensors are used for the waste classification task, and the other five are used for complementary functionalities. For waste classification, the sensors found can be categorized into (i) image-based sensors: visible, IR, NIR, and color sensors, (ii) sound-based sensors, (iii) inductive sensors, (iv) capacitive sensors, and (v) weight-based sensors.
Image sensors are the most common type of sensor used in waste sorting devices, appearing in 13 of 17 devices. They are often used alone, but there has been some research on combining multiple image sensors. For example, visible-image-based sensors can be used for general waste classification (e.g., glass or plastic), and, combined with near-infrared (NIR) sensors, they can be used in applications where more detailed categorization is required (e.g., PET, PS, PE, and PP) [35]. Regarding the other types of sensors, three combinations are identified: (i) capacitive sensor with inductive sensor [8,42,43], (ii) weight-based sensor with ultrasonic sensor [40], and (iii) sound-based sensor with inductive sensor [44].
Figure 2 shows the distribution of the computing devices used according to each type of sensor. Both local and cloud computing are used, with local computing being the most frequently mentioned in the literature analyzed. All types of computing devices are used to process data from image-based sensors. Inductive and capacitive sensors, however, are specifically used with edge computing devices.
Of the 17 articles reviewed, 12 used machine learning (ML) models to identify waste, typically using image-based or sound-based sensors. The remaining five articles present systems that did not use ML models but relied on sensor measurements to directly discriminate between materials (e.g., capacitive, inductive, or near-infrared sensors). Only one study directly used visible-image-based sensor input to classify waste. This input came from reading the barcode of the products [35].
Figure 3 shows the implementation context of the sorting machines described in the different studies. The context refers to the environment in which the device has been used. Three types of contexts are reported in the literature: laboratory refers to a device tested in a controlled environment; a prototype is a device whose materials and functionality are the same as those of the device in production; and on context refers to a device tested in a real production environment. The vast majority (76%) of the sorting devices are evaluated in a laboratory environment with controlled variables. Some limitations of the systems reviewed in relation to their use are that 82% of the systems can only dispose of one piece of waste at a time and can only be used in controlled environments (noiseless or simple backgrounds). In addition, waste must meet specific conditions (such as having a volume, being an empty package, or existing in the database) and, in some cases, must be positioned correctly on the sensors.
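Systems that discriminate materials directly from sensor measurements, without an ML model, typically reduce to a small set of rules. The sketch below is a hypothetical example of such rules, assuming an inductive sensor that triggers on metallic objects and a capacitive reading compared against a threshold; the threshold value, the units, and the output labels are illustrative and are not taken from any reviewed system.

```python
def classify_by_sensors(inductive_triggered: bool,
                        capacitance_pf: float,
                        threshold_pf: float = 50.0) -> str:
    """Rule-based material discrimination (hypothetical thresholds).

    An inductive proximity sensor reacts to metallic objects, so it is
    checked first. The remaining objects are split by the capacitive
    reading, which depends on the dielectric properties of the material.
    """
    if inductive_triggered:
        return "metal"
    if capacitance_pf >= threshold_pf:
        return "non-metal, high capacitive reading"
    return "non-metal, low capacitive reading"


print(classify_by_sensors(inductive_triggered=False, capacitance_pf=72.0))
```

In practice, the threshold would have to be calibrated for the specific sensor and mounting geometry, which is one reason these systems require the waste to be positioned correctly.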

Datasets
Datasets are critical in waste identification, as most of this process relies on ML models. These models are trained on large datasets of labeled waste images, or other data, to identify the underlying patterns that relate the inputs (e.g., images of waste) to the expected outputs (e.g., labels of the waste type). This review studies 39 datasets, half of which are publicly available (Table 4). In half of the datasets, the authors recorded the observations, and the other half were collected from the internet by labeling web images or extending public datasets. The datasets can be grouped by their label information into three categories: single-label classification (64% of the datasets), detection with bounding boxes (23%), and pixel segmentation (13%). Visible-based images were the most used input data (34 datasets). Other sensors used were sound-based (three datasets), IR-based (two), RGBD-based (Red Green Blue Depth) (one), and inductive sensors (one). Only two datasets use sensor fusion: the work presented in [45], which combines visible-based images with sound, and the work presented in [44], which combines the impact sound of trash bags with inductive sensors.
Figure 4 shows the distribution of dataset labels. The first ring of the figure presents the material categories. Almost 80% of the dataset labels belong to seven categories (plastic, metal, paper, glass, organics, compounds, and cardboard). The remaining 20% correspond to less common object classes and materials (e.g., ceramics, e-waste, or recyclable). The second and third rings of the figure consist of object classes (e.g., bottle, battery, or can), material subcategories (e.g., Aluminum, PET, or PE), product brands [6], material colors [46], material properties (e.g., high-density polyethylene) [15], and object classes of a specific material (e.g., metal-aluminum-can).
Almost a third (28.9%) of the dataset labels are of plastic materials. The most popular plastic categories are plastic bottles (eight) and plastic bags (seven), and the most common plastic subcategories are PET and PE [13,15,41]. Plastics are also classified by color: bottles are classified as blue, green, white, or transparent [6]; PET is classified as blue, dark, green, multicolor, teal, or transparent [46]; and tableware is classified as green, red, or white [6]. In the metals category, aluminum is the only material subclass. Aluminum has multiple object categories, such as blisters, foil, and cans, among which cans are the most common objects (15 labels). In addition, aluminum cans present an additional classification of product brands [6].
Figure 5 presents the dataset sizes (number of observations) versus the number of separation categories. In the figure, blue represents datasets for classification, red is used for detection, and green is for segmentation. The median of the dataset sizes is 4,288 observations. The median of the sorting categories is five. The datasets, regardless of label types (classification, detection, and segmentation), present a similar median number of categories (5 ± 1). However, the median number of observations in segmentation datasets (7,212) is almost double that of detection and classification datasets. Moreover, the number of observations decreases when the number of categories increases. In addition, most datasets (65%) present an unbalanced distribution of observations by categories.
The waste observation datasets come from municipal sources and are collected from four different environments: (i) General (15 datasets): objects are captured in their context before being discarded, or the acquisition setup is not tied to a specific machine or experiment. (ii) On-device (12): tailored to a specific sorting machine or experiment. (iii) Indoors (10): taken inside institutions or households. (iv) On-wild (3): waste is thrown away on streets or in nature. Although observations come from different environments, 62% of the datasets were taken with simple backgrounds (without noise, cluttered surroundings, or other elements). For the rest (15 datasets), the observations include their context, and only one used augmented backgrounds [9] (see Figure 6).
The geographic location of 31 out of 39 datasets is unknown (not specified by the authors), and 7 of them are composed of data from multiple locations. Of the datasets with known locations (eight datasets), four are public: Poland [15], Milan/Italy [6], Novosibirsk/Russia [46], and Crete/Greece [9]. The rest of the known location datasets are privately built for ML model studies and are from Shanghai/China [7], Kocaeli/Turkey [12], Taiwan [11], and Rajshahi/Bangladesh [41]. The most popular public dataset is Trashnet [16], which is a six-category classification dataset with a simple background. Additionally, Trashnet has been modified by adding detection [11,47,48] and segmentation labels [49]. In second place is Taco [18], a segmentation dataset with COCO format labels [50] that currently contains 28 categories and 60 subcategories. Taco is continuously growing through a website (http://tacodataset.org, accessed on 1 June 2022). Table 5 summarizes state-of-the-art results on these public datasets.
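The notions of median dataset size and unbalanced class distribution used above can be made concrete with a short sketch. The class counts, dataset sizes, and the ratio threshold used to call a dataset unbalanced are all assumptions chosen for illustration; the review does not state which criterion was applied.

```python
from statistics import median


def imbalance_ratio(class_counts):
    """Ratio between the largest and the smallest class.

    Assumes every class has at least one observation.
    """
    return max(class_counts.values()) / min(class_counts.values())


def is_unbalanced(class_counts, max_ratio=1.5):
    """Flag a dataset as unbalanced when the ratio exceeds max_ratio (assumed)."""
    return imbalance_ratio(class_counts) > max_ratio


# Hypothetical per-class observation counts for one dataset.
example = {"plastic": 1200, "paper": 900, "glass": 400, "metal": 300, "cardboard": 150}

# Hypothetical dataset sizes (number of observations per dataset).
sizes = [500, 1200, 4000, 9000, 30000]

print(imbalance_ratio(example), is_unbalanced(example), median(sizes))
```

The median is preferred over the mean for dataset sizes because a few very large datasets would otherwise dominate the summary.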
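For readers unfamiliar with the COCO label format mentioned above, the following sketch shows a minimal COCO-style annotation structure and a helper that counts labels per category. The file name, category names, and coordinates are hypothetical and are not copied from Taco; only the top-level layout (`images`, `categories`, `annotations`) and the field names follow the COCO object detection format.

```python
from collections import Counter

coco = {
    "images": [
        {"id": 1, "file_name": "img_0001.jpg", "width": 640, "height": 480},
    ],
    "categories": [
        {"id": 1, "name": "plastic bottle", "supercategory": "bottle"},
        {"id": 2, "name": "drink can", "supercategory": "can"},
    ],
    "annotations": [
        {
            "id": 10, "image_id": 1, "category_id": 1,
            "bbox": [100.0, 120.0, 50.0, 150.0],  # [x, y, width, height] in pixels
            # One polygon per object part, flattened as x1, y1, x2, y2, ...
            "segmentation": [[100, 120, 150, 120, 150, 270, 100, 270]],
            "area": 7500.0, "iscrowd": 0,
        },
        {
            "id": 11, "image_id": 1, "category_id": 2,
            "bbox": [300.0, 200.0, 60.0, 100.0],
            "segmentation": [[300, 200, 360, 200, 360, 300, 300, 300]],
            "area": 6000.0, "iscrowd": 0,
        },
        {
            "id": 12, "image_id": 1, "category_id": 1,
            "bbox": [400.0, 50.0, 40.0, 120.0],
            "segmentation": [[400, 50, 440, 50, 440, 170, 400, 170]],
            "area": 4800.0, "iscrowd": 0,
        },
    ],
}


def labels_per_category(dataset):
    """Count annotations per category name."""
    names = {c["id"]: c["name"] for c in dataset["categories"]}
    return Counter(names[a["category_id"]] for a in dataset["annotations"])


print(labels_per_category(coco))
```

Because each annotation carries both a bounding box and a polygon, the same file can serve detection and segmentation models.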

Machine Learning
Convolutional neural networks (CNNs) were the most common machine learning (ML) models used for waste identification in the reviewed studies, accounting for 87% of the models.Support vector machines (SVMs) were used in three studies, hidden Markov models (HMMs) in one study, and classification trees in one study.More than half (24 of 37) of the studies were published from 2021 onwards.
In this review, an ML architecture refers to the overall configuration of the ML technique that defines its components, parameters, etc. (e.g., number of layers, size, and neuron type for an ANN). An ML model is an instance of an architecture whose parameters are learned for a specific dataset. The feature extractor, or backbone, is a process performed in the ML models to transform the raw inputs into a lower-dimension representation (features), preserving relevant information [71]. Finally, fine-tuning refers to the reuse of the parameter values of an existing model (usually a feature extractor) that has been pretrained on a more extensive and general dataset to improve performance or overcome scarce training data in another dataset [72].
The datasets used to develop the ML models are of two types: image-based (visible and IR) and sound-based. Image-based models have been trained to accomplish three types of ML tasks: (i) Single-label classification (20 of 37 models): the models predict the class of the waste present in an image; this task only accepts one piece of waste per image. (ii) Bounding box detection (13 models): the models predict the enclosing box and class of all the pieces of waste present in the image. (iii) Pixel segmentation (4 models): each pixel of the input image is classified as either background or a type of waste. In contrast, sound models have only been used for single-label classification tasks.
The visible-image-based sensor was the most commonly used (31 out of 37 studies), followed by the IR-image-based sensor (3 studies). The remaining studies use a single sound-based sensor (one), a sound-based sensor with visible-image-based sensors (one), or an inductive sensor (one) (see distribution in Figure 7). Only two studies propose data fusion: one combines visible images with the sounds generated by falling objects [45], and the other combines sound-based and inductive sensors [44]. The study presented in [45] aims to address two problems: image-based models struggle when different materials have similar appearances, and sound-based models struggle with mixed wastes, which generate indistinguishable sounds. The authors proposed a method that extracts visual features with a pretrained VGG16 model [73] and acoustic features with a one-dimensional (1D) CNN, which are later fused into a fully connected (FC) layer for waste classification. The study presented in [44] uses the impact sounds of trash bags with an inductive sensor input to detect the presence of glass and metals. The model used three variations of a basic CNN architecture (convolutional layers followed by FC layers). The models' input data comprised Mel spectrograms, Mel frequency cepstral coefficients (MFCCs), and the metal detector frequencies.
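One common way of reusing a pretrained feature extractor is to freeze it and train only a new head on the target dataset. The sketch below illustrates this variant with a toy linear backbone and a logistic-regression head written in plain Python; the backbone weights, the data, and the hyperparameters are made up for illustration, and real systems would use a deep learning framework and a CNN backbone instead.

```python
import math

# Stand-in for pretrained backbone weights (3 inputs -> 2 features).
# They are never updated below, which is what "frozen" means here.
BACKBONE_W = [[0.5, -0.2, 0.1], [0.3, 0.8, -0.5]]

# Toy binary dataset: (input vector, label). Hypothetical values.
DATA = [([1, 0, 0], 1), ([0, 1, 0], 0), ([1, 0, 1], 1), ([0, 1, 1], 0)]


def extract_features(x):
    """Frozen feature extractor: linear projection followed by ReLU."""
    return [max(0.0, sum(w * xi for w, xi in zip(row, x))) for row in BACKBONE_W]


def predict(head, feats):
    """Head: logistic regression on top of the extracted features."""
    z = sum(w * f for w, f in zip(head["w"], feats)) + head["b"]
    return 1.0 / (1.0 + math.exp(-z))


def log_loss(head, data, eps=1e-12):
    """Mean binary cross-entropy of the head over the dataset."""
    total = 0.0
    for x, y in data:
        p = predict(head, extract_features(x))
        total -= y * math.log(p + eps) + (1 - y) * math.log(1 - p + eps)
    return total / len(data)


def train_head(head, data, lr=0.5, epochs=100):
    """Stochastic gradient descent on the head parameters only."""
    for _ in range(epochs):
        for x, y in data:
            feats = extract_features(x)
            err = predict(head, feats) - y  # gradient of the loss w.r.t. the logit
            head["w"] = [w - lr * err * f for w, f in zip(head["w"], feats)]
            head["b"] -= lr * err
    return head


head = {"w": [0.0, 0.0], "b": 0.0}
loss_before = log_loss(head, DATA)
train_head(head, DATA)
loss_after = log_loss(head, DATA)
print(loss_before, loss_after)
```

Training only the head keeps the number of learned parameters small, which is the usual motivation when the target dataset is scarce; unfreezing part of the backbone is a further option not shown here.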
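The three task types differ mainly in the shape of their outputs: one label per image, a list of labeled boxes, or a class per pixel. As a hedged illustration of how these representations relate, the sketch below derives a bounding box from a small segmentation mask. The mask values and class identifiers are hypothetical, and because the mask is semantic (no instance identifiers), all pixels of a class end up in a single box.

```python
# Semantic mask: 0 = background, 1 and 2 = two hypothetical waste classes.
mask = [
    [0, 0, 0, 0],
    [0, 1, 1, 0],
    [0, 1, 1, 2],
    [0, 0, 0, 2],
]


def mask_to_box(mask, class_id):
    """Return the tightest (x_min, y_min, x_max, y_max) box around class_id.

    Returns None when the class does not appear in the mask.
    """
    coords = [
        (x, y)
        for y, row in enumerate(mask)
        for x, value in enumerate(row)
        if value == class_id
    ]
    if not coords:
        return None
    xs, ys = zip(*coords)
    return (min(xs), min(ys), max(xs), max(ys))


def classes_present(mask):
    """Non-background classes in the mask, i.e., what a detector should report."""
    return sorted({value for row in mask for value in row if value != 0})


print(classes_present(mask), mask_to_box(mask, 1), mask_to_box(mask, 2))
```

The conversion only goes one way: a box or a single label cannot be turned back into a mask, which is why segmentation labels are the most expensive to produce.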
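Feature-level fusion of the kind described for [45] can be sketched as concatenating the two feature vectors and passing the result through a fully connected layer. The code below is a simplified illustration, not the authors' implementation: the feature values, weights, and dimensions are made up, and the feature extractors themselves (VGG16 and the 1D CNN) are replaced by fixed vectors.

```python
def fuse(visual_features, acoustic_features):
    """Feature-level fusion by concatenation."""
    return list(visual_features) + list(acoustic_features)


def fully_connected(x, weights, biases):
    """One FC layer: a score per class from the fused feature vector."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, biases)]


def argmax(values):
    """Index of the largest value."""
    return max(range(len(values)), key=values.__getitem__)


# Hypothetical outputs of the visual and acoustic feature extractors.
visual = [0.9, 0.1]
acoustic = [0.2]

# Hypothetical FC parameters for two classes and three fused features.
WEIGHTS = [[1.0, 0.0, 0.5], [0.0, 1.0, -0.5]]
BIASES = [0.0, 0.0]

fused = fuse(visual, acoustic)
scores = fully_connected(fused, WEIGHTS, BIASES)
predicted_class = argmax(scores)
print(fused, predicted_class)
```

Fusing at the feature level lets the final layer weigh visual and acoustic evidence jointly, which is the stated motivation when one modality alone is ambiguous.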

[Figure 7: distribution of the studies by sensor and prediction type.]
The most popular public datasets are Trashnet [16] and Taco [18].The mean of the average accuracy (Acc.) of the classification models on the Trashnet dataset was 92.9%.The study with the top Acc.(98%) on Trashnet for waste classification used a ResNext architecture [74] with transfer learning (TL) with a backbone pretrained on ImageNet [75] that was trained with data augmentation (e.g., rotations, shifts, zoom, or sheer) [51].
Other studies used a modified version of Trashnet; thus, their results can be numerically compared (see the studies of Table 5 with an α mark).In [47], a single shot detector (SSD) [76] that uses MobileNetV2 [77] as a feature extractor has been fine-tuned on Trashnet and annotated with bounding boxes, excluding the "trash" category.Likewise, [11] studied the use of YoloV3 [78] to detect waste.The authors found that training a detection model on a single object dataset (as Trashnet) was unsuitable for this type of ML task and that the used datasets need to be location-tailored.Similarly, [48] used a modified version of Trashnet for training a YoloV4 [79] detection model.The authors found that their models had an optimal performance when trained using mosaic data augmentation (combining four images in one).Additionally, [49] proposed a method for binary image segmentation (waste, no waste) based on a SegNet architecture [80] using Trashnet images.However, the proposed method has problems with low contrast between the waste and the background.Only one study [54] used SVM on the Trashnet dataset.The model used MobileNetV2 as a feature extractor and fed an SVM classifier, which allows it to be embedded in mobile applications due to it having fewer parameters and operations.
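The core idea of mosaic data augmentation, combining four images into one, can be sketched as a 2x2 tiling together with the matching offset of the bounding boxes. This is a deliberately simplified illustration: it assumes four equally sized images represented as lists of pixel rows, and it omits the random rescaling and cropping that practical implementations usually add. The function names are illustrative.

```python
def mosaic(top_left, top_right, bottom_left, bottom_right):
    """Tile four equally sized images (lists of pixel rows) into a 2x2 grid."""
    top = [a + b for a, b in zip(top_left, top_right)]
    bottom = [a + b for a, b in zip(bottom_left, bottom_right)]
    return top + bottom


def shift_box(box, dx, dy):
    """Move an (x, y, width, height) box into mosaic coordinates."""
    x, y, w, h = box
    return (x + dx, y + dy, w, h)


# Four hypothetical 2x2 single-channel images filled with constant values.
img_a = [[1, 1], [1, 1]]
img_b = [[2, 2], [2, 2]]
img_c = [[3, 3], [3, 3]]
img_d = [[4, 4], [4, 4]]

combined = mosaic(img_a, img_b, img_c, img_d)

# A box annotated on img_d moves by one image width and one image height.
box_in_mosaic = shift_box((0, 0, 1, 1), dx=2, dy=2)
print(combined, box_in_mosaic)
```

Since each training sample then contains objects from four source images, a dataset with one object per image, such as Trashnet, yields multi-object scenes without new annotation work.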
Regarding studies that use the Taco dataset, [65] used an extended version of Taco to train a model based on YoloV5 with Darknet53 as the backbone [81].The study found that YoloV5 performs better than its older versions and is suitable for embedded devices due to its size.Similarly, [37] proposed using YoloV4 with a tailored dataset that combines Taco with images from a recycling facility sorting machine.Their proposed model could accurately detect real-world live video feeds of wastes in a conveyor belt.The main disadvantages were the hardware costs and energy consumption.The study [66] proposed a RetinaNet [82] model with ResNet50 [83] as a backbone for detection of floating waste in the oceans as well as general waste.The used dataset consisted of a subset of 369 Taco images relabeled.
We found 13 architecture types using 14 feature extractors or backbones in the surveyed studies. Figure 8 shows the relationship between the feature extractors (on the right) and the architecture types (on the left). The most common approach was to propose a custom architecture. Custom architectures accounted for 33% of the CNN models and were most prevalent among the classification models (8 of 20 studies). Of the custom architectures, only two are not image-based and use sensor fusion [44,45].
Most of the studies used standard feature extractors as part of their models. The most common feature extractor (in 10 of 37 studies) was ResNet [83]. ResNet was mainly used in the detection models (7 of 14), together with architectures used less commonly than Yolo, such as Faster RCNN [46,84], Efficientdet [85], custom architectures [17,86], Retinanet [66], and Centernet [87]. Additionally, Darknet variations (Darknet and CSPdarknet) were the second most used feature extractors because they are the default backbone of Yolo architectures. Regarding classification models, there was more variation in the feature extractors, with the two most popular (used four times each) being VGG [7,45,88] and Mobilenet [54,55,58].
The most common patterns found in the reviewed articles with custom architectures were the following:
(i) A standard feature extractor with a tailored head. The study [7] uses a semantic retrieval model [89] placed on top of a VGG16 model to perform a four-category mapping of the 13 subcategories returned by the CNN model. Their results revealed that the proposed method achieved a significantly higher performance in waste classification (94.7% Acc.) compared to the one-stage algorithm with direct four-category predictions (69.7% Acc.). The study [52] proposes an ensemble of three classification models (InceptionV1 [90], ResNet50, MobileNetV2) trained separately. Their predictions are integrated using weights with an unequal precision measurement (UPM) strategy. The model was evaluated on Trashnet (93.5% Acc.) and Fourtrash (92.9% Acc.). Ref. [53] proposed DNN-TC, which adds two FC layers to a pretrained ResNext model. DNN-TC was evaluated on Trashnet (94% Acc.) and their dataset VN-trash (98% Acc.). Ref. [56] proposed IDRL-RWODC, a model composed of a mask region-based convolutional neural network (RCNN) [91] with DenseNet [92] as a feature extractor that performs the waste image segmentation and passes the result to a deep reinforcement Q-learning algorithm for region classification. IDRL-RWODC was evaluated (99.3% Acc.) on a six-category dataset [57]. Ref. [17] developed a multi-task learning architecture (MTLA), a detection architecture with a ResNet50 backbone in which an attention mechanism (channel and spatial) is applied to each convolutional block. The feature maps are passed to a feature pyramid network (FPN) with different combination strategies. The architecture was tested on the WasteRL dataset with nearly 57K images and four categories (97.2% Acc.).
(ii) The improvement of an existing architecture. Ref. [39] presented GCNet, an improvement of ShuffleNetV2, by using the FReLU activation function [93], a parallel mixed attention mechanism module (PMAM), and ImageNet transfer learning. Ref. [94] presented DSCR-Net, an architecture based on Inception-V4 and ResNet that is more accurate (94.4% Acc.) than the Inception-Resnet versions [95] on a custom four-category waste classification dataset.
(iii) New architectures. Ref. [61] proposes using a basic CNN architecture on RGB images for plastic material classification (PS, PP, HDPE, and PET). They used the WadaBa dataset [15], which contains a single piece of waste per image on a simple black background. Their model had a lower performance (74% Acc.) than MobileNetV2 but half the number of parameters, making it appropriate for portable devices (e.g., Raspberry Pi).
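The ensemble pattern described in (i) for [52] can be illustrated with a generic weighted soft-voting sketch. This is an assumption-laden illustration: the function name `weighted_vote` and the weights are hypothetical placeholders, and the sketch does not reproduce the unequal precision measurement (UPM) rule, whose details are not given in this review.

```python
from typing import Dict, List


def weighted_vote(predictions: List[Dict[str, float]],
                  weights: List[float]) -> Dict[str, float]:
    """Combine per-model class probabilities with a weighted average."""
    if not predictions or len(predictions) != len(weights):
        raise ValueError("need one weight per model prediction")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    classes = predictions[0].keys()
    # Each class score is the weight-normalised sum of the models' scores.
    return {
        c: sum(w * p[c] for w, p in zip(weights, predictions)) / total
        for c in classes
    }


# Hypothetical outputs of three separately trained classifiers.
model_outputs = [
    {"glass": 0.6, "metal": 0.4},
    {"glass": 0.2, "metal": 0.8},
    {"glass": 0.5, "metal": 0.5},
]
combined = weighted_vote(model_outputs, weights=[1.0, 2.0, 1.0])
predicted = max(combined, key=combined.get)
```

In a real system, the weights would be derived from each model's validation performance rather than set by hand.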
The second most used architecture type was Yolo in its different versions. Yolo is a general-purpose architecture that has been proposed for waste detection in images and was used in 5 of the 14 reviewed models related to the detection task. YoloV3 was used to detect six waste classes of a Taiwan-sourced dataset (mAP 92%) and was also evaluated on the TrashNet dataset for detection (mAP 81.4%) [11]. YoloV4 detected four waste categories in the TrashNet-based dataset (glass, metal, paper, and plastic) [48]. Similarly, Ref. [96] uses YoloV4 but added "fabric" to the classes to detect (mAP 92.2%). Their model had better detection accuracy than the single shot multibox detector (SSD) and the Faster R-CNN models. Another study with YoloV4 [37] uses an RGB-IR camera and an extended version of the Taco dataset. Further, Ref. [65] used YoloV5 to detect the 60 classes of the Taco dataset, reaching 95.49% Acc. and an mAP of 97.62%. Other standard architectures used for detection were Faster RCNN [46,84], RetinaNet [66], CenterNet [87], and SSD [47].
Regarding pixel segmentation, three of the four studies use a CNN. The most common architecture (used in two studies) was Mask R-CNN [9,56]. Ref. [49] uses the SegNet architecture with VGG16 as a feature extractor for segmenting an image into waste and no waste. In addition, Ref. [13] used classification trees for plastic materials segmentation.
CNNs have achieved high success in image recognition tasks, surpassing other ML methods on several benchmarks. This is due to their ability to learn spatial features and patterns in images using a hierarchical architecture of layers that perform convolution operations and extract features at different levels of abstraction [97]. CNNs dominated the reviewed studies; only four presented different ML model types, and, among them, the most common was the SVM, which was used in three studies. The study [34] proposed the use of IR images to extract SURF features [98] that are mapped into a bag-of-words vector that is used in an SVM for the classification of waste into three categories (aluminum can, plastic bottle, and tetra), reaching 94.3% Acc. Ref. [54] fed an SVM with the features extracted by a fine-tuned Mobilenet model with RGB images. The approach was evaluated on Trashnet (83.46% Acc.). Finally, Ref. [12] investigated the use of sound for the volumetric packaging classification of four materials (glass, plastic, metal, and cardboard), each with three different sizes, except metal packages with only two sizes. They developed two classification models based on an SVM and an HMM [99] with MFCC features. Both models reached 100% Acc. on material classification and 88.6% Acc. on material and size classification. Additionally, Ref. [13] uses near-infrared hyperspectral imaging to classify six recyclable plastic polymers used for packaging. Their approach is based on a hierarchical classification tree composed of PLS-DA (partial least squares discriminant analysis) models. Their model was evaluated on a custom setup dataset with an overall mean recall of 98.4%.
Over half (21 articles) of the reviewed studies used TL, where the feature extractor was pretrained on a large-scale dataset (usually ImageNet). During training on the target dataset (containing the waste identification task), the feature extractor (or part of it) is "frozen", and its parameters are not modified. Other less frequently explored aspects of ML techniques were also considered, such as hyper-parameter optimization [56], k-fold cross validation [84], and ablation experiments [39]. Additionally, style transfer [87] and synthetic data [9] were used for increasing training data.
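The freezing step described above can be sketched in plain Python. This is a toy illustration under stated assumptions, not the setup of any reviewed study: `FROZEN_WEIGHTS` stands in for a pretrained feature extractor with made-up values, and only a small logistic-regression head (`train_head`, a hypothetical name) is fitted on the target data.

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical "pretrained" weights; in practice these would come from a
# network trained on a large-scale dataset such as ImageNet.
FROZEN_WEIGHTS = [[0.5, -0.25], [0.1, 0.9]]


def frozen_features(x: Sequence[float]) -> List[float]:
    """Linear layer followed by ReLU; its weights are never updated."""
    return [max(0.0, sum(w * v for w, v in zip(row, x))) for row in FROZEN_WEIGHTS]


def train_head(samples: Sequence[Sequence[float]], labels: Sequence[int],
               epochs: int = 200, lr: float = 0.5) -> Tuple[List[float], float]:
    """Fit only a logistic-regression head on top of the frozen features."""
    # Features are computed once because the extractor does not change.
    feats = [frozen_features(x) for x in samples]
    w = [0.0] * len(feats[0])
    b = 0.0
    for _ in range(epochs):
        for f, y in zip(feats, labels):
            z = sum(wi * fi for wi, fi in zip(w, f)) + b
            p = 1.0 / (1.0 + math.exp(-z))
            grad = p - y  # gradient of the log loss with respect to z
            w = [wi - lr * grad * fi for wi, fi in zip(w, f)]
            b -= lr * grad
    return w, b


def predict(x: Sequence[float], w: Sequence[float], b: float) -> int:
    """Classify one sample with the frozen extractor and the trained head."""
    f = frozen_features(x)
    return 1 if sum(wi * fi for wi, fi in zip(w, f)) + b > 0 else 0


# Toy target dataset with two classes (values are illustrative only).
samples = [[2.0, 0.0], [3.0, 0.5], [0.0, 2.0], [0.5, 3.0]]
labels = [1, 1, 0, 0]
head_w, head_b = train_head(samples, labels)
```

Because only the head is updated, far fewer parameters must be learned from the small target dataset, which is one reason TL is attractive when waste data are scarce.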
The models' performance regarding the number of categories is shown in Figure 9. The most commonly used performance metrics were average accuracy (Acc.) for classification models, mean average precision (mAP) for detection models, and average recall (Av. Rec.) for segmentation models. Although the most commonly used metrics were selected, it is worth noting that in certain studies the model was not assessed using these particular metrics. Consequently, not all of the reviewed studies could be included in this analysis. The overall median performance of the models reviewed is 92.3%. For classification (21 studies), the median average accuracy is 94.7%, and the median number of classification categories is 4.5. For detection models, 14 studies were considered; the median mAP was 82.8%, and the median number of categories was five. Only three studies were considered for segmentation, with a mean Av. Rec. of 98% and a median of six categories. It is worth highlighting that one study [12] reached an Acc. of 100% using sounds for a four-material classification task. The median number of categories of all the reviewed models was 5, and the maximum was 204 classes. The study with this number of classes used the 2020 Haihua AI Challenge dataset. The authors developed a method for waste detection based on a cascade RCNN with a cascade adversarial spatial dropout detection network (ASDDN) and an FPN. This method was developed to enhance the model's performance, particularly in handling small objects and dealing with occlusions. Their proposal reached an mAP of 62%, which revealed that the dataset did not contain enough observations and that the model had trouble with unbalanced categories [86].
Figure 9. Reviewed ML models' performance versus the number of sorting categories. The metric for classification models was average accuracy (Acc.). Mean average precision (mAP) was used for detection models. Average recall (Av. Rec.) was used for segmentation models.
Data augmentation has also been reported for training waste identification models to overcome the shortage of data. Ref. [100] proposes using deep convolution generative adversarial networks (DCGANs) to generate synthetic samples that are combined with real ones to train a YoloV4 detector model. This augmentation improved the mAP by 4.54% compared to using only real samples. Similarly, [101] uses a generative adversarial network (GAN) to generate new images from TrashNet and GP-GAN [102] to generate collages of overlapping and non-overlapping trash objects. The authors evaluate waste classifiers using transfer learning from pretrained ImageNet models. The new images generated by the GAN could not be used due to a lack of details, and the inclusion of collages yielded no improvement to the classification models.
Finally, based on the confusion matrices and the per-class evaluations presented in the reviewed studies, the most difficult categories for the models were identified as the two lowest-scored categories on either of the two metrics. The two most challenging categories, metal and trash, were each referenced in seven articles.
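As an illustration of how the lowest-scored categories can be read from a confusion matrix, the sketch below computes per-class recall and returns the two classes with the lowest values. The matrix values are made up for the example, recall is only one possible per-class metric, and the function names are hypothetical.

```python
from typing import Dict, List, Sequence


def per_class_recall(confusion: Sequence[Sequence[int]],
                     labels: Sequence[str]) -> Dict[str, float]:
    """Recall per class; rows are true classes, columns are predictions."""
    recalls: Dict[str, float] = {}
    for i, name in enumerate(labels):
        row_total = sum(confusion[i])
        # Diagonal entry = correctly classified samples of this class.
        recalls[name] = confusion[i][i] / row_total if row_total else 0.0
    return recalls


def hardest_classes(recalls: Dict[str, float], k: int = 2) -> List[str]:
    """Return the k classes with the lowest score, lowest first."""
    return sorted(recalls, key=recalls.get)[:k]


# Illustrative confusion matrix (not taken from any reviewed study).
class_names = ["glass", "metal", "paper", "trash"]
matrix = [
    [9, 1, 0, 0],
    [2, 6, 0, 2],
    [0, 0, 10, 0],
    [1, 2, 2, 5],
]
recalls = per_class_recall(matrix, class_names)
hardest = hardest_classes(recalls)
```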

Overview of Results
The three reviewed areas (physical enablers, datasets, and ML algorithms) are closely related. ML models are trained and evaluated on datasets, and the type of data that both use depends on the sensors used to capture waste in the identification systems on segregation devices. The three areas and their underlying technology present interrelated connections, as seen in Figure 10. From left to right, the first three branches indicate the number of studies of each research question. The middle of the figure presents the models/techniques (in purple), processing devices (in green), and contexts in which the datasets have been used (in blue). Finally, the sensors used are presented.
The distribution of studies on these three areas is as follows: 47% corresponds to ML models, 31% to systems or machines, and the rest (22%) to datasets. The CNN was the most common ML technique (24 out of 26 articles concerning ML models). However, proposing a custom model architecture prevails (11 studies) over adopting standard state-of-the-art ones. For the feature extractors, in contrast, the popular choice was to use standard ones, among which the most popular was ResNet [83], which was used in 10 studies.
The most common prediction task of the ML models was image classification (20 studies), followed by object detection (13 studies) and, finally, image segmentation (4 studies). In image classification, the models were used to predict whether a material is present in the image [10,51,61,103], the object type [7,39], or whether there was an object that belongs to the recycling category [53,58,94]. Detection models were used to predict the rectangular area (bounding box) where an object was located in the image together with its material type [13,56,84,87,96] or object type [65]. Lastly, segmentation models classify each image pixel as a specific material type [9,56] or as waste or background [49].
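Detection metrics such as mAP rely on a measure of overlap between a predicted bounding box and the annotated one, commonly the intersection over union (IoU). The following is a minimal sketch of that computation, assuming axis-aligned boxes given as (x_min, y_min, x_max, y_max); the function name `iou` and the box format are choices made for this example.

```python
from typing import Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned bounding boxes."""
    # Corners of the overlapping rectangle (may be empty).
    ix0, iy0 = max(a[0], b[0]), max(a[1], b[1])
    ix1, iy1 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix1 - ix0) * max(0.0, iy1 - iy0)

    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0
```

A prediction is typically counted as correct when its IoU with a ground-truth box exceeds a chosen threshold and the predicted class matches.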
Most of the proposed models use visible-image-based sensor data (45), except for a few studies that used sound-based (4) and infrared-based sensor data (3). Thus, vision models were by far the preferred choice. Other technologies, like inductive, capacitive, and weight technologies, are also used as direct input for sorting the waste [8,40,43].
Datasets are closely related to ML models because they are required for training and evaluating their performance. However, only 12 of 39 datasets are intended for general usage rather than built for a specific model or system study. The "general-purpose" datasets are image-based, and the most common type of labeled dataset was for image classification.
Moreover, when considering all the studies, the main categories are (see Figure 11) plastic (12.83%), glass (10.62%), metal (10.62%), and paper (9.29%). Additionally, there are more than 50 categories. Some studies classify waste by more specific types of objects (e.g., plastic bottles) [36] or brands (e.g., Heineken Beer Bottle) [6]. Other studies categorize objects by their type of processing after disposal (e.g., recyclable, organic, or hazardous) [7].
The reviewed studies proposing indirect sorting machines (17) fall into two categories. In the first one, waste is separated directly at the consumer's location. Nearly half of the studies (10) belong to this category. In the second one, the waste is separated in a centralized location. The three most common processing devices found were computers, Raspberry Pi, and Arduino (Figure 10). The systems present more variability regarding the sensing devices for sorting (eight types identified) than ML model studies, although, as with ML models and datasets, visible-image-based sensors are the most used.

Challenges and Limitations
This section presents the challenges and current limitations of the reviewed studies. In general, the three major issues found in the waste identification of automatic segregation systems are as follows:
1. Laboratory testing: in many instances, the real-world applicability and complexity were not evaluated.
2. Material identification is not enough for recycling: other inputs, such as product type and contamination, are required to define the recycling category.
3. Visible-light-based approaches often result in errors due to the high similarity between materials. The majority of the proposed systems are location-specific, relying on the visual appearance of waste, which can vary significantly from one place to another.

The challenges related to each research question were clustered and are presented in the following sections.

Physical Enablers
The challenges presented in the studies related to physical enablers can be grouped into the following categories:
Real-life complexity: Most devices were developed for controlled environments. Therefore, many challenges arise regarding the complexities of real-life scenarios, such as handling multiple wastes simultaneously, complex backgrounds, uncontrolled dirt, packages with food, trash uniformity, garbage appearance variability, and more classification categories. Additionally, objects with the same brand and different materials could be misclassified by vision models.
Performance: Performance relates to two main topics. The first is the systems' execution speed and efficiency, which includes communication delays, response times, and real-time object detection capabilities. The second is the reduction of hardware costs and energy consumption while improving the sensors' quality for accurate measurements.
Extended functionality: The extended functionality of systems can increase the number of objects and classification categories and also improve the material identification by using sensor fusion (e.g., combinations of sound and image sensors). In addition, complementary functionalities are implemented, such as gas detection, grasp planning algorithms to handle various sizes, and IoT support, among others. Finally, strategies are developed for seasonal waste behavior and the continuous learning of waste categories.

Datasets
This section gathers the dataset challenges and future work presented in the articles on sorting machines and ML models. The challenges are grouped into the following categories:
Realism: Most studies were developed under controlled conditions. Therefore, several articles mention the use of more realistic datasets as future work. This means that it is necessary to construct datasets from real consumers that include more variability (e.g., deformations, dirt, rotations, and occlusions) and packaging with different characteristics. The surrounding conditions must also be close to real life regarding complex backgrounds and lighting.
Extension: Although some studies present detailed sorting categories (Figure 4), the median number of categories in the reviewed datasets is five. Thus, the reviewed articles propose building datasets that include more detailed classification categories as future work.
Size of datasets: ML models require large amounts of data, and dataset building is a tedious and time-consuming task. A total of 11 out of 38 studies reviewed for ML models report that the dataset used for training was small (Figure 5). Thus, effective data collection is needed to build more extensive, balanced datasets.
Location targeted: The geographical location source of the waste is known in only eight of the reviewed datasets (40). Datasets must be targeted to specific geographical places because objects' appearance, brands, and waste distribution vary from place to place; however, approaches such as transfer learning can be used to reuse datasets from other contexts.

Machine Learning
The reviewed studies related to waste identification through ML techniques propose the following challenges:
Algorithm improvements: Algorithm improvements can be related to model performance (accuracy and computational speed) and hyper-parameter optimization. They can also be related to the addition of new features, like waste size, material, and object types, as well as new ML tasks (e.g., detection or segmentation) and new ML and deep learning architectures.
Deployment: As the next step, many authors expressed an interest in testing the models' deployment on real devices, such as robots, mobile devices, or smart trash cans. However, it is necessary to consider the hardware cost, energy consumption, and execution performance.
Data fusion: Only two of the reviewed studies used sensor fusion to combine the strengths of the sensor technologies and mitigate their weaknesses. Some authors propose studying the use of sensor fusion to extend the inference power of image-based models. For instance, objects with the same brand and different materials or the same material and different appearances are challenging for visual-based models. This could be solved by complementing the RGB image observation with information from other sensors (microphones, inductive sensors, or other spectrums of electromagnetic radiation).
Data scarcity: The most common issue highlighted in the reviewed studies was the lack of enough training data and the unbalanced category distribution in the datasets (Figure 5). ML techniques need to be improved so that they can be trained with small and unbalanced datasets and still generalize from few observations.
Waste complexity: Along with the data scarcity, waste comes with significant visual appearance variations and contexts with non-ideal conditions (e.g., insufficient light, object occlusion, and complex backgrounds). In addition, in real-life applications, discarded objects are subject to deformations, dirt, and mixed materials. Thus, datasets need to sample most of these situations, and ML models need to learn waste features invariant to most of these transformations.
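The data fusion challenge listed above can be illustrated with a simple late-fusion sketch, in which the class probabilities of an image model are reweighted by a binary reading from an inductive sensor. This is a hypothetical example and not a method from the reviewed studies: the function `fuse_with_inductive`, the `hit_rate` and `false_alarm` values, and the class names are all assumptions made for illustration.

```python
from typing import Dict


def fuse_with_inductive(image_probs: Dict[str, float], metal_detected: bool,
                        hit_rate: float = 0.95,
                        false_alarm: float = 0.05) -> Dict[str, float]:
    """Reweight image-model probabilities with a binary inductive reading."""
    fused: Dict[str, float] = {}
    for material, p in image_probs.items():
        # Assumed probability that the sensor fires for this material.
        p_fire = hit_rate if material == "metal" else false_alarm
        likelihood = p_fire if metal_detected else 1.0 - p_fire
        fused[material] = p * likelihood
    total = sum(fused.values())
    if total == 0:
        raise ValueError("all fused scores are zero; check the inputs")
    # Normalise so the fused scores sum to one again.
    return {m: v / total for m, v in fused.items()}


# A visually ambiguous object: the image model slightly prefers plastic.
image_probs = {"metal": 0.4, "plastic": 0.6}
with_metal = fuse_with_inductive(image_probs, metal_detected=True)
without_metal = fuse_with_inductive(image_probs, metal_detected=False)
```

In this toy case, the inductive reading overrides the slight visual preference for plastic, which is the kind of ambiguity that sensor fusion is expected to resolve.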

Conclusions
An SLR was conducted to identify the physical enablers, datasets, and ML algorithms used for waste identification in indirect separation systems. The increasing waste production and stricter waste management policies have made technology-supported waste separation research increasingly relevant. The most common strategy for waste identification in the reviewed articles is to locally process images from visible light cameras using a CNN model. This model uses a state-of-the-art feature extractor pretrained on the ImageNet image classification dataset and fine-tuned with a target dataset. In most cases, the target dataset is a generic public dataset that is not tailored to a specific geographical location. The few articles that use another type of sensor typically use it for complementary functionalities. The main challenges of this configuration are the confusion of materials caused by the similarity between them and the scarcity of data. This is also related to the fact that most of the surveyed research was tested in laboratory conditions. Additionally, most systems impose constraints on how people dispose of the waste or the type of waste accepted by the system. This means that other challenges are related to handling real-life scenarios, such as complex backgrounds, handling multiple wastes, or waste with more variability. Future research could benefit from exploring topics such as low-cost sensor fusion and infrared (IR) waste detection; domain generalization enabling the utilization of generic datasets under customized conditions; the identification of product types and the assessment of cleanliness (as recycling criteria may depend on these variables); and the seamless integration of smart trash cans within the product life cycle.

Figure 1. Study selection process. Filtering of studies collected by research question.

Figure 3. Implementation context of the sorting machines reviewed distributed by waste source.

Figure 4. Dataset (public and private) labels. The relative size of the category indicates the number of labels.

Figure 6. Dataset source environment distribution by background type (with context or simple, with plain background).

Figure 7. Sensors used by ML models distributed by prediction task (classification, detection, or segmentation).

Figure 8. Surveyed architecture types with corresponding feature extractors.

Figure 10. Topics covered by this review. The thickness of the connections indicates the number of references. The first two columns from left to right indicate the number of articles. The remaining columns indicate the number of technologies and datasets used in each article.

Table 1. Literature reviews on automatic waste segregation.

Table 2. PICOC criteria used to frame the literature review search protocol. The asterisk (*) is a wildcard that represents any set of characters.

Table 3. Search protocol summary. Query strings are presented using the PICOC criteria table ID. The proximity operator W/5 indicates that the keyword must be within five words of the next term. The asterisk (*) is a wildcard that represents any set of characters.

Table 4. Public datasets for waste separation. Dataset annotation types: classification (Classf), segmentation (Segm), and detection (Detec). The label distributions (Dist.) for the categories of the datasets are balanced (Baln) or unbalanced (Unbl). In the categories column, the first number is the number of main categories and the second is the number of subcategories.

Table 5. Machine learning studies that used public datasets sorted by main performance metrics (α: dataset modified with different classes, annotation, or extended with more data). The metrics used are average accuracy (Acc.) for classification models, mean average precision (mAP) for detection models, and intersection over union (IoU) for segmentation models.