1. Introduction
According to the 2018 Revision of World Urbanization Prospects produced by the Population Division of the UN Department of Economic and Social Affairs (UN DESA), 55% of the world’s population currently lives in urban areas, a proportion that is expected to increase to 68% by 2050 [1]. One of the biggest problems of urbanization and industrialization is the extraordinary rise in the generation of municipal solid waste (MSW) all over the world [2]. According to the World Bank’s review report [1], global MSW generation was around 1.3 billion tons per year in 2012, and this figure is expected to reach 2.2 billion tons per year by 2025. MSW is a global issue in terms of environmental contamination, social inclusion, and economic sustainability [3], which requires integrated assessments and holistic approaches for its solution [4]. Apart from this environmental degradation, we should also highlight the differences between big developing cities and rural areas, where management issues vary greatly with regard to the amount of waste generated and the solid waste management (SWM) facilities available [5]. While MSW itself may have a negative influence on economic development, recycling of MSW has the prospect of achieving economic and environmental benefits [6]. In general, gross domestic product (GDP), population, and income are the major factors affecting MSW generation. In recent years, some researchers have shown that the level of goods consumption is also an important factor affecting MSW volume [7].
Solid waste management is inevitably linked to environmental impacts on the planet and their subsequent economic consequences [8]. Globally, solid waste contributes to climate change and constitutes one of the largest sources of pollution in oceans. In low-income and many middle-income countries, inadequate waste collection and uncontrolled dumping or burning of solid waste are still an unfortunate reality, polluting the air, water, and soil. When waste is burned, the resulting toxins and particulate matter in the air can cause respiratory and neurological diseases, among others [8].
For a sustainable future, it is vital to develop an effective recycling system. Recycling plays an important role in both the environmental and the economic aspects of our planet. In developing countries, MSW recycling relies on household separation via scavengers and collectors who trade the recyclables for profit [9,10], and communities are more active in recycling programs [11]. Choosing the proper management of MSW provides a sustainable environment and reduces direct and indirect health risks for humans [12]. Researchers now suggest that managing MSW requires sustainable recycling methods and waste reuse [13,14]. These methods reduce the amount of waste produced, which causes multiple environmental problems, and at the same time help municipalities or waste managers to save significant financial, natural and energy resources [15]. Many different techniques, such as mechanical and chemical sorting, are available in developed countries for automatic waste sorting [16]. Nonetheless, there are many opportunities to improve the waste recycling procedure, mostly in developed countries, by utilizing cutting-edge technologies. Moreover, it is noteworthy that the municipal recycling rates in the USA and in the European Union are around 34% and 46.4%, respectively, which are significantly lower than the proposed recycling rate of 75% [17,18].
In developing countries, recycling rates are very low. For example, in Nepal about 5% of MSW is recycled [19] and, according to a publication of the Asian Development Bank [20], only six municipalities out of 58 use sanitary landfill sites for final disposal, while 45 practice open dumping, including at riversides and roadsides. Furthermore, owing to a lack of public awareness and poor management by municipalities, environmental problems have intensified in towns in Nepal, including unsanitary waste management and disposal. Developing countries remain at a very low level of recycling compared to developed countries for many reasons. Apart from the wealth of each country, the most important factor is the citizens. The active participation of citizens in the recycling procedure, combined with the application of different technical systems, is the key to success [21]. Many factors are associated with how actively people take part in the recycling process, such as moral norms, information, and environmental concern [22]. According to Babazadeh et al. [23], there are several barriers that developing countries have to overcome in order to involve citizens in the source separation of household waste. There is a need to increase citizens’ awareness of and responsibility toward the source separation of waste by applying various educational programs, such as environmental health campaigns and public education in local mass media.
The involvement of citizens in the recycling process plays the key role in achieving a higher recycling percentage. Although developed countries have succeeded in involving citizens in the recycling process, they are still far from the goal they have set [17,18]. Compared with developed countries, the recycling gap in developing countries is huge. Many studies have mentioned the relation between the habit of recycling and the gross domestic product (GDP) of a country. Since it takes decades to develop a country economically, there is a need for a rapid solution, which is why a sorting management system is necessary. Most municipal waste is not separated at source and ends up in sanitary landfills. We therefore need a more efficient system to distinguish the recyclable materials, and here lies our contribution. Utilizing computer vision and artificial intelligence (AI), we can detect, extract and sort those materials on a moving belt. Most of the current top-performing object detection networks use convolutional neural network (CNN) features [24]. A better automated system helps us to send fewer recyclable items to landfills. In this research, we propose a computer vision system with high detection accuracy using a CNN, which is one of the most recognized deep learning algorithms in image classification, segmentation and detection [25]. Recyclable material classification is a challenging problem and requires demanding techniques to obtain a reliable dataset. Apart from dataset collection, it is necessary to provide a global approach for industrial applications. In this study, we present a more accurate and optimal waste classification technique.
The contributions of our study can be summarized as follows:
A high accuracy classification method to separate the recyclable materials from the waste.
A cloud-based architecture that provides financial and energy gains compared to a similar localized operation.
A low-power and low-cost solution, because we utilize small form factor embedded devices at every MSW site.
A system that, if adopted, can increase the recycling rate and thus positively impact society and the planet.
The rest of the paper is structured as follows: In Section 2, we compare our work with similar research, while Section 3 describes the motivation for cloud offloading. In Section 4, we describe the architecture of our system, the interoperability of its core components, and the way our system communicates and transmits data. In Section 5, we provide details about the dataset used, the preprocessing and training methods, and report the experimental results of our approach. Finally, in Section 6 we summarize our findings and conclude our article.
2. Related Work
MSW is an active multidisciplinary research topic with a wealth of research work, covering social and educational issues as well as different technological approaches. In this section we pinpoint the most relevant research on the topic of waste classification.
Özkaya and Seyfi [26] performed a comparative analysis of image classification on the TrashNet dataset, where the proposed method used a fine-tuned model. Despite the breakthrough implementation, the results show only 83.43% accuracy, which is lower than that of our method.
Chu et al. [27] suggest a multilayer hybrid deep-learning system (MHS) to automatically sort waste disposed of by individuals in urban public areas. This system deploys a high resolution camera to capture waste images and sensors to detect other useful information. The MHS uses a CNN-based algorithm to extract image features and a multilayer perceptron (MLP) to consolidate image features and other feature information in order to classify waste as recyclable or other. The MHS is trained and validated against manually labeled items, achieving an overall classification accuracy higher than 90%. In this study the researchers consider only two classes: recyclable and others.
Ruiz et al. [28] used several CNN architectures for the automatic classification of waste: VGG-16, VGG-19, ResNet, Inception and Inception-ResNet. The experiments were performed on the TrashNet dataset, where the best classification results were achieved using a ResNet architecture, with an average accuracy of 88.66%.
Ahmad et al. [29] proposed an intelligent fusion of deep features for improved waste classification. Utilizing deep architectures pre-trained on ImageNet, namely AlexNet, GoogleNet, VggNet and ResNet, they classified the waste into six classes. The best accuracy, 95.58%, was obtained using the Double Fusion (PSO) method, which is lower than ours.
Thung and Yang [30] designed two models: a support vector machine (SVM) with scale-invariant feature transform (SIFT) features and a convolutional neural network (CNN). The accuracy is 63% for the SVM and 23% for the CNN. The target is to take images of a single piece of recycling or garbage and classify it into six classes consisting of glass, paper, metal, plastic, cardboard, and trash. Similar work was performed by Mittal et al. [31], who deployed a project to detect whether an image contains garbage or not. This project employs the pretrained AlexNet model and achieves a mean accuracy of 87.69%. However, it aims at segmenting garbage in an image without providing waste classification, and the researchers divide the waste into only two classes, garbage and non-garbage. In our model, the accuracy is 96.57%.
Costa et al. [32] proposed an automated system based on deep learning and traditional techniques, aiming at the correct separation of waste into four different recycling categories: glass, metal, paper and plastic. The researchers concluded that CNN approaches tend to be more computationally expensive than the traditional techniques and require better computational resources. The experimental results using VGG-16 reached 93% accuracy in the best scenario, which is lower than that of our algorithm.
Compared with the related work, our architecture achieves excellent classification accuracy of 97% on five different classes of waste. The proposed system is an end-to-end solution that can identify and provide metrics of industrial stations in real-time via an online platform. The system can achieve 76 frames per second on our modern hardware and is capable of realtime inference on at least four industrial stations.
The main novelty of the proposed system architecture is the computation offloading to the cloud. The usage of cloud for waste classification minimizes the implementation cost at larger scales where multiple waste collection facilities use the cloud server. Because of this, on-site hardware requirements at the waste collection facilities are very low compared to the traditional methods. This makes the proposed system a feasible solution at larger scales. The usage of the cloud, however, is also the biggest limiting factor of the system. Since the proposed system architecture utilizes cloud servers for the classification process, it is evident that the embedded systems located at waste collection facilities require an always-on internet connection. This is usually not a problem, however, because most waste collection facilities already have high speed broadband internet connections. Moreover, the physical distance between the location of the cloud server and each waste collection facility should be as short as possible in order to minimize the transmission time of the captured images and the responses. If the transmission time between them is very long, the response of the cloud server containing the result of the classification process may reach the waste collection facility after the waste item has reached the end of the moving trash belt.
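The timing constraint described above can be expressed as a simple check. The following Python sketch is illustrative only: the function name, its parameters, and the example belt length, belt speed and latencies are hypothetical values chosen for demonstration and are not measurements from this work.

```python
def response_arrives_in_time(belt_length_m, belt_speed_m_s,
                             uplink_s, inference_s, downlink_s):
    """Return True if the classification result reaches the facility
    before the waste item reaches the end of the moving belt.

    All arguments are hypothetical inputs: the remaining belt length (m),
    the belt speed (m/s), and the upload, inference and download times (s).
    """
    if belt_speed_m_s <= 0:
        raise ValueError("belt speed must be positive")
    # Time the item stays on the belt after the image is captured.
    time_on_belt_s = belt_length_m / belt_speed_m_s
    # Total delay between capturing the image and receiving the result.
    round_trip_s = uplink_s + inference_s + downlink_s
    return round_trip_s < time_on_belt_s


# Hypothetical example: 2 m of belt at 0.5 m/s leaves 4 s for the round trip.
fast_link = response_arrives_in_time(2.0, 0.5, 0.20, 0.016, 0.05)
slow_link = response_arrives_in_time(2.0, 0.5, 3.50, 0.016, 1.00)
print(fast_link, slow_link)
```

A check of this kind could help decide how close to a facility the cloud server needs to be located for a given belt configuration.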
3. Motivation for Cloud Offloading
When designing a system architecture, the designers have to make decisions that satisfy the requirements of the project while minimizing cost and engineering complexity. In this paper we present a distributed, cloud-based waste classification system. The design choice to utilize computational offloading to the cloud, in order to achieve lower overall energy consumption and a lower cost of fixed assets, was made after a thorough analysis of energy and financial sustainability, which showed that its advantages outweigh its disadvantages.
As technology has evolved in recent years, the production of data has increased, mainly because of IoT devices such as mobile phones, smart sensors and embedded systems. New applications such as augmented reality, artificial intelligence and machine learning are emerging and attracting users. These types of applications, however, are usually data or compute intensive and require substantial resources and increased energy consumption. IoT devices are known for their resource scarcity, having limited computing power and battery life. The tension between compute- and data-intensive applications and resource-constrained devices hinders the successful adoption of these emerging applications. One paradigm that addresses this tension is computational offloading to the cloud, which has become very popular in recent years. Computational offloading to the cloud expands the capabilities of devices with limited resources by performing data- and compute-intensive calculations on the cloud. Computational offloading has some distinct disadvantages, such as increased engineering complexity, the requirement of an always-on internet connection and, of course, the cost of the cloud servers. However, in some situations, such as the system architecture proposed in this paper, the benefits of computational offloading to the cloud outweigh its disadvantages. The most important advantage of using the cloud is the low overall cost of the system and the low cost of fixed assets. The Raspberry Pi 3B+ based embedded devices in waste collection facilities cost only a fraction of the hardware that would otherwise be required on site, for example an Intel(R) Xeon(R) CPU E5-2630 v4 @ 2.20 GHz along with 2x Tesla K40m GPUs. Besides hardware, software licensing is also a cost that needs to be taken into account for every site. For example, the Red Hat Enterprise Linux Server that we used for our experiments is a commercially licensed operating system whose subscription is paid per system.
It is evident that at larger scales, where waste classification is performed in many waste collection facilities, sending data to the cloud server for classification is much more efficient and keeps the cost to a minimum. Another advantage of utilizing the cloud is the ease of reconfiguring the classification algorithm. Because all the computations are offloaded to the cloud, changes to the classification algorithm only have to be made once and not for each waste collection facility separately. Because of all the aforementioned advantages of computational offloading to the cloud, the design choice to implement it in the proposed system architecture was clear.
In order to further increase the execution speed and efficiency of the proposed system, a possible solution is to implement the classification algorithm on an FPGA board, such as the Zynq-7000 ARM/FPGA SoC. The popularity of FPGA boards has been steadily increasing over the last decades. FPGAs offer significant advantages over software running on a CPU or GPU, including execution speed and energy efficiency. In the proposed system, the FPGA boards could be located in the cloud along with a high performance traditional server. As the server receives images for classification from the embedded systems located in waste collection facilities, it could offload the classification process to the FPGA boards. Many boards support PCIe connectivity, with a maximum theoretical bandwidth of 15.75 GB/s for a 16-lane link in version 3.0. The main disadvantage of FPGA boards, however, is the increased engineering cost and complexity. The development process on FPGA boards includes the design of custom hardware circuits. Traditionally, these hardware circuits are described via Hardware Description Languages (HDL), such as VHDL and Verilog, whereas software is programmed via one of a plethora of programming languages, such as Java, C and Python. An upcoming trend on FPGAs is High Level Synthesis (HLS). HLS allows the programming of FPGAs using regular programming languages such as OpenCL or C++, allowing for a much higher level of abstraction. However, even when using high level languages, programming FPGAs is still an order of magnitude more difficult than programming instruction based systems. Accelerating the classification process via FPGA boards is only efficient at larger scales, where the performance gain of the system outweighs the increased complexity and engineering cost.
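The 15.75 GB/s figure quoted above corresponds to a 16-lane PCIe 3.0 link. As a worked check, the short Python sketch below derives it from the PCIe 3.0 signalling rate of 8 GT/s per lane and its 128b/130b line encoding; the function name and the default lane count are our own illustrative choices.

```python
def pcie3_bandwidth_gb_s(lanes=16):
    """Theoretical one-direction PCIe 3.0 bandwidth in GB/s (10^9 bytes/s)."""
    transfers_per_s = 8e9          # 8 GT/s per lane in PCIe 3.0
    encoding_efficiency = 128 / 130  # 128b/130b line encoding overhead
    bits_per_s = transfers_per_s * encoding_efficiency * lanes
    return bits_per_s / 8 / 1e9    # bits -> bytes -> gigabytes


x16 = pcie3_bandwidth_gb_s(16)
print(round(x16, 2))  # 15.75
```

Boards that expose fewer lanes scale down proportionally, so the usable bandwidth of a given FPGA board should be checked against its actual link width.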
4. System Architecture
To improve the waste classification efficiency, we present a multi-layered cloud supported architecture, depicted in Figure 1. The proposed system consists of several embedded devices distributed in different waste collection facilities that capture photos of waste in real time and send them to a cloud server for classification. The goal of our work is to instruct, in real time, a number of distributed robotic systems to extract and sort the materials into the proper bins. The robotic system is out of the scope of this paper and will not be presented.
The embedded devices located in waste collection facilities can have different hardware configurations and can differ from each other. The only requirement is that they have a camera module attached and a high speed internet connection. The embedded devices are low-cost and low-power Single Board Computers (SBC). In our prototype, we used an embedded system that consists of a Raspberry Pi 3B+ and a Raspberry Pi camera module version 2.1. The Raspberry Pi 3B+ board is very suitable for our prototype needs for three reasons: (a) it has a very low cost, which is necessary for our project, (b) it is very energy efficient, since it consumes less than 200 mA under normal load, and (c) it has a very small form factor (dimensions: approximately 85 mm × 56 mm × 17 mm). It is equipped with a 1.4 GHz quad-core ARMv8 CPU (BCM2837B0) and has integrated dual-band WiFi (802.11ac wireless LAN) and Bluetooth 4.2. The computing power of the Raspberry Pi 3B+ is more than enough for our project needs, since the waste classification is only performed on our server on the cloud. The other main component of our prototype is the camera module version 2.1, which features a high quality 8 megapixel Sony IMX219 image sensor (up from 5 megapixels on the version 1 camera board) and a fixed focus lens. The version 2.1 camera module is capable of capturing static images at a resolution of up to 3280 × 2464 pixels and also supports video capture in three configurations: 1080p at 30 frames per second (FPS), 720p at 60 FPS and 480p at 90 FPS. Of course, instead of this embedded system, we can utilize any x86, x86-64 or ARM Cortex system that runs Linux, because our software is not architecture specific and can be ported easily to various platforms.
The server, which is located on the cloud, hosts the neural network that processes images and classifies them based on their characteristics. The six categories of waste are: paper, glass, plastic, metal, carton and garbage. Because images may contain sensitive information, all data transmitted between the cloud server and the embedded devices is encrypted. The proposed system uses the AES (Advanced Encryption Standard) algorithm to encrypt all transmitted data and protect it from malicious attackers. AES is currently supported on all popular operating systems (OS), such as Windows, FreeBSD and Linux distributions.
The cloud server to which processing is offloaded runs the Red Hat Enterprise Linux Server 7.4 x86_64 operating system. Using PHP and MySQL, we have developed a web platform that allows supervisors to monitor and reconfigure the system in an easy and efficient manner. The platform allows supervisors to view the results of the classification process, fine-tune the classification algorithm or even retrain the algorithm using a different dataset. The platform also shows real-time information about the embedded systems, such as CPU load, CPU temperature, bandwidth metrics and more. Finally, by using HTTPS, communication between the client’s browser and the platform is always encrypted, minimizing the risk of Man In The Middle (MITM) attacks.
The software architecture of our system consists of five stages. In the first stage (a), the embedded system installed in the waste processing facility opens a TCP stream to the cloud server. This TCP stream is used to send the captured images for processing. In the second stage (b), the embedded system captures an image of the waste item as it moves on the trash belt. To capture the image, an image capturing tool such as fswebcam, streamer or ffmpeg is used. The image is then converted to base64 encoding and serialized in a JSON object along with an ID tag and a timestamp. Because the classification algorithm can process multiple images simultaneously, images are not necessarily processed in the order in which they were captured. It is therefore important to identify each image with an ID and a timestamp. In the third stage (c), the image is transmitted to the cloud server. When the transmission is complete, the cloud server deserializes the JSON object and extracts the image along with the ID and timestamp. In the fourth stage (d), the image is processed by the waste classification algorithm and classified into the appropriate category. Finally, in the fifth stage (e), the server sends the result of the classification back to the embedded system using the same TCP socket. The embedded system can then feed this information to a potential robotic mechanism that moves the waste item to the appropriate bin, completely automating the waste separation process.
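The serialization and deserialization steps described above can be sketched as follows. This is a minimal Python illustration, not our production code: the JSON field names (`id`, `timestamp`, `image`), the function names and the 4-byte length prefix used to delimit messages on the TCP stream are assumptions made for the example.

```python
import base64
import json
import struct
import time


def pack_image_message(image_bytes, image_id, timestamp=None):
    """Serialize an image, an ID tag and a timestamp into one framed message.

    The frame layout (4-byte big-endian length followed by a UTF-8 JSON
    body) is an assumption of this sketch, used so the receiver knows
    where one message ends on a TCP stream.
    """
    payload = {
        "id": image_id,
        "timestamp": time.time() if timestamp is None else timestamp,
        "image": base64.b64encode(image_bytes).decode("ascii"),
    }
    body = json.dumps(payload).encode("utf-8")
    return struct.pack(">I", len(body)) + body


def unpack_image_message(frame):
    """Reverse of pack_image_message: return (id, timestamp, image bytes)."""
    (length,) = struct.unpack(">I", frame[:4])
    payload = json.loads(frame[4:4 + length].decode("utf-8"))
    return payload["id"], payload["timestamp"], base64.b64decode(payload["image"])


# Round trip with placeholder bytes standing in for a captured image.
frame = pack_image_message(b"placeholder image bytes", "station1-000042", 1700000000.0)
image_id, ts, image = unpack_image_message(frame)
print(image_id, ts, len(image))
```

Carrying the ID and timestamp inside each message lets the embedded system match a classification result to the correct waste item even when results return out of order.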
5. Classification
In the following sections, we provide details about the dataset used, the preprocessing and training methods, and present the experimental results of our approach. The experiments are conducted using the fast.ai framework [33], which utilizes PyTorch [34] as the backend. The programming interface is Project Jupyter [35].
5.1. Dataset
Detecting recyclable materials is very important for humanity and civilization, and the value of the world ecosystem for a livable world is undeniable. In this work, we aim to classify recyclable materials such as glass, paper, cardboard, and metal. The TrashNet [30] dataset was used for training the models. This dataset contains paper, glass, plastic, metal, carton and garbage subclasses (Figure 2). The images in this dataset are photographs of garbage taken on a white background. Different exposure and lighting were selected for each photo, which introduces variation into the dataset. Each image was resized to 512 × 384 pixels and the original dataset is nearly 3.5 GB in size. TrashNet contains 2527 images in total, distributed as follows: 594 paper, 501 glass, 137 trash, 410 metal, 482 plastic, 403 cardboard. For this work, 70% of all images were used for training, 17% for testing, and 13% for validation. For more information about the dataset and the CNN implementation by Yang et al., refer to [30].
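To make the split concrete, the following Python sketch checks the class counts listed above and partitions 2527 items into 70% training, 17% testing and the remainder for validation. The function name, the fixed random seed and the use of rounding are assumptions of this example; the exact per-subset counts used in our experiments may differ slightly.

```python
import random

# Class counts as listed for TrashNet above.
CLASS_COUNTS = {"paper": 594, "glass": 501, "trash": 137,
                "metal": 410, "plastic": 482, "cardboard": 403}


def split_dataset(items, train_frac=0.70, test_frac=0.17, seed=0):
    """Shuffle items and split them into train, test and validation lists.

    Validation receives whatever remains after the train and test
    fractions have been taken (about 13% with the defaults).
    """
    items = list(items)
    random.Random(seed).shuffle(items)  # seeded for reproducibility
    n_train = round(len(items) * train_frac)
    n_test = round(len(items) * test_frac)
    train = items[:n_train]
    test = items[n_train:n_train + n_test]
    valid = items[n_train + n_test:]
    return train, test, valid


total = sum(CLASS_COUNTS.values())
train, test, valid = split_dataset(range(total))
print(total, len(train), len(test), len(valid))
```

In practice a stratified split per class may be preferable, since the trash class is much smaller than the others.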
5.2. Training
To build our classifier we use the MobileNet-V2 model from torchvision. Furthermore, we enabled CUDA and wrapped the model in the DataParallel module in order to exploit the parallelism of multiple GPUs. MobileNet is a model proposed by the Google research team that consists of depthwise separable convolutions, initially used in the Inception model [36]. The purpose of these convolutions is to reduce the amount of computation in the first few layers.
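The saving offered by depthwise separable convolutions can be illustrated by counting weights. The Python sketch below compares a standard convolution with its depthwise separable counterpart (a per-channel k × k depthwise filter followed by a 1 × 1 pointwise convolution); biases are ignored, and the layer size in the example is hypothetical rather than taken from MobileNet-V2.

```python
def standard_conv_params(kernel, c_in, c_out):
    """Weights in a standard k x k convolution (biases ignored)."""
    return kernel * kernel * c_in * c_out


def depthwise_separable_params(kernel, c_in, c_out):
    """Weights in a depthwise k x k convolution plus a 1 x 1 pointwise one."""
    depthwise = kernel * kernel * c_in  # one k x k filter per input channel
    pointwise = c_in * c_out            # 1 x 1 convolution mixing channels
    return depthwise + pointwise


# Hypothetical layer: 3 x 3 kernel, 32 input channels, 64 output channels.
full = standard_conv_params(3, 32, 64)
separable = depthwise_separable_params(3, 32, 64)
print(full, separable, round(full / separable, 2))  # 18432 2336 7.89
```

The number of multiply-accumulate operations scales in the same proportion, which is why this factorization reduces the computational cost.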
5.2.1. Data Augmentation
For data augmentation, we use traditional transformations. A combination of affine transformations is applied to each image in the training data. For each input image a duplicate is generated that is shifted, zoomed in/out, rotated, flipped, distorted, or shaded with a hue. Both the image and its duplicate are fed into our model. For a dataset of size N, we therefore generate a dataset of size 2N for training and evaluating our model.
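The duplication scheme can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions: images are represented as nested lists of pixel values, a horizontal flip stands in for the full set of transformations, and exactly one duplicate is generated per image; the function names are our own.

```python
def horizontal_flip(image):
    """Mirror an image (a list of pixel rows) left to right."""
    return [row[::-1] for row in image]


def augment_with_duplicates(images, transform=horizontal_flip):
    """Return every original image followed by one transformed duplicate."""
    augmented = []
    for image in images:
        augmented.append(image)             # keep the original
        augmented.append(transform(image))  # add its transformed copy
    return augmented


# Two tiny 2 x 3 "images" stand in for real photographs.
dataset = [[[1, 2, 3], [4, 5, 6]], [[7, 8, 9], [0, 0, 1]]]
augmented = augment_with_duplicates(dataset)
print(len(dataset), len(augmented))  # 2 4
```

In a real pipeline the transform would be drawn at random from the shifts, zooms, rotations and colour changes listed above.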
5.2.2. Optimal Learning Rate
Learning rate is a tuning parameter in an optimization algorithm that determines the step size at each iteration while moving toward a minimum of a loss function. Too small a learning rate (LR) can cause overfitting. A larger learning rate helps to regularize the training; however, too large a learning rate can cause the training to diverge. One solution is therefore to find converging and diverging learning rates using a short-run grid search. An easier way is to select the learning rate using methods such as the learning rate range test (LR range test) and cyclical learning rates (CLR).
In the LR range test, we start with a small learning rate that increases linearly during a pre-training run. This provides feedback on both the network’s maximum learning rate and how well it can be trained over a range of learning rates. Starting with a small learning rate, the network begins to converge; as the learning rate grows, it eventually becomes too large, leading to decreased accuracy and increased test/validation loss. The learning rate at this point is the largest value that can be used as the maximum bound with cyclical learning rates, yet in the case of a constant learning rate a smaller value is preferred, otherwise the network will not converge. The minimum bound of the learning rate can be selected by: (1) a factor of 3 or 4 less than the maximum bound, (2) a factor of 10 or 20 less than the maximum bound when using only one cycle, (3) running a short test of many iterations with a few initial learning rates and choosing the largest one that allows the training to converge without causing overfitting (with too large an initial learning rate, the training will not converge). On the other hand, to use the CLR method, both the step size and the minimum and maximum learning rate boundaries need to be specified. Specifically, the step size is the number of iterations (or epochs) used for each step. Each cycle consists of two such steps: (1) a linear increase of the learning rate from minimum to maximum, and (2) a linear decrease back to the minimum. In his study, Smith [37] tried out many ways of varying the learning rate between the two boundary values and found them to be equivalent. Therefore, he suggests letting the learning rate change linearly, which is the simplest option. Another study by Jastrzebski et al. [38] recommends using discrete jumps to obtain similar results. Using the above methodology, we determined the optimal learning rate for the optimizer (Figure 3).
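The triangular cyclical schedule described above, with a linear increase over one step followed by a linear decrease over the next, can be written as a small function. The Python sketch below is illustrative only; the function name and the example bounds and step size are hypothetical and are not the values used in our training.

```python
def triangular_lr(iteration, base_lr, max_lr, step_size):
    """Learning rate at a given iteration of a triangular cyclical schedule.

    One cycle spans 2 * step_size iterations: the rate rises linearly
    from base_lr to max_lr during the first step and falls back
    linearly during the second.
    """
    position = iteration % (2 * step_size)
    if position <= step_size:
        fraction = position / step_size                    # rising half
    else:
        fraction = (2 * step_size - position) / step_size  # falling half
    return base_lr + (max_lr - base_lr) * fraction


# Hypothetical bounds: minimum 1e-4, maximum 1e-3, step size 100 iterations.
for it in (0, 50, 100, 150, 200):
    print(it, triangular_lr(it, 1e-4, 1e-3, 100))
```

An LR range test corresponds to running only the rising half once over a wide interval and recording the loss at each learning rate.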
5.2.3. Training MobileNet
Even though MobileNet is built on depthwise separable convolutions, its first layer is a full convolution; the depthwise convolutions are therefore performed after the full one. MobileNet offers high accuracy while exploiting only a few hyperparameters, making it a fast-training model that uses fewer resources.
We trained the model for 20 epochs. The learning rate decreases with each epoch, allowing us to get closer and closer to the optimum. At the last epoch the validation error decreases to about 5%, which is ideal. However, we need to evaluate our model on the test data (Section 5.3) to verify the results.
5.3. Experimental Results
The environment for the experiments consists of a dedicated cloud server with an Intel(R) Xeon(R) CPU E5-2630 v4 @ 2.20 GHz, with a TDP (Thermal Design Power) rated at 85 W. The operating system is Red Hat Enterprise Linux Server 7.4 x86_64. The system utilizes 2x Tesla K40m GPUs for acceleration. The TGP (Total Graphics Power) of each GPU is rated at 235 W. The computational cost of the MobileNet model is about 16 ms per frame. Under full usage, the server can achieve about 76 frames per second (fps). Therefore, we can support at least 4 industrial stations, each with a maximum frame rate of 19 fps (real-time). An experimental inference run with the Inception-V4 model achieved a maximum of 22 fps. Therefore, the use of MobileNet effectively maximizes accuracy while being mindful of the computational resources.
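The station count follows directly from dividing the server throughput by the frame rate required per station. A minimal Python sketch of this arithmetic, using the figures reported above (the function name is our own):

```python
def max_stations(server_fps, per_station_fps):
    """Number of stations a server can serve at the required frame rate."""
    if per_station_fps <= 0:
        raise ValueError("per-station frame rate must be positive")
    return int(server_fps // per_station_fps)


mobilenet_stations = max_stations(76, 19)  # MobileNet throughput
inception_stations = max_stations(22, 19)  # Inception-V4 throughput
print(mobilenet_stations, inception_stations)  # 4 1
```

At the same per-station frame rate, the Inception-V4 throughput would therefore cover a single station on this hardware.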
As shown in Table 1 and Table 2, our trained model surpasses the more complex fine-tuned models reported in [39]. Data augmentation techniques and hyper-parameter tuning had a significant impact on the final results.
As illustrated in the confusion matrix (Figure 4), the model mostly confused glass with metal and plastic. Due to the transparency and reflectivity of glass, we consider it the most difficult material to classify without information such as weight and surface properties, even for a human being. Some of the misclassified samples are illustrated in Figure 5.
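For readers who wish to derive summary metrics from a confusion matrix such as the one in Figure 4, the following Python sketch computes the overall accuracy and the per-class recall. The three-class matrix in the example is entirely made up for illustration and does not reproduce our results; rows are assumed to hold the true class and columns the predicted class.

```python
def overall_accuracy(matrix):
    """Fraction of samples on the diagonal of a square confusion matrix."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total


def per_class_recall(matrix, labels):
    """Recall per class: correct predictions divided by the row total."""
    return {label: matrix[i][i] / sum(matrix[i])
            for i, label in enumerate(labels)}


# Hypothetical counts (rows: true class, columns: predicted class).
labels = ["glass", "metal", "plastic"]
matrix = [[45, 3, 2],
          [1, 48, 1],
          [2, 0, 48]]
accuracy = overall_accuracy(matrix)
recall = per_class_recall(matrix, labels)
print(accuracy, recall)
```

A lower recall for one class than for the others, as for glass in this made-up example, is the pattern one would look for when identifying the hardest material to classify.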
6. Conclusions and Future Work
Over the last years, the generation of municipal solid waste (MSW) has been constantly increasing. According to the World Bank’s review report [1], MSW is expected to reach 2.2 billion tons per year by 2025. It is evident that recycling is the only viable solution to this problem. The process of recycling, however, requires that the waste materials are separated. The separation of waste materials is a time consuming process that is currently performed manually by workers. As an attempt to solve this issue, in this paper we present an innovative solution for waste classification in waste collection facilities. By utilizing computation offloading to the cloud, we can achieve low on-site implementation cost and complexity. Our work utilizes computer vision technologies along with a Convolutional Neural Network (CNN) in order to detect the recyclable materials on a moving trash belt. The CNN classifies the waste materials into six categories: paper, glass, plastic, metal, carton and garbage. The main purpose of this project is to solve the problem of non-segregated waste, which exists in both developing and developed countries. The results of our experiments are very encouraging, as the accuracy of the CNN is 96.57%. The next step of this research is to experiment with federated learning techniques and investigate how they could increase our system’s performance. The final goal of this research, however, is to develop a robotic system that could be installed at waste collection facilities and autonomously move waste items to the appropriate container based on their type, without any worker in the loop. We are confident that this research could help other researchers who are working on solutions for waste management.