Deep Learning-Based Small Object Detection and Classification Model for Garbage Waste Management in Smart Cities and IoT Environment

In recent years, object detection has gained significant interest and is considered a challenging problem in computer vision. Object detection is employed in several applications, such as instance segmentation, object tracking, image captioning, and healthcare. Recent studies have reported that deep learning (DL) models achieve more effective object detection than traditional methods. The rapid urbanization of smart cities necessitates the design of intelligent and automated waste management techniques for the effective recycling of waste. In this view, this study develops a novel deep learning-based small object detection and classification model for garbage waste management (DLSODC-GWM) technique. The proposed DLSODC-GWM technique mainly focuses on detecting and classifying small garbage waste objects to assist intelligent waste management systems. The DLSODC-GWM technique follows two major processes, namely, object detection and classification. For object detection, an arithmetic optimization algorithm (AOA) with an improved RefineDet (IRD) model is applied, where the hyperparameters of the IRD model are optimally chosen by the AOA. Secondly, the functional link neural network (FLNN) technique is applied for the classification of waste objects into multiple classes. The design of the IRD model for waste detection and the AOA-based hyperparameter tuning demonstrate the novelty of the work. The performance validation of the DLSODC-GWM technique is performed using benchmark datasets, and the experimental results show the promising performance of the DLSODC-GWM method over existing approaches, with a maximum accuracy of 98.61%.


Introduction
With the increase in smart video surveillance, facial detection, autonomous vehicles, and people-counting applications, accurate and fast object detection methods are in increasing demand. Such a system involves not only classifying and recognizing all the objects in an image but also localizing each one by drawing a proper bounding box around it [1]. This makes object detection a considerably more difficult process than its conventional computer vision (CV) predecessor, image classification. The recent progress in the fields of deep learning (DL), image processing, and CV technologies has changed how different aspects of day-to-day life are approached [2]. The DL method has given a strong foundation for image detection with consistent accuracy [3]. The most widespread image classification architecture, the convolutional neural network (CNN), is inspired by biological neural networks: it comprises multiple layers, and the neurons in each layer are densely connected to those in the following layer [4]. The advantages of utilizing a CNN include independence from prior knowledge, minimal design effort, and automatic feature extraction. CNNs have made great achievements in image classification and recognition [5]. The accuracy and popularity of CNNs for image classification have improved because of large-scale integrated systems for image processing and learning, the vast availability of public image data, and higher-speed GPUs. The concept of smart waste classification using trash and waste images therefore has great potential.
Because of fast urbanization, cities nowadays face several problems. One of them is waste management, since the amount of waste is directly proportional to the number of people who live in urban areas. City administrations and municipalities utilize conventional waste classification methods that are slow, manual, costly, and inefficient [6]. Consequently, automated waste management and classification are indispensable for a city that is being urbanized, for the improved recycling of waste. The simplification of the waste classification process is needed since technologies have been growing rapidly and many manual tasks have been reduced by adopting artificial intelligence (AI) methods [7]. Generally, in the Indian context, waste consists of plastic, paper, metal, rubber, textiles, glass, sanitary products, organics, electronics and electrical items, infectious materials (hospital and clinical), and hazardous substances (paint, spray, and chemicals) [8], and is broadly categorized into biodegradable (BD) and non-biodegradable (NBD) waste, with corresponding shares of 52% and 48% [9].
Effective waste segregation could help in the appropriate recycling and disposal of this waste according to biodegradability. Therefore, the current era demands the development of a smart waste segregation method to address the abovementioned causes of ecological damage. Consequently, the segregation of waste has received considerable interest from academicians and researchers worldwide [10]. The proper organization and classification of waste into different classes (such as biodegradable, recyclable, organic, harmful, non-biodegradable, and so on) assists in the appropriate disposal and utilization of waste. For waste segregation, CV techniques can offer an efficient solution to separate out, identify, and classify waste from huge dumps of trash and garbage.
Kumar et al. [11] examined a method for waste segregation for its efficient disposal and recycling through a DL approach. The YOLOv3 algorithm from the Darknet neural network framework was used for training self-made datasets. The network was trained for six object types (glass, cardboard, paper, metal, organic, and plastic waste). Furthermore, for comparative analysis, the detection process was also implemented using YOLOv3-tiny to validate the capability of YOLOv3. Nasrullah et al. [12] employed two deep 3D customized mixed link network (CMixNet) frameworks for lung nodule classification and detection, respectively. Nodule classification was implemented using a GBM system on the features learned from the 3D CMixNet framework. Nodule detection was implemented using fast RCNN on features learned effectively from U-Net and CMixNet, with an encoder-decoder framework. Hiary et al. [13] presented a two-phase DL classifier to differentiate flowers of a wide range of species. Firstly, the flower region is segmented automatically to permit localization of the minimal bounding box around it. The presented method is modeled as a binary classifier in a fully convolutional network architecture. Next, a robust CNN classifier is constructed to differentiate the varieties of flowers.
Chen et al. [14] developed three post-processing methods to improve the baseline Fast RCNN based on prior knowledge. Firstly, a filtering method is created to remove overlapping boxes identified by Fast RCNN that relate to the same tooth. Then, a neural network model is applied to detect missing teeth. Finally, a rule-based model based on a tooth numbering scheme is proposed to match labels to the identified tooth boxes and correct outcomes that violate intuitive rules. Vo et al. [15] presented a robust method using a DNN for automatically classifying trash that is employed in smart waste sorter machines. First, the VN-trash dataset is collected, which comprises 5904 images belonging to three distinct groups: medical, organic, and inorganic waste from Vietnam. Then, a DNN system for trash classification called DNN-TC, an enhancement of ResNeXt, is developed to optimize prediction performance.
Ahmad et al. [16] introduced a method called "double fusion" that optimally integrates various DL models using score-level and feature-level fusion. The double fusion system guarantees an enhanced contribution of the deep models by integrating the capabilities of early and late fusion systems with a score-level fusion of the classification results attained by the early and late fusion models. Sheng et al. [17] designed a smart waste management method with LoRa transmission and a TensorFlow-based DL method. The presented method transmits the sensor data, and TensorFlow implements real-time object classification and detection. The bin contains a number of chambers for segregating the waste, including plastic, metal, paper, and general waste compartments, which are managed by a servo motor.
This study develops a novel deep learning-based small object detection and classification model for garbage waste management (DLSODC-GWM) technique. The proposed DLSODC-GWM involves the design of an arithmetic optimization algorithm (AOA) with an improved RefineDet (IRD) model for an effectual object detection process where the hyperparameters of the IRD model are optimally chosen by the AOA. In addition, the functional link neural network (FLNN) model is applied for the classification of waste objects into multiple classes. In order to demonstrate the significant performance of the DLSODC-GWM approach, a wide-ranging simulation analysis is carried out on benchmark datasets.

The Proposed DLSODC-GWM Technique
In this study, a new DLSODC-GWM technique has been developed for waste management systems in order to effectually detect and classify small garbage waste objects. The DLSODC-GWM technique involves three distinct subprocesses, namely, IRD object recognition, hyperparameter tuning, and FLNN-based object classification. During the object detection process, the AOA is applied to optimally select the hyperparameter values of the IRD model and thereby improve the detection efficiency. The detailed workings of these three modules are elaborated on in the following.

IRD-Based Object Detection Module
The improved RefineDet method, using VGG16 as the backbone network [18], creates a sequence of anchors with distinct aspect ratios and scales on all the feature maps by utilizing the anchor generation method of the RPN, and obtains a fixed number of object bounding boxes, together with the probability of occurrence of the distinct classes in each bounding box, after two rounds of regression and classification. Eventually, the final regression and classification outcomes are obtained using non-maximal suppression (NMS). The improved RefineDet method comprises the object detection module (ODM), the transfer connection block (TCB), and the anchor refinement module (ARM). The network framework is demonstrated in Figure 1.

ARM Module
It is largely comprised of the VGG16 network and convolutional layers. The ARM implements anchor generation, anchor refinement, negative anchor filtering, and feature extraction. When the negative confidence of an anchor exceeds 0.99, the module rejects it and does not use it for the final detection in the ODM. During the feature extraction process, two convolutional layers, conv6_1 and conv6_2, are added to the VGG16 network. The negative anchor filtering efficiently filters out negative anchor boxes that are already well classified, thereby alleviating the sample imbalance. Next, four further convolutional layers, conv7_1, conv7_2, conv8_1, and conv8_2, are added to capture higher-level semantic information. In addition, the higher-level feature of conv8_2 is merged with the lower-level feature of conv7_2. The combined feature is then transmitted to the low-level features through the TCB, so the low-level feature maps carry high-level semantic information, improving the recognition performance for small objects.
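The negative anchor filtering step can be illustrated as follows (a hypothetical sketch: the confidence values below are invented, and only the 0.99 threshold comes from the text):

```python
import numpy as np

# Hypothetical per-anchor negative (background) confidences produced by the ARM.
neg_conf = np.array([0.999, 0.42, 0.995, 0.10, 0.80])

# Negative anchor filtering: anchors that are almost certainly background
# (negative confidence > 0.99) are discarded before they reach the ODM.
keep_mask = neg_conf <= 0.99
kept_indices = np.flatnonzero(keep_mask)  # anchors forwarded to the ODM
```

Only anchors 1, 3, and 4 survive; the two near-certain background anchors are dropped, reducing the flood of easy negatives the ODM must handle.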

TCB Module
It is used to connect the ODM and the ARM and to transfer the feature information of the ARM to the ODM. Additionally, akin to the FPN framework, adjacent TCBs are interconnected to fuse higher- and lower-level features and increase the semantic information of the lower-level features.
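The TCB fusion can be approximated as follows (a simplified sketch: the actual block uses deconvolution and convolution layers, whereas here a nearest-neighbour upsample and element-wise addition stand in for them):

```python
import numpy as np

def tcb_fuse(low, high):
    """Upsample the coarser, higher-level map to the low-level resolution
    (nearest neighbour, factor 2) and merge by element-wise addition,
    mimicking how the TCB passes ARM features down toward the ODM."""
    up = high.repeat(2, axis=0).repeat(2, axis=1)  # 2x nearest-neighbour upsample
    return low + up

low = np.ones((4, 4))                    # low-level feature map (finer resolution)
high = np.arange(4.0).reshape(2, 2)      # high-level feature map (coarser)
fused = tcb_fuse(low, high)              # low-level map enriched with high-level semantics
```

The fused map keeps the fine spatial resolution of the low-level feature while inheriting the semantic content of the high-level one.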

ODM Module
It largely consists of the output of the TCB and the predictive layers (regression and classification layers, i.e., convolutional layers with a 3 × 3 kernel size). The output of the predictive layers comprises the class of each refined anchor and the coordinate offsets relative to the refined anchor boxes. The refined anchors are used as input for regression and classification, and the final bounding boxes are selected on the basis of NMS.
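A minimal NumPy sketch of the greedy NMS step that selects the final bounding boxes (an illustrative implementation, not the authors' code; the 0.5 IoU threshold is a common default, not a value stated here):

```python
import numpy as np

def nms(boxes, scores, iou_threshold=0.5):
    """Greedy non-maximal suppression over [x1, y1, x2, y2] boxes."""
    order = scores.argsort()[::-1]          # indices sorted by descending score
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(int(i))                 # highest-scoring remaining box wins
        # Intersection of the winner with all remaining boxes
        xx1 = np.maximum(boxes[i, 0], boxes[order[1:], 0])
        yy1 = np.maximum(boxes[i, 1], boxes[order[1:], 1])
        xx2 = np.minimum(boxes[i, 2], boxes[order[1:], 2])
        yy2 = np.minimum(boxes[i, 3], boxes[order[1:], 3])
        inter = np.maximum(0.0, xx2 - xx1) * np.maximum(0.0, yy2 - yy1)
        area_i = (boxes[i, 2] - boxes[i, 0]) * (boxes[i, 3] - boxes[i, 1])
        areas = (boxes[order[1:], 2] - boxes[order[1:], 0]) * \
                (boxes[order[1:], 3] - boxes[order[1:], 1])
        iou = inter / (area_i + areas - inter)
        # Drop boxes that overlap the winner above the threshold
        order = order[1:][iou <= iou_threshold]
    return keep
```

For example, two heavily overlapping boxes collapse to the higher-scoring one, while a distant box is kept untouched.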

Loss Function
The IRD technique utilizes anchors that are densely sampled on distinct-scale feature maps as training samples. Most of these samples are easy negative samples that do not contain an object, and only a few positive instances contain an object. As a result, the imbalance between positive and negative samples is magnified. Numerous easy negative samples contribute little to learning over the entire training procedure and degrade the final detection efficiency of the method. The focal loss function (LF) is a dynamic cross-entropy (CE) LF: the CE loss is redesigned by introducing a weight factor α and a modulation factor (1 − p_t)^γ, where α balances positive and negative samples, and (1 − p_t)^γ adjusts the weight of easy and hard samples so that the method concentrates on hard examples during training, enhancing accuracy. The focal LF is as follows:

FL(p_t) = −α_t (1 − p_t)^γ log(p_t), where p_t = p if y = 1, and p_t = 1 − p otherwise,

where y refers to the ground-truth class label, indicating whether the sample is foreground: the label is 1 for foreground and −1 otherwise. Besides, p ∈ [0, 1] denotes the predicted class probability of a foreground sample with label y = 1, i.e., the forecast probability that a sample contains an object.
The weight factor α weakens the effect of easy-sample loss values on the overall network loss and balances the influence of positive and negative samples on the network.
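The focal loss described above can be sketched as follows (an illustrative NumPy version; the label convention y ∈ {1, −1} follows the text, while α = 0.25 and γ = 2 are commonly used defaults, not values stated here):

```python
import numpy as np

def focal_loss(p, y, alpha=0.25, gamma=2.0):
    """Focal loss for binary anchors: y in {1, -1}, p = P(foreground).
    p_t is the probability assigned to the true class; the (1 - p_t)**gamma
    modulation factor down-weights easy, well-classified samples."""
    p_t = np.where(y == 1, p, 1.0 - p)
    alpha_t = np.where(y == 1, alpha, 1.0 - alpha)
    return -alpha_t * (1.0 - p_t) ** gamma * np.log(p_t)
```

An easy positive (p close to 1) yields a near-zero loss, while a hard positive (p close to 0) keeps a large loss, which is exactly the re-weighting effect the modulation factor is meant to achieve.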

AOA-Based Hyperparameter Tuning Module
As with other metaheuristic (MH) techniques, the AOA contains two search stages [19], namely, exploration and exploitation, which are simulated by the arithmetic operators −, +, ×, and ÷. Initially, the AOA generates a population X of N solutions (agents), each representing a candidate solution to the problem under test. Afterward, the fitness function (FF) of every solution is calculated to detect the best one, X_b. Next, based on the Math Optimizer Accelerated (MOA) value, the AOA executes either the exploration or the exploitation procedure. The MOA is updated as follows:

MOA(t) = Min + t × (Max − Min) / M_t,

where M_t denotes the total number of iterations, and Min and Max signify the minimal and maximal values of the accelerated function, respectively. In particular, multiplication (M) and division (D) are used in the exploration stage of the AOA, as given by the following formula:

x_{i,j}(t + 1) = X_b(j) ÷ (MOP + ε) × ((UB_j − LB_j) × µ + LB_j), if r_2 < 0.5,
x_{i,j}(t + 1) = X_b(j) × MOP × ((UB_j − LB_j) × µ + LB_j), otherwise,

where ε denotes a small value that avoids division by zero, and UB_j and LB_j stand for the upper and lower boundaries of the search region in the j-th dimension; µ = 0.5 is a control parameter. In addition, the Math Optimizer Probability (MOP) is defined as follows:

MOP(t) = 1 − t^{1/α} / M_t^{1/α},

where α = 5 is the dynamic parameter that determines the precision of the exploitation stage throughout the iterations.
Moreover, the addition (A) and subtraction (S) operators are utilized to implement the AOA exploitation stage using the following formula:

x_{i,j}(t + 1) = X_b(j) − MOP × ((UB_j − LB_j) × µ + LB_j), if r_3 < 0.5,
x_{i,j}(t + 1) = X_b(j) + MOP × ((UB_j − LB_j) × µ + LB_j), otherwise,

where r_3 is an arbitrary number generated between zero and one. The solution update is then executed using the AOA operators. In summary, Algorithm 1 defines the important stages of the AOA.
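The AOA stages above can be sketched as follows (an illustrative minimization of a toy sphere function; the population size, iteration count, and Min/Max accelerated-function values are assumptions, not values taken from the paper):

```python
import numpy as np

def aoa(fitness, dim, lb, ub, n_agents=20, max_iter=200,
        moa_min=0.2, moa_max=1.0, alpha=5.0, mu=0.5, eps=1e-12):
    """Minimal Arithmetic Optimization Algorithm (minimization)."""
    rng = np.random.default_rng(0)
    X = rng.uniform(lb, ub, (n_agents, dim))          # initial population
    fit = np.array([fitness(x) for x in X])
    best = X[fit.argmin()].copy()
    best_fit = fit.min()
    for t in range(1, max_iter + 1):
        moa = moa_min + t * (moa_max - moa_min) / max_iter        # MOA(t)
        mop = 1.0 - t ** (1.0 / alpha) / max_iter ** (1.0 / alpha)  # MOP(t)
        for i in range(n_agents):
            for j in range(dim):
                r1, r2, r3 = rng.random(3)
                scale = (ub - lb) * mu + lb
                if r1 > moa:            # exploration: division / multiplication
                    if r2 < 0.5:
                        X[i, j] = best[j] / (mop + eps) * scale
                    else:
                        X[i, j] = best[j] * mop * scale
                else:                   # exploitation: subtraction / addition
                    if r3 < 0.5:
                        X[i, j] = best[j] - mop * scale
                    else:
                        X[i, j] = best[j] + mop * scale
            X[i] = np.clip(X[i], lb, ub)
            f = fitness(X[i])
            if f < best_fit:            # keep the best solution found so far
                best_fit, best = f, X[i].copy()
    return best, best_fit
```

In the paper the fitness would score a detection metric of the IRD model under a candidate hyperparameter setting; here a sphere function stands in so the sketch is self-contained.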

FLNN-Based Object Classification Module
Following the object detection process, the classification module using the FLNN technique is executed to assign distinct class labels. The FLNN is a type of feedforward neural network (FFNN) without a hidden layer [20]. It introduces non-linearity into its input pattern through a functional expansion unit. These expanded terms serve as additional inputs to the network, and their weighted sum is computed at the output layer. The expanded input decreases the computational cost and, at the same time, improves computational efficiency compared to backpropagation (BP)-trained networks. Meanwhile, the inherent invariance property enables the FLNN to pick up only the selected signals that yield optimal system identification. A second-order FLNN structure is composed of three inputs, x_1, x_2, and x_3, together with their higher-order combinations, where W_0 denotes the tunable threshold and σ signifies the non-linear transfer function.
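A minimal FLNN sketch (an illustration under assumptions: a second-order polynomial expansion, a sigmoid transfer function, and simple delta-rule training on the XOR problem, which a linear model without the expansion cannot solve):

```python
import numpy as np

def expand(X):
    """Second-order functional expansion: raw inputs plus all pairwise products."""
    d = X.shape[1]
    cross = [(X[:, i] * X[:, j])[:, None] for i in range(d) for j in range(i, d)]
    return np.hstack([X] + cross)

class FLNN:
    """Functional link neural network: no hidden layer; a single weight vector
    over the expanded pattern, a tunable threshold w0, and a sigmoid output."""
    def __init__(self, n_features, lr=1.0, seed=0):
        rng = np.random.default_rng(seed)
        self.w = rng.normal(0.0, 0.1, n_features)
        self.w0 = 0.0
        self.lr = lr

    def forward(self, Z):
        return 1.0 / (1.0 + np.exp(-(Z @ self.w + self.w0)))

    def fit(self, Z, y, epochs=10000):
        for _ in range(epochs):
            err = y - self.forward(Z)          # delta rule on the expanded inputs
            self.w += self.lr * Z.T @ err / len(y)
            self.w0 += self.lr * err.mean()

# XOR: solvable only because the expansion supplies the x1*x2 cross term
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([0, 1, 1, 0], dtype=float)
Z = expand(X)
net = FLNN(Z.shape[1])
net.fit(Z, y)
pred = (net.forward(Z) > 0.5).astype(int)
```

The expansion shifts the burden of non-linearity from hidden layers to the input representation, which is why training reduces to a single-layer update rule.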

Performance Validation
The proposed DLSODC-GWM technique is simulated using the Python 3.6.5 tool. The proposed model is tested using a PC MSI Z370-A Pro with an i5-8600k processor, GeForce GTX 1050 Ti 4 GB, 16 GB of RAM, a 250 GB SSD, and a 1 TB HDD. For experimental validation, a ten-fold cross-validation process is employed. The parameter setting is given as follows: batch size: 128, learning rate: 0.001, momentum: 0. The heat map analysis of the objects in the dataset is shown in Figure 3. Figure 4 visualizes the sample object detection outcomes of the DLSODC-GWM technique on the applied test images. From the figure, it is obvious that the DLSODC-GWM technique has identified glass, metal, and trash objects with a maximum accuracy of 99%.
The confusion matrix generated by the DLSODC-GWM technique for the classification of waste under 1000 epochs is shown in Figure 5. The figure reports that the DLSODC-GWM technique has identified 372 images as cardboard, 472 as glass, 380 as metal, 570 as paper, 463 as plastic, and 107 as trash. The classifier results of the DLSODC-GWM technique on the classification of waste objects with 1000 epochs are given in Table 1 and Figure 6. The table values point out that the DLSODC-GWM technique has effectually recognized all the class labels. For instance, it categorized images as cardboard with precision, recall, accuracy, and F-score of 95.88%, 94.66%, 98.50%, and 95.26%, respectively. On the other hand, it recognized images as paper with precision, recall, accuracy, and F-score of 96.77%, 97.60%, 98.66%, and 97.19%, respectively. The accuracy outcome analysis of the DLSODC-GWM technique on the test data is portrayed in Figure 7. The results demonstrate that the DLSODC-GWM technique accomplished improved validation accuracy compared to training accuracy. It is also observable that the accuracy values become saturated at an epoch count of 1000.
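The per-class metrics reported above can be derived from a confusion matrix as follows (a generic sketch with a toy 3-class matrix, not the paper's figures):

```python
import numpy as np

def per_class_metrics(cm):
    """Precision, recall, accuracy, and F-score for each class of a
    confusion matrix cm (rows = true class, columns = predicted class)."""
    tp = np.diag(cm).astype(float)
    fp = cm.sum(axis=0) - tp           # predicted as the class but wrong
    fn = cm.sum(axis=1) - tp           # belong to the class but missed
    tn = cm.sum() - tp - fp - fn
    prec = tp / (tp + fp)
    reca = tp / (tp + fn)
    accu = (tp + tn) / cm.sum()
    f1 = 2 * prec * reca / (prec + reca)
    return prec, reca, accu, f1

# Toy 3-class confusion matrix (illustrative values only)
cm = np.array([[50, 2, 3],
               [4, 45, 1],
               [2, 3, 40]])
prec, reca, accu, f1 = per_class_metrics(cm)
```

This is the standard one-vs-rest computation behind per-class precision, recall, accuracy, and F-score tables such as Tables 1 and 2.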
The loss outcome analysis of the DLSODC-GWM technique on the test data is depicted in Figure 8. The figure reveals that the DLSODC-GWM technique attained a reduced validation loss compared to the training loss. It is additionally noticed that the loss values become saturated at an epoch count of 1000. The confusion matrix generated by the DLSODC-GWM approach for the classification of waste under 2000 epochs is shown in Figure 9. The figure describes that the DLSODC-GWM methodology identified 372 images as cardboard, 467 as glass, 373 as metal, 565 as paper, 456 as plastic, and 114 as trash. The classifier outcomes of the DLSODC-GWM algorithm on the classification of waste objects with 2000 epochs are provided in Table 2 and Figure 10. The table values point out that the DLSODC-GWM system effectually recognized all the class labels. For instance, it categorized images as cardboard with precision, recall, accuracy, and F-score of 95.38%, 94.66%, 98.42%, and 95.02%, respectively. In addition, it recognized images as paper with precision, recall, accuracy, and F-score of 95.28%, 96.75%, 98.09%, and 96.01%, respectively.
The accuracy outcome analysis of the DLSODC-GWM technique on the test data is illustrated in Figure 11. The outcomes exhibit that the DLSODC-GWM approach accomplished improved validation accuracy compared to training accuracy, and the accuracy values become saturated at an epoch count of 1000. The loss outcome analysis of the DLSODC-GWM technique on the test data is depicted in Figure 12. The figure shows that the DLSODC-GWM technique attained a lower validation loss than the training loss, with the loss values likewise saturating at an epoch count of 1000. Figure 13 demonstrates the comparative accuracy analysis of the DLSODC-GWM technique with recent methods. The results show that the AlexNet model resulted in lower performance, with an accuracy of 52.50%. In line with this, the VGG16 model obtained a greatly improved accuracy of 73.10%, whereas the ResNet50 model accomplished an even greater accuracy of 74.70%. Though the MLH-CNN technique resulted in a near-optimal accuracy of 92.60%, the presented DLSODC-GWM technique accomplished superior performance with an accuracy of 98.61%.
Table 4 and Figure 14 showcase the comparative precision, recall, and F-score analysis of the DLSODC-GWM method with recent algorithms [22]. The outcomes demonstrate that the AlexNet approach resulted in lower performance, with a precision, recall, and F-score of 42%, 50%, and 44%, respectively. Following this, the VGG16 system achieved a greatly improved precision, recall, and F-score of 69%, 68%, and 68%, respectively, whereas the ResNet50 model accomplished an even higher precision, recall, and F-score of 72%, 72%, and 72%, respectively. However, the MLH-CNN approach resulted in a near-optimal precision, recall, and F-score of 91%, 91%, and 91%, respectively. The presented DLSODC-GWM methodology accomplished superior performance, with a precision, recall, and F-score of 95.23%, 94.29%, and 94.73%, respectively. From the aforementioned results and discussion, it can be stated that the DLSODC-GWM technique has accomplished enhanced waste object classification performance compared to existing techniques.

Conclusions
In this study, a new DLSODC-GWM technique has been developed for waste management systems in order to effectually detect and classify small garbage waste objects. The DLSODC-GWM technique involves three distinct subprocesses, namely, IRD object recognition, hyperparameter tuning, and FLNN-based object classification. During the object detection process, the AOA is applied to optimally select the hyperparameter values of the IRD model and thereby improve the detection efficiency. To demonstrate the significant performance of the DLSODC-GWM technique, a wide-ranging simulation analysis was carried out on benchmark datasets. The extensive comparative analysis highlighted the superior outcomes of the DLSODC-GWM approach over existing approaches. Therefore, the DLSODC-GWM technique has the ability to proficiently identify and classify small objects in waste management systems. In the future, a fusion of DL models can be employed to enhance the detection efficiency of the DLSODC-GWM technique.