Article

Pose Detection and Recurrent Neural Networks for Monitoring Littering Violations

1 Department of Electrical Engineering, Politeknik Negeri Sriwijaya, Palembang 30139, Indonesia
2 Department of Computer Engineering, Faculty of Engineering, Universitas Sriwijaya, Indralaya 30862, Indonesia
3 Palembang City Public Works and Spatial Planning Department, Ilir Timur Dua, Palembang 30114, Indonesia
4 Institute of Innovative Research, Tokyo Institute of Technology, Yokohama 226-8503, Japan
5 Department of Computer Engineering, Faculty of Engineering, Ferdowsi University of Mashhad, Azadi Square, Mashad 917794897, Iran
6 Faculty of Integrated Technologies, Universiti Brunei Darussalam, Tungku Link St., Gadong BE1410, Brunei
7 Faculty of Mechanical Engineering, Opole University of Technology, 76 Proszkowska St., 45-758 Opole, Poland
* Authors to whom correspondence should be addressed.
Eng 2023, 4(4), 2722-2740; https://doi.org/10.3390/eng4040155
Submission received: 25 August 2023 / Revised: 24 October 2023 / Accepted: 27 October 2023 / Published: 30 October 2023
(This article belongs to the Special Issue Artificial Intelligence and Data Science for Engineering Improvements)

Abstract

Infrastructure development requires various considerations to maintain its continuity. Some public facilities cannot survive due to human indifference and irresponsible actions. Unfortunately, the government has to spend a lot of money, effort, and time to repair the damage. One of the destructive behaviors that can have an impact on infrastructure and environmental problems is littering. Therefore, this paper proposes a device as an alternative for catching littering rule violators. The proposed device can be used to monitor littering and provide warnings to help officers responsible for capturing the violators. In this innovation, the data obtained by the camera are sent to a mini-PC. The device will send warning information to a mobile phone when someone litters. Then, a speaker will turn on and issue a sound warning: “Do not litter”. The device uses pose detection and a recurrent neural network (RNN) to recognize a person’s activity. All activities can be monitored from a remote location using IoT technology. In addition, this tool can also monitor environmental conditions and replace city guards in monitoring the area. Thus, the municipality can save money and time.

1. Introduction

The increase in infrastructure development in Indonesia is not commensurate with the level of public awareness to maintain it [1]. In some areas, the government’s newly built buildings are often quickly damaged [2,3]. One example is in Palembang, South Sumatra, Indonesia, where littering is a significant problem in maintaining infrastructure. Littering has become a major problem in Indonesia and various parts of the world. Casually discarding garbage is often perceived as normal behavior because it has been a habit for many years. Undeniably, many people lack awareness about protecting their surrounding environment [4].
Indiscriminate waste disposal or littering occurs on land and in waters such as rivers, beaches, and seas [5]. The careless disposal of waste damages the environment, harms health, causes flooding, and leads to many other adverse effects [6]. Therefore, to overcome those problems, a monitoring device for littering is urgently needed. The proposed device warns, records, and reports to users when it monitors anomalies such as: (i) inappropriate human activities, (ii) environmental conditions, and (iii) water levels that may indicate a flood hazard.
In general, the proposed device not only functions to monitor people who litter but can also indirectly be a way to catch the perpetrators. At the start of the process, detected actions are grouped into two categories: littering or normal. When the system confirms that rubbish has been thrown away carelessly, the system will generate warnings and notifications. This warning is sent to the speaker as an output and to the responsible authority. In this research, this response was followed up by the maintenance team of the Public Works and Spatial Planning Department (or PUPR) of Palembang City.
Furthermore, authorities can access the proposed device data for either monitoring or reporting. They can review recorded data to identify patterns or areas that frequently experience littering problems. Catching violators who throw rubbish carelessly provides many advantages that can offer significant benefits to the city government, as follows:
(1) Reducing cleaning costs: Palembang City PUPR currently has to deploy many workers every day to clean the Sekanak area, so less rubbish means fewer workers are needed;
(2) Reducing maintenance and repair costs associated with environmental conservation and restoration;
(3) Increasing the aesthetic appeal of public spaces, thereby increasing property values in those areas, which in turn increases tax revenues for the city government;
(4) Improving public health: handling litter is very important because littering tends to increase the number of pests;
(5) Changing society’s perspective: the proposed device is part of an effort to prevent littering, alongside public awareness campaigns and law enforcement actions;
(6) Producing valuable data and insights regarding littering behavior patterns, which enables evidence-based decision-making for resource allocation and policy development;
(7) Automating monitoring processes, which minimizes the need for continuous human supervision and can reduce expenses related to law enforcement personnel. Additionally, municipal governments can earn revenue through fines and penalties imposed on violators who litter.
This research was initiated to support Indonesia’s development in the priority sectors of the digital economy, where the world is connected to internet infrastructure. The monitoring device proposed in this study utilizes IoT-based smart village technology [7] and recurrent neural networks (RNNs) [8]. RNNs are used to classify activities carried out by humans. This research also uses image processing technology that can be used to monitor the condition of the water surface so that the device can also provide a notification when there are indications of an impending flood. In addition, the proposed device also has other advantages, as follows:
(1) It can monitor air quality and inform the user when the air quality is poor.
(2) It monitors the temperature, humidity, and water level. IoT technology allows the device to be accessed by many people from different places using different devices.
In the proposed method, an RNN and pose detection were used to recognize the positions of elbows, hips, hands, knees, and feet to form a body pose framework by combining these points and predicting an estimate of the detected pose. The advantage of using an RNN is that it can process data sequences, such as several video frames [9]. The main idea of RNNs is that they can share parameters across multiple parts of the model. Although newer architectures such as LSTM [10] and GRU [11] have been introduced to overcome the limitations of traditional RNNs, RNNs have their own advantages. Littering, as an activity, unfolds over time: someone approaches, performs the action, and then leaves. This sequence is critical for differentiating between littering and other non-hazardous activities. RNNs inherently process sequences, so they are well-suited for this task. Additionally, RNNs are more straightforward and computationally efficient than LSTM and GRU. For real-time applications like the proposed monitoring device, computing speed can be essential.

2. Related Work

Several previous studies have addressed the recognition of human activity, as summarized in Table 1. Activity recognition can be performed indoors, outdoors, or both.
Pose detection is a computer vision technique that predicts the key points and location of a person or object. The combination of poses and orientations is used to find specific key points of the person or object so that they can be combined to identify a person’s pose [26]. The proposed method estimates human poses when disposing of trash as well as other poses, such as walking and running [22]. Thus, the system can distinguish the results of the detected poses. RNNs combined with pose detection in deep learning models are used to recognize elbows, hips, hands, knees, and feet, thereby forming a skeletal pose by combining these points. Furthermore, the detected poses can be estimated and recognized.
The human body has a complex framework comprising many limbs and joints. In studying the framework of the human body, special knowledge is needed to identify the right area. Previous studies have carried out the recognition of human body postures that can be detected by sensors [12], such as cameras and videos, in a dark room [18,28]. In this paper, the detection begins with 2D or 3D pose evaluation, which includes several aspects: (1) Estimation of human poses, such as sports scenes, people facing forward, people interacting with objects; (2) Estimation of poses in group photos; and (3) Estimation of poses of people performing synchronized activities [20]. Other studies have carried out pose recognition for activities such as standing, sitting, jumping, running, talking, picking up the phone, yoga, smoking, walking, fighting, and so on [12,14,15,16,17,18,19,20,21]. However, to obtain the right detection results, special assumptions are needed. For example, in biomechanical research in robot rehabilitation environments [13], researchers must also pay attention to complex and non-deterministic environments.
Therefore, several studies have captured human movement to obtain better results. The classification of body posture using the K-NN method, although more straightforward than facial recognition approaches, yields accurate results [25]. Deep learning [22], multi-scale temporal features, spatio-temporal KCS pose differentiation, and occlusion data augmentation [29] have been used to develop 2D-to-3D human pose estimation [30,31]. Other methods use attention models [32] and multi-scale networks with inference-stage optimization [33], which introduce many parameters requiring manual tuning. The performance of graphical model-based approaches has been surpassed by convolutional neural networks (CNNs) [31,34].
Another method analyzed spatial/temporal convolution graphs with fixed human joint affinities and explained how to design dynamic human joint affinities. That work integrated dynamic spatial/temporal graph convolution (DSG/DTG) to build a dynamic graph network (DG-Net), which can be used to predict 3D human poses in videos and create effective joint co-relationships, reducing the spatial and temporal ambiguity caused by complex pose variations in videos [35]. In addition, we build on the results of several previous studies that used deep learning methods [36,37] and recurrent neural networks (RNNs) [13,19] to identify human poses.
Pose recognition in this research aims to determine how the system can estimate human poses while throwing out trash and distinguish them from the estimated poses of people who are not littering, such as those walking or running. This makes it easier for the system to differentiate the activities to be detected.

3. Materials and Methods

3.1. Hardware

The hardware connection in this study, as shown in Figure 1, consists of three sensors: DHT22, MQ7, and JSN SRT04. There are two main sub-systems on the input side: an Arduino and a personal computer (PC). The Arduino receives input from these sensors, which is processed to measure humidity, air quality, and water level. Data from these sensors are displayed on the LCD and sent to the PC. The image from the webcam is processed by the second processor (mini-PC). Using the RNN method, the device can identify violators of littering regulations and warns them by emitting the sound “Do not litter”. In addition, the output is also connected to the cloud via Wi-Fi. Therefore, the entire output of this process can be monitored using various computer and mobile devices.
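As an illustration of how the PC side might ingest these sensor readings, the following minimal Python sketch assumes the Arduino forwards one comma-separated line per measurement over a serial link; the port name and message format are illustrative assumptions, not taken from the paper.

```python
# Sketch: PC-side ingestion of the Arduino sensor readings.
# Assumption: the Arduino prints "temperature,humidity,water_level,air_quality"
# once per measurement; the port name below is illustrative.
import serial  # pyserial

PORT = "/dev/ttyUSB0"  # hypothetical serial port
BAUD = 9600

def read_environment(ser: serial.Serial) -> dict:
    """Parse one CSV line from the Arduino into a dictionary of readings."""
    line = ser.readline().decode(errors="ignore").strip()
    temp, hum, level, air = (float(v) for v in line.split(","))
    return {"temperature_c": temp, "humidity_pct": hum,
            "water_level_cm": level, "air_quality_adc": air}

if __name__ == "__main__":
    with serial.Serial(PORT, BAUD, timeout=2) as ser:
        print(read_environment(ser))
```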

3.2. Software

This section describes the steps to prepare the software for the proposed device. IoT technology in this research allows data transmission to the cloud or a central server. Data from the IoT devices are stored and processed on a cloud-based platform that can be accessed by anyone, anywhere, with an internet connection. Therefore, IoT allows remote personnel to view the monitored area in real time and respond to incidents as they occur. IoT can also help officers respond more quickly so that recurring offenses, such as vandalism, can be reduced. IoT remote monitoring systems also encourage transparency and accountability in enforcing littering laws. Video evidence captured by cameras can be used to verify violations and support legal action. Remote access also allows administrators to adjust system parameters, such as detection sensitivity, alert thresholds, and monitoring schedules, without being physically present at the monitored location.

3.2.1. Dataset

The dataset in this research was obtained by recording activities in various places, such as parks, riverbanks, roads, rooms, and buildings, and therefore includes both indoor and outdoor object activity. Each activity was recorded for 2 to 6 min and covers two classes: littering and not littering (i.e., simply passing through the area, referred to as “normal” in this study). The dataset obtained in this research was combined with the dataset from previous research [9]. All videos were extracted into multiple images and labeled as “littering” or “normal”.
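The paper does not describe the extraction tooling; the following minimal Python/OpenCV sketch shows one way the recorded clips could be split into labeled frames, assuming the clips are stored in folders named after their class (the paths and sampling step are illustrative).

```python
# Sketch: splitting the recorded clips into labeled frames with OpenCV.
# Folder layout ("videos/littering", "videos/normal") and the sampling step
# are illustrative assumptions.
import cv2
from pathlib import Path

def extract_frames(video_path: Path, out_dir: Path, every_n: int = 5) -> int:
    """Save every n-th frame of a clip as a JPEG and return the number written."""
    cap = cv2.VideoCapture(str(video_path))
    out_dir.mkdir(parents=True, exist_ok=True)
    idx = saved = 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if idx % every_n == 0:
            cv2.imwrite(str(out_dir / f"{video_path.stem}_{idx:05d}.jpg"), frame)
            saved += 1
        idx += 1
    cap.release()
    return saved

for label in ("littering", "normal"):
    for clip in Path("videos", label).glob("*.mp4"):
        extract_frames(clip, Path("frames", label))
```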

3.2.2. Pre-Processing

In this step, the pose landmarks in the input image are detected so that 258 key points are obtained, as shown in Figure 2. Detections are made on images for “littering” and “normal activity”.
The key landmarks obtained in each frame are collected into a 3D array of shape (60, 60, 258). The first dimension is the number of videos used in the training process, the second dimension is the number of frames in one video, and the third dimension is the number of pose landmark values per frame. The pose landmarks consist of 33 key points across the body and 2 × 21 key points on the hands. Each of the 33 body key points has four components (X, Y, and Z coordinates and visibility), while each of the 21 key points on each hand has three components (X, Y, and Z coordinates). Therefore, the total number of values per frame can be calculated as
$$33~\text{key points} \times 4~\text{components} + 2 \times 21~\text{hand key points} \times 3~\text{components} = 258~\text{variables}$$
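The 33 + 2 × 21 landmark layout described above matches MediaPipe Holistic, so the following sketch assumes that library and shows how the 258-value vector could be assembled for one frame; frames with missing detections are zero-filled, and the function name is ours.

```python
# Sketch: flattening one frame's detections into the 258-value keypoint vector.
# Assumption: `results` is the output of MediaPipe Holistic's process() call,
# whose 33 pose and 2x21 hand landmarks match the layout described above.
import numpy as np

def frame_to_keypoints(results) -> np.ndarray:
    """Concatenate pose (33x4) and left/right hand (21x3 each) landmarks: 132 + 63 + 63 = 258."""
    pose = (np.array([[p.x, p.y, p.z, p.visibility]
                      for p in results.pose_landmarks.landmark]).flatten()
            if results.pose_landmarks else np.zeros(33 * 4))
    left = (np.array([[p.x, p.y, p.z]
                      for p in results.left_hand_landmarks.landmark]).flatten()
            if results.left_hand_landmarks else np.zeros(21 * 3))
    right = (np.array([[p.x, p.y, p.z]
                       for p in results.right_hand_landmarks.landmark]).flatten()
             if results.right_hand_landmarks else np.zeros(21 * 3))
    return np.concatenate([pose, left, right])
```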

3.2.3. Proposed Model

Furthermore, the activities are identified from the positions of the obtained landmarks using the forward pass of the RNN, as follows:
$$a_t = b + W h_{t-1} + U x_t, \qquad h_t = \tanh(a_t), \qquad o_t = c + V h_t, \qquad \hat{y}_t = \operatorname{softmax}(o_t)$$
The loss function is
$$L\big(\{x_1, \dots, x_\tau\}, \{y_1, \dots, y_\tau\}\big) = \sum_{t=1}^{\tau} L_t, \qquad L_t = -\log \hat{y}_t^{(y_t)}$$
The total loss for a given input sequence $\{x_1, \dots, x_\tau\}$ paired with a target sequence $\{y_1, \dots, y_\tau\}$ is the sum of the losses over all time steps. The pre-activation at time $t$, $a_t$, is computed as the sum of a bias $b$, the product of a weight matrix $W$ with the hidden state from the previous time step $h_{t-1}$, and the product of another weight matrix $U$ with the input $x_t$. The output $o_t$ is the sum of a bias $c$ and the product of a weight matrix $V$ with the hidden state $h_t$; it is passed through the softmax function to obtain the probability vector $\hat{y}_t$. The loss $L_t$ is the negative log-probability of the true target $y_t$ given the input.
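For clarity, the forward pass and loss above can be written directly in NumPy. This is a didactic sketch of the equations rather than the authors' implementation; weight shapes and initialization are left to the caller.

```python
# Didactic sketch of the vanilla RNN forward pass and loss defined above.
# Shapes: W (H,H), U (H,D), V (C,H), b (H,), c (C,); x_seq holds D-dimensional
# inputs and y_seq the corresponding integer class labels.
import numpy as np

def softmax(o: np.ndarray) -> np.ndarray:
    e = np.exp(o - o.max())
    return e / e.sum()

def rnn_forward(x_seq, y_seq, W, U, V, b, c):
    """Return per-step class probabilities and the summed negative log-likelihood."""
    h = np.zeros(W.shape[0])                 # initial hidden state h_0
    total_loss, y_hats = 0.0, []
    for x_t, y_t in zip(x_seq, y_seq):
        a_t = b + W @ h + U @ x_t            # pre-activation a_t
        h = np.tanh(a_t)                     # hidden state h_t
        o_t = c + V @ h                      # output o_t
        y_hat = softmax(o_t)                 # probability vector y_hat_t
        total_loss += -np.log(y_hat[y_t])    # L_t = -log p(y_t)
        y_hats.append(y_hat)
    return y_hats, total_loss
```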
The RNN solves tasks related to time series and maps input sequences to output sequences of equal length. RNNs can also handle inputs whose values change dynamically, such as time, temperature, and other varying quantities. Figure 3 shows the architecture of the proposed method.
Landmarks are detected from the sequence of images obtained from the video input. The dataset is divided into 80% for training and 20% for validation. The RNN model processes sequences of 50 time steps through four recurrent layers with 64, 128, 256, and 256 units, whose return-sequence settings are true, true, true, and false, respectively. In the next stage, the output of the RNN is processed by a feed-forward network with three hidden layers of 258, 512, and 1024 neurons, respectively. The process is performed on the training and test data so that they can be compared, including the optimizer settings and the losses on the resulting matrix. Finally, the identified activity is classified as littering or not.
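The paper does not name a deep learning framework; the Keras sketch below is one possible realization of the layer stack just described (four simple RNN layers of 64, 128, 256, and 256 units, followed by dense layers of 258, 512, and 1024 neurons and a two-class softmax). The optimizer and loss choices are assumptions.

```python
# Sketch: one possible realization of the described architecture in Keras.
# Layer sizes follow the text; optimizer and loss are assumptions.
from tensorflow.keras import layers, models

def build_model(time_steps: int = 50, features: int = 258, classes: int = 2):
    model = models.Sequential([
        layers.Input(shape=(time_steps, features)),
        layers.SimpleRNN(64, return_sequences=True),
        layers.SimpleRNN(128, return_sequences=True),
        layers.SimpleRNN(256, return_sequences=True),
        layers.SimpleRNN(256, return_sequences=False),
        layers.Dense(258, activation="relu"),
        layers.Dense(512, activation="relu"),
        layers.Dense(1024, activation="relu"),
        layers.Dense(classes, activation="softmax"),   # littering vs. normal
    ])
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model
```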
The RNN helps the system to distinguish object activity, whether it is just passing by or throwing away garbage by:
  • Reading the provided sequential data (images) and processing them step by step.
  • Recognizing each image’s relationship sequentially and capturing the image’s dependencies and temporal patterns.
  • Maintaining information about previously observed actions and using it to predict future movements.
  • Analyzing the context in which littering occurs indiscriminately. This can consider factors such as the person’s location, the presence of trash cans, and other environmental cues. This will help the system decide whether a series of actions will be categorized as littering.
  • Operating in real-time, continuously processing incoming data and making predictions as new information becomes available.
  • Learning and adapting the system’s recognition capabilities over time by adjusting to changing patterns and behavior.
  • Triggering an alert or taking action when an anomaly is detected.

3.3. Web Integration

The infrastructure model used in this research is shown in Figure 4. In this research, Laravel is used to provide strong and secure data transmission between the system devices and the central server; it is well suited to building RESTful web APIs. The system is secured using HTTPS to ensure that the information is protected and accessible only to authorized users. If the littering device detects someone throwing garbage, the image is sent to the server and a record is added to the database; otherwise, the camera captures the activity as normal. Figure 4 shows that the information obtained from PC1 and PC2 is sent to the cloud using a secure connection established by the IoT system and the API.
IoT technology facilitates data transfer to cloud-based platforms, where data are stored and processed. Anyone with an internet connection can access these data from anywhere. This capability allows remote personnel to observe monitored areas in real time and take action on incidents as they occur.
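As an illustration, the following Python sketch shows how the mini-PC might report a littering event to the Laravel REST API over HTTPS; the endpoint URL, field names, and authentication token are hypothetical.

```python
# Sketch: reporting a littering event to the REST API over HTTPS.
# The endpoint URL, payload fields, and bearer token are hypothetical.
import requests
from datetime import datetime, timezone

API_URL = "https://example.org/api/littering-events"   # hypothetical endpoint
API_TOKEN = "replace-with-issued-token"                 # hypothetical token

def report_event(image_path: str, location: str) -> int:
    """POST the captured image plus metadata; return the HTTP status code."""
    payload = {"location": location,
               "detected_at": datetime.now(timezone.utc).isoformat(),
               "activity": "littering"}
    with open(image_path, "rb") as img:
        resp = requests.post(API_URL,
                             headers={"Authorization": f"Bearer {API_TOKEN}"},
                             data=payload,
                             files={"image": img},
                             timeout=10)
    resp.raise_for_status()
    return resp.status_code
```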

3.4. Flowchart

The proposed research flowchart is shown in Figure 5. First, the camera captures a video for 2 s. Then, landmark poses are detected in each input frame to produce the required sequence data, which are processed with the trained model. If littering is detected, the image is saved, and the information is sent via the REST API to the database server.
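Putting the pieces together, the detection loop in Figure 5 could look roughly like the sketch below, which reuses frame_to_keypoints and report_event from the earlier sketches (the model argument would be an instance returned by build_model) and assumes that about 50 frames correspond to the 2 s capture window; the camera index, label order, and file names are illustrative.

```python
# Sketch of the detection loop in Figure 5, reusing the helpers sketched earlier
# (frame_to_keypoints, report_event). Buffer length, label order, camera index,
# and file names are illustrative assumptions.
import cv2
import numpy as np
import mediapipe as mp

LABELS = ("normal", "littering")

def run_monitor(model, seq_len: int = 50, camera_index: int = 0):
    cap = cv2.VideoCapture(camera_index)
    with mp.solutions.holistic.Holistic() as holistic:
        buffer = []
        while cap.isOpened():
            ok, frame = cap.read()
            if not ok:
                break
            results = holistic.process(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
            buffer.append(frame_to_keypoints(results))
            if len(buffer) == seq_len:                          # roughly 2 s of frames
                probs = model.predict(np.expand_dims(buffer, 0), verbose=0)[0]
                if LABELS[int(np.argmax(probs))] == "littering":
                    cv2.imwrite("evidence.jpg", frame)          # save the evidence image
                    report_event("evidence.jpg", location="monitored area")
                buffer = []
    cap.release()
```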

4. Results and Discussion

The proposed device is tested under several conditions to determine its capability and performance in classifying activities.

4.1. Standing Still Object

The first experiments were carried out on objects standing still, where people stood still without moving to another place. This experiment was evaluated using multiple capture positions: FRONT, LEFT, RIGHT, and BACK. The experimental results are summarized in Table 2. From 24 experiments, we found that the device can determine activity quite well, even though the position between the camera and the object differs.
In the FRONT capture position, the camera is placed parallel to the object, with the object facing directly into the camera. The LEFT, RIGHT, and BACK positions mean the camera is placed to the left of, to the right of, and behind the object, respectively. The results of this experiment are shown in Figure 6, Figure 7, Figure 8 and Figure 9. Decision information for the activity appears at the top left of each figure in the experimental results in this paper. The device displays “Normal” if the activity is considered normal, i.e., the object simply passes by. However, if the activity is categorized as littering, the device displays the information “Membuang Sampah” (“Throwing out trash”). In addition, the warning “Do not litter” is sounded through the speaker, and the information is sent to officers so that they can enforce the law against violators of the littering rules.
In Figure 6, the proposed device can correctly differentiate activities. In the FRONT position, Figure 6a,b was obtained in the morning, Figure 6c,d in the afternoon, and Figure 6e,f at night. In this experiment, accurate classification results were obtained at different times. The device can detect pose points and determine the activity correctly. The proposed device recognizes Figure 6a,c,e as “Normal” activity. Even though the objects contain garbage in Figure 6a,c, the device concludes that these objects are “Normal” activities. This is due to the absence of hand movements.
The LEFT, RIGHT, and BACK experiment results are shown in Figure 7, Figure 8 and Figure 9. In Figure 7, the device can determine the activity correctly. As in the FRONT experiment in Figure 6, even though the objects are holding trash (Figure 7a,c), the device classifies the activity as “Normal” because no throwing movement is made. We conclude that hand movement is the key indicator of throwing away trash. The device can determine the littering activity in Figure 8b,d,f because such movement was detected.
In Figure 8, the position is RIGHT, and the result is as accurate as in the LEFT. The device can determine the act of littering when there is an object’s hand movement, as shown in Figure 8b,d,f. However, the object framework is not fully readable by the device.
The device can also determine activity well for the BACK position (Figure 9). In Figure 9a, the device reports the activity as “Normal” even though the object moves its hand, because the object is not holding garbage. This position is challenging, however, due to blind spots where the camera cannot capture the complete skeleton pose, as shown in Figure 9b,d. Nevertheless, the activities are still classified correctly; the detection remains accurate even though the littering activity is recorded from the BACK, as shown in Figure 9b,d,f.

4.2. Moving Object

The second experiment was carried out on moving objects where people move from one place to another. The results of this experiment are summarized in Table 3. The sampling activities in this experiment are shown in Figure 10.
Figure 10 shows some randomly sampled data. From these experiments, we found that the device can classify the activity well. In Figure 11a, even though there is hand movement, the device recognizes it as a “Normal” activity. In Figure 11c,e,g, the device likewise classifies the activities as normal.

4.3. Sitting Object

The third experiment was conducted on sitting objects, where people sit still in a chair. The experimental data are summarized in Table 4. In Figure 11a,g, there is hand movement, but it can still be classified as “Normal” activity because the object is not holding anything in its hands.
In this study, the sensor is also tested in a real environment. Data are presented in Table 5. All sensors work correctly and offer valid data. All data can be monitored using the user interface on PCs, laptops, and mobile phones, as shown in Figure 12.
Figure 12 shows that users can monitor several places, i.e., parks and rivers. The location settings can be changed depending on the device implementation. The bottom box of the interface provides information to find out whether someone is littering. Users can find out environmental information by looking at the top interface box. Data recorded on the interface include temperature, humidity, air quality, and water level.
This section presents a system implementation that underwent testing for several weeks. To accommodate system limitations, testing was carried out at different times of day, i.e., morning, afternoon, and evening, and with different positions and angles between the camera and the humans. We monitored human activity at rivers, in parks, inside homes, and outside the home. The system detected 432 human activities during the observation period, identifying 237 of them as littering incidents, while the rest were classified as normal. This was validated against information on the number of people who littered. Thus, the system succeeded in distinguishing littering activities, which shows that our method is reliable in various situations, even when used in different environments or places.
The monitoring system using an RNN, IoT technology, and integrated sensors for real-time detection and notification that has been proposed in this paper has potential applications in a wide spectrum of domains beyond just litter detection, as follows:
(1) Traffic management and smart cities: identifying real-time traffic violations such as illegal parking, running traffic signals, or speeding.
(2) Traffic flow analysis: monitoring traffic flow and predicting congestion or accidents.
(3) Security and surveillance: detecting unauthorized entry into restricted zones and potential threats in crowded places or critical infrastructure.
(4) Environmental monitoring: identifying illegal logging or forest clearing, detecting illegal hunting activities, or ensuring the safety of endangered species in their habitats.
(5) Health: using sensors to monitor patient movement in care facilities to detect falls or other anomalies.
(6) Agriculture and livestock: using sensors to detect disease and pest activity or monitor soil health, and observing livestock movement and health in real time.
(7) Retail: monitoring customer movements in retail stores to gain insight into shopping patterns and preferences and to detect potential shoplifting activity in real time.
(8) Facility usage analysis: monitoring public facilities such as parks, fitness centers, or libraries to collect data about peak times, user behavior, or facility condition.
(9) Early warning systems: using integrated sensors to detect early signs of natural disasters such as earthquakes, tsunamis, or volcanic eruptions, and monitoring affected areas to assess damage, track relief efforts, or detect secondary hazards.
(10) Audience engagement analysis: observing audience behavior during a performance, concert, or exhibition to gather insights about engagement and preferences.

5. Conclusions

This paper has discussed pose detection for monitoring littering activities. Detection is carried out to distinguish between normal activities and littering. The detector is equipped with several sensors useful for environmental surveillance, and the proposed monitoring can be accessed from several devices, such as PCs and mobile phones. The experimental results show that detection and recognition can be carried out accurately under various camera angles and lighting conditions, and detection also performs well in different places. However, there are still problems with detection when objects move too fast or are too close to the camera. The integrated monitoring system with real-time notifications proposed here can also be used for other city surveillance applications. For future research, methods that are more robust to varying environmental conditions and object activities can be developed. Furthermore, comparative analysis with other well-known deep learning models can also be considered.

6. Patents

This research project was granted copyright registrations (Surat Pencatatan Hak Cipta) EC00202145092, on 7 September 2021, and EC00202264118 by the Ministry of Education, Culture, Research, and Technology of the Republic of Indonesia. This project is also in the process of being registered as a patent.

Author Contributions

Conceptualization, N.L.H., A. and A.S.H.; methodology, N.L.H., A.S.H. and R.P.; software, O.F. and N.L.H.; validation, W.C., M.S. and S.A.H.S.; formal analysis, M.S. and R.P.; investigation, N.L.H. and A.B.; resources, A.S.H.; data curation, N.L.H. and A.S.H.; writing—original draft preparation, N.L.H. and O.F.; writing—review and editing, S.A.H.S., W.R. and W.C.; visualization, N.L.H. and O.F.; supervision, W.R. and W.C.; project administration, O.F. and A.B.; funding acquisition, N.L.H. and W.R. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by Kementerian Pendidikan, Kebudayaan, Riset, dan Teknologi Indonesia, grant number 081/SPK/D4/PPK.01.APTV/VI/2022. The second corresponding author acknowledges the Polish National Agency for Academic Exchange (NAWA), grant No. BPN/ULM/2022/1/00139/U/00001, for financial support.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Not applicable.

Acknowledgments

The authors would like to thank the Kementerian Pendidikan, Kebudayaan, Riset, dan Teknologi Indonesia and Politeknik Negeri Sriwijaya for their funding and support. The authors would also like to thank their colleagues in the Artificial Intelligence Laboratory of Electrical Engineering at Politeknik Negeri Sriwijaya. Finally, the authors thank the Intelligence Laboratory of Sriwijaya University and the Cyborg IT Center.

Conflicts of Interest

The authors declare that this research has no conflict of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript; or in the decision to publish the results.

References

  1. Marfuah, D.; Noviyanti, R.D.; Kusudaryati, D.P.D.; Rahmala, G.U. Health education in improving clean and healthy life behavior (PHBS) at community in the Jebol Ngrombo village Baki Sukoharjo. J. Pengabdi. Dan Pemberdaya. Masy. Indones. 2022, 2, 309–320. [Google Scholar]
  2. Kustiari, T.; Suryadi, U.; Nadipah, I. Counseling on processing of waste into solid organic fertilizer for GAPOKTAN farmers rukun tani Segobang village, Licin district, Banyuwangi regency, East Java. J. Pengabdi. Dan Pemberdaya. Masy. Indones. 2022, 2, 165–174. [Google Scholar]
  3. Permatasari, A.; Triyono, T.; Walinegoro, B.G. Training on processing household waste into organic liquid fertilizer for PKK Cadres in Baturetno village. J. Pengabdi. Dan Pemberdaya. Masy. Indones. 2022, 2, 134–140. [Google Scholar]
  4. Herdiansyah, H.; Brotosusilo, A.; Negoro, H.A.; Sari, R.; Zakianis, Z. Parental education and good child habits to encourage sustainable littering behavior. Sustainability 2021, 13, 8645. [Google Scholar] [CrossRef]
  5. Siddiqua, A.; Hahladakis, J.N.; Al-Attiya, W.A.K.A. An overview of the environmental pollution and health effects associated with waste landfilling and open dumping. Environ. Sci. Pollut. Res. 2022, 29, 58514–58536. [Google Scholar] [CrossRef] [PubMed]
  6. Abubakar, I.R.; Maniruzzaman, K.M.; Dano, U.L.; AlShihri, F.S.; AlShammari, M.S.; Ahmed, S.M.S.; Al-Gehlani, W.A.G.; Alrawaf, T.I. Environmental sustainability impacts of solid waste management practices in the global south. Int. J. Environ. Res. Public Health 2021, 19, 12717. [Google Scholar] [CrossRef] [PubMed]
  7. Efendi, Y.; Imardi, S.; Muzawi, R.; Syaifullah, M. Application of RFID internet of things for school empowerment towards smart school. J. Pengabdi. Dan Pemberdaya. Masy. Indones. 2021, 1, 67–77. [Google Scholar] [CrossRef]
  8. Qasim, A.B.; Pettirsch, A. Recurrent neural networks for video object detection. arXiv 2020, arXiv:2010.15740. [Google Scholar]
  9. Husni, N.L.; Sari, P.A.R.; Handayani, A.S.; Dewi, T.; Seno, S.A.H.; Caesarendra, W.; Glowacz, A.; Oprzędkiewicz, K.; Sułowicz, M. Real-time littering activity monitoring based on image classification method. Smart Cities 2021, 4, 1496–1518. [Google Scholar] [CrossRef]
  10. Xia, K.; Huang, J.; Wang, H. LSTM-CNN architecture for human activity recognition. IEEE Access 2020, 8, 56855–56866. [Google Scholar] [CrossRef]
  11. Mohsen, S. Recognition of human activity using GRU deep learning algorithm. Multimed. Tools Appl. 2023. [Google Scholar] [CrossRef]
  12. Hernández, Ó.G.; Morell, V.; Ramon, J.L.; Jara, C.A. Human pose detection for robotic-assisted and rehabilitation environments. Appl. Sci. 2021, 11, 4183. [Google Scholar] [CrossRef]
  13. Noori, F.M.; Wallace, B.; Uddin, M.Z.; Torresen, J. A robust human activity recognition approach using OpenPose, motion features, and deep recurrent neural network. LNCS 2019, 11482, 299–310. [Google Scholar]
  14. Yu, T.; Chen, J.; Yan, N.; Liu, X. A multi-layer parallel LSTM network for human activity recognition with smartphone sensors. In Proceedings of the 10th International Conference on Wireless Communications and Signal Processing (WCSP), Hangzhou, China, 18–20 October 2018; pp. 1–6. [Google Scholar]
  15. Luvizon, D.C.; Picard, D.; Tabia, H. 2D/3D pose estimation and action recognition using multitask deep learning. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, Salt Lake City, UT, USA, 18–22 June 2018; pp. 5137–5146. [Google Scholar]
  16. Debnath, B.; Orbrien, M.; Yamaguchi, M.; Behera, A. Adapting MobileNets for mobile based upper body pose estimation. In Proceedings of the AVSS 2018—2018 15th IEEE International Conference on Advanced Video and Signal Based Surveillance (AVSS), Auckland, New Zealand, 27–30 November 2018; pp. 1–6. [Google Scholar]
  17. Alessandrini, M.; Biagetti, G.; Crippa, P.; Falaschetti, L.; Turchetti, C. Recurrent Neural Network for Human Activity Recognition in Embedded Systems Using PPG and Accelerometer Data. Electronics 2021, 10, 1715. [Google Scholar] [CrossRef]
  18. Park, S.U.; Park, J.H.; Al-Masni, M.A.; Al-Antari, M.A.; Uddin, M.Z.; Kim, T.S. A depth camera-based human activity recognition via deep learning recurrent neural network for health and social care services. Procedia Comput. Sci. 2016, 100, 78–84. [Google Scholar] [CrossRef]
  19. Singh, D.; Merdivan, E.; Psychoula, I.; Kropf, J.; Hanke, S.; Geist, M.; Holzinger, A. Human Activity recognition using recurrent neural networks. arXiv 2018, arXiv:1804.07144. [Google Scholar]
  20. Perko, R.; Fassold, H.; Almer, A.; Wenighofer, R.; Hofer, P. Human Tracking and Pose Estimatin for Subsurface Operations. Available online: https://pure.unileoben.ac.at/en/publications/human-tracking-and-pose-estimatin-for-subsurface-operations (accessed on 26 October 2023).
  21. Zhang, Y.; Wang, C.; Wang, X.; Liu, W.; Zeng, W. VoxelTrack: Multi-person 3D human pose estimation and tracking in the wild. arXiv 2021, arXiv:2108.02452. [Google Scholar] [CrossRef]
  22. Megawan, S.; Lestari, W.S. Deteksi spoofing wajah menggunakan Faster R-CNN dengan arsitektur Resnet50 pada video. J. Nas. Tek. Elektro dan Teknol. Inf. 2020, 9, 261–267. [Google Scholar]
  23. Rikatsih, N.; Supianto, A.A. Classification of posture reconstruction with univariate time series data type. In Proceedings of the 2018 International Conference on Sustainable Information Engineering and Technology (SIET), Malang, Indonesia, 10–12 November 2018; pp. 322–325. [Google Scholar]
  24. Andriluka, M.; Pishchulin, L.; Gehler, P.; Schiele, B. 2D human pose estimation: New benchmark and state of the art analysis. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Columbus, OH, USA, 23–28 June 2014; pp. 3686–3693. [Google Scholar]
  25. Fassold, H.; Gutjahr, K.; Weber, A.; Perko, R. A real-time algorithm for human action recognition in RGB and thermal video. arXiv 2023, arXiv:2304.01567v1. [Google Scholar]
  26. Cheng, Y.; Yang, B.; Wang, B.; Tan, R.T. 3D human pose estimation using spatio-temporal networks with explicit occlusion training. In Proceedings of the AAAI 2020—34th AAAI Conference on Artificial Intelligence, New York, NY, USA, 7–12 February 2020; pp. 10631–10638. [Google Scholar]
  27. Zhang, J.; Wang, Y.; Zhou, Z.; Luan, T.; Wang, Z.; Qiao, Y. Learning dynamical human-joint affinity for 3D pose estimation in videos. IEEE Trans. Image Process. 2021, 30, 7914–7925. [Google Scholar] [CrossRef]
  28. Uddin, M.Z.; Torresen, J. A deep learning-based human activity recognition in darkness. In Proceedings of the 2018 Colour and Visual Computing Symposium, Gjovik, Norway, 19–20 September 2018; pp. 1–5. [Google Scholar]
  29. Wang, J.; Xu, E.; Xue, K.; Kidzinski, L. 3D pose detection in videos: Focusing on occlusion. arXiv 2020, arXiv:2006.13517. [Google Scholar]
  30. Steven, G.; Purbowo, A.N. Penerapan 3D human pose estimation indoor area untuk motion capture dengan menggunakan YOLOv4-Tiny, EfficientNet simple baseline, dan VideoPose3D. J. Infra 2022, 10, 1–7. [Google Scholar]
  31. Liu, R.; Shen, J.; Wang, H.; Chen, C.; Cheung, S.-C.; Asari, V.K. Enhanced 3D human pose estimation from videos by using attention-based neural network with dilated convolutions. Int. J. Comput. Vis. 2021, 129, 1596–1615. [Google Scholar] [CrossRef]
  32. Llopart, A. LiftFormer: 3D human pose estimation using attention models. arXiv 2020, arXiv:2009.00348. [Google Scholar]
  33. Yu, C.; Wang, B.; Yang, B.; Tan, R.T. Multi-scale networks for 3D human pose estimation with inference stage optimization. arXiv 2020, arXiv:2010.06844. [Google Scholar]
  34. Krizhevsky, B.A.; Sutskever, I.; Hinton, G.E. ImageNet classification with deep convolutional neural networks. Commun. ACM 2012, 60, 84–90. [Google Scholar] [CrossRef]
  35. Wu, D.; Zheng, S.-J.; Zhang, X.-P.; Yuan, C.-A.; Cheng, F.; Zhao, Y.; Lin, Y.-J.; Zhao, Z.-Q.; Jiang, Y.-J.; Huang, D.-S. Deep learning-based methods for person re-identification: A comprehensive review. Neurocomputing 2019, 337, 354–371. [Google Scholar] [CrossRef]
  36. Chen, Y.; Tian, Y.; He, M. Monocular human pose estimation: A survey of deep learning-based methods. Comput. Vis. Image Underst. 2020, 192, 102897. [Google Scholar] [CrossRef]
  37. Wang, Z.; Chen, R.; Liu, M.; Dong, G.; Basu, A. SPGNet: Spatial projection guided 3D human pose estimation in low dimensional space. Smart Multimed. LNCS 2022, 13497, 1–15. [Google Scholar]
Figure 1. Block diagram of an activity monitoring system for littering.
Figure 2. Landmark detection for pose recognition.
Figure 3. Architecture of the proposed method.
Figure 4. The infrastructure models.
Figure 5. Flowchart of the proposed method.
Figure 6. FRONT captured position, (a,b) morning, (c,d) afternoon, (e,f) night.
Figure 7. LEFT captured position, (a,b) morning, (c,d) afternoon, (e,f) night.
Figure 8. RIGHT captured position, (a,b) morning, (c,d) afternoon, (e,f) night.
Figure 9. BACK captured position, (a,b) morning, (c,d) afternoon, (e,f) night.
Figure 10. Moving objects, (a,b) FRONT, (c,d) LEFT, (e,f) RIGHT, (g,h) BACK.
Figure 11. Sitting objects, (a,b) FRONT, (c,d) LEFT, (e,f) RIGHT, (g,h) BACK.
Figure 12. User interface.
Table 1. Human pose detection research.
No. | Area | Method/Technique | Datasets | Majority | Ref.
1 | Indoor | CNN-based | Images of humans. | For robotic-assisted and rehabilitation environments. | [12]
2 | Indoor | OpenPose, RNNs, LSTM | Multimodal human action dataset, such as standing up, sitting down, jumping, bending, waving, clapping, and throwing. | Using motion features. | [13,14]
3 | Indoor | CNNs | High-level pose representation, such as drinking water and making a phone call. | Using multitask deep learning. | [15,16]
4 | Indoor | RNN using PPG | Three different activities (resting, squat, and stepper). | Embedded systems using PPG and accelerometer data. | [17]
5 | Indoor | RNN | Twelve activities, such as left arm, push right, goggles, and so on. | For health and social care services. | [18,19]
6 | Indoor | YoloV3 to Scaled-YoloV4, TCN | COCO, MPII, and HumanEva-1, such as smoking. | For subsurface operations. | [20,21]
7 | Outdoor | CNN, LSTM, Faster R-CNN | Walking, walking upstairs, laying, and yoga. | Lightweight deep learning model using smartphones. | [22]
8 | Outdoor | LSTM, RNN | Walking, jogging, standing, walking class, jumping, and running. | Implemented in darkness. | [23,24]
9 | Outdoor | K-NN | Walking, running, swimming, and jumping. | Univariate time series data. | [25]
10 | Indoor and outdoor | GCN | MPII and HumanEva-1. | Using learning dynamics. | [26]
11 | Indoor and outdoor | Mask-RCNN | Walking and boxing. | Using attention models. | [27]
Table 2. Result for standing still objects.
No. | Captured Position | Time | Activity | Device Detection | Notification | Note
1 | FRONT | Morning | Normal | Normal | Off | Success
2 | FRONT | Morning | Littering | Littering | On | Success
3 | FRONT | Afternoon | Normal | Normal | Off | Success
4 | FRONT | Afternoon | Littering | Littering | On | Success
5 | FRONT | Night | Normal | Normal | Off | Success
6 | FRONT | Night | Littering | Littering | On | Success
7 | LEFT | Morning | Normal | Normal | Off | Success
8 | LEFT | Morning | Littering | Littering | On | Success
9 | LEFT | Afternoon | Normal | Normal | Off | Success
10 | LEFT | Afternoon | Littering | Littering | On | Success
11 | LEFT | Night | Normal | Normal | Off | Success
12 | LEFT | Night | Littering | Littering | On | Success
13 | RIGHT | Morning | Normal | Normal | Off | Success
14 | RIGHT | Morning | Littering | Littering | On | Success
15 | RIGHT | Afternoon | Normal | Normal | Off | Success
16 | RIGHT | Afternoon | Littering | Littering | On | Success
17 | RIGHT | Night | Normal | Normal | Off | Success
18 | RIGHT | Night | Littering | Littering | On | Success
19 | BACK | Morning | Normal | Normal | Off | Success
20 | BACK | Morning | Littering | Littering | On | Success
21 | BACK | Afternoon | Normal | Normal | Off | Success
22 | BACK | Afternoon | Littering | Littering | On | Success
23 | BACK | Night | Normal | Normal | Off | Success
24 | BACK | Night | Littering | Littering | On | Success
Table 3. Result for moving objects.
No. | Captured Position | Time | Activity | Device Detection | Notification | Note
1 | FRONT | Morning | Normal | Normal | Off | Success
2 | FRONT | Morning | Littering | Littering | On | Success
3 | FRONT | Afternoon | Normal | Normal | Off | Success
4 | FRONT | Afternoon | Littering | Littering | On | Success
5 | FRONT | Night | Normal | Normal | Off | Success
6 | FRONT | Night | Littering | Littering | On | Success
7 | LEFT | Morning | Normal | Normal | Off | Success
8 | LEFT | Morning | Littering | Littering | On | Success
9 | LEFT | Afternoon | Normal | Normal | Off | Success
10 | LEFT | Afternoon | Littering | Littering | On | Success
11 | LEFT | Night | Normal | Normal | Off | Success
12 | LEFT | Night | Littering | Littering | On | Success
13 | RIGHT | Morning | Normal | Normal | Off | Success
14 | RIGHT | Morning | Littering | Littering | On | Success
15 | RIGHT | Afternoon | Normal | Normal | Off | Success
16 | RIGHT | Afternoon | Littering | Littering | On | Success
17 | RIGHT | Night | Normal | Normal | Off | Success
18 | RIGHT | Night | Littering | Littering | On | Success
19 | BACK | Morning | Normal | Normal | Off | Success
20 | BACK | Morning | Littering | Littering | On | Success
21 | BACK | Afternoon | Normal | Normal | Off | Success
22 | BACK | Afternoon | Littering | Littering | On | Success
23 | BACK | Night | Normal | Normal | Off | Success
24 | BACK | Night | Littering | Littering | On | Success
Table 4. Result for sitting objects.
No. | Captured Position | Time | Activity | Device Detection | Notification | Note
1 | FRONT | Morning | Normal | Normal | Off | Success
2 | FRONT | Morning | Littering | Littering | On | Success
3 | FRONT | Afternoon | Normal | Normal | Off | Success
4 | FRONT | Afternoon | Littering | Littering | On | Success
5 | FRONT | Night | Normal | Normal | Off | Success
6 | FRONT | Night | Littering | Littering | On | Success
7 | LEFT | Morning | Normal | Normal | Off | Success
8 | LEFT | Morning | Littering | Littering | On | Success
9 | LEFT | Afternoon | Normal | Normal | Off | Success
10 | LEFT | Afternoon | Littering | Littering | On | Success
11 | LEFT | Night | Normal | Normal | Off | Success
12 | LEFT | Night | Littering | Littering | On | Success
13 | RIGHT | Morning | Normal | Normal | Off | Success
14 | RIGHT | Morning | Littering | Littering | On | Success
15 | RIGHT | Afternoon | Normal | Normal | Off | Success
16 | RIGHT | Afternoon | Littering | Littering | On | Success
17 | RIGHT | Night | Normal | Normal | Off | Success
18 | RIGHT | Night | Littering | Littering | On | Success
19 | BACK | Morning | Normal | Normal | Off | Success
20 | BACK | Morning | Littering | Littering | On | Success
21 | BACK | Afternoon | Normal | Normal | Off | Success
22 | BACK | Afternoon | Littering | Littering | On | Success
23 | BACK | Night | Normal | Normal | Off | Success
24 | BACK | Night | Littering | Littering | On | Success
Table 5. Environmental data.
No. | Temperature (°C) | Humidity (%) | Water Level (cm) | Air Quality (ADC)
1 | 29 | 72 | 107 | 156
2 | 36 | 55 | 116 | 146
3 | 35 | 60 | 105 | 138
4 | 27 | 72 | 99 | 127
5 | 37 | 55 | 102 | 137
6 | 35 | 60 | 100 | 172
7 | 37 | 57 | 107 | 122
8 | 34 | 64 | 98 | 147
9 | 32 | 68 | 101 | 166
10 | 31 | 75 | 120 | 128
11 | 30 | 78 | 161 | 145
12 | 32 | 69 | 191 | 162
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

