A Reinforcement Learning Based Dirt-Exploration for Cleaning-Auditing Robot
Abstract
1. Introduction
2. Related Works
- Model the autonomous auditing task as a Markov decision process (MDP); a minimal environment sketch follows this list.
- Train the model on representations of indoor environments with manually assigned dirt distributions.
- Evaluate the learned policy in multiple simulation environments.
- Evaluate the performance of the proposed strategy using the in-house developed BELUGA cleaning-audit robot.
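To make the MDP formulation concrete, the snippet below is a minimal Gym-style environment sketch for the dirt-exploration task. The grid size, the observation encoding (dirt map plus a robot-position layer), and the reward values are illustrative assumptions for training on simulated dirt distributions, not the exact formulation used in the paper.

```python
# Hypothetical Gym-style formulation of the dirt-exploration MDP.
# Grid size, reward values, and dirt-map encoding are illustrative assumptions.
import numpy as np
import gym
from gym import spaces


class DirtAuditEnv(gym.Env):
    """Agent moves over a grid with a simulated dirt distribution and decides
    at each waypoint whether to trigger a sampling action."""

    def __init__(self, grid_size=16, max_steps=100):
        super().__init__()
        self.grid_size = grid_size
        self.max_steps = max_steps
        # Action = (waypoint move: stay/left/forward/right, sensor: idle/sample)
        self.action_space = spaces.MultiDiscrete([4, 2])
        # Observation = dirt map plus a one-hot layer marking the robot position
        self.observation_space = spaces.Box(
            low=0.0, high=1.0, shape=(2, grid_size, grid_size), dtype=np.float32)

    def reset(self):
        # Randomly generated dirt accumulation pattern (simulated training data)
        self.dirt = (np.random.rand(self.grid_size, self.grid_size) < 0.2).astype(np.float32)
        self.pos = np.array([self.grid_size // 2, self.grid_size - 1])
        self.steps = 0
        return self._obs()

    def step(self, action):
        move, sample = action
        # stay / left / forward / right, clipped to the map boundary
        deltas = {0: (0, 0), 1: (-1, 0), 2: (0, -1), 3: (1, 0)}
        self.pos = np.clip(self.pos + deltas[move], 0, self.grid_size - 1)
        dirt_here = self.dirt[self.pos[1], self.pos[0]] > 0
        # Reward correct decisions, penalise Type-I / Type-II errors (values assumed)
        if sample == 1:
            reward = 1.0 if dirt_here else -0.5
        else:
            reward = -1.0 if dirt_here else 0.1
        self.steps += 1
        done = self.steps >= self.max_steps
        return self._obs(), reward, done, {}

    def _obs(self):
        robot_layer = np.zeros_like(self.dirt)
        robot_layer[self.pos[1], self.pos[0]] = 1.0
        return np.stack([self.dirt, robot_layer]).astype(np.float32)
```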
3. System Overview
- Simple: The problem becomes easier to model, since an analytical model of dirt distribution is hard to compute.
- Scalable: Dirt-accumulation patterns can be simulated easily for training, and the RL model can be scaled to any environment.
- Reliable: The system's performance depends on the learned policy for selecting actions and relies less on limiting factors such as sensor accuracy and external lighting.
4. Problem Formulation
4.1. Markov Decision Process (MDP)
4.2. Proximal Policy Optimization (PPO)
4.3. Network Architecture
Algorithm 1: Pseudocode for PPO implementation [29]
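The body of the algorithm listing does not survive the page extraction. As a stand-in, the following is a minimal PyTorch sketch of the standard PPO clipped-surrogate update from Schulman et al. [29]; the `policy.evaluate` interface, the loss coefficients, and the optimiser are assumptions rather than the authors' exact implementation, with the clip parameter and SGD iteration count echoing the hyperparameter table below.

```python
# Sketch of the PPO clipped-surrogate update [29].
# Network interface, optimiser, and loss coefficients are illustrative assumptions.
import torch

def ppo_update(policy, optimizer, batch, clip_param=0.2, num_sgd_iter=16,
               vf_coef=0.5, ent_coef=0.01):
    obs, actions, old_logp, returns, advantages = batch
    for _ in range(num_sgd_iter):
        # policy.evaluate is a hypothetical helper returning log-probs,
        # value estimates, and entropy for the sampled actions
        logp, value, entropy = policy.evaluate(obs, actions)
        ratio = torch.exp(logp - old_logp)          # pi_theta / pi_theta_old
        surr1 = ratio * advantages
        surr2 = torch.clamp(ratio, 1 - clip_param, 1 + clip_param) * advantages
        policy_loss = -torch.min(surr1, surr2).mean()
        value_loss = (returns - value).pow(2).mean()
        loss = policy_loss + vf_coef * value_loss - ent_coef * entropy.mean()
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
```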
5. Training and Validation
Policy Training and Evaluation
Algorithm 2: Pseudocode for PPO rollout
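The rollout listing is likewise not reproduced in the extracted text. The sketch below shows a generic PPO rollout that collects a fixed-horizon trajectory with the current policy and computes generalized advantage estimates (GAE; Schulman et al., arXiv:1506.02438). The `policy.act`/`policy.value` interfaces and the horizon and lambda values are assumptions for illustration.

```python
# Sketch of a PPO rollout: collect a trajectory with the current policy and
# compute generalized advantage estimates (GAE). Environment and policy
# interfaces are assumptions for illustration.
import numpy as np

def collect_rollout(env, policy, horizon=200, gamma=0.99, lam=0.95):
    obs = env.reset()
    traj = {"obs": [], "actions": [], "logp": [], "rewards": [],
            "values": [], "dones": []}
    for _ in range(horizon):
        action, logp, value = policy.act(obs)
        next_obs, reward, done, _ = env.step(action)
        for key, item in zip(traj, (obs, action, logp, reward, value, done)):
            traj[key].append(item)
        obs = env.reset() if done else next_obs
    # Generalized advantage estimation over the collected trajectory,
    # bootstrapping from the value of the final observation
    values = np.array(traj["values"] + [policy.value(obs)])
    rewards = np.array(traj["rewards"])
    dones = np.array(traj["dones"], dtype=np.float32)
    advantages = np.zeros(horizon, dtype=np.float32)
    gae = 0.0
    for t in reversed(range(horizon)):
        nonterminal = 1.0 - dones[t]
        delta = rewards[t] + gamma * values[t + 1] * nonterminal - values[t]
        gae = delta + gamma * lam * nonterminal * gae
        advantages[t] = gae
    traj["advantages"] = advantages
    traj["returns"] = advantages + values[:-1]
    return traj
```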
6. Results and Discussion
- Success: The robot initiates a sampling action at a location that truly has dirt accumulation, or it does not initiate a sampling action at a location that has no dirt accumulation (a minimal scoring sketch follows this list).
- Type-I error: The robot initiates a sampling action at a location that has no dirt accumulation.
- Type-II error: The robot does not initiate a sampling action at a location that has dirt accumulation.
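As a concrete illustration of how these outcomes translate into the reported decision accuracy, the helper below tallies successes and the two error types from a list of per-waypoint (sampled, dirt present) pairs; the input format is an assumption for illustration.

```python
# Minimal helper for scoring per-waypoint decisions against ground-truth dirt
# presence, using the success / Type-I / Type-II definitions above. The input
# format (list of (sampled, dirt_present) pairs) is an illustrative assumption.
def score_decisions(decisions):
    counts = {"success": 0, "type_i": 0, "type_ii": 0}
    for sampled, dirt_present in decisions:
        if sampled and not dirt_present:
            counts["type_i"] += 1        # sampled a clean location
        elif not sampled and dirt_present:
            counts["type_ii"] += 1       # missed an accumulation
        else:
            counts["success"] += 1       # correct sample / correct skip
    counts["accuracy"] = counts["success"] / len(decisions)
    return counts

# Example: 11 correct decisions out of 15 waypoints gives an accuracy of
# about 73.3%, matching the value reported for one of the test runs below.
```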
7. Conclusions and Future Works
- Establishing an exhaustive open-source dirt-distribution dataset to enhance the training and improvement of the current method
- Chemical and microbial analysis in sample auditing
- A comprehensive comparative study of different on-policy algorithms in the context of cleaning auditing
- Multi-agent reinforcement learning for dirt-exploration strategy
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
- Truong, N.; Nisar, T.; Knox, D.; Prabhakar, G. The influences of cleanliness and employee attributes on perceived service quality in restaurants in a developing country. Int. J. Cult. Tour. Hosp. Res. 2017, 11, 608–627.
- Cleaning a Nation: Cultivating a Healthy Living Environment. Available online: https://www.clc.gov.sg/research-publications/publications/urban-systems-studies/view/cleaning-a-nation-cultivating-a-healthy-living-environment (accessed on 2 March 2021).
- Cleaning Industry Analysis 2020-Cost & Trends. Available online: https://www.franchisehelp.com/industry-reports/cleaning-industry-analysis-2020-cost-trends/ (accessed on 23 June 2021).
- Top Three Commercial Cleaning Trends in 2019. Available online: https://www.wilburncompany.com/top-three-commercial-cleaning-trends-in-2019/ (accessed on 23 June 2021).
- Diab-El Schahawi, M.; Zingg, W.; Vos, M.; Humphreys, H.; Lopez-Cerero, L.; Fueszl, A.; Zahar, J.R.; Presterl, E. Ultraviolet disinfection robots to improve hospital cleaning: Real promise or just a gimmick? Antimicrob. Resist. Infect. Control 2021, 10, 33.
- Chen, J.; Loeb, S.; Kim, J.H. LED revolution: Fundamentals and prospects for UV disinfection applications. Environ. Sci. Water Res. Technol. 2017, 3, 188–202.
- Arnott, B.; Arnott, M. Automatic Floor Cleaning Machine and Process. U.S. Patent 10,006,192, 26 June 2018.
- Martinovs, A.; Mezule, L.; Revalds, R.; Pizica, V.; Denisova, V.; Skudra, A.; Kolcs, G.; Zaicevs, E.; Juhna, T. New device for air disinfection with a shielded UV radiation and ozone. Agron. Res. 2021, 19, 834–846.
- Dammkoehler, D.; Jin, Z. Floor Cleaning Machine. U.S. Patent App. 29/548,203, 25 April 2017.
- Fleming, M.; Patrick, A.; Gryskevicz, M.; Masroor, N.; Hassmer, L.; Shimp, K.; Cooper, K.; Doll, M.; Stevens, M.; Bearman, G. Deployment of a touchless ultraviolet light robot for terminal room disinfection: The importance of audit and feedback. Am. J. Infect. Control 2018, 46, 241–243.
- Prabakaran, V.; Mohan, R.E.; Sivanantham, V.; Pathmakumar, T.; Kumar, S.S. Tackling area coverage problems in a reconfigurable floor cleaning robot based on polyomino tiling theory. Appl. Sci. 2018, 8, 342.
- Muthugala, M.; Vega-Heredia, M.; Mohan, R.E.; Vishaal, S.R. Design and control of a wall cleaning robot with adhesion-awareness. Symmetry 2020, 12, 122.
- Sivanantham, V.; Le, A.V.; Shi, Y.; Elara, M.R.; Sheu, B.J. Adaptive Floor Cleaning Strategy by Human Density Surveillance Mapping with a Reconfigurable Multi-Purpose Service Robot. Sensors 2021, 21, 2965.
- Chang, C.L.; Chang, C.Y.; Tang, Z.Y.; Chen, S.T. High-efficiency automatic recharging mechanism for cleaning robot using multi-sensor. Sensors 2018, 18, 3911.
- Pathmakumar, T.; Sivanantham, V.; Anantha Padmanabha, S.G.; Elara, M.R.; Tun, T.T. Towards an Optimal Footprint Based Area Coverage Strategy for a False-Ceiling Inspection Robot. Sensors 2021, 21, 5168.
- Giske, L.A.L.; Bjørlykhaug, E.; Løvdal, T.; Mork, O.J. Experimental study of effectiveness of robotic cleaning for fish-processing plants. Food Control 2019, 100, 269–277.
- Lewis, T.; Griffith, C.; Gallo, M.; Weinbren, M. A modified ATP benchmark for evaluating the cleaning of some hospital environmental surfaces. J. Hosp. Infect. 2008, 69, 156–163.
- Asgharian, R.; Hamedani, F.M.; Heydari, A. Step by step how to do cleaning validation. Int. J. Pharm. Life Sci. 2014, 5, 3365.
- Malav, S.; Saxena, N. Assessment of disinfection and cleaning validation in central laboratory, MBS hospital, Kota. J. Evol. Med. Dent. Sci. 2018, 7, 1259–1263.
- Al-Hamad, A.; Maxwell, S. How clean is clean? Proposed methods for hospital cleaning assessment. J. Hosp. Infect. 2008, 70, 328–334.
- Cloutman-Green, E.; D’Arcy, N.; Spratt, D.A.; Hartley, J.C.; Klein, N. How clean is clean—Is a new microbiology standard required? Am. J. Infect. Control 2014, 42, 1002–1003.
- Pathmakumar, T.; Kalimuthu, M.; Elara, M.R.; Ramalingam, B. An Autonomous Robot-Aided Auditing Scheme for Floor Cleaning. Sensors 2021, 21, 4332.
- Smart, W.D.; Kaelbling, L.P. Effective reinforcement learning for mobile robots. In Proceedings of the 2002 IEEE International Conference on Robotics and Automation (Cat. No. 02CH37292), Washington, DC, USA, 11–15 May 2002; Volume 4, pp. 3404–3410.
- Tai, L.; Paolo, G.; Liu, M. Virtual-to-real deep reinforcement learning: Continuous control of mobile robots for mapless navigation. In Proceedings of the 2017 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Vancouver, BC, Canada, 24–28 September 2017; pp. 31–36.
- Rivera, P.; Valarezo Añazco, E.; Kim, T.S. Object Manipulation with an Anthropomorphic Robotic Hand via Deep Reinforcement Learning with a Synergy Space of Natural Hand Poses. Sensors 2021, 21, 5301.
- Kozjek, D.; Malus, A.; Vrabič, R. Reinforcement-Learning-Based Route Generation for Heavy-Traffic Autonomous Mobile Robot Systems. Sensors 2021, 21, 4809.
- Pi, C.H.; Dai, Y.W.; Hu, K.C.; Cheng, S. General Purpose Low-Level Reinforcement Learning Control for Multi-Axis Rotor Aerial Vehicles. Sensors 2021, 21, 4560.
- Bing, Z.; Lemke, C.; Morin, F.O.; Jiang, Z.; Cheng, L.; Huang, K.; Knoll, A. Perception-action coupling target tracking control for a snake robot via reinforcement learning. Front. Neurorobot. 2020, 14, 79.
- Schulman, J.; Wolski, F.; Dhariwal, P.; Radford, A.; Klimov, O. Proximal policy optimization algorithms. arXiv 2017, arXiv:1707.06347.
- Wang, D.; Fan, T.; Han, T.; Pan, J. A two-stage reinforcement learning approach for multi-UAV collision avoidance under imperfect sensing. IEEE Robot. Autom. Lett. 2020, 5, 3098–3105.
- Silver, D.; Lever, G.; Heess, N.; Degris, T.; Wierstra, D.; Riedmiller, M. Deterministic policy gradient algorithms. In Proceedings of the International Conference on Machine Learning, PMLR, Bejing, China, 22–24 June 2014; pp. 387–395.
- Mousavi, H.K.; Liu, G.; Yuan, W.; Takáč, M.; Muñoz-Avila, H.; Motee, N. A layered architecture for active perception: Image classification using deep reinforcement learning. arXiv 2019, arXiv:1909.09705.
- Hase, H.; Azampour, M.F.; Tirindelli, M.; Paschali, M.; Simson, W.; Fatemizadeh, E.; Navab, N. Ultrasound-guided robotic navigation with deep reinforcement learning. In Proceedings of the 2020 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Las Vegas, NV, USA, 25–29 October 2020; pp. 5534–5541.
- Choi, J.; Park, K.; Kim, M.; Seok, S. Deep reinforcement learning of navigation in a complex and crowded environment with a limited field of view. In Proceedings of the 2019 International Conference on Robotics and Automation (ICRA), Montreal, QC, Canada, 20–24 May 2019; pp. 5993–6000.
- Pfeiffer, M.; Shukla, S.; Turchetta, M.; Cadena, C.; Krause, A.; Siegwart, R.; Nieto, J. Reinforced imitation: Sample efficient deep reinforcement learning for mapless navigation by leveraging prior demonstrations. IEEE Robot. Autom. Lett. 2018, 3, 4423–4430.
- Niroui, F.; Zhang, K.; Kashino, Z.; Nejat, G. Deep reinforcement learning robot for search and rescue applications: Exploration in unknown cluttered environments. IEEE Robot. Autom. Lett. 2019, 4, 610–617.
- Zuluaga, J.G.C.; Leidig, J.P.; Trefftz, C.; Wolffe, G. Deep reinforcement learning for autonomous search and rescue. In Proceedings of the NAECON 2018-IEEE National Aerospace and Electronics Conference, Dayton, OH, USA, 23–26 July 2018; pp. 521–524.
- Hu, J.; Niu, H.; Carrasco, J.; Lennox, B.; Arvin, F. Voronoi-based multi-robot autonomous exploration in unknown environments via deep reinforcement learning. IEEE Trans. Veh. Technol. 2020, 69, 14413–14423.
- Sampedro, C.; Rodriguez-Ramos, A.; Bavle, H.; Carrio, A.; de la Puente, P.; Campoy, P. A fully-autonomous aerial robot for search and rescue applications in indoor environments using learning-based techniques. J. Intell. Robot. Syst. 2019, 95, 601–627.
- Wang, Z.; Bovik, A.C.; Sheikh, H.R.; Simoncelli, E.P. Image quality assessment: From error visibility to structural similarity. IEEE Trans. Image Process. 2004, 13, 600–612.
- Alqaraawi, A.; Schuessler, M.; Weiß, P.; Costanza, E.; Berthouze, N. Evaluating saliency map explanations for convolutional neural networks: A user study. In Proceedings of the 25th International Conference on Intelligent User Interfaces, Cagliari, Italy, 17–20 March 2020; pp. 275–285.
- Yoo, S.; Jeong, S.; Kim, S.; Jang, Y. Saliency-Based Gaze Visualization for Eye Movement Analysis. Sensors 2021, 21, 5178.
- Schulman, J.; Moritz, P.; Levine, S.; Jordan, M.; Abbeel, P. High-dimensional continuous control using generalized advantage estimation. arXiv 2015, arXiv:1506.02438.
- Karlik, B.; Olgac, A.V. Performance analysis of various activation functions in generalized MLP architectures of neural networks. Int. J. Artif. Intell. Expert Syst. 2011, 1, 111–122.
- Liang, E.; Liaw, R.; Nishihara, R.; Moritz, P.; Fox, R.; Gonzalez, J.; Goldberg, K.; Stoica, I. Ray RLlib: A composable and scalable reinforcement learning library. arXiv 2017, arXiv:1712.09381.
- Liang, E.; Liaw, R.; Nishihara, R.; Moritz, P.; Fox, R.; Goldberg, K.; Gonzalez, J.; Jordan, M.; Stoica, I. RLlib: Abstractions for distributed reinforcement learning. In Proceedings of the International Conference on Machine Learning, PMLR, Stockholm, Sweden, 10–15 July 2018; pp. 3053–3062.
- Ketkar, N. Introduction to PyTorch. In Deep Learning with Python; Springer: Berlin, Germany, 2017; pp. 195–208.
- Brockman, G.; Cheung, V.; Pettersson, L.; Schneider, J.; Schulman, J.; Tang, J.; Zaremba, W. OpenAI Gym. arXiv 2016, arXiv:1606.01540.
- Moritz, P.; Nishihara, R.; Wang, S.; Tumanov, A.; Liaw, R.; Liang, E.; Elibol, M.; Yang, Z.; Paul, W.; Jordan, M.I.; et al. Ray: A distributed framework for emerging AI applications. In Proceedings of the 13th USENIX Symposium on Operating Systems Design and Implementation (OSDI 18), Carlsbad, CA, USA, 8–10 October 2018; pp. 561–577.
- Quigley, M.; Conley, K.; Gerkey, B.; Faust, J.; Foote, T.; Leibs, J.; Wheeler, R.; Ng, A.Y. ROS: An open-source Robot Operating System. In IEEE International Conference on Robotics and Automation (ICRA) Workshop on Open Source Software; IEEE: Kobe, Japan, 2009; Volume 3, p. 5.
- Marin-Plaza, P.; Hussein, A.; Martin, D.; Escalera, A.D.L. Global and local path planning study in a ROS-based research platform for autonomous vehicles. J. Adv. Transp. 2018, 2018, 6392697.
- Fox, D.; Burgard, W.; Thrun, S. The dynamic window approach to collision avoidance. IEEE Robot. Autom. Mag. 1997, 4, 23–33.
| p | Waypoint in Robot Coordinates | Action Taken by the Robot |
|---|---|---|
| | | Robot stays in the current position |
| | | Robot moves to the left |
| | | Robot moves forward |
| | | Robot moves to the right |

| S | Sensor State |
|---|---|
| | Sensor in idle state |
| | Sensor performs sampling |
PPO-Hyperparameters | Value |
---|---|
discount factor | 0.99 |
learning rate | 0.001 |
mini batch size | 128 |
num_workers | 20 |
num_sgd_iter | 16 |
clip_param | 0.2 |
learning_starts | |
buffer_size | |
train_batch_size | 32 |
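For reference, below is a hedged sketch of how the hyperparameters above could be mapped onto a Ray RLlib PPO configuration. The environment id is a placeholder, the parameters left blank in the table are omitted rather than guessed, and RLlib expects `sgd_minibatch_size` not to exceed `train_batch_size`, so the two batch-size entries in the table may follow a different accounting than RLlib's.

```python
# Sketch of an RLlib PPO configuration using the hyperparameters listed above.
# "DirtAuditEnv-v0" is a placeholder environment id; values left blank in the
# table are omitted here rather than guessed.
ppo_config = {
    "env": "DirtAuditEnv-v0",      # placeholder for the registered environment
    "framework": "torch",
    "gamma": 0.99,                 # discount factor
    "lr": 0.001,                   # learning rate
    "sgd_minibatch_size": 128,     # mini batch size
    "num_workers": 20,
    "num_sgd_iter": 16,
    "clip_param": 0.2,
    "train_batch_size": 32,
}

# Typical usage (requires `pip install "ray[rllib]"`; Ray 1.x API):
#   import ray
#   from ray.rllib.agents.ppo import PPOTrainer
#   ray.init()
#   trainer = PPOTrainer(config=ppo_config)
#   for _ in range(num_iterations):   # num_iterations is illustrative
#       trainer.train()
```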
| Waypoint | Location on Map | Decision | Result |
|---|---|---|---|
| | | Sample | Success |
| | | Sample | Success |
| | | Sample | Success |
| | | Not Sample | Type-II error |
| | | Sample | Success |
| | | Sample | Success |
| | | Sample | Success |
| | | Not Sample | Type-II error |
| | | Not Sample | Success |
| | | Not Sample | Success |
| | | Not Sample | Type-II error |
| | | Sample | Success |
| | | Sample | Success |
| | | Not Sample | Type-II error |
| | | Sample | Success |
| Total path length | 10.45537 m | | |
| Total time taken | 86 s | | |
| Accuracy of decision | 73.3% | | |
| Waypoint | Location on Map | Decision | Result |
|---|---|---|---|
| | | Sample | Success |
| | | Sample | Success |
| | | Sample | Success |
| | | Sample | Success |
| | | Sample | Success |
| | | Sample | Success |
| | | Sample | Success |
| | | Sample | Success |
| | | Sample | Success |
| | | Sample | Type-I error |
| | | Not Sample | Success |
| | | Not Sample | Success |
| | | Not Sample | Success |
| | | Not Sample | Success |
| | | Not Sample | Success |
| | | Not Sample | Success |
| | | Sample | Success |
| | | Sample | Success |
| | | Sample | Success |
| Total path length | 13.41366 m | | |
| Total time taken | 93 s | | |
| Accuracy of decision | 94.7% | | |
| Waypoint | Location on Map | Decision | Result |
|---|---|---|---|
| | | Sample | Success |
| | | Sample | Success |
| | | Not Sample | Success |
| | | Sample | Success |
| | | Sample | Success |
| | | Sample | Success |
| | | Sample | Success |
| | | Sample | Type-I error |
| | | Sample | Type-I error |
| | | Sample | Success |
| | | Not Sample | Success |
| | | Not Sample | Type-II error |
| | | Not Sample | Success |
| | | Sample | Type-I error |
| | | Not Sample | Success |
| | | Sample | Success |
| | | Sample | Success |
| | | Sample | Success |
| | | Not Sample | Type-II error |
| Total path length | 17.76424 m | | |
| Total time taken | 321 s | | |
| Accuracy of exploration | 73.68% | | |
| Waypoint | Location on Map | Decision | Result |
|---|---|---|---|
| | | Sample | Success |
| | | Sample | Success |
| | | Not Sample | Success |
| | | Sample | Success |
| | | Sample | Success |
| | | Sample | Success |
| | | Sample | Type-I error |
| | | Sample | Success |
| | | Sample | Success |
| | | Not Sample | Success |
| | | Not Sample | Type-II error |
| | | Sample | Success |
| | | Sample | Success |
| | | Not Sample | Type-II error |
| | | Sample | Success |
| Total path length | 10.19308 m | | |
| Total time taken | 137 s | | |
| Accuracy of exploration | 80.0% | | |
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
© 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).