A Reinforcement Learning Based Dirt-Exploration for Cleaning-Auditing Robot
Abstract
1. Introduction
2. Related Works
- Model the autonomous auditing task as a Markov decision process (MDP).
- Train the model on representations of indoor environments with manually assigned dirt distributions.
- Evaluate the learned policy in multiple simulation environments.
- Evaluate the performance of the proposed strategy using the in-house-developed BELUGA cleaning-audit robot.
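The MDP framing in the first contribution can be made concrete with a toy environment. The class below is an illustrative sketch, not the paper's implementation; the grid size, dirt probability, and reward values are all assumptions:

```python
import random

class DirtAuditEnv:
    """Toy MDP for dirt exploration: the agent moves on a grid with a
    hidden dirt map and decides, per waypoint, whether to sample."""

    # Action set mirroring the paper's waypoint/sensor split:
    # 0 stay, 1 left, 2 forward, 3 right; sampling is a separate flag.
    MOVES = {0: (0, 0), 1: (0, 1), 2: (1, 0), 3: (0, -1)}

    def __init__(self, size=8, dirt_prob=0.3, seed=0):
        rng = random.Random(seed)
        self.size = size
        self.dirt = [[rng.random() < dirt_prob for _ in range(size)]
                     for _ in range(size)]
        self.pos = (0, 0)

    def step(self, move, sample):
        # Move to the next waypoint, clamped to the grid boundary.
        dx, dy = self.MOVES[move]
        x = min(max(self.pos[0] + dx, 0), self.size - 1)
        y = min(max(self.pos[1] + dy, 0), self.size - 1)
        self.pos = (x, y)
        dirty = self.dirt[x][y]
        # Reward a correct sample/no-sample decision, penalise a wrong one.
        reward = 1.0 if sample == dirty else -1.0
        return self.pos, reward
```

A state here is the robot's grid position plus the (hidden) dirt map; a learned policy would choose both the move and the sampling flag at every step.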
3. System Overview
- Simple: easier to model the problem, since an analytical model of dirt distribution is hard to compute.
- Scalable: dirt-accumulation patterns can be simulated easily for training, and the RL model can be scaled to any environment.
- Reliable: the system's performance depends on the learned policy for selecting actions and relies less on limiting factors such as sensor accuracy and external lighting.
4. Problem Formulation
4.1. Markov Decision Process (MDP)
4.2. Proximal Policy Optimization (PPO)
4.3. Network Architecture
Algorithm 1 Pseudocode for PPO implementation [29] 
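The listing body did not survive extraction. For reference, the core of any PPO implementation is the clipped surrogate objective from [29]; a minimal NumPy sketch (not the paper's code):

```python
import numpy as np

def ppo_clip_loss(ratio, advantage, clip_param=0.2):
    """PPO clipped surrogate loss from Schulman et al. [29].

    ratio:     pi_theta(a|s) / pi_theta_old(a|s), per sample
    advantage: estimated advantage, per sample
    """
    unclipped = ratio * advantage
    clipped = np.clip(ratio, 1 - clip_param, 1 + clip_param) * advantage
    # PPO maximises the minimum of the two terms; negate to get a loss.
    return -np.mean(np.minimum(unclipped, clipped))
```

The clipping keeps each policy update close to the behaviour policy, which is what makes PPO stable enough to train on simulated dirt distributions.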

5. Training and Validation
Policy Training and Evaluation
Algorithm 2 Pseudocode for PPO rollout 
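This listing body is also missing. A PPO rollout typically accumulates per-step rewards and value estimates and then computes generalized advantage estimates (GAE; see the Schulman et al. advantage-estimation reference in the bibliography). A sketch under that assumption:

```python
import numpy as np

def compute_gae(rewards, values, gamma=0.99, lam=0.95):
    """Generalized advantage estimation over one finished episode.

    values must contain one extra bootstrap entry (len(rewards) + 1).
    """
    advantages = np.zeros(len(rewards))
    gae = 0.0
    # Sweep backwards, accumulating the discounted TD residuals.
    for t in reversed(range(len(rewards))):
        delta = rewards[t] + gamma * values[t + 1] - values[t]
        gae = delta + gamma * lam * gae
        advantages[t] = gae
    return advantages
```

The resulting advantages feed directly into the clipped surrogate objective during each SGD iteration of training.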

6. Results and Discussion
- Success: the robot initiated a sampling action at a location that truly had dirt accumulation, or it did not sample at a location with no dirt accumulation.
- Type-I error: the robot initiated a sampling action at a location with no dirt accumulation.
- Type-II error: the robot did not sample at a location that had dirt accumulation.
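These three outcome classes map onto a confusion-matrix tally; the helper below (hypothetical, for illustration) makes the decision-accuracy bookkeeping explicit:

```python
def classify_decision(sampled, dirty):
    """Classify one waypoint outcome per the definitions above."""
    if sampled and not dirty:
        return "Type-I error"   # sampled a clean location
    if not sampled and dirty:
        return "Type-II error"  # missed a dirty location
    return "Success"

def decision_accuracy(outcomes):
    """Fraction of waypoints classified as Success."""
    return sum(o == "Success" for o in outcomes) / len(outcomes)
```

Applying `classify_decision` to every waypoint and averaging with `decision_accuracy` yields the per-room accuracy figures reported in the result tables.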
7. Conclusions and Future Works
- Establishing an exhaustive open-source dirt-distribution dataset to enhance training and improve the current method
- Chemical and microbial analysis in sample auditing
- A comprehensive comparative study of different on-policy algorithms in the context of cleaning auditing
- Multi-agent reinforcement learning for dirt-exploration strategies
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
1. Truong, N.; Nisar, T.; Knox, D.; Prabhakar, G. The influences of cleanliness and employee attributes on perceived service quality in restaurants in a developing country. Int. J. Cult. Tour. Hosp. Res. 2017, 11, 608–627.
2. Cleaning a Nation: Cultivating a Healthy Living Environment. Available online: https://www.clc.gov.sg/researchpublications/publications/urbansystemsstudies/view/cleaninganationcultivatingahealthylivingenvironment (accessed on 2 March 2021).
3. Cleaning Industry Analysis 2020: Cost & Trends. Available online: https://www.franchisehelp.com/industryreports/cleaningindustryanalysis2020costtrends/ (accessed on 23 June 2021).
4. Top Three Commercial Cleaning Trends in 2019. Available online: https://www.wilburncompany.com/topthreecommercialcleaningtrendsin2019/ (accessed on 23 June 2021).
5. Diab-El Schahawi, M.; Zingg, W.; Vos, M.; Humphreys, H.; Lopez-Cerero, L.; Fueszl, A.; Zahar, J.R.; Presterl, E. Ultraviolet disinfection robots to improve hospital cleaning: Real promise or just a gimmick? Antimicrob. Resist. Infect. Control 2021, 10, 33.
6. Chen, J.; Loeb, S.; Kim, J.H. LED revolution: Fundamentals and prospects for UV disinfection applications. Environ. Sci. Water Res. Technol. 2017, 3, 188–202.
7. Arnott, B.; Arnott, M. Automatic Floor Cleaning Machine and Process. U.S. Patent 10,006,192, 26 June 2018.
8. Martinovs, A.; Mezule, L.; Revalds, R.; Pizica, V.; Denisova, V.; Skudra, A.; Kolcs, G.; Zaicevs, E.; Juhna, T. New device for air disinfection with a shielded UV radiation and ozone. Agron. Res. 2021, 19, 834–846.
9. Dammkoehler, D.; Jin, Z. Floor Cleaning Machine. U.S. Patent App. 29/548,203, 25 April 2017.
10. Fleming, M.; Patrick, A.; Gryskevicz, M.; Masroor, N.; Hassmer, L.; Shimp, K.; Cooper, K.; Doll, M.; Stevens, M.; Bearman, G. Deployment of a touchless ultraviolet light robot for terminal room disinfection: The importance of audit and feedback. Am. J. Infect. Control 2018, 46, 241–243.
11. Prabakaran, V.; Mohan, R.E.; Sivanantham, V.; Pathmakumar, T.; Kumar, S.S. Tackling area coverage problems in a reconfigurable floor cleaning robot based on polyomino tiling theory. Appl. Sci. 2018, 8, 342.
12. Muthugala, M.; Vega-Heredia, M.; Mohan, R.E.; Vishaal, S.R. Design and control of a wall cleaning robot with adhesion-awareness. Symmetry 2020, 12, 122.
13. Sivanantham, V.; Le, A.V.; Shi, Y.; Elara, M.R.; Sheu, B.J. Adaptive Floor Cleaning Strategy by Human Density Surveillance Mapping with a Reconfigurable Multi-Purpose Service Robot. Sensors 2021, 21, 2965.
14. Chang, C.L.; Chang, C.Y.; Tang, Z.Y.; Chen, S.T. High-efficiency automatic recharging mechanism for cleaning robot using multi-sensor. Sensors 2018, 18, 3911.
15. Pathmakumar, T.; Sivanantham, V.; Anantha Padmanabha, S.G.; Elara, M.R.; Tun, T.T. Towards an Optimal Footprint Based Area Coverage Strategy for a False-Ceiling Inspection Robot. Sensors 2021, 21, 5168.
16. Giske, L.A.L.; Bjørlykhaug, E.; Løvdal, T.; Mork, O.J. Experimental study of effectiveness of robotic cleaning for fish-processing plants. Food Control 2019, 100, 269–277.
17. Lewis, T.; Griffith, C.; Gallo, M.; Weinbren, M. A modified ATP benchmark for evaluating the cleaning of some hospital environmental surfaces. J. Hosp. Infect. 2008, 69, 156–163.
18. Asgharian, R.; Hamedani, F.M.; Heydari, A. Step by step how to do cleaning validation. Int. J. Pharm. Life Sci. 2014, 5, 3365.
19. Malav, S.; Saxena, N. Assessment of disinfection and cleaning validation in central laboratory, MBS hospital, Kota. J. Evol. Med. Dent. Sci. 2018, 7, 1259–1263.
20. Al-Hamad, A.; Maxwell, S. How clean is clean? Proposed methods for hospital cleaning assessment. J. Hosp. Infect. 2008, 70, 328–334.
21. Cloutman-Green, E.; D'Arcy, N.; Spratt, D.A.; Hartley, J.C.; Klein, N. How clean is clean—Is a new microbiology standard required? Am. J. Infect. Control 2014, 42, 1002–1003.
22. Pathmakumar, T.; Kalimuthu, M.; Elara, M.R.; Ramalingam, B. An Autonomous Robot-Aided Auditing Scheme for Floor Cleaning. Sensors 2021, 21, 4332.
23. Smart, W.D.; Kaelbling, L.P. Effective reinforcement learning for mobile robots. In Proceedings of the 2002 IEEE International Conference on Robotics and Automation (Cat. No. 02CH37292), Washington, DC, USA, 11–15 May 2002; Volume 4, pp. 3404–3410.
24. Tai, L.; Paolo, G.; Liu, M. Virtual-to-real deep reinforcement learning: Continuous control of mobile robots for mapless navigation. In Proceedings of the 2017 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Vancouver, BC, Canada, 24–28 September 2017; pp. 31–36.
25. Rivera, P.; Valarezo Añazco, E.; Kim, T.S. Object Manipulation with an Anthropomorphic Robotic Hand via Deep Reinforcement Learning with a Synergy Space of Natural Hand Poses. Sensors 2021, 21, 5301.
26. Kozjek, D.; Malus, A.; Vrabič, R. Reinforcement-Learning-Based Route Generation for Heavy-Traffic Autonomous Mobile Robot Systems. Sensors 2021, 21, 4809.
27. Pi, C.H.; Dai, Y.W.; Hu, K.C.; Cheng, S. General Purpose Low-Level Reinforcement Learning Control for Multi-Axis Rotor Aerial Vehicles. Sensors 2021, 21, 4560.
28. Bing, Z.; Lemke, C.; Morin, F.O.; Jiang, Z.; Cheng, L.; Huang, K.; Knoll, A. Perception-action coupling target tracking control for a snake robot via reinforcement learning. Front. Neurorobot. 2020, 14, 79.
29. Schulman, J.; Wolski, F.; Dhariwal, P.; Radford, A.; Klimov, O. Proximal policy optimization algorithms. arXiv 2017, arXiv:1707.06347.
30. Wang, D.; Fan, T.; Han, T.; Pan, J. A two-stage reinforcement learning approach for multi-UAV collision avoidance under imperfect sensing. IEEE Robot. Autom. Lett. 2020, 5, 3098–3105.
31. Silver, D.; Lever, G.; Heess, N.; Degris, T.; Wierstra, D.; Riedmiller, M. Deterministic policy gradient algorithms. In Proceedings of the International Conference on Machine Learning, PMLR, Beijing, China, 22–24 June 2014; pp. 387–395.
32. Mousavi, H.K.; Liu, G.; Yuan, W.; Takáč, M.; Muñoz-Avila, H.; Motee, N. A layered architecture for active perception: Image classification using deep reinforcement learning. arXiv 2019, arXiv:1909.09705.
33. Hase, H.; Azampour, M.F.; Tirindelli, M.; Paschali, M.; Simson, W.; Fatemizadeh, E.; Navab, N. Ultrasound-guided robotic navigation with deep reinforcement learning. In Proceedings of the 2020 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Las Vegas, NV, USA, 25–29 October 2020; pp. 5534–5541.
34. Choi, J.; Park, K.; Kim, M.; Seok, S. Deep reinforcement learning of navigation in a complex and crowded environment with a limited field of view. In Proceedings of the 2019 International Conference on Robotics and Automation (ICRA), Montreal, QC, Canada, 20–24 May 2019; pp. 5993–6000.
35. Pfeiffer, M.; Shukla, S.; Turchetta, M.; Cadena, C.; Krause, A.; Siegwart, R.; Nieto, J. Reinforced imitation: Sample efficient deep reinforcement learning for mapless navigation by leveraging prior demonstrations. IEEE Robot. Autom. Lett. 2018, 3, 4423–4430.
36. Niroui, F.; Zhang, K.; Kashino, Z.; Nejat, G. Deep reinforcement learning robot for search and rescue applications: Exploration in unknown cluttered environments. IEEE Robot. Autom. Lett. 2019, 4, 610–617.
37. Zuluaga, J.G.C.; Leidig, J.P.; Trefftz, C.; Wolffe, G. Deep reinforcement learning for autonomous search and rescue. In Proceedings of the NAECON 2018 - IEEE National Aerospace and Electronics Conference, Dayton, OH, USA, 23–26 July 2018; pp. 521–524.
38. Hu, J.; Niu, H.; Carrasco, J.; Lennox, B.; Arvin, F. Voronoi-based multi-robot autonomous exploration in unknown environments via deep reinforcement learning. IEEE Trans. Veh. Technol. 2020, 69, 14413–14423.
39. Sampedro, C.; Rodriguez-Ramos, A.; Bavle, H.; Carrio, A.; de la Puente, P.; Campoy, P. A fully-autonomous aerial robot for search and rescue applications in indoor environments using learning-based techniques. J. Intell. Robot. Syst. 2019, 95, 601–627.
40. Wang, Z.; Bovik, A.C.; Sheikh, H.R.; Simoncelli, E.P. Image quality assessment: From error visibility to structural similarity. IEEE Trans. Image Process. 2004, 13, 600–612.
41. Alqaraawi, A.; Schuessler, M.; Weiß, P.; Costanza, E.; Berthouze, N. Evaluating saliency map explanations for convolutional neural networks: A user study. In Proceedings of the 25th International Conference on Intelligent User Interfaces, Cagliari, Italy, 17–20 March 2020; pp. 275–285.
42. Yoo, S.; Jeong, S.; Kim, S.; Jang, Y. Saliency-Based Gaze Visualization for Eye Movement Analysis. Sensors 2021, 21, 5178.
43. Schulman, J.; Moritz, P.; Levine, S.; Jordan, M.; Abbeel, P. High-dimensional continuous control using generalized advantage estimation. arXiv 2015, arXiv:1506.02438.
44. Karlik, B.; Olgac, A.V. Performance analysis of various activation functions in generalized MLP architectures of neural networks. Int. J. Artif. Intell. Expert Syst. 2011, 1, 111–122.
45. Liang, E.; Liaw, R.; Nishihara, R.; Moritz, P.; Fox, R.; Gonzalez, J.; Goldberg, K.; Stoica, I. Ray RLlib: A composable and scalable reinforcement learning library. arXiv 2017, arXiv:1712.09381.
46. Liang, E.; Liaw, R.; Nishihara, R.; Moritz, P.; Fox, R.; Goldberg, K.; Gonzalez, J.; Jordan, M.; Stoica, I. RLlib: Abstractions for distributed reinforcement learning. In Proceedings of the International Conference on Machine Learning, PMLR, Stockholm, Sweden, 10–15 July 2018; pp. 3053–3062.
47. Ketkar, N. Introduction to PyTorch. In Deep Learning with Python; Springer: Berlin, Germany, 2017; pp. 195–208.
48. Brockman, G.; Cheung, V.; Pettersson, L.; Schneider, J.; Schulman, J.; Tang, J.; Zaremba, W. OpenAI Gym. arXiv 2016, arXiv:1606.01540.
49. Moritz, P.; Nishihara, R.; Wang, S.; Tumanov, A.; Liaw, R.; Liang, E.; Elibol, M.; Yang, Z.; Paul, W.; Jordan, M.I.; et al. Ray: A distributed framework for emerging AI applications. In Proceedings of the 13th USENIX Symposium on Operating Systems Design and Implementation (OSDI 18), Carlsbad, CA, USA, 8–10 October 2018; pp. 561–577.
50. Quigley, M.; Conley, K.; Gerkey, B.; Faust, J.; Foote, T.; Leibs, J.; Wheeler, R.; Ng, A.Y. ROS: An open-source Robot Operating System. In IEEE International Conference on Robotics and Automation (ICRA) Workshop on Open Source Software; IEEE: Kobe, Japan, 2009; Volume 3, p. 5.
51. Marin-Plaza, P.; Hussein, A.; Martin, D.; Escalera, A.D.L. Global and local path planning study in a ROS-based research platform for autonomous vehicles. J. Adv. Transp. 2018, 2018, 6392697.
52. Fox, D.; Burgard, W.; Thrun, S. The dynamic window approach to collision avoidance. IEEE Robot. Autom. Mag. 1997, 4, 23–33.
| $p$ | Waypoint in Robot Coordinates | Action Taken by the Robot |
|---|---|---|
| $p_0$ | $(0, 0)$ | Robot stays in the current position |
| $p_1$ | $(0, 0.5)$ | Robot moves to the left |
| $p_2$ | $(0.5, 0)$ | Robot moves forward |
| $p_3$ | $(0, -0.5)$ | Robot moves to the right |

| $S$ | Sensor State |
|---|---|
| $S_0$ | Sensor in idle state |
| $S_1$ | Sensor performs sampling |
| PPO Hyperparameter | Value |
|---|---|
| discount factor $\gamma$ | 0.99 |
| learning rate $l_r$ | 0.001 |
| mini-batch size | 128 |
| num_workers | 20 |
| num_sgd_iter | 16 |
| clip_param | 0.2 |
| learning_starts | $10^4$ |
| buffer_size | $5 \times 10^4$ |
| train_batch_size | 32 |
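Several of the keys above mirror RLlib-style configuration names; expressed as a config dict they might be wired up as follows (a sketch — the exact trainer API depends on the Ray version, and `learning_starts`/`buffer_size` are reproduced from the table as-is even though they are unusual for PPO):

```python
# Hyperparameters from the table, expressed as an RLlib-style config
# dict (illustrative; exact trainer wiring depends on the Ray version).
ppo_config = {
    "gamma": 0.99,              # discount factor
    "lr": 0.001,                # learning rate
    "sgd_minibatch_size": 128,  # mini-batch size per SGD step
    "num_workers": 20,          # parallel rollout workers
    "num_sgd_iter": 16,         # SGD passes per train batch
    "clip_param": 0.2,          # PPO clipping epsilon
    "learning_starts": 10**4,
    "buffer_size": 5 * 10**4,
    "train_batch_size": 32,
}
```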
| Waypoint | Location on Map | Decision | Result |
|---|---|---|---|
| $W_{1,0}$ | $(1.33317, 5.57795)$ | Sample | Success |
| $W_{1,1}$ | $(1.33591, 4.84829)$ | Sample | Success |
| $W_{1,2}$ | $(1.33509, 4.11675)$ | Sample | Success |
| $W_{1,3}$ | $(1.33080, 3.38704)$ | Not sample | Type-II error |
| $W_{1,4}$ | $(1.33087, 2.66572)$ | Sample | Success |
| $W_{1,5}$ | $(1.32982, 1.93444)$ | Sample | Success |
| $W_{1,6}$ | $(1.33186, 1.21004)$ | Sample | Success |
| $W_{1,7}$ | $(2.10888, 1.204991)$ | Not sample | Type-II error |
| $W_{1,8}$ | $(2.87477, 1.20187)$ | Not sample | Success |
| $W_{1,9}$ | $(3.64719, 1.20291)$ | Not sample | Success |
| $W_{1,10}$ | $(4.41937, 1.20692)$ | Not sample | Type-II error |
| $W_{1,11}$ | $(5.18784, 1.20249)$ | Sample | Success |
| $W_{1,12}$ | $(5.19334, 1.93656)$ | Sample | Success |
| $W_{1,13}$ | $(5.18933, 2.66006)$ | Not sample | Type-II error |
| $W_{1,14}$ | $(5.96312, 2.66641)$ | Sample | Success |

Total path length: 10.45537 m
Total time taken: 86 s
Accuracy of decision: 73.3%
| Waypoint | Location on Map | Decision | Result |
|---|---|---|---|
| $W_{2,0}$ | $(1.42324, 5.09438)$ | Sample | Success |
| $W_{2,1}$ | $(1.42660, 4.36318)$ | Sample | Success |
| $W_{2,2}$ | $(1.43137, 3.6314)$ | Sample | Success |
| $W_{2,3}$ | $(1.43219, 2.9083)$ | Sample | Success |
| $W_{2,4}$ | $(1.42612, 2.1773)$ | Sample | Success |
| $W_{2,5}$ | $(1.42348, 1.4509)$ | Sample | Success |
| $W_{2,6}$ | $(1.43123, 0.723682)$ | Sample | Success |
| $W_{2,7}$ | $(2.19640, 0.725312)$ | Sample | Success |
| $W_{2,8}$ | $(2.96881, 0.718802)$ | Sample | Success |
| $W_{2,9}$ | $(3.74668, 0.719092)$ | Sample | Type-I error |
| $W_{2,10}$ | $(3.73843, 1.44793)$ | Not sample | Success |
| $W_{2,11}$ | $(3.74293, 2.17497)$ | Not sample | Success |
| $W_{2,12}$ | $(3.74063, 2.90702)$ | Not sample | Success |
| $W_{2,13}$ | $(4.51069, 2.90921)$ | Not sample | Success |
| $W_{2,14}$ | $(5.28431, 2.90556)$ | Not sample | Success |
| $W_{2,15}$ | $(5.28709, 2.17860)$ | Not sample | Success |
| $W_{2,16}$ | $(6.05995, 2.18154)$ | Sample | Success |
| $W_{2,17}$ | $(6.05311, 1.45565)$ | Sample | Success |
| $W_{2,18}$ | $(5.28317, 1.45127)$ | Sample | Success |

Total path length: 13.41366 m
Total time taken: 93 s
Accuracy of decision: 94.7%
| Waypoint | Location on Map | Decision | Result |
|---|---|---|---|
| $W_{3,0}$ | $(0.22162, 4.77094)$ | Sample | Success |
| $W_{3,1}$ | $(0.21404, 3.88575)$ | Sample | Success |
| $W_{3,2}$ | $(0.21513, 3.15650)$ | Not sample | Success |
| $W_{3,3}$ | $(0.22232, 2.43131)$ | Sample | Success |
| $W_{3,4}$ | $(0.21724, 1.69945)$ | Sample | Success |
| $W_{3,5}$ | $(0.21509, 0.97136)$ | Sample | Success |
| $W_{3,6}$ | $(0.21959, 0.24172)$ | Sample | Success |
| $W_{3,7}$ | $(0.98596, 0.24564)$ | Sample | Type-I error |
| $W_{3,8}$ | $(1.76468, 0.24177)$ | Sample | Type-I error |
| $W_{3,9}$ | $(2.53047, 0.24148)$ | Sample | Success |
| $W_{3,10}$ | $(2.53720, 0.970674)$ | Not sample | Success |
| $W_{3,11}$ | $(2.53619, 1.70709)$ | Not sample | Type-II error |
| $W_{3,12}$ | $(2.53145, 2.42771)$ | Not sample | Success |
| $W_{3,13}$ | $(3.30528, 2.43128)$ | Sample | Type-I error |
| $W_{3,14}$ | $(3.30776, 1.70053)$ | Not sample | Success |
| $W_{3,15}$ | $(4.07132, 1.70240)$ | Sample | Success |
| $W_{3,16}$ | $(4.76464, 1.71080)$ | Sample | Success |
| $W_{3,17}$ | $(4.75160, 0.978684)$ | Sample | Success |
| $W_{3,18}$ | $(4.07316, 0.97440)$ | Not sample | Type-II error |

Total path length: 17.76424 m
Total time taken: 321 s
Accuracy of exploration: 73.68%
| Waypoint | Location on Map | Decision | Result |
|---|---|---|---|
| $W_{4,0}$ | $(0.20626, 3.84880)$ | Sample | Success |
| $W_{4,1}$ | $(0.20332, 3.11855)$ | Sample | Success |
| $W_{4,2}$ | $(0.20837, 2.39423)$ | Not sample | Success |
| $W_{4,3}$ | $(0.20265, 1.65911)$ | Sample | Success |
| $W_{4,4}$ | $(0.20471, 0.93416)$ | Sample | Success |
| $W_{4,5}$ | $(0.20817, 0.209653)$ | Sample | Success |
| $W_{4,6}$ | $(0.93839, 0.208683)$ | Sample | Type-I error |
| $W_{4,7}$ | $(1.66028, 0.202513)$ | Sample | Success |
| $W_{4,8}$ | $(2.38826, 0.204983)$ | Sample | Success |
| $W_{4,9}$ | $(3.12414, 0.209353)$ | Not sample | Success |
| $W_{4,10}$ | $(3.84809, 0.208553)$ | Not sample | Type-II error |
| $W_{4,11}$ | $(4.57688, 0.205633)$ | Sample | Success |
| $W_{4,12}$ | $(5.30855, 0.205713)$ | Sample | Success |
| $W_{4,13}$ | $(5.30929, 0.932024)$ | Not sample | Type-II error |
| $W_{4,14}$ | $(5.30382, 1.659145)$ | Sample | Success |

Total path length: 10.19308 m
Total time taken: 137 s
Accuracy of exploration: 80.0%
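As a sanity check, the reported accuracies follow from the per-waypoint tallies in the four result tables above (successes out of total waypoints):

```python
# (successes, total waypoints) read off the four result tables
rooms = {1: (11, 15), 2: (18, 19), 3: (14, 19), 4: (12, 15)}

for room, (ok, total) in rooms.items():
    print(f"Room {room}: {100.0 * ok / total:.2f}%")
# Room 1: 73.33%, Room 2: 94.74%, Room 3: 73.68%, Room 4: 80.00%
```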
Share and Cite
Pathmakumar, T.; Elara, M.R.; Gómez, B.F.; Ramalingam, B. A Reinforcement Learning Based Dirt-Exploration for Cleaning-Auditing Robot. Sensors 2021, 21, 8331. https://doi.org/10.3390/s21248331