Scene Classification Method Based on Multi-Scale Convolutional Neural Network with Long Short-Term Memory and Whale Optimization Algorithm
Abstract
1. Introduction
- (1) This paper proposes a network for indoor scene recognition from 2D LiDAR data. It combines a multi-scale CNN with an LSTM network, which classifies distance features and addresses the long-term dependence problem of neural networks.
- (2) The whale optimization algorithm (WOA) is used to automatically optimize the initial learning rate, the regularization parameters, and the parameters of the LSTM hidden layer. This significantly reduces the time spent on manual parameter tuning and achieves promising results in scene classification.
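The hyperparameter search in contribution (2) can be illustrated with a minimal WOA sketch in plain Python. It is not the authors' implementation. It uses the standard WOA moves (encircling the best whale, random search, and the spiral update) with one scalar coefficient pair per whale as a simplification. The search space (log10 of the learning rate, log10 of an L2 coefficient, number of hidden units), the bounds, and the function `toy_validation_loss` are assumptions made for illustration; in practice the objective would be the validation error of the trained network.

```python
import math
import random


def woa_minimize(objective, bounds, n_whales=20, n_iter=100, b=1.0, seed=0):
    """Minimal Whale Optimization Algorithm for minimization.

    Simplification (assumption): A and C are scalars drawn once per whale,
    not per dimension.
    """
    rng = random.Random(seed)

    def clip(x):
        # Keep every coordinate inside its (low, high) bound.
        return [min(max(v, lo), hi) for v, (lo, hi) in zip(x, bounds)]

    whales = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_whales)]
    best = min(whales, key=objective)[:]
    best_fit = objective(best)

    for t in range(n_iter):
        a = 2.0 * (1.0 - t / n_iter)  # decreases linearly from 2 toward 0
        for i, x in enumerate(whales):
            A = 2.0 * a * rng.random() - a
            C = 2.0 * rng.random()
            if rng.random() < 0.5:
                # |A| < 1: encircle the best whale; otherwise explore
                # around a randomly chosen whale.
                ref = best if abs(A) < 1.0 else whales[rng.randrange(n_whales)]
                new = [r - A * abs(C * r - v) for r, v in zip(ref, x)]
            else:
                # Spiral update around the best whale.
                ell = rng.uniform(-1.0, 1.0)
                spiral = math.exp(b * ell) * math.cos(2.0 * math.pi * ell)
                new = [abs(r - v) * spiral + r for r, v in zip(best, x)]
            whales[i] = clip(new)
        # Update the best solution after the whole population has moved.
        for x in whales:
            fit = objective(x)
            if fit < best_fit:
                best, best_fit = x[:], fit
    return best, best_fit


def toy_validation_loss(params):
    """Hypothetical stand-in for the network's validation error."""
    log_lr, log_l2, hidden = params
    return (log_lr + 3.0) ** 2 + (log_l2 + 4.0) ** 2 + ((hidden - 128.0) / 64.0) ** 2


# Assumed search space: log10(learning rate), log10(L2 coefficient), hidden units.
BOUNDS = [(-5.0, -1.0), (-6.0, -2.0), (16.0, 256.0)]
best_params, best_loss = woa_minimize(toy_validation_loss, BOUNDS)
print(best_params, best_loss)
```

Searching the learning rate and the regularization coefficient on a log scale is a design choice of this sketch, not something stated in the paper; the number of hidden units would be rounded to an integer before building the network.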
2. Scene Recognition Method Based on CNN for 2D LiDAR
2.1. LiDAR Data Preprocessing
2.2. Network with CNN and LSTM
2.2.1. Multi-Scale CNN Network
2.2.2. LSTM Algorithm
2.2.3. WOA Algorithm
3. Results
3.1. Results on Laboratory Datasets
3.1.1. Evaluation Indexes
3.1.2. Ablation Experiments
3.1.3. Experiments of WOA Algorithm
3.2. Results on Public Dataset
3.2.1. Experiments Validation
3.2.2. Comparison of Advanced Algorithms
4. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest

Metric | CNN1 | CNN1 + CNN2 | CNN1 + LSTM | CNN1 + CNN2 + LSTM |
---|---|---|---|---|
Accuracy (%) | 92.44 | 95.56 | 98.30 | 98.91 |
Macro-F1 | 0.9102 | 0.9525 | 0.9804 | 0.9841 |
T_total (s) | 114.00 | 147.00 | 117.00 | 152.40 |
T_prediction (ms) | 0.1036 | 0.1196 | 0.1078 | 0.1236 |

Metric | Room 1 | Room 2 | Room 3 | Room 4 | Room 5 | Room 6 | Room 7 | Room 8 |
---|---|---|---|---|---|---|---|---|
Accuracy (%) | 98.87 | | | | | | | |
Precision (%) | 98.85 | 99.15 | 98.86 | 97.64 | 96.79 | 99.25 | 98.37 | 97.66 |
Recall (%) | 99.42 | 98.64 | 97.74 | 96.12 | 95.57 | 99.25 | 97.73 | 100.00 |
F1 | 0.9899 | 0.9855 | 0.9915 | 0.9842 | 0.9688 | 0.9888 | 0.9951 | 0.9966 |
Macro-F1 | 0.9876 | | | | | | | |
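
The per-room and overall indexes in these tables follow the usual definitions: precision, recall, and F1 are computed per class, and macro-F1 is the unweighted mean of the per-class F1 scores. The sketch below shows that computation in plain Python. The confusion matrix `toy_confusion` is hypothetical and only illustrates the formulas; the list `table_f1` copies the per-room F1 row of the eight-room table above, whose mean is approximately the reported macro-F1 of 0.9876.

```python
def per_class_metrics(confusion):
    """Return (precision, recall, f1) lists from a square confusion matrix.

    confusion[i][j] counts samples of true class i predicted as class j.
    """
    n = len(confusion)
    precisions, recalls, f1s = [], [], []
    for k in range(n):
        tp = confusion[k][k]
        predicted_k = sum(confusion[i][k] for i in range(n))  # column sum
        actual_k = sum(confusion[k])                          # row sum
        p = tp / predicted_k if predicted_k else 0.0
        r = tp / actual_k if actual_k else 0.0
        f1 = 2 * p * r / (p + r) if (p + r) else 0.0
        precisions.append(p)
        recalls.append(r)
        f1s.append(f1)
    return precisions, recalls, f1s


def macro_f1(f1_scores):
    """Unweighted mean of per-class F1 scores."""
    return sum(f1_scores) / len(f1_scores)


# Hypothetical 3-class confusion matrix, for illustration only.
toy_confusion = [[8, 1, 1], [0, 9, 1], [2, 0, 8]]
toy_p, toy_r, toy_f1 = per_class_metrics(toy_confusion)
toy_accuracy = sum(toy_confusion[k][k] for k in range(3)) / sum(map(sum, toy_confusion))
print(toy_accuracy, macro_f1(toy_f1))

# Per-room F1 values copied from the eight-room table.
table_f1 = [0.9899, 0.9855, 0.9915, 0.9842, 0.9688, 0.9888, 0.9951, 0.9966]
table_macro_f1 = macro_f1(table_f1)
print(table_macro_f1)
```

Because every class has equal weight in the mean, macro-F1 is more sensitive than accuracy to rooms that are recognized poorly, even when those rooms contribute few samples.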

Metric | Room 1 | Room 2 | Room 3 | Room 4 | Room 5 | Room 6 | Room 7 | Room 8 | Room 9 | Room 10 | Room 11 | Room 12 |
---|---|---|---|---|---|---|---|---|---|---|---|---|
Accuracy (%) | 93.27 | | | | | | | | | | | |
Precision (%) | 80.00 | 100.00 | 92.31 | 83.87 | 93.05 | 95.65 | 84.62 | 81.82 | 93.02 | 94.74 | 95.45 | 97.44 |
Recall (%) | 100.00 | 84.62 | 85.71 | 86.67 | 91.30 | 95.65 | 91.67 | 69.23 | 97.50 | 90.00 | 94.42 | 100.00 |
F1 | 0.8889 | 0.9167 | 0.8889 | 0.8525 | 0.9217 | 0.9565 | 0.8800 | 0.7500 | 0.9521 | 0.9231 | 0.9493 | 0.9870 |
Macro-F1 | 0.9080 | | | | | | | | | | | |

Metric | Room 1 | Room 2 | Room 3 | Room 4 | Room 5 | Room 6 | Room 7 | Room 8 | Room 9 | Room 10 | Room 11 | Room 12 |
---|---|---|---|---|---|---|---|---|---|---|---|---|
Accuracy (%) | 94.35 | | | | | | | | | | | |
Precision (%) | 91.67 | 86.84 | 100.00 | 94.57 | 93.06 | 91.04 | 90.91 | 100.00 | 91.60 | 91.94 | 96.86 | 97.93 |
Recall (%) | 91.67 | 85.71 | 93.48 | 96.67 | 97.10 | 87.14 | 83.33 | 89.74 | 97.56 | 96.61 | 94.87 | 99.47 |
F1 | 0.9167 | 0.8627 | 0.9500 | 0.9560 | 0.9504 | 0.8905 | 0.8696 | 0.9459 | 0.9449 | 0.9421 | 0.9585 | 0.9869 |
Macro-F1 | 0.9312 | | | | | | | | | | | |

© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Ran, Y.; Xu, X.; Luo, M.; Yang, J.; Chen, Z. Scene Classification Method Based on Multi-Scale Convolutional Neural Network with Long Short-Term Memory and Whale Optimization Algorithm. Remote Sens. 2024, 16, 174. https://doi.org/10.3390/rs16010174