Indoor Scene Classification through Dual-Stream Deep Learning: A Framework for Improved Scene Understanding in Robotics
Abstract
1. Introduction
- A dual-stream deep learning framework is proposed for indoor scene classification.
- The local stream exploits a fully convolutional network (FCN) to extract fine-grained local features.
- The global stream modifies the original VGG-16 by integrating an atrous spatial pyramid pooling (ASPP) module to capture the global context of the scene.
- Experiments on challenging datasets demonstrate the effectiveness of the proposed dual-stream framework; a conceptual sketch of the framework follows this list.
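To make the architecture concrete, the sketch below shows one way the two streams could be wired together in PyTorch. The class name, the use of average pooling, and concatenation-based fusion are our illustrative assumptions rather than the authors' released code; the multi-scale fusion and ASPP details are elaborated in Section 3.

```python
# A minimal sketch of the dual-stream idea, not the authors' released code.
# Concatenation-based fusion and average pooling are our assumptions.
import torch
import torch.nn as nn
from torchvision.models import vgg16

class DualStreamClassifier(nn.Module):
    def __init__(self, num_classes: int = 5):
        super().__init__()
        # VGG-16 trunks (ImageNet-pretrained weights would normally be loaded).
        self.global_stream = vgg16(weights=None).features  # global scene context
        self.local_stream = vgg16(weights=None).features   # fine-grained local cues
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.classifier = nn.Linear(512 + 512, num_classes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        g = self.pool(self.global_stream(x)).flatten(1)   # [B, 512]
        l = self.pool(self.local_stream(x)).flatten(1)    # [B, 512]
        return self.classifier(torch.cat([g, l], dim=1))  # fused prediction

model = DualStreamClassifier()
logits = model(torch.randn(1, 3, 224, 224))  # -> shape [1, 5]
```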
2. Related Work
2.1. Traditional Methods
2.2. Deep Learning Models
3. Proposed Framework
3.1. Global Stream
3.1.1. Extracting Multi-Scale Features
3.1.2. Extracting Local Contextual Information
3.2. Local Stream
4. Experimental Results
4.1. Dataset
4.2. Evaluation Metrics
5. Ablation Study
- Configuration C1 uses the VGG-16 backbone in the global stream without multi-scale features or ASPP, and includes no local stream.
- C2 builds on C1 by adding multi-scale processing in the global stream; it still includes neither an ASPP module nor a local stream.
- C3 further augments the global stream with ASPP on top of multi-scale processing; the local stream is still absent.
- Configuration C4 omits the global stream and employs only a local stream, with VGG-16 as the backbone of both the FCN and CNN networks.
- Configuration C5 integrates both streams: the global stream uses the VGG-16 backbone with multi-scale fusion and ASPP modules, while the local stream uses the ResNet architecture for both the FCN and CNN networks.
- Configuration C6 adopts ResNet as the global-stream backbone with multi-scale fusion and ASPP modules and, like C5, uses ResNet in the local stream.
- Configuration C7 uses DenseNet in the global stream with multi-scale fusion and ASPP modules, and ResNet in the local stream.
- Configuration C8 uses EfficientNet as the global-stream backbone with multi-scale fusion and ASPP modules, and DenseNet in the local stream.
- Configuration C9 uses DenseNet as the global-stream backbone with multi-scale fusion and ASPP modules, and DenseNet in the local stream. These nine configurations are encoded compactly in the sketch after this list and summarized in the ablation table.
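As a reference, the configurations can be captured as a small lookup table. The encoding below is hypothetical (key names and structure are ours), mirroring the descriptions above; a `None` backbone means the corresponding stream is absent.

```python
# Hypothetical encoding of the ablation configurations C1-C9 (names are ours).
ABLATION_CONFIGS = {
    "C1": dict(global_backbone="vgg16",        multi_scale=False, aspp=False, local_backbone=None),
    "C2": dict(global_backbone="vgg16",        multi_scale=True,  aspp=False, local_backbone=None),
    "C3": dict(global_backbone="vgg16",        multi_scale=True,  aspp=True,  local_backbone=None),
    "C4": dict(global_backbone=None,           multi_scale=False, aspp=False, local_backbone="vgg16"),
    "C5": dict(global_backbone="vgg16",        multi_scale=True,  aspp=True,  local_backbone="resnet"),
    "C6": dict(global_backbone="resnet",       multi_scale=True,  aspp=True,  local_backbone="resnet"),
    "C7": dict(global_backbone="densenet",     multi_scale=True,  aspp=True,  local_backbone="resnet"),
    "C8": dict(global_backbone="efficientnet", multi_scale=True,  aspp=True,  local_backbone="densenet"),
    "C9": dict(global_backbone="densenet",     multi_scale=True,  aspp=True,  local_backbone="densenet"),
}
```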
6. Time Complexity
7. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
References
- Choe, S.; Seong, H.; Kim, E. Indoor place category recognition for a cleaning robot by fusing a probabilistic approach and deep learning. IEEE Trans. Cybern. 2021, 52, 7265–7276.
- Fragapane, G.; De Koster, R.; Sgarbossa, F.; Strandhagen, J.O. Planning and control of autonomous mobile robots for intralogistics: Literature review and research agenda. Eur. J. Oper. Res. 2021, 294, 405–426.
- Ozkil, A.G.; Fan, Z.; Dawids, S.; Aanes, H.; Kristensen, J.K.; Christensen, K.H. Service robots for hospitals: A case study of transportation tasks in a hospital. In Proceedings of the 2009 IEEE International Conference on Automation and Logistics, Shenyang, China, 5–7 August 2009; IEEE: Piscataway, NJ, USA, 2009; pp. 289–294.
- Kyrarini, M.; Lygerakis, F.; Rajavenkatanarayanan, A.; Sevastopoulos, C.; Nambiappan, H.R.; Chaitanya, K.K.; Babu, A.R.; Mathew, J.; Makedon, F. A survey of robots in healthcare. Technologies 2021, 9, 8.
- Bertacchini, F.; Bilotta, E.; Pantano, P. Shopping with a robotic companion. Comput. Hum. Behav. 2017, 77, 382–395.
- Garcia Ricardez, G.; Okada, S.; Koganti, N.; Yasuda, A.; Uriguen Eljuri, P.; Sano, T.; Yang, P.C.; El Hafi, L.; Yamamoto, M.; Takamatsu, J.; et al. Restock and straightening system for retail automation using compliant and mobile manipulation. Adv. Robot. 2020, 34, 235–249.
- Javaid, M.; Haleem, A.; Singh, R.P.; Suman, R. Substantial capabilities of robotics in enhancing Industry 4.0 implementation. Cogn. Robot. 2021, 1, 58–75.
- Ma, S.; Jiang, H.; Han, M.; Xie, J.; Li, C. Research on automatic parking systems based on parking scene recognition. IEEE Access 2017, 5, 21901–21917.
- Ni, J.; Shen, K.; Chen, Y.; Cao, W.; Yang, S.X. An improved deep network-based scene classification method for self-driving cars. IEEE Trans. Instrum. Meas. 2022, 71, 1–14.
- Zhou, H.; Zhou, S. Scene categorization towards urban tunnel traffic by image quality assessment. J. Vis. Commun. Image Represent. 2019, 65, 102655.
- Du, H.; Wang, W.; Wang, X.; Wang, Y. Autonomous landing scene recognition based on transfer learning for drones. J. Syst. Eng. Electron. 2023, 34, 28–35.
- O’Mahony, N.; Campbell, S.; Krpalkova, L.; Riordan, D.; Walsh, J.; Murphy, A.; Ryan, C. Deep learning for visual navigation of unmanned ground vehicles: A review. In Proceedings of the 2018 29th Irish Signals and Systems Conference (ISSC), Belfast, UK, 21–22 June 2018; IEEE: Piscataway, NJ, USA, 2018; pp. 1–6.
- Ekici, M.; Seçkin, A.Ç.; Özek, A.; Karpuz, C. Warehouse drone: Indoor positioning and product counter with virtual fiducial markers. Drones 2022, 7, 3.
- Asadi, K.; Suresh, A.K.; Ender, A.; Gotad, S.; Maniyar, S.; Anand, S.; Noghabaei, M.; Han, K.; Lobaton, E.; Wu, T. An integrated UGV-UAV system for construction site data collection. Autom. Constr. 2020, 112, 103068.
- Wijayathunga, L.; Rassau, A.; Chai, D. Challenges and solutions for autonomous ground robot scene understanding and navigation in unstructured outdoor environments: A review. Appl. Sci. 2023, 13, 9877.
- Tagarakis, A.C.; Kalaitzidis, D.; Filippou, E.; Benos, L.; Bochtis, D. 3D scenery construction of agricultural environments for robotics awareness. In Information and Communication Technologies for Agriculture—Theme III: Decision; Springer: Berlin/Heidelberg, Germany, 2022; pp. 125–142.
- Zhou, L.; Zhou, Z.; Hu, D. Scene classification using a multi-resolution bag-of-features model. Pattern Recognit. 2013, 46, 424–433.
- Khan, N.Y.; McCane, B.; Wyvill, G. SIFT and SURF performance evaluation against various image deformations on benchmark dataset. In Proceedings of the 2011 International Conference on Digital Image Computing: Techniques and Applications, Noosa, QLD, Australia, 6–8 December 2011; IEEE: Piscataway, NJ, USA, 2011; pp. 501–506.
- Ayers, B.; Boutell, M. Home interior classification using SIFT keypoint histograms. In Proceedings of the 2007 IEEE Conference on Computer Vision and Pattern Recognition, Minneapolis, MN, USA, 17–22 June 2007; IEEE: Piscataway, NJ, USA, 2007; pp. 1–6.
- Bay, H.; Ess, A.; Tuytelaars, T.; Van Gool, L. Speeded-up robust features (SURF). Comput. Vis. Image Underst. 2008, 110, 346–359.
- Giveki, D. Scale-space multi-view bag of words for scene categorization. Multimed. Tools Appl. 2021, 80, 1223–1245.
- Li, T.; Mei, T.; Kweon, I.S.; Hua, X.S. Contextual bag-of-words for visual categorization. IEEE Trans. Circuits Syst. Video Technol. 2010, 21, 381–392.
- Ergul, E.; Arica, N. Scene classification using spatial pyramid of latent topics. In Proceedings of the 2010 20th International Conference on Pattern Recognition, Istanbul, Turkey, 23–26 August 2010; IEEE: Piscataway, NJ, USA, 2010; pp. 3603–3606.
- Xie, L.; Lee, F.; Liu, L.; Yin, Z.; Yan, Y.; Wang, W.; Zhao, J.; Chen, Q. Improved spatial pyramid matching for scene recognition. Pattern Recognit. 2018, 82, 118–129.
- Gu, G.; Li, F.; Zhao, Y.; Zhu, Z. Scene classification based on spatial pyramid representation by superpixel lattices and contextual visual features. Opt. Eng. 2012, 51, 017201.
- Labinghisa, B.A.; Lee, D.M. Indoor localization system using deep learning based scene recognition. Multimed. Tools Appl. 2022, 81, 28405–28429.
- Yee, P.S.; Lim, K.M.; Lee, C.P. DeepScene: Scene classification via convolutional neural network with spatial pyramid pooling. Expert Syst. Appl. 2022, 193, 116382.
- Wozniak, P.; Afrisal, H.; Esparza, R.G.; Kwolek, B. Scene recognition for indoor localization of mobile robots using deep CNN. In Proceedings of the Computer Vision and Graphics: International Conference, ICCVG 2018, Warsaw, Poland, 17–19 September 2018; Springer: Berlin/Heidelberg, Germany, 2018; pp. 137–147.
- Soroush, R.; Baleghi, Y. NIR/RGB image fusion for scene classification using deep neural networks. Vis. Comput. 2022, 39, 2725–2739.
- Heikel, E.; Espinosa-Leal, L. Indoor scene recognition via object detection and TF-IDF. J. Imaging 2022, 8, 209.
- Biswas, M.; Buckchash, H.; Prasad, D.K. pNNCLR: Stochastic Pseudo Neighborhoods for Contrastive Learning based Unsupervised Representation Learning Problems. arXiv 2023, arXiv:2308.06983.
- Swadzba, A.; Wachsmuth, S. Indoor scene classification using combined 3D and gist features. In Asian Conference on Computer Vision; Springer: Berlin/Heidelberg, Germany, 2010; pp. 201–215.
- Swadzba, A.; Wachsmuth, S. A detailed analysis of a new 3D spatial feature vector for indoor scene classification. Robot. Auton. Syst. 2014, 62, 646–662.
- Li, X.; Guo, Y. Multi-level adaptive active learning for scene classification. In Proceedings of the Computer Vision–ECCV 2014: 13th European Conference, Zurich, Switzerland, 6–12 September 2014; Springer: Berlin/Heidelberg, Germany, 2014; pp. 234–249.
- Yu, J.; Tao, D.; Rui, Y.; Cheng, J. Pairwise constraints based multiview features fusion for scene classification. Pattern Recognit. 2013, 46, 483–496.
- Choi, W.; Chao, Y.W.; Pantofaru, C.; Savarese, S. Indoor scene understanding with geometric and semantic contexts. Int. J. Comput. Vis. 2015, 112, 204–220.
- Han, Y.; Liu, G. Efficient learning of sample-specific discriminative features for scene classification. IEEE Signal Process. Lett. 2011, 18, 683–686.
- Zuo, Z.; Wang, G.; Shuai, B.; Zhao, L.; Yang, Q.; Jiang, X. Learning discriminative and shareable features for scene classification. In Proceedings of the Computer Vision–ECCV 2014: 13th European Conference, Zurich, Switzerland, 6–12 September 2014; Springer: Berlin/Heidelberg, Germany, 2014; pp. 552–568.
- Espinace, P.; Kollar, T.; Soto, A.; Roy, N. Indoor scene recognition through object detection. In Proceedings of the 2010 IEEE International Conference on Robotics and Automation, Anchorage, AK, USA, 3–7 May 2010; IEEE: Piscataway, NJ, USA, 2010; pp. 1406–1413.
- Margolin, R.; Zelnik-Manor, L.; Tal, A. OTC: A novel local descriptor for scene classification. In Proceedings of the Computer Vision–ECCV 2014: 13th European Conference, Zurich, Switzerland, 6–12 September 2014; Springer: Berlin/Heidelberg, Germany, 2014; pp. 377–391.
- Bai, S. Growing random forest on deep convolutional neural networks for scene categorization. Expert Syst. Appl. 2017, 71, 279–287.
- Khan, S.H.; Hayat, M.; Porikli, F. Scene categorization with spectral features. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 5638–5648.
- Pereira, R.; Gonçalves, N.; Garrote, L.; Barros, T.; Lopes, A.; Nunes, U.J. Deep-learning based global and semantic feature fusion for indoor scene classification. In Proceedings of the 2020 IEEE International Conference on Autonomous Robot Systems and Competitions (ICARSC), Ponta Delgada, Portugal, 15–17 April 2020; IEEE: Piscataway, NJ, USA, 2020; pp. 67–73.
- Pereira, R.; Garrote, L.; Barros, T.; Lopes, A.; Nunes, U.J. A deep learning-based indoor scene classification approach enhanced with inter-object distance semantic features. In Proceedings of the 2021 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Prague, Czech Republic, 27 September–1 October 2021; IEEE: Piscataway, NJ, USA, 2021; pp. 32–38.
- Seong, H.; Hyun, J.; Kim, E. FOSNet: An end-to-end trainable deep neural network for scene recognition. IEEE Access 2020, 8, 82066–82077.
- Hayat, M.; Khan, S.H.; Bennamoun, M.; An, S. A spatial layout and scale invariant feature representation for indoor scene classification. IEEE Trans. Image Process. 2016, 25, 4829–4841.
- Guo, W.; Wu, R.; Chen, Y.; Zhu, X. Deep learning scene recognition method based on localization enhancement. Sensors 2018, 18, 3376.
- Basu, A.; Petropoulakis, L.; Di Caterina, G.; Soraghan, J. Indoor home scene recognition using capsule neural networks. Procedia Comput. Sci. 2020, 167, 440–448.
- Sun, N.; Zhu, X.; Liu, J.; Han, G. Indoor scene recognition based on deep learning and sparse representation. In Proceedings of the 2017 13th International Conference on Natural Computation, Fuzzy Systems and Knowledge Discovery (ICNC-FSKD), Guilin, China, 29–31 July 2017; IEEE: Piscataway, NJ, USA, 2017; pp. 844–849.
- He, K.; Zhang, X.; Ren, S.; Sun, J. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 26 June–1 July 2016; pp. 770–778.
- Huang, G.; Liu, Z.; Van Der Maaten, L.; Weinberger, K.Q. Densely connected convolutional networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 4700–4708.
- Simonyan, K.; Zisserman, A. Very deep convolutional networks for large-scale image recognition. arXiv 2014, arXiv:1409.1556.
- Khan, S.D.; Basalamah, S. Multi-scale person localization with multi-stage deep sequential framework. Int. J. Comput. Intell. Syst. 2021, 14, 1217–1228.
- Zhang, S.; He, G.; Chen, H.B.; Jing, N.; Wang, Q. Scale adaptive proposal network for object detection in remote sensing images. IEEE Geosci. Remote Sens. Lett. 2019, 16, 864–868.
- Chen, L.C.; Papandreou, G.; Kokkinos, I.; Murphy, K.; Yuille, A.L. DeepLab: Semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected CRFs. IEEE Trans. Pattern Anal. Mach. Intell. 2017, 40, 834–848.
- Paszke, A.; Gross, S.; Chintala, S.; Chanan, G.; Yang, E.; DeVito, Z.; Lin, Z.; Desmaison, A.; Antiga, L.; Lerer, A. Automatic Differentiation in PyTorch. 2017. Available online: https://openreview.net/pdf/25b8eee6c373d48b84e5e9c6e10e7cbbbce4ac73.pdf (accessed on 23 March 2024).
- Kingma, D.P.; Ba, J. Adam: A method for stochastic optimization. arXiv 2014, arXiv:1412.6980.
- Othman, K.; Rad, A. SRIN: A new dataset for social robot indoor navigation. Glob. J. Eng. Sci. 2020, 4, 1–6.
- Krizhevsky, A.; Sutskever, I.; Hinton, G.E. ImageNet classification with deep convolutional neural networks. In Proceedings of the Advances in Neural Information Processing Systems, Lake Tahoe, NV, USA, 3–6 December 2012; Volume 25.
- Tan, M.; Le, Q. EfficientNet: Rethinking model scaling for convolutional neural networks. In Proceedings of the International Conference on Machine Learning, PMLR, Long Beach, CA, USA, 9–15 June 2019; pp. 6105–6114.
- Zhang, X.; Zhou, X.; Lin, M.; Sun, J. ShuffleNet: An extremely efficient convolutional neural network for mobile devices. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–22 June 2018; pp. 6848–6856.
- Howard, A.G.; Zhu, M.; Chen, B.; Kalenichenko, D.; Wang, W.; Weyand, T.; Andreetto, M.; Adam, H. MobileNets: Efficient convolutional neural networks for mobile vision applications. arXiv 2017, arXiv:1704.04861.
Global Stream

| Layer | No. of Layers | Input Size | Output Size | Kernel Size | # of Channels |
|---|---|---|---|---|---|
| Input | — | 224 × 224 × 3 | 224 × 224 × 3 | — | — |
| Conv1_x | 2 × Conv | 224 × 224 | 224 × 224 | 3 × 3 | 64 |
| Maxpool_1 | 1 × Maxpooling | 224 × 224 | 112 × 112 | 2 × 2 | 64 |
| Conv2_x | 2 × Conv | 112 × 112 | 112 × 112 | 3 × 3 | 128 |
| Maxpool_2 | 1 × Maxpooling | 112 × 112 | 56 × 56 | 2 × 2 | 128 |
| Conv3_x | 3 × Conv | 56 × 56 | 56 × 56 | 3 × 3 | 256 |
| Maxpool_3 | 1 × Maxpooling | 56 × 56 | 28 × 28 | 2 × 2 | 256 |
| Conv4_x | 3 × Conv | 28 × 28 | 28 × 28 | 3 × 3 | 512 |
| Maxpool_4 | 1 × Maxpooling | 28 × 28 | 14 × 14 | 2 × 2 | 512 |
| Conv5_x | 3 × Conv | 14 × 14 | 14 × 14 | 3 × 3 | 512 |
| Fusion module | 1 × Conv | Conv3_x (56 × 56) | 56 × 56 | 1 × 1 | 512 |
| | 1 × De-Conv | Conv4_x (28 × 28) | 56 × 56 | 2 × 2 | 512 |
| | 2 × De-Conv | Conv5_x (14 × 14) | 56 × 56 | 2 × 2 | 512 |
| ASPP | 4 × Conv (d = 2, 4, 8, 16) | Fusion output (56 × 56) | 56 × 56 | 3 × 3 | 512 |
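The fusion module and ASPP head in this table can be realized as below. The sketch follows the table's shapes exactly: Conv3_x is projected by a 1 × 1 convolution, Conv4_x is upsampled once and Conv5_x twice by 2 × 2 deconvolutions, all to 56 × 56 × 512, after which four 3 × 3 atrous convolutions with dilation rates 2, 4, 8, and 16 run in parallel. Combining branches by element-wise summation is our assumption, as the table does not specify the fusion operation.

```python
# Sketch of the fusion module + ASPP head, sized to match the table above.
# Module names and the summation-based fusion are our assumptions.
import torch
import torch.nn as nn

class FusionASPP(nn.Module):
    def __init__(self):
        super().__init__()
        # Bring Conv3_x/Conv4_x/Conv5_x outputs to a common 56 x 56 x 512 shape.
        self.proj3 = nn.Conv2d(256, 512, kernel_size=1)                   # 56 -> 56
        self.up4 = nn.ConvTranspose2d(512, 512, kernel_size=2, stride=2)  # 28 -> 56
        self.up5 = nn.Sequential(                                         # 14 -> 56
            nn.ConvTranspose2d(512, 512, kernel_size=2, stride=2),
            nn.ConvTranspose2d(512, 512, kernel_size=2, stride=2),
        )
        # ASPP: four parallel 3x3 atrous convolutions, dilation d = 2, 4, 8, 16.
        self.aspp = nn.ModuleList([
            nn.Conv2d(512, 512, kernel_size=3, dilation=d, padding=d)
            for d in (2, 4, 8, 16)
        ])

    def forward(self, c3, c4, c5):
        fused = self.proj3(c3) + self.up4(c4) + self.up5(c5)  # 56 x 56 x 512
        return sum(branch(fused) for branch in self.aspp)     # multi-rate context

# Quick shape check with dummy feature maps matching the table.
c3 = torch.randn(1, 256, 56, 56)
c4 = torch.randn(1, 512, 28, 28)
c5 = torch.randn(1, 512, 14, 14)
print(FusionASPP()(c3, c4, c5).shape)  # torch.Size([1, 512, 56, 56])
```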
Local Stream

| Layer | No. of Layers | Input Size | Output Size | Kernel Size | # of Channels |
|---|---|---|---|---|---|
| Input | — | S1: 56 × 56 × 3; S2: 112 × 112 × 3; S3: 224 × 224 × 3 | S1: 56 × 56 × 3; S2: 112 × 112 × 3; S3: 224 × 224 × 3 | — | — |
| Conv1_x | 2 × Conv | S1: 56 × 56; S2: 112 × 112; S3: 224 × 224 | S1: 56 × 56; S2: 112 × 112; S3: 224 × 224 | 3 × 3 | 64 |
| Maxpool_1 | 1 × Maxpooling | S1: 56 × 56; S2: 112 × 112; S3: 224 × 224 | S1: 28 × 28; S2: 56 × 56; S3: 112 × 112 | 2 × 2 | 64 |
| Conv2_x | 2 × Conv | S1: 28 × 28; S2: 56 × 56; S3: 112 × 112 | S1: 28 × 28; S2: 56 × 56; S3: 112 × 112 | 3 × 3 | 128 |
| Maxpool_2 | 1 × Maxpooling | S1: 28 × 28; S2: 56 × 56; S3: 112 × 112 | S1: 14 × 14; S2: 28 × 28; S3: 56 × 56 | 2 × 2 | 128 |
| Conv3_x | 3 × Conv | S1: 14 × 14; S2: 28 × 28; S3: 56 × 56 | S1: 14 × 14; S2: 28 × 28; S3: 56 × 56 | 3 × 3 | 256 |
| Maxpool_3 | 1 × Maxpooling | S1: 14 × 14; S2: 28 × 28; S3: 56 × 56 | S1: 7 × 7; S2: 14 × 14; S3: 28 × 28 | 2 × 2 | 256 |
| Conv4_x | 3 × Conv | S1: 7 × 7; S2: 14 × 14; S3: 28 × 28 | S1: 7 × 7; S2: 14 × 14; S3: 28 × 28 | 3 × 3 | 512 |
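The local stream processes the input at three scales (S1–S3), each ending at Conv4_x as the table shows. A minimal sketch, assuming a single VGG-16 trunk shared across the three scales (the sharing and the final concatenation are our simplifications):

```python
# Sketch of the local stream's three-scale processing (S1-S3 per the table).
# Sharing one trunk across scales is our assumption for brevity.
import torch
import torch.nn as nn
import torch.nn.functional as F
from torchvision.models import vgg16

class LocalStream(nn.Module):
    def __init__(self):
        super().__init__()
        # VGG-16 Conv1_x through Conv4_x (stop before Maxpool_4), as in the table.
        self.trunk = vgg16(weights=None).features[:23]
        self.pool = nn.AdaptiveAvgPool2d(1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        feats = []
        for size in (56, 112, 224):  # scales S1, S2, S3
            xs = F.interpolate(x, size=(size, size), mode="bilinear",
                               align_corners=False)
            feats.append(self.pool(self.trunk(xs)).flatten(1))  # 512-d per scale
        return torch.cat(feats, dim=1)  # 1536-d multi-scale local descriptor

print(LocalStream()(torch.randn(1, 3, 224, 224)).shape)  # torch.Size([1, 1536])
```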
| Class Name | Precision | Recall | F1-Score |
|---|---|---|---|
| Bathroom | 1.00 | 1.00 | 1.00 |
| Bedroom | 1.00 | 1.00 | 1.00 |
| Dining room | 1.00 | 1.00 | 1.00 |
| Kitchen | 1.00 | 0.91 | 0.95 |
| Living room | 1.00 | 0.93 | 0.96 |
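These per-class scores follow the standard precision/recall/F1 definitions. A short snippet showing how such a table can be reproduced from test-set predictions; the use of scikit-learn is our choice, and `y_true`/`y_pred` are placeholders for the real ground-truth and predicted class indices:

```python
# Reproducing per-class precision/recall/F1 (standard computation; sklearn
# usage is ours, not from the paper).
from sklearn.metrics import classification_report

CLASSES = ["Bathroom", "Bedroom", "Dining room", "Kitchen", "Living room"]
y_true = [0, 1, 2, 3, 4, 3, 4]  # ground-truth class indices (placeholder data)
y_pred = [0, 1, 2, 3, 4, 3, 3]  # model predictions (placeholder data)
print(classification_report(y_true, y_pred, target_names=CLASSES, digits=2))
```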
| Methods | Bathroom (P / R / F1) | Bedroom (P / R / F1) | Dining Room (P / R / F1) | Kitchen (P / R / F1) | Living Room (P / R / F1) |
|---|---|---|---|---|---|
| AlexNet | 70.4 / 65.7 / 68.0 | 65.7 / 65.9 / 65.8 | 68.2 / 65.7 / 67.0 | 56.3 / 61.1 / 58.6 | 59.4 / 59.7 / 59.5 |
| VGG-16 | 75.7 / 74.9 / 75.3 | 75.3 / 71.3 / 73.3 | 73.1 / 72.0 / 72.5 | 67.8 / 65.4 / 66.6 | 68.4 / 64.7 / 66.5 |
| ResNet-50 | 89.0 / 76.4 / 82.2 | 84.4 / 76.1 / 80.0 | 81.2 / 79.4 / 80.3 | 79.3 / 75.7 / 77.4 | 79.5 / 78.6 / 79.0 |
| EfficientNet | 95.8 / 93.3 / 94.5 | 96.2 / 92.7 / 94.4 | 94.1 / 94.5 / 94.3 | 93.2 / 87.7 / 90.4 | 92.0 / 89.5 / 90.7 |
| DenseNet-121 | 90.3 / 92.7 / 91.5 | 90.6 / 89.6 / 90.1 | 93.6 / 89.8 / 91.6 | 84.4 / 85.9 / 85.1 | 87.5 / 86.0 / 86.7 |
| ShuffleNet | 74.7 / 72.3 / 73.5 | 73.0 / 72.1 / 72.5 | 69.7 / 75.4 / 72.4 | 64.6 / 68.0 / 66.2 | 65.2 / 64.8 / 65.0 |
| MobileNet | 74.7 / 70.5 / 72.5 | 73.9 / 69.4 / 71.5 | 72.2 / 69.0 / 70.6 | 65.9 / 62.9 / 64.3 | 64.8 / 60.2 / 62.4 |
| ResNet-152 | 85.2 / 84.3 / 84.7 | 84.2 / 82.4 / 83.3 | 88.5 / 84.3 / 86.4 | 82.2 / 79.3 / 80.7 | 86.9 / 82.3 / 84.5 |
| Proposed | 100.0 / 100.0 / 100.0 | 100.0 / 100.0 / 100.0 | 100.0 / 100.0 / 100.0 | 100.0 / 91.0 / 95.0 | 100.0 / 93.0 / 96.0 |
| Config | Global Backbone | Multi-Scale | ASPP | Local FCN Backbone | Local CNN Backbone | Avg P | Avg R | Avg F1 |
|---|---|---|---|---|---|---|---|---|
| C1 | VGG-16 | No | No | — | — | 72.10 | 69.60 | 70.82 |
| C2 | VGG-16 | Yes | No | — | — | 79.97 | 76.42 | 78.15 |
| C3 | VGG-16 | Yes | Yes | — | — | 86.64 | 83.82 | 85.20 |
| C4 | — | — | — | VGG-16 | VGG-16 | 80.52 | 79.73 | 80.12 |
| C5 | VGG-16 | Yes | Yes | ResNet | ResNet | 100.00 | 96.80 | 98.37 |
| C6 | ResNet | Yes | Yes | ResNet | ResNet | 99.07 | 98.14 | 98.60 |
| C7 | DenseNet | Yes | Yes | ResNet | ResNet | 99.64 | 97.76 | 98.69 |
| C8 | EfficientNet | Yes | Yes | DenseNet | DenseNet | 98.02 | 97.15 | 97.58 |
| C9 | DenseNet | Yes | Yes | DenseNet | DenseNet | 99.89 | 98.14 | 99.00 |
| Method | Inference Time (s) | Avg P | Avg R | Avg F1 |
|---|---|---|---|---|
| AlexNet | 1.27 | 64.0 | 63.6 | 63.8 |
| VGG-16 | 3.14 | 72.1 | 69.6 | 70.8 |
| ResNet-50 | 2.56 | 82.7 | 77.2 | 79.8 |
| DenseNet-121 | 2.37 | 89.3 | 88.8 | 89.0 |
| Proposed | 6.52 | 100.0 | 96.8 | 98.2 |
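Reported inference times depend heavily on the measurement protocol. The harness below is one simple way to measure average per-image inference time; the warm-up runs and CUDA synchronization are our additions, as the paper's exact timing procedure is not given here.

```python
# A simple per-image inference-timing harness (our measurement sketch, not
# the paper's protocol). Warm-up and CUDA sync avoid common timing pitfalls.
import time
import torch

@torch.no_grad()
def average_inference_time(model, image, runs: int = 100) -> float:
    model.eval()
    for _ in range(5):               # warm-up iterations (not timed)
        model(image)
    if torch.cuda.is_available():
        torch.cuda.synchronize()     # flush queued GPU work before timing
    start = time.perf_counter()
    for _ in range(runs):
        model(image)
    if torch.cuda.is_available():
        torch.cuda.synchronize()
    return (time.perf_counter() - start) / runs

if __name__ == "__main__":
    from torchvision.models import vgg16
    model = vgg16(weights=None)      # stand-in; any nn.Module works
    image = torch.randn(1, 3, 224, 224)
    print(f"{average_inference_time(model, image, runs=10):.4f} s/image")
```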
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Citation: Khan, S.D.; Othman, K.M. Indoor Scene Classification through Dual-Stream Deep Learning: A Framework for Improved Scene Understanding in Robotics. Computers 2024, 13, 121. https://doi.org/10.3390/computers13050121