Urban Sound Recognition in Smart Cities Using an IoT–Fog Computing Framework and Deep Learning Models: A Performance Comparison
Abstract
1. Introduction
2. IoT and Smart City
2.1. Internet of Things (IoT)
2.2. Smart City
3. Material and Methodology
3.1. Data Set
3.2. Methodology
4. Experimental Results
4.1. CNN Model Performance Results
4.2. LSTM Model Performance Results
4.3. Dense Model Performance Results
5. Discussion
6. Conclusions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References

CNN model: per-class precision, recall, F1-score, and support.

| Class | Precision | Recall | F1-Score | Support | 
|---|---|---|---|---|
| Air_conditioner | 0.93 | 0.93 | 0.93 | 195 | 
| Car_horn | 0.95 | 0.95 | 0.95 | 91 | 
| Children_playing | 0.81 | 0.84 | 0.83 | 205 | 
| Dog_bark | 0.84 | 0.82 | 0.83 | 182 | 
| Drilling | 0.86 | 0.92 | 0.89 | 202 | 
| Engine_idling | 0.93 | 0.94 | 0.94 | 216 | 
| Gun_shot | 0.97 | 0.85 | 0.91 | 87 | 
| Jackhammer | 0.95 | 0.92 | 0.93 | 187 | 
| Siren | 0.97 | 0.93 | 0.95 | 199 | 
| Street_music | 0.84 | 0.86 | 0.85 | 183 | 
| Accuracy | 0.90 | 0.90 | 0.90 | 1747 | 
| Macro avg | 0.90 | 0.90 | 0.90 | 1747 | 
| Weighted avg | 0.90 | 0.90 | 0.90 | 1747 | 
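
As a sanity check on the first per-class report (the CNN results), the macro average is the unweighted mean of the ten per-class values, while the weighted average weights each class by its support. The sketch below recomputes both from the table entries. Because those entries are already rounded to two decimals, the recomputed averages can differ from the reported 0.90 in the third decimal. The list layout and the `averages` function are illustrative and not taken from the paper.

```python
# Per-class rows copied from the CNN report above: (class, precision, recall, f1, support).
CNN_REPORT = [
    ("Air_conditioner", 0.93, 0.93, 0.93, 195),
    ("Car_horn", 0.95, 0.95, 0.95, 91),
    ("Children_playing", 0.81, 0.84, 0.83, 205),
    ("Dog_bark", 0.84, 0.82, 0.83, 182),
    ("Drilling", 0.86, 0.92, 0.89, 202),
    ("Engine_idling", 0.93, 0.94, 0.94, 216),
    ("Gun_shot", 0.97, 0.85, 0.91, 87),
    ("Jackhammer", 0.95, 0.92, 0.93, 187),
    ("Siren", 0.97, 0.93, 0.95, 199),
    ("Street_music", 0.84, 0.86, 0.85, 183),
]


def averages(rows):
    """Return (total support, macro averages, support-weighted averages) of P, R and F1."""
    total = sum(row[4] for row in rows)
    # Macro average: plain mean over classes, so every class counts equally.
    macro = tuple(sum(row[i] for row in rows) / len(rows) for i in (1, 2, 3))
    # Weighted average: each class contributes in proportion to its support.
    weighted = tuple(sum(row[i] * row[4] for row in rows) / total for i in (1, 2, 3))
    return total, macro, weighted


total, macro, weighted = averages(CNN_REPORT)
print("support:", total)
print("macro    P/R/F1:", [round(v, 3) for v in macro])
print("weighted P/R/F1:", [round(v, 3) for v in weighted])
```

The supports add up to 1747, and all six recomputed averages lie within 0.01 of the reported 0.90.
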

LSTM model: per-class precision, recall, F1-score, and support.

| Class | Precision | Recall | F1-Score | Support | 
|---|---|---|---|---|
| Air_conditioner | 0.89 | 0.90 | 0.89 | 195 | 
| Car_horn | 0.88 | 0.80 | 0.84 | 91 | 
| Children_playing | 0.66 | 0.74 | 0.70 | 205 | 
| Dog_bark | 0.71 | 0.65 | 0.68 | 182 | 
| Drilling | 0.87 | 0.83 | 0.85 | 202 | 
| Engine_idling | 0.94 | 0.88 | 0.90 | 216 | 
| Gun_shot | 0.83 | 0.75 | 0.79 | 87 | 
| Jackhammer | 0.87 | 0.88 | 0.88 | 187 | 
| Siren | 0.85 | 0.84 | 0.85 | 199 | 
| Street_music | 0.70 | 0.81 | 0.75 | 183 | 
| Accuracy | 0.81 | 0.81 | 0.81 | 1747 | 
| Macro avg | 0.82 | 0.81 | 0.81 | 1747 | 
| Weighted avg | 0.82 | 0.81 | 0.81 | 1747 | 
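
The reports also make one relation directly checkable. In single-label multi-class classification, the support of class $c$ is its number of true samples, $n_c = TP_c + FN_c$, so $\mathrm{Recall}_c = TP_c / n_c$, and the support-weighted average of per-class recall reduces to overall accuracy. The derivation below, followed by a check against the second per-class report (the LSTM results, accuracy 0.81), is a short worked example rather than a step described in the paper:

$$
\mathrm{Recall}_{\mathrm{weighted}}
= \sum_{c=1}^{K} \frac{n_c}{N}\,\mathrm{Recall}_c
= \sum_{c=1}^{K} \frac{n_c}{N}\cdot\frac{TP_c}{n_c}
= \frac{1}{N}\sum_{c=1}^{K} TP_c
= \mathrm{Accuracy},
\qquad N = \sum_{c=1}^{K} n_c
$$

$$
\frac{0.90\cdot 195 + 0.80\cdot 91 + 0.74\cdot 205 + 0.65\cdot 182 + 0.83\cdot 202 + 0.88\cdot 216 + 0.75\cdot 87 + 0.88\cdot 187 + 0.84\cdot 199 + 0.81\cdot 183}{1747}
= \frac{1421.24}{1747} \approx 0.814
$$

This agrees with the reported accuracy and weighted-average recall of 0.81; the small residual comes from the two-decimal rounding of the per-class recalls.
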

Dense model: per-class precision, recall, F1-score, and support.

| Class | Precision | Recall | F1-Score | Support | 
|---|---|---|---|---|
| Air_conditioner | 0.75 | 0.97 | 0.84 | 195 | 
| Car_horn | 0.97 | 0.86 | 0.91 | 91 | 
| Children_playing | 0.69 | 0.68 | 0.68 | 205 | 
| Dog_bark | 0.76 | 0.77 | 0.76 | 182 | 
| Drilling | 0.93 | 0.86 | 0.89 | 202 | 
| Engine_idling | 0.91 | 0.94 | 0.93 | 216 | 
| Gun_shot | 0.96 | 0.63 | 0.76 | 87 | 
| Jackhammer | 0.89 | 0.93 | 0.91 | 187 | 
| Siren | 0.96 | 0.90 | 0.93 | 199 | 
| Street_music | 0.78 | 0.75 | 0.77 | 183 | 
| Accuracy | 0.84 | 0.84 | 0.84 | 1747 | 
| Macro avg | 0.86 | 0.83 | 0.84 | 1747 | 
| Weighted avg | 0.85 | 0.84 | 0.84 | 1747 | 
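
The three per-class reports have the layout of scikit-learn's `classification_report` (per-class precision, recall, F1-score and support, followed by accuracy, macro and weighted averages), although the text here does not say which tool produced them. The sketch below shows how such per-class figures follow from a confusion matrix, using only the Python standard library. The 3×3 matrix is entirely hypothetical: its counts were chosen so that Gun_shot has higher precision than recall, the same pattern as the Dense model's Gun_shot row (precision 0.96, recall 0.63).

```python
# Hypothetical confusion matrix: rows = true class, columns = predicted class.
# The counts are invented for illustration and do NOT come from the paper.
LABELS = ["Car_horn", "Gun_shot", "Siren"]
CONFUSION = [
    [8, 1, 1],  # true Car_horn
    [0, 6, 4],  # true Gun_shot: often mistaken for Siren
    [1, 0, 9],  # true Siren
]


def per_class_metrics(matrix):
    """Return ({class_index: (precision, recall, f1, support)}, accuracy)."""
    k = len(matrix)
    results = {}
    for c in range(k):
        tp = matrix[c][c]
        support = sum(matrix[c])  # TP + FN: every sample whose true class is c
        predicted = sum(matrix[r][c] for r in range(k))  # TP + FP: every prediction of c
        precision = tp / predicted if predicted else 0.0
        recall = tp / support if support else 0.0
        # F1 is the harmonic mean of precision and recall.
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        results[c] = (precision, recall, f1, support)
    total = sum(sum(row) for row in matrix)
    accuracy = sum(matrix[c][c] for c in range(k)) / total
    return results, accuracy


metrics, accuracy = per_class_metrics(CONFUSION)
for c, (p, r, f1, n) in metrics.items():
    print(f"{LABELS[c]:<9} P={p:.2f} R={r:.2f} F1={f1:.2f} support={n}")
print(f"accuracy={accuracy:.2f}")
```

With these invented counts, Gun_shot gets precision 6/7 ≈ 0.86 but recall 0.60, and its F1 (≈ 0.71) sits closer to the weaker of the two values, which is the defining behaviour of the harmonic mean.
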

Per-class precision, recall, and F1-score of the three models.

| Model | Metric | Air Conditioner | Car Horn | Children Playing | Dog Bark | Drilling | Engine Idling | Gunshot | Jackhammer | Siren | Street Music |
|---|---|---|---|---|---|---|---|---|---|---|---|
| CNN | Precision | 0.93 | 0.95 | 0.81 | 0.84 | 0.86 | 0.93 | 0.97 | 0.95 | 0.97 | 0.84 |
| CNN | Recall | 0.93 | 0.95 | 0.84 | 0.82 | 0.92 | 0.94 | 0.85 | 0.92 | 0.93 | 0.86 |
| CNN | F1-score | 0.93 | 0.95 | 0.83 | 0.83 | 0.89 | 0.94 | 0.91 | 0.93 | 0.95 | 0.85 |
| LSTM | Precision | 0.89 | 0.88 | 0.66 | 0.71 | 0.87 | 0.94 | 0.83 | 0.87 | 0.85 | 0.70 |
| LSTM | Recall | 0.90 | 0.80 | 0.74 | 0.65 | 0.83 | 0.88 | 0.75 | 0.88 | 0.84 | 0.81 |
| LSTM | F1-score | 0.89 | 0.84 | 0.70 | 0.68 | 0.85 | 0.90 | 0.79 | 0.88 | 0.85 | 0.75 |
| Dense | Precision | 0.75 | 0.97 | 0.69 | 0.76 | 0.93 | 0.91 | 0.96 | 0.89 | 0.96 | 0.78 |
| Dense | Recall | 0.97 | 0.86 | 0.68 | 0.77 | 0.86 | 0.94 | 0.63 | 0.93 | 0.90 | 0.75 |
| Dense | F1-score | 0.84 | 0.91 | 0.68 | 0.76 | 0.89 | 0.93 | 0.76 | 0.91 | 0.93 | 0.77 |
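
Reading the per-class F1-scores of the three classification reports side by side makes the comparison concrete: the CNN has the highest F1 in nine of the ten classes and ties the Dense model on Drilling (0.89). The sketch below performs that lookup. The values are copied from the three per-class reports; the `best_models` function and the dictionary layout are illustrative only.

```python
# Per-class F1-scores copied from the CNN, LSTM and Dense classification reports.
CLASSES = ["Air_conditioner", "Car_horn", "Children_playing", "Dog_bark", "Drilling",
           "Engine_idling", "Gun_shot", "Jackhammer", "Siren", "Street_music"]
F1 = {
    "CNN": [0.93, 0.95, 0.83, 0.83, 0.89, 0.94, 0.91, 0.93, 0.95, 0.85],
    "LSTM": [0.89, 0.84, 0.70, 0.68, 0.85, 0.90, 0.79, 0.88, 0.85, 0.75],
    "Dense": [0.84, 0.91, 0.68, 0.76, 0.89, 0.93, 0.76, 0.91, 0.93, 0.77],
}


def best_models(f1_table, classes):
    """Map each class to the list of models with the highest F1 (ties are kept)."""
    winners = {}
    for i, name in enumerate(classes):
        top = max(scores[i] for scores in f1_table.values())
        winners[name] = [model for model, scores in f1_table.items() if scores[i] == top]
    return winners


for name, models in best_models(F1, CLASSES).items():
    print(f"{name:<17} {', '.join(models)}")
```

Keeping ties, rather than taking a single `max`, avoids silently crediting whichever model happens to be listed first; the narrowest strict margin is Engine_idling, where the CNN (0.94) leads the Dense model (0.93) by 0.01.
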

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2025 by the author. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
İşler, B. Urban Sound Recognition in Smart Cities Using an IoT–Fog Computing Framework and Deep Learning Models: A Performance Comparison. Appl. Sci. 2025, 15, 1201. https://doi.org/10.3390/app15031201