Speech Identification and Comprehension in the Urban Soundscape
Abstract
1. Introduction
2. Related Work
3. Experimental Setup
4. Results
4.1. Word Spotting and Question Answering
4.2. Masker Characterisation
5. Discussion
6. Conclusions
Author Contributions
Funding
Acknowledgments
Conflicts of Interest
References
| Level | Construction Work | Car Engine | Heavy Traffic |
|---|---|---|---|
| L10 | −16.7 | −16.2 | −16.5 |
| L90 | −40.7 | −39.1 | −41.7 |
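
The table reports L10 and L90 for each masker but does not show how they were obtained. As a minimal sketch, assuming the conventional reading of these statistics (L10 is the level exceeded 10% of the time, L90 the level exceeded 90% of the time), they could be estimated from a recording as below. The function names, the frame length, and the use of dB relative to a full-scale amplitude of 1.0 are illustrative assumptions, not the authors' procedure.

```python
import math


def frame_levels_db(samples, frame_len):
    """RMS level of each non-overlapping frame, in dB re full scale (1.0).

    The dB reference and the framing scheme are assumptions for illustration.
    """
    levels = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        mean_square = sum(x * x for x in frame) / frame_len
        # Floor the power so that silent frames do not produce log10(0).
        levels.append(10.0 * math.log10(max(mean_square, 1e-12)))
    return levels


def exceedance_level(levels, percent):
    """Level exceeded in `percent` % of the frames (percent=10 gives L10)."""
    ordered = sorted(levels, reverse=True)  # loudest frame first
    # Position in the sorted list with `percent` % of the frames above it.
    idx = min(len(ordered) - 1, int(len(ordered) * percent / 100.0))
    return ordered[idx]


if __name__ == "__main__":
    # Synthetic example: a 200 Hz tone whose amplitude alternates every 0.5 s.
    rate = 8000
    signal = []
    for n in range(rate * 4):
        amplitude = 0.2 if (n // (rate // 2)) % 2 == 0 else 0.01
        signal.append(amplitude * math.sin(2.0 * math.pi * 200.0 * n / rate))

    levels = frame_levels_db(signal, frame_len=rate // 10)  # 100 ms frames
    print("L10:", round(exceedance_level(levels, 10), 1), "dB")
    print("L90:", round(exceedance_level(levels, 90), 1), "dB")
```

Under this definition L10 tracks the louder events in a masker and L90 its background floor, so the gap between the two rows of the table gives a rough indication of how strongly each masker fluctuates.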
© 2018 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
Marchegiani, L.; Fafoutis, X.; Abbaspour, S. Speech Identification and Comprehension in the Urban Soundscape. Environments 2018, 5, 56. https://doi.org/10.3390/environments5050056