Effect of Mouth Mask and Face Shield on Speech Spectrum in Slovak Language

In this paper, with the aim of assessing the deterioration of speech intelligibility caused by a speaker wearing a mask, different face masks (surgical masks, an FFP2 mask, homemade textile-based protection and two kinds of plastic shields) are compared in terms of their acoustic filtering effect, measured by placing the mask on an artificial head/mouth simulator. To investigate additional effects on the speaker's vocal output, speech was also recorded while people read a text with and without a mask. To discriminate between acoustic filtering by the mask and mask-induced changes in vocal output, the latter were monitored by measuring vibrations at the suprasternal notch with an attached accelerometer. It was found that people tend to slightly increase their voice level when wearing a mask, whereas they reduce their vocal power when wearing a plastic face shield. Unlike in the Lombard effect, no significant change was found in the spectral content. All face masks and face shields attenuate frequencies above 1–2 kHz. In addition, plastic shields increase frequency components around 800 Hz, due to resonances occurring between the face and the shield. Finally, special attention was given to Slavic languages, in particular Slovak, which contain a large variety of sibilants. Male and female speech, as well as texts with and without sibilants, were compared.


Introduction
The current pandemic situation, caused by the SARS-CoV-2 virus (COVID-19), has forced the governments of many countries to introduce various measures to prevent people from spreading the virus. Since COVID-19 is a respiratory infection, it mainly spreads through exhaled aerosol. Common mitigating measures are keeping rooms well ventilated, regularly cleaning surfaces, and asking citizens to wear face masks, in some situations combined with face shields [1–6], both to protect others and to protect themselves from inhaling virus-containing microdroplets [7]. All of these measures were recommended by the World Health Organization after it declared the pandemic in early 2020 [4].
Face masks effectively protect us from spreading and inhaling the virus, but at the price of a deteriorated quality of verbal communication. A decrease in speech intelligibility has already been reported in various situations [8–13]. Several factors can explain the reduced speech understanding; probably the most significant are (1) the face mask acting as an acoustic filter; (2) the mask acting as a barrier that blocks the listener's visual access to the speaker's mouth movements; and (3) changes in the articulation of the person wearing the mask. The reduction in speech intelligibility significantly affects the quality of communication, especially in environments with high background noise levels and in the presence of reverberation. Moreover, the visual barrier matters particularly for listeners with hearing impairment [14]. Most face masks are made of multiple fabric layers, which act as an attenuator and porous sound absorber, damping sound energy especially at middle and high frequencies. The attenuation is largest above 2000 Hz [15], which overlaps the 500–4000 Hz range that is crucial for understanding speech [16].
A large amount of research on the influence of face masks on speech understanding has recently been reported [8,10,15,17–23]. Some authors focused mainly on surgical face masks and reported no significant influence on speech intelligibility in communication [23–25] or on the perception of singing [9]. Bottalico et al. explored the influence of face protection on communication in a classroom and found that surgical face masks and N95 masks showed more favorable (i.e., lower) sound attenuation than other fabric masks. Therefore, the use of fabric masks in classroom environments was discouraged [17].
In the present study, several types of face protection were investigated: disposable face masks, homemade cloth face masks, respirators with FFP filters, and plastic shields. To understand their influence on speech production and speech perception, several scenarios were measured, analyzed and compared. The measurements consisted of (1) recordings of people reading a text with and without face protection; (2) measurements on an artificial head with an implemented artificial mouth; and (3) measurements of the speech-induced vibration at the speaker's suprasternal notch, to assess whether the speaker changes his or her vocal output when wearing a face mask or shield. The suprasternal notch vibration data were used for signal normalization when evaluating individual cases. The analysis took into account the importance of sibilants in Slovak, which is typical of Slavic languages in general: sound and vibration measurements of speakers reading a text containing sibilants and a text without sibilants were compared. Effects were investigated for both male and female voices.

Laboratory Facility and Setup
The experiments were performed in a newly built acoustic laboratory at the Faculty of Civil Engineering of the Slovak University of Technology in Bratislava. The listening room is soundproof and quasi-anechoic. The mineral wool lining is covered by a thin layer of acoustically transparent elastic textile. Sound absorption on the ceiling is provided by 40 mm thick SQ Ecophon® panels. The floor consists of elastic rubber covered by carpet. One of the walls is a highly insulating glass wall (Glass Solution®) placed 1 m from the façade; it admits daylight while shielding the room from traffic noise transmitted through the building façade (Figure 1, left). The background noise level in the empty room, measured with a NOR140 sound analyzer, was below LA,eq = 20 dB.
Appl. Sci. 2021, 11, x FOR PEER REVIEW 3 of 12
Figure 1. Photo of the laboratory (left) and artificial head with loudspeaker placed in the mouth opening (right).

Artificial Head
A home-made artificial mouth, dedicated to this project, was based on a PVC manikin head. The head shell was filled with liquid silicone except for an internal cavity. A 2.5-inch passive loudspeaker placed in the mouth opening simulated human speech in the frequency range 125–8000 Hz (Figure 1, right).
The well-controlled sound output of this system allowed us to determine the filter characteristics of the masks of interest. It also served as a reference for the measurements on speaking people, whose results include not only acoustic filtering by the mask but also the effects of mask-induced changes in articulation and mouth/lip movements; the artificial head does not capture these effects at all. For the purpose of our experiment, we call this built-up device an artificial head.

Participants
Eight persons participated in the experiment, all of them native Slovaks without cognitive or speech disorders. The participants (3 female, 5 male) were between 25 and 35 years old. They were informed about the purpose of the experiment and were instructed prior to recording to speak in a relaxed manner in their usual way, i.e., at their normal speech level.

Measurements
Both the measurements on the artificial head (mouth) and the recordings of people reading the text were made with a Behringer ECM 8000 microphone placed 1 m from the sound source (i.e., the artificial mouth or the person). The microphone position in the room was chosen according to the requirements of ISO 9921:2003 [26]. During the recordings, participants were asked not to move their heads, to keep the distance to the microphone constant.
In parallel with the microphone recordings, skin vibrations at the suprasternal notch were measured by a miniature IEPE accelerometer (MMF, KS95C.10), powered by an ICP sensor signal conditioner (PCB, Model 480C02). These data served as a reference signal for the possible normalization of variations in speech intensity [27].
In the experiments with the artificial head, pink noise was played through the loudspeaker in the mouth opening, driven by a laboratory amplifier (Brüel & Kjær, Type 2706). All devices were connected to the PC via a DAQ interface (Behringer U-PHORIA UMC404HD).
In the experiments with people, two texts were taken from logopedic (speech-therapy) exercises for learning Slovak pronunciation. The first focused on training the pronunciation of sibilants, while the second did not contain any sibilants. The aim was to investigate the influence of sibilants on the speech produced by a talking person, and how the effect of wearing a mask differs between these two scenarios. Both texts had very similar lengths, and the typical time needed to read a text was about 30 s. To reduce the effects of variations in sound pressure level during reading, variations in reading speed and other random artefacts, each person read each text 3 times, and the results were then averaged per person.
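The paper does not state how the band levels of the three readings were averaged. The sketch below, written under that open assumption, contrasts the arithmetic mean of levels in dB with the energetic (power) mean, 10·log10 of the mean of 10^(L/10). The function names and the band-to-level dictionaries are hypothetical illustrations, not the authors' processing code.

```python
import math

def arithmetic_mean_db(levels_db):
    """Plain arithmetic mean of levels in dB."""
    return sum(levels_db) / len(levels_db)

def energetic_mean_db(levels_db):
    """Energy (power) mean: convert dB to power ratios, average, convert back to dB."""
    mean_power = sum(10 ** (level / 10) for level in levels_db) / len(levels_db)
    return 10 * math.log10(mean_power)

def average_repetitions(repetitions, energetic=True):
    """Average third octave band spectra of repeated readings.

    repetitions: list of dicts mapping band centre frequency (Hz) -> level (dB),
    one dict per reading (e.g., the 3 readings of one text by one person).
    """
    mean = energetic_mean_db if energetic else arithmetic_mean_db
    bands = repetitions[0].keys()
    return {f: mean([rep[f] for rep in repetitions]) for f in bands}
```

The energetic mean is never smaller than the arithmetic one, because it weights louder readings more heavily; when the repetitions lie within a few decibels of each other, the two typically differ by well under 1 dB.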

Face Protections Used in Experiments
It is known that different face masks filter sound differently [18]. For this reason, a variety of face coverings was chosen for this experiment: 3 types of face mask, each with and without a supporting bracket, and 2 types of face shield, i.e., 8 variants of face covering altogether. For a summary, see Table 1 and Figure 2.

Results and Discussion
First, measurements without face protection were performed to obtain the absolute voice spectra of the speakers who participated in the experiment. The average results for the two kinds of texts, i.e., with and without sibilants, are shown in Figure 3. The figure shows the sound pressure level of the voice recordings in third octave bands, distinguishing male (light-grey bars) and female (dark-grey bars) speakers speaking freely, without face protection. The two graphs in Figure 3 also include the standard deviation per third octave band across the speakers, together with the result measured with the artificial mouth (solid line). Theoretically, the latter would be a horizontal line, since the reproduced signal was pink noise. In practice, the result is influenced (as shown in Figure 3) by the loudspeaker placement in the opening and by its spectral and directivity characteristics. In our experiment, however, the lack of a perfectly flat response of the artificial head is not an issue, as we are interested in the relative differences in sound pressure level between the situations with and without face protection. The dashed line in Figure 3 represents the background noise level in the laboratory during the recording sessions. There is a visible difference between the voice spectra of the text with sibilants (right) and without sibilants (left): the sibilants manifest mainly at high frequencies above 3150 Hz. The difference between male and female voices is confirmed at low frequencies in the case of the speech including sibilants. For this reason, further analysis was performed separately for men and women.
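The remark that the artificial-mouth curve would ideally be a horizontal line follows from the definition of pink noise: its power spectral density falls as 1/f, so every band with a constant ratio of upper to lower edge frequency, such as a third octave band, contains the same power. The sketch below illustrates this with idealized analytic spectra rather than the measured signals; it assumes base-10 third octave bands as defined in IEC 61260-1, and white noise is included only for contrast.

```python
import math

def third_octave_bands(f_low=125.0, f_high=8000.0):
    """Base-10 third octave bands; f_low and f_high are nominal mid-band frequencies.

    Returns (lower edge, exact mid-band frequency, upper edge) tuples in Hz,
    with exact mid-band frequencies f_m = 1000 * 10**(k/10).
    """
    half_band = 10 ** (1 / 20)  # ratio between mid-band and band-edge frequency
    k_min = round(10 * math.log10(f_low / 1000.0))
    k_max = round(10 * math.log10(f_high / 1000.0))
    bands = []
    for k in range(k_min, k_max + 1):
        fm = 1000.0 * 10 ** (k / 10)
        bands.append((fm / half_band, fm, fm * half_band))
    return bands

def band_level_db(band_power):
    """Level in dB relative to an arbitrary unit power."""
    return 10 * math.log10(band_power)

def pink_band_power(f1, f2):
    # Integral of an idealized pink-noise PSD S(f) = 1/f from f1 to f2: ln(f2/f1).
    return math.log(f2 / f1)

def white_band_power(f1, f2):
    # Integral of an idealized white-noise PSD S(f) = 1 from f1 to f2.
    return f2 - f1

bands = third_octave_bands()
pink = [band_level_db(pink_band_power(lo, hi)) for lo, _, hi in bands]
white = [band_level_db(white_band_power(lo, hi)) for lo, _, hi in bands]
```

Under these assumptions the pink-noise band levels are identical from 125 Hz to 8 kHz, whereas white noise gains 1 dB per third octave band, so deviations from a flat line in the measured artificial-mouth curve point to the loudspeaker and its placement rather than to the test signal.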
Figure 4 and Table 2 summarize the data measured on the artificial head, showing the general effect of the various face coverings on the sound pressure level spectrum in third octave bands. Mouth masks have no effect at frequencies below 630 Hz, and plastic shields have no effect below 400 Hz. The effects are most pronounced above 2000 Hz.
The two face shields (thick solid line and thick dashed line) also have a clear influence, increasing the sound pressure level around 800 Hz. This is consistent with the data measured by Corey, 2020 [18]. The effect is a consequence of a cavity resonance between the face and the shield. Table 2 provides a detailed overview and comparison of all investigated scenarios, highlighting the effect of face protection in selected frequency ranges.
Table 2. ΔLp (dB) and STD (dB) calculated for three frequency ranges: 125 Hz–10 kHz, 600 Hz–10 kHz, and 2–10 kHz.
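Table 2 reports ΔLp and its standard deviation over three frequency ranges, but the aggregation is not spelled out. The following sketch rests on two assumptions: ΔLp in each third octave band is the level without the covering minus the level with it (positive values mean attenuation), and each range value is the arithmetic mean and sample standard deviation over the bands whose nominal centre frequencies fall within the range. The band list, range list and function names are illustrative.

```python
import statistics

# Nominal third octave centre frequencies (Hz), 125 Hz to 10 kHz.
NOMINAL_BANDS = [125, 160, 200, 250, 315, 400, 500, 630, 800, 1000, 1250,
                 1600, 2000, 2500, 3150, 4000, 5000, 6300, 8000, 10000]

# The three ranges used in Table 2 (lower and upper limits in Hz).
RANGES = [(125, 10000), (600, 10000), (2000, 10000)]

def delta_lp(levels_without, levels_with):
    """Per-band dLp = Lp(without) - Lp(with) in dB (positive = attenuation)."""
    return {f: levels_without[f] - levels_with[f] for f in levels_without}

def range_summary(delta, f_min, f_max):
    """Mean and sample standard deviation of dLp over bands with f_min <= f <= f_max."""
    values = [d for f, d in delta.items() if f_min <= f <= f_max]
    return statistics.mean(values), statistics.stdev(values)
```

For example, `range_summary(delta_lp(no_mask, mask), 2000, 10000)` would give the 2–10 kHz entry for one covering, where `no_mask` and `mask` are hypothetical dictionaries mapping the bands in `NOMINAL_BANDS` to measured levels. An energy-based average across bands would be an equally defensible choice.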
Figure 5 shows detailed results for all investigated face coverings. The impact of wearing a mouth mask is shown in the frequency domain, expressed as the difference in sound pressure level without and with mouth protection in each third octave band. Results measured on people were compared with data obtained from the artificial mouth experiment. The presented results are the average values over all participants (for each particular mask), arranged in two columns: the recited text containing sibilants in the left column and the text without sibilants in the right column. A distinction is made between male (light-grey bars) and female (dark-grey bars) readers. The results are accompanied by the spectra obtained by transmitting pink noise from the artificial head (solid line).
No significant difference between the speech with and without sibilants was found; however, the standard deviation in cases with sibilants is much larger (Figure 5). This suggests that the effect of face masks on speech intelligibility is highly individual for speakers of Slavic languages. In both cases (with and without sibilants), the impact of mouth masks above 3000 Hz is quasi-constant across all high frequencies, with a slight drop around 6–8 kHz for masks with a bracket.
In experiments with people wearing plastic shields, the cavity resonance around 800 Hz was confirmed (the same for both shields) for both texts (with and without sibilants).
The agreement between the artificial-head results and the results on people is better in cases without a bracket. In cases with a bracket, the artificial mouth overestimated the effect, mostly at high frequencies above ca. 5 kHz.
One of the research questions addressed was whether people increase their vocal intensity when wearing mouth or face protection. This was checked by means of the accelerometer. Figure 6 shows the results of the accelerometer measurements (in terms of measured velocity) performed on two male speakers and one female speaker. The differences between individuals are relatively large, and three types of behavior can be seen. However, an increase in level at the fundamental frequency can be seen in all cases, and it was much stronger for the female speaker. In all cases, people's vocal output was lower when wearing face shields than when wearing cloth or other fabric mouth masks.
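Figure 6 presents the accelerometer data in terms of velocity, but the conversion is not described. A common approach, assumed in the sketch below, is to integrate in the frequency domain by dividing each spectral component of the acceleration by the angular frequency, v = a/(2πf), and to express the result as a velocity level re 1 nm/s, the reference value given in ISO 1683. Evaluating the division at band centre frequencies is an additional simplification; the function names are hypothetical.

```python
import math

V_REF = 1e-9  # reference velocity in m/s (1 nm/s, as in ISO 1683; assumed here)

def velocity_from_acceleration(accel, frequency_hz):
    """Velocity (m/s) of a sinusoidal component with acceleration accel (m/s^2).

    Works for amplitude or RMS values alike: v = a / (2*pi*f).
    """
    return accel / (2 * math.pi * frequency_hz)

def velocity_level_db(velocity, v_ref=V_REF):
    """Velocity level L_v = 20*log10(v / v_ref) in dB."""
    return 20 * math.log10(velocity / v_ref)

def band_velocity_levels(accel_by_band):
    """Convert RMS acceleration per band (m/s^2) to velocity levels at band centres."""
    return {f: velocity_level_db(velocity_from_acceleration(a, f))
            for f, a in accel_by_band.items()}
```

Because of the 1/f factor, the same acceleration corresponds to a lower velocity at higher frequencies, so velocity emphasizes the low-frequency region around the fundamental more than acceleration does.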
The results indicate that both the voice level and its increase due to mouth protection are to some extent individual. Therefore, in addition to the presented data, the effect of sound level normalization was investigated for a few people. Figure 7 shows an example for the 2-ply cloth + BR mask, with the data normalized according to the vibration measured by the accelerometer attached to the person reading the text. The normalization mainly affects the frequency range between 200 and 2500 Hz, which is a substantial part of the human speech frequency spectrum [28].
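The normalization procedure is not specified in detail. One plausible reading, sketched below as an assumption rather than the authors' exact method, is to treat the change in overall vibration level at the suprasternal notch as a change in vocal effort and subtract it from the microphone levels recorded with the covering, so that the remaining ΔLp reflects mainly the acoustic filtering. All names are hypothetical.

```python
def normalize_for_vocal_effort(mic_with_mask, acc_with_mask_db, acc_without_mask_db):
    """Compensate microphone band levels for a change in vocal effort.

    mic_with_mask: dict, band centre (Hz) -> Lp (dB) recorded with the face covering
    acc_*_db: overall vibration levels (dB) at the suprasternal notch with/without covering
    """
    effort_change = acc_with_mask_db - acc_without_mask_db  # > 0: speaker talked louder
    return {f: level - effort_change for f, level in mic_with_mask.items()}

def normalized_delta_lp(mic_without_mask, mic_with_mask, acc_with_mask_db, acc_without_mask_db):
    """Per-band dLp after removing the vocal-effort change (positive = attenuation)."""
    corrected = normalize_for_vocal_effort(mic_with_mask, acc_with_mask_db, acc_without_mask_db)
    return {f: mic_without_mask[f] - corrected[f] for f in mic_without_mask}
```

A single broadband correction shifts all bands equally. Since the normalization is reported to matter mostly between 200 and 2500 Hz, a band-wise correction based on the vibration spectrum may be closer to the actual procedure, but that also remains an assumption.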
More research is clearly needed, in the form of listening tests involving the perceptual assessment of speech intelligibility with and without a mask. An assessment based only on the objective spectral filtering of sound, which ignores phenomena such as articulation and visual cues, may not suffice when real speech intelligibility is to be assessed.

Conclusions
In this article, three kinds of protective face masks, i.e., a surgical mask, an FFP2 mask and homemade textile-based protection (each with and without a bracket), and two different plastic shields were compared with respect to their spectral filtering effect in third octave bands. The experiments were based on measurements on human subjects and on an artificial mouth simulator. In addition, accelerometer measurements were included in order to detect possible changes in the overall vocal power of a person talking with and without face protection. Finally, special attention was given to linguistic features, in particular to the Slovak language, which contains many sibilants.
It was found that all face masks and face shields attenuate frequencies above 1–2 kHz, while plastic shields additionally increase frequency components around 800 Hz, due to resonances in the cavity between the face and the shield.
Relatively good agreement was found between the artificial-mouth results and the experiments with people reading texts in cases without a bracket. With a bracket, the artificial mouth overestimated the filtering effect at high frequencies above ca. 5 kHz.
The accelerometer measurements showed that people wearing face masks generally tend to increase their voice level, as seen in the increased sound level of their voice at the fundamental frequency. It was relatively difficult to quantify and generalize this increase accurately, as it was highly individual. The largest differences between people were found at very high frequencies, where the standard deviation reached values of up to 5 dB. When wearing face shields, the increase in voice power was not pronounced, and in some cases people even spoke at lower intensities.
The differences between the male and female voices at low frequencies were confirmed and affected the results when sibilants were present in speech.
Finally, when comparing only mean values, no significant difference was found between speech with and without sibilants. However, an important observation was made: the standard deviation in cases with sibilants is much larger than without, which suggests that the effect of face masks on speech intelligibility is highly individual for speakers of Slavic languages.