Speech Impaired by Half Masks Used for the Respiratory Tract Protection

Filtering half masks belong to the group of personal protective equipment in the work environment. They protect the respiratory tract but may hinder breathing and suppress speech. The present work focuses on the attenuation of sound by the half masks known as “filtering facepieces” (FFPs) of various construction and filtration efficiency. Rather than study the perception of speech by humans, we used a generator of white noise and artificial speech to obtain objective characteristics of the attenuation. The generator speaker was either covered by an FFP or remained uncovered while a class 1 meter measured sound pressure levels in 1/3 octave bands with center frequencies from 100 Hz to 20 kHz at distances from 1 to 5 m from the speaker. All five FFPs suppressed acoustic waves in the octave bands with center frequencies of 1 kHz and higher, i.e., in the frequency range responsible for 80% of the perceived speech intelligibility, particularly in the 2 kHz octave band. FFPs of higher filtration efficiency attenuated the sound more strongly. Moreover, the FFPs changed the voice timbre because the attenuation depended on the wave frequency. The two combined factors can impede speech intelligibility.


Introduction
Airborne dust in the working environment is a health risk factor, e.g., in mining [1], the wood and furniture [2,3] and construction industries, as well as in welding and grinding [4] and biomass processing [5]. Half masks covering the nose, mouth, and chin are personal protective equipment against harmful physical and biological agents. The European standard EN 149:2001+A1:2009 specifies three classes of the half masks called filtering facepieces (FFP): FFP1, FFP2, and FFP3 [6]. These are mechanical filter respirators of various filter efficiency, with or without inhalation or exhalation valves. Healthcare workers use FFPs as protection against bacteria and viruses, SARS-CoV-2 in particular [1,7]. The World Health Organization recommends several types of masks for the public to reduce the transmission of viruses [8]. However, the recommendations are not entirely followed even in hospitals, although the situation in the COVID-19 wards is better than in others [9].
Although it improves safety, personal protective equipment may make work uncomfortable. For this reason, workers sometimes do not obey the rules of personal protection, in small and medium-sized enterprises in particular [3]. The respirators may cause difficulty in breathing due to clogging by dust [10]. Therefore, they are temporary rather than permanent protective equipment [11]. Indeed, obstructed breathing is a serious problem if physical effort is required to perform work. Although the difference between resting energy expenditures measured for subjects wearing and not wearing FFP2 masks proved statistically insignificant, the oxygen consumption and the carbon dioxide production were slightly higher with the filtering masks [12]. Choi et al. [13] found that the energy cost of a single inhalation varied with the type of half mask, ranging up to 10 mJ. It was about 7.1 mJ for half masks with a valve, and discomfort was rated 4.6 on a 6-point scale. Thus, the effort needed to inhale air contributes significantly to the comfort of wearing a half mask [13].
Another question is the impact of mask-wearing on speech clarity. Interest in this subject has increased since masks were adopted as a countermeasure against the spread of COVID-19. Probably everyone has noticed difficulty in verbal communication when wearing a mask. In particular, users of hearing aids and cochlear implants have to make more effort to understand the speech of mask-wearing persons [14]. However, Cohn et al. [15] reported that speakers with half-masks were more intelligible than those without the masks, provided they spoke clearly, as to someone who might have trouble understanding the speaker. That was an exception rather than a rule. Masks worsened the intelligibility of casual and emotional speech [15]. Perception of speech in classrooms depended on the mask type used by the speaker as well as on the speaker-listener distance [16]. An experiment involving twenty healthcare workers showed that speech recognition decreased by 7% on average when speakers wore half-masks [17].
The reported works dealt mainly with speech perception by individuals. Apart from that, Oren et al. [18] studied the perception of singing voices. Moreover, they analyzed changes in the spectra of acoustic chirp signals caused by masks of several types: neck gaiter, disposable surgical mask, N95 mask (an equivalent of the FFP2 according to the EN 149:2001 standard), and acoustic foam. In general, suppression and amplification of waves of particular frequencies depended on the mask type. The N95 respirator suppressed acoustic waves of frequencies between 2 and 5 kHz and above 6 kHz. The authors concluded that the N95 respirator most strongly disrupted the auditory perceptual characteristics of the singing voice.
The SARS-CoV-2 pandemic spurred the development of filtering masks. The number of patents in March and April 2020 increased by about 100% compared with the period before the pandemic [19]. New designs improved the filtration efficiency and the wearing comfort [20–22]. However, we are not aware of any improvement in the acoustic characteristics of the new masks, despite the crucial role played by verbal communication in the life and work environment.
A majority of the studies reported in this brief review dealt with the perception of the human voice. Investigations of objective parameters, such as attenuation of sound waves by filtering masks, were scarce. We decided to fill that gap at least partially. Thus, we studied FFPs of various types using a calibrated source of the acoustic signal and sound level meter and analyzer.

Research Material
We tested five convex-shaped disposable filtering half-masks of the FFP type, CE0194 certified according to the EN 149:2001+A1:2009 standard [6] (Figure 1). All the masks were from one manufacturer, made of synthetic fiber, and equipped with an adjustable nose clip. The FFP1 and FFP2 were anti-dust half-masks without bactericidal inserts, while the FFP3 was a half-mask with such an insert. The "+" sign in the FFP marking denoted a mask equipped with an exhalation valve for better breathing.

Apparatus and Methodology
A Svantek SVAN 979 class 1 sound and vibration analyzer compliant with the IEC 61672-1:2013 standard [23], equipped with a GRAS 40AE 1/2" microphone, was used in the measurements of sound pressures. Before and after each measurement series, the meter was checked with the class 1 Sound Calibrator SV36 according to the IEC 60942:2017 standard [24]. A Bedrock TalkBox BTB65 provided the acoustic signal. All apparatus used in the measurements had valid calibration certificates.
The measurements were carried out in a medium-sized laboratory room, about 11.7 m long, 6.2 m wide, and 3.2 m high. The reverberation time in the room was assessed as 0.3 s for the furniture and equipment arranged for this study, the same as for the standard classroom arrangement.
The microphone and TalkBox stood on tripods 1.6 m above the floor, which is the average mouth location of a standing adult human according to anthropometric data applied, e.g., in the C50 speech clarity measurements [25]. The speaker-microphone distance was from 1 to 5 m, and they were aligned using the built-in laser pointer of the TalkBox. The SVAN 979 meter analyzed and recorded the acoustic signal from the TalkBox in 1/3 octave-wide bands. The generated total sound pressure levels were 60 and 72 dB. The background noise level of ca. 28 dB was small enough to be neglected. Sound pressure levels in 1/3 octave bands with center frequencies from 100 Hz to 20 kHz were considered in further calculations.
In the first series of measurements, the TalkBox emitted white noise, while in the second, it simulated human speech defined by the IEC 60268-16:2020 standard [26]. The measurements were carried out for the TalkBox speaker uncovered and covered by each of the five FFPs. Figure 2 shows the TalkBox with a tested mask on the speaker. The masks fully covered the speaker. What may seem to be a gap between the box and the mask is a part of the box made of grey plastic.
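The statement that a background of ca. 28 dB can be neglected against generated levels of 60 and 72 dB can be checked with a short calculation. The sketch below assumes that signal and background add incoherently, i.e., on the squared-pressure scale; the function name is illustrative and not taken from the study. It applies to the broadband totals quoted in the text only; the margin in individual frequency bands may be smaller.

```python
import math

def combine_levels_db(*levels_db):
    """Add incoherent sound levels on the squared-pressure (energy) scale."""
    return 10 * math.log10(sum(10 ** (level / 10) for level in levels_db))

BACKGROUND_DB = 28.0  # approximate background level quoted in the text

for signal_db in (60.0, 72.0):  # generated total levels quoted in the text
    total_db = combine_levels_db(signal_db, BACKGROUND_DB)
    # The increase caused by the background stays far below the meter uncertainty.
    print(f"{signal_db:.0f} dB signal: background adds {total_db - signal_db:.4f} dB")
```

In both cases the background raises the total by well under 0.01 dB, which is far below the ±1.1 dB uncertainty quoted for the class 1 meter at 1 kHz.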

White Noise
In this measurement series, TalkBox emitted white noise and its immission was recorded by the SVAN 979.
Each measurement of the sound pressure levels lasted 60 s, divided into six 10 s-long intervals. In this manner, six sets of the sound pressure level values in 1/3 octave-wide frequency bands, Lf,10s, were recorded for each experimental arrangement. The latter included: the TalkBox speaker (covered by one of the five FFPs or without cover), the speaker-microphone distance (d = 1 m or 5 m), and the white noise pressure level (Lwn = 60 or 72 dB). The raw Lf,10s results are in the attached Supplementary Files: "White noise 1 m" and "White noise 5 m". Since the Lf,10s values for the given experimental arrangement and center frequency f were equal within the uncertainty range, they were averaged for the 60 s-long measurement time. Finally, the Lf,60s values were calculated for octave-wide bands to match the frequency bands of the ANSI speech spectrum [27].
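The two processing steps described above, averaging six 10 s readings into one 60 s value and merging 1/3 octave bands into an octave band, can be sketched as follows. The text does not say whether the 60 s average was arithmetic or energetic; the sketch assumes an energetic (squared-pressure) mean, and the readings are hypothetical numbers, not data from the Supplementary Files.

```python
import math

def energy_mean_db(levels_db):
    """Mean of levels taken on the squared-pressure scale (assumed averaging rule)."""
    mean_energy = sum(10 ** (level / 10) for level in levels_db) / len(levels_db)
    return 10 * math.log10(mean_energy)

def energy_sum_db(levels_db):
    """Level of several bands combined: squared pressures add."""
    return 10 * math.log10(sum(10 ** (level / 10) for level in levels_db))

# Hypothetical 10 s readings (dB) in the three 1/3 octave bands
# that make up the octave band centered at 1 kHz.
readings_10s = {
    800: [50.1, 50.0, 50.2, 49.9, 50.0, 50.1],
    1000: [51.0, 51.1, 50.9, 51.0, 51.2, 51.0],
    1250: [49.5, 49.6, 49.4, 49.5, 49.5, 49.6],
}

# Step 1: one 60 s value per 1/3 octave band.
levels_60s = {f: energy_mean_db(values) for f, values in readings_10s.items()}

# Step 2: one value for the whole 1 kHz octave band.
octave_1khz = energy_sum_db(levels_60s.values())
print(f"1 kHz octave band level: {octave_1khz:.1f} dB")
```

When the six readings agree within the measurement uncertainty, as reported here, the energetic and arithmetic means differ only marginally, so the choice of averaging rule hardly affects the result.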
Attenuation of sound by the FFP in the octave band of center frequency f is the following difference between the respective Lf,60s values: ΔLf,60s = Lf,60s (no FFP) − Lf,60s (with the FFP). (1) Four sets of ΔLf,60s values, one for each combination of d and Lwn, were obtained for each FFP studied from Equation (1). They are reported in Table 2, together with respective average values calculated according to the additivity rule for the squared sound pressures. The averaging was possible because the particular ΔLf,60s(d,Lwn) values for a given f were equal within the measurement uncertainty limits for the class 1 meter. The latter is ±1.1 dB for f = 1 kHz and is higher for other frequencies [23].
The ΔLf,60s values are plotted in Figure 3. Note that small negative values of ΔLf,60s are within the measurement uncertainty limits and do not prove signal amplification.
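As an illustration of how one attenuation value per band might be obtained from the four experimental arrangements, the sketch below computes the level difference for each arrangement and then averages the four differences on the squared-pressure scale. Two points are assumptions: the sign convention (uncovered minus covered, so that attenuation is positive) and the reading of "additivity rule for the squared sound pressures" as an energetic mean. All numbers are hypothetical and do not come from Table 2.

```python
import math

def attenuation_db(level_uncovered_db, level_covered_db):
    """Level difference between the uncovered and the FFP-covered speaker."""
    return level_uncovered_db - level_covered_db

def mean_attenuation_db(deltas_db):
    """Average the attenuations as squared-pressure ratios (assumed rule)."""
    ratios = [10 ** (delta / 10) for delta in deltas_db]
    return 10 * math.log10(sum(ratios) / len(ratios))

# Hypothetical octave-band levels (dB) at 2 kHz for the four arrangements:
# key = (distance in m, white noise level in dB), value = (uncovered, covered).
arrangements = {
    (1, 60): (48.0, 43.1),
    (1, 72): (60.0, 55.0),
    (5, 60): (41.5, 36.7),
    (5, 72): (53.4, 48.3),
}

deltas = [attenuation_db(unc, cov) for unc, cov in arrangements.values()]
print(f"mean attenuation at 2 kHz: {mean_attenuation_db(deltas):.1f} dB")
```

Because the four values agree within the meter uncertainty, this mean is close to their arithmetic mean; the two would diverge only if the individual attenuations differed by several decibels.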

Figure 3. Attenuation of sound by the five FFPs in octave bands with center frequencies f. Points: averaged values ΔLf,60s (cf. Table 2); whiskers: minimum-maximum range. Lines are guides for the eye only. The plus sign in the FFP symbol denotes a mask with an exhalation valve.

Simulated Speech
The primary goal of the second experiment was to collect data for a comparison of the speech disruption predicted from the FFPs attenuation characteristics with the results of direct measurements. The measurements regime was similar to the previous one, except that TalkBox emitted simulated human speech defined by IEC 60268-16:2020 standard [26] rather than white noise.
The speaker-microphone distance d was 1, 2, 3, 4, or 5 m, the emitted maximum sound pressure level Lhs was 72 dB, and the averaging time of the measured sound pressure levels was 1 s, while the total measurement time was 10 s in each run. The SVAN 979 meter analyzed the acoustic signal and recorded the sound pressure levels in 1/3 octave bands.
Three samples of each FFP type were tested for within-subject variability. The Shapiro-Wilk test evidenced that distributions of the acoustic pressures in 1/3 octave-wide frequency bands differed from the normal distribution at the 5% level of significance. However, the distributions for each FFP type did not show statistically significant differences in the Kruskal-Wallis ANOVA test, and respective median values of the sound pressure levels also did not.
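The Kruskal-Wallis comparison of three samples can be sketched in plain Python. This is an illustration only: the text does not state which software the authors used, the data below are hypothetical, and the p-value relies on the chi-square approximation, which for three groups (two degrees of freedom) has the closed form exp(−H/2) and is only rough for small samples. The Shapiro-Wilk test is not reproduced here.

```python
import math

def average_ranks(values):
    """Return 1-based ranks; tied values receive the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # mean of sorted positions i..j, 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def kruskal_wallis_h(groups):
    """Tie-corrected Kruskal-Wallis H statistic for a list of samples."""
    pooled = [x for group in groups for x in group]
    n = len(pooled)
    ranks = average_ranks(pooled)
    total, start = 0.0, 0
    for group in groups:
        rank_sum = sum(ranks[start:start + len(group)])
        total += rank_sum ** 2 / len(group)
        start += len(group)
    h = 12.0 / (n * (n + 1)) * total - 3 * (n + 1)
    counts = {}
    for x in pooled:
        counts[x] = counts.get(x, 0) + 1
    correction = 1 - sum(c ** 3 - c for c in counts.values()) / (n ** 3 - n)
    return h / correction if correction > 0 else 0.0

def p_value_three_groups(h):
    """Chi-square survival function for 2 degrees of freedom (three groups only)."""
    return math.exp(-h / 2)

# Hypothetical band levels (dB) recorded for three samples of one FFP type.
samples = [
    [55.1, 55.3, 54.9, 55.0, 55.2],
    [55.0, 55.4, 55.1, 54.8, 55.3],
    [55.2, 55.1, 55.0, 55.3, 54.9],
]
h_stat = kruskal_wallis_h(samples)
print(f"H = {h_stat:.3f}, p = {p_value_three_groups(h_stat):.3f}")
```

A p-value above 0.05 would correspond to the reported outcome, i.e., no statistically significant difference between the samples of one FFP type.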
For consistency with the FFPs attenuation characteristics, the 1/3 octave sound pressure levels were summed up in each octave-wide band with center frequencies from 125 Hz to 16 kHz. In this manner, 120 experimental time series of the Lf,1s for each FFP were obtained. Further, they could be compared with the Lf,1s series calculated from the attenuation characteristics of the FFPs, ΔLf,60s reported in Table 2, according to the following formula: Lf,1s (with the FFP) = Lf,1s (no FFP) − ΔLf,60s. (2) Here, Lf,1s (no FFP) is the level measured with the TalkBox speaker uncovered.
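The comparison between calculated and measured series can be sketched for one octave band as follows. The sketch subtracts the white-noise attenuation of the band from the series recorded without the FFP and reports the largest deviation from the series recorded with the FFP. The function names and all numbers are hypothetical; the text does not state which closeness measure the authors used.

```python
def predict_covered_levels(uncovered_series_db, attenuation_db):
    """Subtract the band attenuation from each 1 s level measured without the FFP."""
    return [level - attenuation_db for level in uncovered_series_db]

def max_abs_deviation(predicted_db, measured_db):
    """Largest absolute difference between two equally long series."""
    return max(abs(p - m) for p, m in zip(predicted_db, measured_db))

# Hypothetical 10 s run in the 2 kHz octave band, one level per second (dB).
uncovered = [52.0, 50.5, 53.1, 49.8, 51.2, 52.6, 50.0, 51.9, 52.3, 50.7]
measured_with_ffp = [47.2, 45.4, 48.0, 44.9, 46.1, 47.8, 44.8, 47.0, 47.1, 45.9]
attenuation_2khz = 5.0  # hypothetical ΔLf,60s for this band

predicted = predict_covered_levels(uncovered, attenuation_2khz)
deviation = max_abs_deviation(predicted, measured_with_ffp)
print(f"largest deviation: {deviation:.1f} dB")
```

A deviation within the meter uncertainty would indicate that the white-noise attenuation characteristic is sufficient to predict the effect of the mask on the speech signal in that band.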

Discussion
The white noise experiments showed that all the studied FFPs suppressed acoustic waves in the octave bands with center frequencies of 1 kHz and higher (Figure 3). Xue et al. [28] showed that the frequencies above 1 kHz in human speech are crucial for vowel articulation, and thus for a proper understanding of speech. According to French and Steinberg [29], the four octave-wide bands with center frequencies of 1, 2, 4, and 8 kHz account for 20, 30, 25, and 5% of the perceived speech intelligibility, respectively (the numbers were calculated from the data reported by French and Steinberg in Table III of their paper, and they are probably valid for non-tonal languages). Thus, the FFPs affected the speech transmission in the frequency range where 80% of the information is transferred, notably in the octave band with the center frequency of 2 kHz. As could have been expected, the better the filtration efficiency, the stronger the suppression. The exhalation valve mounted in the mask slightly increased the attenuation, particularly in the 16 kHz octave band. This frequency range is of no importance for speech intelligibility. It seems reasonable to suppose that suppression depends on the density and thickness of the mask material, such as the non-woven synthetics in the studied FFPs. Many such materials are in general use. For this reason, the reported attenuation characteristics may not apply to other FFPs, even those of the same types. Thus, a generalization would be premature at this stage of the study.
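The 80% figure follows directly from the four band contributions quoted above. A minimal check, with the percentages taken from the text and the variable names chosen for illustration:

```python
# Contribution of each octave band (center frequency in Hz) to perceived
# speech intelligibility, in percent, as quoted in the text.
BAND_IMPORTANCE_PERCENT = {1000: 20, 2000: 30, 4000: 25, 8000: 5}

affected_share = sum(BAND_IMPORTANCE_PERCENT.values())
most_important_band = max(BAND_IMPORTANCE_PERCENT, key=BAND_IMPORTANCE_PERCENT.get)

print(f"share of intelligibility in the attenuated bands: {affected_share}%")
print(f"most important band: {most_important_band} Hz")
```

The bands from 1 to 8 kHz together carry 80% of the intelligibility, and the 2 kHz band alone carries the largest single share.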
The simulated speech experiments showed that the time series of the speech calculated from the attenuation characteristics of the FFPs were very close to those measured directly (Figures 4 and 5). Thus, human voice suppression can be analyzed semi-quantitatively based on a normalized speech spectrum, such as that reported in the ANSI 3.5-1997 standard [27]. Figure 6 illustrates the disruptions caused by FFPs to the ANSI speech spectra expressed as the sound pressure levels at a distance of one meter in front of the speaker's mouth. FFPs substantially decrease the speech loudness in the 2, 4, and 8 kHz octave bands. To compensate for the change, the speaker would have to raise a normal voice, or even shout rather than speak loudly, where necessary. That would result in an increased share of low-frequency waves in the disrupted speech spectrum. Thus, FFPs not only attenuate the speech but also change the timbre of the voice. The latter is crucial for proper interpersonal communication.
Changes in voice timbre disclose the stress level and emotional arousal [30]. Non-verbal information in audible spectra influences emotional responses to speech. Disrupted speech could be particularly annoying for people with partial hearing loss [31]. As presbycusis impedes speech understanding [32], the attenuation of high-pitch tones by an FFP covering the speaker's mouth could cause additional discomfort for elderly listeners. This ailment affects about one-third of people aged between 65 and 74 years and almost half of those older than 75 [33]. Of course, louder speech requires more effort in inhaling the air, which causes additional discomfort for the mask wearer. Thus, the attenuation characteristic of a mask would be welcome information for potential users.

Conclusions
1. All the studied FFPs suppress acoustic waves in the octave bands with center frequencies of 1 kHz and higher, i.e., in the frequency range responsible for 80% of the perceived speech intelligibility. In particular, FFPs significantly attenuate the acoustic waves belonging to the 2 kHz octave band, which is responsible for 30% of the intelligibility.
2. The better the mask filtration efficiency, the stronger the sound suppression. The masks with exhalation valves suppress sound slightly more than their counterparts without such equipment; the difference is small except in the octave band with the center frequency of 16 kHz. The latter, however, has no practical importance for the understanding of speech.
3. The speaker-listener distance does not significantly influence the characteristics of the speech deterioration. To compensate for the FFP attenuation, a speaking person would have to raise the voice by one "loudness level" of speech as defined in the ANSI 3.5-1997 standard. Different attenuation in octave bands causes a change in the voice timbre. That can impede speech understanding.
The above conclusions suggest that the agencies for safety and health at work should consider including objective speech attenuation measurements in the relevant standards. Good communication is crucial for safety and comfort in the work environment.