Wearable Impedance-Matched Noise Canceling Sensor for Voice Pickup

Abstract: Communicating under extreme noise conditions remains challenging in spite of higher-order noise-canceling microphones, throat microphones, and signal processing. Both natural and human-made ambient noise can disturb the conveyance of information when noise levels are high. Noise cancellation, which is used frequently in audio technology, has limits in noise reduction and does not guarantee clear vocal pickup in these severe situations. A contact microphone that is attached directly to the medium of interest has the potential to pick up vocal signals with reduced noise. In this study, an electrostatic transducer with an elastomer layer that is impedance-matched to the human body is used to pick up speech sounds through constant contact on the chin and cheek. By attaching the wearable device directly to the skin, the medium of air is bypassed, and airborne noise is passively canceled. Because of the acoustic impedance-matched layer, the sensor is more sensitive to low frequencies under 500 Hz, so frequency equalization was implemented to flatten the frequency response throughout the vocal range. The perceptual evaluation of speech quality (PESQ) scores of the wearable device with equalization averaged around 2.6 on a scale from −0.5 to 4.5. Speech recordings were also collected in a noise field of 85 dB, and the performance was compared to a cardioid lapel mic, a cardioid dynamic mic, and an omnidirectional condenser mic. The recordings revealed a significantly reduced presence of white noise in the contact sensor. This study provides preliminary results that show potential vocal applications for a wearable impedance-matched sensor.


Introduction
Extreme noise conditions such as construction and heavy traffic can reduce the quality of communication. Active noise cancellation, piezoelectric throat microphones, and signal processing techniques are methods used to improve the conveyance of information in the presence of ambient noise. Efforts in active noise cancellation and adaptive filtering algorithms provide attenuation of about 20-30 dB [1,2]. This may not be sufficient for optimal communication and voice pickup in environments with high noise levels. Various higher-order microphones have been developed for improved directionality, but many higher-order microphones also have high noise sensitivity [3].
A recently developed electrostatic transducer has the potential to reduce high noise levels while picking up sounds. The sensor's elastomer layer is impedance-matched to the skin, the medium of interest [4,5]. By attaching the device to the skin, the medium of air is bypassed, so the transducer passively rejects airborne noise while reducing the loss of signal energy [5]. The impedance-matched sensor has been implemented in a wide range of settings such as musical acoustics [6] and body sound monitoring [5]. When the transducer is placed on areas with high vocal vibration, such as under the chin or on the cheek, it can be used as a wearable sensor with high noise-cancellation abilities for voice pickup. This paper focuses on enhancing the speech-pickup abilities of the acoustic impedance-matched transducer and comparing it to more widely used microphones to demonstrate the sensor's potential in vocal applications.

Impedance-Matched Transducer
The electrostatic transducer was created with a tuned elastomer layer with coated microstructures and a charged fluorinated ethylene propylene (FEP) film, as seen in Figure 1. Corona charging was used to charge the FEP film, and the layers were encased with shielding to create a thin shape (Figure 2a,b).

Experimental Setup
Medical tape (3M Tegaderm™) was used to adhere the device to the cheek and under the chin at the positions shown in Figure 3. For comparison, three conventionally used microphones were selected: a cardioid lapel microphone (AT898 Lavalier Mic, Audio-Technica, Tokyo, Japan), a cardioid dynamic microphone (e835 Dynamic Mic, Sennheiser, Wedemark, Germany), and an omnidirectional condenser microphone (Yeti Pro Mic, Blue Microphones, China; in omnidirectional mode). A Focusrite Scarlett 2i4 Audio Interface collected the output of the transducer and microphones, and Audacity was used to record the audio. Phantom power of 48 V was supplied by the audio interface for all recordings, and the gain was adjusted to avoid clipping. The microphones were positioned based on their polar patterns. The cardioid dynamic mic was placed on a tabletop microphone holder at a 45-degree angle downward and about 1 inch away from the subject's mouth to ensure maximum pickup. The omnidirectional condenser mic is best placed 4-10 inches away from the source, so it was propped upward and placed 8 inches away from the subject's mouth. The cardioid lapel mic was attached facing upward to the upper chest area of the subject with a magnetic clip.

Speech Quality
To measure the quality of the speech recorded by the transducer, the first list in the Harvard sentences was used [7]. Each list in the Harvard sentences is phonetically balanced and widely used for speech quality measurements. The subject recorded all 10 sentences from List 1, and additional samples included a tongue-twister and counting from 1 to 10 (Table 1). The transducer, lapel, and omnidirectional mic were each simultaneously recorded with the dynamic mic, which was set as the reference because of its robustness.
Tongue-twister: She sells seashells by the seashore.
Harvard sentences, List 1:
1. The birch canoe slid on the smooth planks.
2. Glue the sheet to the dark blue background.
3. It's easy to tell the depth of a well.
4. These days a chicken leg is a rare dish.
5. Rice is often served in round bowls.
6. The juice of lemons makes fine punch.
7. The box was thrown beside the parked truck.
8. The hogs were fed chopped corn and garbage.
9. Four hours of steady work faced us.
10. A large size in stockings is hard to sell.

Post-Processing
Key phonetic features exist in the frequency range up to 6-8 kHz, and higher frequencies can also add spectral information [8]. The acoustic transducer is more sensitive to low frequencies below 500 Hz, so post-processing was needed to enhance the speech recordings. High-pass filtering at 40 Hz was performed to remove unwanted low-frequency noise from the impedance-matched transducer recordings. The recordings from the transducer were then amplified by 8 dB to match the volume of the other microphones.
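The 8 dB amplification step can be illustrated with a short calculation. The following is a minimal sketch, not the processing chain used in this study; the function names and the sample values are hypothetical, and samples are assumed to be floating-point values normalized to the range −1 to 1.

```python
def db_to_amplitude(gain_db):
    """Convert a gain in decibels to a linear amplitude factor."""
    return 10 ** (gain_db / 20)


def apply_gain(samples, gain_db):
    """Scale every sample by the linear factor corresponding to gain_db."""
    factor = db_to_amplitude(gain_db)
    return [s * factor for s in samples]


# Hypothetical normalized samples, used only for illustration.
quiet = [0.01, -0.02, 0.05]
boosted = apply_gain(quiet, 8.0)  # each sample is scaled by about 2.5
```

An 8 dB gain corresponds to multiplying the amplitude by roughly 2.5, so any sample whose magnitude exceeds about 0.4 before amplification would exceed full scale afterwards.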
To flatten the frequency response of the transducer's output, commercial equalization software (Logic Pro 10.7.9) was used to match the frequency response of the transducer recording to that of the dynamic mic recording. Equalization had a cutoff frequency at around 1300 Hz for the cheek-positioned recordings and a cutoff frequency at around 1900 Hz for the chin-positioned recordings to minimize high-frequency noise. These cutoffs were determined by maximizing the perceptual evaluation of speech quality (PESQ) score [9].
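The idea behind matching one frequency response to another can be sketched as a per-band gain calculation. This is only an illustration of the general concept and is not the algorithm used by the commercial software; the function name, the band centers, and the band levels below are invented for the example, and treating bands above the cutoff as excluded from matching is an assumption.

```python
def matching_gains_db(source_db, reference_db, cutoff_hz):
    """Per-band gain (dB) that moves source band levels toward the reference.

    source_db and reference_db map band center frequency (Hz) to level (dB).
    Bands above cutoff_hz receive no matching gain (None); in this sketch they
    are assumed to be attenuated separately to limit high-frequency noise.
    """
    gains = {}
    for freq in sorted(source_db):
        if freq <= cutoff_hz:
            gains[freq] = reference_db[freq] - source_db[freq]
        else:
            gains[freq] = None
    return gains


# Made-up band levels: the contact sensor is strong at low frequencies and
# weak at higher ones, while the reference microphone is flatter.
transducer = {125: -20.0, 500: -24.0, 1000: -38.0, 2000: -50.0}
dynamic_mic = {125: -30.0, 500: -26.0, 1000: -28.0, 2000: -30.0}

gains = matching_gains_db(transducer, dynamic_mic, cutoff_hz=1300)
```

With these invented levels, the low bands are cut and the 1000 Hz band is boosted, which is the qualitative behavior expected when flattening a response that is strongest below 500 Hz.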

Speech Quality Measurement
To quantify the quality of the speech, the PESQ score was calculated. PESQ takes into consideration noise and audio distortion [9]. The scores range from −0.5 to 4.5, with scores between 2 and 3 needing moderate effort to understand and scores of 3 and above needing less effort to understand.
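The score bands described above can be expressed as a small lookup. This is a minimal sketch with a hypothetical function name; the text does not characterize scores below 2, so that range is labeled accordingly rather than given an interpretation.

```python
def listening_effort(pesq_score):
    """Map a PESQ score to the effort band described in the text."""
    if not -0.5 <= pesq_score <= 4.5:
        raise ValueError("PESQ scores range from -0.5 to 4.5")
    if pesq_score >= 3.0:
        return "less effort"
    if pesq_score >= 2.0:
        return "moderate effort"
    # Scores below 2 are not described in the text.
    return "not characterized in the text"


label = listening_effort(2.6)  # falls in the band between 2 and 3
```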

Noise Cancellation
A noise field was created using two speakers (Yamaha HS8) placed at two opposing corners of a sound booth. White noise generated in Audacity was played through both speakers synchronously, and the decibel level was measured using a sound level meter (Martel 322). The decibel levels were measured in dBA with the meter placed at the site of the recording equipment. The subject sat between the two speakers with the transducers, and the microphones were positioned similarly to the description above (Figure 4). An /a/ sound was recorded at noise levels of 60 dB to 85 dB in 5 dB increments, and a reference recording was made without noise.
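To give a sense of the range covered by the noise sweep, the levels and their relative sound pressures can be computed directly. This is a minimal sketch with hypothetical function names; it assumes the noise spectrum is the same at every level, so that a difference in measured level translates directly into a pressure ratio.

```python
def noise_levels(start_db=60, stop_db=85, step_db=5):
    """Noise levels of the sweep, inclusive of both end points."""
    return list(range(start_db, stop_db + 1, step_db))


def pressure_ratio(level_db, reference_db):
    """Sound pressure at level_db relative to the pressure at reference_db."""
    return 10 ** ((level_db - reference_db) / 20)


levels = noise_levels()
# Pressure of each condition relative to the quietest (60 dB) condition.
ratios = {level: pressure_ratio(level, levels[0]) for level in levels}
```

The sweep contains six conditions, and the loudest condition has a sound pressure close to 18 times that of the quietest one.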

Results
The speech quality and noise cancellation results are summarized below.

Speech Quality
Table 2 presents the PESQ scores of the impedance-matched transducer and comparison microphones. The PESQ score increased for all sentences after post-processing for the acoustic transducer. The average post-processed transducer score is about 2.59, roughly 0.5 below the average score of the other two microphones. Despite containing more artifacts, such as buzzing, the chin position had a slightly higher post-processed PESQ score than the cheek position.

Noise Cancellation
Spectrograms of the signals (Figure 5) show reduced noise in the transducer recordings compared to the other microphones. The white noise visibly appears throughout the frequency range in the audio signals for the comparison microphones at 85 dB. The transducer audio signals have very little visible white noise, and they are similar throughout different noise-level environments. The white noise is also audibly significant in the three microphone recordings compared to the transducer recordings.
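The Figure 5 caption states that the spectrograms use a Hamming window with 50% overlap. The following minimal sketch shows how such a spectrogram can be computed; it is not the tool used to produce the figure. The frame length, sample rate, and test tone are assumptions chosen to keep the example small, and a naive discrete Fourier transform is used in place of a fast transform for the sake of self-containment.

```python
import cmath
import math


def hamming(n):
    """Symmetric Hamming window of length n (n must be at least 2)."""
    return [0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]


def spectrogram(samples, frame_len=64):
    """Magnitude spectrogram using a Hamming window and 50% overlap.

    Returns one list of magnitudes (bins 0 to frame_len // 2) per frame.
    """
    hop = frame_len // 2  # 50% overlap between consecutive frames
    window = hamming(frame_len)
    frames = []
    for start in range(0, len(samples) - frame_len + 1, hop):
        frame = [samples[start + i] * window[i] for i in range(frame_len)]
        mags = []
        for k in range(frame_len // 2 + 1):
            acc = sum(
                x * cmath.exp(-2j * math.pi * k * i / frame_len)
                for i, x in enumerate(frame)
            )
            mags.append(abs(acc))
        frames.append(mags)
    return frames


# Hypothetical test signal: a 500 Hz tone sampled at 6400 Hz.
sample_rate = 6400
tone = [math.sin(2 * math.pi * 500 * t / sample_rate) for t in range(256)]
spec = spectrogram(tone, frame_len=64)
```

With these assumed settings each bin spans 100 Hz, so the tone appears as a peak in bin 5 of every frame. Broadband white noise would instead raise the magnitude across all bins, which is the pattern described for the comparison microphones.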

Conclusions
The current paper provides preliminary results on the vocal pickup abilities of an acoustic impedance-matched transducer. The transducer captured similar audio quality at the cheek and chin positions. Equalization improved the speech quality of the transducer recordings, increasing the PESQ score to an intelligible level of 2.6 from the original 1.7. The speech quality may reach a similar standing to other microphones if post-processing methods and additional transducer tuning techniques are further investigated. For noise cancellation, the transducer proved to have superior noise reduction capabilities in comparison to three different microphones. Little noise was detected even at loud noise levels of 85 dB. Future comparison with contact microphones such as throat microphones may prove to be helpful. The impedance-matched sensor demonstrates potential as a wearable noise-canceling contact microphone in vocal applications, particularly for extreme noise conditions.

Figure 2. (a) Bottom view of the transducer. (b) Side view of the transducer.

Figure 3. Position of the transducer when taped to the cheek and under the chin on the subject.

Figure 4. Sound booth setup for the noise cancellation experiment.

Figure 5. Spectrograms of the recordings at no noise and 85 dB noise using a Hamming window and 50% overlap.

Table 1. Sentences recorded for speech quality experiment.

Table 2. PESQ scores for each transducer position and microphone. The dynamic mic was used as the reference.
* All numbered sentences are from List 1 of the Harvard Sentences.