3.1.1. Design and Fabrication of Piezoelectric MEMS Speakers
Piezoelectric actuation, with the advantages of small driving voltage and large actuation force, has been widely used in many MEMS devices, including ink-jet printer heads [
50], MEMS scanning mirrors [
51], ultrasonic motors [
52], RF resonators [
53], and acoustic generators [
54]. Among them, piezoelectric MEMS speakers are an important application and are attracting growing interest. Piezoelectric MEMS speakers based on different piezoelectric materials, such as zinc oxide (ZnO), aluminum nitride (AlN), and lead zirconate titanate (PZT), have been presented for hearing aid or earphone applications [
35,
55,
56]. Piezoelectric MEMS speakers mainly consist of a piezoelectric vibration diaphragm and an acoustic cavity. Typical vibration diaphragms can be designed as beam-like piezoelectric actuators [
57] (
Figure 5a), fully clamped diaphragms with piezoelectric layers embedded [
12] (
Figure 5b), or partially clamped diaphragms surrounded by piezoelectric actuators [
21] (
Figure 5c). Various piezoelectric MEMS speakers based on different designs have been demonstrated [
12,
58].
The fabrication process of piezoelectric MEMS speakers varies with the structure, depending on whether the diaphragm needs to be released from both sides (
Figure 5a,c) or the backside only (
Figure 5b), but their general steps are similar. Here, an example for the design of MEMS speakers with a partially clamped diaphragm (
Figure 5c) is presented to illustrate the typical fabrication process. As shown in
Figure 6, firstly, an insulation layer (Si
xN
y or SiO
2), a bottom electrode layer, and a piezoelectric layer are deposited in sequence on a silicon-on-insulator (SOI) substrate (
Figure 6a). After that, the piezoelectric layer is patterned by wet etching or reactive ion etching (RIE) to expose the bottom electrode [
59,
60] (
Figure 6b). Next, a top electrode is deposited and patterned (
Figure 6c). After that, RIE is used to define a diaphragm and a set of piezoelectric actuators on the front side (
Figure 6d). Subsequently, the acoustic cavity is defined on the backside by double-sided photolithography and formed by deep reactive ion etching (DRIE) of silicon or wet etching with KOH (
Figure 6e). The buried oxide layer is used as the etch stop and finally removed by RIE or vapor hydrofluoric acid to release the moveable structures (
Figure 6f). For the fabrication of fully clamped diaphragms in
Figure 5b, the process step shown in
Figure 6d can be skipped.
In the design and fabrication of piezoelectric MEMS speakers, the material of the piezoelectric layer is important as it will affect the selection of the fabrication method and the performance of the fabricated devices. Next, the piezoelectric materials for making MEMS speakers will be discussed.
3.1.2. Piezoelectric Materials
Lead zirconate titanate (PZT) ceramics, single-crystal lithium niobate (LiNbO
3), and single-crystal lead magnesium niobate-lead titanate (PMN-PT) are widely used bulk piezoelectric materials with high piezoelectric coefficients and electromechanical coupling factors for piezoelectric transducers [
61]. However, thinning these bulk materials down remains a challenge in fabricating piezoelectric MEMS devices. With the advancement of thin film deposition technologies, piezoelectric thin films including ZnO, AlN, and PZT can be deposited by sputtering or sol-gel methods and have been applied to fabricate piezoelectric MEMS devices, such as microspeakers [
62,
63]. Among these materials, ZnO was one of the most commonly used in early piezoelectric thin film devices such as film bulk acoustic wave resonators (FBAR), surface acoustic wave (SAW) resonators, piezoelectric micromachined ultrasonic transducers (pMUTs), and microspeakers. ZnO-based piezoelectric MEMS speakers were developed as early as 1996, when Lee et al. fabricated a piezoelectric cantilever transducer that worked both as a microphone and a microspeaker [
58]. In their system, the 2000 × 2000 × 4.5 μm
3 piezoelectric cantilever was fabricated based on a 0.5 μm-thick ZnO layer with the magnetron sputtering method. In 2003, Ko et al. presented a piezoelectric microspeaker based on a clamped 3000 × 3000 × 3 μm
3 diaphragm. This micromachined transducer also has a thin ZnO film as the piezoelectric layer, which is deposited on a membrane of low-stress silicon nitride of 1.5 μm [
64].
Another type of piezoelectric material, AlN, has also been well studied and characterized in the past few decades. A thin film of AlN is normally deposited by the reactive magnetron sputtering method. Sputtered AlN thin films have better chemical and thermal stability than ZnO. The lower conductivity of AlN compared to ZnO also results in lower power loss [
65]. With these advantages, AlN has also been a good candidate for fabricating the piezoelectric layer of MEMS speakers. In 2007, Seo et al. presented piezoelectric microspeakers with circular-type and cross-type electrode configurations based on a 0.5-μm-thick AlN film [
36]. With a diaphragm size of 4 × 4 mm
2, the AlN-based microspeakers achieved good acoustic performance with a high sound pressure level (SPL).
However, it is challenging to sputter ZnO and AlN with controlled properties. The morphology and crystalline quality of the films strongly affect their piezoelectric constants. In a fabrication process, the sputtering rate and residual stress depend on the sputtering conditions and film thickness [
66,
67]. Sputtering onto heated substrates (above 300 ℃) has been reported to produce large residual stresses [
35,
68], which will wrinkle the diaphragm of fabricated piezoelectric MEMS speakers and affect the sound pressure output. Such residual stress problems can be addressed by adding a stress compensation layer or by fabricating dome-shaped diaphragms to reduce the effect of the residual stress. For example, in 2000, Han et al. reported dome-shaped piezoelectric MEMS speakers built on 1.5-μm-thick Parylene diaphragms, which can easily release the residual stress through volumetric shrinkage or expansion [
69]. In 2009, Yi et al. reported piezoelectric AlN MEMS speakers with improved performance by controlling the residual stress of the compressively stressed diaphragm using Si
xN
y films [
35]. The results revealed that the SPLs of the piezoelectric AlN microspeakers increased by more than 10 dB when the residual stresses became more compressive, especially in the low-frequency region.
Other limitations of sputtering ZnO and AlN thin films include low deposition rates (tens of nm/min), small film thicknesses, and small piezoelectric constants [
67,
70]. The lower value of piezoelectric constants will directly limit the vibration amplitude of a piezoelectric diaphragm and lead to poor acoustic performance. By contrast, PZT thin films have greater piezoelectric constants and are favorable for the applications of piezoelectric actuation. The sputtering and sol-gel methods have also been employed to deposit PZT thin films with typical thicknesses of 0.5–2 μm, which can be applied to a wide range of applications [
63]. For example, in 2009, Cho et al. fabricated a piezoelectric MEMS speaker based on a sol-gel PZT thin film with a thickness of 700 nm [
11]. The fabricated MEMS speaker had a circular diaphragm with a diameter of 2 mm, which achieved SPLs of 79 dB at 1 kHz, 87 dB at 5 kHz, and 90 dB at 10 kHz under a driving voltage of 13 V. However, sputtered and sol-gel PZT films also suffer from residual stresses and limited thicknesses. Thicker sol-gel PZT films require multiple coatings and high temperature annealing, which will cause serious stress issues. Moreover, since the piezoelectric properties of deposited thin films are largely dependent on the crystal orientation and substrate condition, proper buffer layers are required to prevent the material interdiffusion and oxidation and help to obtain good piezoelectric properties with lower residual stress.
The material properties of these commonly used piezoelectric thin films and the commercial ceramic PZT are summarized in
Table 1. Since most piezoelectric MEMS speakers work on the d
31 mode of the piezoelectric layer, only the d
31 piezoelectric constant is listed in the table for comparison. Among these materials, AlN thin films have the smallest piezoelectric constant, while PZT thin films exhibit the highest piezoelectric constant, which is about 10 to 20 times greater than that of ZnO thin films. However, the piezoelectric constant of PZT films also varies over a wide range, depending on the film thickness, deposition, and poling conditions. In particular, the piezoelectric coefficient of the commercial ceramic PZT (e.g., PZT-5H) can reach 300 pm/V [
71], which makes it a promising candidate for the construction of piezoelectric transducers.
3.1.3. Approaches to Improve SPLs
Although a large number of piezoelectric MEMS speakers have been demonstrated based on various piezoelectric thin films with promising results, inadequate sound pressure level (SPL) outputs and non-flat frequency responses are common challenges of these devices. High SPLs of over 90 dB were achieved in a few piezoelectric MEMS speakers, but they were measured either in canals or ear simulators or at high-frequency resonances. Piezoelectric MEMS speakers with high SPLs (90 dB or above) over wide frequency ranges, especially in open air and in the low-frequency range, are needed for broader applications such as mobile phones, laptops, wearable electronics, and Internet of Things (IoT) devices. Therefore, several approaches have been proposed to improve the SPLs of piezoelectric MEMS speakers, covering both materials and fabrication processes and structure designs; these are reviewed below.
Materials and Fabrication Processes
As discussed in
Section 3.1.2, the commonly used piezoelectric thin films of ZnO and AlN deposited by sputtering or sol-gel methods suffer from large residual stresses and limited thickness. For sputtered or sol-gel PZT, the piezoelectric constants obtained are also not comparable with those of bulk piezoelectric crystals or ceramics. As illustrated in
Table 1, the piezoelectric constant of ceramic PZT is over four times greater than that of sputtered or sol-gel PZT. Thus, ceramic PZT has gradually been adopted as the piezoelectric layer of MEMS speakers, using dedicated fabrication processes to thin the material down. In 2009, Kim et al. thinned ceramic PZT down to around 40 μm and fabricated piezoelectric MEMS speakers based on it, measuring an SPL of 90 dB (±5 dB) in the audible frequency range under a 32-V
pp drive at 1 cm away from the MEMS speaker in an anechoic box [
17]. The fabricated MEMS speaker also exhibited a total harmonic distortion (THD) of less than 15% from 400 Hz to 8 kHz. However, the acoustic diaphragm was as large as 20 mm × 18 mm.
Since the resonant frequency of a diaphragm is affected by its area and thickness, scaling down the diaphragm size requires a thinner piezoelectric layer to maintain a proper resonant frequency. In 2020, Wang et al. presented a piezoelectric MEMS speaker based on thin ceramic PZT [
16]. By using wafer bonding and chemical mechanical polishing techniques, ceramic PZT was thinned down to only 5 μm and applied to fabricate MEMS speakers. An optical image of the fabricated MEMS speaker and a cross-section SEM image of the device layers are shown in
Figure 7a1,a2. Thin ceramic PZT not only exhibits much greater piezoelectric constants than sol-gel or sputtered PZT thin films but also has a wider range of thicknesses, thus allowing the scaling of diaphragms within size restrictions for different applications. With a 6 mm diameter diaphragm, the fabricated MEMS speaker achieved a maximum SPL of 119 dB measured at 1 cm under a 10-V
pp drive, as shown in Figure 9a [
16].
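The trade-off between diaphragm size and thickness can be illustrated with the textbook formula for the fundamental frequency of a clamped circular plate, f0 = (10.2158 / (2πa²)) · sqrt(D / (ρh)), with flexural rigidity D = Eh³ / (12(1 − ν²)). The sketch below is only a rough illustration: it assumes a homogeneous, stress-free plate, whereas a real device is a multilayer stack with residual stress, and the silicon-like material values are assumptions rather than data from the cited works.

```python
import math

def clamped_plate_f0(radius_m, thickness_m, youngs_pa, density, poisson):
    """Fundamental frequency (Hz) of a clamped, stress-free, homogeneous circular plate."""
    # Flexural rigidity D = E h^3 / (12 (1 - nu^2))
    rigidity = youngs_pa * thickness_m**3 / (12.0 * (1.0 - poisson**2))
    lambda_sq = 10.2158  # eigenvalue of the first clamped-plate mode
    return (lambda_sq / (2.0 * math.pi * radius_m**2)
            * math.sqrt(rigidity / (density * thickness_m)))

# Illustrative silicon-like values (assumed, not taken from the source)
E, RHO, NU = 170e9, 2330.0, 0.28

f_ref = clamped_plate_f0(3e-3, 10e-6, E, RHO, NU)            # reference plate
f_half_radius = clamped_plate_f0(1.5e-3, 10e-6, E, RHO, NU)  # half the radius
f_rescaled = clamped_plate_f0(1.5e-3, 2.5e-6, E, RHO, NU)    # and a quarter of the thickness

print(f"ratio after halving radius:        {f_half_radius / f_ref:.2f}")
print(f"ratio after also thinning to 1/4:  {f_rescaled / f_ref:.2f}")
```

In this model the frequency scales as h/a², so halving the radius quadruples the resonance, and the thickness must drop to a quarter to bring it back.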
Furthermore, lead-free piezoelectric ceramics with high piezoelectric constants have also been explored for fabricating piezoelectric MEMS speakers. For example, in 2014, Gao et al. fabricated piezoelectric MEMS speakers using potassium sodium niobate ((K,Na)NbO
3, KNN)-based multilayer piezoelectric ceramics [
77]. They employed a tape casting and cofiring process and used Ag–Pd alloys as an inner electrode. A schematic of the multilayer ceramics based piezoelectric MEMS speaker and a cross-section SEM image of the multilayer KNN-based ceramics are shown in
Figure 7b1,b2, respectively. With a form factor of 23 × 27 × 0.6 mm
3, using three layers of 30-μm-thick KNN-based ceramics, the fabricated MEMS speakers showed an average SPL of 87 dB from 1 kHz to 20 kHz measured at 3.16 cm under a 5-V
rms drive.
Structure Designs
As illustrated in
Section 2.1, the output SPL of a MEMS speaker is directly determined by the frequency, area, and displacement of its diaphragm. Increasing the out-of-plane displacement of piezoelectric diaphragms is an effective approach to improve SPLs, especially at low frequency, as a much larger displacement is required at low frequency to achieve the same SPL as at high frequency. Therefore, various designs of piezoelectric MEMS speakers have been proposed to improve their SPLs by changing the diaphragm structures or electrode configurations, or by using an array form.
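The dependence on frequency, area, and displacement can be made concrete with a simple model. The sketch below assumes a rigid piston in an infinite baffle radiating into free air at low frequency, for which the far-field pressure amplitude is p = ρ·S·ω²·x / (2πr). This standard approximation is an assumption here and may differ from the expression used in Section 2.1. It also does not describe measurements in ear simulators or tubes, and the air density and diaphragm size are illustrative values.

```python
import math

RHO_AIR = 1.21   # kg/m^3, assumed air density at room temperature
P_REF = 20e-6    # Pa, standard reference pressure for SPL

def piston_spl(freq_hz, area_m2, disp_peak_m, distance_m):
    """SPL (dB) of a baffled rigid piston, low-frequency far-field approximation."""
    omega = 2.0 * math.pi * freq_hz
    p_peak = RHO_AIR * area_m2 * omega**2 * disp_peak_m / (2.0 * math.pi * distance_m)
    return 20.0 * math.log10(p_peak / math.sqrt(2.0) / P_REF)

def displacement_for_spl(spl_db, freq_hz, area_m2, distance_m):
    """Peak displacement (m) needed to reach a target SPL in the same model."""
    p_peak = math.sqrt(2.0) * P_REF * 10.0 ** (spl_db / 20.0)
    omega = 2.0 * math.pi * freq_hz
    return p_peak * 2.0 * math.pi * distance_m / (RHO_AIR * area_m2 * omega**2)

area = math.pi * (3e-3) ** 2  # 6 mm diameter diaphragm, treated as a rigid piston
for f in (100.0, 1000.0, 10000.0):
    x = displacement_for_spl(90.0, f, area, 0.01)
    print(f"{f:7.0f} Hz: {x * 1e6:.3g} um peak displacement for 90 dB at 1 cm")
```

Because the pressure scales with ω²·x, each tenfold decrease in frequency requires a hundredfold increase in displacement for the same SPL in this model.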
Diaphragm Structures
In 2018, Stoppel et al. demonstrated a piezoelectric MEMS speaker based on a 2-μm-thick sputtered PZT with two open cuts on a square diaphragm (4 × 4 mm
2) for in-ear applications, as shown in
Figure 8a [
18]. Without a closed diaphragm, four individual actuators are mechanically decoupled from each other and thus can achieve larger out-of-plane displacements. The measurement in an ear simulator showed a high SPL of above 81 dB from 20 Hz and above 100 dB from 4.7 kHz to 15.8 kHz under a 2-V
pp drive, as shown in
Figure 9b. The measured THD was less than 2% at most frequencies, except for the subharmonics of the resonance frequency, where the THD was increased to 7%.
In 2020, Cheng et al. presented a piezoelectric MEMS speaker with enhanced SPL by designing suspension-spring actuators with dual-electrode driving [
21]. As shown in
Figure 8b, the designed MEMS speaker consisted of a circular moveable diaphragm and four flexible spring actuators. Dual-curve spring actuators with dual-electrode driving were utilized to achieve larger displacements than single-curve spring actuators under the same form factor. Measurements in a 3-cm-long tube showed a maximum SPL of 90.1 dB at the resonance of 1.85 kHz under a 2-V
pp drive, which was 28 dB higher than the SPL of a fully clamped diaphragm speaker at the same frequency (
Figure 9c). The measured THD of the dual-curve spring device was also lower than that of the clamped diaphragm devices: it was less than 2% at most frequencies and lower than 8% at the resonant or harmonic frequencies.
In addition to employing unsealed vibration diaphragms with large displacements, Wang et al. proposed a rigid–flexible vibration coupling mechanism in 2021. By depositing a Parylene film on a pre-etched diaphragm, the fabricated MEMS speaker can maintain the large displacements of unsealed diaphragms without acoustic loss. Measurement in an ear simulator under a 2-V drive showed that SPLs exceeded 59 dB from 250 Hz to 20 kHz, with the maximum value of 101.2 dB obtained at the resonance of 6.7 kHz [
78].
To improve SPLs over a broad frequency range, in 2021, Wang et al. proposed a cantilever array design with an in-phase/out-of-phase hybrid driving method to realize a broadband piezoelectric MEMS speaker [
79]. As shown in
Figure 8c, the device consisted of four piezoelectric cantilevers with different dimensions, the four resonance frequencies of which contribute to the broadband performance of the MEMS speaker. In this device, in order to avoid the sound pressure cancellation due to the large phase shifts around the resonances of the cantilevers, a hybrid drive voltage with a combination of both in-phase and out-of-phase signals was applied to ensure that the cantilevers vibrate in the same direction. Measurements showed a broadband frequency response from 100 Hz to 10 kHz with an SPL of 70 dB or higher and a maximum SPL of 110 dB at 1.54 kHz in an ear simulator under a 2-V
pp drive.
Electrode Configurations
Efforts have also been devoted to improving the SPLs of MEMS speakers through the design of electrode configurations. Electrode configurations on piezoelectric diaphragms are important as they largely determine the excitation mode, vibration displacement, and electromechanical coupling efficiency. As introduced in
Section 2.1, most piezoelectric MEMS speakers work on the d
31 flexural vibration mode of piezoelectric diaphragms with the electrical field applied in the thickness direction and the strain generated in the lateral directions. In addition to the d
31 vibration mode, piezoelectric materials can also be excited in the d
33 mode with the applied electric field and the generated strain in the same direction, usually the thickness direction. Typically, the magnitude of the d
33 constant of a piezoelectric material is roughly twice that of the d
31 constant. Therefore, by proper electrode configurations, the d
33 mode of piezoelectric diaphragms can be excited with larger out-of-plane displacements than the d
31 mode. In 2015, Kim et al. presented a piezoelectric MEMS speaker based on the d
33 mode PMN-PT single crystal diaphragm with a circular inter-digitated electrode (IDE) configuration and studied the effects of the patterned electrodes on the acoustic characteristics of the MEMS speaker [
23]. A single crystal PMN-PT was thinned down to 10 μm to form an 8.5 mm diameter diaphragm by grinding, polishing, and inductively-coupled-plasma (ICP) etching, followed by metallization with circular IDE patterns on the top, as shown in
Figure 10a. Measurements showed improved SPL with increasing area of the patterned IDE. With an 8 mm diameter IDE, the MEMS speaker showed an average SPL of above 70 dB from 1 kHz to 10 kHz and a maximum SPL of around 100 dB at 1 cm under a 5-V
rms drive.
In addition to the IDE configuration that can excite the piezoelectric d
33 mode for SPL improvement, dual-electrode configuration has been investigated to improve the SPLs of piezoelectric MEMS speakers working on the d
31 mode. In 2020, Tseng et al. presented a piezoelectric MEMS speaker with the SPL improved by dual-electrode driving [
56]. The schematic of the designed MEMS speaker is shown in
Figure 10b, where the square diaphragm consists of four triangular plates whose vibrations are synchronized by a connection mass. The low frequency response can be enhanced by reducing the size of the gaps between the triangular plates. Each triangular plate can be driven by an inner electrode and an outer electrode with a 180° phase difference to actuate the piston mode of the diaphragm to increase the SPL. Measurements showed an SPL enhancement of 9.5 dB under the dual-electrode driving in comparison with the single (inner or outer) electrode driving.
In addition to 180° out-of-phase driving, other phase differences in dual-electrode driving and their influence on the SPL improvement of piezoelectric MEMS speakers have been studied. In 2021, Wang et al. presented a ceramic PZT-based piezoelectric MEMS speaker with the SPL improved by dual-electrode driving and studied the effects of the phase difference at different frequencies [
24]. As shown in
Figure 10c, the reported MEMS speaker consists of an inner circular electrode and an outer ring-shaped electrode. By applying sine waves on these two electrodes with a phase difference tuned from 0° to 360° in the experiments, the measurement results revealed that the SPL changed significantly with the phase difference and was frequency dependent, peaking at different phase differences for different frequencies. With the optimal phase differences, a 2–10 dB SPL improvement can be achieved in the frequency band spanning from 600 Hz to 10 kHz, compared with the single-electrode driving method.
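Why the best phase difference depends on frequency can be seen in a toy phasor model. The sketch below is not the model of the cited work. It assumes that each electrode contributes a sound pressure with its own amplitude and phase lag at a given frequency, and that the two contributions add linearly. The amplitudes and lags are hypothetical numbers chosen only for illustration.

```python
import cmath
import math

def spl_gain_db(a_in, lag_in_deg, a_out, lag_out_deg, drive_phase_deg):
    """SPL change (dB) of dual-electrode drive relative to the inner electrode alone.

    drive_phase_deg is the phase added to the outer-electrode drive signal.
    """
    inner = cmath.rect(a_in, math.radians(lag_in_deg))
    outer = cmath.rect(a_out, math.radians(lag_out_deg + drive_phase_deg))
    total = max(abs(inner + outer), 1e-12)  # floor avoids log10(0) on full cancellation
    return 20.0 * math.log10(total / a_in)

def best_drive_phase(a_in, lag_in_deg, a_out, lag_out_deg):
    """Sweep the drive phase from 0 to 359 degrees and return the best integer value."""
    return max(range(360),
               key=lambda phi: spl_gain_db(a_in, lag_in_deg, a_out, lag_out_deg, phi))

# Hypothetical responses at two frequencies: (inner amplitude, inner lag, outer amplitude, outer lag)
cases = {"low frequency": (1.0, 0.0, 0.6, -120.0),
         "high frequency": (1.0, 0.0, 0.6, 30.0)}
for name, params in cases.items():
    phi = best_drive_phase(*params)
    print(f"{name}: best phase {phi} deg, gain {spl_gain_db(*params, phi):.2f} dB")
```

In this model the optimum drive phase simply cancels the difference between the two phase lags, so it shifts whenever the lags change with frequency.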
Array Structures
Another approach to improve the SPLs of the piezoelectric MEMS speakers is using digital sound reconstruction or speaker arrays. Different from traditional sound generation techniques that rely on the vibration amplitudes and frequencies of a single or a few diaphragms to achieve high SPL at specific frequencies, digital sound reconstruction generates loud sound by adding the outputs of a large number of speaker pixels that can be excited individually by signals with different frequency compositions [
80]. Typically, a speaker array containing 2^n speaker pixels is used in digital sound reconstruction, where n is the bit number, and each pixel contributes a small amount of sound pressure in the system. In 2015, Casset et al. implemented digital sound reconstruction with piezoelectric MEMS speaker arrays [
81].
Figure 11a shows the fabricated speaker array packaged on an electronic board. With a chip size of 4 × 4 cm
2, the speaker array contains 256 piezoelectric diaphragms based on a 2-μm sol-gel PZT film. The output SPL of the speaker array reached over 100 dB at 13 cm. In 2016, Arevalo et al. increased the bit number and presented a 10-bit (1024 elements) piezoelectric MEMS speaker array with a chip size of 2.3 × 2.3 cm
2 [
82]. An optical image of part of the speaker array is shown in
Figure 11b. The characterization results demonstrated the potential of piezoelectric MEMS loudspeaker arrays for digital sound reconstruction, but further effort is still needed to optimize the design for better acoustic performance.
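The idea of building up sound pressure from many small pixel contributions can be sketched in a few lines. The mapping below is a simplified assumption, not the drive scheme of the cited arrays: each audio sample is quantized to an n-bit code, the code gives the number of pixels pulsed in that clock tick, and the pixels are wired in binary-weighted groups so that bit k drives 2^k pixels. The function names are hypothetical.

```python
def pixels_to_fire(sample, n_bits):
    """Map a sample in [-1, 1] to the number of pixels pulsed in one clock tick."""
    if not -1.0 <= sample <= 1.0:
        raise ValueError("sample must lie in [-1, 1]")
    max_code = 2**n_bits - 1  # largest n-bit code; uses at most 2^n - 1 pixels
    return round((sample + 1.0) / 2.0 * max_code)

def bit_groups(code, n_bits):
    """Split a code into binary-weighted groups: entry k is 2**k if bit k is set, else 0."""
    return [(1 << k) if (code >> k) & 1 else 0 for k in range(n_bits)]

n_bits = 8  # an 8-bit code matches a 256-pixel array
for sample in (-1.0, -0.2, 0.6, 1.0):
    code = pixels_to_fire(sample, n_bits)
    print(f"sample {sample:+.1f} -> {code:3d} pixels, groups {bit_groups(code, n_bits)}")
```

With this grouping, only n drive lines are needed to select any pixel count, which is one reason array sizes are chosen as powers of two.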