Above and below: Military Aircraft Noise in Air and under Water at Whidbey Island, Washington

Military operations may result in noise impacts on surrounding communities and wildlife. A recent transition to more powerful military aircraft and a national consolidation of training operations to Whidbey Island, WA, USA, provided a unique opportunity to measure and assess both in-air and underwater noise associated with military aircraft. In-air noise levels (110 ± 4 dB re 20 μPa rms and 107 ± 5 dBA) exceeded known thresholds of behavioral and physiological impacts for humans, as well as terrestrial birds and mammals. Importantly, we demonstrate that the number and cumulative duration of daily overflights exceed those in a majority of studies that have evaluated impacts of noise from military aircraft worldwide. Using a hydrophone deployed near one runway, we also detected sound signatures of aircraft at a depth of 30 m below the sea surface, with noise levels (134 ± 3 dB re 1 μPa rms) exceeding thresholds known to trigger behavioral changes in fish, seabirds, and marine mammals, including Endangered Southern Resident killer whales. Our study highlights challenges and problems in evaluating the implications of increased noise pollution from military operations, and knowledge gaps that should be prioritized with respect to understanding impacts on people and sensitive wildlife.


Introduction
Military aircraft activity in the Salish Sea, Washington State, has been increasing over the past decade due to changes in operations and training for personnel out of the Naval Air Station Whidbey Island (NASWI). Although naval flights have been operating in the area for decades, the recent transition from Northrop Grumman EA-6B Prowler to the more powerful Boeing EA-18G Growler aircraft for electronic warfare has led to increases in the number of complaints about noise, including concern for area wildlife. Consolidation of nationwide training for these aircraft to NASWI increased the fleet size by 44% (from 82 to 118 aircraft) in 2019, with corresponding increases in air carrier practice, electronic warfare training, and overall base operations [1]. The changes at NASWI are reflective of a broader national trend in military base closures and consolidation, which are likely to intensify community noise and air pollution in some areas [2]. The implications of a concurrent change to more powerful aircraft and increased operations for noise pollution have not been

In-Air Acoustic Data Collection
[...] practice (FCLP) by Growlers. FCLPs are intended to replicate conditions for carrier-based takeoffs and landings and feature repeated "touch-and-go" flights; a certain number of these must be conducted at night to adequately prepare pilots. Although FCLP is the dominant type of aircraft training at NASWI, other base aircraft activities include electronic warfare and air-to-air combat training in nearby military operations areas [10], submarine detection, and cargo aircraft training [17].
Growlers were recorded in air on September 13 and 16, 2019, at Moran County Beach Park (48.3693, −122.6662), the public location nearest to the underwater recording site (see below). The park lies under FCLP flight track 14 for Ault Field (Figure 1b). On both days, FCLPs were scheduled from "Morning to Late Afternoon" and were flown on track 14, with jets circling south to north; as a result, the recorder captured sound associated with landings. An observer logged the type and number of all visible and (in the case of takeoffs) audible aircraft events and noted the direction of travel and flight activity as landing, pass, or takeoff. A Songmeter SM4 autonomous recorder (Wildlife Acoustics, Maynard, MA, USA) collected audio data from aircraft landings and flyover events. Sound was sampled at 48 kHz with zero gain added. The Songmeter was deployed between 0930 and 1530 on September 13 and between 1100 and 1500 on September 16. A sound level data logger (Extech 407760; Nashua, NH, USA) was deployed at the same time, recording A-weighted sound levels (dBA) at 1-s intervals; however, the data logger failed to record on September 13, so sound pressure levels were collected simultaneously with audio data on September 16 only. The number of aircraft events during the FCLPs on September 13 and 16 was summarized from the visual observations for the period of 1100-1500 (when FCLP activity occurred) on both days.
To provide a visual representation of the timing and duration of FCLPs, long-term spectral averages (LTSAs) were generated for the same periods each day in MATLAB, using 1-s and 1-Hz resolutions.

Underwater Acoustic Data Collection
Growlers were recorded under water with a SoundTrap 300 STD autonomous recorder (Ocean Instruments, Auckland, New Zealand) that was factory-calibrated and programmed to record continuously at 96 kHz sampling frequency (fs) prior to deployment. The SoundTrap was deployed off the northwest coast of Whidbey Island, approximately 1400 m from the end of the east-west runway and 1000 m from the shoreline (Figure 1b). This location is below the path of aircraft taking off to the west, and FCLP flight tracks 7 and 32.
The SoundTrap was suspended in a metal cage 2 m above the sandy mud sea floor, and was moored using a system of concrete blocks, sinking line, and two floats (Figure 1c). The SoundTrap was deployed twice, for two weeks each time, totaling 28 days of data collection. In between deployments, the SoundTrap was retrieved for charging and downloading data. It was first deployed on August 15, 2019, at 13:54 PDT at 48.3492, −122.6917, at a depth of 33.2 m, and then on August 29, 2019, at 12:12 PDT, in a similar location (48.3494, −122.6907), at a depth of 34.7 m. The weather throughout this month was variable, consisting of rain, wind, sun, and clouds. Growlers taking off to the west flew over the SoundTrap at an altitude of 120-190 m above sea level.

Data Analysis
Visual observations confirmed the occurrence of 23 single Growler flights over the Songmeter on September 16, during an FCLP session with only one aircraft operating. Opportunistic observations on four dates (August 15, 28, 29, and September 12) visually confirmed the occurrence of ten Growler flights over the SoundTrap. These overflights were manually identified in the recordings using Audacity© (Version 2.3.2; retrieved 20 September 2019 from https://audacityteam.org/) and a 15-s audio file was saved for each overflight. The audio files were analyzed using a custom script in MATLAB (version 2018b; The MathWorks Inc., Natick, MA, USA). Each 15-s file was calibrated and then Fourier-transformed in 1-s Hann windows (i.e., the number of Fast Fourier Transform components NFFT equaled the number of samples per second) with 67% overlap. A time series of band levels was computed between 20 Hz and 20 kHz, corresponding to the frequency band occupied by Growler overflights. The peak band level was assumed to correspond to the time when the plane was directly overhead. The mean-square sound pressure spectral density (in short, power spectral density, PSD; [20]) from the corresponding 1-s window was saved. A 140-min sample of underwater ambient noise (i.e., sound received at this location from all sources but the signal of interest: airplane noise) was collated from before, in between, and after the overflights, and also calibrated and Fourier-transformed in 1-s Hann windows with 67% overlap.
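The windowing and peak-picking steps described above can be illustrated with a minimal sketch. It is not the authors' MATLAB script: the function names (`window_psd`, `band_level_series`, `peak_window`) are hypothetical, the input is assumed to be already calibrated to pascals, and a naive DFT stands in for an FFT library so that the example stays self-contained. The PSD normalisation shown (division by fs times the summed squared window) is a common convention and is assumed here rather than taken from the paper.

```python
import cmath
import math


def hann(n):
    """Periodic Hann window of length n."""
    return [0.5 - 0.5 * math.cos(2.0 * math.pi * i / n) for i in range(n)]


def window_psd(frame, fs):
    """One-sided PSD (Pa^2/Hz) of one Hann-windowed frame, via a naive DFT."""
    n = len(frame)
    w = hann(n)
    norm = fs * sum(v * v for v in w)  # assumed window-power normalisation
    tapered = [x * v for x, v in zip(frame, w)]
    psd = []
    for k in range(n // 2 + 1):
        acc = sum(tapered[i] * cmath.exp(-2j * math.pi * k * i / n)
                  for i in range(n))
        p = abs(acc) ** 2 / norm
        if 0 < k < n / 2:
            p *= 2.0  # fold negative frequencies into the one-sided spectrum
        psd.append(p)
    return psd


def band_level_series(pressure, fs, f_lo, f_hi, p_ref, overlap=0.67):
    """Band level (dB re p_ref) per 1-s window; returns (start_time_s, level)."""
    n = int(fs)  # 1-s window, so NFFT equals fs and bins are 1 Hz wide
    hop = max(1, round(n * (1.0 - overlap)))
    series = []
    for start in range(0, len(pressure) - n + 1, hop):
        psd = window_psd(pressure[start:start + n], fs)
        hi = min(int(f_hi), len(psd) - 1)
        power = sum(psd[int(f_lo):hi + 1])  # 1-Hz bins: sum equals integral
        level = 10.0 * math.log10(max(power, 1e-300) / p_ref ** 2)
        series.append((start / fs, level))
    return series


def peak_window(series):
    """Window with the highest band level (assumed closest point of approach)."""
    return max(series, key=lambda item: item[1])
```

For the recordings described here, `f_lo` and `f_hi` would be 20 and 20000, and `p_ref` would be 20e-6 Pa in air or 1e-6 Pa under water; at real sampling rates the naive DFT should be replaced with an FFT routine.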
Over all overflights, median and quartile PSD levels, one-third octave band levels, and weighted levels were computed. One-third octave band levels were obtained by integrating the PSD into bands that are one-third of an octave wide, then applying 10 × log10 [20]. One-third octave band levels were compared to published audiograms to estimate which parts of the in-air and underwater noise spectra might be audible to the two ESA-listed species (SRKW and marbled murrelet), and at what levels. Audiogram data were extracted from publications using the software program WebPlotDigitizer (Version 4.2; A. Rohatgi, Pacifica, CA, USA) if data tables were not published. The killer whale underwater audiogram followed the model proposed by Branstetter et al. (2017) [21]. In the absence of killer whale critical ratio data across the frequency band of Growler noise, one-third octave bands were used as a conservative estimate (see Figure 4A in Erbe et al. 2016) [22]. There is no audiogram available for marbled murrelet, so audiograms of other seabirds were used as surrogates. In-air and underwater audiograms for cormorant (Phalacrocorax carbo sinensis) were measured by Johansen et al. (2016) [23], the in-air audiogram for the lesser scaup (Aythya affinis), a duck, by Crowell et al. (2016) [24], and the in-air audiograms for common murre (Uria aalge) and Atlantic puffin (Fratercula arctica) by Mooney et al. (2019) [25]. One-third octave bands were also used for the birds in air and under water [24,26]. We report A-weighted levels for humans as well as audiogram-weighted levels for the animals in air and under water. Audiogram-weighting involved filtering the sound spectrum by the animal audiogram prior to integration over frequency. In practice, the audiogram was interpolated to 1-Hz resolution for comparison to the noise spectrum, also in 1-Hz resolution.
Over the range of frequencies where the noise PSD levels exceeded the audiogram levels, the audiogram levels were subtracted from the noise PSD levels at each frequency, yielding differences in dB at each frequency. Differences were converted to linear quantities (by applying 10^(level/10)), which were then integrated over frequency, and the result was converted to a level quantity (by taking 10 × log10), yielding the audiogram-weighted level in dBth.
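The two level computations described above (one-third octave band integration and audiogram weighting) can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: the function names are hypothetical, the PSD and the audiogram are assumed to be sampled on the same 1-Hz grid with index i corresponding to i Hz, and band edges are taken as the centre frequency times 2^(±1/6).

```python
import math


def third_octave_levels(psd_db, centre_freqs, df=1.0):
    """Integrate a PSD (dB per Hz, psd_db[i] at i*df Hz) into 1/3-octave bands."""
    levels = {}
    for fc in centre_freqs:
        lo = fc * 2.0 ** (-1.0 / 6.0)  # assumed lower band edge
        hi = fc * 2.0 ** (1.0 / 6.0)   # assumed upper band edge
        power = sum(10.0 ** (level / 10.0) * df
                    for i, level in enumerate(psd_db)
                    if lo <= i * df < hi)
        if power > 0.0:
            levels[fc] = 10.0 * math.log10(power)
    return levels


def audiogram_weighted_level(psd_db, audiogram_db, df=1.0):
    """Audiogram-weighted level (dBth) from a PSD and an interpolated audiogram.

    Only frequencies where the noise exceeds the hearing threshold contribute.
    Returns None when the noise never exceeds the audiogram.
    """
    total = 0.0
    for noise, threshold in zip(psd_db, audiogram_db):
        if noise > threshold:
            # dB difference -> linear quantity, then integrate over frequency
            total += 10.0 ** ((noise - threshold) / 10.0) * df
    return 10.0 * math.log10(total) if total > 0.0 else None
```

As a quick check of the arithmetic, noise that sits 10 dB above the audiogram across 100 one-hertz bins gives 10 + 10 × log10(100) = 30 dBth.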
To evaluate the scope of potential impact at the ecosystem level, we compared the distribution of recorded (i.e., received) levels in air and under water with thresholds of behavioral and physiological stress responses for humans and a suite of representative terrestrial and marine species. Selection of representative species, responses, and thresholds from the literature was guided by two criteria: whether the species occurred in, or was a reasonable surrogate for species in, the Salish Sea area, and whether the study used noise stimuli that were a sensible proxy (i.e., low-mid frequency, broadband) for aircraft noise. Whenever possible, studies that established or modeled a noise-dose relationship were used; in the case of modeled probability, the 50% likelihood of response was used as the threshold. Although human weighting of sound pressure levels is recognized as potentially unsuitable for wildlife [16,27], we found that most terrestrial studies nonetheless evaluated responses to A-weighted sound pressure levels. In addition to thresholds for people [28,29], the final suite of terrestrial species (or genera) contrasted against in-air received levels was: marbled murrelet [30], owls [31-33], harlequin duck (Histrionicus histrionicus) [34], and caribou (Rangifer tarandus) [35]. Marine species selected for contrast with underwater received levels were: killer whales [36], common murre [37], harbor porpoise (Phocoena phocoena) [38], herring (Clupea harengus) [39], and California sea lion (Zalophus californianus) [40].
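The threshold comparison described above amounts to checking a received level against a table of species-specific values. A minimal sketch, using the underwater thresholds listed in the Figure 5 caption; the dictionary layout and the function name `exceeded` are hypothetical and only illustrate the comparison.

```python
# Underwater response thresholds (dB re 1 µPa) as listed in the Figure 5 caption.
UNDERWATER_THRESHOLDS = {
    "common murre": 110,         # startle response, interrupted feeding [37]
    "killer whale": 116,         # evading noise from small boats [36]
    "harbor porpoise": 133,      # 50% probability of startle response [38]
    "herring": 137,              # startle response to boat noise [39]
    "California sea lion": 150,  # 50% probability of avoidance [40]
}


def exceeded(received_level_db, thresholds):
    """Return the species whose threshold lies below the received level."""
    return sorted(species for species, limit in thresholds.items()
                  if received_level_db > limit)


# Mean underwater received level reported in the text: 134 dB re 1 µPa rms.
print(exceeded(134, UNDERWATER_THRESHOLDS))
```

This comparison treats each threshold as a hard cutoff, whereas the figure contrasts the full distribution of received levels against each threshold.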
Although our study design did not allow for comparison of underwater sound from Growlers with other surface-confirmed anthropogenic sources (in this area, primarily boats), we used LTSAs to visually represent and contrast underwater sound from Growlers and vessels. We used the weekly notifications of FCLPs (Table S2) to focus on dates and time periods (e.g., "Midmorning", "Late Afternoon") when training was scheduled for Ault Field, and created LTSAs for these periods (1-s and 1-Hz resolutions) to identify periods when both Growler noise and vessel noise were present. Three 1-h LTSAs were generated to visualize the underwater soundscape under varying flight and vessel activity.

Comparison of Sound Levels and Flight Activity with Prior Studies
To place sound levels and flight activity at NASWI in the context of those documented in other studies, we conducted a literature review to identify studies of impacts of military low-altitude flights (MLAF) on people and wildlife. We restricted our search to these studies because the noise strength, onset rate, and intermittent nature of MLAF are distinct from commercial or general aviation aircraft [7,41]. In particular, comparable environmental noise levels (>100 dBA) are encountered only rarely in other contexts [3]. Our initial search resulted in 26 primary research articles that evaluated impacts of MLAF on people or communities (i.e., annoyance, hearing damage or loss, and effects on mental and physical health), and 34 articles that examined impacts on wildlife (Data S1). A subset was removed before extracting noise data; reasons for exclusion included inability to obtain full articles, reporting of events only (vs. noise), or non-relevant context (e.g., air shows) (Data S1); some studies also had multiple publications related to the same dataset (Table S1). The final number of studies from which noise metrics were extracted was 12 (people) and 18 (wildlife) (Table S1). The number of studies that have measured or modeled impacts of underwater noise from aircraft was too low for meaningful analysis, and included studies therefore only reflect in-air conditions.
From each study, three metrics were extracted or estimated: (1) maximum received sound level, (2) typical or average number of daily events > 100 dBA, and (3) total daily duration in seconds > 100 dBA (Table S1). If a typical number of daily MLAF events was not reported, we calculated the average number of daily events as the total reported events divided by the number of days when recording took place; since military activity usually occurs almost exclusively on weekdays, weekend days were excluded from this formulation. A threshold of 100 dBA was used because it was relevant to the current study and is frequently used as a reporting threshold, facilitating the extraction of metrics across disparate studies that could include events both below and above that threshold.
The region of study, year in which the study was conducted, and (for wildlife) the focal taxonomic group were also extracted. If a study included multiple noise treatments or examined geographic areas with different received levels, metrics were extracted for each treatment or geographic area. If the duration of individual flight events was not reported, a conservative estimate of 4 s per overflight event (based on mean reported event duration across all field studies) was used (Table S1). The same metrics were then calculated for the current study using the sound pressure level data and observed flight events from September 13 and 16. These two dates do not represent the maximum daily FCLP activity, which can reach 8 time periods per day, but rather typical, moderate activity on training days (Table S2). The relative positions of different studies with respect to the three metrics were contrasted separately for people and wildlife.
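The fallback rules for the per-study metrics can be sketched as below. The function names and the input numbers in the usage lines are hypothetical; only the weekday-only averaging and the 4-s default event duration come from the text.

```python
from datetime import date

# Conservative per-event duration used when a study did not report one.
DEFAULT_EVENT_DURATION_S = 4.0


def average_daily_events(total_events, recording_days):
    """Average events per day, counting weekdays only (Monday to Friday)."""
    weekdays = [d for d in recording_days if d.weekday() < 5]
    if not weekdays:
        raise ValueError("no weekday recording days available")
    return total_events / len(weekdays)


def daily_duration_above_threshold(n_events, event_duration_s=None):
    """Total daily duration (s) above 100 dBA for n_events overflights."""
    if event_duration_s is None:
        event_duration_s = DEFAULT_EVENT_DURATION_S
    return n_events * event_duration_s


# Hypothetical study: 100 events logged from Friday to Monday.
days = [date(2019, 9, 13), date(2019, 9, 14),
        date(2019, 9, 15), date(2019, 9, 16)]
per_day = average_daily_events(100, days)          # two weekdays -> 50.0
print(per_day, daily_duration_above_threshold(per_day))
```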

Sound Levels, Audibility, and Response Thresholds
The waveform and spectrum of an example overflight recorded in air are shown in Figure 2a. Broadband levels (20 Hz-20 kHz) exceeded 117 dB re 20 µPa for about 1 s in this example.
Averaged over all 23 overflights, the received level was 110 ± 4 dB re 20 µPa rms and 107 ± 5 dBA; maximum received levels were 119 dB re 20 µPa and 118 dBA. In-air noise covered a frequency band from 20 Hz to greater than 10 kHz, peaking between 50 Hz and 1 kHz (Figure 3a). Comparing one-third octave band levels with audiograms indicated that in-air noise from Growlers would be audible to all species within the limits of the audiogram measurements available, which ranged from a minimum of 250 Hz for cormorants to a maximum of 8 kHz for ducks (Figure 3b). Audiogram-weighted levels suggested that murre might experience less disturbance (18-28 dBth) from Growlers compared with puffins (60-65 dBth), cormorants (65-71 dBth), and ducks (81-88 dBth; Table 1). A-weighted noise levels experienced by people ranged from 104 to 109 dBA.
Figure 3. In-air (a) received power spectral density (PSD) from 23 overflights (grey), and median (blue) and quartile (red and green) levels. (b) One-third octave band levels (median and quartiles; blue, red, and green, respectively) are compared to the in-air audiograms of cormorants, ducks (i.e., lesser scaup), murres, and puffins. Noise above the audiogram lines is expected to be audible.
The waveform and spectrum of an example overflight recorded under water are shown in Figure 2b. Broadband levels exceeded 131 dB re 1 µPa for about 1 s in this example. Averaged over the 10 overflights, the received level in the strongest 1-s window was 134 ± 3 dB re 1 µPa rms. The underwater noise recorded during the 10 overflights covered a frequency band from 20 Hz to 30 kHz, peaking between 200 Hz and 1 kHz (Figure 4a). Based on intersection with audiograms, Growler noise penetrating the water was expected to be audible to killer whales between 200 Hz and 40 kHz, and to cormorants between 1 kHz and 4 kHz (Figure 4b). Audiogram-weighted levels indicated Growler flights would result in 48-56 dBth of noise for killer whales, and 40-44 dBth for cormorants (Table 1).
When compared with thresholds of behavioral and physiological stress responses in humans and a suite of terrestrial wildlife (i.e., terrestrial birds and mammals), we found that in-air received levels exceeded all identified thresholds (Figure 5a). Underwater received levels exceeded thresholds of startle response for common murre and avoidance by killer whales. The strongest received levels exceeded the threshold of startle response for herring and harbor porpoise, but were below those associated with avoidance in California sea lions (Figure 5b).
LTSAs of in-air recordings show the pattern of FCLPs as 30-60 min periods of rapid consecutive flights interspersed with shorter intervals of reduced or no flights (Figure S1). Underwater noise was detected on multiple dates and time periods when FCLPs were scheduled, with the same pattern of clustered activity (Figure 6a,b). Visual contrasts of underwater noise from FCLPs and routine takeoffs show the unique characteristics of sound from Growlers compared to vessels, and that received levels from Growlers are likely to exceed those associated with a range of typical vessel noise (Figure 6a-c).

Comparison of Sound Levels and Flight Activity with Prior Studies
On September 13, 185 landings and overhead passes of aircraft at the north end of Ault Field occurred between 1100 and 1500. Of these, all but two (1 Boeing 737 and 1 DC-9) were EA-18G Growlers engaged in FCLPs. The majority of overhead passes were a single aircraft, but passes with up to three aircraft simultaneously were observed. Seventeen events of Growlers taking off to the south were audible but not visible. On September 16, 83 passes or landings were observed during the same time period; of these, three were Boeing 737 and 10 were P-3s. The remaining 70 events were Growlers, with a maximum of two aircraft observed at any one time; 13 events of Growlers taking off were also audible but not visible.
Figure 5. The distribution of received levels (RL) for (a) 23 in-air overflight events and (b) 10 flight events recorded under water relative to thresholds known to cause behavioral and physiological responses in humans and representative suites of terrestrial and marine wildlife. In-air: owls, 60 dBA = physiological stress responses [31], 50% chance of nest flushing [32], and 50% reduction in the probability of prey detection and hunting strikes [33]; humans, 67 dBA = 50% probability of awakening at night [29] and increases in nighttime blood pressure [28]; harlequin duck, 80 dBA = reduced courtship and increased vigilance and agonism [34]; marbled murrelet, 92 dBA = risk of disturbance in nesting marbled murrelets [30]; caribou, 98 ASEL (A-weighted sound exposure level) = interrupted resting bouts and increased activity [35]. (Note: The threshold for caribou was reported in ASEL which likely overestimates RL (dBA).) Underwater: common murre, 110 dB re 1 µPa = startle response and interrupted feeding [37]; killer whales, 116 dB re 1 µPa = evading noise from small boats [36]; harbor porpoise, 133 dB re 1 µPa = 50% probability of startle response to low- and mid-frequency up/downsweeps [38]; herring, 137 dB re 1 µPa = startle response to recorded boat noise [39]; California sea lions, 150 dB re 1 µPa = 50% probability of avoidance of area with a simulated mid-frequency tactical sonar signal [40].

When the three metrics of maximum received level, daily number of events, and daily duration > 100 dBA were contrasted with those in previous studies that assessed impacts of MLAF, the combined sound levels and flight activity associated with FCLPs exceeded those in most other studies (Figure 7). In studies related to people, some documented louder maximum received levels, but with fewer events and shorter cumulative daily durations (Figure 7a). Similarly, cumulative daily duration was substantially exceeded in only one previous study; however, the received levels were lower (110 vs. 118 dBA). Overall, the sound levels and flight activity we describe in this study bear the strongest similarity to the most extreme areas around airfields on Okinawa, which were measured opportunistically between 1968 and 1972, and then systematically in 1998 (Table S1a). Contrasts with studies for wildlife show that when all three metrics are considered, sound levels and flight activity at NASWI are largely incomparable to most prior studies (Figure 7b). The taxonomic groups that have been evaluated for impacts of MLAF include ungulates (caribou, sheep, deer, and horse), one species of raptor, four species of ducks, two rodents, and one reptile (Table S1b).
Figure 7. [...] (Table S1) related to (a) people and communities and (b) wildlife. Colored symbols reflect (a) geographic region and (b) broad taxonomic group, with the daily number of events represented by the size of the symbol. Maximum received levels in studies were typically reported as A-weighted sound pressure levels (dBA) or dBA could be calculated, (*) with the exception of 3 data points from wildlife studies (black dot inside symbol) that exclusively reported either C-weighted sound pressure levels or A-weighted sound exposure level (Table S1); the exceptions were included as these metrics are expected to overestimate RL (dBA).

Discussion
In this study, we measured noise from an infrequently studied source of MLAF, operating in close proximity to residential sites, recreational areas, and habitat for multiple sensitive marine species. Our goal was to evaluate potential impacts on people and wildlife, using thresholds of response that have been established in previous studies. We measured sound both in air and under water and compared received levels with species-specific audiograms to demonstrate the extent to which noise from Growlers is perceived by sensitive wildlife. We also place the measured noise (i.e., received levels, total daily duration, and number of events) in the context of studies that have assessed impacts of MLAF on people and wildlife. By adopting this integrated approach, our study is uniquely positioned to illustrate knowledge gaps that can undermine assessment of noise impacts.
When we considered noise as a totality of received level, frequency of events, and total daily duration, the sound levels and flight events exceeded those in most previous studies. This finding is critical because it indicates that assessments of impact (e.g., the EIS) are, by definition, based largely on studies that have evaluated responses of people and wildlife to fewer and quieter MLAF events. To find where similar sound levels are experienced by people, we would have to turn to industrial and occupational noise studies, including those for military personnel (e.g., [42]). However, extrapolating from these studies to a community noise context is largely inappropriate given differences in the type and duration of exposure as well as occupational regulations such as time exposure limits, use of hearing protection, and testing [43].
In our review of MLAF studies for people, we found that comparable community or environmental noise has been studied in only one other region of the world, on Okinawa Island, Japan. From World War II until 1998, Okinawa Island had 39 U.S. military facilities (today there are 28), including two major bases, Kadena Air Base and Futenma Air Station. Noise from aircraft was measured opportunistically around these bases in 1968 and 1972 (during the Vietnam War), but was not measured systematically until 1998, when a multi-year study evaluated consequences for health and well-being. The study was launched because at the time it was estimated that 38% of Okinawa's population was living in conditions that exceeded the national standards for exposure to aircraft noise [6]. It is notable that the sound levels and flight activity we document around Whidbey Island are similar to Okinawan measurements during the Vietnam War, prior to passage and adoption of national noise regulations by Japan's Environment Agency in 1973 and the Defense Facilities Administration Agency in 1980.
The same trend was apparent when we compared the maximum received level, number of events, and total daily duration with studies related to wildlife. Only one wildlife study, conducted in a laboratory, exceeded the total daily duration that we measured. Although some studies evaluated exposure to higher maximum received levels, cumulative daily duration was lower. For example, one of the most comprehensive assessments, conducted by Goudie and Jones (2004), examined behavioral responses of harlequin ducks to MLAF, finding reduced courtship and increased agonism at a threshold of 80 dBA, with recovery requiring about two hours [34,44]. Although Goudie and Jones (2004) recorded a higher maximum sound pressure level, the typical number of daily events was just 3% of the number we document in the current study. This example illustrates the difficulty in assessing impacts of increased Growler training on area wildlife. We face not only general research limitations in how noise impacts communication, behavior, foraging, and ultimately fitness of wildlife [8,16] but also the added burden of extrapolating to a number of events, received levels, and cumulative daily duration that is largely unstudied.
We considered carefully whether sampling decisions or assumptions made during analysis could have inflated the number of events, received levels, or cumulative daily duration. Recordings of in-air noise were made over two days, which could be considered non-representative (e.g., if an extreme number of events were recorded). However, when we compiled the FCLP schedule for the past 4 years, we found that our sampling days, with two published active time frames, represented typical and moderate training activity for a single day. The number of flight events and received levels we recorded were also consistent with previous monitoring of Growlers on Whidbey Island. Between 2013 and 2019, noise and events from FCLPs were measured periodically at 12 locations around Coupeville OLF. The daily number of flight events associated with FCLPs ranged from 69 to 239, easily encompassing our calculated average of 127 events per training day [45-48]. Similarly, maximum sound levels in previous sampling ranged from 97 to 121 dBA (depending on distance from the flight track in use), a range that also encompasses our maximum of 118 dBA.
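The consistency check described above can be restated as a short sketch. It uses only the figures quoted in this paragraph; the helper names `within` and `relative_position` are illustrative and not part of the study's analysis.

```python
# Ranges reported by earlier FCLP monitoring around Coupeville OLF (from the text above).
PREVIOUS_DAILY_EVENTS = (69, 239)    # flight events per training day
PREVIOUS_MAX_LEVEL_DBA = (97, 121)   # maximum sound level, dBA

# Values from this study (from the text above).
STUDY_MEAN_DAILY_EVENTS = 127
STUDY_MAX_LEVEL_DBA = 118


def within(value, bounds):
    """Return True if value lies inside the closed interval bounds = (low, high)."""
    low, high = bounds
    return low <= value <= high


def relative_position(value, bounds):
    """Return where value sits in the interval: 0.0 at the low end, 1.0 at the high end."""
    low, high = bounds
    return (value - low) / (high - low)


if __name__ == "__main__":
    for label, value, bounds in [
        ("daily events", STUDY_MEAN_DAILY_EVENTS, PREVIOUS_DAILY_EVENTS),
        ("max level (dBA)", STUDY_MAX_LEVEL_DBA, PREVIOUS_MAX_LEVEL_DBA),
    ]:
        print(label, within(value, bounds), round(relative_position(value, bounds), 2))
```

Under these numbers, the mean daily event count sits in the lower half of the previously monitored range, while the maximum level sits near, but below, its upper end.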
Knowledge of the consequences for health and well-being of people experiencing these numbers of flight events and cumulative daily duration of noise exposure is necessarily limited, given that similar conditions have been rarely studied. The investigations by the Okinawa Prefecture in the mid-1990s offer the best available information on implications for public health. One of these associated studies found that noise from aircraft around the Kadena airbase was hazardous and sufficient to cause hearing loss among the population as a whole [49], while an epidemiological study identified individuals with noise-induced hearing loss that was likely due to living in proximity to the base [6,50]. In a different region of the world, Finnish Air Force investigations found that two claims of hearing loss from MLAF events were plausible based on measured exposures [51]. A unique laboratory study examining temporary threshold shift and stapedius reflex period concluded that 114 dBA (below our measured maximum) was a critical threshold, where repeated exposure to military aircraft noise above this was likely to result in noise-induced hearing loss [52].
Studies have also documented consequences for cardiovascular health within the noise exposures that we measured. Clear dose-response relationships existed between blood pressure and aircraft noise surrounding Kadena and Futenma airbases, with noise-exposed groups exhibiting a 30% increase over control groups [6]; risk of hypertension due to noise exposure was highest for older age groups [53]. Laboratory studies have demonstrated short-term increases in blood pressure following exposures to military aircraft noise, with suggested response thresholds ranging from 90 [54] to 106 dBA [55]. Other aspects of human health and well-being including annoyance, sleep disturbance, resident dissatisfaction, and even low birth weights have been studied and associated with high-intensity exposure specifically from military aircraft [6,41,56-58] as well as at lower levels of community noise from civilian aircraft and other sources [4,59]. As a result of reviewing these and other studies, in 2017 the Washington State Department of Health recommended that the U.S. Navy conduct a health impact assessment as part of the EIS process [60]. Our results confirm the need for such an assessment to occur.
The strength of the noise from flight events resulted in another critical finding from this study, where we document sound from Growlers 30 m below the sea surface, and at levels known to trigger behavioral changes for aquatic wildlife. Sound levels between the hydrophone and the surface may have been stronger than those we measured, though complex noise fields arise, particularly in shallow water (see, e.g., Figure 2i-l in Erbe et al. 2017 [61]). For Endangered SRKW, received levels were above those associated with changes in call amplitude [62,63] and avoidance or changes in behavior [36]. Other Salish Sea marine mammals that have been shown to react with avoidance or startle responses to low-mid frequency sound in this range include harbor porpoise [38,64,65], harbor seals (Phoca vitulina) [66], and gray whales (Eschrichtius robustus) [67]. At lower levels, communication masking has been demonstrated for bottlenose dolphins (Tursiops aduncus) [68], and is likely for other species such as humpback whales (Megaptera novaeangliae) [69]. Although the number of studies that have looked at behavioral responses of fish to noise is small, our received levels overlapped or exceeded thresholds for herring and some other marine fish including sea bass (Dicentrarchus labrax) [39,70]. Studies of how noise may be perceived by and impact seabirds underwater are almost nonexistent. In 2011, an expert panel was convened to establish underwater thresholds for injury to Threatened marbled murrelet from pile driving noise, but injury was defined to include only permanent loss of cochlear hair cells or barotrauma [71]. However, two very recent studies have demonstrated startle, avoidance, and changes in foraging of common murre and Gentoo penguins (Pygoscelis papua) at 105-115 dB re 1 µPa, suggesting that marbled murrelet and other seabirds around Whidbey Island may be impacted by underwater (and in-air) noise from jet aircraft [37,72].
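To give a sense of scale for the comparison with seabird response levels, the following sketch converts a difference in decibels into a ratio of rms sound pressures. It takes the mean underwater level reported in this study (134 dB re 1 µPa rms) and the 105-115 dB re 1 µPa range cited above, and assumes that the two are directly comparable, meaning the same reference pressure and comparable metrics and bandwidths. That is a simplification made here for illustration. The function names are hypothetical.

```python
# Mean underwater received level reported in this study at 30 m depth (dB re 1 µPa rms).
MEASURED_UNDERWATER_DB = 134.0
# Range of levels at which seabird responses were reported in the studies cited above (dB re 1 µPa).
SEABIRD_RESPONSE_DB = (105.0, 115.0)


def exceedance_db(measured_db, threshold_db):
    """Level difference in dB; only meaningful if both use the same reference pressure."""
    return measured_db - threshold_db


def pressure_ratio(delta_db):
    """Convert a level difference in dB to a ratio of rms sound pressures (20*log10 convention)."""
    return 10 ** (delta_db / 20)


if __name__ == "__main__":
    for threshold in SEABIRD_RESPONSE_DB:
        delta = exceedance_db(MEASURED_UNDERWATER_DB, threshold)
        print(f"{delta:.0f} dB above {threshold:.0f} dB -> pressure ratio {pressure_ratio(delta):.1f}")
```

Because in-air levels are referenced to 20 µPa and underwater levels to 1 µPa, this kind of subtraction should not be applied across the two media.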
Our results indicate underwater impacts that have been unstudied, underestimated, or otherwise dismissed in the two relevant EIS [17,73] and corresponding Biological Opinion(s) for ESA-listed species [19,30]. Of chief concern in this region is SRKW. In the EIS, underwater noise from aircraft was deemed unlikely to adversely affect SRKW (and humpback whales), and to have no effect on critical habitat. The rationale for this conclusion included assertions that whales would have to be at the surface of the water and directly underneath low-altitude aircraft (<300 m), and that whales were already exposed to boat and ship noise that could "drown out or lessen" any noise from aircraft. Our results indicate instead that noise from Growlers is measurable at least 30 m under water, with sound levels known to impact whales. Furthermore, these sound levels are comparable to those documented by studies of noise that is experienced by SRKW from small and large vessels [74,75]. The reason that no effects on SRKW critical habitat are assumed is not due to evaluation of noise impacts, but rather to an exemption of waters within the boundaries of military installations from critical habitat designation (see Appendix C, Sections 4.1 and 3.2.5 in [17]). Lastly, the rationale for not considering impacts of aircraft flying higher than 300 m is based on an agreement with the National Marine Fisheries Service in 2015 to assume that underwater noise from any event where aircraft exceed this altitude will cause no reaction in marine mammals (see Appendix C and Section 4.1 in [17]). This is despite the fact that modeled underwater noise for aircraft at altitudes of 300-3000 m is 128-152 dB re 1 µPa (see Table 3.0-4 in [73]), exceeding known thresholds for behavioral reactions and adverse impacts on marine mammals, including SRKW (Figure 5).
A recent synthesis of underwater noise and vessel disturbance on SRKW by the Washington State Academy of Sciences recommended "defining every interaction with an SRKW as an opportunity to disturb a whale", due to the "fragile condition" of the population [76]. Collectively, we believe that our results create a case for revisiting these impact assessments as well as future inclusion of military aircraft noise in cumulative effects models for SRKW [77].
Evaluating the EIS process against our results further illuminates a problem wherein risk from noise effects is calculated based on the likelihood that individuals (e.g., an individual SRKW or marbled murrelet group) will be exposed to sound levels that result in physical damage (i.e., hearing damage or barotrauma) or direct changes in behavior such as foraging, breeding, or nesting. This does not account for the problem that the habitat itself is being impacted by noise, and becomes less hospitable [78], nor that noise may be added to other stressors [79,80]. In this scenario, the very rarity of the species becomes a factor in assuming impacts are discountable, negligible, or insignificant (see Appendix C and Section 4.1 in [17]). Our purpose in outlining these inconsistencies in environmental impact assessment is not to point fingers at federal oversight agencies or the U.S. military, but to exemplify how knowledge gaps [81], exemptions [5], and use of high noise thresholds for harm intersect to discount or underestimate noise impacts on wildlife, which are increasingly understood to include indirect effects on habitat, abundance and fitness of populations [16,82].
The above challenges are part of an evolving understanding of how to evaluate and mitigate growing noise pollution worldwide. However, our study reveals an added challenge specific to noise from MLAF: a scarcity of studies, resulting in large knowledge gaps with respect to impact. There are substantial logistical and bureaucratic hurdles in monitoring military operations; these have been pointed out in reviews [3,9] as well as experienced by the authors of this study. In particular, the fact that schedules are usually not available in advance and access to operational areas may be restricted increases the time and cost of doing these studies. Despite the challenges, there is a strong need to close knowledge gaps, as increased noise from MLAF is predicted to become more common in the future due to base consolidation [2]. Other countries (i.e., Finland and Australia) have recently adopted the Growler platform and may find similar issues in locating training facilities. The problem is not limited to Growler aircraft; the new F-35 is also causing similar concerns and discomfort in areas around airfields [83,84]. And while the trend within the U.S. is toward consolidation, the building of new bases and the expansion of military aircraft activity continue worldwide [41,58,85].
In summary, our study suggests the need for underwater noise from Growlers to be included in cumulative effects models [77] and Biological Opinions for ESA-listed species [19,30], as well as more broadly evaluated outside of the immediate vicinity of Whidbey Island. Furthermore, our results show that sound levels and flight operations around NASWI are largely beyond those that have been previously evaluated, supporting calls for a comprehensive health assessment to evaluate consequences for human health and well-being [60]. Finally, we hope that this study stimulates consideration of how to evaluate impacts of intense noise exposure not only for the benefit of this region, but other areas that may face similar challenges now and in the future.
Supplementary Materials: The following are available online at http://www.mdpi.com/2077-1312/8/11/923/s1, Figure S1: LTSAs of in-air recordings, Table S1: Noise metrics and attributes extracted from MLAF studies, Table S2: Summary of FCLP days and time periods, 2015-2019, Data S1: Results of MLAF literature review.
Funding: Multiple small grants and sources of funding made this work possible. The National Parks Conservation Association contributed funds and led a campaign to allow individual donors to contribute to the project. A project grant was also awarded from The Suquamish Foundation (Appendix X Award 2018Q226). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.