1. Introduction
Noise pollution, identified by the World Health Organization (WHO) as a significant environmental hazard, has emerged as a critical factor in environmental monitoring and public health evaluation [
1,
2]. Research has linked extended exposure to high noise levels to cardiovascular illness, cognitive decline, and persistent discomfort. Aircraft noise is recognized as a notably substantial factor in neighborhoods adjacent to major airports [
3,
4,
5]. Among transportation noise sources, aircraft noise stands out due to its elevated sound pressure levels, a wide frequency spectrum, and significant non-stationary characteristics during flyover occurrences. The continually shifting spectral–temporal structure presents substantial barriers to precise characterization in unattended long-term airport noise monitoring systems. Recent studies have further shown that reliable aircraft-noise characterization in operational environments requires explicit linkage between measured acoustic data and flight-tracking information, particularly when multiple aircraft types and complex traffic conditions coexist [
6,
7,
8].
Regulatory frameworks and policy evaluations increasingly highlight the significance of airport noise monitoring systems in evidence-based noise management and operational decision-making. EU Regulation 598/2014 stipulates noise-related operational restrictions at major airports and facilitates noise modeling via the Aircraft Noise and Performance (ANP) database, whereas the FAA’s 2021 Noise Policy Review underscores the growing dependence on real-time monitoring data in contemporary noise management practices [
9,
10,
11,
12]. Recent airport-focused studies also indicate that monitoring systems are increasingly being used not only for compliance reporting, but also for operational reconstruction, exposure mapping, and model validation under realistic traffic and meteorological variability [
13,
14].
In operational practice, airport noise monitoring systems primarily report regulatory energy-based indicators such as A-weighted sound level (
) and single-event exposure level (
) [
15,
16]. While these metrics quantify overall noise intensity, they are limited in representing the spectral–temporal characteristics of broadband, non-stationary acoustic signals [
17]. Traditional metrics prove inadequate for analyzing the refined spectral structure improvements delivered by modern high-bypass-ratio turbofan engines under real monitoring conditions. In practical monitoring scenarios, conventional energy-based metrics can provide a coarse separation between aircraft of different technological generations; however, their limited resolution results in overlapping energy distributions that obscure finer spectral and temporal differences in aircraft noise [
18,
19]. Merino- Martínez et al. [
19] demonstrated through field measurements at Amsterdam Airport Schiphol that psychoacoustic metrics such as loudness, tonality, and psychoacoustic annoyance revealed significant differences between the Airbus A320 ceo and neo variants that were not captured by conventional energy indicators, underscoring the limitations of energy-centric evaluation for differentiating aircraft with similar certification noise levels. More recent perceptual studies have likewise emphasized that annoyance and quality judgments for single flyover events are better captured when psychoacoustic descriptors are considered alongside conventional level-based indicators [
20,
21].
Psychoacoustic models based on critical-band processing and auditory masking mechanisms provide a perceptually based alternative for defining sound quality beyond energy summaries. Metrics include loudness, sharpness, roughness, fluctuation strength, and tonality effectively encapsulate aspects of auditory experience that are relevant to noise annoyance but are not preserved in scalar energy indicators [
22,
23]. More recently, Lotinga et al. developed psychoacoustic prediction models for noise annoyance responses to unmanned aircraft systems (UAS), demonstrating that tonality and sharpness serve as significant second-order predictors beyond overall loudness [
23]. In the domain of conventional aircraft, Soeta and Kagawa established a three-dimensional psychological evaluation structure relating aircraft noise perception to physical parameters [
12]. Felix Greco et al. applied sound quality assessment to medium-range aircraft with enhanced fan-noise shielding, revealing perceptual improvements not fully reflected in certification-level metrics [
18]. These studies collectively demonstrate the added value of psychoacoustic characterization; however, their application has been predominantly limited to laboratory simulations or short-duration field campaigns with controlled conditions. However, the stability and interpretability of these perceptual descriptors under continuous, unattended monitoring conditions—where multi-aircraft interference, trajectory ambiguity, and environmental contamination are prevalent—remain insufficiently examined [
19,
24,
25,
26].
Recent advances in sensor-based environmental noise monitoring—including IoT-enabled wireless acoustic sensor networks (WASNs), edge computing platforms, and multi-source data fusion architectures—have demonstrated the technical feasibility of continuous, large-scale noise data collection [
27,
28]. In the specific context of airport noise, Liu et al. proposed an IoT-based framework for airport noise perception monitoring combining multi-source data fusion with spatial distribution modeling [
29]. Fu et al. developed a hybrid CNN–BiLSTM–Attention model integrating meteorological and trajectory data for aircraft noise level prediction [
30]. Furthermore, psychoacoustic monitoring approaches using machine learning have been applied to building acoustic environments [
31]. Ascari et al. examined the influence of traffic input data quality on noise estimation accuracy, illustrating the critical importance of data validation in sensor-based monitoring pipelines—a challenge directly relevant to the airport noise monitoring context [
32]. Despite these advances in sensing infrastructure and predictive modeling, most systems remain focused on energy-level acquisition and do not extend the processing chain to perceptual-level feature extraction.
However, a systematic framework that integrates spatiotemporal data fusion, multidimensional data quality control, psychoacoustic feature extraction, and dimensionality reduction visualization within a continuous aircraft noise monitoring pipeline has not been established. Existing sensor-based monitoring studies primarily focus on energy-level acquisition and event detection, without extending the processing chain to include perceptual descriptors [
33]. Conversely, psychoacoustic studies of aircraft noise typically operate on curated laboratory recordings or small-sample field measurements, without addressing the data reliability challenges inherent in continuous operational environments—such as multi-aircraft interference, ADS-B trajectory ambiguity, and environmental outlier contamination. Furthermore, existing research has not systematically examined the decoupling between aircraft physical design parameters (e.g., bypass ratio, maximum thrust, fan diameter) and perceptual feature dimensions under real monitoring conditions, leaving unexplained why aircraft noise events with similar energy levels can exhibit markedly different spectral and perceptual characteristics.
To address the limitations of conventional energy-based indicators in resolving the spectral–temporal structure of aircraft noise, this study proposes a continuous monitoring-oriented sound quality analysis pipeline built upon long-term, real-world airport data. Unlike previous psychoacoustic aircraft studies that primarily relied on laboratory recordings or short-term controlled field measurements, the proposed framework is specifically designed for unattended operational monitoring environments. To ensure methodological robustness under real-world conditions, a spatiotemporal association strategy integrating time-window alignment and spatial vector consistency verification is developed to link acoustic events with ADS-B flight trajectories. In contrast to conventional threshold-based event detection approaches, the proposed cascaded screening mechanism jointly evaluates acoustic features and trajectory plausibility, reducing multi-aircraft interference and environmental contamination prior to perceptual feature extraction.
Furthermore, rather than applying psychoacoustic metrics as post hoc descriptors of isolated events, the framework embeds perceptually grounded features within a data reliability-controlled monitoring pipeline, enabling aircraft-type-level comparative analysis under comparable exposure constraints. By explicitly examining the statistical dispersion and separability between traditional energy indicators and multidimensional psychoacoustic descriptors in continuous operational data, this study moves beyond laboratory-based sound quality evaluation and establishes a monitoring-compatible perceptual characterization approach for aircraft noise.
3. Results
This section presents the analysis results derived from the validated aircraft noise event dataset obtained through the continuous monitoring system. The objective is to examine how different feature representations describe aircraft noise characteristics under operational conditions.
The results are organized into several aspects, including dataset consistency verification, comparative behavior between conventional energy-based metrics and psychoacoustic features, variability under similar energy levels, and the distribution patterns of aircraft-related acoustic attributes in the multidimensional feature space. These analyses aim to evaluate the descriptive capability of the adopted feature framework and its effectiveness in differentiating aircraft models.
3.1. Noise Event Matching and Dataset Construction
3.1.1. Spatiotemporal Coupling Matching for Heterogeneous Data
In continuous aircraft noise monitoring, reliable association between acoustic events and flight trajectories is essential for constructing a valid analysis dataset. Following the spatiotemporal matching strategy described in
Section 2.2, the algorithm was applied to data collected in a high-density operational environment.
During the monitoring period, a total of 1465 acoustic noise event files were recorded by the terminal. A unified time reference was established through the combined use of NTP synchronization and GPS timing, minimizing clock drift among sensing components. Because the acoustic terminal and ADS-B receiver transmit data through independent communication paths, timestamp offsets remain unavoidable. Therefore, each detected event with trigger time defined a symmetric search window of ±60 s, within which candidate trajectories were retrieved from the ADS-B database.
Using this temporal criterion, 1271 noise events were successfully associated with at least one potential aircraft trajectory, forming the initial temporally matched dataset. The remaining 194 events had no corresponding ADS-B trajectory within the defined time window and the predefined geographic query range centered on the monitoring site. In other words, no flight track data were available within the relevant spatiotemporal domain for these events, indicating that no aircraft trajectory was present within the effective matching range during the corresponding acoustic trigger interval.
However, temporal agreement alone cannot fully remove interference from aircraft operating in adjacent airspace, such as parallel runway approaches or non-target overflights. Therefore, spatial verification based on low-altitude flight segments (H < 300 m) was further performed. The motion direction of each candidate track was compared with the physical orientation of the runway. When the heading deviation satisfied , the association was accepted; otherwise, the event was rejected.
After applying this geometric filtering, the number of valid samples was reduced from 1271 to 1078, indicating that the spatial constraint effectively removed false associations while preserving the majority of relevant events.
To assess the robustness of the spatiotemporal matching strategy, a sensitivity analysis was conducted by varying the temporal window (±30 s, ±45 s, ±60 s, ±90 s) and the heading deviation threshold (15°, 30°, 45°). Using the manually verified set of 1056 aircraft events as reference, each configuration was evaluated in terms of precision and recall (
Table 1).
Restricting the temporal window to ±30 s resulted in reduced recall (79.55–90.91%), indicating under-coverage of valid overflight events despite high precision (≈95–96%). Expanding the window to ±45 s substantially improved recall (up to 97.54%) while maintaining precision above 94%. Under the ±60 s condition, the 30° angular threshold achieved full recall (100%) with high precision (97.96%), corresponding to only 22 false positives. Tightening the angular constraint reduced recall, whereas relaxing it increased false associations. When the window was extended to ±90 s, recall remained high but precision declined (down to 90.26% at 45°), reflecting increased incidental temporal overlap.
Overall, the ±60 s and 30° configuration provides the most balanced trade-off, preserving all 1056 validated events while limiting false matches. This setting was therefore adopted as the optimal spatiotemporal matching parameter combination in this study.
After manual verification to remove environmentally contaminated or trajectory-inconsistent cases, the final dataset consisted of 1056 validated aircraft overflight events, demonstrating strong consistency between the acoustic monitoring system and ADS-B trajectory data.
3.1.2. Outlier Removal
Among the 1078 spatially validated events, several aircraft categories contained only a small number of observations, which limited the reliability of statistical comparisons. To ensure stability in subsequent analyses, a minimum sample size requirement of was applied at the aircraft-type level.
After this screening, 15 major aircraft categories were retained, resulting in a working dataset of 738 events. The retained set covers the primary fleet operating in the monitored airspace and provides sufficient representation for comparative evaluation.
To further reduce the influence of abnormal measurements and environmental fluctuations, a model-specific multidimensional IQR filtering procedure was conducted. Within each aircraft category, statistical limits were computed for key metrics including
,
and
Figure 2a presents the distribution of samples in the feature space defined by these indicators. Observations exceeding the upper bound
were removed, as illustrated in
Figure 2b. After applying these steps, the remaining dataset exhibited improved compactness and consistency, providing a reliable basis for the subsequent comparative and structural analyses.
As shown in
Figure 2a, the rejected samples (red markers) are mainly distributed around the outer boundary of the main data cluster, indicating that these observations deviate from the dominant statistical structure of aircraft noise events.
Figure 2b summarizes the contribution of each acoustic metric to the outlier identification process. Among them,
shows the highest sensitivity, followed by
and
. This suggests that instantaneous peak levels are more likely to be influenced by sporadic disturbances, supporting the use of multidimensional constraints in improving dataset stability.
The overall effectiveness of the filtering strategy is further illustrated in
Figure 2c. By overlaying the excluded observations onto the model-wise distributions, it can be seen that abnormal points are removed while the internal statistical spread of each aircraft category remains largely preserved.
After this procedure, 53 records were excluded, leaving 685 valid samples for subsequent analysis. Detailed aircraft category information is summarized in
Table 2.
3.2. Associations Between Aircraft Configuration Parameters and Acoustic Metrics
To examine how aircraft configuration variables relate to acoustic observations, an association analysis was conducted between physical parameters and both traditional energy metrics and psychoacoustic descriptors.
Spearman’s rank correlation coefficient was adopted to quantify monotonic relationships, and only statistically significant links (p < 0.05) were retained. The analyzed parameters include bypass ratio, thrust, fuselage length, wingspan, maximum takeoff weight, and aircraft age.
Rather than implying causality, this analysis provides a comparative view of how different acoustic metrics respond to variations in aircraft characteristics, thereby offering contextual support for the necessity of multidimensional feature representations.
As illustrated in
Figure 3, the association results are presented as a correlation network. Nodes on the left correspond to aircraft configuration parameters, including thrust, bypass ratio, fuselage length, wingspan, maximum takeoff weight, and age. Nodes on the right represent acoustic descriptors, arranged from energy-based metrics toward psychoacoustic features.
The color and thickness of the connecting lines encode the sign and magnitude of Spearman’s correlation coefficient (), where red denotes positive association and blue denotes negative association. Bubble size and shading reflect statistical significance, with darker tones indicating lower p-values. The color bar summarizes the correlation range from −1 to +1.
The map reveals heterogeneous response patterns: some metrics show relatively strong associations with certain aircraft parameters, whereas others remain weakly related. This diversity indicates that different descriptors capture complementary aspects of aircraft noise, motivating the inclusion of psychoacoustic dimensions in addition to energy measures.
As illustrated in
Figure 3, aircraft configuration parameters exhibit clear associations with traditional energy-based metrics, while their relationships with psychoacoustic descriptors appear substantially weaker.
The bypass ratio shows consistent negative correlations with energy quantities such as , , and . These links are among the strongest observed in the network, suggesting that propulsion architecture remains a primary factor influencing the overall magnitude of radiated acoustic energy. In contrast, connections between bypass ratio and perceptual metrics (e.g., roughness, sharpness, tonality) are sparse and generally weak, implying that reductions in total sound energy do not necessarily translate into proportional changes in perceived sound character.
A different pattern is observed for thrust, maximum takeoff weight, and fuselage length. These parameters tend to show positive associations with energy indicators, reflecting the intuitive tendency for larger or more powerful aircraft to generate higher absolute sound levels. However, similar to the bypass ratio, their relationships with psychoacoustic quantities are limited in both density and magnitude. Only occasional weak correlations are observed for certain descriptors, and no uniform trend emerges across perceptual dimensions.
Wingspan and aircraft age demonstrate comparatively modest relationships with both metric families. Although some links with energy indicators are present, their explanatory contribution is secondary relative to propulsion-related parameters. In the perceptual domain, most correlations remain weak or statistically marginal.
Overall, a notable structural contrast emerges: physical parameters align more strongly with sound energy than with sound quality.
This mismatch indicates that while macroscopic design variables are effective predictors of “how loud” an aircraft is, they provide incomplete information regarding how the sound is perceived. Psychoacoustic metrics, which depend on spectral distribution, temporal modulation, and auditory filtering mechanisms, appear to capture dimensions that are not linearly governed by global geometric or performance parameters.
The collinearity matrix on the right further supports this interpretation. Strong dependencies are observed within different statistical representations of the same attribute, whereas cross-category coupling between energy and perceptual descriptors is comparatively limited. Such a structure suggests that energy and perception occupy related but non-identical subspaces of the acoustic feature domain.
These observations provide quantitative motivation for extending aircraft noise evaluation beyond single-number energy metrics toward multidimensional perceptual representations.
3.3. Comparative Analysis of Traditional Energy Metrics and Acoustic Quality Metrics
Following the construction of a validated dataset and the extraction of multidimensional acoustic descriptors, this section examines how different metric systems organize aircraft noise information in feature space.
Conventional assessment frameworks predominantly rely on energy-related quantities such as maximum sound level, exposure level, and integrated sound pressure measures. These indicators provide compact summaries of acoustic magnitude and are widely used in regulatory and operational contexts. However, the extent to which such scalar descriptors preserve the structural variability embedded in complex, non-stationary aircraft noise remains an open question.
To explore this issue, a manifold-learning approach based on Uniform Manifold Approximation and Projection (UMAP) is employed. The objective is not dimensionality reduction per se, but the visualization of intrinsic data geometry under different metric representations.
Two feature domains are constructed:
An energy-metric space, composed of , , and ;
A psychoacoustic feature space, containing perceptually motivated descriptors derived from auditory modeling.
By projecting each space onto a two-dimensional manifold, the method enables direct comparison of clustering compactness, overlap patterns, and relative separability among aircraft samples.
If a metric system captures meaningful structural differences, clearer aggregation patterns and more stable neighborhood relationships are expected to emerge in the embedded space. Conversely, strong overlap would suggest limited discriminative capacity.
The comparative visualization results are presented in
Figure 4.
As shown in
Figure 4, the manifold derived from traditional energy metrics was obtained using UMAP (n_neighbors = 15, min_dist = 0.1, Euclidean metric, fixed random seed). The embedding exhibits a dominant global gradient associated with overall sound level, while aircraft from different technical families remain largely intermixed.
Quantitative analysis confirms this structural concentration. As summarized in
Table 3, the first two principal components of the energy descriptors explain 98.38% of the total variance, and the participation ratio dimension is only 1.20. This indicates that the energy feature space is effectively constrained to a near one-dimensional manifold dominated by amplitude scaling, with limited preservation of higher-order spectral or temporal variability.
In contrast, the psychoacoustic manifold in
Figure 4 displays a more diversified spatial organization. Although local coherence is preserved, the data unfold along multiple independent directions.
Table 3 shows that the first two principal components account for only 54.98% of the variance, and the participation ratio dimension increases to 4.62, indicating substantially higher intrinsic dimensionality. These results suggest that psychoacoustic descriptors retain additional perceptual degrees of freedom related to spectral balance, modulation, and masking behavior.
To verify robustness against hyperparameter choices, n_neighbors (10–30) and min_dist (0.05–0.2) were systematically varied. Neighborhood stability was evaluated using mean Jaccard similarity of k-nearest neighbor sets. As reported in
Table 4, the energy embedding achieves 0.766 ± 0.027, while the psychoacoustic embedding yields 0.706 ± 0.033. Both values indicate stable topology across parameter configurations. The slightly lower stability for psychoacoustic features is consistent with their higher intrinsic dimensionality, reflecting increased structural flexibility rather than instability. Full sweep results are provided in
Supplementary Table S1.
Overall, the combined intrinsic dimensionality and robustness analyses demonstrate that psychoacoustic descriptors reorganize the dataset along multiple effective axes while maintaining stable manifold structure, providing a structurally richer representation than scalar energy metrics.
3.4. Experimental Comparison of Perceptual Feature Divergence Characteristics Under Equal-Energy Conditions
Following the manifold observations, this section formally examines whether psychoacoustic descriptors retain variability under equal-energy constraints. Conventional assessment frameworks frequently rely on integrated quantities such as the sound exposure level , implicitly assuming that events with comparable energy produce comparable auditory impressions. To evaluate this assumption, an equal-energy constrained analysis was performed.
The dataset was partitioned into three narrow
intervals, thereby limiting energy variability to a minimal range. For the representative B737-89L model (
Figure 5), each bin contains 23–56 events. As summarized in
Table 5, the residual dispersion of the controlling energy metric remains low, with coefficients of variation between 0.60% and 0.66%, confirming effective energy control within each interval.
Despite this stringent constraint, several psychoacoustic descriptors exhibit substantially larger variability. Fluctuation strength shows coefficients of variation between 42.3% and 44.7% (95% bootstrap CI approximately 31–56%), while tonality ranges from 20.8% to 23.3% (95% CI approximately 11–28%). Loudness and roughness remain between 10% and 18%, consistently exceeding the residual variation of
. The detailed dispersion values, confidence intervals, and sample sizes for each bin are reported in
Table 5.
To quantify relative sensitivity, we define Gain = CV_metric/CV_LSE. Gain ranges from approximately 11× (sharpness) to over 74× (fluctuation strength). Bootstrap ratio testing confirms that Gain is significantly greater than unity in all cases (p < 0.001), indicating that perceptual dispersion systematically exceeds residual energy dispersion under equal-energy control.
These results demonstrate that aircraft noise events with nearly identical integrated energy may still differ considerably in their fine temporal and spectral characteristics. Rather than contradicting energy-based evaluation, this finding highlights that psychoacoustic descriptors preserve complementary variability that is compressed when signals are reduced to a single scalar energy level.
To examine whether the divergence observed in the single-model analysis persists at a broader scale, the study further extends the equal-energy constrained evaluation to the aggregated dataset comprising multiple aircraft categories, as presented in
Figure 6.
As the sample size increases to the full dataset, the empirical distributions of psychoacoustic metrics become statistically smoother, and extreme values appear less pronounced. Nevertheless, substantial dispersion remains evident across all perceptual descriptors.
Table 6 summarizes the coefficient of variation (CV), bootstrap-based 95% confidence intervals, and dispersion gain (CV_metric/CV_LSE) under equal-energy constraints. Across the three energy bins (Low:
n = 73; Mid:
n = 407; High:
n = 195), the residual variability of the controlling energy metric L_SE remains low (CV_LSE ≈ 0.87–1.24%).
In contrast, psychoacoustic descriptors exhibit markedly higher dispersion. For example:
Fluctuation Strength shows CV values of 43.2% (Low), 44.2% (Mid), and 46.5% (High), corresponding to gain factors of 34.8×, 36.4×, and 53.8×, respectively.
Loudness maintains CV values between 17.8% and 32.7%, with gain ranging from 14.8× to 26.3×.
Tonality presents gains between 21.0× and 30.4×.
Even the most stable descriptor, Sharpness, yields gains between 8.6× and 9.5×.
Importantly, the 95% bootstrap confidence intervals of Gain do not overlap unity in any case. For instance, Fluctuation Strength (High bin) yields a gain of 53.8× with a 95% CI of [46.8×, 61.9×], while Loudness (Mid bin) produces a gain of 14.8× with a 95% CI of [12.8×, 16.7×]. One-sided bootstrap hypothesis tests (H0: Gain ≤ 1) return p < 0.001 for all metrics.
These results confirm that perceptual dispersion remains statistically and practically significant even under strict equal-energy constraints. Although averaging across heterogeneous aircraft types reduces absolute CV values compared with the single-aircraft case, the relative divergence between perceptual metrics and energy remains robust.
Overall, the persistence of dispersion gain across all bins demonstrates that auditory-model-based descriptors retain dimensions of variation that are systematically compressed when aircraft noise events are represented solely by integrated energy quantities.
To evaluate robustness against outlier filtering, the multi-dimensional IQR rejection factor was varied from 0.5× to 2.0×, resulting in retained sample sizes ranging from 487 to 695 events. As shown in
Table 7, the dispersion gain of psychoacoustic metrics remained consistently large across all configurations.
The minimum gain observed among all descriptors and energy bins ranged from 8.24× (IQR = 0.5×) to 8.81× (IQR = 1.5×). Importantly, the minimum lower bound of the 95% bootstrap confidence interval remained between 7.13× and 7.83× across all IQR settings. In every case, the confidence intervals did not cross unity, and all one-sided bootstrap tests rejected the null hypothesis Gain ≤ 1 (p < 0.001).
Although stricter IQR filtering reduces sample size, the magnitude and statistical significance of dispersion gain remain stable. These results confirm that the observed perceptual variability is not driven by specific outlier rejection choices but represents an intrinsic structural property of the dataset.
3.5. Temporal Consistency and Divergence Between SPL and Sound Quality Metrics
Figure 7 presents the temporal evolution of several psychoacoustic descriptors during a representative flyover event and compares them with the corresponding sound pressure level (SPL) profile.
At a global scale, loudness generally follows the overall trend of SPL, as reflected by the high correlation coefficient (). This behavior is expected because loudness is strongly influenced by total acoustic magnitude. Nevertheless, minor deviations can still be observed around the rising and falling edges of the event, indicating that spectral distribution and auditory weighting also contribute to its dynamics.
In contrast, roughness shows a weaker correspondence with the SPL contour (). Although the metric increases during the approach phase, its decline occurs earlier than the SPL peak, suggesting that modulation-related characteristics evolve differently from energy accumulation.
Sharper discrepancies appear in sharpness and tonality. For sharpness (), elevated values occur in later segments of the event where the SPL has already decreased. This indicates a shift in spectral emphasis toward higher-frequency components rather than simple dependence on overall level.
Tonality () exhibits the most pronounced independence from the energy envelope. Distinct peaks emerge during the trailing phase, implying that tonal prominence may become more noticeable even as total intensity diminishes.
Taken together, these observations suggest that while some perceptual descriptors retain partial correlation with energy variation, they also reveal temporal structures that are not strictly governed by the SPL trajectory. This intra-event behavior aligns with the statistical findings presented earlier, where perceptual variability persists under constrained energy conditions.
4. Discussion
The results obtained in this study provide several complementary perspectives on how aircraft noise characteristics are represented under continuous monitoring conditions. By integrating spatiotemporal event matching, stratified data validation, manifold visualization, and equal-energy constrained analysis, the findings collectively illustrate the structural differences between traditional energy-based descriptors and perceptually motivated acoustic features.
4.1. Interpretation of Energy-Perception Representation Differences
Energy-based metrics such as , , and related exposure indicators remain effective descriptors for representing the overall magnitude of acoustic emissions. Their statistical association with aircraft configuration parameters—particularly thrust and bypass ratio—confirms that these indicators largely reflect macroscopic physical characteristics of propulsion systems. In practical monitoring systems, such scalar quantities therefore provide a stable basis for regulatory reporting and long-term environmental noise statistics.
However, the analyses performed in this study indicate that energy descriptors capture only a limited subset of the acoustic variability present in operational aircraft noise. When events are examined under constrained energy intervals, perceptual descriptors continue to exhibit substantial dispersion. The equal-energy analysis shows that the coefficient of variation of psychoacoustic metrics exceeds that of by one order of magnitude or more, with dispersion gains ranging from approximately 8× to over 50× depending on the metric. These results demonstrate that signals with nearly identical integrated energy can still exhibit substantial differences in spectral balance, temporal modulation, and tonal structure.
Such divergence highlights a fundamental limitation of scalar energy representations. Energy integration compresses the multidimensional spectral–temporal structure of aircraft noise into a single amplitude-dominated parameter. As a result, variations associated with modulation behavior, harmonic prominence, and auditory masking processes are largely suppressed.
4.2. Mechanisms Behind the Observed Physical–Perceptual Decoupling
A notable observation in the correlation analysis is the weak relationship between aircraft configuration parameters and psychoacoustic descriptors. While parameters such as bypass ratio show consistent correlations with energy metrics, their associations with perceptual quantities—including roughness, sharpness, and tonality—remain sparse and comparatively weak.
This apparent decoupling does not imply that aircraft design has no influence on perceptual sound characteristics. Instead, it reflects the multi-stage transformation between source generation, propagation, and auditory perception.
First, propulsion configuration primarily determines the total acoustic power radiated by the aircraft. Parameters such as thrust level, fan diameter, and bypass ratio directly influence jet turbulence noise and fan tonal components, which explains their strong relationship with integrated energy indicators.
Second, the spectral and modulation structure perceived at the monitoring point is shaped by several additional processes that are not directly encoded in global aircraft design variables. These include:
Propagation effects, such as Doppler shift, atmospheric absorption, and ground reflections, which dynamically reshape spectral energy distribution during the flyover.
Operational variability, including thrust setting, climb angle, and flight trajectory relative to the monitoring station.
Temporal modulation mechanisms, arising from rotor–stator interaction tones and aerodynamic turbulence fluctuations.
Because psychoacoustic metrics are derived from auditory models incorporating critical-band filtering, temporal integration, and masking effects, they are sensitive to these fine structural variations. Consequently, perceptual descriptors reflect a combination of source physics and propagation-induced modulation processes rather than simple scaling with aircraft design parameters.
This interpretation is consistent with the manifold visualization results. The UMAP embeddings demonstrate that energy descriptors organize the dataset along a dominant amplitude gradient, whereas psychoacoustic features reveal additional orthogonal axes of variation associated with spectral and modulation characteristics. These additional dimensions correspond to perceptual attributes that are not linearly governed by macroscopic aircraft configuration parameters.
4.3. Practical Deployment and Computational Considerations for Continuous Monitoring
Beyond the analytical differences between energy-based indicators and psychoacoustic descriptors, an important question concerns the practical feasibility of deploying such perceptual feature extraction within continuous aircraft noise monitoring systems. Because unattended monitoring terminals operate for extended periods and must process large volumes of acoustic data, computational efficiency and storage requirements become critical considerations for real-world implementation.
In the proposed framework, psychoacoustic feature extraction is based on standard auditory modeling procedures, including short-time spectral analysis, critical-band filtering, and temporal modulation analysis. These operations rely primarily on fast Fourier transform (FFT) processing and filterbank operations, which scale approximately linearly with signal length. For typical aircraft noise events with durations of approximately 40–50 s and a sampling rate of 48 kHz, the computational workload remains moderate and can be efficiently handled by embedded computing platforms commonly used in environmental monitoring systems. In practical tests, the processing time required to compute the complete set of psychoacoustic descriptors is in the order of seconds on standard single-board computers, indicating that the computational burden is compatible with operational monitoring pipelines.
Furthermore, continuous psychoacoustic analysis does not necessarily require strict real-time processing of the entire audio stream. In ISO 20906-compliant monitoring systems, acoustic data processing is typically event-driven, meaning that detailed feature extraction is triggered only after a valid aircraft noise event has been detected. Because the temporal spacing between consecutive aircraft movements in typical runway operations often exceeds several tens of seconds, sufficient processing time exists between events to perform psychoacoustic calculations without interfering with the acquisition of subsequent measurements. This event-based processing strategy significantly reduces the effective computational load compared with continuous full-stream analysis.
Data storage requirements also remain manageable within this framework. Rather than storing high-resolution waveform data for long-term analysis, the monitoring pipeline retains compact feature descriptors derived from the acoustic signals. The dimensionality of the psychoacoustic feature vectors is several orders of magnitude smaller than that of the original audio recordings, enabling efficient storage and long-term archival of monitoring data. In addition, the reduced data volume facilitates large-scale statistical analysis and cross-event comparisons without imposing excessive storage overhead.
Taken together, these considerations indicate that the integration of psychoacoustic feature extraction into continuous aircraft noise monitoring systems is computationally feasible and operationally practical. The proposed framework can be incorporated into existing monitoring infrastructures with limited additional processing or storage requirements, while providing enhanced analytical resolution for characterizing aircraft noise beyond conventional energy-based indicators. Future work may further investigate which quantitative descriptors are most suitable for real-time monitoring applications, with particular emphasis on identifying a compact subset of perceptually informative indicators that balance computational efficiency, interpretability, and monitoring robustness.
5. Conclusions
This study presented a data-driven investigation of aircraft noise characteristics based on real-world continuous monitoring data collected at an operational airport environment. A complete analytical workflow was established, including spatiotemporal association between acoustic events and ADS-B trajectories, stratified data validation using model-specific IQR filtering, and multidimensional feature extraction based on psychoacoustic models. This framework enables the systematic analysis of aircraft noise characteristics beyond conventional energy-based monitoring approaches.
The comparative analyses reveal several key findings. First, traditional energy-based indicators effectively describe the overall magnitude of aircraft noise and show clear associations with macroscopic aircraft configuration parameters such as thrust and bypass ratio. However, these scalar descriptors tend to compress the spectral–temporal structure of aircraft noise into a nearly one-dimensional representation dominated by amplitude scaling. Second, psychoacoustic descriptors—including loudness, roughness, sharpness, fluctuation strength, and tonality—retain significant variability even under strict equal-energy constraints. The dispersion of these perceptually motivated features substantially exceeds the residual variation of the controlling energy metric, indicating that aircraft noise events with similar exposure levels may still differ considerably in their fine spectral and modulation characteristics. Third, manifold visualization further demonstrates that different metric systems organize the same dataset in distinct ways: energy descriptors produce highly concentrated embeddings, whereas psychoacoustic features reveal a higher-dimensional structure that preserves additional perceptual information.
Overall, the results indicate that psychoacoustic representations provide complementary dimensions of information beyond integrated energy levels. Integrating perceptually grounded descriptors into continuous aircraft noise monitoring systems therefore has the potential to enhance the analytical resolution and interpretability of monitoring data, particularly in situations where conventional metrics show limited discriminative capability.
Future research may further expand this framework in several directions. First, multi-site monitoring datasets could be incorporated to evaluate the generalizability of the proposed methodology under different airport configurations and operational conditions. Second, integrating psychoacoustic features with subjective response data and annoyance studies would help clarify the perceptual relevance of the extracted descriptors. Finally, further work may focus on identifying a compact subset of perceptually informative indicators that are most suitable for real-time monitoring applications, balancing computational efficiency, interpretability, and robustness in large-scale operational monitoring systems.