Artificial Neural Network for Air Pollutant Concentration Predictions Based on Aircraft Trajectories over Suvarnabhumi International Airport

Kamsing, Patcharin; Cao, Chunxiang; Boonpook, Wuttichai; Boonprong, Sornkitja; Xu, Min; Boonsrimuang, Pisit

doi:10.3390/atmos16040366


by Patcharin Kamsing ¹,*, Chunxiang Cao ², Wuttichai Boonpook ³, Sornkitja Boonprong ⁴, Min Xu ² and Pisit Boonsrimuang ⁵

¹ International Academy of Aviation Industry, King Mongkut’s Institute of Technology Ladkrabang, Bangkok 10520, Thailand

² State Key Laboratory of Remote Sensing Science, Aerospace Information Research Institute, Chinese Academy of Sciences, Beijing 100094, China

³ Department of Geography, Faculty of Social Sciences, Srinakharinwirot University, Bangkok 10110, Thailand

⁴ Faculty of Social Sciences, Kasetsart University, Bangkok 10900, Thailand

⁵ School of Engineering, King Mongkut’s Institute of Technology Ladkrabang, Bangkok 10520, Thailand

* Author to whom correspondence should be addressed.

Atmosphere 2025, 16(4), 366; https://doi.org/10.3390/atmos16040366

Submission received: 12 February 2025 / Revised: 19 March 2025 / Accepted: 21 March 2025 / Published: 24 March 2025

(This article belongs to the Section Atmospheric Techniques, Instruments, and Modeling)


Abstract

Air pollutant concentration prediction is essential not only for effective air quality management but also for planning aircraft and ground vehicle route networks in terminal areas. In this work, an artificial neural network (ANN) is used to predict the concentration levels of four types of air pollutants (CO, NO₂, PM₂.₅, and PM₁₀) at Suvarnabhumi International Airport. By leveraging historical Automatic Dependent Surveillance-Broadcast (ADS-B) data, aircraft trajectory pattern clustering is implemented by using K-means and Gaussian mixture model (GMM) clustering algorithms. Then, those trajectory patterns are inputted together with other flight data into the ANN, resulting in an effective prediction model for each pollutant of interest. The results demonstrate that the mean squared errors (MSEs) of the prediction models for CO and PM₂.₅ have acceptable values of 51.7622 and 53.9682, respectively, while the prediction models for NO₂ and PM₁₀ have MSEs of 139.6674 and 124.2517, respectively. This study contributes to the advancement of air pollutant prediction methodologies, facilitating better decision-making processes, proactive air quality management, and route network planning at airports. Because some of the prediction models have relatively high MSEs, further study is needed to enhance the prediction capacity.

Keywords: clustering; air traffic management; aviation emission; ADS-B

1. Introduction

With the rapid development of the aviation industry worldwide, Thailand has become an important country for the industry. Compared with the 2022 financial year, Suvarnabhumi International Airport in Thailand recorded a 56.21% increase in total aircraft movements (international and domestic) and a 137.84% increase in the total number of passengers (international and domestic) in 2023 [1]. These figures reflect the rapid growth of the aviation industry in Thailand. The growth in aircraft movements at Suvarnabhumi International Airport increases environmental impacts such as air pollution, noise, and carbon emissions. Environmental impact assessments use methodologies such as noise modeling, air quality monitoring, and carbon footprint analysis to quantify these effects. Moreover, infrastructure development is crucial not only for expanding airport capacity and enhancing facilities but also for improving operational effectiveness and safety in airport management. The terminal area, where air traffic control services are provided for departing and arriving aircraft, is a key area in air transportation. Route network planning in the terminal area may inadvertently contribute to air pollutant emissions from aircraft or ground vehicles, which can seriously affect the environment and public health. The trade-off between the development of the aviation industry and environmental protection should therefore be carefully considered, taking into account frameworks such as the National Air Quality Standards, aviation emissions regulations, and the ICAO’s Carbon Offsetting and Reduction Scheme for International Aviation (CORSIA) [2]. Research has shown that air pollution can impact both domestic and inbound tourism in China; compared with other kinds of air pollution, PM₂.₅ has a significant negative impact [3].

Several researchers have studied trajectory clustering. For example, in [4], a method combining a deep autoencoder and a Gaussian mixture model (GMM) was proposed. In this method, a deep autoencoder was trained to extract feature representations from historical trajectory data, and the result was inputted into a GMM for clustering. This approach was implemented by using Automatic Dependent Surveillance-Broadcast (ADS-B) data over the terminal airspace of Guangzhou Baiyun International Airport in China. In [5], the t-distributed stochastic neighbor embedding (t-SNE) method and the density peak clustering approach (DPCA) were implemented to improve K-means clustering for flight trajectories. In [6], aircraft trajectory patterns were presented by using K-means and a GMM with historical ADS-B data over Suvarnabhumi International Airport during peak congestion. In addition, a three-dimensional hail trajectory clustering method was described in [7] for detailed analyses of trajectories. This technique groups trajectories that have similar updraft-relative structures and orientations and then uses a modified density-based spatial clustering of applications with noise (DBSCAN) method for the clustering task. The prediction of airport surface taxi time is presented in [8]; taxi time is highly dynamic, stochastic, and uncertain and also affects air pollutant emissions over the airport. Clustering methods differ in accuracy, computational efficiency, and real-world applicability. K-means is fast but less accurate with complex data, DBSCAN handles noise better but is computationally intensive, and hierarchical clustering offers high accuracy but is slow for large datasets. Therefore, the selection of a clustering method should depend on the environment and on the balance between accuracy and computational cost.

Deep learning techniques offer several advantages in air quality prediction over traditional statistical models, including the ability to handle large and complex datasets, capture non-linear relationships, and improve accuracy by learning from high-dimensional features. They can also adapt to dynamic environmental conditions and integrate diverse data sources more effectively. One review article discussed the use of artificial neural networks (ANNs) for predicting air pollution [9]. Several methods, such as multilayer perceptron (MLP), random forest, ANN, and adaptive neuro-fuzzy models, are applied over different prediction horizons, such as short-term, daily, or monthly air quality forecasts. The review shows that ANN techniques can forecast air contaminants more precisely than other methods because they can handle a wide range of input meteorological parameters. In [10], an enhanced air pollution prediction method is presented that uses a transfer learning technique to forecast different air pollutants. As a result, it generalizes over a large set of air pollutants at the same air monitoring station. The results show that the proposed method significantly improved accuracy with less fine-tuning data, overcame the issue of limited labeled data, and contributed to air quality management. ANNs have also been implemented for air quality assessment and pollution forecasting in metropolitan Lima, Peru [11]. MLP and long short-term memory (LSTM) recurrent networks have been applied for the hourly prediction of PM₁₀, using the past values of this pollutant together with some meteorological parameters obtained from physical monitoring stations. Two types of prediction models for determining air pollution indices via neural networks have been proposed for industrial cities [12]. 
These are temporal, or short-term, forecasts of the air pollution index (API), which are used for the nearest areas, and spatial forecasts, i.e., forecasts of the atmospheric pollution index at any point in a city. Via these models, sufficiently reliable forecasts were obtained, which efficiently analyzed and predicted the dynamics of air pollution in an industrial city. In another study of industrial sites [13], an ANN based on the nonlinear autoregressive model with exogenous inputs (NARX) was used. NARX, a recurrent dynamic network with feedback connections enclosing several layers of the network [14], has been extensively applied in various applications [15,16]. The proposed method can be utilized to predict the concentrations of two pollutants (NOₓ and CO); the root mean square error (RMSE) and the mean absolute error (MAE) are employed to determine its performance. Moreover, an online system for air pollution forecasting using a neural network with cumulative predicted values of previous days [17] has been proposed. Using this method, the size of the training database used in the model can be optimized. The resulting model can achieve the minimum error rate by using 3–15 historical days in the training dataset. In [18], the integration of visual sensor data, such as images from cameras, with data from other kinds of sensors was proposed to measure and label PM₂.₅ and PM₁₀. Both image and labeled data were input into four different deep neural network architectures. The results show that the custom pretrained Inception model, combined with image and nonimage data, yields better accuracy and performance than the other implemented methods. ANNs have also been applied with nine factors to predict air pollution over Ahvaz, Iran [19]. In this study, the researchers implemented 30 neurons in the hidden layer to obtain one final output. The model predicts the air quality index (AQI) and air quality health index (AQHI) every hour. 
The results indicate that ANNs can be applied by decision makers to estimate spatial–temporal profiles of pollutants and air quality indices. The neural network (NN) method has been implemented for air quality forecasting in Jakarta city, Indonesia [20]. The differences in hyperparameters, such as the number of neurons in hidden layer 1, the number of training cycles, and the learning rate, are included in the experiment to find the best accuracy for the dataset. NN and deep learning (DL) have also been used for air pollution and water pollution prediction in the Indian context [21]. These are examples of applied machine learning in many cities to predict air pollution. However, machine learning models for air pollution prediction may struggle with dynamic and heterogeneous urban environments due to limited adaptability; insufficient data accuracy; and difficulty capturing complex, real-time environmental changes. Therefore, the generalizability of the cited ANN-based air pollution forecasting methods should be tested with diverse datasets from multiple regions, validating models with cross-validation techniques and accounting for variations in climate conditions and local factors during training.

To improve its performance, LSTM has been enhanced to LSTM-extended (LSTME) for air pollution concentration prediction [22]. LSTME inherently considers spatiotemporal correlation, which most existing methods of air pollutant concentration prediction neglect. Therefore, LSTME achieves better RMSE results [22] than the comparison methods, namely, spatiotemporal deep learning (STDL), time delay neural network (TDNN), autoregressive moving average (ARMA), and support vector regression (SVR). In [23], the use of ANNs for predicting indoor air quality (IAQ) in schools is reviewed. The researchers summarized the types and sources of indoor air pollutants (IAPs) and their indicators. The review follows a systematic evaluation of ANNs as predictive models of IAQ. The results indicate that ANNs are the most frequently used methods in air quality prediction and that recurrent neural networks (RNNs) are better suited to time-series problems such as IAQ. NNs have also been deployed for predicting PM₁₀ at construction sites [24]. In that paper, a construction influence index was applied, time-series data of air pollutants were decomposed into wavelet representations, and wavelet coefficients were predicted by using three types of neural network models. The results show that the proposed method yields a lower RMSE than the original method does. Artificial and wavelet neural networks have also been deployed for air pollution forecasting [25]. The researchers studied many network topologies for ANN and wavelet-ANN (WANN) models with many meteorological factors to analyze the correlation of the data. The results indicate that WANNs are effective in short-term API forecasting because of their ability to recognize historical patterns and identify nonlinear relationships between input and output variables.

Analysis tools [26], such as the atmospheric dispersion modeling system (ADMS); the emission and dispersion modeling system (EDMS) and its successor, the aviation environment design tool (AEDT); and the Lagrangian simulation of aerosol transport for airports (LASPORT), have been used to assess the influence of flight trajectories on air pollution. For instance, in [26], researchers employed the AEDT with different departure routes from Warsaw Chopin Airport and compared their impacts on air quality; nitrogen oxides (NOₓ), carbon monoxide (CO), and particulate matter (PM₁₀) were included in the study. The research in [27] presents an emissions calculation framework based on actual flight trajectories, including both ground and airborne emissions. The overall performance of the proposed method demonstrates that assuming the great circle route leads to an overestimation of pollutant emissions by 56.8% compared with the method based on actual routes. Research has also aimed to reduce the environmental impact of aircraft operation in the terminal area by using route optimization. The implemented objective function considers two perspectives, namely, minimizing air pollutant emissions and minimizing noise [2]. Compared with the original route network, the optimized route network in the terminal area decreases emissions by 51.4% and noise influence by 21.5%.

In the experiments conducted herein, air pollutant concentrations were predicted on the basis of related aircraft trajectory data, which is greatly beneficial for the development and management of air traffic control and environmental public health management. The main contributions of this work are summarized as follows.

(1) A method for the prediction of air pollutant concentrations according to related aircraft trajectory data is proposed.

(2) Aircraft trajectory patterns over Suvarnabhumi International Airport are analyzed, benefiting the air traffic management of the airport.

(3) The proposed method is tested with respect to both the clustering process and the regression prediction process by implementing a silhouette score and three evaluation metrics, namely, mean absolute error, mean squared error, and R-squared.

The remainder of this paper is organized as follows. In Section 2, the related works that briefly explain the K-means method, GMM, Silhouette score, and three evaluation metrics are presented. Section 3 describes the utilized materials and the proposed method, which involves integrated trajectory clustering via the K-means method and GMM with simple regression model calculations. The results are illustrated in Section 4. Conclusions and suggestions for future work are presented in Section 5.

2. Related Works

2.1. K-Means

K-means clustering, also referred to as Lloyd’s algorithm, is an iterative technique that alternates between two steps: the assignment step and the update step [6,28,29,30]. Given an initial set of $k$ means $m_{1}^{(1)}, \ldots, m_{k}^{(1)}$, in the first step, each observation is assigned to the cluster whose mean has the least squared Euclidean distance $d$, defined in (1), which is intuitively the “nearest” mean:

$$d(p, q) = d(q, p) = \sqrt{(q_{1} - p_{1})^{2} + \cdots + (q_{n} - p_{n})^{2}} \tag{1}$$

where $p = (p_{1}, p_{2}, \ldots, p_{n})$ and $q = (q_{1}, q_{2}, \ldots, q_{n})$ are two points in Euclidean $n$-space. Then, the cluster $S_{i}^{(t)}$ is computed by:

$$S_{i}^{(t)} = \{x_{p} : \|x_{p} - m_{i}^{(t)}\|^{2} \leq \|x_{p} - m_{j}^{(t)}\|^{2} \;\; \forall j, \; 1 \leq j \leq k\} \tag{2}$$

where each $x_{p}$ is assigned to exactly one $S_{i}^{(t)}$, even if it could be assigned to two or more of them.

The next step is the update step, which calculates the new means, or centroids, of the observations in the new clusters by (3):

$$m_{i}^{(t + 1)} = \frac{1}{|S_{i}^{(t)}|} \sum_{x_{j} \in S_{i}^{(t)}} x_{j} \tag{3}$$

The iteration process converges when the assignments no longer change.
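The two alternating steps can be illustrated with a minimal sketch in plain Python. The function name `kmeans`, the toy points, and the initial means are hypothetical and chosen only for illustration; they are not taken from the experiment, and a practical implementation would normally rely on an established library.

```python
import math


def kmeans(points, initial_means, max_iter=100):
    """Lloyd's algorithm: alternate the assignment and update steps."""
    means = [tuple(m) for m in initial_means]
    assignments = None
    for _ in range(max_iter):
        # Assignment step (Eq. (2)): pick the nearest mean by Euclidean
        # distance (Eq. (1)); ties go to the lowest cluster index, so each
        # point belongs to exactly one cluster.
        new_assignments = [
            min(range(len(means)), key=lambda i: math.dist(p, means[i]))
            for p in points
        ]
        if new_assignments == assignments:
            break  # converged: the assignments no longer change
        assignments = new_assignments
        # Update step (Eq. (3)): each mean becomes the centroid of its cluster.
        for i in range(len(means)):
            members = [p for p, a in zip(points, assignments) if a == i]
            if members:  # an empty cluster keeps its previous mean
                means[i] = tuple(sum(c) / len(members) for c in zip(*members))
    return means, assignments


# Toy two-dimensional data (hypothetical, not from the ADS-B dataset).
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
means, labels = kmeans(points, initial_means=[(0, 0), (10, 10)])
```

On this toy input the loop stops after the second pass, because the second assignment step reproduces the first.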

2.2. Gaussian Mixture Model

The Gaussian mixture model (GMM) [6,31,32,33,34] is a type of mixture density model that assumes that each component has a Gaussian distribution. A GMM is composed of $c$ Gaussian density components, with parameters $\theta_{k} = \{u_{k}, \Sigma_{k}\}$ for the $k$th component. The probability density of $x_{i}$ is formulated in GMM as follows:

$$p(x_{i} \mid \pi, \theta) = \sum_{k = 1}^{c} \pi_{k} \, p(x_{i} \mid \theta_{k}) \tag{4}$$

where $\theta = \{\theta_{1}, \theta_{2}, \ldots, \theta_{c}\}$ represents the parameters of all the components, and $\pi_{k}$ represents the mixing weight of the $k$th component, satisfying $\pi_{k} \geq 0$ and $\sum_{k = 1}^{c} \pi_{k} = 1$. The $k$th Gaussian is denoted by the following:

$$p(x_{i} \mid \theta_{k}) = \frac{1}{\sqrt{(2\pi)^{n} |\Sigma_{k}|}} \exp\left(-\frac{(x_{i} - u_{k})^{T} \Sigma_{k}^{-1} (x_{i} - u_{k})}{2}\right) \tag{5}$$

where $u_{k}$ and $\Sigma_{k}$ are the mean and the covariance matrix, respectively, and $n$ is the dimension of $x_{i}$. The parameters $\{\theta, \pi\}$ can be estimated iteratively, in a manner similar to the K-means clustering algorithm, by maximizing the likelihood function using the expectation-maximization (EM) algorithm in terms of the following updates:

$$\omega_{i}^{k(t)} = \frac{\pi_{k}^{(t)} \, p(x_{i} \mid \theta_{k}^{(t)})}{\sum_{j = 1}^{c} \pi_{j}^{(t)} \, p(x_{i} \mid \theta_{j}^{(t)})} \tag{6}$$

$$\pi_{k}^{(t + 1)} = \frac{\sum_{i = 1}^{N} \omega_{i}^{k(t)}}{N} \tag{7}$$

$$u_{k}^{(t + 1)} = \frac{\sum_{i = 1}^{N} \omega_{i}^{k(t)} x_{i}}{\sum_{i = 1}^{N} \omega_{i}^{k(t)}} \tag{8}$$

$$\Sigma_{k}^{(t + 1)} = \frac{\sum_{i = 1}^{N} \omega_{i}^{k(t)} (x_{i} - u_{k}^{(t + 1)}) (x_{i} - u_{k}^{(t + 1)})^{T}}{\sum_{i = 1}^{N} \omega_{i}^{k(t)}} \tag{9}$$
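As an illustration of updates (6)–(9), the following sketch runs EM on a one-dimensional, two-component mixture, so that each covariance matrix reduces to a scalar variance. The function names, the toy data, the initial parameters, and the small variance floor are assumptions made for this example only; they are not part of the experiment.

```python
import math


def gaussian_pdf(x, mean, var):
    """One-dimensional case of Eq. (5)."""
    return math.exp(-((x - mean) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)


def em_step(xs, weights, means, variances):
    """One EM iteration: responsibilities (Eq. (6)), then Eqs. (7)-(9)."""
    c, n = len(weights), len(xs)
    resp = []
    for x in xs:
        joint = [weights[k] * gaussian_pdf(x, means[k], variances[k]) for k in range(c)]
        total = sum(joint)
        resp.append([j / total for j in joint])  # Eq. (6)
    nk = [sum(r[k] for r in resp) for k in range(c)]
    new_weights = [nk[k] / n for k in range(c)]  # Eq. (7)
    new_means = [
        sum(r[k] * x for r, x in zip(resp, xs)) / nk[k] for k in range(c)
    ]  # Eq. (8)
    new_vars = [
        # Eq. (9); the 1e-6 floor is an added safeguard against collapse.
        max(sum(r[k] * (x - new_means[k]) ** 2 for r, x in zip(resp, xs)) / nk[k], 1e-6)
        for k in range(c)
    ]
    return new_weights, new_means, new_vars


# Toy data with two well-separated groups (hypothetical values).
xs = [0.9, 1.0, 1.1, 4.9, 5.0, 5.1]
weights, means, variances = [0.5, 0.5], [0.0, 6.0], [1.0, 1.0]
for _ in range(20):
    weights, means, variances = em_step(xs, weights, means, variances)
```

The mixing weights remain normalized after every iteration because each row of responsibilities sums to one.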

2.3. Silhouette Score

The silhouette score [35] is used to identify the goodness of fit of a clustering technique that is implemented. The silhouette score ranges between −1 and 1, where a score of 1 indicates that clusters are clearly separated. If the score equals 0, the clusters are not clearly distinguished or the distance between clusters is not significant. A silhouette score equal to −1 means that clusters are assigned incorrectly. The silhouette score is calculated by using two kinds of distances, namely, the average intra-cluster distance (a) and the average inter-cluster distance (b), via Equation (10):

$$\text{Silhouette score} = \frac{b - a}{\max(a, b)} \tag{10}$$

For example, regarding the intra-cluster distance (a), or the average distance between points within the same cluster, for points A (1, 2), B (2, 3), and C (3, 4), the average pairwise distance is about 1.89. Regarding the inter-cluster distance (b), or the average distance between points from different clusters, the average distance between points A, B of one cluster and points D (5, 5), E (6, 6) of another cluster is about 5.00.
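The two distances in this example can be checked with a short sketch. The helper names are hypothetical, and combining the two averages through Equation (10) is a simplification for illustration: a full silhouette analysis computes a and b for every sample and then averages the per-sample scores.

```python
import math
from itertools import combinations, product

A, B, C = (1, 2), (2, 3), (3, 4)
D, E = (5, 5), (6, 6)


def mean_pairwise(points):
    """Average Euclidean distance over all pairs within one group."""
    pairs = list(combinations(points, 2))
    return sum(math.dist(p, q) for p, q in pairs) / len(pairs)


def mean_between(group_1, group_2):
    """Average Euclidean distance over all cross-group pairs."""
    pairs = list(product(group_1, group_2))
    return sum(math.dist(p, q) for p, q in pairs) / len(pairs)


a = mean_pairwise([A, B, C])       # intra-cluster distance
b = mean_between([A, B], [D, E])   # inter-cluster distance
score = (b - a) / max(a, b)        # Equation (10) applied to the two averages
print(round(a, 2), round(b, 2), round(score, 2))
```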

2.4. Evaluation Metrics

The following evaluation metrics are used in the experiment for air pollutant prediction based on aircraft trajectories: mean absolute error (MAE), mean squared error (MSE), and R-squared ($R^{2}$). MAE is defined as follows:

$$\mathrm{MAE} = \frac{1}{n} \sum_{i = 1}^{n} |y_{i} - \hat{y}_{i}| \tag{11}$$

MAE measures the average absolute difference between the predicted and actual values, providing a clear indication of the average prediction error. MSE is used to measure the average squared difference, penalizing large errors more heavily, as follows:

$$\mathrm{MSE} = \frac{1}{n} \sum_{i = 1}^{n} (y_{i} - \hat{y}_{i})^{2} \tag{12}$$

The last evaluation metric that is used in the experiment is $R^{2}$, which measures the proportion of variance explained by the model, indicating overall fit:

$$R^{2} = 1 - \frac{\sum_{i = 1}^{n} (y_{i} - \hat{y}_{i})^{2}}{\sum_{i = 1}^{n} (y_{i} - \bar{y})^{2}} \tag{13}$$

where $y_{i}$, $\hat{y}_{i}$, and $\bar{y}$ represent the actual values, the predicted values, and the mean of the actual values, respectively.
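Equations (11)–(13) translate directly into code. The sketch below uses plain Python with hypothetical function names and toy values; it is meant only to make the definitions concrete.

```python
def mae(y_true, y_pred):
    """Mean absolute error, Equation (11)."""
    return sum(abs(y - p) for y, p in zip(y_true, y_pred)) / len(y_true)


def mse(y_true, y_pred):
    """Mean squared error, Equation (12)."""
    return sum((y - p) ** 2 for y, p in zip(y_true, y_pred)) / len(y_true)


def r2(y_true, y_pred):
    """Coefficient of determination, Equation (13)."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((y - p) ** 2 for y, p in zip(y_true, y_pred))
    ss_tot = sum((y - mean_y) ** 2 for y in y_true)
    return 1 - ss_res / ss_tot


# Toy values (hypothetical, not results from the experiment).
y_true = [1.0, 2.0, 3.0]
y_pred = [1.0, 2.0, 4.0]
print(mae(y_true, y_pred), mse(y_true, y_pred), r2(y_true, y_pred))
```

Because MSE squares each residual, a single large miss raises it much more than it raises MAE.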

3. Materials and Methods

3.1. Pollution Emission Data

The pollution emission data used in this experiment were downloaded from the official website of the Airports of Thailand Public Co., Ltd., Bangkok, Thailand [36]. There are two air quality monitoring system (AQMS) sites, namely, (1) AQMS-NORTH, which is located in the northern part of the Suvarnabhumi airport, and (2) AQMS-SOUTH, which is located in the southern part of the airport, as shown in Figure 1 and Figure 2.

Four kinds of pollution emission data were considered in the experiment. The first two are PM₂.₅ and PM₁₀. PM₂.₅ concentrations have units of μg/m³. PM₂.₅ refers to dust with a diameter smaller than 2.5 micrometers (μm), and PM₁₀ is dust with a diameter less than 10 μm, originating from combustion in vehicles or from agricultural or industrial processes. PM₂.₅ can reach the lung alveoli, causing respiratory tract disease. If inhaled in large quantities or for a long period, these particles accumulate in the lungs, causing the lungs to function less efficiently and likely causing bronchitis and asthma. The next is CO, a colorless, odorless, and tasteless gas that is produced by the incomplete combustion of fuels that contain carbon. This gas can accumulate in the body and combines with hemoglobin in red blood cells approximately 200–250 times more readily than oxygen does. When inhaled, it competes with oxygen for hemoglobin in the blood, forming carboxyhemoglobin (COHb), which reduces the transport of oxygen to cells in the body. As a result, the body becomes weak, and the heart has to work harder. NO₂ is the remaining pollutant considered in the experiment. It is also a colorless and odorless gas that is slightly soluble in water. It is found in nature or is produced by human activities, such as the burning of various fuels, and by some industries. This gas affects the visual system as well as individuals with asthma or respiratory diseases. There is also research that describes the types of diseases caused by PM₂.₅ and NOₓ [37], which highlights the importance of this research.

3.2. Trajectory Data

Figure 3 shows the runways of Suvarnabhumi International Airport. In 2023, there were two active runways during the study period. AIP Thailand [38] conveyed information about a redesigned third runway at the Suvarnabhumi airport effective from 2 October 2024 at 1900 UTC to 3 October 2024 at 1000 UTC; this communication indicated that three runways were available.

ADS-B trajectory data were used in this experiment. ADS-B has two different services, namely, ADS-B Out and ADS-B In. The ADS-B data used in the experiment were transmitted at 1090 MHz using the Mode S extended squitter of the SSR transponder, with a bandwidth of approximately 50 kHz. The data were queried from the website “opensky-network.org” [6,39] for the period from 1 January 2023 to 31 December 2023, retaining only the records between 9:00 p.m. and 12:00 a.m., which is considered the experimental time. An example of data extraction is shown in Figure 4. Each ADS-B data column holds trajectory-related data of the aircraft. Some of the data are directly inputted as features for training the regression model for each type of air pollutant. Other data are processed to extract important features for training the regression model, such as the action of the aircraft (landing, take-off, or other), as defined in Equation (14).

3.3. Methods

The workflow of the experiment is shown in Figure 5. It begins with loading the trajectory data. This process extracts the callsign data, which identify the flights over the study area. Additionally, there is a step to convert the time from Unix time to UTC time and to extract the month, date, and hour from the UTC time. The month represents seasonal information related to the temperature, humidity, and wind characteristics of the study area. It is the first feature used to train the regression model for each type of air pollutant.
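The time conversion step can be sketched with the Python standard library as follows. The function name `time_features` and the sample timestamp are hypothetical; the sketch assumes that the raw timestamps are Unix times expressed in seconds.

```python
from datetime import datetime, timezone


def time_features(unix_seconds):
    """Convert a Unix timestamp to UTC and return (month, date, hour)."""
    t = datetime.fromtimestamp(unix_seconds, tz=timezone.utc)
    return t.month, t.day, t.hour


# 1672581600 corresponds to 2023-01-01 14:00:00 UTC.
month, date, hour = time_features(1672581600)
```

Passing `tz=timezone.utc` explicitly avoids a silent dependence on the local time zone of the machine running the code.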

ADS-B works by broadcasting information about an aircraft’s GPS location, altitude, ground speed, and other data to ground stations and other aircraft once per second [40]. Therefore, one flight or one callsign will have various transactions of ADS-B data. Since the actions of aircraft, such as landing, take-off, or flying, result in different pollutant emission levels because of differences in engine working levels, in the experiment, three kinds of actions are separated from the trajectory data, namely, (1) landing (labeled “0”); (2) take-off (labeled “1”); and (3) others (labeled “2”). Callsign data are the important data used to group transactions of trajectory data and identify actions. The predefined conditions for the action of landing, take-off, and others are illustrated in Equation (14) as follows:

$$T_{action} = \begin{cases} 0, & (\mathrm{lat}[-1] > 13.6) \text{ and } (\mathrm{lat}[-1] < 13.930) \text{ and } (\mathrm{lon}[-1] > 100.7) \text{ and } (\mathrm{lon}[-1] < 100.8) \text{ and } (\mathrm{baroaltitude}[-1] < \mathrm{baroaltitude}[0]) \\ 1, & (\mathrm{lat}[0] > 13.6) \text{ and } (\mathrm{lat}[0] < 13.930) \text{ and } (\mathrm{lon}[0] > 100.7) \text{ and } (\mathrm{lon}[0] < 100.8) \text{ and } (\mathrm{baroaltitude}[-1] > \mathrm{baroaltitude}[0]) \\ 2, & \text{otherwise} \end{cases} \tag{14}$$

The index [−1] indicates the last transaction, or record, of ADS-B data for the callsign of interest within the downloaded data over the study area, whereas the index [0] refers to its first transaction. The action of each transaction ($T_{action}$) is the second feature for training the regression model of each type of air pollutant.
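The conditions of Equation (14) can be written as a small function. The function name `classify_action` and the sample tracks are hypothetical; the sketch assumes that each callsign's records are given as time-ordered lists of latitude, longitude, and barometric altitude.

```python
def classify_action(lat, lon, baro_altitude):
    """Label one callsign's track: 0 = landing, 1 = take-off, 2 = other."""

    def in_airport_box(i):
        # Bounding box around the airport, as given in Equation (14).
        return 13.6 < lat[i] < 13.930 and 100.7 < lon[i] < 100.8

    # Landing: the track ends inside the box, lower than it started.
    if in_airport_box(-1) and baro_altitude[-1] < baro_altitude[0]:
        return 0
    # Take-off: the track starts inside the box and ends higher.
    if in_airport_box(0) and baro_altitude[-1] > baro_altitude[0]:
        return 1
    return 2


# Hypothetical two-record tracks, for illustration only.
landing = classify_action([14.2, 13.7], [100.9, 100.75], [3000.0, 50.0])
take_off = classify_action([13.7, 14.2], [100.75, 100.9], [50.0, 3000.0])
other = classify_action([14.5, 14.6], [101.0, 101.1], [10000.0, 10000.0])
```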

To find the pattern of a trajectory, the transactions whose action ($T_{action}$) is identified as landing or take-off are further processed by normalization and by separation into training and testing datasets with a ratio of 80/20. Unsupervised clustering is then computed by using K-means and GMM with the number of clusters equal to 4. This is the maximum number of possible ways to take off or land at the airport, which had two available runways during the experimental period.

The points used in the K-means calculations are spatial points represented by latitude, longitude, and baro altitude. The clustering involves iterating K-means and GMM with silhouette scoring until scores greater than the preset thresholds are obtained. Based on trials with data of various characteristics, the silhouette score threshold for both landing and take-off is set at 0.33 for the K-means method and 0.25 for the GMM clustering method. The result of the clustering process is a set of trajectory patterns obtained from both K-means and GMM. The GMM clustering method produces four trajectory patterns for the landing action and four trajectory patterns for take-off. Likewise, the K-means clustering method produces four trajectory patterns for the landing action and four trajectory patterns for take-off. To train the regression model of each type of air pollutant, feature numbers 3 and 4 are the pattern numbers that originate from GMM and K-means, respectively. The ICAO24 number, velocity, heading, and baro altitude are used as feature numbers 5–8 for training the regression model of each type of air pollutant.

The labeled data for training the regression model of each type of pollutant are the air pollutant concentrations. The air quality data are extracted by matching the year, month, date, and hour. Therefore, four types of air pollutant concentration data are used as labeled data, namely, CO, NO₂, PM₂.₅, and PM₁₀.

The process then involves training the regression model for each type of air pollutant. All the datasets are preprocessed by removing rows with missing values or duplicates, standardized by removing the mean and scaling to unit variance, and then separated into training and testing datasets with a ratio of 70/30. This ratio is appropriate because the dataset is large: 70% is sufficient for training the model while leaving an adequate portion for testing. The 30% allocated for testing ensures the model is evaluated on a substantial amount of data, providing a more reliable assessment of its generalizability. A separate regression model is trained for each type of labeled data. Therefore, four regression models are used, where each model represents the relationship between one type of air pollutant (CO, NO₂, PM₁₀, or PM₂.₅) and the eight features mentioned above, which are related to the trajectory data of aircraft. The regression architecture implemented in the experiment is shown in Figure 6.
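The preprocessing steps can be sketched in plain Python as follows. The helper names, the fixed random seed, and the toy rows are assumptions for illustration; the sketch standardizes with the population standard deviation and is not the implementation used in the experiment.

```python
import random
import statistics


def clean(rows):
    """Drop exact duplicate rows and rows that contain a missing value."""
    unique_rows = list(dict.fromkeys(rows))  # keeps first occurrences, in order
    return [r for r in unique_rows if None not in r]


def standardize(column):
    """Remove the mean and scale to unit variance (population std)."""
    mean = statistics.fmean(column)
    std = statistics.pstdev(column)
    return [(v - mean) / std if std else 0.0 for v in column]


def split(rows, train_fraction=0.7, seed=0):
    """Shuffle the rows and split them into training and testing sets."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    cut = round(train_fraction * len(shuffled))
    return shuffled[:cut], shuffled[cut:]


# Toy rows of (feature, label); one duplicate and one missing value.
raw = [(i, 2.0 * i) for i in range(10)] + [(0, 0.0), (99, None)]
rows = clean(raw)
scaled = standardize([r[0] for r in rows])
train, test = split(rows)
```

In many workflows the scaling statistics are estimated on the training portion only, so that no information from the test set leaks into training; the order shown here follows the description in the text.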

The regression model is trained for 100 epochs with a batch size of 10, using the Adam optimizer with a learning rate of 0.0001, since these values proved appropriate for reducing the error, as presented in Section 4. However, these values can be adjusted to obtain better performance depending on the application environment. As shown in Figure 6, a pyramid structure is chosen for the neural network since it is simple and convenient to upgrade in the future to serve more complex data. In this structure, the number of neurons in each hidden layer decreases as the network progresses toward the output. The number of input features is fixed, but many neurons are set in the first hidden layer, and the number of neurons in the subsequent layers is gradually reduced. Because there is only one target for each type of air pollutant, the final layer outputs only one value. Here, the input layer has eight nodes, the same as the number of implemented features. The first fully connected layer has 48 nodes connected to the input layer, with ReLU activation. The second fully connected layer has 24 nodes connected to the previous layer, followed by another ReLU activation. The third fully connected layer has 12 nodes, with ReLU activation. The next hidden layer has 6 nodes, followed by the ReLU activation function. The last layer is the output layer, with one node that outputs the result. This regression architecture is simple to implement, and its layers and nodes can be scaled conveniently. Fully connected layers with ReLU activation functions are used to enable the model to learn complex relationships between input features and outputs. Each fully connected layer allows for the modeling of interactions between all input features, with the number of nodes determining the capacity of the model to capture these interactions. The decreasing number of nodes likely reflects the reduction in the complexity of the features as the network moves deeper, allowing the model to gradually abstract and distill the most relevant information. ReLU activation functions are used because they introduce non-linearity, allowing the model to learn non-linear patterns in the data. ReLU is computationally efficient and helps avoid the vanishing gradient problem, allowing the model to learn more effectively during training [41,42,43].
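The layer sizes described above (8 inputs, hidden layers of 48, 24, 12, and 6 nodes, and 1 output) can be made concrete with a forward-pass sketch in plain Python. The random initialization, the function names, and the linear output node are assumptions for illustration; training with the Adam optimizer is not shown, and a deep learning framework would normally be used in practice.

```python
import random

LAYER_SIZES = [8, 48, 24, 12, 6, 1]  # input, four hidden layers, output


def init_network(sizes, seed=0):
    """Create (weights, biases) per layer with small random weights."""
    rng = random.Random(seed)
    layers = []
    for n_in, n_out in zip(sizes, sizes[1:]):
        weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
        biases = [0.0] * n_out
        layers.append((weights, biases))
    return layers


def forward(layers, features):
    """Fully connected forward pass: ReLU on hidden layers, linear output."""
    activations = list(features)
    for index, (weights, biases) in enumerate(layers):
        outputs = [
            sum(w * a for w, a in zip(row, activations)) + b
            for row, b in zip(weights, biases)
        ]
        is_output_layer = index == len(layers) - 1
        activations = outputs if is_output_layer else [max(0.0, v) for v in outputs]
    return activations[0]


def count_parameters(layers):
    """Total number of weights and biases in the network."""
    return sum(len(w) * len(w[0]) + len(b) for w, b in layers)


network = init_network(LAYER_SIZES)
n_params = count_parameters(network)
prediction = forward(network, [0.0] * 8)  # eight standardized features
```

Counting weights and biases layer by layer gives 432 + 1176 + 300 + 78 + 7 = 1993 trainable parameters, which shows how compact this pyramid structure is.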

The last step of the experiment is to evaluate the four regression models on a portion of the data (the testing dataset) by using the evaluation metrics. Additionally, the quality of the K-means and GMM clustering results, each of which is input to the regression model as a feature, is evaluated by using the silhouette score.

4. Results

4.1. K-Means and GMM Clustering Results

After data preprocessing, there are 18,214 and 12,214 callsigns for landing and take-off, respectively. The data are split into training and testing datasets at a ratio of 80/20. For landing, there are 14,572 callsigns in the training dataset and 3642 callsigns in the testing dataset. For take-off, the training and testing datasets have 9772 and 2442 callsigns, respectively. As mentioned in the previous section, K = 4 is used for both K-means and GMM clustering to obtain the flight (trajectory) patterns. Table 1 shows the silhouette score and the number of callsigns in each of the four groups for both landing and take-off. All the silhouette scores are greater than zero. Since the silhouette score ranges from −1 to 1 and values closer to 1 indicate better-separated clusters, the positive scores indicate that the data are clustered. K₁, K₂, K₃, and K₄ are the numbers of callsigns that are clustered into groups 1 to 4.
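To make the interpretation of the silhouette score concrete, the sketch below computes the mean silhouette coefficient for a small one-dimensional toy dataset. The points and labels are hypothetical and unrelated to the ADS-B trajectories, and the function is a plain reference computation rather than the routine used in the study.

```python
import math


def silhouette_score(points, labels):
    """Mean silhouette coefficient s = (b - a) / max(a, b) over all points."""
    clusters = {}
    for point, label in zip(points, labels):
        clusters.setdefault(label, []).append(point)
    if len(clusters) < 2:
        raise ValueError("the silhouette score needs at least two clusters")

    def mean_distance(point, members):
        return sum(math.dist(point, m) for m in members) / len(members)

    scores = []
    for point, label in zip(points, labels):
        own = clusters[label]
        if len(own) == 1:
            scores.append(0.0)  # usual convention for singleton clusters
            continue
        # a: mean distance to the other members of the same cluster
        # (the distance of the point to itself is zero, hence len(own) - 1).
        a = sum(math.dist(point, m) for m in own) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster.
        b = min(mean_distance(point, members)
                for other, members in clusters.items() if other != label)
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return sum(scores) / len(scores)


# Hypothetical toy data: two tight groups that are far apart.
points = [(0.0,), (1.0,), (10.0,), (11.0,)]
good_labels = [0, 0, 1, 1]  # matches the natural grouping
bad_labels = [0, 1, 0, 1]   # mixes the two groups

print(silhouette_score(points, good_labels))
print(silhouette_score(points, bad_labels))
```

A labelling that matches the natural grouping gives a score close to 1, whereas a labelling that mixes the groups gives a negative score. Scores between these extremes, such as those in Table 1, indicate clusters that are separated but overlap to some degree.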

Figure A1, Figure A2, Figure A3 and Figure A4 in Appendix A show the patterns resulting from K-means and GMM clustering for both landing and take-off. For landing, Figure A1 and Figure A2 (K-means and GMM clustering, respectively) suggest that there are two main groups of trajectory patterns, namely, pattern number 3 and the others. For take-off, Figure A3 and Figure A4 (K-means and GMM clustering, respectively) show two main groups of patterns, which enter from the left and right sides of the study area. Although the graphical results for the aircraft trajectory patterns are complicated because of the large number of flights over the study area and the length of the study period, the clustering is supported by the numerical silhouette scores. These scores represent the effectiveness of the K-means and GMM clustering and suggest that this approach is acceptable for the subsequent experimental steps.

4.2. Regression Model Results

The regression model for each type of air pollutant describes the relationship between the pollutant concentration level and the related aircraft trajectory data, which have eight features and a total of 7,853,480 records after data preprocessing. The data are split into training and testing datasets at a ratio of 70/30. Therefore, the training dataset contains 5,497,436 records, and the testing dataset contains 2,356,044 records.
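The split sizes reported in Sections 4.1 and 4.2 can be checked with integer arithmetic, as sketched below. The rounding rule (testing share rounded down, remainder assigned to training) is an assumption; it is used here only because it reproduces the reported counts.

```python
def split_counts(total, test_percent):
    """Return (train, test) sizes with the testing share rounded down (assumed rule)."""
    test = total * test_percent // 100
    return total - test, test


landing = split_counts(18214, 20)    # callsigns, 80/20 split
takeoff = split_counts(12214, 20)    # callsigns, 80/20 split
records = split_counts(7853480, 30)  # regression records, 70/30 split
print(landing, takeoff, records)
```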

Figure 7 shows the MSE at each epoch of the regression models, each of which is trained for the predefined 100 epochs. As listed in Table 2, the MSE of CO is the smallest, whereas the MSE of NO₂ is the highest. The regression model for each pollutant therefore shows some error in predicting the concentrations of CO, NO₂, PM₁₀, and PM_2.5 from the related trajectory data extracted from the ADS-B Out transmissions of aircraft. The MSEs are high for NO₂ and PM₁₀, at 139.6674 and 124.2517, respectively. These levels are nevertheless considered acceptable for application to route network planning in the terminal area and to environmental and public health management.
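For reference, the three evaluation metrics used for the regression models (MAE, MSE, and R²) can be computed as in the following sketch. The target and prediction values are hypothetical and serve only to make the arithmetic easy to follow. The `rmse` helper is an addition for illustration: taking the square root expresses an MSE in the units of the pollutant concentration.

```python
import math


def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def mse(y_true, y_pred):
    """Mean square error."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root mean square error, in the same units as the target."""
    return math.sqrt(mse(y_true, y_pred))


def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot (targets must not be constant)."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Hypothetical concentrations and predictions.
y_true = [1.0, 2.0, 3.0, 4.0]
y_pred = [1.0, 2.0, 3.0, 5.0]
print(mae(y_true, y_pred), mse(y_true, y_pred), r2(y_true, y_pred))
```

In this toy case a single error of one unit among four samples gives MAE = 0.25, MSE = 0.25, and R² = 0.8.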

Moreover, in the experiments, the 2,356,044 samples of the testing dataset were evaluated by using three evaluation metrics. The results are presented in Table 3. The regression models for NO₂ and PM₁₀ perform worst in terms of MAE and MSE, with similar results for the two pollutants. However, the R² of PM₁₀ is better than that of NO₂. Further investigation was conducted by computing SHAP values to gain a deeper understanding of the impact of each feature on the regression models. Table 4 lists the feature numbers and what they represent, which correspond to the SHAP values shown in Figure 8. Figure 8 presents the SHAP values and mean SHAP values for all regression models. The mean SHAP values indicate that the month and the pattern number from K-means clustering have a high impact on the regression models, while $T_{action}$ and the pattern number from GMM clustering have the next-highest impact. ICAO24, which identifies the airplane, has the lowest impact on the models. These results imply that the air traffic pattern over the airport, represented in this experiment by the K-means and GMM pattern numbers, is a crucial factor affecting the level of air pollution. $T_{action}$, which identifies take-off, landing, or other airplane processes, also affects the regression models of all air pollutants.
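To illustrate how a mean SHAP value ranks features, the sketch below computes exact Shapley values by brute force for a hypothetical three-feature linear model and then averages their absolute values over a few samples. The model, its coefficients, the baseline, and the samples are all assumptions made for illustration. The study does not state which SHAP explainer was used, and brute-force enumeration is practical only for a small number of features.

```python
from itertools import combinations
from math import factorial


def shapley_values(model, x, baseline):
    """Exact Shapley values of model(x) relative to model(baseline).

    Features outside a coalition are replaced by their baseline value.
    """
    n = len(x)

    def value(coalition):
        mixed = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return model(mixed)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight for coalitions of this size that exclude feature i.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                coalition = set(subset)
                phi[i] += weight * (value(coalition | {i}) - value(coalition))
    return phi


def mean_abs_shap(model, samples, baseline):
    """Average |Shapley value| per feature over all samples."""
    totals = [0.0] * len(baseline)
    for x in samples:
        for i, phi in enumerate(shapley_values(model, x, baseline)):
            totals[i] += abs(phi)
    return [t / len(samples) for t in totals]


def toy_model(features):
    """Hypothetical linear model; the coefficients are not taken from the study."""
    month, action, pattern = features
    return 2.0 * month + 0.5 * action + 1.0 * pattern


baseline = [0.0, 0.0, 0.0]
samples = [[1.0, 1.0, 2.0], [3.0, 0.0, 1.0]]
importance = mean_abs_shap(toy_model, samples, baseline)
print(importance)
```

For a linear model, the Shapley value of each feature equals its coefficient multiplied by the feature's offset from the baseline, so the feature with the largest mean absolute value (here, the first one) is the one that moves the prediction most on average.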

5. Conclusions

The terminal area is complex because it serves both departing and arriving aircraft. Good planning and management of both aircraft and ground vehicles are needed, taking into account safety, time limitations, and the level of air pollutants generated, which affects public health. In this experiment, air pollutant concentrations were predicted by using an ANN, with a separate prediction model generated for each of CO, NO₂, PM_2.5, and PM₁₀. The historical data used in the study are ADS-B data collected at Suvarnabhumi International Airport between 9:00 P.M. and 12:00 A.M. every day during 2023. Each ADS-B record is assigned an activity, namely, take-off, landing, or other, according to predefined conditions. This assignment is based only on ADS-B data, which may lead to misclassification, oversimplification, and bias in the data analysis, directly affecting the regression models. Therefore, incorporating additional flight phase data from other sources could further improve the accuracy. For pattern clustering (K-means and GMM), the number of clusters is set to the maximum number of possible approaches that aircraft use to take off or land; K = 4 is used in this study. The pattern numbers resulting from K-means and GMM clustering are used as input features for training the regression model, together with the ICAO24 number cast to an integer, velocity, heading, baro altitude, month, and type of action extracted from the ADS-B data. The regression models are evaluated in terms of the MSE. The results indicate that the air pollutant concentration prediction models for CO and PM_2.5 have similar MSEs of 51.7622 and 53.9682, respectively. The prediction models for NO₂ and PM₁₀ have MSE values of 139.6674 and 124.2517, respectively. Three evaluation metrics, namely, the MAE, MSE, and R², are used to evaluate the regression models with the testing dataset.
Moreover, the mean SHAP value is used to investigate the impact of each input feature on the regression model for each air pollutant. The results show that the month, the trajectory patterns created by GMM and K-means clustering, and the action ($T_{action}$) of airplanes play an important role in every regression model. The overall results are acceptable and can contribute to further studies of air pollutant concentration prediction using ANN methodologies. The approach can be applied in real-time air traffic control scenarios to predict the level of air pollution by deploying the regression model as the backend of a web application for convenient monitoring. However, geographic characteristics, climate characteristics, and regulatory constraints are crucial factors that must be considered in real-world applications and at different airports.

Future research should consider three main directions. First, other factors, such as the effects of ground vehicles, which emit air pollutants, and high vehicle traffic and congestion in terminal areas, which lead to inefficient operations and more emissions, will be added as input features to the regression model to improve its precision. In addition, data from remote sensing techniques can be applied as features to improve the air pollution regression models [44,45,46]. Second, other ANN architectures or regression methods will be tested, provided that they remain convenient for the users of the prediction model. Third, we will apply research from other disciplines to contribute to the digital airport concept, such as applying airplane pose estimation [47,48] for autonomous air traffic planning.

Author Contributions

Conceptualization, P.K., C.C., W.B., S.B. and P.B.; methodology, P.K., C.C., W.B., S.B. and P.B.; software, P.K., C.C., W.B., S.B. and P.B.; validation, P.K., C.C., W.B., S.B. and P.B.; formal analysis, P.K., C.C., W.B., S.B. and P.B.; investigation, P.K., C.C., W.B., S.B. and P.B.; resources, P.K., C.C., W.B., S.B. and P.B.; data curation, P.K., C.C., W.B., S.B. and P.B.; writing—original draft preparation, P.K., W.B., S.B. and P.B.; writing—review and editing, P.K., C.C., W.B., S.B. and P.B.; visualization, P.K., C.C., W.B., S.B., M.X. and P.B.; supervision, P.K., C.C., W.B., S.B. and P.B.; project administration, P.K.; funding acquisition, P.K. All authors have read and agreed to the published version of the manuscript.

Funding

This research is the result of the project entitled “Impact of air traffic on air quality by Artificial Intelligent Grant No. RE-KRIS/FF67/012” by King Mongkut’s Institute of Technology Ladkrabang, which has received funding support from the National Science Research and Innovation Fund (NSRF).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The data presented in this study are available upon request from the corresponding author.

Acknowledgments

The authors would like to acknowledge the administrative and technical support received from Satang Space Company Limited, the Air-Space Control Optimization and Management Laboratory (ASCOM-LAB), the International Academy of Aviation Industry, and King Mongkut’s Institute of Technology Ladkrabang, which contributed suggestions and infrastructure during the experiments.

Conflicts of Interest

The authors declare no conflicts of interest.

Appendix A

Figure A1. K-means clustered pattern number one (a), number two (b), number three (c), and number four (d) of landing actions.

Figure A2. GMM clustered pattern number one (a), number two (b), number three (c), and number four (d) of landing actions.

Figure A3. K-means clustered pattern number one (a), number two (b), number three (c), and number four (d) of take-off actions.

Figure A4. GMM clustered pattern number one (a), number two (b), number three (c), and number four (d) of take-off actions.

References

1. Airports of Thailand Public Co., Ltd. Air Transport Statistic. Available online: https://investor.airportthai.co.th/transport.html (accessed on 24 January 2025).
2. Tian, Y.; Wan, L.; Ye, B.; Yin, R.; Xing, D. Optimization Method for Reducing the Air Pollutant Emission and Aviation Noise of Arrival in Terminal Area. Sustainability 2019, 11, 4715.
3. Xu, X.; Dong, D.; Wang, Y.; Wang, S. The Impacts of Different Air Pollutants on Domestic and Inbound Tourism in China. Int. J. Environ. Res. Public Health 2019, 16, 5127.
4. Zeng, W.; Xu, Z.; Cai, Z.; Chu, X.; Lu, X. Aircraft Trajectory Clustering in Terminal Airspace Based on Deep Autoencoder and Gaussian Mixture Model. Aerospace 2021, 8, 266.
5. Wang, Z.-S.; Zhang, Z.-Y.; Cui, Z. Research on Resampling and Clustering Method of Aircraft Flight Trajectory. J. Signal Process. Syst. 2023, 95, 319–331.
6. Kamsing, P.; Torteeka, P.; Yooyen, S.; Yenpiem, S.; Delahaye, D.; Notry, P.; Phisannupawong, T.; Channumsin, S. Aircraft Trajectory Recognition Via Statistical Analysis Clustering for Suvarnabhumi International Airport. In Proceedings of the 2020 22nd International Conference on Advanced Communication Technology (ICACT), Phoenix Park, Republic of Korea, 16–19 February 2020; pp. 290–297.
7. Adams-Selin, R.D. A Three-Dimensional Hail Trajectory Clustering Technique. Mon. Weather. Rev. 2023, 151, 2361–2375.
8. Yin, J.; Zhang, M.; Ma, Y.; Wu, W.; Li, H.; Chen, P. Prediction and Analysis of Airport Surface Taxi Time: Classification, Features, and Methodology. Appl. Sci. 2024, 14, 1306.
9. Yadav, V.; Yadav, A.K.; Singh, V.; Singh, T. Artificial Neural Network an Innovative Approach in Air Pollutant Prediction for Environmental Applications: A Review. Results Eng. 2024, 22, 102305.
10. Jairi, I.; Ben-Othman, S.; Canivet, L.; Zgaya-Biau, H. Enhancing Air Pollution Prediction: A Neural Transfer Learning Approach across Different Air Pollutants. Environ. Technol. Innov. 2024, 36, 103793.
11. Cordova, C.H.; Portocarrero, M.N.L.; Salas, R.; Torres, R.; Rodrigues, P.C.; López-Gonzales, J.L. Air Quality Assessment and Pollution Forecasting Using Artificial Neural Networks in Metropolitan Lima-Peru. Sci. Rep. 2021, 11, 24232.
12. Rahman, P.A.; Panchenko, A.A.; Safarov, A.M. Using Neural Networks for Prediction of Air Pollution Index in Industrial City. IOP Conf. Ser. Earth Environ. Sci. 2017, 87, 042016.
13. Djebbri, N.; Rouainia, M. Artificial Neural Networks Based Air Pollution Monitoring in Industrial Sites. In Proceedings of the 2017 International Conference on Engineering and Technology (ICET), Antalya, Turkey, 21–23 August 2017; pp. 1–5.
14. Araghinejad, S. Data-Driven Modeling: Using MATLAB® in Water Resources and Environmental Engineering; Springer Science & Business Media: Berlin, Germany, 2013.
15. Leontaritis, I.J.; Billings, S.A. Input-Output Parametric Models for Non-Linear Systems Part I: Deterministic Non-Linear Systems. Int. J. Control 1985, 41, 303–328.
16. Simpkins, A. System Identification: Theory for the User, 2nd Edition (Ljung, L.; 1999) [On the Shelf]. IEEE Robot. Autom. Mag. 2012, 19, 95–96.
17. Kurt, A.; Gulbagci, B.; Karaca, F.; Alagha, O. An Online Air Pollution Forecasting System Using Neural Networks. Environ. Int. 2008, 34, 592–598.
18. Kalajdjieski, J.; Zdravevski, E.; Corizzo, R.; Lameski, P.; Kalajdziski, S.; Pires, I.M.; Garcia, N.M.; Trajkovik, V. Air Pollution Prediction with Multi-Modal Data and Deep Neural Networks. Remote Sens. 2020, 12, 4142.
19. Maleki, H.; Sorooshian, A.; Goudarzi, G.; Baboli, Z.; Tahmasebi Birgani, Y.; Rahmati, M. Air Pollution Prediction by Using an Artificial Neural Network Model. Clean Technol. Environ. Policy 2019, 21, 1341–1352.
20. Kristiyanti, D.; Purwaningsih, E.; Nurelasari, E.; Hairul Umam, A. Implementation of Neural Network Method for Air Quality Forecasting in Jakarta Region. J. Phys. Conf. Ser. 2020, 1641, 012037.
21. Nandi, B.P.; Singh, G.; Jain, A.; Tayal, D.K. Evolution of Neural Network to Deep Learning in Prediction of Air, Water Pollution and Its Indian Context. Int. J. Environ. Sci. Technol. 2024, 21, 1021–1036.
22. Li, X.; Peng, L.; Yao, X.; Cui, S.; Hu, Y.; You, C.; Chi, T. Long Short-Term Memory Neural Network for Air Pollutant Concentration Predictions: Method Development and Evaluation. Environ. Pollut. 2017, 231, 997–1004.
23. Dong, J.; Goodman, N.; Rajagopalan, P. A Review of Artificial Neural Network Models Applied to Predict Indoor Air Quality in Schools. Int. J. Environ. Res. Public Health 2023, 20, 6441.
24. Feng, Q.; Wu, S.; Du, Y.; Huaiping, X.; Xiao, F.; Ban, X.; Li, X. Improving Neural Network Prediction Accuracy for PM10 Individual Air Quality Index Pollution Levels. Environ. Eng. Sci. 2013, 30, 725–732.
25. Guo, Q.; He, Z.; Li, S.; Li, X.; Meng, J.; Hou, Z.; Liu, J.; Chen, Y. Air Pollution Forecasting Using Artificial and Wavelet Neural Networks with Meteorological Conditions. Aerosol Air Qual. Res. 2020, 20, 1429–1439.
26. Jasiński, R. Analysis of the Applied Flight Trajectory Influence on the Air Pollution in the Area of Warsaw Chopin Airport. J. Ecol. Eng. 2024, 25, 294–305.
27. Wang, J.; Zhang, H.; Yu, J.; Lu, F.; Li, Y. Modeling Civil Aviation Emissions with Actual Flight Trajectories and Enhanced Aircraft Performance Model. Atmosphere 2024, 15, 1251.
28. Kanungo, T.; Mount, D.M.; Netanyahu, N.S.; Piatko, C.D.; Silverman, R.; Wu, A.Y. An Efficient K-Means Clustering Algorithm: Analysis and Implementation. IEEE Trans. Pattern Anal. Mach. Intell. 2002, 24, 881–892.
29. Wilkin, G.A.; Huang, X. K-Means Clustering Algorithms: Implementation and Comparison. In Proceedings of the Second International Multi-Symposiums on Computer and Computational Sciences (IMSCCS 2007), Iowa City, IA, USA, 13–15 August 2007.
30. Qi, J.; Yu, Y.; Wang, L.; Liu, J. K*-Means: An Effective and Efficient K-Means Clustering Algorithm. In Proceedings of the 2016 IEEE International Conferences on Big Data and Cloud Computing (BDCloud), Social Computing and Networking (SocialCom), Sustainable Computing and Communications (SustainCom) (BDCloud-SocialCom-SustainCom), Atlanta, GA, USA, 8–10 October 2016.
31. Cai, W.; Lei, L.; Yang, M. A Gaussian Mixture Model-Based Clustering Algorithm for Image Segmentation Using Dependable Spatial Constraints. In Proceedings of the 2010 3rd International Congress on Image and Signal Processing, Yantai, China, 16–18 October 2010; pp. 1268–1272.
32. Li, Y.; Zhang, J.; Ma, Z.; Zhang, Y. Clustering Analysis in the Wireless Propagation Channel with a Variational Gaussian Mixture Model. IEEE Trans. Big Data 2020, 6, 223–232.
33. Sun, H.; Liu, Z.; Kong, L. A Document Clustering Method Based on Hierarchical Algorithm with Model Clustering. In Proceedings of the 22nd International Conference on Advanced Information Networking and Applications - Workshops (AINA Workshops 2008), GinoWan, Japan, 25–28 March 2008; pp. 1229–1233.
34. Zhao, Y.; Shrivastava, A.K.; Tsui, K.L. Regularized Gaussian Mixture Model for High-Dimensional Clustering. IEEE Trans. Cybern. 2019, 49, 3677–3688.
35. Rousseeuw, P.J. Silhouettes: A Graphical Aid to the Interpretation and Validation of Cluster Analysis. J. Comput. Appl. Math. 1987, 20, 53–65.
36. Airports of Thailand Public Co., Ltd. Online Air Quality Monitoring Report. Available online: https://air4suvarnabhumiairport.com/data.php (accessed on 10 May 2024).
37. Mak, H.W.; Ng, D.C. Spatial and Socio-Classification of Traffic Pollutant Emissions and Associated Mortality Rates in High-Density Hong Kong via Improved Data Analytic Approaches. Int. J. Environ. Res. Public Health 2021, 18, 6532.
38. Thailand, A. Re-Designation of Runway 01R/19L and 01L/19R at Suvarnabhumi International Airport (VTBS). Available online: https://aip.caat.or.th/2024-10-02/html/eSUP/VT-eSUP-24-41-A-en-GB.html (accessed on 24 January 2025).
39. Schäfer, M.; Strohmeier, M.; Lenders, V.; Martinovic, I.; Wilhelm, M. Bringing up OpenSky: A Large-Scale ADS-B Sensor Network for Research. In Proceedings of the 13th International Symposium on Information Processing in Sensor Networks, Berlin, Germany, 15–17 April 2014; pp. 83–94.
40. Federal Aviation Administration. Automatic Dependent Surveillance - Broadcast (ADS-B). Available online: https://www.faa.gov/about/office_org/headquarters_offices/avs/offices/afx/afs/afs400/afs410/ads-b#:~:text=ADS-B%20Out%20works%20by,other%20aircraft%2C%20once%20per%20second (accessed on 24 January 2025).
41. Goodfellow, I.; Bengio, Y.; Courville, A. Deep Learning; The MIT Press: Cambridge, MA, USA, 2016.
42. Glorot, X.; Bordes, A.; Bengio, Y. Deep Sparse Rectifier Neural Networks. In Proceedings of the Fourteenth International Conference on Artificial Intelligence and Statistics, Fort Lauderdale, FL, USA, 11–13 April 2011; pp. 315–323.
43. Nair, V.; Hinton, G. Rectified Linear Units Improve Restricted Boltzmann Machines. In Proceedings of the 27th International Conference on Machine Learning (ICML-10), Haifa, Israel, 21–24 June 2010; Volume 27, pp. 807–814.
44. Mak, H.W.; Laughner, J.L.; Fung, J.C.; Zhu, Q.; Cohen, R.C. Improved Satellite Retrieval of Tropospheric NO₂ Column Density via Updating of Air Mass Factor (AMF): Case Study of Southern China. Remote Sens. 2018, 10, 1789.
45. Cheng, L.; Tao, J.; Valks, P.; Yu, C.; Liu, S.; Wang, Y.; Xiong, X.; Wang, Z.; Chen, L. NO₂ Retrieval from the Environmental Trace Gases Monitoring Instrument (EMI): Preliminary Results and Intercomparison with OMI and TROPOMI. Remote Sens. 2019, 11, 3017.
46. Clerbaux, C.; Hadji-Lazaro, J.; Payan, S.; Camy-Peyret, C.; Wang, J.; Edwards, D.P.; Luo, M. Retrieval of CO from Nadir Remote-Sensing Measurements in the Infrared by Use of Four Different Inversion Algorithms. Appl. Opt. 2002, 41, 7068–7078.
47. Kamsing, P.; Cao, C.; Zhao, Y.; Boonpook, W.; Tantiparimongkol, L.; Boonsrimuang, P. Joint Iterative Satellite Pose Estimation and Particle Swarm Optimization. Appl. Sci. 2025, 15, 2166.
48. Fu, D.; Han, S.; Li, W.; Lin, H. The Pose Estimation of the Aircraft on the Airport Surface Based on the Contour Features. IEEE Trans. Aerosp. Electron. Syst. 2023, 59, 817–826.

Figure 1. Locations of AQMS-NORTH and AQMS-SOUTH [36].

Figure 2. AQMS of (a) NORTH and (b) SOUTH [36].

Figure 3. Runways of the Suvarnabhumi airport [38].

Figure 4. Example of data downloaded.

Figure 5. Workflow of the experiment.

Figure 6. Regression architecture implementation.

Figure 7. MSEs for epochs of the regression model for (a) CO, (b) NO₂, (c) PM_2.5, and (d) PM₁₀.

Figure 8. Mean SHAP values for (a) CO regression model, (b) NO₂ regression model, (c) PM_2.5 regression model, and (d) PM₁₀ regression model.

Table 1. Silhouette scores and numbers of callsigns resulting from the two clustering methods.

Action and clustering method	Silhouette Score	K₁	K₂	K₃	K₄
Landing clustered by K-means	0.33063	1750	2303	1752	1507
Landing clustered by GMM	0.25064	453	121	2403	4335
Take-off clustered by K-means	0.35264	1405	1029	1004	1444
Take-off clustered by GMM	0.27440	2254	538	570	1520

Table 2. The lowest training MSE of the regression model for each type of air pollutant.

Pollutant (unit)	MSE
CO (ppm)	51.7622
NO₂ (ppb)	139.6674
PM_2.5 (μg/m³)	53.9682
PM₁₀ (μg/m³)	124.2517

Table 3. Results of three evaluation metrics tested with the evaluation dataset.

Pollutant (unit)	MAE	MSE	R²
CO (ppm)	2.6373	83.7389	0.4946
NO₂ (ppb)	9.3116	149.8641	0.3339
PM_2.5 (μg/m³)	6.5713	70.6187	0.4762
PM₁₀ (μg/m³)	8.3571	124.2517	0.5594

Table 4. Feature numbers and their representation.

Feature Number	Representation
0	Month
1	Conversion of ICAO24 to integer
2	Velocity
3	Heading
4	Baro altitude
5	$T_{action}$
6	Pattern numbers that originate from K-means
7	Pattern numbers that originate from GMM
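As a minimal sketch of how the eight inputs in Table 4 could be assembled for one ADS-B record, the code below orders them by feature number. The dictionary keys, the integer codes for $T_{action}$, the example values, and the interpretation of the ICAO24 conversion as parsing a hexadecimal address are all assumptions made for illustration; the text states only that ICAO24 is converted to an integer.

```python
# Assumed integer encoding of the action label; the codes are hypothetical.
ACTION_CODES = {"other": 0, "take-off": 1, "landing": 2}


def build_features(record):
    """Order the eight inputs as feature numbers 0-7 of Table 4."""
    return [
        float(record["month"]),                   # 0: month
        float(int(record["icao24"], 16)),         # 1: ICAO24 hex address as integer (assumed)
        float(record["velocity"]),                # 2: velocity
        float(record["heading"]),                 # 3: heading
        float(record["baro_altitude"]),           # 4: baro altitude
        float(ACTION_CODES[record["action"]]),    # 5: T_action
        float(record["kmeans_pattern"]),          # 6: pattern number from K-means
        float(record["gmm_pattern"]),             # 7: pattern number from GMM
    ]


# Hypothetical record; none of these values come from the dataset.
example = {
    "month": 3,
    "icao24": "885123",
    "velocity": 72.5,
    "heading": 190.0,
    "baro_altitude": 450.0,
    "action": "landing",
    "kmeans_pattern": 2,
    "gmm_pattern": 3,
}
features = build_features(example)
print(features)
```

A vector built this way has the same length as the input layer of the regression network, so the ordering in Table 4 also fixes which SHAP value belongs to which input.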

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

