Particulate Matter (PM2.5) Concentration Forecasting through an Artificial Neural Network in a Port City Environment

This study analyzes the effect of maritime traffic on air quality and applies multiple regression analysis with a recurrent neural network (RNN) to forecast the daily concentration of PM2.5. The dataset comprises hourly averages of pollutant concentrations and meteorological factors from 1 May 2021 to 31 January 2022, together with the entries and exits of cargo ships and petroleum tankers in the port area over the same period. The ANN-based regression model reaches acceptable accuracy, with a root-mean-squared error (RMSE) of 5.9554 and a mean absolute error (MAE) of 4.5732.


Introduction
Urbanization has concentrated the population in small areas where various anthropogenic activities are carried out. Metropolitan areas share socioeconomic activities, and the demand for services such as transport, housing, food, employment, industry, and business increases. Consequently, emissions of air pollutants also increase, degrading environmental quality and the health of inhabitants [1]. According to the WHO (2022), 99% of the world's population breathes air that exceeds WHO air quality guideline limits. Particulate matter (PM) is a pollutant of particular concern because particles with diameters smaller than 2.5 µm (PM2.5) can penetrate deep into the lungs and reach the bloodstream, and their impact on other organs has recently been evaluated. PM2.5 has been reported to cause serious health effects after both long-term and short-term exposure [2]. Studies have reported an association between respiratory problems and exposure to PM2.5 [3]. PM2.5 is known to reduce life expectancy at birth by around 1.2-1.9 years, an effect observed in highly polluted regions such as Asia and Africa [4]. Moreover, PM2.5 can cause allergies in children and is associated with nasal, ocular, and skin symptoms [5]. The associations of PM2.5 levels with hypertension and with death from cardiometabolic diseases are stronger in places where pollution levels are very high [6,7].
Ambient particulate matter with an aerodynamic diameter of 2.5 µm or less (PM2.5) is mainly emitted from traffic combustion, road dust, soil, and mass burning [8][9][10]. The chemical composition of the particles varies with the source and the environment in which they are produced; for instance, exhaust emissions were found to be the largest contributor to traffic-related PM2.5 and showed a diurnal variation [8]. Ports are gateways to world trade.
A port is a multifunctional area that provides services to industries and businesses, combining transport modes that connect supplies to the main consumption centers [11]. Although the contribution of ship emissions in port cities has not been studied extensively, there is concern about their impact on air quality, mainly because ports are close to populated areas [12][13][14]. However, establishing a correlation between air pollution and port traffic has been difficult, mainly because information on port management is scarce, which introduces uncertainty into the correlation coefficients [15,16]. Moreover, in some harbors the PM2.5 sample available to run the models is too small to improve prediction [17].
Maritime transport emissions consist mostly of NOx, SOx, VOCs (volatile organic compounds), PM1, PM2.5, PM10, BC (black carbon), and some trace elements bound to PM1 and PM2.5 [18]. The primary source of PM1 and PM2.5 has been observed to be ship exhaust from the combustion of heavy fuel oil [5,11,15,19]. Another PM source is dust generated by coal and iron ore handling and dispersed by wind [20]. Ship emissions thus contribute PM both directly, as primary particles emitted from the source, and indirectly, through the formation of secondary particles from precursors; ship emissions are the main contributor to this formation [21]. Weather in ports plays an essential role in the dispersion and distribution of pollutants, so air quality data from specific areas should be interpreted considering breeze circulation, boundary-layer formation, and humidity. Primary emissions have been reported during the cold season, with low diurnal variation between port sites and city monitoring stations, suggesting no change in the air mass [14,19,22]. Moreover, emissions from open-sea ship traffic have been correlated with PM measured on the coast, showing the influence of atmospheric transport [19]. Air pollution in ports has also been correlated with container size, because ships spend more time loading and unloading heavier cargo [12]. Port management efficiency has also been evaluated, including port logistics, infrastructure, and policies, as well as efforts to reduce air emissions. Some ports have adopted green technologies and energy programs, including technologies that improve engine operating efficiency and thereby reduce air emissions [20,23,24].
This paper presents an air quality analysis of 9 months of data from a port city, comprising hourly average concentrations of pollutants (PM1, PM2.5, PM10, CO, and O3) and meteorological factors (temperature and relative humidity).
On the one hand, we use Spearman's correlation coefficient to analyze the relationships among the study variables and to measure the effect of cargo ship and oil tanker traffic on pollutant concentrations in the port area. On the other hand, we implement a recurrent neural network (RNN) to perform a multivariate regression analysis that predicts the daily concentration of PM2.5.
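The recurrent regression step can be pictured with a minimal sketch. The network's architecture is not described here, so everything below is an assumption for illustration: hourly rows are averaged into daily rows, each window of the previous 7 days (a hypothetical window length) is paired with the next day's PM2.5, and a single-layer Elman RNN with a linear read-out maps each window to a forecast. The names `daily_means`, `make_windows`, and `ElmanRegressor` are hypothetical, and the weights are random and untrained; in practice they would be fitted by backpropagation through time, usually with a deep-learning framework.

```python
import numpy as np


def daily_means(hourly, hours_per_day=24):
    """Average an (n_hours, n_features) array into (n_days, n_features).

    Assumes the series starts at midnight; a trailing partial day is dropped
    and missing hours (NaN) are ignored within each day.
    """
    hourly = np.asarray(hourly, dtype=float)
    n_days = hourly.shape[0] // hours_per_day
    trimmed = hourly[: n_days * hours_per_day]
    return np.nanmean(trimmed.reshape(n_days, hours_per_day, -1), axis=1)


def make_windows(features, target, window=7):
    """Pair each run of `window` consecutive days with the next day's target."""
    features = np.asarray(features, dtype=float)
    X = np.stack([features[t - window:t] for t in range(window, len(features))])
    y = np.asarray(target, dtype=float)[window:]
    return X, y


class ElmanRegressor:
    """Single-layer Elman RNN with a linear output (random, untrained weights)."""

    def __init__(self, n_features, n_hidden=16, seed=0):
        rng = np.random.default_rng(seed)
        self.W_x = rng.normal(0.0, 0.1, (n_hidden, n_features))
        self.W_h = rng.normal(0.0, 0.1, (n_hidden, n_hidden))
        self.b_h = np.zeros(n_hidden)
        self.w_out = rng.normal(0.0, 0.1, n_hidden)
        self.b_out = 0.0

    def predict(self, X):
        """Map windows of shape (n_samples, window, n_features) to (n_samples,)."""
        h = np.zeros((X.shape[0], self.W_h.shape[0]))
        for t in range(X.shape[1]):
            # h_t = tanh(W_x x_t + W_h h_{t-1} + b_h): the state carries earlier days forward
            h = np.tanh(X[:, t, :] @ self.W_x.T + h @ self.W_h.T + self.b_h)
        return h @ self.w_out + self.b_out


# Synthetic example: 30 days of hourly [PM2.5, temperature, RH] (not study data)
rng = np.random.default_rng(1)
hourly = rng.random((24 * 30, 3))
daily = daily_means(hourly)                       # shape (30, 3)
X, y = make_windows(daily, daily[:, 0], window=7)
print(X.shape, y.shape)                           # (23, 7, 3) (23,)
forecast = ElmanRegressor(n_features=3).predict(X)
```

Daily ship entry and exit counts, which are already daily, could be appended as extra columns of `daily` before windowing; the recurrence is what lets a forecast depend on the order of the preceding days rather than on a single day's values.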

Area of Study
Tampico is one of the main ports on the east coast of Mexico, handling imports and exports of mining products, petrochemicals, steel, minerals, agricultural bulk, and large structures, among other industrial products. It has 11 berths with 2147 linear meters in its public terminals and specialized equipment for handling various types of cargo. The Port of Tampico offers extensive regular shipping line services that connect it with more than 100 countries. The port city forms a metropolitan area with four other towns on the banks of the Panuco River. Tampico has a population of 297,562, an annual average temperature of 22-26 °C, and an average annual rainfall of 900 to 1100 mm. The climate is classified as warm sub-humid, with summer rains and average humidity (100%). Land use is distributed among agriculture (8%), water bodies (16%), and urban zones (45%), within an area of approximately 114.5 km². Stronger winds are recorded from October to June, averaging 16.6 km/h, while from June to October the wind speed decreases; the annual average is 12.3 km/h.

Data Collection and Analysis
Concentration levels of air pollutants and meteorological factors were tested for normality with the Kolmogorov-Smirnov test with the Lilliefors correction. Because the data did not follow a normal distribution, the non-parametric Spearman correlation coefficient was used to examine the relationships among the study variables. A regression analysis was then performed with a recurrent neural network architecture to predict the daily concentration of particulate matter (PM2.5) from hourly averages of pollutant concentrations and meteorological factors, together with ship entries and exits in the port area on the same dates. In addition, a descriptive analysis was performed by calculating the median, interquartile range (IQR), and minimum and maximum values of the continuous variables (air pollutant concentrations and meteorological factors).
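The decision between Pearson and Spearman correlation described above can be sketched as follows. The software the authors used is not stated; this version approximates the Lilliefors test by Monte Carlo simulation (statsmodels also offers a tabulated version as `statsmodels.stats.diagnostic.lilliefors`). The helper names `lilliefors_mc` and `correlate`, the 1000 simulations, and the 0.05 threshold are assumptions for illustration.

```python
import numpy as np
from scipy import stats


def _ks_normal(sample):
    """KS distance between the sample and a normal with estimated mean and SD."""
    n = sample.size
    z = np.sort((sample - sample.mean()) / sample.std(ddof=1))
    cdf = stats.norm.cdf(z)
    i = np.arange(1, n + 1)
    return max(np.max(i / n - cdf), np.max(cdf - (i - 1) / n))


def lilliefors_mc(x, n_sim=1000, seed=0):
    """Lilliefors normality test with a Monte Carlo p-value.

    Because the mean and SD are estimated from the data, the ordinary KS
    p-value is too conservative, so the null distribution is simulated.
    """
    x = np.asarray(x, dtype=float)
    x = x[~np.isnan(x)]
    d_obs = _ks_normal(x)
    rng = np.random.default_rng(seed)
    d_null = np.array([_ks_normal(rng.standard_normal(x.size)) for _ in range(n_sim)])
    p_value = (np.sum(d_null >= d_obs) + 1) / (n_sim + 1)
    return float(d_obs), float(p_value)


def correlate(x, y, alpha=0.05, n_sim=1000):
    """Use Pearson only if both variables pass the normality test, else Spearman."""
    if all(lilliefors_mc(v, n_sim=n_sim)[1] >= alpha for v in (x, y)):
        r, p = stats.pearsonr(x, y)
        return "pearson", float(r), float(p)
    rho, p = stats.spearmanr(x, y)
    return "spearman", float(rho), float(p)


# Synthetic, right-skewed "PM" values (not study data)
rng = np.random.default_rng(42)
pm25 = rng.lognormal(mean=2.5, sigma=0.8, size=1000)
pm10 = 1.2 * pm25 + rng.normal(0.0, 2.0, size=1000)
print(lilliefors_mc(pm25, n_sim=500))
print(correlate(pm25, pm10, n_sim=500))
```

Concentration data are typically right-skewed, which is why the normality test tends to reject and the rank-based Spearman coefficient is selected; the sketch assumes the two series have no missing values when they reach `correlate`.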

Descriptive Analysis
The concentration levels obtained from continuous air pollutant monitoring in the port of Tampico, Tamaulipas, over 9 months (1 May 2021 to 31 January 2022) were analyzed to forecast the levels of fine particles in the air (PM2.5). Table 1 shows the descriptive analysis of the monthly concentrations of particulate matter. The highest concentrations were observed in December 2021, with a median (IQR) of 17.06 (8.65) µg/m³ for submicron particles (PM1), 24.56 (14.1) µg/m³ for fine particles (PM2.5), and 29.35 (18.1) µg/m³ for PM10. Particulate matter levels were also high in May 2021 and January 2022, consistent with previous studies that reported higher PM levels in spring and winter [18,25]. The maximum values were also recorded in December: 34.6, 51.97, and 63.36 µg/m³ for PM1, PM2.5, and PM10, respectively. PM2.5 levels remained below the limit set by Mexican regulations for acute exposure (45 µg/m³); however, the median concentration exceeded the chronic-exposure (annual) limit of 12 µg/m³. Temperature and relative humidity remained fairly steady throughout the study period: the median temperature ranged from 28 to 30 °C, with the highest values in September, while in May the temperature decreased to 20 °C. Relative humidity (RH) fluctuated between 80% and 100%, except in December, when it dropped to 73%. Favorable weather conditions promote the dispersion and dilution of pollutants; conversely, weather can also degrade air quality, so pollutant concentrations fluctuate with meteorological conditions [25].
Several studies have reported a possible association between relative humidity and PM2.5; their findings suggest that abrupt increases in PM2.5 concentrations are strongly correlated with fluctuations in relative humidity, whereas environments with constant humidity do not contribute to increased PM2.5 levels [26]. In our results, relative humidity in Tampico was steady, and the highest PM2.5 concentrations were observed in December, when RH decreased to 70%.
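The monthly descriptive statistics and the comparison with the regulatory limits can be reproduced with a short sketch. The function names are hypothetical, and treating the 45 µg/m³ acute-exposure limit as applying to daily (24-h) means is an assumption made for illustration; the values in the example are synthetic, not the Table 1 data.

```python
import numpy as np
from collections import defaultdict

# Limits quoted in the text (µg/m³): acute exposure and chronic (annual) exposure
ACUTE_LIMIT = 45.0
ANNUAL_LIMIT = 12.0


def describe(values):
    """Median, interquartile range, minimum and maximum, ignoring NaN gaps."""
    v = np.asarray(values, dtype=float)
    v = v[~np.isnan(v)]
    q1, median, q3 = np.percentile(v, [25, 50, 75])
    return {"median": float(median), "iqr": float(q3 - q1),
            "min": float(v.min()), "max": float(v.max())}


def monthly_summary(months, values):
    """Group observations by a month label such as '2021-12' and describe each month."""
    groups = defaultdict(list)
    for month, value in zip(months, values):
        groups[month].append(value)
    return {month: describe(vals) for month, vals in sorted(groups.items())}


def count_exceedances(daily_means, limit=ACUTE_LIMIT):
    """Number of daily mean concentrations strictly above `limit` (assumed 24-h basis)."""
    return int(np.sum(np.asarray(daily_means, dtype=float) > limit))


# Synthetic daily PM2.5 means for two months (not study data)
months = ["2021-12"] * 4 + ["2022-01"] * 3
pm25 = [18.0, 24.5, 47.2, 30.1, 15.3, 22.8, 19.9]
for month, desc in monthly_summary(months, pm25).items():
    print(month, desc, "median above annual limit:", desc["median"] > ANNUAL_LIMIT)
print("days above acute limit:", count_exceedances(pm25))
```

The median and IQR are used instead of the mean and standard deviation because they are robust to the skewed, non-normal distributions identified in the normality tests.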

Correlation Analysis
Spearman's analysis revealed a very strong positive correlation among the particle size fractions (PM1, PM2.5, and PM10), with correlation coefficients (r) of 1 and p-values below 0.05 (see Figure 1). Temperature and RH were strongly correlated (r = 0.82), but neither variable correlated with particulate matter concentrations (|r| < 0.20; see Figure 1). However, RH correlated with CO (r = −0.73) and with O3 (r = −0.58), and temperature correlated with CO (r = −0.52) and with O3 (r = 0.70). In all these cases, p < 0.05, indicating statistical significance. Although the gases (CO and O3) were not correlated with PM2.5, temperature and relative humidity, together with pollutants such as CO and O3, are involved in PM2.5 formation. Because the correlation between gases and PM varies seasonally, the temporal scale of the analysis should be considered [25]. The correlation analysis also shows a moderate negative association between PM2.5 and cargo ships (CS), with r = −0.49 (see Figure 1). According to an emissions inventory, cargo ships are the main PM contributor, followed by container ships and tankers [21]. In contrast, particulate matter shows no association with the entries and exits of petroleum tankers (−0.16 < r < −0.14). Finally, cargo ships and petroleum tankers (PT) showed moderate negative associations with CO, with r = −0.51 and r = −0.60, respectively. The correlation between petroleum tankers and O3 was r = 0.061.
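A pairwise Spearman matrix with p-values, of the kind summarized in Figure 1, can be computed as sketched below. The function name `spearman_table`, the column names, and the synthetic series are hypothetical; the sketch assumes ship counts and pollutant values have already been aligned on the same dates.

```python
import numpy as np
from scipy import stats


def spearman_table(data, names, alpha=0.05):
    """Pairwise Spearman coefficients and p-values for the columns of `data`.

    Returns the coefficient matrix, the p-value matrix and the list of pairs
    whose p-value is below `alpha`.
    """
    data = np.asarray(data, dtype=float)
    k = data.shape[1]
    rho = np.eye(k)
    pval = np.zeros((k, k))
    for i in range(k):
        for j in range(i + 1, k):
            r, p = stats.spearmanr(data[:, i], data[:, j])
            rho[i, j] = rho[j, i] = r
            pval[i, j] = pval[j, i] = p
    significant = [(names[i], names[j], round(float(rho[i, j]), 2))
                   for i in range(k) for j in range(i + 1, k) if pval[i, j] < alpha]
    return rho, pval, significant


# Synthetic daily series (not study data): two PM fractions, RH, and cargo ship counts
rng = np.random.default_rng(0)
pm25 = rng.lognormal(2.5, 0.5, 200)
pm10 = 1.3 * pm25 + rng.normal(0.0, 1.0, 200)
rh = rng.uniform(70.0, 100.0, 200)
ships = rng.poisson(5, 200)
rho, pval, significant = spearman_table(
    np.column_stack([pm25, pm10, rh, ships]), ["PM2.5", "PM10", "RH", "CS"])
print(np.round(rho, 2))
print(significant)
```

Because Spearman's coefficient works on ranks, tied values such as integer ship counts receive average ranks, and any monotonic relationship, such as that between nested PM size fractions, yields a coefficient close to 1 even when it is not linear.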

Regression Analysis
The predictive model was trained on 80% of the dataset instances and evaluated on the remaining 20%. The RNN model obtained an RMSE of 5.9554, indicating an acceptable fit, with predictions lying close to the observed concentrations. The MAE of 4.5732 likewise indicates a moderate average difference between predicted and observed values, so the average forecast is acceptable. The MAPE of 39.9516%, however, indicates a high average relative difference between predicted and observed values. Figure 2 shows the daily PM2.5 concentrations predicted by the proposed RNN model. The predicted series follows the pattern of the observed data but with a noticeable lag, which keeps the error metrics at moderate values. The absolute differences between observed and predicted values appear small, but only because PM2.5 concentrations in the monitored area are low; relative to such low values, the same absolute errors produce a high percentage error, as reflected by the MAPE. Moreover, relative humidity was high (close to 100%) on most days of the study, so this variable contributes little information to the prediction of the pollutant.
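The evaluation metrics and the 80/20 split can be reproduced with the sketch below. The text does not say whether the 20% test set is the final part of the period or a random sample; the chronological split used here is an assumption, chosen because shuffling days would let the model train on data that follows the test days. The function names are hypothetical.

```python
import numpy as np


def chronological_split(X, y, train_frac=0.8):
    """Train on the first 80% of days and test on the last 20% (no shuffling)."""
    cut = int(len(y) * train_frac)
    return X[:cut], X[cut:], y[:cut], y[cut:]


def rmse(y_true, y_pred):
    """Root-mean-squared error, in the units of the target (µg/m³ for PM2.5)."""
    err = np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean(err ** 2)))


def mae(y_true, y_pred):
    """Mean absolute error, in the units of the target."""
    err = np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)
    return float(np.mean(np.abs(err)))


def mape(y_true, y_pred):
    """Mean absolute percentage error (%); undefined if any observation is 0."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(100.0 * np.mean(np.abs((y_true - y_pred) / y_true)))


# The same 3 µg/m³ error on a low-concentration day and on a higher one
obs = np.array([6.0, 30.0])
pred = obs + 3.0
print(rmse(obs, pred), mae(obs, pred), mape(obs, pred))
```

Both days have an absolute error of 3 µg/m³, so RMSE and MAE are both 3, but the percentage errors are 50% and 10%, giving a MAPE of 30%. Low concentrations therefore inflate MAPE even when absolute errors are modest, which matches the behavior described above.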

Conclusions
This study analyzed the concentration levels of particulate matter, CO, O3, and meteorological factors in the port of Tampico, Mexico. The correlation analysis confirmed a very strong relationship among the three particulate matter fractions. In addition, CO shows a strong negative association with relative humidity. Cargo ships show a moderate negative relationship with PM1, PM2.5, PM10, and CO, and petroleum tankers show a moderate negative relationship with CO (r = −0.60). Finally, the RNN regression model obtained acceptable RMSE and MAE values; however, given its high MAPE, its daily PM2.5 predictions should be regarded as having moderate performance and accuracy.