Diagnosis and Assessment of Pre-Fog in the Mainland Portuguese International Airports: Statistical and Neural Network Models Comparison

Abstract: The prediction of fog is a challenging task in operational weather forecasting. Due to its dependency on small-scale processes, numerical weather models struggle to deal with sub-grid-scale features, resulting in uncertainties in the fog forecast. Unawareness of the onset time and duration of fog has a disproportionate impact on open-air activities, especially in aviation. Even in a country as small as mainland Portugal, fog behaviour varies greatly. The traffic of the two busiest Portuguese international airports, Porto and Lisbon, is affected by the occurrence of fog at different times of the year: fog at Porto is predominantly a winter phenomenon, whereas at Lisbon it occurs mainly in summer. Observational variables and their trends are local indicators of conditions favourable to fog onset, such as cooling, water vapour saturation and turbulent mixing. A dataset corresponding to 17 years of half-hourly METAR reports from the airports of Porto and Lisbon is used to diagnose pre-fog conditions. Two diagnostic models are proposed to assess these conditions. The first model is adapted from the statistical method proposed by Menut et al. (2014), which performs a diagnosis from the trends of key variables such as temperature, wind speed and relative humidity. Thresholds are defined from the METAR samples in the 6 h period prior to the formation of fog. Due to the local character of fog, the presented thresholds are the most appropriate ones for each airport. The predictability of fog is then assessed using observations. The second approach consists of neural networks, namely a fully connected (FC) network and a recurrent neural network (RNN), the latter being especially well suited for time series. By experimenting with different types of neural networks (NN), we try to capture the connection between the temporal evolution of the measured variables in the dataset and the fog onset. These experiments include different time windows to measure their influence on prediction performance.


Introduction
Among the weather phenomena that affect the visibility range near the Earth's surface, fog is the one that constrains human activities with the greatest economic impact, and it sometimes even jeopardizes human lives. Due to its strong incidence on aviation, climatological studies and field campaigns have focused on busy airports around the world [1][2][3][4][5][6][7].
The World Meteorological Organization defines fog as a visibility restriction due to water droplets in the lower atmosphere that reduces the horizontal visibility near the ground to less than 1 km [8]. Knowledge of the local weather conditions that culminate in fog plays a major role in operational forecasting [1]. A study carried out by Policarpo et al. [9] has shown that a large artificial lake of 250 km² and its irrigated area become an important local source of moisture that increases the availability of water vapour, favouring the formation of fog over the lake and its surroundings. Egli et al. [10] described the relation between the terrain characteristics, the predominant weather situations and the classification of fog patterns.
Later, Guerreiro et al. [7] studied the fog characteristics at the Portuguese international airports, the conditions prior to the onset, at large and local scales, and the classification of fog into advection, radiation, cloud base lowering, precipitation and evaporation types.
The connection between the conditions prior to the onset and the fog forecast was proposed by Menut et al. [11] using data from a single instrumental site (SIRTA laboratory) and a statistical methodology to estimate the probability of meteorological conditions favourable to radiative fog formation. Later, Román-Cascón et al. [12] applied the same methodology using data from two research facilities, one in the Netherlands (CESAR) and the other in Spain (CIBA), to evaluate radiation fog forecasts based on observational data and Weather Research and Forecasting (WRF) model output. In the present study, the same methodological basis is used to diagnose conditions, extended to all types of fog, using METAR data from the Portuguese international airports.
Given that it is usually difficult to model the behaviour of weather phenomena, data-driven methods are attractive. NN, in particular, have two theoretical advantages over conventional statistical methods. The first is that no distribution needs to be assumed; the second is that, theoretically, they can approximate any smooth function. Considering these advantages, neural networks have been used in the atmospheric sciences for several decades [13].
One of the first successful types of NN was the multilayer perceptron (MLP). This type of network consists of an arrangement of artificial neurons organized into layers. Neurons between layers are connected by weights, and the output of each neuron is a function of the sum of its inputs, modified by a non-linear activation function. Because each neuron in a layer connects to every neuron in the following layer, MLPs are also usually designated as fully connected (FC) or densely connected networks. One of the main drawbacks of using an MLP is that the number of weights increases rapidly with the number of neurons and with input vectors of higher dimensionality. This makes the adjustment of the weights (commonly called the training phase) extremely difficult for problems with many variables.
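To make the weight-growth drawback concrete, a small illustrative calculation (not from the paper; the layer widths are arbitrary) shows how the parameter count of a fully connected network scales with the product of adjacent layer sizes:

```python
# Illustrative sketch: the number of weights in an MLP grows with the
# product of adjacent layer widths, so high-dimensional inputs inflate
# the first layer dramatically.
def mlp_weight_count(layer_sizes):
    """Total number of weights (including one bias per neuron) in an MLP
    with the given sequence of layer widths."""
    return sum((n_in + 1) * n_out  # +1 accounts for each neuron's bias
               for n_in, n_out in zip(layer_sizes, layer_sizes[1:]))

# Stacking many time instants of a 13-feature input multiplies the
# first-layer weight count accordingly (hypothetical widths).
small = mlp_weight_count([13, 64, 64, 1])       # single time instant
large = mlp_weight_count([13 * 72, 64, 64, 1])  # 72 stacked time instants
```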
The limitations of MLPs prevented their successful application to image-like data. For this application, an architecture with shared weights was proposed that performs convolutions of kernels across the input. This type is designated as a convolutional neural network and is especially suited to exploring the spatial structure of the data. This not only allowed super-human performance in the classification of common images, but was also used in several weather prediction tasks [14].
While both of the presented network types are feedforward networks, there is another family of networks called recurrent neural networks. These are especially adequate for dealing with time sequences. In this family, two types are used in the context of the atmospheric sciences: Long Short-Term Memory (LSTM) layers [15,16] and Gated Recurrent Units (GRU) [17]. These units are composed of a cell and gates. The cell has an internal state that retains information over several time instants, and the gates control the flow of information into and out of the unit. Other approaches, such as the one proposed by Shi et al. [18], have explored methods that have both a convolutional structure and recurrence, making these networks able to deal with a time series of image-like data. However, in our context we will not consider image-like data, so we focus on FC feedforward networks and on LSTMs.
Therefore, the main goal of this study is to assess two extended methods for diagnosing conditions favourable to fog formation using aerodrome routine meteorological report (METAR) data, and to compare their performance.
In this article, the airports' locations, the data used and the diagnosis methods are described in Section 2. In Section 3, the pre-fog conditions and the main features of fog events are presented, and the two diagnosis methods are assessed using the airports' METAR data. In Section 4, the results from both methods are discussed and compared. Finally, conclusions are presented in Section 5.

Materials
Seventeen years of half-hourly METAR data from the Portuguese mainland international airports were used to identify fog events. The dataset was subjected to quality control after being retrieved from the METAR code form, which was generated using observational practices following the WMO Manual on Codes, Volume I, Part A [9]. From January 2002 to December 2018, the data availability is 99.68% for Porto, 99.9% for Lisbon, and 99.92% for Faro.
Fog occurrences are identified in the dataset by gathering observations of prevailing horizontal visibility of less than 1000 m, associated with the report of fog as a significant weather phenomenon at the site or in the vicinity of the airport. The results presented by Guerreiro et al. [7] have shown that fog seldom occurs at Faro. In the period of 2002-2018, the daily fog occurrences were less than 1% (0.97%), while in Porto and Lisbon they were 9.68% and 5.15%, respectively. Therefore, Faro airport is discarded from this study, and only the 2019 data from Porto and Lisbon will be used in the methods' assessment.

Forecast Score
At both airports, the classification of fog into precipitation, advection, cloud base lowering, radiation and evaporation types was performed by Guerreiro et al. [7]. Each type associates an observed parameter with the primary mechanism that triggers the formation of fog. From the METAR data, the wind drives the advection and the turbulent mixing, the temperature decrease measures the cooling rate, and the dew point is used to compute the relative humidity, which quantifies the amount of water vapour available for condensation.
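Since METAR reports temperature and dew point rather than relative humidity directly, the conversion can be sketched as follows. This uses the Magnus approximation for saturation vapour pressure, a common choice; the paper does not specify which formula was used, so this is an assumption:

```python
import math

def relative_humidity(t_c, td_c):
    """Relative humidity (%) from air temperature and dew point (deg C),
    via the Magnus approximation for saturation vapour pressure.
    A standard approximation, not necessarily the paper's exact formula."""
    def e_s(t):
        # Saturation vapour pressure (hPa), Magnus form
        return 6.112 * math.exp(17.62 * t / (243.12 + t))
    return 100.0 * e_s(td_c) / e_s(t_c)

# When the dew point equals the temperature, the air is saturated (RH = 100%).
```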
The starting point of the diagnosis method is the statistical approach proposed by Menut et al. [11], used to estimate the probability of detecting observed radiation fog events. The forecast score, computed from key meteorological parameters, reflects the relationship between the meteorological conditions and the formation of fog. Since that method is focused on the detection of radiation fog, the pre-fog conditions are characterised by the key variables of relative humidity measured at 2 m height, the 2 m temperature tendency, the 10 m wind speed and the infrared radiation budget. The METAR does not include infrared radiation data. Therefore, in this study, the key variables are the relative humidity (RH), the 3 h temperature tendency (∆T) and the wind speed U(t), regarding any type of fog.
The forecast scores α_kv range between 0 and 1 and depend on the distance between the observed values of the key variables and the respective thresholds, following a Gaussian distribution [12]. These reference values are set as the average of each key variable over the fog event periods. The scores are computed as

α_kv(t) = exp(−(kv(t) − th_kv)² / (2σ_kv²)),    (1)

where kv are the key variables, th_kv the respective thresholds and σ_kv the width of the Gaussian. At a specific observed moment t, the forecast scores α_kv are computed for the relative humidity RH(t), the wind speed U(t), and the 3 h temperature tendency ∆T(t) = T(t) − T(t − 3 h). The diagnosis of fog formation is finally computed as

α(t) = α_RH(t) × α_∆T(t) × α_U(t).    (2)

For α ≥ 0.9, the forecast score method states that the meteorological conditions at time t will favour the formation of fog in the following 6 h period (t + 6 h), designated as the pre-fog period [11,12]. The forecast score is computed for each time step in the pre-fog period of the fog events detected in 2019. Since α diagnoses the formation of fog up to six hours ahead of each time step, its performance varies from the moment of the fog onset (the worst estimation) up to six hours prior to the onset (the best estimation). Therefore, when α ≥ 0.9, we consider that a pre-fog episode is present; if α < 0.9, we consider that there are no pre-fog conditions.
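A minimal sketch of this diagnosis follows. The Gaussian widths `sigmas` and the multiplicative combination of the per-variable scores are our assumptions for illustration; the paper's exact formulation follows Menut et al. [11]:

```python
import math

def score(value, threshold, sigma):
    """Gaussian score in [0, 1]: equals 1 when the observation matches the
    threshold and decays with the distance from it."""
    return math.exp(-((value - threshold) ** 2) / (2.0 * sigma ** 2))

def fog_diagnosis(rh, dT3h, wind, thresholds, sigmas):
    """Combine the per-variable scores into a single diagnosis alpha;
    alpha >= 0.9 flags pre-fog conditions (combination rule assumed)."""
    alpha = (score(rh,   thresholds["RH"], sigmas["RH"])
             * score(dT3h, thresholds["dT"], sigmas["dT"])
             * score(wind, thresholds["U"],  sigmas["U"]))
    return alpha, alpha >= 0.9
```

Illustrative threshold and width values would in practice be fitted per airport from the pre-fog METAR samples, as described above.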

Neural Networks
Neural networks offer the theoretical advantage of not assuming any a priori information about the statistical distribution of the input data. Nonetheless, the performance of a given network depends on the type of data and on correctly formatting the data. One of the factors is whether the input is a single data point (e.g., one image at a time) or a temporal sequence of data.
The dataset consists of sequential observations of different variables; therefore, we decided to use a network based on LSTM units, as presented in [19], to explore the time dependence. To assess whether there was really any advantage, we compared its performance with that of a neural network of similar complexity based on FC neurons. We denote the input data for time instant t as X_t = [x_1, . . . , x_N], where x_1 up to x_N are the different features at that time instant. As presented in Figure 1a, the input vectors for each time instant are supplied sequentially to the LSTM-based NN to create an estimate ŷ_t. This estimate indicates whether there will be a fog onset in the interval from t up to t + 6 h. For the case of the FC NN, all the input data are stacked and provided to the NN in one shot, as depicted in Figure 1b.

Another factor is the range of values assumed by the input, since NNs are usually designed to deal with normalised data. Additionally, the visibility feature (one of the most important in our problem) takes very high values in most occurrences and occasionally decreases by an order of magnitude, which corresponds to fog. In general, the normalisation approach that was followed consists of a subtraction of the average value.
Lastly, despite not explicitly selecting key variables as in the previous section, some caution must be devoted to curating the data before providing it to the network. For instance, the dataset had 25 parameters, but we combined them into 13. These combinations consist of transforming some features into new ones that are more meaningful to the fog occurrence. For instance, we combined the month and day of the month into only one variable with a periodic behaviour by computing
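The normalisation step can be sketched as below. Centring by the training-set average follows the text; the additional division by the standard deviation is our assumption, since the exact scaling is not fully specified:

```python
import numpy as np

def normalise(train, test):
    """Centre (and, as an assumption, scale) each feature using statistics
    computed on the training data only, to avoid leaking test information."""
    mean = train.mean(axis=0)
    std = train.std(axis=0) + 1e-8  # guard against constant features
    return (train - mean) / std, (test - mean) / std
```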
x_time_of_year = cos((2π/12) × (month + day of the month / number of days in the month)).    (3)
The main idea of this operation is that, for example, the last days of December will have a similar value to the first days of January, and thus are expected to have a similar contribution to the prediction of weather phenomena. We did a similar operation with the hours and minutes, creating a signal with a period corresponding to 24 h.
x_time_of_day = cos((60 × hour + minutes) × 2π/1440)    (4)
We also transformed the wind intensity and direction into two components, one aligned North-South and the other East-West. Lastly, we did not consider the data regarding cloud layers above the lowest layer.
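The two periodic encodings described above can be sketched directly; values near the year (or day) boundary map to nearby encodings, as intended:

```python
import math

def time_of_year(month, day, days_in_month):
    """Periodic encoding of the date: late December and early January
    yield almost identical values."""
    return math.cos((2 * math.pi / 12) * (month + day / days_in_month))

def time_of_day(hour, minutes):
    """Periodic encoding of the time with a 24 h (1440 min) period."""
    return math.cos((60 * hour + minutes) * 2 * math.pi / 1440)
```

Note that a cosine alone maps two different dates to the same value (e.g., March and November); a companion sine component would remove that ambiguity, but we follow the single-cosine form given in the text.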
After the data curation, a given network configuration must be designed. The main building block for the recurrent network is the LSTM cell. This cell contains a cell state c, receives an input z and computes an output ŷ. The LSTM cell also contains gates that allow or block the flow of information into and out of the cell and the update of the cell state. The amounts of information allowed by the input gate i, output gate o and forget gate f at time instant t are defined as

i_t = σ(W_ZI z_t + W_YI ŷ_(t−1) + b_I),    (5)
o_t = σ(W_ZO z_t + W_YO ŷ_(t−1) + b_O),    (6)
f_t = σ(W_ZF z_t + W_YF ŷ_(t−1) + b_F),    (7)

where σ is the sigmoid activation, the W represent weight matrices that connect two quantities (e.g., W_ZO connects the input of the cell with the output gate) and the b correspond to biases. With this building block, one can build arbitrarily complex networks, with several cells in parallel and multiple consecutive layers. However, to assess the impact of the network design choices, we started with a simple network with only two LSTM layers. The first layer contains several cells in parallel and receives the input X_t. The second layer has only one cell, receives the output of the first and calculates the network's final output. We also compared the results obtained with an FC network, which also had only two layers and the same number of neurons as the number of cells in the LSTM-based network. As mentioned previously, the FC network deals with a limited number of time instants. Additionally, during training, the LSTM-based network is presented with multiple samples, each one comprising observed features from several time instants. We have designated this amount as the time horizon. To understand the impact of the time horizon on the performance, we have also tested several values.
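One step of the gated update can be sketched in plain numpy. This follows the standard LSTM formulation (the weight shapes and the candidate-state term are from that standard form, not spelled out in the text above):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(z, h_prev, c_prev, W, b):
    """One LSTM time step: the gates control what enters, leaves and
    persists in the cell state c (standard formulation)."""
    i = sigmoid(W["zi"] @ z + W["hi"] @ h_prev + b["i"])  # input gate
    f = sigmoid(W["zf"] @ z + W["hf"] @ h_prev + b["f"])  # forget gate
    o = sigmoid(W["zo"] @ z + W["ho"] @ h_prev + b["o"])  # output gate
    g = np.tanh(W["zg"] @ z + W["hg"] @ h_prev + b["g"])  # candidate state
    c = f * c_prev + i * g   # update the internal cell state
    h = o * np.tanh(c)       # output of the cell
    return h, c
```

In a full network, this step is applied once per time instant of the input sequence, so the cell state can carry information across the whole time horizon.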
The task considered in this work is to predict the first fog observation (fog onset) in the next six hours. Therefore, for the training of the neural network, we have framed this as a classification problem. With this framing, at training time we provide not only the observations X from the METARs but also Y, which is a binary value. We have used a common loss function for this type of problem, the binary cross-entropy.

Performance Indicators
Using classical forecast score parameters, such as hit, false alarm, miss and correct reject, the model performance is evaluated from the following contingency table (Table 1), according to the model output and the fog observations. To simplify the interpretation of the results, we normalise the previously mentioned score parameters. Thus, we will present the hit rate, which corresponds to the frequency of correct fog predictions over the total number of fog occurrences, and the false alarm rate, which corresponds to the frequency of incorrect fog predictions over the total number of negative samples (no observed fog). We also present the miss rate and the true negative rate, which are the complementary values of the hit rate and the false alarm rate, respectively.
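The four normalised indicators can be sketched directly from the contingency-table counts:

```python
def rates(hits, misses, false_alarms, correct_rejects):
    """Normalised indicators from the contingency table: hit rate (HR),
    false alarm rate (FAR), miss rate (MR) and true negative rate (TNR)."""
    hr = hits / (hits + misses)                            # over positives
    far = false_alarms / (false_alarms + correct_rejects)  # over negatives
    return {"HR": hr, "FAR": far, "MR": 1.0 - hr, "TNR": 1.0 - far}
```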

Results
Both methods were trained using the METAR data corresponding to the period from 2002 to 2018. Additionally, both methods were applied independently at two different locations: Porto and Lisbon.
After the training stage, both the forecast score method and the NN-based method were tested on METAR data gathered during 2019. Lisbon's dataset contains 17,519 samples, while Porto's dataset has two fewer. From the fog observations, 56 episodes were identified in Porto and 25 in Lisbon. Since our goal is to identify pre-fog conditions, we want to classify as 1 each sample within the six hours before a fog episode. All other samples outside the pre-fog period (fog episode observations and no-fog observations) should be classified as 0. This yields 663 ground truth samples classified as 1 and the remainder classified as 0.
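The ground-truth labelling described above can be sketched as follows, assuming half-hourly samples so that the six-hour pre-fog window spans 12 samples (the handling of samples at the series boundary is our assumption):

```python
def label_pre_fog(fog_onsets, n_samples, window=12):
    """Label as 1 every sample in the `window` samples before each fog
    onset; the onset itself and all other samples stay 0."""
    labels = [0] * n_samples
    for onset in fog_onsets:
        for k in range(max(0, onset - window), onset):
            labels[k] = 1  # pre-fog period
    return labels
```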

Forecast Score
In the forecast score method, the diagnosis of fog starts after the third hour from the first observation due to the 3 h temperature tendency. Therefore, the estimation of pre-fog, as well as the absence of fog, is performed over 17,511 observations in Porto and 17,513 in Lisbon.
The following contingency table (Table 2) reflects the performance of the forecast score method through the indicators from Table 1. The method presents an HR of 17.19% at Porto and 5.57% at Lisbon. At both airports, the FAR is considerably low, being residual at Lisbon (0.84%) due to the many true negative diagnoses. The MR indicator reveals the method's weakness in forecasting fog, with 82.8% at Porto and 94.43% at Lisbon.

Neural Network-Based Method
The evaluation of the NN-based method had several objectives. The first was to assess the benefits of using an LSTM over an FC NN. The second was to measure the impact of using input sequences of variable length. Thirdly, we wanted to determine the sensitivity of the NN to the different input variables (observations in the METAR corresponding to temperature, dew point and visibility). Lastly, we wanted to compare with the forecast score method.
In Table 3, we present the cross-entropy loss, as well as the other indicators obtained for each configuration, for both the Porto and Lisbon airports. Each network configuration was fed with input sequences of variable length (6, 12, 24, 48 and 72 time instants). As shown in Table 3, the LSTM-based network achieved a higher HR than the fully connected network at both locations. Another interesting aspect is that the performance depends on the number of past METAR observations (sequence length) provided to the NN. The results show that the sequence length that produces the best results is an intermediate value: 24 time instants for the LSTM-based NN.

Discussion
The low HR of 17.19% for Porto and 5.57% for Lisbon is accompanied by an MR of 82.8% and 94.43%, respectively. Despite the low FAR (6.34% and 0.84%), the method is strongly penalised by the high MR values. These results are far from the diagnosis effectiveness of 87% obtained by Menut et al. [11] and Román-Cascón et al. [12] in radiation fog forecasting. The diagnosis method applied to the airports' observational data is constrained by the meteorological variables provided by METAR, where relative humidity, wind speed and the 3 h temperature tendency are the key variables.
The results obtained with the NN-based methods showed that the recurrent network (based on LSTM) achieved a better result than the FC network. This indicates that the recurrent network is more suited to problems with sequential data, where there is a temporal dependence between the observations. While the FC network receives the same amount of data, its performance suffers as more time instants are considered. When 72 time instants are used, the number of weights in the FC network increases significantly and it becomes harder to train the NN. For the LSTM, the performance decreases only slightly. We believe this was caused mainly by the difficulty of propagating the error over such a long sequence during the training stage, and not by an increase in the number of weights.
One aspect highlighted by the results of the NNs is that there is an optimal value for the sequence length: 24 time instants for the LSTM-based NN. This corresponds to approximately 12 h of observations, which is a time interval relevant to the formation of fog.
When considering the HR, FAR, MR and TNR obtained by the neural networks, the results are encouraging, especially if we consider that the FAR is relatively low. Even though the HR is not very high (and consequently the MR is higher than what would be desirable), we must consider that for each fog episode there are hourly estimates for a period of six hours before the episode.

Conclusions
The evaluation of the forecast score method shows that the performed diagnosis barely captures the pre-fog conditions that favour the formation of fog. The METAR's key variables are unable to express the primary mechanisms of the fog onset.
The NN-based methods showed an interesting performance, outperforming the forecast score method. These results are especially encouraging because very simple networks were considered, with only two layers (FC and LSTM). The results also showed that recurrent networks are preferable to fully connected ones for dealing with this type of problem. On the other hand, the NN results also have some limitations. Many more combinations of layers, architectures and parameters should be considered in the future to fully characterise the potential of NNs to perform predictions based on METAR data. This characterisation should also include an ablation study to determine the sensitivity of the network to the different parameters, such as temperature or humidity.