1. Introduction
Prediction is one of the basic goals of Data Mining [
1], and weather forecasting plays a significant role in meteorology [
2]. Artificial Neural Networks (ANNs) are non-linear, non-parametric tools for modeling real processes and phenomena, i.e., problems that are difficult to solve using classical methods. ANNs are widely used in engineering practice and can be modeled effectively in software. They are an extremely useful alternative to traditional statistical modeling techniques in many scientific disciplines [
3].
This makes it possible to use neural networks widely, not only in research on brain functions but also to analyze data in areas as diverse as economics [
4,
5,
6], automation [
7], the energy industry [
8,
9,
10,
11], the natural sciences [
12], and medicine [
13,
14]. ANNs are a tool used in machine learning. They have a great capacity for capturing and representing complex relationships between input and output data [
15].
ANNs are parallel computational models, comprising interconnected adaptive data processing units. The adaptive nature of networks, where “learning by example replaces programming”, makes ANN techniques widely used to model highly non-linear phenomena [
16]. The advantage of neural networks is that they can represent both linear and non-linear relationships that exist between data.
1.1. The Essence, Significance, and Complexity of Weather Forecasting
Weather forecasting is the use of science and technology to predict the state of the atmosphere and the associated meteorological phenomena for a specific area and time period. Weather forecasts are based on scientific knowledge of atmospheric processes and on historical quantitative data describing typical meteorological conditions. The chaotic and complex nature of the atmosphere, the enormous computing power required to solve atmospheric equations, and an incomplete understanding of atmospheric processes make predictions less accurate as the forecast horizon increases [
17].
Meteorologists use different methods to forecast the weather. Lewis Fry Richardson proposed the possibility of creating the first numerical weather prediction models in 1922 [
18]. The practical use of numerical models began in 1955 due to the development of programmable electronic computers [
19]. Data for forecasting come from observations of atmospheric pressure, temperature, wind speed, wind direction, humidity, and precipitation. These observations are made close to the ground surface, either by trained observers or by automatic weather stations [
16].
At present, numerical models are the common and established approach to weather forecasting; they are unrivaled but still not sufficiently effective at this stage of research on weather forecast models [
20]. The basic condition for the usefulness of a weather forecast is its high verifiability. Users expect accurate local forecasts, broken down into hourly data and a very dense grid, i.e., for specific geographical points.
Weather forecasting is one of the essential tools for planning, risk management, and decision making in sectors of the economy and everyday life. The sectors exposed to weather risk are energy, agriculture, food industry, construction, entertainment and tourism, transport, and defense—together, the lion’s share of the national economy [
21,
22].
Therefore, despite the colossal costs generated by weather forecasting, technological infrastructure for testing alternative methods and creating better forecasting models is still being developed. These expenses are justified by ensuring the safety of people and their property from natural disasters. However, precise weather forecasting is extremely difficult. Weather is a chaotic phenomenon, characterized by temporal and spatial irregularity in which successive states vary and are difficult to predict [
23].
New technologies can change this situation. The world of meteorology is exploring Artificial Neural Networks for the generation of non-linear and non-parametric tools for modeling processes and genuine phenomena, i.e., problems that are difficult to solve with classical methods. Such networks have some brain properties. They learn from examples and apply this knowledge to problem solving, i.e., they can generalize. They are useful for tasks that are not very precisely defined formally.
They can function properly at a certain level of damage and despite partially incorrect input. They also have a relatively high speed of operation (information processing). Their current level of development means that they are not yet competitive with numerical models. However, their speed is predicted to compete with standard supercomputers, whose current speed reaches
elementary operations per second [
24,
25,
26].
Currently, models based on machine learning using Artificial Neural Networks and based on historical data are being tested. According to global reports, these models produce forecast results that far exceed the precision and efficiency of numerical models based on current weather data. These models improve the forecasting efficiency by 40–50% for air temperature and 129–169% for precipitation [
27].
1.2. Research Gap
To find and implement an efficient solution with low computational complexity for predicting selected weather indicators while ensuring high verifiability of forecasts, it is necessary to balance the complexity of the model architecture against the type and number of methods used to improve accuracy. The verifiability of modern numerical weather prediction (NWP) models is good; however, these models require considerable computing resources. This matters for probabilistic ensemble forecasts, in which the models repeatedly (several dozen times) generate the course of future meteorological conditions. Models using machine learning may prove less demanding in this respect [
28].
The question was, therefore, asked whether Artificial Neural Networks can be applied to weather forecasting—a phenomenon that is extremely complicated and chaotic in nature. Achieving this goal was quite a challenge as the model was planned to be applied in difficult forecasting conditions, which are characterized by high variability, closely related to the specificity of the climate occurring in Poland. In addition, the city of Szczecin is an area with a specific climate, which causes difficulties in effective weather forecasting, which is described in the following part of the study.
1.3. Aim of the Study
The purpose of this study was to establish whether a multilayer Artificial Neural Network (more precisely, a Multilayer Perceptron, MLP), even one with an uncomplicated structure and implementation, can effectively predict basic weather conditions, and whether the MLP model can be used as an adjunctive and corrective tool for forecasting weather conditions at the local scale. We intended to create a feed-forward multilayer Artificial Neural Network model trained on a regional dataset, use it to forecast selected weather conditions for the city of Szczecin (Poland), and then compare the effectiveness of the model predictions with forecasts available in the archives of an internet weather service using statistical metrics selected based on a literature review.
We assumed that, with a reliable and sufficiently large weather dataset; a properly constructed, established, and proven neural network model; and a properly prepared computing environment, it would be possible to effectively forecast selected weather indicators. This study also makes a practical contribution: an MLP model built for a specific location characterized by high climatic complexity, which makes weather forecasting difficult and is significant from a pragmatic point of view.
The rest of the report is organized as follows:
Section 2 contains a description of the materials and methods used in the study.
Section 3 introduces the results of the study. A discussion of the results obtained and the influencing factors are given in
Section 4. In
Section 5, we present the conclusions and future directions of our work.
2. Literature Review
Our literature review contains studies of machine learning methods and their applicability to weather data along with their relevant statistical properties. Numerical Weather Prediction (NWP) models play a key role in operational weather forecasting, especially for longer prediction times. Utilizing NWP with deep learning to improve the accuracy of weather forecasting systems is a fruitful avenue to consider. An article published in 2020 presents results from the Ensemble of Spatial-Temporal Attention Network and Multi-Layer Perceptron (E-STAN-MLP) for forecasting meteorological elements to predict the surface temperature, humidity, wind speed, and direction at 24 automatic meteorological stations in Beijing.
This research can be generalized to local weather prediction in other regions [
29]. Multilayer Perceptrons (MLPs) belong to the common type of feed-forward networks used for the future predictions of rainfall and temperature [
30]. MLP has been continuously used for many years to solve problems, such as the prediction of weather conditions. MLP is important for real-life problems [
31]. In a study published in 2020, it was used as a predictive model for wind speed (wind power) forecasting in Villonaco alongside Long Short-Term Memory (LSTM) and Convolutional Neural Network (CNN) models [
32].
The continued popularity and usefulness of the MLP is also demonstrated by its use as a prediction model for time series of meteorological tsunamis next to Evolved Radial Basis Function (ERBF) in Evolved Neural Networks (ENN) in 2020 for Accurate Meteorological Forecasting Applications in Vietnam [
33]. In a paper published in 2021, F. Dupuy and others presented an application of the ANN technique in the form of an MLP model as a tool to correct and complement a numerical model for forecasting the wind speed and direction at the local scale in the Cadarache valley, which is located in southeastern France.
As a measure of network accuracy, the Mean Absolute Error (MAE) was taken [
34]. The satisfactory results obtained by the authors confirmed the usefulness of this tool in weather condition prediction. A Multilayer Perceptron has been successfully used as a predictive model for various natural conditions, such as soil temperature [
35], and has been compared with the Support Vector Machine (SVM) for rainfall prediction [
36,
37]. This recent literature showed that the use of a Multilayer Perceptron for predicting weather conditions was confirmed and, thus, indicated the possibility of its further use.
To research weather condition predictions performed in Poland and worldwide as well as the methods used, we reviewed the literature. The subject of the work by M. Hayati and Z. Mohebi from 2007 was the application of an Artificial Neural Network (Multilayer Perceptron) in forecasting the mean temperature for the next day for Kermanshah in Iran [
15]. The authors of the study trained and tested the MLP model using previous meteorological data. The chosen weather data were divided into two randomly selected groups: the training group, corresponding to 67% of the patterns, and the test group, corresponding to 33% of the patterns.
was used as a measure of the network accuracy. The training data were selected as the mean temperature, wind speed, humidity, pressure, and sunshine. The dataset was normalized by converting its values to a range between −1 and 1.
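The rescaling to the range between −1 and 1 mentioned above can be illustrated with a short sketch. The cited study does not give its exact formula here, so linear min-max scaling per feature is an assumption, and the function name `scale_to_range` and the sample values are hypothetical.

```python
import numpy as np


def scale_to_range(x, low=-1.0, high=1.0):
    """Linearly rescale each column of x to the interval [low, high]."""
    x = np.asarray(x, dtype=float)
    col_min = x.min(axis=0)
    col_max = x.max(axis=0)
    # Guard against constant columns, which would otherwise divide by zero.
    span = np.where(col_max > col_min, col_max - col_min, 1.0)
    return low + (x - col_min) * (high - low) / span


# Hypothetical sample: two features (temperature in deg C, pressure in hPa).
sample = np.array([[-5.0, 990.0],
                   [10.0, 1010.0],
                   [25.0, 1030.0]])
scaled = scale_to_range(sample)
```

In practice the minimum and maximum would be computed on the training data only and reused for the test data, so that no information from the test period leaks into training.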
In a paper from 2009, Y. Radhika and M. Sashi projected the maximum temperature for the next day [
38]. The MLP prediction was compared with SVM. For MLP, an algorithm of backpropagation was used. A training dataset from 5 years and test data from 1 year were used. In the case of SVM, the Radial Basis Function Kernel (RBF), which is a popular kernel function used in various kernelized learning algorithm functions, was used.
There are three layers in the MLP: input, hidden, and output; the network was trained with the error backpropagation algorithm. A sigmoid function was selected as the activation function. Before the test, the data were normalized, and the measure of the accuracy of the models was the Mean Squared Error (MSE). In this study, SVM performed better than MLP. Comparisons of air temperature forecasts produced by an MLP neural network model with those produced by an SVM model are the subject of many studies [
39]. This shows that models with such structures are useful in forecasting weather variables.
In most of the studies available in the literature, the most frequently predicted conditions were the mean, minimum, and maximum temperature. Each study concerned a specific place in the world. S. S. Baboo and K. Shereef, in their work from 2010, used an Artificial Neural Network with a backpropagation algorithm to predict the mean temperature [
40], as did M. Hossain and others in a study conducted in 2015 [
41].
Among the studies using Artificial Neural Networks for the prediction of the mean temperature, we paid attention to the work of an author from Poland, I. Białobrzewski from 2005 [
42]. Another study on temperature prediction using an Artificial Neural Network model is presented in the article [
43]. By far, the largest number of studies described mean temperature predictions using ANNs, and one of the most frequently used network learning algorithms was the backpropagation algorithm [
44,
45,
46,
47].
In the literature, there were additionally studies regarding forecasting the maximum temperature using ANNs [
48,
49] as well as the minimum temperature [
50]. This problem is discussed in an article regarding the prediction of the maximum and minimum temperature with linear regression [
51].
A study comparing the effectiveness of the Support Vector Machine Regressor (SVR) and k-Nearest Neighbors in predicting the wind power used in renewable electricity production is presented in the article [
52]. Wind speed is also the subject of research presented in the Polish study [
53]. This study predicted wind speed as one determinant of the energy consumption in buildings. The tests were carried out on neural network models based on Multilayer Perceptron architecture (MLP), Generalized Regression Networks (GRNN), and networks with Radial Basis Functions (RBFs).
Another study from the international literature, which aimed to determine the annual mean wind speed using a Multilayer Perceptron trained with the backpropagation algorithm [
54], showed the popular use of this network model for wind speed forecasting. Many types of neural network models can be used in forecasting weather conditions.
For this purpose, networks with relatively low complexity, such as MLP, and networks with a more complex architecture, i.e., deep networks [
55,
56], Recurrent Neural Networks (RNN), Conditional Restricted Boltzmann Machine (CRBM) models, and Convolutional Neural Networks (CNN) are suitable [
57]. The application of a gated recurrent unit neural network (GRUNN) modification of RNNs to forecast wind power, which is one of the largest renewable energy sources, is described in a paper published in 2019 by M. Ding and others [
58].
LSTM network models for predicting the precipitation based on meteorological data from 2008 to 2018 in Jingdezhen City were deployed in the research described by J. Kang and others in 2020. LSTM is a special kind of RNN, capable of learning long-term dependence. It was introduced by Hochreiter and Schmidhuber [
59]. RNNs are special ANNs that are connected in a feedback structure between units of an individual layer. They are called recurrent because they perform the same operation on all elements in the sequence.
They make it possible to model data with time-series characteristics, overcoming the limitation of non-recurrent ANNs, which assume that the inputs are independent of one another [
60]. That article, published by B. Kwon and others in 2020, presents a study involving temperature prediction. The difference between LSTM and a plain RNN is that LSTM adds a “processor”, called the cell state, to the algorithm to judge whether information is useful. Information entering the LSTM is judged by rules: only the information that passes this check is retained, and the rest is discarded by the forget gate [
61].
In addition to ANNs, many other approaches are frequently used in weather forecasting, such as multiple regression, SVM, decision trees, and the k-Nearest Neighbor model. The disadvantage of MLP and deep learning models compared with other methods is that their training is time-consuming. Although SVM requires intensive training and experimentation with different kernel functions and other parameters, it is significantly faster than MLP and deep learning models [
62].
The SVM algorithm was developed by Vapnik and is based on statistical learning theory [
63]. SVM includes efficient algorithms for a wide range of regression problems because they not only take into account the approximation of the error to the data but also provide a generalization of the model—that is, its ability to improve the prediction of the data when a new data set is evaluated by it [
39].
ANNs, including Multilayer Perceptron, deep ANNs, and other machine learning models, are constantly being improved and widely used for the forecasting of air temperature [
64], rainfall [
36,
37,
65], cloudiness [
66], and wind speed [
67], which proves that these are forward-looking models that are worthy of constant research and improvement for forecasting purposes. To summarize, the most frequently forecasted condition in the above works was temperature; several studies have similarly found the use of models for wind speed prediction, and the most frequently used machine learning model for this purpose was a multilayer Artificial Neural Network. Other models chosen by researchers include
The most commonly used learning algorithm for neural networks is the backpropagation algorithm [
2]. Datasets are usually large and contain data from many previous years—optimally 10 years. Before starting the training and prediction, the data are normalized. The authors typically verify the accuracy of the model predictions using measures, such as MAE and MSE.
The research results presented in the analyzed papers motivated us to implement a Multilayer Perceptron as the prognostic model and to investigate its effectiveness using, among others, MAE and the Mean Squared Error (MSE) as measures of the model effectiveness. As the network learning algorithm, the backpropagation algorithm was chosen. We decided that the dataset would contain data from 10 years. The maximum and minimum temperatures, atmospheric pressure, wind speed, and daily precipitation for the next day were planned as the forecast conditions.
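The two accuracy measures named above follow their standard definitions, which the short sketch below makes concrete. The function names and the observed and forecast values are hypothetical and are not taken from the study's data.

```python
import numpy as np


def mean_absolute_error(y_true, y_pred):
    """MAE: the average of the absolute forecast errors."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean(np.abs(y_true - y_pred)))


def mean_squared_error(y_true, y_pred):
    """MSE: the average of the squared forecast errors."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean((y_true - y_pred) ** 2))


# Hypothetical maximum temperatures (deg C): observed vs. forecast.
observed = [12.0, 15.0, 9.0, 11.0]
forecast = [11.0, 17.0, 9.0, 14.0]

mae = mean_absolute_error(observed, forecast)  # errors 1, 2, 0, 3 -> 1.5
mse = mean_squared_error(observed, forecast)   # squares 1, 4, 0, 9 -> 3.5
```

MAE is expressed in the units of the forecast variable, whereas MSE penalizes large errors more heavily because each error is squared.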
2.1. The Complexity of Weather Forecasting in Poland
In Poland, the Institute of Meteorology and Water Management is statutorily responsible for preparing weather forecasts, using numerical models. Their verifiability is on average: for short-term forecasts—90–95%, and for medium-term forecasts—70–75%. On the other hand, due to the different methods of preparing and presenting long-term forecasts, it is difficult to discuss their verifiability, as these forecasts rather show the trend of thermal changes or the probability of precipitation. The analysis of long-term weather forecasts in terms of their verifiability is not currently being carried out (
https://forum.mazury.info.pl/viewtopic.php?t=15409 accessed on 23 November 2020).
The nature of the weather in Poland was described by the atmosphere physicist Prof. Teodor Kopcewicz: “Poland lies in a zone of moderate climate, with unmoderated weather changes”. Poland is one of the more difficult places in the world to forecast the weather. This is because of Poland’s climate, which is characterized by high weather variability and significant fluctuations in the course of the seasons in subsequent years. The physical and geographical location of the country means that various air masses collide over its area, which influences the weather and the climate of Poland.
Frequently moving atmospheric fronts and the clashing and exchange of various air masses (hot and cold) cause the weather to change frequently and create great problems for weather forecasting [
68]. Confirmation of the difficulties of accurate weather forecasting in Poland can be found in a recent paper that draws attention to the seasonality of the weather in Poland throughout the year [
69].
The atmospheric circulation in this part of Europe is characterized by relatively high annual variability, which causes significant temperature and precipitation fluctuations during the year [
70]. A study on the occurrence of tornadoes in Poland confirmed the wide spectrum of weather phenomena observed in the country, which makes them difficult to predict accurately [
71].
2.2. The Specific Climate of Szczecin and Its Influence on Weather
The weather of Szczecin is reflected in the specific climate of Szczecin’s Climatic Land (VI and X), which is influenced by its location near the sea, many lakes, a large river basin, landforms, large forest areas, parks and meadows, street greenery, and relief (valleys and hills as illustrated on the Map in
Figure 1). The climate of Szczecin and its surroundings is shaped primarily by the advection of polar and sea air masses. The proximity of large water reservoirs, i.e., the Baltic Sea and the Szczecin Lagoon, results in the formation of local breeze circulation affecting the course of the weather.
The Baltic Sea and the Szczecin Lagoon have a warming effect in the winter and a cooling effect in the summer. Important climatic factors include the latitude, terrain, and elevation above sea level. A terminal moraine ridge extends from the southwest to the northeast through the center of the province and clearly differentiates the spatial distribution of sunshine, temperature, precipitation, and wind speed between its northwestern and southeastern sides. Most baric systems arrive from the west. The moving lows and their atmospheric fronts cause weather changes and strong, stormy winds. Spring and summer lows, although numerous, are less active, and strong stormy and squally winds are less frequent.
Skagerrak lows form as a result of a wave disturbance on a front approaching Norway and typically move in a southeastern direction. Deepening rapidly, they cause stormy weather. These occur most often in the winter and spring. Atmospheric circulation is formed under the influence of the Icelandic Low (especially in winter) and the Azores High (mainly in summer). The climatic conditions in winter are significantly influenced by the strong seasonal Siberian High. Due to the northwestern location, differences in day length between the Baltic coast and the southern ends of Poland are clearly visible. In summer, the day is more than an hour longer than in the uplands of southern Poland. In winter, the day is shorter by a similar amount. A shift of about 40 min likewise occurs between the eastern and western parts of the country [
72,
73].
Because the specific climate of the Szczecin area makes exact weather forecasting difficult, we decided to check how an MLP model, whose use for forecasting selected weather conditions has been presented in many of the research articles cited in the literature review, performs in predicting selected weather conditions for this area.
The location where the study was performed and the number of different predicted weather conditions included in the study are novel compared to the previous studies described in the literature that used machine learning (ML) methods. It is also a new and interesting approach of the authors to study the performance of the implemented MLP model for multiple different weather conditions, whereas the reviewed publications on this issue usually presented studies on a single weather parameter.
3. Materials and Methods
The Multilayer Perceptron (MLP) is a popular and commonly used model among neural networks. The literature study presented in the previous section enabled us to identify many papers from recent years that confirmed the relevance, usefulness, and popularity of this model in forecasting various weather conditions for different locations around the world. This type of neural network is referred to as a supervised network [
15].
Supervised learning consists of training the model using a training dataset, i.e., a set of samples containing known expected output signals. One type of supervised learning used in forecasting results with continuous values is regression analysis. The model described contains explanatory (training) and explained (predicted) variables. Both types of variables are continuous values. The purpose of this type of network is to create a model that correctly maps the input data to the output data using historical data so that the model can then predict the output data when the desired output data is not known [
15]. It enables detecting relationships between variables and predicting future results [
74,
75].
A multilayer neural network can approximate any function with continuous values between input and output vectors of data by picking the appropriate set of weights. The described properties of neural networks allow solving the problems of forecasting phenomena occurring in the natural world based on collected historical data, such as weather phenomena. The fundamentals of Artificial Neural Networks and the Multilayer Perceptrons with descriptions of their structures and explanations of the principles of operations, including mathematical formulae, are contained in
Appendix A.
3.1. Overview of the Available Datasets and Selection of a Training Dataset
The dataset from the meteorological station with code 205 located at Szczecin was taken from the archives of Institute of Meteorology and Water Management (IMGW) (
https://dane.imgw.pl/data/dane_pomiarowo_obserwacyjne/dane_meteorologiczne/dobowe/synop/ accessed on 27 April 2021). This source of meteorological data was considered reliable due to the status of IMGW as a state research unit. The real-time forecasting of meteorological phenomena is one of the basic tasks of modern meteorological services. In Poland, official weather forecasts are prepared and developed by the Institute of Meteorology and Water Management-National Research Institute (IMGW-PIB), a state unit supervised by the minister in charge of water management. The mission of IMGW-PIB is to inform society and organizations about meteorological and hydrological conditions, climate change, and all factors influencing the current weather in Poland.
The major task of the IMGW-PIB is to provide meteorological cover for Poland. For this purpose, the IMGW-PIB Monitor was created—a service for all national operational services and administrative bodies. As part of its statutory activities, the IMGW-PIB prepares and delivers meteorological forecasts, warnings against dangerous phenomena occurring in the atmosphere, and dedicated announcements and bulletins. IMGW is responsible for collecting, storing, processing, and making available domestic and foreign measurements and observation materials.
The Institute develops and distributes weather forecasts and warnings. IMGW is a member of many international organizations. It represents Poland in the WMO (World Meteorological Organization) and EUMETSAT (European Organisation for the Exploitation of Meteorological Satellites). IMGW stores, and makes available in its database, measurement and observation data collected since 1960 for 2112 measuring stations in Poland (
https://danepubliczne.imgw.pl/data/dane_pomiarowo_obserwacyjne/dane_meteorologiczne/ accessed on 27 April 2021).
A review of the literature on the use of multilayer neural networks in the prediction of selected weather conditions demonstrated that, for many studies, ten years of previous meteorological data were used [
74,
75]. We, therefore, decided to use meteorological data from 10 years (2011–2020). In this study, daily data from 7 years (2011–2017) were used as a training dataset with 10 training features for the 3 days before the day for which the forecast was performed.
The test dataset included data from 3 years (2018–2020). One sample contained selected weather conditions for each day. Due to the many samples in the dataset, we chose to apply a simple validation (train/test split) to check the effectiveness of the model. The dimensions of the datasets used for model training and testing are presented in
Table 1.
The ten features selected to create a training dataset, together with their respective units, are listed in
Table 2.
These variables from 1, 2, and 3 days before the prediction date were the inputs for the MLP model. The time horizon of 3–5 days before the day for the forecast was used, among others, by Rasp and others in a study published in 2020 [
76].
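The construction of the input vectors from the three preceding days can be sketched as follows. This is a minimal illustration under stated assumptions: the function name `make_lagged_inputs` is hypothetical, the data are random placeholders, and the ordering of the lagged days within one input vector is an arbitrary choice not specified in the text.

```python
import numpy as np


def make_lagged_inputs(features, targets, n_lags=3):
    """Build one input row per forecast day from the n_lags preceding days.

    features: array of shape (n_days, n_features), one row per day
    targets:  array of shape (n_days, n_targets), one row per day
    """
    features = np.asarray(features, dtype=float)
    targets = np.asarray(targets, dtype=float)
    rows = []
    for t in range(n_lags, features.shape[0]):
        # Concatenate day t-1, day t-2, ..., day t-n_lags into one vector.
        lagged = [features[t - lag] for lag in range(1, n_lags + 1)]
        rows.append(np.concatenate(lagged))
    # The first n_lags days have no complete history and are dropped.
    return np.array(rows), targets[n_lags:]


# Placeholder data: 20 days, 10 features, 5 forecast conditions.
rng = np.random.default_rng(0)
daily_features = rng.normal(size=(20, 10))
daily_targets = rng.normal(size=(20, 5))

X, y = make_lagged_inputs(daily_features, daily_targets)
```

With 10 features and 3 lagged days, each input vector has 30 elements, and each output vector holds the conditions to be forecast for the following day.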
Table 3 contains the conditions planned for forecasting using the MLP model in this study. They are the outputs of the MLP model.
We selected five weather conditions that are of interest to people who follow weather forecasts and are most often provided in them, i.e., the maximum and minimum temperature, pressure, wind speed, and precipitation. We further chose these because they were given in the weather archive service that was our point of reference. This service draws from the OpenWeather service; therefore, it was useful to treat it as a local reference point for the city of Szczecin.
3.2. Data Preprocessing
The input data were normalized, and the five selected daily weather conditions were forecasted for the full 2018, 2019, and 2020 years. The study includes daily short-term forecasts for the next day. MLP was used as a prediction model. To evaluate the prediction accuracy, the results were compared with the values of measurements made available by the Institute of Meteorology and Water Management (IMGW). One stage of data preparation, preceding the use in machine learning, was the preprocessing.
For the daily precipitation sum, a lack of precipitation was recorded in the files downloaded from IMGW as a missing value, which is interpreted in Python as NaN (Not a Number). It was, therefore, necessary to handle these values so that the implemented models would produce correct results with the data used.
Missing precipitation values were filled with 0. No missing data were found for the other parameters available in the dataset. However, if other weather data were missing, a suitable way to fill them in would be the k-Nearest Neighbors algorithm, given the continuity of successive weather variables in time. In the next step, the dataset was divided into the training and test datasets.
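The two preprocessing steps described above, replacing missing precipitation values with 0 and splitting the data by year, can be sketched as follows. The array names and values are hypothetical, and the IMGW file layout is not reproduced here.

```python
import numpy as np

# Hypothetical daily records: the year of each observation and the
# precipitation sum in mm, where NaN marks a day without precipitation.
years = np.array([2017, 2017, 2018, 2018])
precip_mm = np.array([np.nan, 2.4, np.nan, 0.6])

# Replace missing precipitation values with 0.
precip_filled = np.where(np.isnan(precip_mm), 0.0, precip_mm)

# Chronological split: 2011-2017 for training, 2018-2020 for testing.
train_mask = years <= 2017
test_mask = years >= 2018
train_precip = precip_filled[train_mask]
test_precip = precip_filled[test_mask]
```

A chronological split keeps all test days strictly later than the training days, which mirrors how a forecast model would be used in practice.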
3.3. Design and Implementation of the Multilayer Perceptron Model
All stages of the study (data preparation, implementation of the Multilayer Perceptron model, and model testing) were performed in the Python programming language.
3.3.1. The MLP Parameters and Their Values
The structure of the neural network (the number of layers, number of neurons in particular layers, type of activation function, and topology of networks) as a tool for modeling real objects was determined in relation to the problem to be solved. A network with one hidden layer can be adopted as a starting point that is as good as any other. The best results are obtained by selecting the number of layers and neurons in the layers empirically. This choice is largely arbitrary and depends on the model creator. A grid search with cross-validation was used to determine the values of the model hyperparameters [
81]. The range of parameter values for the cross-validation procedure was selected based on [
75].
l2: the L2 regularization parameter, which reduces overfitting by making the model simpler. The regularization applies a penalty to the network weights.
Number of epochs: the number of passes of the algorithm over the training dataset.
η: the network learning rate, used for updating the weights.
α: the momentum parameter, which defines the fraction of the previous gradient that is added to the weight update to speed up the learning of the network.
decrease_const: the decay constant d of the adaptive learning rate, which decreases in subsequent epochs for better convergence.
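How the learning rate, the momentum parameter, and the decay constant interact in a weight update can be illustrated with a short sketch. The names `eta`, `alpha`, and `decrease_const`, the decay schedule eta0 / (1 + d * epoch), the toy loss, and all numeric values are assumptions chosen for illustration; the study may have used a different schedule.

```python
def decayed_learning_rate(eta0, decrease_const, epoch):
    """Adaptive learning rate that shrinks as the epoch index grows (assumed schedule)."""
    return eta0 / (1.0 + decrease_const * epoch)


def momentum_step(weights, grads, prev_delta, eta, alpha):
    """One gradient-descent update with a momentum term.

    delta_t = eta * grad;  w <- w - (delta_t + alpha * delta_{t-1})
    """
    delta = [eta * g for g in grads]
    new_w = [w - (d + alpha * p) for w, d, p in zip(weights, delta, prev_delta)]
    return new_w, delta


# Hypothetical values, not the hyperparameters selected in the study.
eta0, alpha, decrease_const = 0.1, 0.5, 0.01
weights, prev_delta = [1.0, -2.0], [0.0, 0.0]
for epoch in range(3):
    eta = decayed_learning_rate(eta0, decrease_const, epoch)
    grads = [2.0 * w for w in weights]  # gradient of the toy loss sum(w**2)
    weights, prev_delta = momentum_step(weights, grads, prev_delta, eta, alpha)
```

The momentum term reuses part of the previous step, so consecutive updates in the same direction accumulate, while the decaying learning rate makes later steps smaller.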
3.3.2. Methods Implemented in MLP Model
The most important implemented procedures that were run sequentially when training and testing the MLP network are presented in
Figure 2 [
75]. The methods implemented in the Multilayer Perceptron model are listed below:
Initialization of weights.
Sigmoid activation function.
The derivative of the sigmoid activation function.
Adding a bias as a vector of ones to the first column or first row.
Forward propagation.
Backpropagation including regularization procedure.
Prediction.
Fit (network training). In this method, the following procedures were initiated iteratively in subsequent epochs:
- Forward propagation.
- Backpropagation.
- Calculation of the error made by the network during learning.
- Updating of weights.
The weights were initialized randomly, with the assumption of a uniform distribution in the numerical range .
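Several of the listed building blocks (weight initialization, the sigmoid function and its derivative, adding a bias, and forward propagation) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the `limit` of the uniform initialization range is a placeholder because the range is not reproduced here, the linear output unit is an assumption suited to regression, and backpropagation is omitted for brevity.

```python
import math
import random


def sigmoid(z):
    """Logistic activation function."""
    return 1.0 / (1.0 + math.exp(-z))


def sigmoid_derivative(z):
    """Derivative of the sigmoid, used during backpropagation."""
    s = sigmoid(z)
    return s * (1.0 - s)


def init_weights(n_rows, n_cols, limit, rng):
    """Uniform random weights in [-limit, limit]; `limit` is a placeholder value."""
    return [[rng.uniform(-limit, limit) for _ in range(n_cols)] for _ in range(n_rows)]


def add_bias(vector):
    """Prepend a constant 1 so that the first weight of each unit acts as its bias."""
    return [1.0] + list(vector)


def forward(x, w_hidden, w_out):
    """Input -> sigmoid hidden layer -> linear output (assumed for regression)."""
    a1 = add_bias(x)
    z2 = [sum(w * a for w, a in zip(row, a1)) for row in w_hidden]
    a2 = add_bias([sigmoid(z) for z in z2])
    y_hat = sum(w * a for w, a in zip(w_out, a2))
    return y_hat, z2, a2


rng = random.Random(0)  # fixed seed for reproducibility
n_inputs, n_hidden = 3, 4
w_hidden = init_weights(n_hidden, n_inputs + 1, limit=1.0, rng=rng)
w_out = init_weights(1, n_hidden + 1, limit=1.0, rng=rng)[0]
y_hat, z2, a2 = forward([0.2, 0.5, 0.1], w_hidden, w_out)
```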
3.3.3. Regularization—Preventing Over-Adjustment of the Model
Overfitting (over-training) of the model is one of the most common problems in machine learning. It occurs when the model works well for the training data but does not generalize the learned rule sufficiently to unseen test data. One technique used to prevent overfitting is to control the complexity of the model by regularization. This is based on introducing additional information that penalizes large weight values. The most common type of regularization is L2 regularization, which is also called weight decay. This regularization is defined by Formula (
3):
$$\frac{\lambda}{2}\lVert \mathbf{w} \rVert^{2} = \frac{\lambda}{2}\sum_{j=1}^{m} w_j^{2}$$
where $\lambda$ is the regularization parameter, $w_j$ are the network weights, and $m$ is their number. Regularization is the reason why scaling the features (e.g., normalization) is important. To apply the regularization correctly, all features must be brought to a uniform scale [
74,
75]. After the prediction is made, the data is denormalized to present the results of the model with the appropriate values and units for the predicted condition.
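The L2 penalty and the denormalization step can be illustrated with a small sketch. The function names, the exclusion of the bias weights from the penalty, and the min-max form of the scaling are assumptions made for this example and are not confirmed by the text.

```python
def l2_penalty(weight_matrices, lam):
    """L2 term (lam / 2) * sum of squared weights, bias column excluded (assumption)."""
    total = 0.0
    for matrix in weight_matrices:
        for row in matrix:
            total += sum(w * w for w in row[1:])  # row[0] is treated as the bias weight
    return 0.5 * lam * total


def regularized_cost(error, weight_matrices, lam):
    """Training error plus the L2 penalty on the weights."""
    return error + l2_penalty(weight_matrices, lam)


def denormalize(scaled, lo, hi):
    """Invert min-max scaling to recover values in the original units."""
    return [s * (hi - lo) + lo for s in scaled]


# Hypothetical weights: one hidden unit with two inputs, one output unit.
weights = [[[0.5, 1.0, -2.0]], [[0.1, 3.0]]]
penalty = l2_penalty(weights, lam=0.1)
cost = regularized_cost(0.3, weights, lam=0.1)
temps = denormalize([0.0, 0.5, 1.0], lo=-10.0, hi=30.0)
```

Because the penalty grows with the squared weights, features on very different scales would be penalized unevenly, which is why a uniform feature scale is needed before training.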
3.3.4. Architecture of the Multilayer Perceptron Model
A network architecture was designed in which the inputs of the input layer were the most suitable variables selected from 10 weather indicators (features), taken from 1, 2, and 3 days prior to the day for which a given weather condition is predicted. The decision to use the values of the training features from the 3 days preceding the forecast date was based on a review of studies on forecasting the selected weather parameters available in the literature and online sources [
82].
We considered this variant of the training dataset containing the values of each of the training parameters from 1, 2, and 3 days before the day on which the prediction will be made to be sufficient and reasonable. The selected weather parameters for the training dataset are presented in
Table 2.
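Building input vectors from the values recorded 1, 2, and 3 days before the forecast day can be sketched as below. The feature names and values are hypothetical, and the helper `make_lagged_samples` is introduced only for illustration.

```python
def make_lagged_samples(series_by_feature, target, lags=(1, 2, 3)):
    """Build (inputs, target) pairs from the values observed 1, 2, and 3 days earlier."""
    n_days = len(series_by_feature[target])
    max_lag = max(lags)
    inputs, targets = [], []
    for t in range(max_lag, n_days):
        # For each feature (in alphabetical order), take its lagged values.
        row = [series_by_feature[name][t - lag]
               for name in sorted(series_by_feature)
               for lag in lags]
        inputs.append(row)
        targets.append(series_by_feature[target][t])
    return inputs, targets


# Hypothetical daily series: maximum temperature (deg C) and pressure (hPa).
data = {
    "t_max": [10.0, 11.0, 12.0, 13.0, 14.0],
    "pressure": [1010.0, 1011.0, 1009.0, 1008.0, 1012.0],
}
X, y = make_lagged_samples(data, target="t_max")
```

With f features and three lags, each input vector has 3f entries, and the first three days of the series cannot be used as targets because they lack a full history.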
The number of neurons of the hidden layer was established experimentally after implementing the neural network model, by applying the cross-validation method, individually for each predicted parameter. In the case of a neural network, a smaller number of hidden units was beneficial if it did not adversely affect the accuracy of the results generated by the network, as this reduces the learning time, making the network more efficient. The output layer contains a vector with expected values for the input data.
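Selecting the number of hidden neurons (and other hyperparameters) by grid search with cross-validation can be outlined as follows. This is a generic sketch: the contiguous k-fold scheme, the parameter names, the grid values, and the toy scoring function are all hypothetical. In practice `score_fn` would train the MLP on the training indices and return the validation error.

```python
from itertools import product


def k_fold_indices(n_samples, k):
    """Yield (train_idx, val_idx) pairs for k contiguous folds."""
    fold = n_samples // k
    for i in range(k):
        start = i * fold
        stop = (i + 1) * fold if i < k - 1 else n_samples
        val = list(range(start, stop))
        train = [j for j in range(n_samples) if j < start or j >= stop]
        yield train, val


def grid_search_cv(param_grid, n_samples, k, score_fn):
    """Return the parameter combination with the lowest mean validation error."""
    names = sorted(param_grid)
    best_params, best_score = None, float("inf")
    for values in product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        scores = [score_fn(params, tr, va) for tr, va in k_fold_indices(n_samples, k)]
        mean_score = sum(scores) / len(scores)
        if mean_score < best_score:
            best_params, best_score = params, mean_score
    return best_params, best_score


def toy_score(params, train_idx, val_idx):
    """Stand-in for 'train the MLP and return the validation error' (toy only)."""
    return (params["n_hidden"] - 20) ** 2 + params["l2"]


grid = {"n_hidden": [10, 20, 30], "l2": [0.0, 0.1]}
best_params, best_score = grid_search_cv(grid, n_samples=10, k=3, score_fn=toy_score)
folds = list(k_fold_indices(10, 3))
```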
3.4. Sensitivity Analysis
The procedure of sensitivity analysis was used to investigate the effect of each parameter on the outputs. Sensitivity analysis using the change of MSE ranked the input variables in a given dataset according to the change of MSE when each input was deleted from the dataset in the training phase. Therefore, the variables that made the largest change in the MSE were considered as the most important [
83].
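The ranking procedure can be sketched by retraining a model with each input removed in turn and recording the change in MSE. To stay self-contained, the example uses a small linear model trained by gradient descent as a stand-in for the MLP and evaluates the MSE on the training data; the model, the synthetic data, and the function names are assumptions for illustration only.

```python
import random


def predict(model, row):
    w, b = model
    return sum(wj * xj for wj, xj in zip(w, row)) + b


def fit_linear(X, y, epochs=500, lr=0.1):
    """Stand-in model: linear regression trained by batch gradient descent."""
    n, m = len(X), len(X[0])
    w, b = [0.0] * m, 0.0
    for _ in range(epochs):
        errors = [predict((w, b), row) - t for row, t in zip(X, y)]
        w = [wj - lr * sum(e * row[j] for e, row in zip(errors, X)) / n
             for j, wj in enumerate(w)]
        b -= lr * sum(errors) / n
    return w, b


def mse(model, X, y):
    return sum((predict(model, row) - t) ** 2 for row, t in zip(X, y)) / len(y)


def drop_column(X, j):
    return [row[:j] + row[j + 1:] for row in X]


def sensitivity_ranking(X, y, names):
    """Rank inputs by the MSE increase observed when each one is removed."""
    base = mse(fit_linear(X, y), X, y)
    changes = []
    for j, name in enumerate(names):
        X_reduced = drop_column(X, j)
        changes.append((name, mse(fit_linear(X_reduced, y), X_reduced, y) - base))
    return sorted(changes, key=lambda item: item[1], reverse=True)


# Synthetic data: the target depends strongly on x0 and weakly on x1.
rng = random.Random(1)
X = [[rng.random(), rng.random()] for _ in range(40)]
y = [3.0 * a + 0.1 * b for a, b in X]
ranking = sensitivity_ranking(X, y, ["x0", "x1"])
```

In this toy setting, removing `x0` degrades the fit far more than removing `x1`, so `x0` is ranked as the more important input.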
3.5. Methods to Evaluate the Effectiveness of Regressive ML Models and to Measure the Correlations between Variables
To objectively assess the performance of the implemented predictive model, we used two measures of the error committed by the network, the MAE and the MSE, together with the coefficient of determination R², which are applied in the evaluation of the prediction accuracy of regression models. To answer the question of why some weather variables were predicted better and others worse, we examined the relationships between the variables included in the model, since ML models are data-driven, and consequently the prediction accuracy naturally depends on the strength of these correlations [
18].
We used Pearson’s correlation coefficient to examine the correlation between weather conditions. The foundations of the methods used to assess the effectiveness of the regressive models and Pearson’s correlation coefficient are presented in
Appendix B.
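For orientation, the evaluation measures can be written out in plain Python using their standard textbook definitions. The sketch interprets the R measure as the coefficient of determination R², which is an assumption, and the sample values are hypothetical.

```python
import math


def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def mse(y_true, y_pred):
    """Mean squared error."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def pearson(x, y):
    """Pearson's linear correlation coefficient between two variables."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical measured and predicted values.
y_true = [1.0, 2.0, 3.0, 4.0]
y_pred = [1.5, 2.0, 2.5, 4.0]
scores = (mae(y_true, y_pred), mse(y_true, y_pred), r2_score(y_true, y_pred))
```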
5. Discussion
The study described in this report aimed to show the possibility of using a feedforward (unidirectional) multilayer neural network to forecast selected weather indicators and to compare the results achieved with other forecasting models. We assumed that, with a reliable and sufficiently large set of weather data, a properly constructed neural network model, and appropriate software, it would be possible to effectively forecast selected weather indicators.
The results of an application using ANN as presented in this report confirm that Artificial Neural Networks can be useful as a tool for forecasting weather indicators. Although a simple construction of the MLP model—comprising three layers of neurons—was used in the study, the results obtained were satisfactory.
The effectiveness of our implementation of the MLP model, which was comparable to or, in most cases, higher than that of the other forecasting models, suggests that the model was properly designed and implemented, that it is suitable for the assumed purpose, i.e., forecasting selected weather parameters, and that the data used to train the model were properly prepared.
However, there were some difficulties in applying the proposed weather forecasting model. As reported in the literature [
18], neural networks can be susceptible to learning false relationships between data. A pure data-based weather forecasting model may fail to respect basic physical principles and, thus, generate false forecasts because it does not take into consideration that every atmospheric process is affected by physical laws.
Weather data have specific properties for which classical ML concepts (which work for typical problems solved by ML, such as computer vision and speech recognition) are not effective enough in a complete weather prediction system. The reason for this is that the model must handle the complexity of the meteorological data and feedback processes to provide accurate prediction results. Another difficulty encountered when using MLPs for weather forecasting is that ANNs are good interpolators but poor extrapolators. The dataset used to train the ANN must, therefore, contain numerous and heterogeneous examples to cover the widest range of cases that the ANN is expected to predict [
34].
A separate training and testing dataset containing regional data would be required for each location where the method would be applied. Because climate patterns change over time, the training dataset needs to be updated regularly and the network model re-trained [
85]. Another disadvantage of MLPs and deep learning models compared with other methods is that their training process takes a long time [
62].
Forecasting time series, which include a prediction of the weather parameters changing over time, is an important area of machine learning. The time component provides useful information that is used in the construction of the machine learning model, but it also brings with it problems that make it difficult to accurately predict certain variables. If a time series’ data are correlated over time, it is much easier to obtain an accurate prediction because the model uses historical values in the machine learning process and then generates a forecast for the future from these.
When data values change randomly over time, the model cannot predict future changes based on historical events with great accuracy [
82]. The implication is highly accurate prediction results for weather conditions that are strongly correlated with the other variables included in the prediction model and poor results for weather conditions for which no such correlations exist. In most of the analyzed cases, the MLP model achieved results higher than or comparable to the forecasts from the internet weather service.
The highest prediction efficiency was observed for the maximum and minimum temperature. These are conditions for which there is a strong correlation in time with many other weather indicators. Thus, there is less risk of them adopting random values. Therefore, these are the parameters that are particularly suitable for prediction by artificial neural networks.
Our conclusion is that it is worthwhile to build machine learning models for the prediction of time series with values that are strongly correlated with each other because then there is a high probability of obtaining an accurate prediction. These models will be useful, for example, as a correcting tool to forecast the weather conditions at the local scale when numerical modeling is insufficient due to specific local conditions, for instance, as described earlier in the paper, where numerical modeling is performed at a resolution too coarse to account for the influence of the local topography of the Cadarache valley in southeastern France [
34].
In contrast, the overestimation of the predictive capacity of the model for certain parameters results in its low competitiveness compared to the possibilities offered by alternative methods [
86]. Our investigation confirmed the findings of the literature review, in which most of the research concerned the use of machine learning methods to forecast temperatures, for which the predictions are accurate because of their strong correlation with other weather indicators.
6. Conclusions and Future Research Directions
This study presents the use of an Artificial Neural Network regression model with a Multilayer Perceptron (MLP) architecture to forecast selected weather parameters. The MLP model was built, and its effectiveness was compared with the forecasts available in the website archive. The developed model is a lightweight solution and can be an element of any application that would require forecasting of the above weather parameters. The most important outcomes are listed below:
This study presents a successful attempt to use an application based on a model of a feedforward (unidirectional) multilayer Artificial Neural Network to forecast selected weather conditions for a selected location, i.e., the city of Szczecin.
The application used in this study was successful for a local scale study.
We obtained satisfactory results, i.e., accurate forecasts, with the simple design of the MLP model, which comprised three layers of neurons.
We confirmed that this forecasting model was properly designed and implemented, that it is a model suitable for forecasting selected weather parameters, and that the data used to train the model were properly selected and prepared.
We analyzed and explained the reasons for the different forecasting accuracy results for the different weather parameters.
Our approach in this study was limited to using data and performing forecasts of the selected weather conditions for the local area of Szczecin city in Poland. The weather data set limited to the territory of the city of Szczecin was a small set. As our study was focused on a regional territory and dataset, the usefulness for a larger area would be limited. The achieved results encourage future investigations to generalize whether the satisfactory scores obtained by this application working on local data will be repeatable and useful in the case of other regions of Poland.
The next direction of further work is to compare the performance of this application for a larger dataset including Poland and the European area. It would also be desirable for the direction of future work to include comparing the effectiveness of our method with applications working on other, newer, and more advanced machine learning models to see if the results obtained with our proposed method working in a weather forecasting application can be improved.
The MLP provided slightly more accurate predictions of temperature and atmospheric pressure compared with the LSTM and SVR. This indicates that an appropriate choice of data set, learning features, and parameters for the MLP training can provide results comparable to more complex ML models. For the wind speed prediction, LSTM and SVR showed a slight advantage, and for the precipitation sum prediction, LSTM. Thus, we concluded that, for weather conditions that are more difficult to predict, more complex ML models, especially LSTM, represent an opportunity to obtain more accurate predictions, and it is worthwhile to focus on research aimed at adjusting their structure to further improve the prediction performance.
As the MLP model did not achieve equally effective results for all the predicted weather conditions, further work is needed to improve the accuracy and expand the model. To improve the accuracy of the prediction of the weather parameters with the MLP model, more hidden layers can be used, i.e., a deep network model, because deep multilayer networks can provide higher-quality predictions than simple MLP networks with only a few hidden layers. For more accurate forecasts, the precision of the information provided by the training data can be increased by using more densely spaced samples in the training dataset, e.g., every hour instead of every 24 h.
To obtain more accurate forecasts, a longer history can also be used as training input for the day on which the weather parameters are forecast, e.g., data from up to ten days earlier. Further work includes testing the effectiveness of more advanced models for weather parameter prediction, such as deep networks, CNN models, LSTM models, RNN models, and SVR models, as well as comparing the accuracy of their results with the results of the implemented MLP. Such a study, in the opinion of the authors, could provide interesting results and enable empirical evaluation of the effectiveness of the MLP model from this work.