An ANN Model Trained on Regional Data in the Prediction of Particular Weather Conditions

Artificial Neural Networks (ANNs) have proven to be a powerful tool for solving a wide variety of real-life problems. The possibility of using them for forecasting phenomena occurring in nature, especially weather indicators, has been widely discussed. However, the various areas of the world differ in terms of their difficulty and ability in preparing accurate weather forecasts. Poland lies in a zone with a moderate transitional climate, which is characterized by seasonality and the inflow of many types of air masses from different directions; combined with the compound terrain, this causes climate variability and makes it difficult to accurately predict the weather. For this reason, it is necessary to adapt the model to the prediction of weather conditions and verify its effectiveness on real data. The principal aim of this study is to present the use of a regressive model based on a unidirectional multilayer neural network, also called a Multilayer Perceptron (MLP), to predict selected weather indicators for the city of Szczecin in Poland. The forecast of the model we implemented was effective in determining the daily parameters, at 96% compliance with the actual measurements for the prediction of the minimum and maximum temperature for the next day and 83.27% for the prediction of atmospheric pressure. The MAE was lower for the MLP, and the MSE was 9.3824 lower for the MLP than for the weather service. The determination factor (the R² score) used to interpret the model effectiveness was 0.1133 higher for the MLP compared with the weather service. These results show the much higher effectiveness of the MLP model over the weather service in forecasting the minimum temperature.


Introduction
Prediction is one of the basic goals of Data Mining [1], and weather forecasting plays a significant role in meteorology [2]. Artificial Neural Networks (ANNs) belong to non-linear and non-parametric tools for modeling actual processes and phenomena, i.e., problems that are difficult to solve using classical methods. ANNs are widely used in engineering practice with the possibility of their effective modeling in software. They are an extremely useful alternative to traditional statistical modeling techniques in many scientific disciplines [3].
This makes it possible to use neural networks widely, not only in research on brain functions but further to analyze data in areas as diverse as economics [4][5][6], automation [7], the energy industry [8][9][10][11], the natural sciences [12], and medicine [13,14]. ANNs are a tool used in machine learning. They have great possibilities for recording and presenting complex relationships between input and output data [15].
ANNs are parallel computational models, comprising interconnected adaptive data processing units. The adaptive nature of networks, where "learning by example replaces programming", makes ANN techniques widely used to solve highly non-linear phenomena [16]. The advantage of neural networks is that they can represent both linear and non-linear relationships that exist between data.

The Essence, Significance, and Complexity of Weather Forecasting
Weather forecasting is the use of science and technology to predict the state of the atmosphere and the associated meteorological phenomena, concerning a specific area and time period. Weather forecasts are performed based on scientific knowledge of atmospheric processes and historical quantitative data as typical meteorological conditions. The chaotic and complex nature of the atmosphere, the enormous computing power required by computers to solve atmospheric equations, and an incomplete understanding of atmospheric processes make predictions less accurate as the range of timely predictions increases [17].
Meteorologists use different methods to forecast the weather. Lewis Fry Richardson proposed the possibility of creating the first numerical weather prediction models in 1922 [18]. The practical use of numerical models began in 1955 due to the development of programmable electronic computers [19]. Data for forecasting come from observations of atmospheric pressure, temperature, wind speed, wind direction, humidity, and precipitation. Trained observers make these observations close to the ground surface or automatic weather stations are used for this purpose [16].
At present, it is common and established to use numerical models for weather forecasting; these models currently have no real competition, but they are still not sufficiently effective at this stage of research on weather forecast models [20]. The basic condition for the usefulness of a weather forecast is its high verifiability. Users expect accurate local forecasts, broken down into hourly data and a very dense grid, i.e., for specific geographical points.
Weather forecasting is one of the essential tools for planning, risk management, and decision making in sectors of the economy and everyday life. The sectors exposed to weather risk are energy, agriculture, the food industry, construction, entertainment and tourism, transport, and defense, which together constitute the lion's share of the national economy [21,22]. Therefore, despite the colossal costs generated by weather forecasting, technological infrastructure for testing alternative methods and creating better forecasting models is still being developed. These expenses are justified by ensuring the safety of people and their property from natural disasters. However, precise weather forecasting is extremely difficult. Weather is a chaotic phenomenon, characterized by temporal and spatial irregularity, in which successive states vary and are difficult to predict [23].
New technologies can change this situation. The world of meteorology is exploring Artificial Neural Networks as non-linear and non-parametric tools for modeling real processes and phenomena, i.e., problems that are difficult to solve with classical methods. Such networks have some brain-like properties. They learn from examples and apply this knowledge to problem solving, i.e., they can generalize. They are useful for tasks that are not very precisely defined formally.
They can function properly at a certain level of damage and despite partially incorrect input. They also have a relatively high speed of operation (information processing). Their current level of development means that they are not yet competitive with numerical models. However, their speed is predicted to compete with standard supercomputers, whose current speed reaches 2.8 × 10^14 elementary operations per second [24][25][26].
Currently, models based on machine learning using Artificial Neural Networks and based on historical data are being tested. According to global reports, these models produce forecast results that far exceed the precision and efficiency of numerical models based on current weather data. These models improve the forecasting efficiency by 40-50% for air temperature and 129-169% for precipitation [27].

Research Gap
To find and implement an efficient solution with low computational complexity for predicting selected weather indicators while ensuring high verifiability of forecasts, it is necessary to maintain a balance between the level of complexity of the model architecture and the type and number of methods used to improve the accuracy. The verifiability of modern numerical weather prediction (NWP) models is good; however, it requires considerable computing resources. This is important for making probabilistic ensemble forecasts, in which the models repeatedly (several dozen times) generate possible courses of future meteorological conditions. Models using machine learning may prove less demanding in this respect [28].
The question, therefore, was whether Artificial Neural Networks can be applied to weather forecasting, a phenomenon that is extremely complicated and chaotic in nature. Achieving this goal was quite a challenge, as the model was planned to be applied in difficult forecasting conditions characterized by high variability, closely related to the specificity of the climate occurring in Poland. In addition, the city of Szczecin is an area with a specific climate that causes difficulties in effective weather forecasting, as described in the following part of the study.

Aim of the Study
The purpose of this study was to obtain the basis for the statement that a multilayer Artificial Neural Network (or, to be more precise, a Multilayer Perceptron), even one with an uncomplicated structure and implementation, can effectively predict basic weather conditions and whether the MLP model can be used as an adjunctive and corrective tool for forecasting weather conditions at the local scale. We intended to create a unidirectional multilayer Artificial Neural Network model trained on a regional dataset and used to forecast selected weather conditions for the city of Szczecin (Poland) and then to compare the effectiveness of model predictions with forecasts available in the archives of the internet weather service using statistical metrics selected based on a literature review.
We assumed that, with a reliable and sufficiently large weather dataset; a properly constructed, established, and proven neural network model; and a properly prepared computing environment, it would be possible to effectively forecast selected weather indicators. This study also offers a practical contribution: the building of an MLP model for a specific location characterized by high climatic complexity affecting the difficulty of weather forecasting, which is significant from a pragmatic point of view.
The rest of the report is organized as follows: Section 2 contains a description of the materials and methods used in the study. Section 3 introduces the results of the study. A discussion of the results obtained and the influencing factors is given in Section 4. In Section 5, we present the conclusions and future directions of our work.

Literature Review
Our literature review contains studies of machine learning methods and their applicability to weather data along with their relevant statistical properties. Numerical Weather Prediction (NWP) models play a key role in operational weather forecasting, especially for longer prediction times. Utilizing NWP with deep learning to improve the accuracy of weather forecasting systems is a fruitful avenue to consider. An article published in 2020 presents results from the Ensemble of Spatial-Temporal Attention Network and Multi-Layer Perceptron (E-STAN-MLP) for forecasting meteorological elements to predict the surface temperature, humidity, wind speed, and direction at 24 automatic meteorological stations in Beijing.
This research can be generalized to local weather prediction in other regions [29]. Multilayer Perceptrons (MLPs) are a common type of feed-forward network used for predictions of future rainfall and temperature [30]. The MLP has been used continuously for many years to solve problems such as the prediction of weather conditions and remains important for real-life problems [31]. In a study published in 2020, it was used as a predictive model for wind speed forecasting (wind power) in Villonaco, alongside Long Short-Term Memory (LSTM) and Convolutional Neural Network (CNN) models [32].
The continued popularity and usefulness of the MLP is also demonstrated by its use in 2020 as a prediction model for time series of meteorological tsunamis, alongside an Evolved Radial Basis Function (ERBF) network within Evolved Neural Networks (ENNs), for accurate meteorological forecasting applications in Vietnam [33]. In a paper published in 2021, F. Dupuy and others presented an application of the ANN technique in the form of an MLP model as a tool to correct and complement a numerical model for forecasting the wind speed and direction at the local scale, including the Cadarache valley, which is located in southeastern France.
As a measure of network accuracy, the Mean Absolute Error (MAE) was taken [34]. The satisfactory results obtained by the authors confirmed the usefulness of this tool in weather condition prediction. A Multilayer Perceptron has been successfully used as a predictive model for various natural conditions, such as soil temperature [35], and has been compared with the Support Vector Machine (SVM) for rainfall prediction [36,37]. This recent literature confirmed the use of a Multilayer Perceptron for predicting weather conditions and, thus, indicated the possibility of its further use.
To research weather condition predictions performed in Poland and worldwide as well as the methods used, we reviewed the literature. The subject of the work by M. Hayati and Z. Mohebi from 2007 was the application of an Artificial Neural Network (Multilayer Perceptron) in forecasting the mean temperature for the next day for Kermanshah in Iran [15]. The authors of the study trained and tested the MLP model using previous meteorological data. The chosen weather data were divided into two randomly selected groups: the training group, corresponding to 67% of the patterns, and the test group, corresponding to 33% of the patterns. MAE was used as a measure of the network accuracy. The training data were selected as the mean temperature, wind speed, humidity, pressure, and sunshine. The dataset was normalized by converting its values to a range between −1 and 1.
In a paper from 2009, Y. Radhika and M. Sashi predicted the maximum temperature for the next day [38]. The MLP prediction was compared with SVM. For the MLP, a backpropagation algorithm was used, with a training dataset from 5 years and test data from 1 year. In the case of SVM, the Radial Basis Function (RBF) kernel, a popular kernel function used in various kernelized learning algorithms, was used.
There are three layers in the MLP: input, hidden, and output, and the network uses an algorithm for the backpropagation of errors. A sigmoid function was selected as the activation function. Before the test, the data were normalized, and the measure of the accuracy of the models was the Mean Squared Error (MSE). In this study, SVM performed better than MLP. The comparison of air temperature forecasting results between an MLP neural network model and the SVM model is the subject of many studies [39]. This shows that models with such structures are useful in forecasting weather variables.
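The comparison described above can be sketched as follows. This is an illustrative setup on synthetic data, not the cited study's dataset: it assumes scikit-learn's `MLPRegressor` with a logistic (sigmoid) activation and an RBF-kernel `SVR`, both scored with the MSE after normalization, as the paper describes.

```python
# Hedged sketch (not the authors' code): MLP with sigmoid activation
# versus an RBF-kernel SVM on a synthetic regression task, both
# evaluated with the Mean Squared Error.
import numpy as np
from sklearn.neural_network import MLPRegressor
from sklearn.svm import SVR
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(300, 4))                  # stand-in predictors
y = np.sin(X[:, 0]) + 0.1 * rng.standard_normal(300)   # stand-in target

# Normalize the inputs before training, as described in the text.
X_train, X_test = X[:240], X[240:]
y_train, y_test = y[:240], y[240:]
scaler = StandardScaler().fit(X_train)
X_train, X_test = scaler.transform(X_train), scaler.transform(X_test)

mlp = MLPRegressor(hidden_layer_sizes=(16,), activation="logistic",
                   max_iter=2000, random_state=0).fit(X_train, y_train)
svm = SVR(kernel="rbf").fit(X_train, y_train)

mse_mlp = mean_squared_error(y_test, mlp.predict(X_test))
mse_svm = mean_squared_error(y_test, svm.predict(X_test))
```

Which model wins depends on the data and hyperparameters; the point is only the shared evaluation protocol.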
In the largest group of studies available in the literature, the most frequently predicted conditions were the mean, minimum, and maximum temperature, with each study concerning a specific place in the world. S. S. Baboo and K. Shereef, in their work from 2010, used an Artificial Neural Network with a backpropagation algorithm to predict the mean temperature [40], as did M. Hossain and others in a study conducted in 2015 [41].
Among the studies using Artificial Neural Networks for the prediction of the mean temperature, we paid attention to the work of an author from Poland, I. Białobrzewski from 2005 [42]. Another study on temperature prediction using an Artificial Neural Network model is presented in the article [43]. By far, the largest number of studies described mean temperature predictions using ANNs, and one of the most frequently used network learning algorithms was the backpropagation algorithm [44][45][46][47].
In the literature, there were additionally studies regarding forecasting the maximum temperature using ANNs [48,49] as well as the minimum temperature [50]. This problem is discussed in an article regarding the prediction of the maximum and minimum temperature with linear regression [51].
A study comparing the effectiveness of the Support Vector Machine Regressor (SVR) and k-Nearest Neighbors in predicting the wind power used in renewable electricity production is presented in the article [52]. Wind speed is also the subject of research presented in the Polish study [53]. This study predicted wind speed as one determinant of the energy consumption in buildings. The tests were carried out on neural network models based on Multilayer Perceptron architecture (MLP), Generalized Regression Networks (GRNN), and networks with Radial Base Functions (RBFs).
Another study, aimed at determining the annual mean wind speed by using a Multilayer Perceptron with a backpropagation algorithm for training the network [54], showed the popular use of this network model for wind speed forecasting. Many types of neural network models can be used in forecasting weather conditions. For this purpose, networks with relatively low complexity, such as the MLP, and networks with a more complex architecture, i.e., deep networks [55,56], Recurrent Neural Networks (RNNs), Conditional Restricted Boltzmann Machine models (CRBMs), and Convolutional Neural Networks (CNNs), are suitable [57]. The application of a gated recurrent unit neural network (GRUNN), a modification of RNNs, to forecast wind power, which is one of the largest renewable energy sources, is described in a paper published in 2019 by M. Ding and others [58].
LSTM network models for predicting the precipitation based on meteorological data from 2008 to 2018 in Jingdezhen City were deployed in the research described by J. Kang and others in 2020. LSTM is a special kind of RNN, capable of learning long-term dependence. It was introduced by Hochreiter and Schmidhuber [59]. RNNs are special ANNs that are connected in a feedback structure between units of an individual layer. They are called recurrent because they perform the same operation on all elements in the sequence.
They make it possible to model data with time-series characteristics by overcoming the limitations of non-recurrent ANNs, which assume that inputs are independent of one another [60]. In this article, a study involving temperature prediction published by B. Kwon and others in 2020 is presented. The difference between the LSTM and the RNN is that the LSTM adds a "processor", called the cell state, to the algorithm to judge whether information is useful or not. Information entering the LSTM is judged by rules: only the information that passes this check is retained, and the remaining information is discarded by the forget gate [61].
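The forget-gate mechanism described above can be illustrated with a single LSTM cell step in NumPy. This is a sketch only: the weights and dimensions are arbitrary placeholders, not a trained model, and bias terms are omitted for brevity.

```python
# Illustrative sketch: one forward step of a basic LSTM cell,
# showing the forget gate deciding how much of the previous
# cell state is kept. Weights are random placeholders.
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(1)
n_in, n_hid = 3, 4
x_t = rng.standard_normal(n_in)        # current input
h_prev = np.zeros(n_hid)               # previous hidden state
c_prev = rng.standard_normal(n_hid)    # previous cell state

# One weight matrix per gate, acting on the concatenation [h_prev, x_t].
z = np.concatenate([h_prev, x_t])
W_f, W_i, W_o, W_c = (rng.standard_normal((n_hid, n_in + n_hid))
                      for _ in range(4))

f_t = sigmoid(W_f @ z)                 # forget gate: what to discard
i_t = sigmoid(W_i @ z)                 # input gate: what to store
c_tilde = np.tanh(W_c @ z)             # candidate cell state
c_t = f_t * c_prev + i_t * c_tilde     # updated cell state
o_t = sigmoid(W_o @ z)                 # output gate
h_t = o_t * np.tanh(c_t)               # new hidden state
```

Because `f_t` lies in (0, 1), each component of the old cell state is attenuated rather than overwritten, which is what allows the LSTM to retain long-term dependencies.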
In addition to ANNs, many other approaches are frequently used in weather forecasting, such as multiple regression, SVM, decision trees, and the k-Nearest Neighbor model. The disadvantage of MLP and deep learning models over other methods is that they are time-consuming during the training process. Although SVM requires intensive training and experimentation on different kernel functions as well as other parameters, it is significantly faster compared to MLP and deep learning models [62].
The SVM algorithm was developed by Vapnik and is based on statistical learning theory [63]. SVMs include efficient algorithms for a wide range of regression problems because they not only take into account the approximation of the error to the data but also provide a generalization of the model, that is, its ability to improve the prediction when a new dataset is evaluated by it [39].
ANNs, including Multilayer Perceptron, deep ANNs, and other machine learning models, are constantly being improved and widely used for the forecasting of air temperature [64], rainfall [36,37,65], cloudiness [66], and wind speed [67], which proves that these are forward-looking models that are worthy of constant research and improvement for forecasting purposes. To summarize, the most frequently forecasted condition in the above works was temperature; several studies have similarly found the use of models for wind speed prediction, and the most frequently used machine learning model for this purpose was a multilayer Artificial Neural Network. Other models chosen by researchers include Support Vector Machines and Linear Regression.
The most commonly used learning algorithm for neural networks is the backpropagation algorithm [2]. Datasets are usually large and contain data from many previous years-optimally 10 years. Before starting the training and prediction, the data are normalized. The authors typically verify the accuracy of the model predictions using measures, such as MAE and MSE.
The research results presented in the analyzed papers motivated us to implement a Multilayer Perceptron as the prognostic model and to investigate its effectiveness using, among others, the MAE and the Mean Squared Error (MSE) as measures of the model effectiveness. As the network learning algorithm, the backpropagation algorithm was chosen. We decided that the dataset would contain data from 10 years. The maximum and minimum temperatures, atmospheric pressure, wind speed, and daily precipitation for the next day were planned as the forecast conditions.
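The two accuracy measures named above can be written out directly. The observation and forecast values below are purely illustrative.

```python
# Minimal sketch of the accuracy measures used in this study,
# assuming `y_true` are observed values and `y_pred` model forecasts.
import numpy as np

y_true = np.array([2.0, 5.0, 7.0, 10.0])   # example observations
y_pred = np.array([2.5, 4.0, 7.0, 12.0])   # example forecasts

mae = np.mean(np.abs(y_true - y_pred))     # Mean Absolute Error
mse = np.mean((y_true - y_pred) ** 2)      # Mean Squared Error
```

Note that the MSE penalizes large individual errors more heavily than the MAE, which matters for occasional extreme forecast misses.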

The Complexity of Weather Forecasting in Poland
In Poland, the Institute of Meteorology and Water Management is statutorily responsible for preparing weather forecasts, using numerical models. Their verifiability is, on average, 90-95% for short-term forecasts and 70-75% for medium-term forecasts. On the other hand, due to the different methods of preparing and presenting long-term forecasts, it is difficult to discuss their verifiability, as these forecasts rather show the trend of thermal changes or the probability of precipitation. The analysis of long-term weather forecasts in terms of their verifiability is not currently being carried out (https://forum.mazury.info.pl/viewtopic.php?t=15409 accessed on 23 November 2020).
The nature of the weather in Poland was described by the atmosphere physicist Prof. Teodor Kopcewicz: "Poland lies in a zone of moderate climate, with unmoderated weather changes". Poland is one of the more difficult places in the world to forecast the weather. This is because of Poland's climate, which is characterized by high weather variability and significant fluctuations in the course of the seasons in subsequent years. The physical and geographical location of the country means that various masses of air crash over its area, which influences the weather and the climate of Poland.
Frequently moving atmospheric fronts and the clashing and exchange of various air masses (warm and cold) cause the weather to change frequently and create great problems for weather forecasting [68]. We can find confirmation of the difficulties in accurate weather forecasting in Poland in a recent paper that draws attention to the seasonality of the weather in Poland throughout the year [69].
The atmospheric circulation in this part of Europe is characterized by relatively high annual variability, which causes significant temperature and precipitation fluctuations during the year [70]. A paper presenting a study on the occurrence of tornadoes in Poland confirmed the wide spectrum of weather phenomena observed in Poland, which makes it difficult to accurately predict them [71].

The Specific Climate of Szczecin and Its Influence on Weather
The weather of Szczecin is reflected in the specific climate of Szczecin's Climatic Land (VI and X), which is influenced by its location near the sea, many lakes, a large river basin, landforms, large forest areas, parks and meadows, street greenery, and relief (valleys and hills as illustrated on the Map in Figure 1). The climate of Szczecin and its surroundings is shaped primarily by the advection of polar and sea air masses. The proximity of large water reservoirs, i.e., the Baltic Sea and the Szczecin Lagoon, results in the formation of local breeze circulation affecting the course of the weather.
The Baltic Sea and the Szczecin Lagoon have a warming effect in the winter and cause cooling in the summer. Important climatic factors include the latitude, terrain, and elevation above sea level. From the southwest to the northeast, through the center of the province, extends the frontal moraine ridge, which clearly differentiates the spatial distribution of sunshine, temperature, precipitation, and wind speed on its northwestern and southeastern sides. The main baric systems move in from westerly directions. Lows shifting with atmospheric fronts cause weather changes and strong, stormy winds. Spring and summer lows, although numerous, are less active, with weaker storm and gusty winds.
Skagerrak lows form because of a wave disturbance on a front approaching Norway; they typically move in a southeastern direction. Deepening rapidly, they cause stormy weather. These occur most often in the winter and spring. Atmospheric circulation is formed under the influence of the Icelandic Low (especially in winter) and the Azores High (mainly in summer). The climatic conditions in winter are significantly influenced by the strong seasonal Siberian High. Due to the country's latitudinal extent, differences in day length between the Baltic coast and the southern ends of Poland are clearly visible. In summer, the day is more than an hour longer than in the uplands of southern Poland; in winter, it is shorter by a similar amount. A shift of about 40 min likewise occurs between the eastern and western parts of the country [72,73]. Due to the described specificity of the climate in the area of the city of Szczecin, which causes difficulties in exact weather forecasting, the authors decided that it was worth checking how an MLP model, whose use for forecasting selected weather conditions was presented in many research articles cited in the literature review, would perform in the task of predicting selected weather conditions for this area.
The location where the study was performed and the number of different predicted weather conditions included in the study are novel compared to the previous studies described in the literature that used machine learning (ML) methods. It is also a new and interesting approach of the authors to study the performance of the implemented MLP model for multiple different weather conditions, whereas the reviewed publications on this issue usually presented studies on a single weather parameter.

Materials and Methods
The Multilayer Perceptron (MLP) is a popular and commonly used model among neural networks. The literature study presented in the previous section enabled us to identify many papers from recent years that confirmed the continued relevance, usefulness, and popularity of this model in forecasting various weather conditions for different locations around the world. This type of neural network is referred to as a supervised network [15].
Supervised learning comprises training the model using a training dataset, i.e., a set of samples containing known expected output signals. One type of supervised learning used in forecasting results with continuous values is regressive analysis. The model described contains explanatory (training) and explained (predicted) variables. Both types of variables are continuous values. The purpose of this type of network is to create a model that correctly maps the input data to the output data using historical data so that the model can then predict the output data when the desired output data is not known [15]. It enables detecting relationships between variables and predicting future results [74,75].
A multilayer neural network can approximate any function with continuous values between input and output vectors of data by picking the appropriate set of weights. The described properties of neural networks allow solving the problems of forecasting phenomena occurring in the natural world based on collected historical data, such as weather phenomena. The fundamentals of Artificial Neural Networks and the Multilayer Perceptrons with descriptions of their structures and explanations of the principles of operations, including mathematical formulae, are contained in Appendix A.
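As a minimal sketch of this approximation property (an assumed setup, not the paper's model), a small scikit-learn MLP can learn a continuous one-dimensional function from sampled data:

```python
# Sketch: a small MLP approximating a continuous function
# (here sin(x)) from samples, illustrating that a multilayer
# network with suitable weights can map inputs to outputs.
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(42)
X = rng.uniform(0.0, 2.0 * np.pi, size=(500, 1))
y = np.sin(X).ravel()                       # continuous target values

model = MLPRegressor(hidden_layer_sizes=(32, 32), max_iter=5000,
                     random_state=0).fit(X, y)
r2 = model.score(X, y)                      # coefficient of determination
```

The fit quality (here measured with the R² score, the same determination coefficient used later in the paper) improves as the network picks weights that better approximate the target function.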

Overview of the Available Datasets and Selection of a Training Dataset
The dataset from the meteorological station with code 205, located at Szczecin, was taken from the archives of the Institute of Meteorology and Water Management (IMGW) (https://dane.imgw.pl/data/dane_pomiarowo_obserwacyjne/dane_meteorologiczne/dobowe/synop/ accessed on 27 April 2021). This source of meteorological data was considered reliable due to the status of IMGW as a state research unit. The real-time forecasting of meteorological phenomena is one of the basic tasks of modern meteorological services. In Poland, official weather forecasts are prepared and developed by the Institute of Meteorology and Water Management-National Research Institute (IMGW-PIB), a state unit supervised by the minister in charge of water management. The mission of IMGW-PIB is to inform society and organizations about the weather, meteorological and hydrological conditions, climate change, and all factors influencing the current weather in Poland.
The major task of the IMGW-PIB is to provide meteorological cover for Poland. For this purpose, the IMGW-PIB Monitor was created-a service for all national operational services and administrative bodies. As part of its statutory activities, the IMGW-PIB prepares and delivers meteorological forecasts, warnings against dangerous phenomena occurring in the atmosphere, and dedicated announcements and bulletins. IMGW is responsible for collecting, storing, processing, and making available domestic and foreign measurements and observation materials.
The Institute develops and distributes weather forecasts and warnings. IMGW is a member of many international organizations; it represents Poland in the WMO (World Meteorological Organization) and EUMETSAT (European Organisation for the Exploitation of Meteorological Satellites). IMGW stores, and makes available in its database, measurement and observation data collected since 1960 for 2112 measuring stations in Poland (https://danepubliczne.imgw.pl/data/dane_pomiarowo_obserwacyjne/dane_meteorologiczne/ accessed on 27 April 2021).
A review of the literature on the use of multilayer neural networks in the prediction of selected weather conditions demonstrated that, for many studies, ten years of previous meteorological data were used [74,75]. We, therefore, decided to use meteorological data from 10 years (2011-2020). In this study, daily data from 7 years (2011-2017) were used as a training dataset with 10 training features for the 3 days before the day for which the forecast was performed.
The test dataset included data from 3 years (2018-2020). One sample contained selected weather conditions for each day. Due to the many samples in the dataset, we chose to apply a simple validation (train/test split) to check the effectiveness of the model. The dimensions of the datasets used for model training and testing are presented in Table 1. The ten features selected to create a training dataset, together with their respective units, are listed in Table 2. These variables from 1, 2, and 3 days before the prediction date were the inputs for the MLP model. A time horizon of 3-5 days before the forecast day was used, among others, by Rasp and others in a study published in 2020 [76]. Table 3 contains the conditions planned for forecasting using the MLP model in this study; they are the outputs of the MLP model.
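The dataset layout described above can be sketched as follows. The feature names and random values are illustrative placeholders, but the lag structure (each of 10 features from 1, 2, and 3 days before the forecast day) and the 2011-2017/2018-2020 split match the text.

```python
# Hedged sketch of the lagged-feature layout and the by-year
# train/test split described in the study. Column names are
# hypothetical; real features would come from the IMGW data.
import numpy as np
import pandas as pd

dates = pd.date_range("2011-01-01", "2020-12-31", freq="D")
rng = np.random.default_rng(0)
df = pd.DataFrame(
    {f"feat_{i}": rng.standard_normal(len(dates)) for i in range(10)},
    index=dates,
)

# Inputs: each feature from 1, 2, and 3 days before the forecast day.
lagged = pd.concat(
    {f"lag{k}": df.shift(k) for k in (1, 2, 3)}, axis=1
).dropna()

train = lagged.loc["2011":"2017"]   # 7 years of training samples
test = lagged.loc["2018":"2020"]    # 3 years of test samples
```

Splitting by contiguous years, rather than randomly, keeps the test period strictly after the training period, which is appropriate for a forecasting task.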
We selected five weather conditions that are interesting to people watching the weather forecast and are most often provided in weather forecasts, i.e., the maximum and minimum temperature, pressure, wind speed, and precipitation. We further chose these because these conditions were given in the weather archive service that was our point of reference. The service presented here draws from the OpenWeather service; therefore, it was useful to treat it as a local reference point for the city of Szczecin.

Data Preprocessing
The input data were normalized, and the five selected daily weather conditions were forecasted for the full 2018, 2019, and 2020 years. The study includes daily short-term forecasts for the next day. MLP was used as a prediction model. To evaluate the prediction accuracy, the results were compared with the values of measurements made available by the Institute of Meteorology and Water Management (IMGW). One stage of data preparation, preceding the use in machine learning, was the preprocessing.
In the data concerning the daily precipitation sum, the absence of precipitation was recorded in the files downloaded from the IMGW as a missing value, which is interpreted in Python as a NaN (Not a Number) value. It was, therefore, necessary to handle these values to obtain correct results from the implemented models.
Missing values in the precipitation data were filled with 0. No missing data were found for the other parameters in the dataset. However, if other weather data were missing, a suitable solution for filling them in would be the k-Nearest Neighbors algorithm, given that successive weather values form a time series. In the next step, the dataset was divided into the training and test datasets.
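A minimal sketch of this precipitation-gap handling with pandas, assuming a hypothetical `precip_sum` column:

```python
# Sketch: NaN in the precipitation column denotes "no precipitation"
# in the downloaded files, so it is replaced with 0.
import numpy as np
import pandas as pd

df = pd.DataFrame({"precip_sum": [0.2, np.nan, 1.4, np.nan]})
df["precip_sum"] = df["precip_sum"].fillna(0.0)
print(df["precip_sum"].tolist())  # [0.2, 0.0, 1.4, 0.0]
```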

Normalization of the Input Data
Neural networks generally perform better when working on normalized data; using raw data as the input to a neural network may cause convergence problems for the algorithm. The absence of a normal distribution is a common occurrence for environmental data [77]. Python provides several classes in which data-scaling procedures are implemented. One of these implements parametric, monotonic transformations whose aim is to map data from an arbitrary distribution to one as close as possible to a Gaussian distribution, in order to stabilize the variance and minimize skewness. This method belongs to a family of power transformations that is well defined on the whole real line and is appropriate for reducing skewness to approximate normality. It has properties similar to those of the Box-Cox transformation for positive variables.
The large-sample properties of the transformation were investigated in a single random sample [78]. We estimated the transformation parameter by minimizing the weighted squared distance between the empirical characteristic function of transformed data and the characteristic function of the normal distribution [79].
This method provides two types of transformation: the Yeo-Johnson transformation and the Box-Cox transformation. The Box-Cox transformation can only be used for strictly positive data; therefore, the first variant should be used to predict temperatures, which may be negative or zero. The Yeo-Johnson transformation is defined by Formula (1):

$$
x^{(\lambda)} =
\begin{cases}
\left[(x+1)^{\lambda}-1\right]/\lambda, & \lambda \neq 0,\ x \geq 0\\
\ln(x+1), & \lambda = 0,\ x \geq 0\\
-\left[(-x+1)^{2-\lambda}-1\right]/(2-\lambda), & \lambda \neq 2,\ x < 0\\
-\ln(-x+1), & \lambda = 2,\ x < 0
\end{cases}
\tag{1}
$$

while the Box-Cox transformation is described by Formula (2):

$$
x^{(\lambda)} =
\begin{cases}
\left(x^{\lambda}-1\right)/\lambda, & \lambda \neq 0\\
\ln x, & \lambda = 0
\end{cases}
\tag{2}
$$

Standardization of a dataset is a common requirement for many machine learning estimators. Typically, this is done by removing the mean and scaling to unit variance. However, outliers can negatively influence the sample mean and variance; in such cases, the median and the interquartile range often provide better results. The interquartile range (IQR) is robust to the outliers that can occur in weather data. In this method, the median is removed and the data are then scaled according to the range between the first quartile (25th percentile) and the third quartile (75th percentile) [80]. Due to the nature of the dataset studied, we decided that it was necessary to compare the performance of the prediction model for the two methods described.
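A brief sketch of the two compared normalization methods using scikit-learn's `PowerTransformer` (Yeo-Johnson) and `RobustScaler` (median/IQR); the temperature values are illustrative:

```python
# Sketch: Yeo-Johnson handles zero/negative temperatures, while
# RobustScaler removes the median and scales by the IQR (outlier-robust).
import numpy as np
from sklearn.preprocessing import PowerTransformer, RobustScaler

temps = np.array([[-5.1], [0.0], [3.2], [12.7], [24.9]])  # may be <= 0

pt = PowerTransformer(method="yeo-johnson", standardize=True)
x_yj = pt.fit_transform(temps)          # method="box-cox" would fail here

rs = RobustScaler(quantile_range=(25.0, 75.0))
x_iqr = rs.fit_transform(temps)         # median removed, scaled by IQR

# Both transforms are invertible, which allows denormalizing predictions
back = pt.inverse_transform(x_yj)
assert np.allclose(back, temps, atol=1e-6)
```

The invertibility shown in the last step is what makes it possible to report model outputs back in the original physical units after prediction.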

Design and Implementation of the Multilayer Perceptron Model
All stages of the study (data preparation, implementation of the Multilayer Perceptron model, and model testing) were performed in Python programming language.

The MLP Parameters and Their Values
The structure of the neural network (the number of layers, the number of neurons in particular layers, the type of activation function, and the network topology) as a tool for modeling was determined in relation to the problem to be solved. As a starting point, which is as good as any other solution, a network with one hidden layer can be adopted. The best results are obtained by selecting the number of layers and neurons empirically; this choice is somewhat arbitrary and depends on the model creator. A grid search including cross-validation was used to determine the values of the ML model hyperparameters [81]. The range of parameter values for the cross-validation procedure was selected based on [75].
• l2: the L2 weight-regularization parameter, used to reduce model overtraining by making the model simpler and less susceptible to over-fitting; the regularization consists of applying penalties to the network parameters.
• Number of epochs: the number of iterations of the algorithm over the training dataset.
• η: the network learning rate used for updating the weights.
• α: the momentum parameter, which defines the fraction of the previous gradient added to the weight update in order to speed up network learning.
• decrease_const: a reduction constant d, part of an adaptive learning rate that decreases in subsequent epochs for better convergence.
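The grid search with cross-validation described above can be sketched as follows. Here scikit-learn's `MLPRegressor` stands in for the custom MLP used in the study, and the parameter grids and data are illustrative, not the values actually searched:

```python
# Hedged sketch of hyperparameter tuning via grid search with k-fold
# cross-validation; grids and synthetic data are illustrative only.
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))            # 10 lagged weather features
y = X[:, 0] * 2.0 + rng.normal(size=200)  # synthetic target

grid = {
    "alpha": [1e-4, 1e-3, 1e-2],         # L2 weight-decay parameter (l2)
    "learning_rate_init": [1e-3, 1e-2],  # learning rate (eta)
    "momentum": [0.8, 0.9],              # momentum term (alpha)
    "max_iter": [300],                   # number of epochs
}
search = GridSearchCV(
    MLPRegressor(hidden_layer_sizes=(20,), solver="sgd", random_state=0),
    grid, cv=5, scoring="neg_mean_squared_error")
search.fit(X, y)
print(search.best_params_)
```

`cv=5` performs 5-fold cross-validation for every parameter combination and keeps the combination with the lowest mean validation MSE.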

Methods Implemented in MLP Model
The most important implemented procedures, run sequentially when training and testing the MLP network, are presented in Figure 2.

Regularization-Preventing over-Adjustment of the Model
Excessive fitting, or over-training, of the model is one of the most common problems in machine learning. It occurs when the model works well on the training data but does not generalize the learned rule sufficiently to unknown test data. One technique used to prevent model over-training is to adjust the complexity of the model by regularization, which introduces additional information that penalizes large weight values. The most common type of regularization is L2 regularization, also called weight decay. This regularization is defined by Formula (3):

$$
L2 = \lambda \lVert \boldsymbol{w} \rVert_2^2 = \lambda \sum_j w_j^2
\tag{3}
$$

where λ is the adjustment parameter. Regularization is one reason why feature scaling (e.g., normalization) is important: to apply the penalty correctly, all features must be on a uniform scale [74,75]. After the prediction is made, the data are denormalized to present the results of the model with the appropriate values and units for the predicted condition.
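A minimal numpy sketch of the L2 penalty and its gradient contribution, under the convention penalty = λ·Σw² used above:

```python
# Sketch: the L2 (weight-decay) term lambda * sum(w^2) is added to the cost,
# and its derivative 2 * lambda * w is added to the weight gradient.
import numpy as np

def l2_penalty(weights, lam):
    # Penalty term added to the network cost function
    return lam * np.sum(weights ** 2)

def l2_gradient(weights, lam):
    # Corresponding term added to the gradient during backpropagation
    return 2.0 * lam * weights

w = np.array([0.5, -1.5, 2.0])
print(round(l2_penalty(w, 0.1), 3))  # 0.1 * (0.25 + 2.25 + 4.0) = 0.65
```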

Architecture of the Multilayer Perceptron Model
A network architecture was designed with inputs in the input layer, whose role was performed by the most suitable variables selected from 10 weather indicators (features) for the training dataset from 1, 2, and 3 days prior to the day for which a given weather condition is predicted. The decision to use the values of the training features from the 3 days preceding the forecast date was taken based on a review of studies on the forecast of the selected weather parameters available in the literature and online sources [82].
We considered this variant of the training dataset containing the values of each of the training parameters from 1, 2, and 3 days before the day on which the prediction will be made to be sufficient and reasonable. The selected weather parameters for the training dataset are presented in Table 2.
The number of neurons of the hidden layer was established experimentally after implementing the neural network model, by applying the cross-validation method, individually for each predicted parameter. In the case of a neural network, a smaller number of hidden units was beneficial if it did not adversely affect the accuracy of the results generated by the network, as this reduces the learning time, making the network more efficient. The output layer contains a vector with expected values for the input data.

Sensitivity Analysis
The procedure of sensitivity analysis was used to investigate the effect of each parameter on the outputs. Sensitivity analysis using the change of MSE ranked the input variables in a given dataset according to the change of MSE when each input was deleted from the dataset in the training phase. Therefore, the variables that made the largest change in the MSE were considered as the most important [83].
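The sensitivity-analysis procedure can be sketched as follows; `MLPRegressor` stands in for the study's custom MLP, and the synthetic data are illustrative:

```python
# Hedged sketch of MSE-based sensitivity analysis: delete each input variable
# in turn, retrain, and rank variables by the resulting change in MSE.
import numpy as np
from sklearn.metrics import mean_squared_error
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(1)
X = rng.normal(size=(300, 5))
y = 3.0 * X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.1, size=300)

def fit_mse(X, y):
    model = MLPRegressor(hidden_layer_sizes=(20,), max_iter=1000, random_state=0)
    return mean_squared_error(y, model.fit(X, y).predict(X))

baseline = fit_mse(X, y)
ranking = sorted(((fit_mse(np.delete(X, j, axis=1), y) - baseline, j)
                  for j in range(X.shape[1])), reverse=True)
# The variable whose removal increases the MSE most is the most important
print("most important input:", ranking[0][1])
```

In this synthetic example, removing column 0 (the dominant driver of `y`) produces the largest MSE increase, mirroring how the procedure identifies important inputs.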

Methods to Evaluate the Effectiveness of Regressive ML Models and to Measure the Correlations between Variables
To objectively assess the performance of the implemented predictive model, we used three measures applied in the evaluation of the prediction accuracy of regression models: the MAE, the MSE, and the R² score. To answer the question of why some weather variables were predicted better than others, we examined the relationships between the variables included in the model, since ML models are data-driven and their prediction accuracy naturally depends on the strength of these correlations [18].
We used Pearson's correlation coefficient to examine the correlation between weather conditions. The foundations of the methods used to assess the effectiveness of the regression models and Pearson's correlation coefficient are presented in Appendix B. Table 4 presents a comparison of the weather condition prediction accuracy metrics from the MLP model for the two normalization methods, the Yeo-Johnson transformation and IQR scaling. Our analysis of the results shows that the accuracy of the MLP prediction differed between the methods depending on the forecasted weather condition: for temperature, atmospheric pressure, and wind speed, accuracy was higher with the Yeo-Johnson transformation, while for daily precipitation better results were obtained with the IQR-based normalization.
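As a minimal sketch, the three accuracy measures (MAE, MSE, and R² score) can be computed with scikit-learn; the observed and predicted values below are illustrative:

```python
# Sketch: the three accuracy measures used to evaluate the regression models.
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

observed  = [2.0, 4.5, 3.0, 8.0]   # e.g., measured IMGW values
predicted = [2.5, 4.0, 3.5, 7.0]   # e.g., MLP forecasts

print(mean_absolute_error(observed, predicted))  # 0.625
print(mean_squared_error(observed, predicted))   # 0.4375
print(round(r2_score(observed, predicted), 4))   # 0.9154
```

Smaller MAE and MSE values mean a more accurate model, while an R² score close to 1 indicates high model efficiency.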

Results of Data Preprocessing
Figures A1-A4 in Appendix C display histograms of the input data before and after data preprocessing. The shape of the histograms reflects the nature of the data resulting from the specific climate of the city of Szczecin. Presenting the histograms before and after data normalization next to each other allows for a clear visualization of the effect of normalization on the histogram shape. The histogram of the maximum daily temperature in Figure A1a demonstrated an interesting characteristic: the data were multimodal, which may be due to two different sets of environmental circumstances.
The maximum temperature swings in the area of the city of Szczecin can be quite extreme, particularly between seasons. The histogram for the minimum daily temperature shown in Figure A1c after normalization obtained a shape more similar to a histogram with a normal distribution as shown in Figure A1d. The histogram representing the mean daily temperature seen in Figure A2a indicates that this data was somewhat multimodal. Normalization eliminated this effect as can be seen in Figure A2b.
In the case of the minimum daily temperature at ground level, a slight shift toward the high values observed in the histogram in Figure A2c was canceled out after applying normalization, as shown in Figure A2d. For the daily sum of the precipitation, the outliers visible in the histogram in Figure A2e are easy to explain because dry days (days without precipitation) were more frequent. Outliers are also visible for the data in the histogram for the daily dew duration in Figure A3a.
Szczecin's climate is characterized by high cloudiness, humidity, and fog, which is reflected in the histograms presenting the input data representing the weather parameters [72]. The histogram of the mean daily general cloud cover in Figure A3c shows that there were more samples with high values as a result of the large number of days per year with high cloud cover due to the specific climate of the city of Szczecin. In the case of the histogram for the data representing the mean daily humidity as seen in Figure A4a, normalization reduced the shift toward high values as can be seen in Figure A4b.
The histogram for the data representing the mean daily wind speed as shown in Figure A3e shows that more samples had low values due to fewer days with strong winds. After normalization, this effect was significantly reduced, as shown in Figure A3f. The histogram representing the data for the mean daily atmospheric pressure in Figure A4c shows that the data exhibited a normal distribution, as the histogram has a characteristic bell shape with the mean value located in its central part. Table 5 presents the MLP hyperparameters as determined using a grid search including the k-fold cross-validation procedure, the results of which are shown in Table A7 in Appendix E.

Training Variables Selection
The selection of the most suitable subset of training variables was performed by predicting different targets by using the same variables coming from the 3 previous days for the forecasted day [84]. The results of this procedure as performed for prediction of the maximum daily temperature are contained in Table 6. The values of the MSE are ranked in ascending order. The largest error value was obtained for the MLP training and testing with the daily sum of the precipitation, mean daily wind speed, and mean daily general cloud cover. These variables were, therefore, excluded from the predictive model for the maximum daily temperature.
The results of the variable selection for other predicted weather conditions are included in Tables A8-A11 in Appendix E, and the excluded variables are presented in Table 7. Variables that were not excluded became inputs of the models.

Predicted Condition: Excluded Variables
• Maximum daily temperature: daily sum of precipitation, mean daily wind speed, mean daily general cloud cover
• Minimum daily temperature: daily sum of precipitation, mean daily wind speed, mean daily general cloud cover
• Mean daily atmospheric pressure: mean daily relative humidity, maximum daily temperature, daily dew duration
• Mean daily wind speed: mean daily relative humidity, maximum daily temperature, mean daily general cloud cover
• Daily sum of precipitation: mean daily relative humidity, mean daily general cloud cover, mean daily wind speed

Results of Sensitivity Analysis
The results of the sensitivity analysis for the maximum temperature prediction with the values of the MSE ranked in descending order for the training and testing datasets are presented in Table 8. The results for other predicted weather conditions are included in Tables A12-A15 in Appendix E. In the case of the prediction of the maximum temperature, training and testing MLP model without the mean daily temperature, daily dew duration, and mean daily atmospheric pressure resulted in a remarkable increase in the error values.
In the event of the prediction of the minimum temperature, training and testing the MLP without the mean daily temperature, minimum daily temperature, and mean daily relative humidity caused a rise in the error values. When the mean atmospheric pressure was predicted, training and testing the MLP without the mean daily atmospheric pressure, mean daily temperature, and mean daily general cloud coverage resulted in growth of the error values.
For the wind speed prediction, training and testing the MLP without the mean daily wind speed, the daily sum of precipitation, and the mean daily temperature resulted in a remarkable increase in the error values. In the case of the prediction of the daily sum of the precipitation, training the MLP without the daily sum of precipitation, mean daily atmospheric pressure, and daily dew duration caused a rise in the error values. The results of the sensitivity analysis conducted for the model containing all input variables before selecting the most suitable set are included in Appendix D.

Comparison of the Results Obtained by MLP, Two Other ML Models, and the Weather Service
To check the effectiveness of the forecast obtained with the use of our MLP model compared to the accuracy of the weather service forecast, the obtained results of the forecast of the selected weather conditions by the MLP model for the next day were compared with the results of the weather service forecast for 2018 for the city of Szczecin.
OpenWeather is an open-source weather data service that requires no license to download.
OpenWeather's API provides a user-friendly way to download data for a selected city without having to enter geographic coordinates, provides current and archived data going back 40 years, and supports Polish diacritical marks, which is important for the location for which our study was conducted. The API of this service is also recommended by the Polish government service (https://www.gov.pl/web/popcwsparcie/standard-api-dla-udostepniania-danych accessed on 1 March 2021). Table A16 in Appendix E presents sample values of the parameters forecasted from 1 January 2018 to 31 December 2020. The predicted parameters are listed below:
• Maximum daily temperature in °C
• Minimum daily temperature
• Mean daily atmospheric pressure
• Mean daily wind speed
• Daily sum of precipitation
We used the MAE, MSE, and R² score as measures of the prediction accuracy of the MLP model and the weather service, calculated for the test data, as these are frequently used and proven measures found in the literature on the prediction of weather parameters [32,35-37]. The smaller the values of the MAE and MSE, the more accurate the model was in predicting the tested parameters. High values of the R² score, close to 1, were evidence of the high efficiency of the tested model.
The tables with the values of the weather conditions measured according to the IMGW, predicted by the MLP, and predicted by the service include sample data from 1 January 2018 to 5 January 2018. This demonstrates the values of the parameters that we are investigating; however, the comparison was made for data from 1 January 2018 to 31 December 2020, and thus our study covers the whole of 2018, 2019, and 2020. To compare the effectiveness of the MLP model and the weather service, quantitative model effectiveness measures, namely the MAE, the MSE, and the determination factor R² score, were used.
Charts of the parameters measured and projected by the MLP model and weather service were compared in terms of the visual differences in the graphs. The tables also contain the prediction results of the studied weather conditions obtained using two other ML predictive models. One of the benchmark models was the LSTM neural network with a complexity of architecture and parameters comparable to the MLP model used in the study. The second model used for comparison was SVR.
These models were chosen because they are often used in weather prediction studies, as mentioned in our literature review. The ANN models were trained and tested 10 times; therefore, the tables provide the mean and standard deviation of the results. The results of evaluating the models on the training and testing datasets are presented in Tables 9-13. Based on the small differences between the accuracies for the training and test data, there was no overtraining of the models or over-fitting to the training data. In the next step, the prediction results of the tested models and the weather service were compared for the test data. Table A17 in Appendix E includes sample maximum temperature values as measured, predicted by the MLP, and predicted by the weather service. Table 9 presents the quantitative values of the effectiveness measures of the MLP and two other predictive models, as well as the weather service, in forecasting the maximum temperature. Figure 3 illustrates a comparison of the effectiveness of the MLP model and the weather service in forecasting the maximum temperature.
The MAE was equal to 2.0201 for the MLP, which is lower by 0.3252 than the MAE calculated for the weather service (2.3453). This shows that the MLP was more accurate than the weather service in terms of this measure of prediction accuracy. The MSE calculated for the MLP was lower than the MSE calculated for the weather service by 2.673, which shows the higher accuracy of the MLP over the weather service in terms of the MSE values.
The value of the R² score for the MLP was higher by 0.0175 compared with the weather service; therefore, in terms of the R² score, the accuracy of the MLP in forecasting the maximum temperature was higher. Overall, the accuracy of the MLP model for maximum temperature forecasting was higher than that of the weather service. Comparison with the LSTM and SVR models based on the accuracy metrics used showed a slightly higher accuracy for the MLP model.
All of the tested models achieved greater accuracy in predicting the maximum temperature than the weather service. A slightly higher effectiveness of the MLP model compared with the weather service is noticeable on the graph. The differences between the measured and predicted values in the graph are small, both for the MLP model and for the weather service. This shows the high accuracy of the MLP model for maximum temperature forecasting for the next day. Table A18 in Appendix E contains examples of the minimum temperature observed values, the values predicted by the MLP, and those predicted by the weather service. Table 10 includes the quantitative values of the effectiveness measures of the MLP, two other predictive models, and the weather service in forecasting the minimum temperature. Figure 4 displays a comparison of the effectiveness of the MLP and the weather service in forecasting the minimum temperature.
Here, the effectiveness of the MLP was higher than that of the weather service. The MAE was 1.0496 lower for the MLP, and the MSE was 9.3824 lower for the MLP than for the weather service. The determination factor R² score used to interpret the model effectiveness was 0.1133 higher for the MLP compared with the weather service. These results show the much higher effectiveness of the MLP model over the weather service in forecasting the minimum temperature. For the compared models, the prediction accuracy of the daily minimum temperature was slightly higher for the MLP, while the LSTM model performed better than the SVR. The higher effectiveness in forecasting the minimum temperature for the MLP model compared with the weather service is evidenced by the fact that the chart of values predicted by the MLP is closer to the chart of observed values, especially for the time range from April to October. In the spring and summer months, the weather service forecasts show greater divergence from the observed values than they do in the autumn and winter months, i.e., from January to March and from November to December. The MLP model showed high accuracy throughout the year. Table A19 in Appendix E contains example values of the mean daily atmospheric pressure as measured, predicted by the MLP, and forecasted by the weather service. Figure 5 presents a comparison of the effectiveness of the MLP model and the weather service in forecasting the mean daily atmospheric pressure. Table 11 includes the quantitative values of the effectiveness measures of the MLP, two other predictive models, and the weather service in forecasting the mean daily atmospheric pressure.

Forecast of Mean Daily Atmospheric Pressure
The effectiveness of the MLP model was higher than that of the weather service when forecasting this parameter. The MAE value was 0.373 lower for the MLP model, and the MSE value was 8.3269 lower for the MLP model than for the weather service. The value of the determination factor R² score was 0.0478 higher for the MLP model than for the weather service, which shows the higher effectiveness of the MLP in predicting the atmospheric pressure.
However, the calculated value of the R² score is not as high as the value calculated for the MLP when forecasting the maximum and minimum temperatures. This observation indicates that the accuracy of the atmospheric pressure forecast by the MLP was lower than that of the temperature forecasts. For the prediction of the daily atmospheric pressure values, the MLP model achieved slightly higher accuracy than the LSTM and SVR. A higher effectiveness in forecasting the atmospheric pressure was observed for the MLP model compared with the weather service across the months of the surveyed year. Table A20 in Appendix E contains example values of the mean daily wind speed as measured, predicted by the MLP, and predicted by the weather service. Figure 6 presents a comparison of the effectiveness of the MLP model and the weather service in forecasting the mean daily wind speed. The effectiveness in wind speed forecasting was higher for the MLP model than for the weather service as, in the graph, the values predicted by the MLP show much smaller deviations from the observed values than the values predicted by the weather service. Table 12 presents the quantitative values of the effectiveness measures of the MLP, two other predictive models, and the weather service in forecasting the mean daily wind speed. The MAE value was lower for the MLP by 0.3731, and the MSE value was lower for the MLP by 1.2726 compared with the weather service; therefore, the MLP model was more accurate than the weather service.

Mean Daily Wind Speed Prediction
The value of the determination factor R² score also showed a higher efficiency of the MLP model compared with the weather service for forecasting the mean daily wind speed, as this value was higher by 0.3757 for the MLP. However, it is lower than the R² score calculated for the MLP model for temperature and atmospheric pressure prediction, which indicates a lower effectiveness of the MLP for wind speed prediction. For the prediction of the daily wind speed, the two other prediction models obtained higher accuracy: the best results were obtained using the LSTM model, and the SVR model also showed a small advantage over the MLP model. Table A21 in Appendix E includes examples of the daily precipitation values as measured, predicted by the MLP, and forecast by the weather service. Figure 7 presents a comparison of the effectiveness of the MLP model and the weather service in forecasting the daily precipitation sum. The effectiveness of the MLP model and the weather service in forecasting the daily sum of the precipitation was lower than for the other weather parameters. Table 13 presents quantitative measures of the effectiveness of the MLP, two other predictive models, and the weather service in forecasting the daily precipitation sum. For the prediction of the daily precipitation, the values provided by the weather service were more accurate than those predicted by the MLP model and the other models.

Daily Precipitation Prediction
The value of the MAE was 0.7731 higher for the MLP, and the value of the MSE was 0.45 higher for the MLP compared with the weather service. The value of the R² score was lower by 0.0206 for the MLP when compared with the weather service. The prediction accuracy of the daily precipitation was comparable for the MLP and LSTM models, while it was lower for the SVR model. The effectiveness of all the predictive models was lower for the sum of the daily precipitation than for the other weather parameters analyzed in this survey. Figure 8 shows a bar graph comparing the values of the prediction effectiveness measure, the R² score, for all tested weather parameters for the MLP and the weather service. For the MLP, the highest predictive performance was observed for the maximum (0.9573) and minimum (0.9423) temperatures. This was followed by atmospheric pressure, for which the R² score was 0.8327, then wind speed (0.6029), and lastly precipitation (0.5184). For the prediction of the daily maximum temperature, the predictions of the compared models and the weather service had high, comparable accuracy; however, all models achieved higher accuracy than the weather service. For the daily minimum temperature, the prediction accuracy of the benchmarked models was high and comparable, and their advantage over the weather service was greater than that observed for the maximum temperature.
In the case of prediction studies of the daily mean atmospheric pressure, the accuracy of the tested models was lower than for the temperatures. The MLP and LSTM models showed the highest prediction accuracy, and the prediction accuracy by SVR was slightly lower. However, all the models tested provided a more accurate prediction than the weather service did. The greatest advantage of the predictive models over the weather service was observed for the daily average wind speed. The LSTM model obtained the highest accuracy in this case, followed by SVR, and the accuracy of the MLP was minimally lower in comparison.
The daily precipitation is the weather condition that was the most difficult to accurately predict for the models used in this study. It is the only weather condition in this study for which the weather service forecast was more accurate than the other predictive models. MLP and LSTM were superior to the SVR model in terms of the daily precipitation prediction accuracy.
The output data histograms of the weather conditions predicted with the MLP model are presented in Appendix C in Figure A5. The observed shapes of the histograms for the output data reflect the shapes and nature of the input data before normalization. The maximum daily temperature shows a multimodal distribution, which was also observed for the input data representing this feature.
In the case of the mean daily wind speed, we observed that more samples were shifted toward low values on the histogram, which was similar to the input data. The mean daily atmospheric pressure demonstrated a normal distribution, which also reflects the nature of the input data for this parameter. For the daily sum of the precipitation, more samples had low values, similar to what was observed for the input data representing this parameter.
To investigate the reason that the predictions for target variables, such as the atmospheric pressure, wind speed, and daily precipitation sum, were less effective than the predictions for temperature, we checked whether their values in the time series depend on the features based on which they were predicted or whether they are rather random.
To check the dependence of the target weather variables on the explanatory variables (i.e., the features), the strength of the correlation between the forecast weather parameters and the features in the training dataset was examined using the r-Pearson correlation coefficient.
We examined the number of features for which the absolute value of the Pearson correlation coefficient with the predicted parameter was equal to or greater than 0.6, as this value is considered a strong-correlation threshold. Tables 14-16 include the features that reached an r-Pearson correlation coefficient greater than or equal to 0.6 among the eight weather parameters that served as training features. In the column next to the names of these features, the values of the r-Pearson correlation coefficient are presented. For predicted parameters such as the mean daily wind speed and the daily precipitation sum, there were no training features with a correlation greater than 0.6. Table 14. The parameters with the strongest correlation to the predicted mean daily atmospheric pressure.

Parameter | Mean Daily Atmospheric Pressure
Mean daily atmospheric pressure 1 day ago | 0.8014

The above analysis demonstrated that parameters such as the maximum temperature and the minimum temperature had a strong correlation with many other parameters that were features of the training dataset. For these variables, the best prediction results were obtained using the MLP model. The results of the above analysis were different for the mean daily atmospheric pressure.
In the case of this variable, there was a strong correlation with only one parameter from the training dataset features: the atmospheric pressure from one day before. The effect of having only one strongly correlated feature is visible as the reduced accuracy of the prediction of this parameter. For variables such as the mean daily wind speed and the daily precipitation sum, no feature achieved a strong correlation, and for these two variables the prediction was even less accurate.
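The correlation screening described above can be sketched as follows; the feature names and synthetic data are hypothetical:

```python
# Sketch: compute the Pearson correlation between each training feature and
# the predicted parameter, then keep features with |r| >= 0.6 (the
# strong-correlation threshold used in this analysis).
import numpy as np
import pandas as pd

rng = np.random.default_rng(2)
n = 500
target = rng.normal(size=n)
features = pd.DataFrame({
    "pressure_lag1": target * 0.9 + rng.normal(scale=0.3, size=n),  # strongly correlated
    "humidity_lag1": rng.normal(size=n),                            # unrelated
})

r = features.corrwith(pd.Series(target))   # Pearson r per feature
strong = r[r.abs() >= 0.6]
print(list(strong.index))
```

Only the strongly correlated feature survives the filter, mirroring how the mean daily atmospheric pressure retained a single strong predictor (its own value from one day before).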

Discussion
The study described in this report aimed to show the possibility of using a unidirectional multilayer neural network to forecast selected weather indicators and to compare the results achieved with other forecasting models. We assumed that, with a reliable and sufficiently large set of weather data, a properly constructed neural network model, and appropriate software, it would be possible to effectively forecast selected weather indicators.
The results of the application of an ANN presented in this report confirm that Artificial Neural Networks can be a useful tool for forecasting weather indicators. Although a simple construction of the MLP model, comprising three layers of neurons, was used in the study, the results obtained were satisfactory.
The effectiveness of our implementation of the MLP model, comparable to or, in most cases, higher than that of the other forecasting models, proves that this forecasting model was properly designed and implemented, that it is suitable for the assumed purpose of forecasting selected weather parameters, and that the data used to train the model were properly prepared.
However, there were some difficulties in applying the proposed weather forecasting model. As reported in the literature [18], neural networks can be susceptible to learning false relationships between data. A pure data-based weather forecasting model may fail to respect basic physical principles and, thus, generate false forecasts because it does not take into consideration that every atmospheric process is affected by physical laws.
There are specific properties of weather data for which classical ML concepts (which work for typical problems solved by ML, such as computer vision and speech recognition) are not effective enough in a complete weather prediction system. The reason is that the model must handle the complexity of meteorological data and feedback processes to provide accurate prediction results. Another difficulty encountered when using MLPs for weather forecasting is that ANNs are good interpolators but poor extrapolators. The dataset used to train the ANN must, therefore, contain numerous and heterogeneous examples to cover the widest possible range of the cases that the ANN is expected to predict [34].
A separate training and testing dataset containing regional data would be required for each location where the method is applied. The training dataset needs to be updated regularly, and the network model re-trained, due to the changing climate patterns in the world [85]. Another disadvantage of MLPs and deep learning models compared with other methods is that their training process takes a long time [62].
Forecasting time series, which include a prediction of the weather parameters changing over time, is an important area of machine learning. The time component provides useful information that is used in the construction of the machine learning model, but it also brings with it problems that make it difficult to accurately predict certain variables. If a time series' data are correlated over time, it is much easier to obtain an accurate prediction because the model uses historical values in the machine learning process and then generates a forecast for the future from these.
When data values change randomly over time, the model cannot predict future changes based on historical events with great accuracy [82]. The implications of this are highly accurate prediction results for weather conditions that show a high correlation with the other variables included in the prediction model and poor results for weather conditions for which no such correlations exist. In most of the analyzed cases, the MLP model achieved higher or comparable results to the forecasts from the internet weather service.
The highest prediction efficiency was observed for the maximum and minimum temperature. These are conditions for which there is a strong correlation in time with many other weather indicators. Thus, there is less risk of them adopting random values. Therefore, these are the parameters that are particularly suitable for prediction by artificial neural networks.
Our conclusion is that it is worthwhile to build machine learning models for the prediction of time series whose values are strongly correlated with each other, because there is then a high probability of obtaining an accurate prediction. Such models can be useful, for example, as a correcting tool for forecasting weather conditions at the local scale when numerical modeling is insufficient due to specific local conditions (for instance, as described earlier in the paper, where numerical modeling is performed at too coarse a resolution to account for the influence of the local topography of the Cadarache valley in southeastern France) [34].
In contrast, overestimating the predictive capacity of the model for certain parameters results in its low competitiveness compared with the possibilities offered by alternative methods [86]. Our investigation confirmed the findings of the literature review, in which most of the research concerned the use of machine learning methods to forecast temperatures, for which the predictions are accurate because of their strong correlation with other weather indicators.

Conclusions and Future Research Directions
This study presents the use of an Artificial Neural Network regressive model with a Multilayer Perceptron (MLP) architecture to forecast selected weather parameters. The MLP model was built, and its effectiveness was compared with the forecasts available in the website archive. The developed model is a lightweight solution and can be an element of any application that requires forecasting of the above weather parameters. The most important aims achieved are listed below:
• This study presents a successful attempt to use an application based on a model of a unidirectional multilayer Artificial Neural Network to forecast selected weather conditions for a selected location, i.e., the city of Szczecin.
• The application used in this survey was successful for a local-scale study.
• We obtained satisfactory results, i.e., accurate forecasts, with the simple design of the MLP model, which comprised three layers of neurons.
• We confirmed that this forecasting model was properly designed and implemented, that it is suitable for forecasting the selected weather parameters, and that the data used to train the model were properly selected and prepared.
• We analyzed and explained the reasons for the different forecasting accuracy results for the different weather parameters.
Our approach in this study was limited to using data and performing forecasts of the selected weather conditions for the local area of the city of Szczecin in Poland. The weather dataset, limited to the territory of the city of Szczecin, was small. As our study was focused on a regional territory and dataset, its usefulness for a larger area would be limited. The achieved results encourage future investigations into whether the satisfactory scores obtained by this application working on local data are repeatable and useful for other regions of Poland.
The next direction of further work is to compare the performance of this application on a larger dataset covering Poland and the European area. Future work should also compare the effectiveness of our method with applications based on other, newer, and more advanced machine learning models to see whether the results obtained with our proposed method in a weather forecasting application can be improved.
The MLP provided slightly more accurate predictions of temperature and atmospheric pressure compared with the LSTM and SVR. This indicates that an appropriate choice of the dataset, learning features, and parameters for MLP training can provide results comparable to more complex ML models. For the wind speed prediction, the LSTM and SVR showed a slight advantage, and, for the precipitation sum prediction, the LSTM did. Thus, we concluded that, for weather conditions that are more difficult to predict, more complex ML models, especially the LSTM, represent an opportunity to obtain more accurate predictions, and it is worthwhile to focus on research aimed at adjusting their structure to further improve the prediction performance.
As the MLP model did not achieve equally effective results for all the predicted weather conditions, further work is needed to improve its accuracy and expand the model. To improve the accuracy of the prediction of the weather parameters with the MLP model, more hidden layers can be used, i.e., a deep network model, because the quality of a deep multilayer network is higher than that of traditional simple MLP networks with only a few hidden layers. For more accurate forecasts, the precision of the information provided by the training data can be increased by using more densely spaced samples in the training dataset, e.g., every hour instead of every 24 h.
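The deepening suggested above can be sketched with scikit-learn's MLPRegressor. The paper does not specify its software or hyperparameters, so the library choice, layer sizes, and synthetic data below are illustrative assumptions only.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(1)
X = rng.normal(size=(400, 8))                     # e.g. eight lagged weather features
y = X[:, 0] * 3 + X[:, 1] ** 2 + rng.normal(0, 0.1, 400)

# A single hidden layer (as in the study) vs. a deeper variant.
shallow = make_pipeline(StandardScaler(),
                        MLPRegressor(hidden_layer_sizes=(32,),
                                     max_iter=2000, random_state=0))
deep = make_pipeline(StandardScaler(),
                     MLPRegressor(hidden_layer_sizes=(64, 64, 32),
                                  max_iter=2000, random_state=0))
shallow.fit(X[:300], y[:300])
deep.fit(X[:300], y[:300])

# score() returns the R^2 score on held-out samples.
print(shallow.score(X[300:], y[300:]), deep.score(X[300:], y[300:]))
```

Whether the deeper variant actually wins depends on the data; on a small regional dataset, extra layers can just as easily overfit, which is why the comparison on held-out samples matters.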
To obtain more accurate forecasts, more training samples can be used for the day for which the weather parameters are forecast, e.g., extending each training sample back to data from ten days ago. Further work includes testing the effectiveness of more advanced models, such as deep networks, CNN, LSTM, RNN, and SVR models, in the problem of weather parameter prediction, as well as comparing the accuracy of their results with those of the implemented MLP. Such a study, in the opinion of the authors, could provide interesting results and enable an empirical evaluation of the effectiveness of the MLP model from this work.

Acknowledgments: The authors would like to thank the editor and the anonymous reviewers, whose insightful comments and constructive suggestions helped us to significantly improve the quality of this paper.

Conflicts of Interest:
The authors declare no conflict of interest.

Abbreviations
The following abbreviations are used in this manuscript:

Appendix A. Fundamentals of Artificial Neural Networks and the Multilayer Perceptron
The concept of a neural network is defined by mathematical structures that perform calculations or signal processing with elements called neurons. Neurons are arranged in layers, and signals from each neuron of the preceding layer reach each neuron of the following layer. For predictive purposes, multilayer networks are used, containing a minimum of three layers: the input layer, one or more hidden layers, and the output layer. The number of neurons in the input layer corresponds to the number of input features.
Input layer elements do not perform data processing. Their role is to receive the signals, i.e., the input variables of the network, and to pass them on, unmodified, to the neurons of the subsequent layer. The hidden layers, i.e., those between the input and output layers, may have different numbers of neurons. In practice, their number is selected empirically and is an important element influencing the quality of the results. The task of the hidden layers is essentially information processing. The task of the output layer is to process the data and generate the final output signals, i.e., the network response (predicted values).
A Multilayer Perceptron is a unidirectional network that contains one or more hidden layers. In this type of network, the data stream flows in one direction: from the input layer, through the subsequent hidden layers, to the output layer. In the hidden layers and the output layer, a neuron model is used that aggregates the signals with the scalar product and usually applies a non-linear activation function f (sigmoid or hyperbolic tangent functions are typically used), as only a non-linear neuron model (with a non-linear activation function) guarantees that a Multilayer Perceptron can model non-linear phenomena.
Learning takes place in a supervised mode, most commonly with backpropagation or a related algorithm. Each input is associated with a parameter called a weight, which is modified during the training phase of the network and is constant during its operation. If x denotes the n-element vector of input signals from the training sequence given to the network input, and w denotes the n-element vector of the corresponding weights, then the processing of the input signals in a mathematical neuron model is described by the general rule in Formula (A1):

y = f(g(x, w)), (A1)

where f is the activation function and g is the aggregation function. The output signal, which takes the value of the activation function, is separated and passed on to subsequent neurons or forms the final network response. The aggregation function g in neuron models is most often the scalar product of the x and w vectors, according to Formula (A2) [87]:

g(x, w) = Σ_{i=1..n} x_i · w_i, (A2)

i.e., the sum of the products of the input signals and the corresponding weights is calculated. The resulting value is then passed through the activation function, and the result is the output value of the neuron y, according to Formula (A3) [88]:

y = f(Σ_{i=1..n} x_i · w_i). (A3)

When aggregating inputs with a scalar product, the functions most commonly used as the activation function f are the sigmoid (logistic) function, also called an s-shape function because of the characteristic shape of its graph, given by Formula (A4), and the tangensoid (hyperbolic tangent), defined by Formula (A5) [74,75,87]:

f(s_k) = 1 / (1 + e^(−s_k)), (A4)

f(s_k) = tanh(s_k) = (e^(s_k) − e^(−s_k)) / (e^(s_k) + e^(−s_k)), (A5)

where s_k is the total stimulation, i.e., a linear combination of the weights and features, which can be calculated using Formula (A6):

s_k = Σ_{i=1..n} w_{ki} · x_i, (A6)

where k denotes any neuron with an index in the range 1 . . . K, and K is the number of hidden neurons. In this way, the network's response to the signal (pattern) given to its input is known.
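Formulas (A2)-(A4) can be illustrated with a minimal forward pass of a three-layer network. The layer sizes, the linear output layer, and the NumPy implementation below are illustrative assumptions, not the study's implementation.

```python
import numpy as np

def sigmoid(s):
    # Formula (A4): f(s) = 1 / (1 + e^(-s))
    return 1.0 / (1.0 + np.exp(-s))

def forward(x, W1, b1, W2, b2):
    """One forward pass of a three-layer MLP: aggregation by scalar
    products (Formula (A2)), sigmoid activation of the hidden neurons
    (Formula (A4)), and a linear output layer."""
    s = W1 @ x + b1          # total stimulation s_k of each hidden neuron
    h = sigmoid(s)           # hidden-layer activations
    return W2 @ h + b2       # network response y

rng = np.random.default_rng(0)
x = rng.normal(size=8)                            # eight input features
W1, b1 = rng.normal(size=(16, 8)), np.zeros(16)   # input -> hidden weights
W2, b2 = rng.normal(size=(1, 16)), np.zeros(1)    # hidden -> output weights
y = forward(x, W1, b1, W2, b2)
print(y.shape)  # (1,)
```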
For each output, we can specify the function F that generates the network output signals. The set of parameters of the function F is the set of all network weights W. For a specific output j, there is a dependency described by Formula (A7):

y_j = F_j(X, W), (A7)

where X is the vector of the network input signals. The set W encodes the knowledge about the modeled phenomenon that the network obtains during the learning process [87]. The expected output value is known because it is in the training sequence, so it is possible to change the weights in the network in such a way that the value obtained at the output approaches the expected value. The error at the neuron output is then calculated as the difference between the value at its output and the expected value; in this way, the error for the last layer is defined. For the hidden layers, the error is defined using the backpropagation algorithm [88]. The backpropagation algorithm comprises four stages:
1. Initialization of the weights with low random values.
2. Forward propagation (feedforward): each input neuron receives and sends a signal to the hidden neurons; each hidden neuron calculates the value of its activation function and sends a signal to the output unit, which calculates the output signal.
3. Backpropagation: after the output values calculated by the network are compared with the expected values, the error is calculated and sent back to all units.
4. Updating of the weights: the weights are modified so as to reduce the error.
Modern second-order algorithms, such as the conjugate gradient method and the Levenberg-Marquardt method, are faster, which could lead to their more frequent use. However, the classic backpropagation method has many important advantages; above all, it is the simplest algorithm for most users of neural networks to understand, which makes it the most popular [74,75]. The error value depends on the network weights W, so appropriate modification of the weights during the learning process reduces the error by striving to minimize the cumulative error for all elements of the training sequence. This error is determined by Formula (A8):

E(W) = Σ_{i=1..n} (y_i* − y_i)^2, (A8)

where n is the number of patterns in the training sequence. The minimization is carried out using iterative methods. It begins at a randomly selected point W_0 in the weight space and consists of determining subsequent points W_1, W_2, . . . such that E(W_0) > E(W_1) > E(W_2) > . . ., seeking a global minimum of the error function E(W) over the multidimensional space of the weights W. Before the backpropagation algorithm is run, the weights are selected randomly, assuming a uniform distribution within a certain numerical range. The correction of the network weights with the backpropagation algorithm is determined by Formulas (A9) and (A10), which involve the difference y* − y, where y denotes the neural network response and y* is the expected value for the given input. Performing subsequent iterations that change the network weights minimizes the global error. A single processing of all patterns in the training sequence is referred to as an epoch. To train the network effectively, it is necessary to carry out many epochs (often several thousand) [87].
In the backpropagation algorithm, the output error is propagated (backpropagated) from the back (from the output to the input layer) according to the connections of the neurons between layers and considering their activation functions. The modification of the weights that is carried out each time a vector with a sequence of training values is given to the input is called an incremental update of weights. The learning rate η is an element that significantly affects the convergence of the error backpropagation algorithm. There is no general method of determining its value; it is chosen empirically and depends on the type of problem to be solved [88].
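An incremental weight update with learning rate η, as described above, can be sketched as follows. The network shape, the value of η, and the squared-error cost are illustrative assumptions; the gradients follow the standard backpropagation derivation for a sigmoid hidden layer and a linear output.

```python
import numpy as np

def sigmoid(s):
    return 1.0 / (1.0 + np.exp(-s))

def train_step(x, y_true, W1, b1, W2, b2, eta=0.1):
    """One incremental backpropagation update: forward pass, output
    error, error propagated back through the sigmoid layer, and a
    gradient step of size eta on E = 0.5 * (y - y*)^2."""
    h = sigmoid(W1 @ x + b1)         # hidden layer
    y = W2 @ h + b2                  # linear output
    err = y - y_true                 # output-layer error
    dW2 = np.outer(err, h); db2 = err
    dh = (W2.T @ err) * h * (1 - h)  # sigmoid derivative is h * (1 - h)
    dW1 = np.outer(dh, x); db1 = dh
    return W1 - eta * dW1, b1 - eta * db1, W2 - eta * dW2, b2 - eta * db2

rng = np.random.default_rng(0)
x, y_true = rng.normal(size=4), np.array([0.5])
W1, b1 = rng.normal(scale=0.5, size=(6, 4)), np.zeros(6)
W2, b2 = rng.normal(scale=0.5, size=(1, 6)), np.zeros(1)
for _ in range(200):                 # repeated epochs on one pattern
    W1, b1, W2, b2 = train_step(x, y_true, W1, b1, W2, b2)
y = W2 @ sigmoid(W1 @ x + b1) + b2
print(float(y))  # converges toward the expected value 0.5
```

The loop makes the dependence on η concrete: too small a value slows convergence, too large a value makes the updates overshoot, which is why η is chosen empirically.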

Appendix B.1. Mean Absolute Error (MAE)
This is a commonly used measure of the error made by the network. The MAE is calculated with Formula (A11):

MAE = (1/N) · Σ_{i=1..N} |y_i − ŷ_i|, (A11)

where y_i is the observed value, ŷ_i is the network response (predicted value), and N is the total number of samples [15].
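Formula (A11) translates directly into code; this NumPy sketch is illustrative:

```python
import numpy as np

def mae(y_obs, y_pred):
    # Formula (A11): mean of the absolute prediction errors.
    return float(np.mean(np.abs(np.asarray(y_obs) - np.asarray(y_pred))))

print(mae([3.0, -1.0, 2.0], [2.5, 0.0, 2.0]))  # (0.5 + 1.0 + 0.0) / 3 = 0.5
```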

Appendix B.2. Mean Squared Error (MSE)
This is a useful quantitative measure of the effectiveness of the model. It is the average value of the cost function SSE (Sum of Squared Errors), which is reduced during training of the model. It is calculated with Formula (A12):

MSE = (1/N) · Σ_{i=1..N} (y_i − ŷ_i)^2. (A12)

If the value of the MSE calculated for the test samples is much higher than for the training samples, the model has been overtrained.
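Formula (A12) and the overtraining check it supports can be sketched as follows (the sample values are illustrative):

```python
import numpy as np

def mse(y_obs, y_pred):
    # Formula (A12): mean of the squared prediction errors.
    return float(np.mean((np.asarray(y_obs) - np.asarray(y_pred)) ** 2))

train_mse = mse([1.0, 2.0, 3.0], [1.0, 2.1, 2.9])
test_mse = mse([1.0, 2.0, 3.0], [0.0, 3.0, 1.0])
# A test MSE much higher than the training MSE signals overtraining.
print(train_mse, test_mse)
```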

Appendix B.3. R2 Score

The R2 score can be defined as a standardized version of the MSE. It allows a better interpretation of the model's performance and is often treated as a measure of the quality of a regression model. Its value shows to what extent the predictors, i.e., the explanatory variables introduced into the model, allow for the prediction of the values of the variables in the test dataset. The R2 score is calculated according to Formula (A13) [74,75]:

R2 = 1 − SSE/SST, (A13)

where SSE is the sum of the squares of the errors, calculated from Formula (A14):

SSE = Σ_{i=1..N} (y_i − ŷ_i)^2, (A14)

and SST is the total sum of squares, calculated using Formula (A15):

SST = Σ_{i=1..N} (y_i − ȳ)^2, (A15)

which measures the variance of the response. For a set of training data, the R2 score takes values between 0 and 1, but for test samples it can be negative. A value of R2 equal to 1 indicates a model ideally fitted to the data, with a simultaneous MSE value of 0 [74,75].
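Formulas (A13)-(A15) can be combined into a short illustrative implementation, which also shows the negative values possible on test samples:

```python
import numpy as np

def r2_score(y_obs, y_pred):
    """Formula (A13): R^2 = 1 - SSE / SST."""
    y_obs, y_pred = np.asarray(y_obs), np.asarray(y_pred)
    sse = np.sum((y_obs - y_pred) ** 2)        # Formula (A14)
    sst = np.sum((y_obs - y_obs.mean()) ** 2)  # Formula (A15)
    return float(1.0 - sse / sst)

print(r2_score([1.0, 2.0, 3.0], [1.0, 2.0, 3.0]))  # perfect fit: 1.0
print(r2_score([1.0, 2.0, 3.0], [3.0, 3.0, 3.0]))  # worse than the mean: negative
```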

Appendix B.4. Pearson Correlation Coefficient
The r-Pearson correlation coefficient is a measure of the strength of the linear correlation between variables, which takes values in the range from −1 to 1. Correlation values in the range from 0 to 1 show an increasingly strong positive correlation, meaning that the data are positively correlated: their values increase simultaneously. Correlation values between 0 and −1 show an increasingly strong negative (opposite) correlation [89]. Values of the r-Pearson correlation coefficient that oscillate around zero suggest a weak linear relationship [90]. The r-Pearson correlation coefficient is calculated using Formula (A16) [91]:
r(x, y) = cov(x, y) / (σ_x · σ_y), (A16)

where cov(x, y) = E(x · y) − E(x) · E(y); r(x, y) is the r-Pearson correlation coefficient between the variables x and y; cov(x, y) is the covariance between the variables x and y; σ is the standard deviation; and E is the expected value. Table A1 contains ranges for the absolute values of the r-Pearson correlation coefficient and their interpretations.

Figure A2. Input data histograms (the mean daily temperature before (a) and after normalization (b), minimum daily temperature at ground level before (c) and after normalization (d), and daily sum of precipitation before (e) and after normalization (f)).

Figure A3. Input data histograms (daily dew duration before (a) and after normalization (b), mean daily general cloud cover before (c) and after normalization (d), and mean daily wind speed before (e) and after normalization (f)).

Figure A4. Input data histograms (the mean daily relative humidity before (a) and after normalization (b), and mean daily atmospheric pressure before (c) and after normalization (d)).

Figure A5. Output data histograms (the maximum (a) and minimum (b) daily temperature, mean daily atmospheric pressure (c), mean daily wind speed (d), and daily sum of precipitation (e)).
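The definition in Formula (A16), with the covariance written as E(x · y) − E(x) · E(y), can be checked with a short illustrative implementation:

```python
import numpy as np

def pearson_r(x, y):
    """Formula (A16): r = cov(x, y) / (sigma_x * sigma_y),
    with cov(x, y) = E(x * y) - E(x) * E(y)."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    cov = np.mean(x * y) - x.mean() * y.mean()
    return float(cov / (x.std() * y.std()))

x = np.array([1.0, 2.0, 3.0, 4.0])
print(pearson_r(x, 2 * x + 1))  # perfect positive correlation: 1.0
print(pearson_r(x, -x))         # perfect negative correlation: -1.0
```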

Appendix D. Sensitivity Analysis for Each Input Parameter of the Model
The results of the sensitivity analysis for the maximum temperature prediction, with the MSE values ranked in descending order for the training and testing datasets, are listed in Table A2. The results for the other predicted weather conditions are included in Tables A3-A6. In the case of the prediction of the maximum temperature, a remarkable increase in the error values was observed for the training and testing MLP model without the mean temperature 1 day ago; for training without the sum of precipitation 1 day ago, general cloud cover 1 day ago, and mean atmospheric pressure 1 day ago; and for testing without the mean atmospheric pressure 2 days ago.
In the case of the prediction of the minimum temperature, a rise in the error values was caused by training and testing the MLP without the minimum temperature 1 day ago, the sum of precipitation 1 day ago, general cloud cover 1 day ago, and relative humidity 1 day ago, and by testing without the mean temperature 1 day ago.
When the mean atmospheric pressure was predicted, the error values grew for the training and testing MLP without the mean atmospheric pressure 2 days ago; for training without the dew duration 3 days ago, mean temperature 1 day ago, and minimum temperature 2 days ago; and for testing without the mean atmospheric pressure 1 day ago, minimum temperature 1 day ago, and mean temperature 3 days ago.
For the wind speed prediction, a remarkable increase in the error values was observed for the training and testing MLP without the wind speed 1 day ago; for training without the sum of precipitation 2 days ago, general cloud cover 2 days ago, and dew duration 1 day ago; and for testing without the wind speed 3 days ago and mean temperature 2 days ago.
In the case of the prediction of the daily sum of precipitation, a rise in the error values was caused by training without the mean atmospheric pressure 1 day ago, the dew duration 2 days ago and 1 day ago, and the sum of precipitation 1 day ago, and by testing without the mean temperature 1 day ago and the maximum temperature 1 day ago.
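The leave-one-feature-out procedure behind this appendix (re-training the model with each input removed in turn and comparing the resulting MSE) can be sketched as follows. scikit-learn, the toy feature names, and the synthetic data are assumptions, not the study's setup.

```python
import numpy as np
from sklearn.metrics import mean_squared_error
from sklearn.neural_network import MLPRegressor

def ablation_sensitivity(X, y, feature_names, n_train):
    """Re-train the model once per removed input feature and rank the
    features by the resulting test MSE (higher MSE = more important)."""
    results = {}
    for i, name in enumerate(feature_names):
        Xi = np.delete(X, i, axis=1)
        model = MLPRegressor(hidden_layer_sizes=(32,), max_iter=2000,
                             random_state=0).fit(Xi[:n_train], y[:n_train])
        results[name] = mean_squared_error(y[n_train:],
                                           model.predict(Xi[n_train:]))
    return sorted(results.items(), key=lambda kv: kv[1], reverse=True)

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 3))
y = 3 * X[:, 0] + rng.normal(0, 0.1, 300)   # only the first feature matters
ranking = ablation_sensitivity(
    X, y, ["temp_1d_ago", "wind_1d_ago", "cloud_1d_ago"], 200)
print(ranking)  # dropping temp_1d_ago should hurt the most
```

Applied to the study's eight lagged weather features, the ranking returned by such a procedure corresponds to the ordering shown in Tables A2-A6.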