A Hybrid Multi-Objective Optimizer-Based SVM Model for Enhancing Numerical Weather Prediction: A Study for the Seoul Metropolitan Area

Abstract: Temperature forecasting is an area of ongoing research because of its importance in all aspects of life. However, because a variety of climate factors controls the temperature, it remains a never-ending challenge. The numerical weather prediction (NWP) model has frequently been used to forecast air temperature; however, because of its coarse grid resolution and incomplete parameterizations, it suffers from systematic distortions. In this study, a grey wolf optimizer (GWO) and a support vector machine (SVM) are used to ensure the accuracy and stability of next-day forecasts of minimum and maximum air temperatures in Seoul, South Korea, based on the Local Data Assimilation and Prediction System (LDAPS; a local NWP model over Korea). A total of 14 LDAPS model forecast variables, the daily maximum and minimum air temperatures from in situ observations, and five auxiliary variables were used as input variables. The LDAPS model, the multi-model ensemble (MME), the particle swarm optimizer with support vector machine (SVM-PSO), and the conventional SVM were selected as comparison models to illustrate the advantages of the proposed model. Compared to the particle swarm optimizer and the traditional SVM, the grey wolf optimizer produced more accurate results: the average RMSE of SVM for Tmax_Forecast and Tmin_Forecast prediction was reduced by roughly 51 percent when combined with GWO and 31 percent when combined with PSO. In addition, the hybrid model (SVM-GWO) improved the performance of the LDAPS model by lowering the RMSE values for Tmax_Forecast and Tmin_Forecast forecasting from 2.09 to 0.95 and from 1.43 to 0.82, respectively. The results show that the proposed hybrid (GWO-SVM) model outperforms the benchmark models in terms of prediction accuracy and stability and has considerable application potential.


Introduction
The weather has a considerable impact on the daily life of all living things, including humans and animals, and on numerous industry sectors, which is why weather forecasting is one of the most regularly explored disciplines [1]. Because temperature is so closely linked to energy generation and agricultural operations, it is the most important weather factor [2,3]. As low and high temperatures can affect agricultural activities, precise temperature forecasting is essential for avoiding crop damage [4]. However, because weather parameters, including temperature, are continuous, multi-dimensional, data-dense, chaotic, and dynamic, precisely predicting temperature is always difficult [5,6].
There are two types of models used in weather forecast research: physics-based models and data-based models. First, the effects of atmospheric dynamics, thermal radiation, and the influence of green spaces, lakes, and oceans were investigated numerically using physics-based weather forecasting models. Most public and commercial weather forecasting systems use physics-based models [1,7]. Data-driven models, on the other hand, predict the weather using statistics or algorithms based on machine learning.
Data-driven models have the advantage that they can recognize unexpected patterns in the weather system with no prior knowledge. However, they may require large amounts of data, and how the models work is not always understood. Physical techniques have the advantage that they can be easily understood and extrapolated from observed situations. The disadvantage is that they require well-defined prior information and a lot of computing power [8,9]. Data-driven models for weather forecasting have recently been explored with greater intensity due to an increase in the number of features and observations.
Numerical weather prediction (NWP) models, based on physical correlations of parameters and principles of atmospheric dynamics, have become an important tool for predicting numerous meteorological components, including air temperature. Because of their coarse grid resolution and imprecise physical parameterizations, NWP models tend to simplify the precise properties of terrestrial, atmospheric, and ocean systems. Due to incorrect physical parameterization, incorrect initial/boundary conditions, and domain and resolution dependence, uncertainties in NWP models lead to model distortions in air temperature predictions, despite continued advances in model performance. As a result, post-processing of the model output may be necessary to remove distortion for operational use of the models. Several statistical approaches have been used to correct bias in the air temperature data provided by the NWP models [10][11][12]. To improve forecasting effectiveness, these approaches have been applied to weather elements generated by NWP models in different countries.
The most widely used methods to correct bias in air temperature prediction are the Model Output Statistics (MOS) and Kalman Filter (KF) approaches. MOS improves prediction precision by applying a statistical linear model, derived between historical model results and observational data, to the output of the NWP model [12]. Thanks to recent advances in computing resources, KF is now widely used to solve nonlinear problems. KF first corrects the air temperature predicted by the NWP model; the parameters of the next forecast phase are then updated recursively using the observed air temperatures [13,14]. Despite the use of a variety of machine learning techniques to reduce temperature bias, improving modeling accuracy remains a challenge. Recently, some researchers have attempted to improve predictive performance by combining the results of various machine learning algorithms in a variety of areas [15][16][17][18]. The results of all these studies show that integrating multiple machine learning models improves performance by overcoming the limitations of each model separately. Since machine learning methods are not affected by multicollinearity in input variables, they can process a large number of them. Unlike MOS and KF, which require a bias-correction model to be generated for each station, machine learning can be used to build a model that works for many stations. When spatially continuous input variables are introduced into machine learning models, the spatial distributions of predictions can be tracked.
Different learning algorithms (Support Vector Regression (SVR) and Random Forest (RF)) have been used to correct bias in the air temperature outputs of NWP models. Eccel et al. (2007) [19] evaluated two machine learning approaches (ANN and RF) to improve the frost-forecasting capabilities of two NWP models, ECMWF and the Local Area Model Italy (LAMI), in an Italian Alpine region. They found that RF gave the best results compared with the other methods, with the added benefit of being easy to automate. Yi et al. (2018) [20] improved the accuracy of air temperatures from the Local Data Assimilation and Prediction System (LDAPS) model in Seoul, South Korea, by using SVR and a linear regression model, finding that SVR showed higher correction accuracy than the linear regression model. In addition, the most widely used technique for predicting air temperature is the artificial neural network (ANN) [21]. Marzban (2003) [22] used an ANN for post-processing of the Advanced Regional Prediction System (ARPS) model's hourly temperature outputs, obtaining an average 40% reduction in mean squared error over all validated weather stations. Vashani et al. (2010) [23] found that the ANN and KF methods showed better bias-correction performance than the other methods when accuracy was aggregated over 30 weather stations in Iran, and the ANN produced slightly higher accuracy than KF for longer forecast ranges. Zjavka (2016) [24] reported that a polynomial neural network could successfully bias-correct the National Oceanic and Atmospheric Administration (NOAA) meso-scale model to forecast hourly air temperature. To correct bias in air temperature estimates from the European Centre for Medium-Range Weather Forecasts (ECMWF) model, Isaksson (2018) [25] compared a deep neural network with KF and found that, at most of the verified stations, the neural network model exceeded KF in terms of error reduction. To correct LDAPS, Dongjin Cho et al. (2020) [14] used a multi-model ensemble (MME) and other machine learning techniques, with the MME model outperforming the other algorithms in terms of generalizability.
Using a multi-objective grey wolf optimizer combined with a support vector machine (GWO-SVM), this study seeks to eliminate distortion in LDAPS air temperatures, one of the NWP model outputs produced by the Korea Meteorological Administration (KMA). To our knowledge, no studies have been performed on improving air temperature predictions derived from an NWP model through a multi-objective optimization approach based on a machine learning algorithm. As a result, the major contributions of this paper are:

1. Developing a hybrid model (GWO-SVM) to improve the forecasting of the daily maximum and minimum air temperatures produced by the NWP model;
2. Comparing the proposed optimizer (GWO) with benchmark optimizers regarding the prediction accuracy and stability of the SVM algorithm;
3. Examining the proposed model's forecasts in comparison with other machine learning approaches.
The rest of this work is arranged in the following manner. Section 2 introduces the required theories, describes the collected data and the prediction stages, and explains the implementation of the proposed model. Section 3 presents the prediction results and the significant challenges raised by this research. Finally, the conclusion is presented in Section 4.

Materials
The dataset was obtained from the UCI online repository [26] and used to correct bias in the next-day maximum and minimum air temperatures predicted by the Korea Meteorological Administration's LDAPS model for Seoul, South Korea. The dataset covers the summers from 2013 to 2017. It includes fourteen LDAPS weather forecast variables, two in situ observations, and five geographic auxiliary variables over Seoul, South Korea. In this study, the 14 LDAPS forecast variables, the in situ observations, and the five auxiliary variables were used as input variables, with the next-day maximum and minimum air temperatures (Tmax_Forecast and Tmin_Forecast) as target variables (Table 1).

Table 1. List of input and target variables (columns: Variable Type, Abbreviation (unit), Description).

Support Vector Machine

SVM is a commonly used machine learning model. It is ideally suited to small sample sizes and has a strong statistical foundation [27]. SVM has a wide range of applications in the disciplines of energy, ecology, hydrology, and economics [28][29][30][31][32]. In a regression problem, the training set is defined as [33,34]

$$\{(x_j, y_j) \mid x_j \in \mathbb{R}^n,\ y_j \in \mathbb{R},\ j = 1, 2, \cdots, n\} \quad (1)$$

where $x_j$ is the input and $y_j$ is the output. The SVM model's detailed form is

$$f(x) = \omega^{T} \varphi(x) + c \quad (2)$$

where $\omega$ is the weight vector, $\varphi(x)$ is the nonlinear mapping function, and $c$ is the bias. In the SVM model, two hyper-parameters influence prediction performance: the kernel width and the penalty factor.
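As an illustrative sketch (not the authors' implementation), the dual form of the trained SVM regression function and the role of the kernel width γ can be expressed in a few lines of Python; the names `rbf_kernel`, `svm_predict`, `support_vectors`, and `alphas` are hypothetical:

```python
import math

def rbf_kernel(x, z, gamma):
    """RBF kernel; gamma controls the kernel width (the mapping phi is implicit)."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, z)))

def svm_predict(x, support_vectors, alphas, c, gamma):
    """Dual-form decision function: f(x) = sum_j alpha_j * K(x_j, x) + c."""
    return sum(a * rbf_kernel(sv, x, gamma)
               for sv, a in zip(support_vectors, alphas)) + c

# A larger gamma narrows the kernel; the penalty factor C would appear
# during training (it bounds the alphas), not in the prediction step.
```

The penalty factor and kernel width are exactly the two hyper-parameters that the optimizer in this paper tunes.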

Multi-Objective Grey Wolf Optimizer
The grey wolf optimizer [35] is the foundation for building a multi-objective grey wolf optimizer (GWO). The GWO algorithm is a meta-heuristic algorithm based on wolf hunting behavior [9,36]. Every wolf in the pack is a candidate solution to the problem, and the ideal, suboptimal, and alternative solutions are represented by the levels of the wolf swarm. Wolves approach their prey when they find it; the position equations are

$$D = |C \cdot X_p(t) - X(t)|, \qquad X(t+1) = X_p(t) - A \cdot D \quad (3)$$

where $D$ is the separation distance between the prey and the wolf, $X_p$ is the position of the prey, $X$ is the position of the grey wolf, $t$ is the current iteration, and $A$ and $C$ are coefficient vectors. GWO saves the top three solutions and uses Equations (4) and (5) to identify the optimum solution and continuously update the position of the grey wolf:

$$X_1 = X_\alpha - A_1 \cdot |C_1 \cdot X_\alpha - X|, \quad X_2 = X_\beta - A_2 \cdot |C_2 \cdot X_\beta - X|, \quad X_3 = X_\gamma - A_3 \cdot |C_3 \cdot X_\gamma - X| \quad (4)$$

$$X(t+1) = \frac{X_1 + X_2 + X_3}{3} \quad (5)$$

where $\alpha$, $\beta$, and $\gamma$ are the different levels of grey wolves. The newly created individual is compared with the archived individuals after each iteration. Furthermore, all individuals are categorized by the distance between their objective function values to avoid an overabundance of similar individuals. Second, the leader-wolf selection procedure is changed: the problem of directly selecting three non-dominated solutions with the Pareto technique [37] is overcome by using a roulette wheel to choose the archive's leader wolf. Equation (6) calculates the probability of each hypercube [38]:

$$P_i = \frac{c}{L_i} \quad (6)$$

where $c$ is a constant, $L_i$ is the number of Pareto optimal solutions in the $i$-th hypercube, and $P_i$ is the probability of selecting that hypercube.
In this paper, GWO was chosen over other optimization algorithms because of the following advantages [35,39]: it is easy to implement due to its simple structure; it has low storage and computation requirements; it converges faster because of the continuous reduction of the search space; it has few decision variables; and it is able to avoid local minima. With only two control parameters adjusting the algorithm's performance, it ensures better stability and avoids complexity.
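A minimal single-objective GWO sketch, following the standard position-update rules above, may clarify the mechanics; this is an illustration under assumptions (box-constrained search space, linearly decreasing control parameter), not the paper's multi-objective implementation, and the function and parameter names are hypothetical:

```python
import random

def gwo_minimize(f, dim, bounds, n_wolves=10, iters=50, seed=0):
    """Minimize f over the box [lo, hi]^dim with a basic grey wolf optimizer."""
    rng = random.Random(seed)
    lo, hi = bounds
    wolves = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_wolves)]
    for t in range(iters):
        wolves.sort(key=f)                       # alpha, beta, gamma = best three
        leaders = wolves[:3]
        a = 2.0 * (1 - t / iters)                # control parameter decreases 2 -> 0
        for i in range(n_wolves):
            new_pos = []
            for d in range(dim):
                x = 0.0
                for leader in leaders:
                    A = 2 * a * rng.random() - a         # coefficient vectors A, C
                    C = 2 * rng.random()
                    D = abs(C * leader[d] - wolves[i][d])
                    x += leader[d] - A * D               # Equation (4)
                new_pos.append(min(hi, max(lo, x / 3)))  # Equation (5), clamped
            wolves[i] = new_pos
    return min(wolves, key=f)
```

In the multi-objective version used here, the three leaders would instead be drawn from a Pareto archive via the roulette-wheel rule of Equation (6).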

Proposed Approach
As shown in Figure 1, the prediction system includes three phases: (1) data preprocessing, (2) developing an optimization and prediction system, and (3) performance analysis.

Data Preprocessing
The main purpose of this step is to remove artifacts from the dataset (variables that have many missing data points, outlier data, and skewed data) to improve prediction system performance. After that, the dataset is split into two parts: one part is used for training the regression model, while the other is used for the final evaluation of the model. The data processing steps are as follows:
(a) Exploratory Data Analysis (EDA)
EDA is a common approach [39] for explaining the fundamental characteristics of a dataset by studying its features, usually with visual methods. Histograms and the Interquartile Range (IQR) algorithm were employed to investigate dataset artifacts.

(b) Removing the Outliers
Outliers were eliminated by applying winsorization [40], a statistical modification that reduces the impact of potential outliers by limiting extreme values in the data. This research investigates different threshold values for removing outliers; the choice of threshold is examined in the Results and Discussion section.
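One common winsorization variant clips values at mean ± k·σ; the exact scheme used in the paper is not specified, so the following is a hedged sketch in which `k` is assumed to play the role of the tested thresholds (3 to 4.8):

```python
def winsorize_std(values, k):
    """Clip values to [mean - k*std, mean + k*std] (population std)."""
    n = len(values)
    mean = sum(values) / n
    std = (sum((v - mean) ** 2 for v in values) / n) ** 0.5
    lo, hi = mean - k * std, mean + k * std
    return [min(max(v, lo), hi) for v in values]
```

Unlike outright deletion, winsorization keeps the sample size while capping the influence of extreme values.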

(c) Skewness Reduction
Data skewness has a significant impact on the predictive model's accuracy. Skewed data have a distribution that is pushed to one side or the other rather than being normally distributed. Therefore, to improve accuracy, skewness should be removed from the variables. A log transformation is used to reduce skewness. The log transformation belongs to the more general family of Box-Cox transformations (Box-Cox, 1964) [41]; a Box-Cox transformation $T_\lambda$ is defined as

$$T_\lambda(x) = \begin{cases} \dfrac{x^\lambda - 1}{\lambda}, & \lambda \neq 0 \\ \log(x), & \lambda = 0 \end{cases}$$

where $x$ is a positive variable.
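The Box-Cox family, including the λ = 0 log case applied in this study, can be sketched as follows (a minimal illustration, not the authors' code):

```python
import math

def box_cox(x, lam):
    """Box-Cox transform T_lambda(x); requires a positive variable x."""
    if x <= 0:
        raise ValueError("Box-Cox requires a positive variable x")
    if lam == 0:
        return math.log(x)   # the log transformation used here to reduce skewness
    return (x ** lam - 1) / lam
```

For right-skewed variables such as the cloud cover data, the log case compresses the long upper tail toward a more symmetric distribution.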


Development Regression Algorithm
After preprocessing is complete, the processed data are entered into the SVM-GWO hybrid regression system. This model comprises two operations, training and optimization of the regression model, which are synchronized and executed on the training set. SVM training and SVM optimization are carried out simultaneously; when the optimization is complete, the SVM training is also complete. In the traditional (single-objective) optimization problem, researchers often create one objective function to reduce the training set's prediction error. Since the multi-objective optimization used in this article considers both the precision and the stability of the prediction, two objective functions are defined:

$$Obj_{Acc} = RMSE_{training} = \sqrt{\frac{1}{S_t}\sum_{k=1}^{S_t}(A_k - P_k)^2}$$

$$Obj_{St} = std(A_k - P_k), \quad k = 1, 2, \cdots, S_t$$

where $Obj_{St}$ and $Obj_{Acc}$ are the stability and prediction-precision objective functions, respectively; $RMSE_{training}$ is the RMSE on the training set; $S_t$ is the sample size of the training set; $A_k$ and $P_k$ are the actual and predicted values at time $k$; and std is the population standard deviation. Figure 2 shows the flow chart of the hybrid regression system (SVM-GWO).
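The two objective functions can be computed as follows, a sketch that assumes the population standard deviation of the errors as the stability measure, consistent with the definitions above:

```python
def objectives(actual, predicted):
    """Return (Obj_Acc, Obj_St): training RMSE and population std of the errors."""
    n = len(actual)
    errors = [a - p for a, p in zip(actual, predicted)]
    obj_acc = (sum(e * e for e in errors) / n) ** 0.5          # Obj_Acc = RMSE
    mean_e = sum(errors) / n
    obj_st = (sum((e - mean_e) ** 2 for e in errors) / n) ** 0.5  # Obj_St
    return obj_acc, obj_st
```

Note that a model can have a nonzero RMSE but zero error spread (a constant bias), which is exactly why the two objectives are optimized jointly rather than collapsed into one.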


Figure 2. Flow chart of hybrid regression system.

Performance Analysis
The prediction performance is measured using two commonly used error metrics: RMSE (root-mean-square error) and R² (the adjusted coefficient of determination). Their expressions are

$$RMSE = \sqrt{\frac{1}{n}\sum_{i=1}^{n}(P_i - A_i)^2}$$

$$R^2 = 1 - \frac{\sum_{i=1}^{n}(A_i - P_i)^2}{\sum_{i=1}^{n}(A_i - \bar{A})^2}$$

where $P_i$ and $A_i$ are the predicted and actual values of the $i$-th record, and $n$ is the total number of records (61).
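These two metrics correspond to the following straightforward implementations (illustrative only):

```python
def rmse(actual, predicted):
    """Root-mean-square error between actual and predicted records."""
    n = len(actual)
    return (sum((a - p) ** 2 for a, p in zip(actual, predicted)) / n) ** 0.5

def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_a = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_a) ** 2 for a in actual)
    return 1 - ss_res / ss_tot
```

RMSE is in the same units as the temperature (°C), while R² is dimensionless, which is why the paper reports both.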

Results and Discussion
This section presents the performance of the suggested optimization and prediction system for minimum and maximum temperature forecasts, as well as the results obtained from preprocessing the data. Figure 3 shows the box plots of the variables Tmax_present, Tmin_present, RHmin_LDAPS, Tmax_LDAPS, Tmin_LDAPS, and LHF_LDAPS. As the figures show, there are many outliers. The box plot is a data preparation method for detecting extreme values and outliers. It computes dispersion by dividing a rank-ordered dataset into four equal portions known as quartiles [9]. Q1, Q2, and Q3 are the values that divide the sections, with Q1 and Q3 denoting the middle values of the first and second halves of the rank-ordered dataset, respectively, and Q2 denoting the median of the entire set. The Interquartile Range (IQR) is calculated by subtracting Q1 from Q3. Data instances that fall below Q1 − 1.5 × IQR or above Q3 + 1.5 × IQR are considered outliers. Removing these outliers enhances prediction accuracy; commonly, data more than three standard deviations from the mean are discarded. We used winsorization to remove outliers and experimented with multiple threshold values ranging from 3 to 4.8. We found that a threshold of 4.2 achieved high prediction performance without discarding a large part of the dataset (the dataset was reduced by 5.8%).
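The quartile-based outlier rule described above can be sketched as follows; quartile conventions vary, so this uses one common variant (linear interpolation between ranks) and the helper name `iqr_bounds` is illustrative:

```python
def iqr_bounds(values):
    """Return the (lower, upper) outlier fences Q1 - 1.5*IQR and Q3 + 1.5*IQR."""
    s = sorted(values)
    n = len(s)

    def quartile(q):
        # linear interpolation between adjacent ranks
        idx = q * (n - 1)
        lo, hi = int(idx), min(int(idx) + 1, n - 1)
        return s[lo] + (s[hi] - s[lo]) * (idx - lo)

    q1, q3 = quartile(0.25), quartile(0.75)
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr
```

Any data instance outside these fences would be flagged and then capped by winsorization rather than simply dropped.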
Figure 6 shows the cloud cover data (CC1_LDAPS, CC2_LDAPS, CC3_LDAPS, and CC4_LDAPS). All 6-h splits are right-skewed, and the majority of their values are close to zero. Figure 7 shows the variables' data distributions after applying a log transformation to remove skewness; compared with Figure 6, the skewness is removed. Removing data skewness has a significant effect on the accuracy of our predictive model.
To emphasize the benefits of the proposed model, two benchmark models are defined for comparison with our proposed method: the first is the classic SVM model, and the second is PSO combined with SVM. Reference [42] presents the theory of these models as well as the reasons for selecting them.
The forecasting accuracy of the proposed model (SVM-GWO) and the selected benchmark models (SVM and SVM-PSO) is shown in Figure 8. All models were trained using data from 2013 to 2016, and weather data for 2017 were predicted. The RMSE values of all models were determined for the prediction of next-day maximum and minimum air temperatures in 2017.
In Figure 8, the average RMSE lines for Tmax_Forecast and Tmin_Forecast show that SVM prediction performance improved when using PSO and GWO. When SVM is paired with GWO, the average RMSE is lowered by around 51%, and when SVM is combined with PSO, the RMSE is decreased by about 31%. These findings confirm the GWO-SVM model's superior predictive capacity, leading to the conclusion that the suggested model has the best prediction accuracy of all the models tested.
In addition, a scatter plot (Figure 9) was created to show the correlations between the actual observations and the predictions. The proposed model's predicted temperatures were strongly correlated with the observations, with R² values of 0.91 for Tmax_Forecast and 0.93 for Tmin_Forecast, whereas the SVM-PSO predictions were weakly correlated, with R² values of 0.56 for Tmax_Forecast and 0.51 for Tmin_Forecast.
Yearly hindcast validation of the proposed model for predicting both Tmax_Forecast and Tmin_Forecast was performed to support the prior findings, and the results were compared with the previous work of Dongjin Cho et al. [14], who predicted these values with other machine learning models and concluded that the multi-model ensemble (MME) model generalized better under hindcast validation than the LDAPS model and the other machine learning models. In addition, the SVM-GWO model's results were compared with those of the LDAPS model. Hindcast validation was performed for each year from 2015 to 2017: for each year, data from January 1 to July 31 were used to train the prediction models, which then forecasted the period until the end of the year. Table 2 shows the yearly hindcast validation of the three models.
In particular, the largest improvements were in 2016 for forecasting Tmax_Forecast and in 2017 for forecasting Tmin_Forecast: the SVM-GWO forecast had its lowest RMSE of 0.93 in 2016 for Tmax_Forecast and 0.696 in 2017 for Tmin_Forecast, whereas the MME reference model's lowest RMSE was 1.45 for Tmax_Forecast in 2016 and 0.84 for Tmin_Forecast in 2017. In most years, our model had a lower RMSE than the other models for forecasting both Tmax_Forecast and Tmin_Forecast. Overall, SVM-GWO reduces the RMSE from 2.09 to 0.95 for Tmax_Forecast and from 1.43 to 0.82 for Tmin_Forecast. This confirms the high predictive capacity of the presented hybrid model and leads to the conclusion that the SVM-GWO model offers better, more accurate predictions than traditional machine learning models and can be a reliable method for predicting future maximum and minimum air temperatures.
Figure 10 compares the daily RMSE and R² time series of the LDAPS model and the proposed SVM-GWO model for the last year of the investigation period (2017). The time series is indexed by day of year (DOY), with RMSE and R² values for the LDAPS and SVM-GWO forecasts of Tmax_Forecast and Tmin_Forecast. Both Tmax_Forecast and Tmin_Forecast forecasts by SVM-GWO generally showed a lower daily RMSE than the LDAPS model (Figure 10a,c) because the time series of the SVM-GWO-corrected temperatures was closer to the observations. For the Tmax_Forecast forecast, the lowest RMSE for the SVM-GWO model occurred on DOY 227 and the highest on DOY 212, while for Tmin_Forecast the lowest and highest RMSE occurred on DOY 197 and 222, respectively. Generally, the SVM-GWO model had a higher R² than the LDAPS and MME models for both Tmax_Forecast and Tmin_Forecast forecasts (Figure 10b,d). As a result, the SVM-GWO model accurately simulates the temperature distribution within a metropolis.

Conclusions
The LDAPS model outputs of Tmax_Forecast and Tmin_Forecast in the Seoul Metropolitan Area are improved using a hybrid model that combines a grey wolf optimizer (GWO) and a support vector machine (SVM). The forecast models were created using 14 LDAPS model forecast variables, the in situ observations of the maximum and minimum air temperatures, and five auxiliary variables as input variables. The four machine learning algorithms and the LDAPS model were evaluated using hindcast validation. Compared to PSO and the conventional SVM, the grey wolf optimizer showed its strength by generating more stable and accurate results, with the average RMSE of SVM for Tmax and Tmin prediction lowered by roughly 51% when combined with GWO and 31% when combined with PSO. In addition, the hybrid model (SVM-GWO) improved the performance of the LDAPS model by lowering the RMSE values for Tmax_Forecast and Tmin_Forecast forecasting from 2.09 to 0.95 and from 1.43 to 0.82, respectively. Despite the need for further research with other NWP models, this strategy is likely to be successful when applied to other NWP models for the study area that deterministically predict next-day temperatures.
