Application of Semi-Empirical Models Based on Satellite Images for Estimating Solar Irradiance in Korea

: The application of solar energy as a renewable energy source has signiﬁcantly escalated owing to its abundance and availability worldwide. However, the intermittent behavior of solar irradiance is a serious disadvantage for electricity grids using photovoltaic (PV) systems. Thus, reliable solar irradiance data are vital to achieve consistent energy production. Geostationary satellite images have become a solution to this issue, as they represent a database for solar irradiance on a massive spatiotemporal scale. The estimation of global horizontal irradiance (GHI) using satellite images has been developed based on physical and semi-empirical models, but only a few studies have been dedicated to modeling GHI using semi-empirical models in Korea. Therefore, this study conducted a comparative analysis to determine the most suitable semi-empirical model of GHI in Korea. Considering their applicability, the Beyer, Rigollier, Hammer, and Perez, models were selected to estimate the GHI over Seoul, Korea. After a comparative evaluation, the Hammer model was determined to be the best model. This study also introduced a hybrid model and applied a long short-term memory (LSTM) model in order to improve prediction accuracy. The hybrid model exhibited a smaller root-mean-square error (RMSE), 97.08 W/m 2 , than that of the Hammer model, 103.92 W/m 2 , while producing a comparable mean-bias error. Meanwhile, the LSTM model showed the potential to further reduce the RMSE by 11.2%, compared to the hybrid model.


Introduction
Solar photovoltaic (PV) power has been widely utilized to cope with global warming and climate change, which are dominantly caused by the use of fossil fuels. Owing to its simple adaptation and low maintenance cost, solar PV technology is also becoming more common compared to other clean energies. However, the intermittent nature of solar irradiance is a major drawback that leads to unpredictable and uncontrollable electricity production from solar PV systems. Solar energy strongly depends on local meteorological factors, such as clouds and aerosol molecules, as well as sun position at a specific site. Therefore, solar irradiance data are critical for the optimization of solar PV power generation and the operation of solar PV systems [1][2][3][4].
The most reliable solar irradiance data are obtained from direct measurement using a pyranometer and pyrheliometer on the ground. Nowadays, the technology for irradiance measurement has advanced to the point that real-time data can be automatically collected and monitored. Nonetheless, ground measurement systems are expensive and require careful maintenance. As a result, ground measurement has limited spatial coverage, and the measurement database is still deficient for satisfying the demands of a PV-using society. Despite the attempts of some scholars to overcome the limited spatial coverage of direct measurement using conventional approaches, such as interpolation, the accuracy of the interpolated data is still not adequate [5,6].
Satellite data play an essential role in gathering information on solar irradiance on the earth's surface. Zelenka et al. [5] showed that the accuracy of satellite-derived data is comparable with that of interpolated data from a station 25 km from the measurement site. The national solar radiation database in the United States has already taken advantage of satellite-driven irradiance data for nationwide coverage [7]. Geostationary satellites, whose orbit speed is identical to that of earth rotation, can repeatedly capture images of a specific area on the earth's surface. The new generation of geostationary satellites are capable of delivering data images with a high spatial resolution of 1 × 1 km 2 for each visible channel image up to a 15-min time interval. Therefore, the solar irradiance data extracted from geostationary satellite images have a high potential for efficiently and inexpensively satisfying the demands of the solar energy sector.
Many researchers have developed and improved models to extract global horizontal irradiance (GHI) from satellite images [1][2][3]6,[8][9][10][11][12]. The GHI estimation models are commonly in a physical or semi-empirical form. The physical models theoretically compute the extraterrestrial solar irradiance transmission through the atmosphere by calculating the radiative transfer equation. On the other hand, the semi-empirical models count on a linear interrelation between the satellite-recorded surface albedo and the clear-sky index, along with the consideration of important physical phenomena.
In terms of reliability and versatility, physical models have shown higher performance than semi-empirical models, as shown in Table 1. However, physical models technically require more input of atmospheric data, such as aerosol optical depth (AOD), ozone content, and precipitable water. Such atmospheric data are often less accessible at a specific location than ground measurement data of solar irradiance. In contrast, semi-empirical models rely on simple assumptions and statistically determined coefficients. As a result, semi-empirical models require only a few input variables, and furthermore most of them are geometric factors. Semi-empirical models offer easier calculation, simpler implementation, and faster operation [13,14]. Due to their simplicity, semi-empirical models are often favorable for forecasting solar irradiance [8]. GHI forecasting means that the future values of input variables need to be known. The forecast accuracy of the physical model can quickly decrease with the number of input variables unless their values are properly forecast.  Cano et al. [9] proposed a pioneering form of the semi-empirical model, known as the Heliosat-1 model, which has become a fundamental technique for numerous following models. They assumed that the cloud index (CI) can be estimated based on the brightness of satellite sensor data. They calculated GHI by simply reducing clear-sky GHI according to the CI. Since then, Rigollier et al. [3], Perez et al. [2], Hammer et al. [8], Beyer et al. [10], and other scholars made improvements on the Cano model. As the well-known semi-empirical models were mainly developed using data in Europe and North America, many researchers evaluated and modified them for application to various other regions [6,11,12]. Table 1 implies that, based on Köppen-Geiger climate classification [24,25], the semiempirical models were mostly dedicated to continental and temperate areas. Only a few studies addressed a hot summer continental climate, which is categorized as the "Dwa" subtype. In the northern region of China, Yang et al. [4] predicted GHI 1, 2, and 3 h ahead by using particle image velocimetry and the Rigollier method. Jia et al. [16] combined the McClear clear-sky irradiance [18] and Rigollier models to estimate GHI at Zhangbei. Both the studies utilized the Fengyun-4 satellite system.
In Korea, several studies have attempted to extract GHI from Korea's Communication, Ocean, and Meteorological Satellite (COMS) system [15,17,19,22,23]. Since the COMS system was developed, most studies have focused on physical models. Zo et al. [19] initiated GHI modeling by using COMS data as input and introducing a physical model, which is called the Gangneung-Wonju National University (GWNU) model. Yeom et al. [22] applied the Kawamura model [26] and generated a solar irradiance map by considering topographical factors, such as terrain slope. In addition, Kim et al. [23] estimated GHI using their own physical model, which is called the CLAVR-x model. Still, as far as we know, only one such study has employed the semi-empirical method. Choi et al. [15] added a preprocessing algorithm into the Rigollier model to distinguish cloud from ground.
Considering the importance of semi-empirical models and the lack of relevant studies evaluating them in the hot summer continental climates, the present study focuses on two objectives. The first objective of this study is to conduct a comparative analysis of the existing semi-empirical solar irradiance models using data from Korea, and thus identify a semi-empirical model that is suitable for the hot summer continental climate. The second objective is to investigate the improvements in accuracy that could be achieved by combining formulas from various models and by applying a machine learning technique for replacing climate-dependent coefficients in the semi-empirical model. The fulfillment of these objectives will contribute to accurately estimating a regional database of solar radiation for the hot summer continental climate.
This paper consists of five sections. Section 2 presents ground station measurement of solar irradiance data, COMS satellite images, and AERONET data employed in this study. In Section 3, semi-empirical modeling methods, which include calculations of the cloud index, clear-sky irradiance, and GHI conversion, as well as the machine learning method, are explained. In addition, error metrics for accuracy evaluation of each model are listed. Detailed results, analyses, and discussions are presented in Section 4. Finally, Section 5 provides a summary of the major findings from the study.

Satellite Images, Ground Measurement, and AERONET Dataset
The COMS has become a significant player in providing reliable weather data and atmospheric conditions since the Korean government launched its first geostationary satellite in 2011 at 128.2 • E. The satellite system records the Korean peninsula and Asia-Pacific region, including the Australian continent, China, and Southeast Asia.
The COMS system carries three infrared (10.8 µm, 12 µm, and 3.7 µm), water vapor (6.7 µm), and visible (0.67 µm) channels to observe the earth's surface. The images that COMS produced are composed of three levels of quality: level 1A, level 1B, and level 2. The level 1A images are satellite images processed through radiometric, geometric, and georeferencing calibration. In level 1B, images are then screened from distortion and nonuniform pixel intensity. Finally, images in level 2 are extracted into applicable maps, such as for forest fires, fog observation, and solar irradiance maps. Technically, the COMS satellite produces 16-bit visible images at 15-min intervals, and the satellite system provides a spatial resolution of 1 × 1 km 2 for the visible sensor and 4 × 4 km 2 for the three infrared channels. In this work, we utilized the hourly level 1B of satellite images ( Figure 1) obtained from 1 January to 31 December 2018 and selected the pixel from each image that was closest to the ground measurement in order to validate our model. This study performed GHI measurements from a ground station located in the northern part of Seoul (latitude, 37.612 • N; longitude, 126.996 • E; and altitude, 141 m). The ground measurement system comprises one pyranometer (MS-802 model) for the GHI and one pyrheliometer (MS-57 model) for direct normal irradiance (DNI). Both of the instruments, manufactured by the Eko company in Japan, have a sensitivity around ±7 µVm 2 /W for the GHI and DNI, respectively [27,28]. Identical to the COMS satellite data, hourly intervals for the ground data were also locally archived from 1 January-31 December 2018.
In this study, quality control was applied based on the procedure from Gueymard and Ruiz-Arias [29] to guarantee reliable data from the ground measurement. In particular, data that did not meet the following criteria were rejected: GHI > 0 W/m 2 3. GHI < 1.5G ext cos 1.2 θ z + 100 W/m 2 4. DNI ≥ 0 W/m 2 5.
DNI ≤ G ext , where θ z and G ext denoted the solar zenith angle and extraterrestrial irradiance on a normal surface, respectively. The data that did not meet the requirements were rejected. Note that the criterion of θ z > 80 • was applied to remove GHI values at low sun elevation, and DNI data only were used for the data quality control. The aerosol robotic network (AERONET) is a network of ground-based measurements providing continuous data on spectral AOD, precipitable water, and inversion aerosol in various aerosol regimes using a sun radiometer [30]. The sun-sky radiometer system of AERONET quantifies solar spectral measurements every 15 min through eight spectral bands at wavelengths of 340, 380, 440, 500, 675, 870, 940, and 1020 nm [31]. The AERONET dataset has three grades: level 1 for raw data, level 1.5 for data with clouds removed, and level 2 for quality-controlled data. In this study, clear-sky irradiance was calculated through the Linke turbidity factor (T L ) to consider the reduction of solar irradiance owing to atmospheric effects. The level 2 data of AERONET, which were collected from 1 January-31 December 2018, were employed and arranged into hourly intervals to determine the value of T L .

Global Horizontal Irradiance Models
In this study, from their accuracy, applicability, and versatility, well-known semiempirical models were chosen to generate the GHI over the Korea peninsula [1,3,8,11]. The Beyer, Rigollier, and Hammer models originally utilized the Meteosat imagery system and their results were validated through ground measurements located in Europe. Beyer et al. [10] modified the original Heliosat-1 model by introducing new variants for geometrical correction, backscatter correction, and clear-sky irradiance computation. It was revealed that the hourly insolation accuracy of the Beyer model has an RMSE of 120 Wh/m 2 (rRMSE of 16%). Rigollier et al. [3] improved the Heliosat-1 method by applying calibrated radiances instead of digital numbers. The application of the calibrated radiances as input opened the opportunity for substituting several empirical components of Heliosat-1 into the physical computation. The results of the Rigollier model revealed RMSEs and rRMSEs of 62 Wh/m 2 (45%), 96 Wh/m 2 (27%), and 103 Wh/m 2 (18%) for January 1995, April 1995, and July 1994, respectively.
The Hammer model was established to forecast the GHI several hours ahead. A motion vector field from two consecutive satellite images can be generated to determine cloud motion. The vector field was computed with the current images and extrapolated up to 120 min ahead with a 30-min interval. Finally, the GHI could be forecast by applying the extrapolated pixels of the satellite image to the Hammer model. The relative RMSEs (rRMSE) were 35% and 40% for 1-h and 2-h forecast horizons, respectively. Contrary to the other three models, the Perez model used the geostationary operational environmental satellite (GOES) system in the United States. Validation of the Perez model was then conducted against ten ground measurements from different climatic types of the American region. The RMSE of the Perez model varied from 44 to 118 W/m 2 , depending on the location of the ground measurement. Below are the detailed explanations of the procedures used for each model.

Cloud Index Formula
The schematic diagram in Figure 2 illustrates the flow to derive the GHI using the semiempirical models. The two important procedures are the CI estimation, which represents the cloudiness over the earth's surface based on a satellite image, and the calculation of the clear-sky irradiance using atmospheric data. Thus, the GHI can be determined as the decrease in the clear-sky irradiance due to the CI. Normalization of satellite image data is mandatory to obtain a CI that is independent from optical and geometric effects [1]. The air mass and backscatter effects mainly cause optical distortion in image data. The air mass effect occurs because sunlight is scattered and absorbed by particles when it passes through the atmosphere. As a result, the closer the sun is to the horizon, the stronger the air mass effect is. The backscatter effect, which is also known as the hot-spot effect, implies that sunlight more significantly goes back in the sun's direction after scattering by atmospheric particles. Thus, the sensor system overestimates the ground albedo when the satellite is close to the sun.
Each semi-empirical model has its own correction formula, which was originally developed using its satellite systems and local stations. The conversion formulas from raw satellite pixel data (C sat ) to normalized data (CC) are presented in Table 2. The Beyer and Hammer models include a correction factor (C 0 ) for the air mass and backscattering effects, where C 0 is determined through θ z , backscattering angle (ψ), and satellite zenith angle (ϕ).
The production of the solar constant (I 0 ) and the eccentricity correction of the sun-to-earth distance (ε) represent the extraterrestrial irradiance.
The Rigollier model extracted its correction factor (C 0R ) using some equations from the European Solar Radiation Atlas (ESRA) model. To obtain CC R , clear-sky diffuse irradiance (DHI cR ), transmittance from the sun to the earth (T), and transmittance from the earth to the satellite system (T sat ) were computed. The parameters θ z , ϕ, and T L were the key input variables for determining the energy that is scattered in the atmosphere and evaluated via the ESRA model. The maximum total irradiance of the visible sensor (I 0met ) was obtained from the convolution of the spectral distribution of I 0 by the spectral sensitivity curve of the radiometer [3]. To obtain further details on the normalization process, the reader is encouraged to refer to Rigollier et al. [3] and the ESRA [32].
Contrary to the other three models, the Perez model introduced dependence on the sun elevation (γ) and air mass (AM) to normalize pixels from air mass effect. Perez et al. [2] evaluated the backscatter correction by deriving different lower boundaries of the dynamic range from several years of recorded satellite data. However, the present study only used the data of a single year, and the backscatter effect was not considered.
In addition to the models chosen in this study, more models are available throughout the literature. The combination of formulas used by various models was attempted to investigate the possibility of increasing prediction accuracy. As a result, the normalization equations provided by Zelenka et al. [5], the clear-sky irradiance equation given by Kasten [33], and the GHI conversion equation proposed by Perez et al. [2] were combined to create a single model. In this study, this model will be referred to as the hybrid model. Originally, Zelenka et al. [5] used θ z and ψ as the correction factor (C 0M ) to normalize ground albedo (C min ).
Clouds are the main factor attenuating solar irradiance when sunlight penetrates the atmospheric layer. The CI, which is used to represent the amount of clouds in the sky, can be estimated via a relative difference in the normalized pixel value between the lower and upper boundaries of its dynamic range. The lower and upper boundaries can be interpreted as the earth's surface and cloud reflectance, respectively. Cloud index can be determined according to the previous studies [8,10,34]: where C max and C min denote the highest and lowest values of CC, respectively. The CI value ranges from zero, which represents the darkest condition (no clouds) to unity, which represents the brightest condition (overcast). For the Rigollier model, C max should be obtained in the same way as CC R , but using the effective cloud albedo formulation, as follows [3]:

Clear-Sky Irradiance Formula
Another essential element of semi-empirical models is the determination of the maximum solar irradiance limit, i.e., the clear-sky irradiance (GHI c ). Under a cloudless condition, the solar irradiance attenuation depends on the concentration of absorbers and scatterers, such as aerosols, gases, and water vapor, inside the atmosphere. Most clear-sky models address solar attenuation through a derivation from atmospheric components and geometric position. With regard to the semi-empirical approach, the complexity of the clear-sky irradiance model, which is caused by rapid changes in aerosols, water vapor, and other molecules, is relatively reduced. The atmospheric variables can instead be simplified into a single turbidity variable of T L [35][36][37]. In this study, well-known clear-sky irradiance models (Table 3) were selected to examine GHI c . Table 2. Pixel normalization formulas for obtaining an independent cloud index (CI) (step 2). Each "raw" pixel (C sat ) is subjected to this normalization before CI extraction.

Model Equation
Beyer Rigollier Hammer The Beyer model applies a straightforward clear-sky model based on only a geometric calculation. This model is called the Bourges clear-sky formula and uses only θ z to summarize the attenuation of extraterrestrial solar irradiance during transmission in the atmosphere [10]. The model was purely developed from an empirical correlation between ground measurement, sun position, and site location [38].
The clear-sky irradiance of the Rigollier model, which is known as the ESRA model, is decomposed into direct and diffuse components, considering T L and the Rayleigh scattering factor (δ R ). The clear-sky irradiance formula of the Hammer model is similar to that of the Rigollier model. The clear-sky DNI equations in both the Rigollier and Hammer models are identical [39], but the clear-sky DHI equations are different.
Owing to the complexities in the derivation of the clear-sky GHI from the Rigollier model, it was preferable to adopt a simpler formula, successfully applied in previous literature. Here, the Kasten clear-sky model was selected to determine the clear-sky GHI of the hybrid model. The model conserved site elevation (z) and atmosphere turbidity for a site. By including z in the equation, the Kasten model demonstrated its advantage in investigating clear-sky GHI at high elevations. In the Perez model, the clear-sky GHI is calculated based on the study by Ineichen and Perez [36], in which they upgraded the original Kasten model. This new variant also considers T L and z to enhance its suitability for observational data from the West of the American continent.
The most important input parameter in the clear-sky model is the Linke turbidity, which accounts for atmospheric conditions. The Linke turbidity is also a convenient tool for simplifying the degree of atmospheric attenuation due to absorption and scattering. Theoretically, T L can be evaluated via either a direct or indirect formula [40]. The direct formula, which is called the pyrheliometric formula, commonly exploits DNI measurement data, AM, and δ R [2]. In contrast, the indirect formula, which is called the parameterized formula, uses atmospheric components, such as AM, precipitable water vapor (w), angstrom turbidity factor (β), and AOD [41]. After testing various formulas of T L , the model introduced by Remund et al. [41] (Equation (13)) turned out to be optimal and was used throughout this study: T L = (1.8498 + 0.2425w − 0.0203w 2 ) + (15.427 + 0.3153w − 0.0254w 2 )β (13) Table 3. Clear-sky irradiance formulas used for each semi-empirical model.

GHI Conversion Formula
After CI and GHI c are given, the GHI can be calculated from GHI c by appropriately reducing it according to CI. The reduction factor is called the clear-sky index (k c ) and is expressed as a correlation function. Finally, the conversion equations are presented in Table 4. GHI = k c GHI c (32) Table 4. Global horizontal irradiance (GHI) conversion used for each semi-empirical model.

Model Equation
Beyer Rigollier Hybrid Table 5 presents the input variables of all the semi-empirical models utilized in this study. Compared to the other three models, the Beyer model is the simplest and does not require T L to account for the atmospheric conditions. On the other hand, the Rigollier and Hammer models are the best options in terms of versatility and comprehensiveness. These models allow generating three components at once, i.e., GHI, DNI, and DHI, under clear skies. They were also demonstrated to be accurate in major areas of Europe and Africa, which include regions with a similar climate to Korea [6,12]. The Perez model could also be a prudent candidate for application in Korea, based on the number of input variables. Note that one variable is implicit in the Perez model; Perez et al. [2] supposed that C min dynamically changed, rather than being a constant throughout a year and thus determined it according to historical data. Consequently, this step to select C min can be regarded as an independent variable. In this study, this step was not implemented. The hybrid model also takes four input variables for accuracy while trying to circumvent the complexities in other models. Table 5. Comparison of input variables in semi-empirical models.

Model
Input Total

Long Short-Term Memory Model
The long short-term memory (LSTM) model is a variant of the recurrent neural network model that enables learning order dependence in sequential data [42]. The model was successfully applied for temporal and ordinal data, including image captioning, machine translation, and speech recognition [43][44][45]. Although the LSTM model works in similar steps to the recurrent neural network model, the procedures inside the cell are different. As shown in Figure 3, LSTM contains three main gates: input, output, and forget gates with internal memory. Each cell memorizes values in certain time intervals and these three gates control the information that comes in and out of the cell. The gates and internal memory of LSTM are used to resolve the vanishing and exploding gradient problems that often occur in recurrent neural network models. The operational formulas of the LSTM method and definitions of notation are as follows: • X t is the input vector to the memory cell at time t. • σ is the sigmoid function. • b is the bias vector. • W, U and V are weight matrices. The sigmoid function, σ in Equations (43)- (45) generates values from zero to one. A value of one signifies that the input data are available to transfer through the gate, while the value of zero implies the opposite [43,46]. S t via tanh function in Equation (46) produces a new value at time t, which ranges from −1 to 1. The value of C t in Equation (47) is obtained from the inputs of i t and f t , and the multiplication between o t and tanh(C t ) in Equation (48) results in the final output, h t .
The LSTM model showed the best performance compared to the other machine learning methods for GHI prediction. Rajagukguk et al. [43] mentioned that the RMSE of the LSTM model ranged from 108.88 W/m 2 to 30.21 W/m 2 , depending on location, input parameter, forecast horizon, and time interval. Thus, this study applied LSTM for solar irradiance forecasting in order to compare it with semi-empirical models, which contain regression equations. The inputs for the LSTM model were C sat , θ z , T L , and ψ, whereas the constant value of z was ignored. The data from January-September 2018 were used for training, and those from October-December 2018 were selected for data tests.

Error Metrics for Model Validation
Determining statistical accuracy is necessary to quantify whether a model underestimates or overestimates the GHI. This study validated the accuracy through MBE, RMSE, rRMSE, and rMBE determination. The formulas were defined, respectively, as follows: where n denotes the total number of data points. The RMSE indicated how close the predicted values were to the observational data with help from a regression line. The RMSE always produces a positive value. On the other hand, the MBE demonstrates the average bias in a model. The MBE is a sum of predicted data minus the observations. Hence, a positive MBE value indicates an overprediction, whereas a negative one represents an underprediction.

Results and Discussion
The vital step in determining CI is the pixel normalization to compensate for backscatter and air mass effects, which often create bias in images. Thus, the normalization provides the "true" brightness of the earth's surface. Note that the digital number for COMS pixels varies from 0 to 1023. Figure 4 presents box-and-whisker plots of the pixel values after the normalization of each model was conducted. The normalized pixels were expected to demonstrate a constant dynamic range of the upper and lower boundaries. In all the models, the lower boundary was relatively constant, except at low backscatter angles. In other words, when the backscatter angle reached 0 • , i.e., the sun was close to the satellite, the lower boundary tends to become slightly increased. The constant lower boundary indicates that the albedo for the earth's surface had not change significantly. The upper boundaries, which represent overcast conditions, were also expected to remain constant. The variations of the upper boundaries were larger than those of the lower boundaries. Moreover, large maximums and some outliers beyond the maximum appeared. Consequently, the determination of the upper boundaries is more uncertain, and the GHI estimation under overcast conditions tends to be less accurate. Compared to the other three models, the Beyer model exhibited large deviations in terms of the interquartile range and the maximum. Based on the range of the normalized pixel counts, as shown in Figure 4, CI was determined according to Equation (1). CI indicates an attenuation factor from clear-sky GHI.
Appl. Sci. 2021, 11, x FOR PEER REVIEW 14 of 23 was unable to estimate the GHI when 0.75 T k > . It is obvious that the Rigollier model underestimates when T k is small, and the opposite when T k is large. Error metrics for the comparisons in Figure 6 are summarized in Table 6. As indicated in Figure 6b, MBE values confirmed the biased estimation by the Rigollier model for different T k levels. The Hammer model also exhibited a similar trend to the Rigollier model, even though MBE values were reduced. Probably, the change from underestimation to overestimation as T k increases is a common feature of Helisat-2 models. On the other hand, RMSE generally decreases with T k , which implies difficulties in generating accurate CI or limitations of CI-based modeling.  The derivations of GHI c against γ are visualized in Figure 5. In general, all the models showed similar trends, where GHI c escalated along with γ. However, each model also yielded different dispersions due to their basic formulas. As previously discussed, the Beyer model does not consider atmospheric conditions via T L . Consequently, the dispersion of the Beyer model results was the smallest among the models. The Rigollier and Hammer models produced a similar trend, while the latter showed slightly larger dispersions. Contrarily, the dispersion in the Perez and hybrid models was significantly wide. Note that a small or large dispersion does not imply that the GHI c has good accuracy. Instead, it indicates that T L has significant effects in those two models. The maximum and minimum values of GHI c also varied among the models. The  Based on CI and clear-sky irradiance, as shown in Figures 4 and 5, GHI values under all sky conditions were derived and compared with the measurement data in Figure 6. The comparison was distinguished on different levels of the clearness index (k T = GHI model /I 0 cos θ z ). Note that k T ≤ 0.25, 0.25 < k T ≤ 0.5, 0.5 < k T ≤ 0.75, and k T > 0.75 were described as cloudy, partially cloudy, clear, and clearest sky conditions, respectively. The scatterplots in Figure 6 indicate that all the semi-empirical models worked well within a k T range from 0.25 to 0.75. The Beyer model, the simplest model, was unable to estimate the GHI when k T > 0.75. It is obvious that the Rigollier model underestimates when k T is small, and the opposite when k T is large. Error metrics for the comparisons in Figure 6 are summarized in Table 6. As indicated in Figure 6b, MBE values confirmed the biased estimation by the Rigollier model for different k T levels. The Hammer model also exhibited a similar trend to the Rigollier model, even though MBE values were reduced. Probably, the change from underestimation to overestimation as k T increases is a common feature of Helisat-2 models. On the other hand, RMSE generally decreases with k T , which implies difficulties in generating accurate CI or limitations of CI-based modeling. The accuracy for all sky conditions means the annual performance throughout 2018. All the models yielded promising results, in which they did not exceed 35.09% of rRMSE and −9.45% of rMBE. It was found that the Hammer model outperformed the other three semi-empirical models, with the best agreement of RMSE and MBE, as 103.92 W/m 2 and 0.09 W/m 2 , respectively. Despite the fact that the Hammer model performed the best, the differences in accuracy among the Hammer, Perez, and Beyer models were small. The deviation in the rRMSE between the Hammer and Perez models was only 2.1%, whereas that between the Hammer and Beyers models was only 2.64%. In contrast, the Rigollier model exhibited the largest error, with an MBE reaching −36.84 W/m 2 .
The hybrid model had slightly improved RMSE and rRMSE compared to the Hammer model, with the differences of 6.84 W/m 2 and 2.42%, respectively. Meanwhile, the MBE and rMBE of the hybrid model were slightly larger than those of the Hammer model, with the differences 1.51 W/m 2 and 1.78%, respectively. However, the rMBE of the Hammer model significantly varies with k T , such that rMBE exceeds −20% when k T ≤ 0.25. The rMBE of the hybrid model is fairly constant regardless of k T under the maximum value of 3.7%. The separation of GHI c into DNI c and DHI c , such as in Rigollier and Hammer models (refer to Table 3) does not guarantee better accuracy. As shown in Table 6, the Hammer and Rigollier models demonstrated contrasting accuracies, either very good or very poor, in this study. Although the Beyer model adopted a very simple formula that does not address atmospheric conditions via T L , it produced better accuracy than the Rigollier model. The hybrid model also took advantage of a simple clear-sky irradiance formula and improved accuracy. As a result, a simple model can be better than a complicated one if the clear-sky model is not sufficiently accurate. Figure 7 shows variations of solar irradiance under clear-sky and overcast conditions on exemplary days. In general, all the semi-empirical models were capable of estimating hourly variations. However, all the models consistently overestimated or underestimated, for instance, for the morning on 9 April, afternoon on 9 April, morning on 2 December, and so on. Such consistent estimations, regardless of model, demonstrate that either the atmospheric data to produce clear-sky irradiance are inaccurate or the assumption that only the cloud amount contributes to reducing solar irradiance is incorrect. Pollen and yellow sand dust often occur during the spring season in Korea, but clear-sky irradiance models in Table 3 may not suitably account for such regional features. More importantly, rain and snow are likely to affect solar irradiance. As evidenced by large RMSE values when k T is low from Table 6, the consideration of the effects of rain and snow can improve model accuracy. In order to investigate seasonal variations of model performance, monthly rRMSE and rMBE were plotted in Figures 8 and 9. Even though seasonal variations of rRMSE are not so noticeable in Figure 8, prediction accuracy was slightly better in the spring and fall seasons. Probably, the reason was that sky in both of the seasons tends to be clear in Korea. Unlike rRMSE in Figure 8, rMBE in Figure 9 reveals remarkable seasonal variations independent of the model. Except for the Beyer model, overestimation only occurs during the summer season. The closer to the winter season the time gets, the larger the underestimation becomes. This seasonal change of accuracy is caused by ground albedo change [2,47]. The location of interest, i.e., Kookmin University, is adjacent to the mountain, and the 1 × 1-km 2 COMS pixel corresponding to the location is likely to include both buildings and forest. Consequently, the combined effects of vegetation, soil, and building properties resulted in the ground albedo, C min . In winter, this issue becomes even more evident due to snow cover, which is often disguised as cloud. If the ground albedo change is considered, the solar irradiance model can improve prediction accuracy.  The Rigollier model consistently produced the largest error, and it only performed slightly better than the Beyer model during April and September 2018. Contrarily, the Hammer model produced smaller rRMSE and rMBE values than the Beyer, Perez, and Rigollier models throughout the 12 months. The lowest errors gained by the Hammer model were 18.8% for the rRMSE and 0.6% for the rMBE in April. The accuracy of the Perez model followed that of the Hammer model, with the best rRMSE and rMBE being 22.6% in February and 3.7% in September, respectively. The hybrid model consistently had a better accuracy than any of the four semi-empirical models for each month, with the lowest rRMSE and rMBE values at 17.1% and −0.1% during April, respectively. The hybrid model also showed higher accuracy compared with the other models during winter, as can be observed from the rRMSE values of 19.9%, 21.4%, and 18.5% for December, January, and February, respectively.
So far, the hybrid model has turned out to be more accurate than any other model. However, the development of such models is cumbersome and awkward because the hy-brid model in this study was created after using trial-and-error combinations of constituent formulas. Therefore, to replace the combination process in the hybrid model, an LSTM model was attempted, and its results are summarized in Table 7. As the hybrid model produced the best accuracy among the semi-empirical models, the input variables of the hybrid model (C sat , θ z , T L , and ψ) were selected for those of the LSTM model. The LSTM model revealed a slight improvement in RMSE compared to the hybrid model. Its MBE slightly increased, but it was comparable to that of the hybrid model and still better than those of the other models. Consequently, it was demonstrated that the machine learning method can replace the hybrid model.

Conclusions
Four semi-empirical models that use pixel values from a COMS geostationary satellite in order to extract GHI were evaluated. The models were compared using the same data time-resolution and location. The investigation was also deepened by separating the GHI based on daily, monthly, and annual evaluations, as well as on a clearness index. This study demonstrated that the Hammer model was selected as the most appropriate model for deriving the GHI over Korea, according to the RMSE and MBE.
The hybrid model, which was developed from the combination of correlation formulas, was found to exhibit moderately improved accuracy and successfully simplified a lengthy procedure in the Hammer and Rigollier models. The hybrid model was also able to slightly decrease the RMSE and rRMSE of the Hammer model from 103.92 W/m 2 and 26.54%, to 97.08 W/m 2 and 24.12%, respectively.
In order to replace the trial-and-error combination of the hybrid model, a machine learning model of LSTM was also presented. The LSTM model was able to predict GHI more accurately than the hybrid model, with an enhancement in rRMSE of 2%, although its rMBE was still comparable to that of the hybrid model.
Semi-empirical models can accurately determine the GHI with a small number of inputs, such as θ z , ϕ, T L , ψ, and z. In the context of Korea, this study provided a comparison of well-known semi-empirical models and suggested improved methods using hybrid formulas and the LSTM method. Hopefully, this study will be helpful for utilizing a semi-empirical approach to estimating solar irradiance in regions with a hot summer continental climate.