A Method to Improve the Accuracy of Simulation Models: A Case Study on Photovoltaic System Modelling

Abstract: This research presents a method for improving data accuracy to support more efficient data management in the studied applications. The data accuracy was improved using the preciseness function learning model (PFL model). The model contains a database whose size depends on how much data is needed to cover all of the possible behavior of the studied application. The proposed model improves the data with functions obtained by curve fitting, which represent the data at each point and estimate the distribution behavior of the database; such functions can be built for databases of many different forms. The proposed model updates its database after every run, so it continually learns to improve its processing precision. In order to verify the precision of the proposed model through its application to a PV system simulation model, the database should contain at least one year of data. This is because the overall behavior of the PV power output in Thailand depends on the seasonal weather, and Thailand has three seasons in one year. The testing compared the simulated PV power output with the actual measurement data from a 12 MW PV system under two conditions: the daily PV power output and the seasonal PV power output. The results show that the proposed model can accurately simulate the PV power output despite sudden daily changes in the weather. The average nRMSE (normalized RMSE) of the proposed model is low (1.23%), and ranges from 0.30% to 2.26%, which indicates that the model is accurate.


Introduction
The application of mathematical models makes process improvement more convenient, because it is not always possible to perform experiments on a process or to modify it. A model helps us to obtain in-depth information that can be used for educational or business purposes. The adequacy of a model depends on the accuracy of its results: a model with low accuracy causes more disadvantages than benefits. This research is therefore motivated by the importance of model accuracy. At present, photovoltaic generators are installed as a substitute for various sources of electrical energy, for applications such as home appliances, utilities, external battery packs, and generators. The most critical steps after the installation of a PV system are maintenance and checking. If we can estimate the PV system's energy production, we will be able to check for system faults by comparing the measured data with the simulation data for energy management, and to check whether the system is delivering its maximum benefit. Data accuracy is a critical factor for effective energy data management. As such, this research presents a PV system estimation model combined with a preciseness function learning model.
Many researchers have studied the creation of PV models, from single-panel simulations to simulations of complete solar cell systems. PV models are mainly affected by the module temperature and solar irradiance. Most of the modeling research analyzes the single-diode equivalent circuit (the five-parameter model) [1-18]. These models have five parameters: the photocurrent, the ideality factor according to the photovoltaic technology, the parallel resistance, the leakage current, and the series resistance. In many studies, the data accuracy was improved by post-processing the data simulated by the model using different methods. For example, the accuracy of PV models has been improved using weight functions obtained from one year of measured data [1]. This method makes the model more accurate by evaluating the database using a polynomial equation, which is one form of curve fitting. Other research has presented a linear weighting method for PV power forecasting models [2], which updates the model data using a linear equation in order to obtain the final result. Across the studies on improving database accuracy, each method [3-6] has different strengths depending on the distribution behavior of the database. Therefore, this research combines the strengths of each method into a new data precision improvement method, in order to make the model more efficient for data management.
This research aims to improve the processing accuracy of models in order to make them more useful. The method improves the data accuracy by combining the strengths of several ways of building a function from a database, which addresses the problem of highly scattered data. The model can be made more effective by adding its output data to the database, which enables the model to learn by itself and to calculate more precisely in subsequent runs. The case study applied the process to a power output simulation model of a photovoltaic system, and compared its performance with other accuracy improvement methods researched earlier. The test used data from a photovoltaic system in Thailand (14°10'78.1" north latitude and 100°16'94.9" east longitude). The model had two inputs, solar irradiance and module temperature, and one output, the power output of the photovoltaic system. The test compared the PV power output of the accuracy-improved models, including the one proposed in this research, with the actual PV power output in three seasons: the summer, the rainy season, and the winter. The climate of the target area has a profound effect on the performance of photovoltaic systems, as does the timing of the system's installation, and both cause energy losses in various parts of the system. For this reason, this research devised a method to optimize the data accuracy of the PV system simulation model so that it is as accurate as possible under any weather conditions. The proposed model can be used to detect abnormalities in the power generation of the system, and the resulting data can be used to check that the system is operating as efficiently as possible.

Proposed Model for Accuracy Improvement (Interactive Curve Fitting)
The model was created from the analysis of the most suitable curve to represent the data in each point of interest in order to estimate the value of the data, which is called a 'curve-fitting process' [19,20], and to select the target function in many forms for analysis, which is called the 'optimization process'. The functions used in this model are shown as follows.
Exponential models provide a 1-term and a 2-term exponential model, given by Equations (1) and (2):

$$y = a e^{bx} \qquad (1)$$

$$y = a e^{bx} + c e^{dx} \qquad (2)$$

Exponential models are often used when the rate of change of a quantity is proportional to the current amount of the quantity. If the coefficient associated with $b$ and/or $d$ is negative, $y$ represents exponential decay. If the coefficient is positive, $y$ represents exponential growth.
The Fourier series models are sums of sine and cosine functions that describe a periodic signal, and can be written in either the trigonometric form or the exponential form. The fitting curve shown in Equation (3) uses the trigonometric Fourier series form:

$$y = a_0 + \sum_{i=1}^{n_f} \left[ a_i \cos(i w x) + b_i \sin(i w x) \right] \qquad (3)$$

where $a_0$ is the constant (intercept) term of the model, associated with the $i = 0$ cosine term; $w$ is the fundamental frequency of the signal; and $n_f$ is the number of terms (harmonics) in the series, with $1 \le n_f \le 5$.

The Gaussian model fits peaks, and is given by

$$y = \sum_{i=1}^{n_g} a_i \exp\left[ -\left( \frac{x - b_i}{c_i} \right)^2 \right]$$

where $a$ is the amplitude, $b$ is the centroid, $c$ is related to the peak width, and $n_g$ is the number of peaks to fit; a 1-term and a 2-term Gaussian model are provided.

Polynomial models for curves are given by

$$y = \sum_{i=1}^{n_p + 1} p_i x^{n_p + 1 - i}$$

where $n_p + 1$ is the polynomial order, $n_p$ is the polynomial degree, and $1 \le n_p \le 5$. The order gives the number of coefficients to be fitted, and the degree gives the highest power of the predictor variable.
The sum of sines model fits periodic functions, and is given by

$$y = \sum_{i=1}^{n_{si}} a_i \sin(b_i x + c_i)$$

where $a$ is the amplitude, $b$ is the frequency, and $c$ is the phase constant of each sine wave term, and $n_{si}$ is the number of terms in the series; a 1-term and a 2-term sum of sines model are provided. Each of the forms above is fitted as a function according to the distribution behavior of the studied database.
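As an illustration, the five function families described above can be written as plain Python functions. This is a minimal sketch rather than the authors' implementation; the function names and argument layouts are assumptions chosen for readability.

```python
import math

def exp1(x, a, b):
    """1-term exponential model: a * exp(b * x)."""
    return a * math.exp(b * x)

def exp2(x, a, b, c, d):
    """2-term exponential model: a * exp(b * x) + c * exp(d * x)."""
    return a * math.exp(b * x) + c * math.exp(d * x)

def fourier(x, a0, w, harmonics):
    """Trigonometric Fourier series; harmonics is a list of (a_i, b_i), i = 1..n_f."""
    return a0 + sum(
        a_i * math.cos(i * w * x) + b_i * math.sin(i * w * x)
        for i, (a_i, b_i) in enumerate(harmonics, start=1)
    )

def gaussian(x, peaks):
    """Sum of Gaussian peaks; peaks is a list of (amplitude, centroid, width)."""
    return sum(a * math.exp(-((x - b) / c) ** 2) for a, b, c in peaks)

def polynomial(x, coeffs):
    """Polynomial with coefficients ordered from the highest power down (Horner's rule)."""
    y = 0.0
    for p in coeffs:
        y = y * x + p
    return y

def sum_of_sines(x, terms):
    """Sum of sines; terms is a list of (amplitude, frequency, phase)."""
    return sum(a * math.sin(b * x + c) for a, b, c in terms)
```

Each function only evaluates a model for given coefficients; estimating the coefficients from a database is a separate fitting step.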

One Diode Equivalent Circuit
In order to understand the electrical properties of solar cells, the equivalent circuit of a solar cell [7,8], shown in Figure 1, was considered. The solar cell consists of a constant current source representing the current generated by the solar irradiance ($I_{ph}$), the p-n junction of a diode, the series resistance of the solar cell ($R_s$), and the parallel resistance of the solar cell ($R_{sh}$).
The photocurrent ($I_{ph}$) generated by the solar irradiance divides between the diode and the path through $R_s$ to the load ($R_L$). The current flowing through $R_s$ and $R_L$ is $I$.
Applying Kirchhoff's current law to the circuit gives

$$I = I_{ph} - I_d \qquad (8)$$

where $I_{ph}$ is the photocurrent (A), $I_d$ is the current through the diode (A), and $I$ is the current that the solar cell produces (A). When $R_s = 0$, $I_d$ is given by

$$I_d = I_o \left[ \exp\left( \frac{q V_o}{n k T_m} \right) - 1 \right] \qquad (9)$$

where $I_o$ is the reverse saturation current of the diode (A), $q$ is the electron charge ($1.602 \times 10^{-19}$ C), $k$ is the Boltzmann constant ($1.381 \times 10^{-23}$ J/K), $T_m$ is the actual temperature of the PV module (K), $V_o$ is the voltage across the cell terminals (V), and $n$ is the ideality factor of the diode according to the PV technology involved (chosen from Table 1). By substituting Equation (9) into Equation (8), we obtain Equation (10):

$$I = I_{ph} - I_o \left[ \exp\left( \frac{q V_o}{n k T_m} \right) - 1 \right] \qquad (10)$$

In the short-circuit case, $V_o = 0$ and the solar cell delivers its maximum current ($I_{sc}$):

$$I_{sc} = I_{ph}$$

In this case, the photocurrent ($I_{ph}$) is $I_{sc}$, and is equal to the current generated by the solar irradiance. When the circuit is opened at the load, the solar cell reaches its maximum voltage ($V_{oc}$), with $I = 0$ and $V_o = V_{oc}$:

$$0 = I_{ph} - I_o \left[ \exp\left( \frac{q V_{oc}}{n k T_m} \right) - 1 \right]$$

So:

$$V_{oc} = \frac{n k T_m}{q} \ln\left( \frac{I_{ph}}{I_o} + 1 \right) \qquad (14)$$

where $V_{oc}$ is the open-circuit voltage (V), and the other symbols are as defined for Equation (9).
When the load satisfies $0 < R_L < \infty$, the solar cell supplies a current ($I$) and a voltage ($V$) to the load with $0 < I < I_{sc}$ and $0 < V < V_{oc}$. The current and voltage at which their product is greatest are called the maximum current and the maximum voltage, respectively. This operating point gives the maximum electric power, as shown in Figure 2.
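The ideal-cell relations above ($R_s = 0$, no shunt path) can be checked numerically. The sketch below evaluates the current-voltage curve, the open-circuit voltage of Equation (14), and the maximum power point found by a simple voltage scan. The parameter values in the usage note are hypothetical and are not taken from Table 1.

```python
import math

Q = 1.602e-19  # electron charge (C)
K = 1.381e-23  # Boltzmann constant (J/K)

def ideal_cell_current(v, i_ph, i_o, n, t_m):
    """Cell current for the ideal single-diode cell (R_s = 0, no shunt path)."""
    return i_ph - i_o * (math.exp(Q * v / (n * K * t_m)) - 1.0)

def open_circuit_voltage(i_ph, i_o, n, t_m):
    """Open-circuit voltage from Equation (14)."""
    return (n * K * t_m / Q) * math.log(i_ph / i_o + 1.0)

def max_power_point(i_ph, i_o, n, t_m, steps=2000):
    """Scan 0..V_oc on a uniform grid and return (V, I, P) with the highest power."""
    v_oc = open_circuit_voltage(i_ph, i_o, n, t_m)
    best = (0.0, i_ph, 0.0)
    for step in range(steps + 1):
        v = v_oc * step / steps
        i = ideal_cell_current(v, i_ph, i_o, n, t_m)
        if v * i > best[2]:
            best = (v, i, v * i)
    return best
```

For example, `max_power_point(8.0, 1e-9, 1.3, 298.0)` returns a voltage between 0 and $V_{oc}$ and a current between 0 and $I_{sc}$, in line with the description of Figure 2.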

Effects of the Series Resistance and Parallel Resistance of Solar Cells
In the equivalent circuit of an operating solar cell, shown in Figure 3, the current generated by the solar irradiance is represented by a constant current source ($I_{ph}$). This current flows in the opposite direction to the current that flows through the ideal p-n junction ($I_d$). The series resistance, abbreviated as $R_s$, is the sum of the resistance of the semiconductor, the resistance of the ohmic contact areas between the metal and the p and n regions, and the resistance of the connecting wires. The parallel resistance, $R_{sh}$, is a hypothetical resistance in parallel with the boundary of the p-n junction.
In the ideal solar cell, the value of $R_{sh}$ is infinite and the value of $R_s$ is zero. In practice, however, the semiconductor crystal has defects. Junctions, especially those with large areas, contain defective regions, resulting in an imperfect p-n junction. Therefore, $R_{sh}$ is not infinite and $R_s$ is not zero, and both values change the properties of the solar cell. In general, $R_{sh}$ is not high enough for its effect on the solar cell properties to be ignored, and $R_s$ affects them significantly. Cell designers must therefore take the effect of $R_s$ into account. When $R_s = 0$, $I$ is given by Equation (15). If $R_s > 0$ and $R_{sh} < \infty$, the voltage is $V_0 = V_L + I R_s$ and $I$ is given by Equation (16), where $I_{ph}$ is the photocurrent (A), $I_0$ is the reverse saturation current of the diode (A), $q$ is the electron charge ($1.602 \times 10^{-19}$ C), $k$ is the Boltzmann constant ($1.381 \times 10^{-23}$ J/K), $T_m$ is the actual temperature of the PV module (K), $R_s$ is the series resistance (Ω), $R_{sh}$ is the shunt or parallel resistance (Ω), and $n$ is the ideality factor of the diode according to the PV technology involved (chosen from Table 1). From Equation (16) and Figure 4, which shows the effect of $R_s$ on solar cells, it can be seen that solar cells with a high $R_s$ value have a lower short-circuit current. The slope of the I-V curve also decreases, so the solar cell delivers much less energy. Therefore, solar cells should be manufactured so that $R_s$ is as low as possible.
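The effect of $R_s$ described above can be illustrated numerically. The sketch below assumes the commonly used implicit single-diode form $I = I_{ph} - I_0[\exp(q(V + I R_s)/(n k T_m)) - 1] - (V + I R_s)/R_{sh}$, which may differ in detail from the paper's Equation (16), and solves it for $I$ by bisection. All parameter values are hypothetical.

```python
import math

Q = 1.602e-19  # electron charge (C)
K = 1.381e-23  # Boltzmann constant (J/K)

def cell_current(v, i_ph, i_o, n, t_m, r_s, r_sh, iterations=80):
    """Solve the implicit single-diode equation for I at terminal voltage v.

    The residual decreases monotonically with I, so bisection on [0, i_ph]
    converges. Operating points at or beyond open circuit are clipped to 0 A.
    """
    v_t = n * K * t_m / Q  # thermal voltage scaled by the ideality factor

    def residual(i):
        v_junction = v + i * r_s
        return (i_ph - i_o * (math.exp(v_junction / v_t) - 1.0)
                - v_junction / r_sh - i)

    low, high = 0.0, i_ph
    if residual(low) <= 0.0:
        return 0.0
    for _ in range(iterations):
        mid = 0.5 * (low + high)
        if residual(mid) > 0.0:
            low = mid
        else:
            high = mid
    return 0.5 * (low + high)

# Hypothetical cell: compare a low and a high series resistance at 0.5 V.
low_rs = cell_current(0.5, 8.0, 1e-9, 1.3, 298.0, r_s=0.001, r_sh=100.0)
high_rs = cell_current(0.5, 8.0, 1e-9, 1.3, 298.0, r_s=0.05, r_sh=100.0)
```

With these values, the cell with the higher series resistance delivers less current at the same terminal voltage, which is the behavior attributed to Figure 4.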

Effect of Solar Irradiance on the Solar Cells
The solar irradiance strongly affects the electric current and power of the solar cell, as shown in Figure 5. When $V = 0$, $I$ is given by Equation (17).

Effect of Temperature on the Solar Cells
The temperature of the solar panel has a direct effect on the voltage. When the temperature is high, the voltage drops, which results in decreased power, as shown in Figure 6. When $I = 0$, $V_0 = V_{oc}$ is given by Equation (18).

Equations Used for the PV Simulation Model
The basic structure of a solar cell consists of a p-n junction of a silicon semiconductor, represented by a diode and a current source in an equivalent circuit, as shown in Figure 3. When the solar irradiance falls onto the solar cell, it creates charge carriers at the p-n junction, which are moved by the electric field at the junction. This produces a voltage across the two terminals. When the cell is connected to a load, the current flowing in the circuit is directly proportional to the solar irradiance. Equation (19) shows that the current and voltage properties of solar cells take the form of an exponential equation [7,8], where $I_{ph}$ is the photocurrent (A), $I_0$ is the reverse saturation current of the diode (A), $T_m$ is the actual temperature of the PV module (K), $R_s$ is the series resistance (Ω), $R_{sh}$ is the shunt or parallel resistance (Ω), and $n$ is the ideality factor of the diode according to the PV technology involved (chosen from Table 1).
The PV system simulation model (1D5P) was created from Equation (19). It consists of a single diode (1D) and five main parameters (5P) [1-3]. The five parameters are the photocurrent, $I_{ph}$; the ideality factor according to the photovoltaic technology involved, $n$; the parallel resistance, $R_{sh}$; the leakage current, $I_0$; and the series resistance, $R_s$ [7-9]. The PV system simulation model was analyzed using Kirchhoff's law, and was used to estimate the photovoltaic power output.
The photocurrent, $I_{ph}$ (A), depends on the solar irradiance and the module temperature, although it varies most strongly with the solar irradiance. It can be expressed as Equation (20), where $I_{sc}$ is the short-circuit current (A), $T_m$ is the module temperature (K), $T_{ref}$ is the module temperature at the standard test conditions (STC) (298 K), $\mu_{sc}$ is the temperature coefficient of $I_{sc}$ (A/°C), $G$ is the solar irradiance (W/m$^2$), and $G_{ref}$ is the solar irradiance at STC (1000 W/m$^2$). The module temperature directly affects the reverse saturation current of the diode, $I_0$, as shown in Equations (21)-(23), where $I_{sc}$ is the short-circuit current (A), $T_m$ is the module temperature (K), $T_{ref}$ is the module temperature at STC (298 K), $V_{oc}$ is the open-circuit voltage (V), $E_g$ is the band gap energy of the solar cell, and $n$ is the ideality factor according to the PV technology involved. The series resistance ($R_s$) and the shunt resistance ($R_{sh}$) cannot be ignored. The circuit that includes both resistances is shown in Figure 3. Solar cells are connected in series to increase the voltage, and in parallel to increase the current. The voltage and current vary according to $R_s$ and $R_{sh}$.
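The dependence of the photocurrent and the saturation current on irradiance and temperature can be sketched as follows. The forms used here are the ones commonly found in the single-diode literature and are an assumption: $I_{ph} = (G/G_{ref})[I_{sc} + \mu_{sc}(T_m - T_{ref})]$, $I_{0,ref} = I_{sc}/[\exp(qV_{oc}/(nkT_{ref})) - 1]$, and $I_0 = I_{0,ref}(T_m/T_{ref})^3 \exp[(qE_g/(nk))(1/T_{ref} - 1/T_m)]$. They may differ in detail from the paper's Equations (20)-(23). The cell values in the usage note (including a 1.12 eV band gap for silicon) are hypothetical.

```python
import math

Q = 1.602e-19   # electron charge (C)
K = 1.381e-23   # Boltzmann constant (J/K)
T_REF = 298.0   # module temperature at STC (K)
G_REF = 1000.0  # solar irradiance at STC (W/m^2)

def photocurrent(g, t_m, i_sc, mu_sc):
    """Photocurrent scaled by irradiance and corrected for temperature."""
    return (g / G_REF) * (i_sc + mu_sc * (t_m - T_REF))

def reference_saturation_current(i_sc, v_oc_ref, n):
    """Saturation current at STC, from the STC short-circuit and open-circuit values."""
    return i_sc / (math.exp(Q * v_oc_ref / (n * K * T_REF)) - 1.0)

def saturation_current(t_m, i_sc, v_oc_ref, n, e_g_ev):
    """Saturation current at module temperature t_m (band gap given in eV)."""
    i_o_ref = reference_saturation_current(i_sc, v_oc_ref, n)
    exponent = (Q * e_g_ev / (n * K)) * (1.0 / T_REF - 1.0 / t_m)
    return i_o_ref * (t_m / T_REF) ** 3 * math.exp(exponent)

def open_circuit_voltage(g, t_m, i_sc, v_oc_ref, n, mu_sc, e_g_ev):
    """Open-circuit voltage (Equation (14) form) with temperature-dependent I_ph and I_0."""
    i_ph = photocurrent(g, t_m, i_sc, mu_sc)
    i_o = saturation_current(t_m, i_sc, v_oc_ref, n, e_g_ev)
    return (n * K * t_m / Q) * math.log(i_ph / i_o + 1.0)
```

For a hypothetical cell with `i_sc=8.0`, `v_oc_ref=0.6`, `n=1.3`, `mu_sc=0.004` and `e_g_ev=1.12`, the open-circuit voltage at 1000 W/m$^2$ is lower at 338 K than at 298 K, consistent with the temperature effect described for Figure 6.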
The shunt or parallel resistance, R sh , is defined in Equation (24): where R sh is the shunt or parallel resistance (Ω), R s is the series resistance (Ω), P m is the

Preciseness Function Learning Model (PFL Model)
The proposed model is a data precision improvement method based on a database that captures the behavior of the application under study. The database may be large or small, depending on how much data is needed to cover all of the application's behavior. The method is called the 'Preciseness Function Learning model' (PFL model). It relies on several curve-fitting forms, drawing on the strengths of each to improve the data accuracy. The structure and method of operation of the process are shown in Figure 7. The details and methods of each step are explained as follows.
Step 1: The studied application is expressed as a function that maps its inputs to an output, and that output is taken as the input of the proposed model in order to improve the accuracy of the data. This can be written as Equation (25), where $A_1, A_2, A_3, \ldots, A_{n_{ipt}}$ are the inputs of the studied application's computational process, $B$ is the output of the studied application's computational process, and $E_{ipt}$ is the input of the proposed model, that is, the data whose accuracy is to be improved.
Step 2: The factors used to improve the accuracy of the data are found as follows, where $f_{nk}$ is the factor used to adjust the accuracy of the data, $E_{opt}$ is the output of the proposed model, that is, the data whose accuracy has been improved, $B_{imp}$ is the accuracy-improved data, $\psi(x)$ is the relative function of the input and output of the studied application with $n_{db}$ databases, and $n_{db}$ is the number of the database. In order to find the relative function of the input and output of the studied application, the values in the database were fitted to equations by the curve-fitting process, using the function types listed in Table 2 (exponential models, Fourier series models, Gaussian models, polynomial models, and sum of sines models).

Table 2. Curve-fitting functions of the database (columns: Types, Functions).

The relationship between $A_{n_{ipt}}$ and $E_{ipt}$ is shown in Figure 8, which shows the trendlines estimated from the database in the form of various equations by the curve-fitting process. The fitted curves differ between function forms, depending on the amount of data and on the data distribution. The data accuracy is improved by selecting the equation model with the most accurate fit, that is, the one whose R-square value is closest to 1, as summarized in Table 3.
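The selection rule described above (keep the fitted form whose R-square is closest to 1) can be sketched as follows. For brevity, the candidate set here is limited to least-squares polynomials of different degrees, fitted with a small pure-Python solver; the data and the helper names are hypothetical, and the other function families would be added as further candidates in the same way.

```python
from functools import partial

def fit_polynomial(xs, ys, degree):
    """Least-squares polynomial coefficients (highest power first) via normal equations."""
    m = degree + 1
    a = [[sum(x ** (2 * degree - r - c) for x in xs) for c in range(m)] for r in range(m)]
    b = [sum(y * x ** (degree - r) for x, y in zip(xs, ys)) for r in range(m)]
    for col in range(m):  # Gaussian elimination with partial pivoting
        pivot = max(range(col, m), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        b[col], b[pivot] = b[pivot], b[col]
        for r in range(col + 1, m):
            factor = a[r][col] / a[col][col]
            for c in range(col, m):
                a[r][c] -= factor * a[col][c]
            b[r] -= factor * b[col]
    coeffs = [0.0] * m
    for r in range(m - 1, -1, -1):  # back substitution
        tail = sum(a[r][c] * coeffs[c] for c in range(r + 1, m))
        coeffs[r] = (b[r] - tail) / a[r][r]
    return coeffs

def horner(coeffs, x):
    """Evaluate a polynomial with coefficients ordered from the highest power down."""
    y = 0.0
    for p in coeffs:
        y = y * x + p
    return y

def r_squared(ys, y_hat):
    """Coefficient of determination of predictions y_hat against data ys."""
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - yh) ** 2 for y, yh in zip(ys, y_hat))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1.0 - ss_res / ss_tot

def select_best(candidates, xs, ys):
    """Return the name of the candidate whose R-square is closest to 1, and all scores."""
    scores = {name: r_squared(ys, [f(x) for x in xs]) for name, f in candidates.items()}
    best = min(scores, key=lambda name: abs(1.0 - scores[name]))
    return best, scores

# Hypothetical database: exact quadratic data, fitted with degree 1 and degree 2.
xs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0 * x * x + 1.0 for x in xs]
candidates = {f"poly{d}": partial(horner, fit_polynomial(xs, ys, d)) for d in (1, 2)}
best, scores = select_best(candidates, xs, ys)
```

Here `best` is `"poly2"`, because the quadratic reproduces the data and its R-square is closest to 1.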
The proposed model selects the curve-fitting function of the database whose R-square value is closest to 1 from Table 3. In the expression for the accuracy-improved data, $B_{imp}$ is the accuracy-improved data; $\psi_x(A_{n_{ipt}})$ is the relative function of the input and output of the studied application with $n_{db}$ databases, from which the function form $x$ is selected; $E_{ipt}$ is the input of the proposed model, that is, the data used for the accuracy improvement; and $n_{db}$ is the number of the database.

Step 3: The proposed model always updates its database after processing. That is, the proposed model learns in order to further optimize its processing precision. The learning method structure of the proposed model is shown in Figure 9. The details can be explained in equations, in which $D_{n_{db}}$ is the database at the first precision improvement, $n_{db}$ is the number of the database at the first precision improvement, $x_i$ is the $i$-th entry of the database, and $\bar{x}_{n_{db}}$ is the average of the database entries.
Similarly, $D_m$ is the accuracy-improved data, $m$ is the number of the data, $x_j$ is the $j$-th entry of the data from the precision improvement, and $\bar{x}_m$ is the average of the data from the precision improvement. Adding Equation (31) and Equation (33) gives Equation (34), where $n_{db+m}$ is the number of the new database, and $\bar{x}_{n_{db}+m}$ is the average of the new database. The result, $D_{n_{db}+m}$, is an updated database that contains $n_{db+m}$ data. Through this self-learning precision improvement, the model adds the enhanced data to the database so that the next processing run is more efficient. The accuracy-improved data is then obtained with the updated database, where $B_{imp}$ is the accuracy-improved data; $\psi_x(A_{n_{ipt}})$ is the relative function of the input and output of the studied application with $n_{db+m}$ databases, with its selected function form $x$; $E_{ipt}$ is the input of the proposed model, that is, the data used to improve the accuracy; and $n_{db+m}$ is the number of the database.

Figure 10 shows the structure of the photovoltaic simulation model with the preciseness reciprocation process. The system of the model is divided into two parts:
Part 1: The photovoltaic simulation model, which has two inputs (solar irradiance and module temperature) and one output (PV power).
Part 2: The improvement of the accuracy of the photovoltaic simulation model's output by the proposed model (Preciseness Function Learning Model (PFL Model)).
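The database update of Step 3 can be sketched as follows. This is one reading of the update, assuming that the database stores scalar entries and that the new average is the count-weighted combination of the old database average and the average of the newly improved data; the class and function names are hypothetical.

```python
def mean(values):
    """Arithmetic mean of a non-empty sequence."""
    return sum(values) / len(values)

def combined_mean(n_db, mean_db, m, mean_new):
    """Average of the merged database, from the two counts and the two averages."""
    return (n_db * mean_db + m * mean_new) / (n_db + m)

class LearningDatabase:
    """Database that grows with every batch of accuracy-improved data."""

    def __init__(self, entries):
        self.entries = list(entries)

    def update(self, improved):
        """Append the improved data and return the average of the updated database."""
        improved = list(improved)
        new_mean = combined_mean(len(self.entries), mean(self.entries),
                                 len(improved), mean(improved))
        self.entries.extend(improved)
        return new_mean

# Hypothetical values: three stored entries, two newly improved entries.
db = LearningDatabase([1.0, 2.0, 3.0])
updated_mean = db.update([4.0, 6.0])
```

After the update, the database holds five entries, and `updated_mean` agrees with the mean computed directly over all five. The curve fit would then be repeated on the enlarged database before the next prediction.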

Photovoltaic System Simulation Model with Preciseness Function Learning Model (PFL Model)
The accuracy of the data needs to be improved because the PV power output of the PV simulation model is an ideal result, which does not account for the energy losses that occur in various parts of the system. Photovoltaic systems have many components that can affect electricity production, and this is the motivation for this research. The database used for the structure is the data of a PV system in the central region of Thailand. The database should contain at least one year of data, because the overall behavior of photovoltaic power generation in Thailand depends on the seasonal weather, and Thailand has three seasons in one year.
The function format of the application is the 1D5P PV system simulation model, according to Equation (19), where $A_1$ is the solar irradiance ($G$, W/m$^2$), $A_2$ is the module temperature ($T_m$, °C), and $B = E_{ipt}$ is the PV power output from the 1D5P PV system simulation model ($P_{max}$, MW). The relationship between $G$ and $P_{max}$ is shown in Figure 11, and the curve-fitting results are given in Table 4. The accuracy-improved data is given by Equation (38). The simulation result is shown in Figure 12.
In Equation (38), $B_{imp}$ is the accuracy-improved PV power output ($P_{max}$, MW); $a_{s0}$ is the constant (intercept) term of the model, which is associated with the $i = 0$ cosine term of the coefficients $a_{si}$; $w_s$ is the fundamental frequency of the signal; and $n_{sf}$ is the number of harmonic terms in the series, with $1 \le n_{sf} \le 5$.

Information and Simulation of the PV System
The case study applies the proposed model to improve the accuracy of the PV system simulation model using two years of measured data (2018-2019) from a PV system in the central region of Thailand (14°10'78.1" north latitude and 100°16'94.9" east longitude). The PV system consists of 48,980 PV panels and 12 inverters; one string consists of 24 PV panels connected in series, and two series are connected in parallel; seven arrays consisting of 10 strings are connected in parallel; two arrays consisting of eight strings are connected in parallel, and nine arrays are connected to one inverter. The details of the PV system are shown in Table 5. The PV system has monitoring systems for all of the parameters, which are recorded every 1 min. The pyranometer is a Kipp & Zonen brand instrument (CMP series) installed on the same plane as the PV panels, and the thermometer is installed under the PV panel. The PV panels are from the REC Peak Energy series (REC245PE), as shown in Table 6.
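As a quick consistency check, the panel count can be compared with the 12 MW rating stated for the system. The sketch below assumes a rated power of 245 W per panel, inferred from the REC245PE model name rather than read from Table 6.

```python
PANEL_COUNT = 48_980
PANEL_RATED_W = 245.0  # assumption: rated power per panel, inferred from "REC245PE"

def rated_capacity_mw(panel_count, panel_rated_w):
    """Total rated DC capacity in MW for identical panels."""
    return panel_count * panel_rated_w / 1e6

capacity_mw = rated_capacity_mw(PANEL_COUNT, PANEL_RATED_W)
```

Under that assumption, the rated capacity comes to about 12.0 MW, in agreement with the system size quoted for the measurement site.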
In order to verify the accuracy of the proposed model when applied to the PV system simulation model, the simulation results were compared with the actual measurement data under two conditions: the daily PV power output in two weather conditions (a cloudy day and a sunny day), and the effect of seasonal climate on the electricity production of the photovoltaic system. In this study, the model was designed to calculate the PV power output in different weather conditions, which are divided into seasons: the summer (February-May), the rainy season (June-September), and the winter (October-January). The division of the seasons in Thailand was taken from the Thai Meteorological Department. The test uses data from the photovoltaic system in Thailand (2018-2019). The model was verified by comparing its results with the measured data using the % RMSE [22], where $simulate_i$ is the simulation data, $measured_i$ is the measured data, and $\overline{measured}$ is the average value of the measured data.

Figure 13 compares the PV power output of the proposed model with the actual measurement data for the two weather conditions. On a cloudy day, the PV power output of the proposed model follows the same trend as the actual measurement data. This shows that the model can accurately simulate the PV power output in spite of sudden changes in the weather. The nRMSE (normalized RMSE) is low (3.19%), as shown in Figure 13a.

Daily PV System Simulation
The proposed model can also accurately simulate the PV power output on a sunny day, and it performs best on days with favorable weather conditions. The nRMSE (normalized RMSE) on the sunny day (1.79%) is lower than on the cloudy day, as shown in Figure 13b.
The PV power output of the proposed model and the actual measurement data are summarized in Table 7. The nRMSE (normalized RMSE) shows the accuracy of the proposed model.
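The accuracy metric used in these comparisons can be sketched as a short function. It assumes that the nRMSE is the root-mean-square error between simulated and measured values, divided by the mean of the measured values and expressed as a percentage, which matches the symbols defined for the % RMSE; the function name is hypothetical.

```python
import math

def nrmse_percent(simulated, measured):
    """RMSE between simulated and measured data, normalized by the measured mean (%)."""
    if len(simulated) != len(measured) or not measured:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(measured)
    rmse = math.sqrt(sum((s - m) ** 2 for s, m in zip(simulated, measured)) / n)
    return 100.0 * rmse / (sum(measured) / n)
```

For instance, simulated values that deviate by 1 MW from a constant measured value of 10 MW give an nRMSE of 10%.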

Seasonal PV System Simulation
In order to compare the electricity generation of the proposed model with the actual measurements, the conditions are divided into three seasons: the summer, the rainy season, and the winter. The climate differs from season to season, and it is an essential variable for the electricity generation of the PV system through the solar irradiance, ambient temperature, module temperature, and so on. In this research, the accuracy of the proposed model is checked by applying it to the PV system simulation model, which calculates the electricity generation of the PV system in all climatic conditions. For the test, one week of electricity generation was randomly sampled from each of the three seasons in 2019. Figure 14 compares the PV power output of the proposed model with the actual measurement data for the randomly selected week of each season. The simulation results and the actual measurements show a similar trend. The weather conditions differ from day to day in all three seasons; nevertheless, the proposed model accurately simulates the electricity production under the different weather conditions.

The monthly comparison of the PV energy of the proposed model with the actual measurement data (2019) showed that the average simulated PV energy is 1453 MWh/month, and that the simulated PV energy is 17,440 MWh/year. In the actual measurement data, the average PV energy is 1450 MWh/month, and the PV energy is 17,406 MWh/year. The nRMSE (normalized RMSE) ranges from 0.07% to 2.91%, which is low. The average nRMSE (normalized RMSE) is 1.26% for the 12 months of 2019.
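The annual totals quoted above can be put side by side with a few lines of arithmetic. The variable names are illustrative; the figures are the 2019 values reported in the text.

```python
SIMULATED_MWH_PER_YEAR = 17440.0  # proposed model, 2019
MEASURED_MWH_PER_YEAR = 17406.0   # actual measurement, 2019

# Relative difference of the annual energy, in percent of the measured value.
relative_difference_percent = (
    100.0 * (SIMULATED_MWH_PER_YEAR - MEASURED_MWH_PER_YEAR) / MEASURED_MWH_PER_YEAR
)

# Monthly averages implied by the annual totals.
simulated_monthly_mwh = SIMULATED_MWH_PER_YEAR / 12
measured_monthly_mwh = MEASURED_MWH_PER_YEAR / 12
```

The annual totals differ by about 0.2%, and the implied monthly averages (about 1453.3 and 1450.5 MWh) are consistent with the rounded monthly figures reported.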
In the comparison of the PV energy of the 24 months data of the proposed model (2018-2019) with the actual measurements for all three seasons, the average nRMSE (normalized RMSE) is 2.12% in the summer, 0.49% in the rainy season, and 1.08% in the winter. These are the results of the proposed model.
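The seasonal grouping and the overall average can be reproduced with a short sketch. The month ranges follow the season definitions used in this study; the function and dictionary names are hypothetical.

```python
def season_of_month(month):
    """Map a calendar month (1-12) to the season definition used in this study."""
    if month in (2, 3, 4, 5):
        return "summer"
    if month in (6, 7, 8, 9):
        return "rainy"
    if month in (10, 11, 12, 1):
        return "winter"
    raise ValueError("month must be in 1..12")

# Seasonal average nRMSE values (%) reported for the 24-month comparison.
seasonal_nrmse = {"summer": 2.12, "rainy": 0.49, "winter": 1.08}
average_nrmse = sum(seasonal_nrmse.values()) / len(seasonal_nrmse)
```

The mean of the three seasonal values is 1.23%, which agrees with the average nRMSE quoted in the conclusions.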
In the rainy season, the model is at its highest accuracy. We found that the solar cells in this area performed at their best because the rain washed the dust off the front of the solar panels, and because the heat accumulation of the solar panels is not as high as it is in the summer. As a result, relatively little energy is lost.
In the winter, the model is less accurate than it is in the rainy season, because there is a lot of dust on the front of the solar panels, which causes energy loss in the solar cells. However, it is more accurate than it is in the summer, because no heat accumulates on the solar cells.
In the summer, the model is less accurate than in the other two seasons because the summer temperature is high, and the heat accumulation raises the temperature of the solar panels.
In an actual solar farm, the panels are not cleaned every day, but on a cleaning cycle. Therefore, the accumulated dust affects the efficiency of the solar cells' electrical power generation.
The main factors that affect the electricity production of the PV system are the solar irradiance, which determines the current of the solar cell, and the panel temperature, which affects the voltage of the solar cell. The proposed model uses the relationship between the solar irradiance and the electricity generation to improve the accuracy, but it does not use the module temperature as a parameter. For this reason, the accuracy of the model in the summer is lower than it is in the rainy season.

Comparison with the PV Model with the Weight Function
The precision of the proposed model was verified by comparing it with the PV Model with the weight function [1] and with one year of actual measured data, as shown in Figure 16. The PV Model with the weight function [1] quantifies the difference between the simulated data and the actual measurement data in the form of a polynomial equation in order to improve the accuracy of the simulated data. This method becomes more accurate as the database grows significantly, but its accuracy falls as the data in the database become more scattered.
On the other hand, the proposed model was designed to describe databases with various scatter behaviors as equations of different forms through the 'curve-fitting process'. The proposed model is a learning model that increases in accuracy every time it analyzes and calculates data.
The results show that the proposed model has very high accuracy. The proposed model has a lower nRMSE than the PV Model with the weight function in all three seasons, as shown in Table 9.

Conclusions
This research presents a method to improve the accuracy of mathematical models, the Preciseness Function Learning Model (PFL Model). The method was applied to a simulation model of a PV system, its precision was tested, and two years of data from the proposed model were compared with the actual measurements of the PV system. The accuracy depends on the function selected by the curve-fitting process.
The results show that the proposed model has the highest accuracy in the rainy season. The average nRMSE (normalized RMSE) of the proposed model is low (1.23%), and it ranges from 0.30% to 2.26%, which indicates that the model is accurate. The proposed model is a learning model whose precision improves as it is used: after the accuracy improvement, the PV power output is added to the database of the proposed model, which enables the model to learn and to improve the accuracy of the data in the next prediction. In the future, we plan to develop models that improve the data accuracy by correlating more than one input with the outputs, in order to reduce the loss or error in the calculation during the data precision improvement.
Author Contributions: A.B., S.K. and P.C. designed this experiment and prepared the manuscript. The experiments were carried out by A.B., S.K., P.C. and K.S. A.B., S.K., P.C., S.S., P.M., K.S., W.T. and S.N. analyzed the results and discussed the manuscript during its preparation. All authors discussed the results and implications and commented on the manuscript at all stages. All authors have read and agreed to the published version of the manuscript.