Solar Irradiance Database Comparison for PV System Design: A Case Study

Abstract: Effective design of solar photovoltaic (PV) systems requires accurate meteorological data for solar irradiance, ambient temperature, and wind speed. In this study, we assess the reliability of satellite-based solar resource databases such as NASA, Solcast, and PVGIS by comparing them with ground-based measurements of global horizontal irradiance (GHI) from six locations in the Republic of Ireland. We compared satellite- and ground-based GHI data recorded between 2011 and 2012 and used Python-based packages to simulate solar power output for the six locations using both data types. The simulated outputs were then compared against metered power output from PV arrays at the sites. Ground-based GHI measurements demonstrate superior accuracy because they are acquired at the specific locations, offering increased spatial representativity. Satellite GHI measurements, although reasonably accurate for many applications, cover broader regions at lower spatial resolution, leading to averaging effects that may not fully capture localized variations. This difference is reflected in the mean absolute percentage error (MAPE): ground-simulated data show low MAPE values, indicating strong alignment with reference observations, while satellite-simulated data exhibit a slightly higher MAPE, suggesting less precise estimates despite a strong correlation with ground-based measurements. This study demonstrates the relative reliability of satellite- and ground-based GHI data for solar PV system design, with practical implications for energy planners and engineers and for researchers forecasting solar energy yields from satellite databases. The Python-based PVLib package was used for the simulation, and its effectiveness in this context is discussed in detail.


Introduction
As the world's energy needs increase, there is a growing focus on renewable energy sources. Solar energy, in particular, is gaining attention thanks to advances in efficiency and cost-effectiveness [1]. Harnessing solar energy for electricity generation offers a sustainable alternative to traditional fossil fuels. By reducing reliance on fossil fuels, solar power helps to stabilize energy costs and plays a significant role in addressing climate change by minimizing carbon dioxide (CO2) emissions [2].
Understanding the environmental factors that affect the performance of photovoltaic (PV) solar systems is essential for engineers responsible for designing and optimizing these systems. The initial phase involves selecting the most suitable site for installing PV modules. Accurate meteorological data must be recorded with high temporal resolution, and derived values must adhere to established standard procedures [3].
Several recent studies have utilized open solar databases due to their accessibility and compatibility with various solar system simulation and design tools. A. Khamisani [4] conducted a case study of sizing PV-powered bus shelters using the PVWatts tool, importing weather data from the database of the National Renewable Energy Laboratory (NREL) of the United States. M. Byamukama [5] used NASA's daily database to calculate the average peak sunshine hours, which helped to determine the number of PV panels to be installed. Y. Muñoz [6] used the software tool PVsyst with the NASA database to size a grid-tied photovoltaic system. S. Nwokolo created a hybrid machine learning model to examine how climate change might affect various solar PV technologies. The model used energy variables derived from the Australian Community Climate and Global System Simulation (ACCESS-CM2), with data recorded at the Desert Knowledge Australia Solar Center (DKASC) [7]. A case study in Vietnam focused on a new method for forecasting the energy output of a large-scale solar power plant based on long short-term memory networks. The dataset used in that case study was recorded at the Thanh Cong 1 solar power plant, which is situated in the Tay Ninh province in the southern part of Vietnam [8].
A variety of parameters significantly influence the performance of a solar system. These include ambient temperature, Diffuse Horizontal Irradiance (DHI), Direct Normal Irradiance (DNI), Global Horizontal Irradiance (GHI), and wind speed. Figure 1 provides an overview of the distinct solar radiation components in the atmosphere [9]. It is also important to consider other meteorological factors such as rain intensity, humidity, and wind direction when designing a PV system. Ambient temperature is particularly crucial, as elevated temperatures can decrease the efficiency of PV cells [10]. A study in Nigeria demonstrated a direct correlation between ambient temperature and PV cell power output, highlighting the significance of considering temperature effects during the design stage [11].
Sustainability 2024, 16, x FOR PEER REVIEW
In the southeastern region of the UK, research highlighted the significant impact of precipitation, including rain and snow, as well as clouds and dust, on the performance of PV systems [12]. Clouds can cast shadows on PV panels, reducing the amount of sunlight reaching the panels and thus decreasing solar energy production. However, rain can have a positive effect by washing away accumulated dust from the panels. When snow accumulates on the panels, it can partially or completely block sunlight, leading to a reduction in power output [13]. A study by Panjwani [14] has shown that humidity significantly affects the voltage and current produced by PV cells. The study revealed that power output decreases as humidity levels increase, emphasizing the importance of considering humidity when evaluating the efficiency of PV systems. Wind speed and direction are also significant factors. Specifically, wind may provide cooling for PV cells. This natural cooling reduces the overall temperature of PV cells, resulting in improved performance and higher power output, as illustrated in a study conducted by S. Chandra and S. Agrawal [15].
Analyzing solar irradiance components is a significant prerequisite for determining the optimal size of a proposed solar PV system. This analysis provides valuable insights into solar irradiance levels at a specific location, offering information about the potential amount of sunlight reaching the Earth's surface on various time scales as well as the direct and diffuse fractions. Historical data aid engineers in determining the appropriate size of the PV array and evaluating the estimated cost of solar projects during feasibility studies.
Meteorological data, including solar irradiance, can be obtained from various providers offering satellite-derived information, with some resources providing global coverage and others limited to specific geographic areas. For example, NASA provides a comprehensive global historical database covering earth sciences and climate-related data, while NREL focuses primarily on data specific to the United States. Initial studies of solar systems involve examining historical weather conditions at the chosen location. To improve the accuracy of solar resource assessments, analyzing at least a decade's worth of historical data is crucial [16]. The assessment of large solar power plant feasibility often relies on satellite remote sensing methods to gather the necessary data.
A variety of methods exist for evaluating historical weather data, particularly in relation to solar irradiance. One effective approach is the establishment of a dedicated solar irradiance measurement station near the intended location, providing precise ground-based data. These data serve as a valuable reference for assessing information collected from devices such as pyranometers and pyrheliometers. However, the installation, upkeep, and recalibration of ground monitoring stations for GHI and other weather variables can be expensive and may not be justified for smaller PV installations, and the duration of data recording may be insufficient to evaluate the interannual variability. In such cases, satellite measurements offer a potential alternative [17].
The objective of this research is to assess the reliability and appropriateness of satellite-derived solar irradiance data for the optimization of solar PV systems in the Republic of Ireland. This study undertakes a comprehensive analysis of different solar resource data sources, such as the NASA, Solcast, and PVGIS databases. The focus is on validating satellite databases against ground-based measurements of GHI. The comparison involves daily, monthly, and annual assessments to evaluate accuracy across various timeframes and spatial resolutions. Statistical indicators, including root mean square error (RMSE) and correlation values, are calculated to investigate the variations between ground-based and satellite data. This approach provides insights into the subtle differences observed between these two data sources. Furthermore, the study evaluates the implications for estimating solar power at these sites, with the goal of providing guidance on incorporating satellite- and ground-based data to improve the precision of solar energy system planning and design.
Figure 1 depicts the components of solar radiation, emphasizing direct radiation, diffuse radiation, and reflected radiation. It illustrates how clouds, air molecules, Earth's surface, and the sea interact with solar radiation. The diagram also shows how PV cells capture both direct and diffuse radiation for electricity generation. This is important for assessing the reliability of satellite-derived solar irradiance data for optimal solar PV system design, particularly in northern latitudes. This research compares satellite-derived and ground-based measurements of both direct and diffuse radiation.
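The three irradiance components named above are linked by the standard closure relation GHI = DNI · cos(θz) + DHI, where θz is the solar zenith angle. The following minimal Python sketch illustrates that relation only; the function name and the numerical inputs are hypothetical and are not taken from the study's data.

```python
import math


def ghi_from_components(dni, dhi, zenith_deg):
    """Combine beam (DNI) and diffuse (DHI) irradiance in W/m^2 into GHI.

    Uses the closure relation GHI = DNI * cos(zenith) + DHI.
    """
    cos_zenith = math.cos(math.radians(zenith_deg))
    # When the sun is below the horizon the beam term contributes nothing.
    beam_on_horizontal = dni * max(cos_zenith, 0.0)
    return beam_on_horizontal + dhi


# Hypothetical values chosen only for illustration.
example_ghi = ghi_from_components(dni=600.0, dhi=100.0, zenith_deg=60.0)
print(round(example_ghi, 1))
```

In practice the relation is often used as a consistency check when a database reports all three components for the same timestamp.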

Methods
The flowchart in Figure 2 represents the method that has been followed in this study.

Available Meteorological Data Resources
This section provides a summary of historical solar resource data from different databases, presenting an inventory of solar weather resources. Table 1 highlights variations in parameters such as recording period, time resolution, and spatial coverage across the selected databases. NASA and Solcast offer global data coverage, while other databases are limited to specific locations. It is worth noting that all the satellite data featured in this study are accessible for free. However, some other databases, such as those offered by the European Center for Medium-Range Weather Forecasts (ECMWF) and Solar Anywhere, may require a payment [16]. Moreover, the accessibility of these databases varies: some, such as NASA, allow direct access through web portals without the need for user accounts, while others, such as Solcast, require user registration to download data.

Table 1 provides a detailed inventory of the solar resource data sources employed in the study. The table includes pertinent information for each data source such as the provider, data timeframe, type of model or measurement, underlying data source, time and spatial resolution, spatial coverage, calculation method, and the availability of various meteorological parameters (DHI, GHI, DNI, ambient temperature). The availability date and direct links for accessing data from each data source are included, facilitating easy retrieval of meteorological data for solar resource analysis.

Site Selection
Several criteria must be met before a location is chosen for any study, including the reliability and quality of the data provided. Ground data collected during the microgeneration field trial carried out for the Sustainable Energy Authority of Ireland (SEAI) from 2011 to 2012 were used for this study. According to the SEAI Microgeneration Pilot Field Trials Technical Report, 15 solar PV installations from 6 different manufacturers were included in field trials across the Republic of Ireland, with a focus on the 1-30 kW peak power range due to physical constraints such as array size, available roof area, and limitations in planning exemptions, primarily for domestic solar PV installations. Solar irradiance pyranometers (model CS300, Campbell Scientific, Logan, UT, USA) were installed by Obelisk Energy Ltd. to monitor and record the solar irradiance at time intervals of 15 min. The Circutor MK-30 DC meter was used to measure the array DC output power, while the ISKRA MIS EC1-80 was used to measure the output AC power. The CS300 pyranometer has an accuracy of ±5% for daily total radiation and a spectral range of 300-1100 nm. Polycrystalline PV modules were installed at all sites [18,19].
The site selection criteria of this study prioritize sites from the SEAI study that provide reliable information, and several factors informed the selection of a particular site. Geographic diversity provides a more thorough understanding of solar energy patterns by taking different latitudes and conditions into account. Furthermore, the availability of a large volume of ground-based data ensures an accurate comparison with satellite data. By considering all of these variables, the selected sites are expected to provide a comprehensive view of solar energy dynamics while making the research more applicable to different geographical locations and climate conditions.
This study compares weather records from six different locations in the Republic of Ireland based on the SEAI study. The coordinates are presented in Table 2 and mapped in Figure 3 [20]. Sites were chosen in Dublin, Waterford, Dundalk, and Mayo due to the high reliability and quality of data captured in these locations. This study's objectives were aligned with the period of high-quality data available during the 2011-2012 trial. According to the report provided by SEAI, these selected locations achieved 100% data capture, with no missing data during the specified period. The data recorded for the Limerick and Wexford sites were deemed to be less accurate. However, these locations were included in the study to demonstrate the performance of satellite data when errors are present in the ground-based data. Despite the concerns about the accuracy of the data recorded for Limerick and Wexford, their use as benchmarks may be justified in context. When data are scarce, researchers or analysts can decide to use the data available, while recognizing their shortcomings and exercising care in the analysis. Such records, even though they are thought to be less reliable, may still be appropriate when used comparatively, for example when the main goal is to find trends, patterns, or variations rather than to focus on absolute figures.
Table 2 presents the geographical coordinates of the six selected locations in the Republic of Ireland where solar irradiance and PV performance data were gathered for the study. The spatial distribution contributes to an understanding of solar energy patterns across varied climatic and environmental conditions within the region.
Figure 3 depicts the geographical distribution of the six study sites along with the levels of GHI across Ireland. The color scheme on the map corresponds to the long-term average GHI from 1994 to 2018, varying from 803 to 1095 kWh/m². The map clearly shows that the chosen sites encompass a wide range of GHI levels, ensuring a diverse dataset for the study. The map is obtained from the World Bank Group's ESMAP program [20].

Downloading the Weather Database for the Selected Site
A Typical Meteorological Year (TMY) is a dataset that includes weather variables such as air temperature, wind speed, and solar irradiance, among others, at a specific location. The TMY dataset records weather data as an hourly or sub-hourly time series, providing valuable information for scientific analysis and evaluation of the availability of renewable energy resources or for assessing building energy consumption [21]. The oldest style (TMY1) is the first-generation dataset produced by Sandia National Laboratories in 1978. These datasets, which provide representative weather data, are typically developed from diverse sources and methodologies. Recognizing the increasing need for standardized TMY datasets, a global TMY dataset spanning 38,947 meteorological stations worldwide has been introduced. This dataset was created using the Chinese Standard Weather Database (CSWD) method and is based on the ERA5 atmospheric reanalysis product, a fifth-generation dataset from the European Centre for Medium-Range Weather Forecasts (ECMWF) [22]. Stored as a 55 GB collection of compressed CSV files, it is readily accessible through a dedicated website, facilitating tailored TMY data downloads based on geographic coordinates. Rigorous validation procedures have affirmed the suitability of ERA5 as a reliable data source and validated the accuracy of the generated TMY data. This standardized TMY-ERA5 dataset is valuable in building system design, energy consumption simulations, and climate studies, especially in regions with limited ground meteorological stations or historical meteorological records, and it serves as a reference for TMY datasets across various domains [23]. It is important to note that datasets cannot be used interchangeably because they may differ in time formats, elements, and even units.
The essential parameters required in this study to analyze the PV power output are ambient temperature, DHI, DNI, GHI, and wind speed. The PVGIS-SARAH2 database was selected to conduct the study because its features align with the study's objectives: comprehensive coverage, precise representation of solar radiation data, high spatial resolution, proven reliability, global accessibility, and suitability for the study's regional focus. The database (TMY3) that PVGIS-SARAH2 uses consists of hourly values over a period of several years based on satellite-derived data and reanalysis [24]. Satellite data were chosen to match the period of ground measurements of GHI in the SEAI dataset [19].

Energy Simulation Software
Several software packages, for educational or commercial purposes, can simulate or model the performance of solar PV systems. Some of the most widely used tools include PVsyst [25], SAM [26], HOMER [27], RETScreen [28], MATLAB [29], and PVLib-python [30]. These tools offer various features and capabilities tailored to different aspects of PV system design and analysis, enhancing their utility for both researchers and practitioners. Each software package has unique strengths, making them suitable for different research needs and commercial applications in the field of solar energy.
PVLib-python is a highly useful tool that was developed by Sandia National Laboratories for PV modeling. It offers a range of functions that can be used for simulating and analyzing the performance of photovoltaic energy systems. It is a preeminent choice for research focused on analyzing and simulating PV energy systems due to its open-source accessibility and established reliability. Its suite of functions caters to diverse research needs, including solar resource assessment, PV system performance modeling, and financial analysis. Additionally, PVLib-python is highly customizable to suit distinct research contexts and objectives. The tool is implemented in the Python programming language, which is widely used in scientific and engineering communities [31]. The software package also includes a comprehensive tutorial that provides users with step-by-step instructions on how to use the various functions included in the tool [32]. The comprehensive documentation and active community support further enhance its usability and adaptability for various projects. By leveraging PVLib-python, researchers can conduct detailed and accurate simulations that support robust analysis and decision-making in solar PV system design.
The code was run using Python version 3.8 on a macOS system (Ventura 13.5.2) with 16 GB of installed memory and a 2.3 GHz 8-core Intel Core i9 processor. After the weather data are imported into the PVLib-python tool, the power output of the PV model can be calculated directly. Some additional outputs were also computed using the PVLib functions: solar position, solar irradiance, PV temperature, and performance metrics such as the efficiency of the PV system. The simulations were validated against metered power output from PV arrays, ensuring the reliability of the model. This approach allows the PV system parameters to be tuned to match real-world conditions and improve simulation accuracy.
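Two widely used ways of summarizing system performance from simulated or metered energy are the capacity factor and the performance ratio. The sketch below uses their standard definitions as an illustration; it is not the study's own code, and the function names and the example figures (a 2 kWp array, 1700 kWh per year, 1000 kWh/m² of plane-of-array insolation) are hypothetical.

```python
def capacity_factor(energy_kwh, rated_kw, hours):
    """Fraction of the energy the array would give if it ran at rated power."""
    return energy_kwh / (rated_kw * hours)


def performance_ratio(energy_kwh, rated_kw, poa_insolation_kwh_m2,
                      g_stc_kw_m2=1.0):
    """Final yield divided by reference yield.

    Final yield is energy per rated kW; reference yield is the plane-of-array
    insolation expressed in equivalent hours at 1 kW/m^2 (STC irradiance).
    """
    final_yield = energy_kwh / rated_kw
    reference_yield = poa_insolation_kwh_m2 / g_stc_kw_m2
    return final_yield / reference_yield


# Hypothetical annual figures, for illustration only.
cf = capacity_factor(energy_kwh=1700.0, rated_kw=2.0, hours=8760)
pr = performance_ratio(energy_kwh=1700.0, rated_kw=2.0,
                       poa_insolation_kwh_m2=1000.0)
print(f"capacity factor = {cf:.3f}, performance ratio = {pr:.2f}")
```

Because the performance ratio normalizes by the irradiation actually received, it is less sensitive to site and weather than the capacity factor, which makes it convenient for comparing results across the six locations.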
Solar position data, including apparent zenith and azimuth angles, were computed using the "analysis.Location" class [33]. Subsequently, the code calculated the orientation of a single-axis solar tracker and computed the Plane of Array (POA) global irradiance based on tilt and azimuth angles. The cell temperature of the solar panels was estimated using the Sandia Array Performance Model (SAPM) temperature model. The code further determines the power output for solar arrays of identical size installed at the various locations, taking into account a temperature coefficient, the tilt angle adjusted to the site setting, and the inverter efficiency set to the actual value [34]. It then produces a corresponding plot and saves the calculated array power data as a CSV file. The outputs of the PVLib functions, which include solar position, irradiance, PV module temperature, and performance metrics, are essential for measuring the efficiency of photovoltaic solar power plants. Information about solar position and irradiance is needed to understand how sunlight availability affects energy generation, and PV temperature data are useful in assessing how temperature affects panel efficiency [35]. System efficiency is measured by performance indicators such as the Performance Ratio or Capacity Factor. Statistical indicators of the relationships between ground- and satellite-based results are appropriate metrics for this investigation. These detailed calculations enable a comprehensive analysis of the PV system's performance under different environmental conditions. The resulting data are crucial for optimizing system configurations and enhancing overall efficiency.
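The temperature and power steps described above can be sketched in plain Python without any external package. This is an illustrative simplification under stated assumptions, not the study's code: the SAPM coefficients are the published values for an open-rack glass/glass module, while the temperature coefficient (-0.4 %/°C), the inverter efficiency (0.96), and the constant-efficiency inverter treatment are hypothetical placeholders rather than the site-specific values used by the authors.

```python
import math


def sapm_cell_temperature(poa_global, temp_air, wind_speed,
                          a=-3.47, b=-0.0594, delta_t=3.0):
    """SAPM cell temperature in deg C.

    poa_global is in W/m^2, temp_air in deg C, wind_speed in m/s.
    Defaults are the SAPM open-rack glass/glass coefficients.
    """
    module_temp = poa_global * math.exp(a + b * wind_speed) + temp_air
    return module_temp + poa_global / 1000.0 * delta_t


def ac_power(poa_global, cell_temp, p_dc0,
             gamma_pdc=-0.004, inverter_eff=0.96):
    """Temperature-corrected DC power times a constant inverter efficiency.

    p_dc0 is the array rating in W at 1000 W/m^2 and 25 deg C.
    gamma_pdc and inverter_eff are hypothetical illustrative values.
    """
    dc = p_dc0 * poa_global / 1000.0 * (1.0 + gamma_pdc * (cell_temp - 25.0))
    return max(dc, 0.0) * inverter_eff


# Hypothetical operating point: 800 W/m^2, 15 deg C air, 3 m/s wind, 2 kWp.
t_cell = sapm_cell_temperature(poa_global=800.0, temp_air=15.0, wind_speed=3.0)
p_ac = ac_power(poa_global=800.0, cell_temp=t_cell, p_dc0=2000.0)
print(f"cell temperature = {t_cell:.1f} C, AC power = {p_ac:.0f} W")
```

Running the same two functions once with ground-measured irradiance and once with satellite-derived irradiance is the kind of paired simulation on which the later comparison relies.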

Comparison Using Statistical Analysis
Various statistical methods and criteria were employed in this investigation to gauge the accuracy of GHI measurements. These methods included root mean square error (RMSE), coefficient of correlation (R), coefficient of determination ($R^2$), mean absolute error (MAE), mean absolute percentage error (MAPE), and mean bias error (MBE). The mathematical expressions (1)-(6) were utilized to calculate monthly and annual means for GHI. Here, $a_n$ represents the measured GHI values, $b_n$ represents the satellite-derived GHI values, and $\bar{b}$ denotes the mean of the satellite GHI values.
Lower values of RMSE (Equation (1)) indicate a better agreement between ground and satellite values [36]. The coefficient of correlation (R) is represented by Equation (2). R values, ranging from −1 to +1, indicate the degree of linearity between the datasets. Values closer to +1 or −1 indicate a strong correlation or anticorrelation, respectively, while values close to 0 indicate a weak correlation [37]. Furthermore, $R^2$ (Equation (3)) measures the goodness of fit of a regression model to the data. Ranging between 0 and 1, a higher $R^2$ value (closer to 1) indicates a better fit of satellite values to ground measurements [38]. The MAE (Equation (4)) indicates the average absolute deviation between satellite and ground values [39]. The MAPE (Equation (5)) expresses the relative difference between ground and satellite values as a percentage, with lower percentages indicating a closer match [40]. The MBE (Equation (6)) compares ground measurements with the satellite-derived database to determine any bias between the two datasets [41]. Together, these statistical measures form a framework for evaluating the similarity of the data, understanding the relationships between variables, measuring the goodness of fit, and checking the alignment of predictions with actual data.
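The six indicators can be computed with a few lines of standard-library Python. The sketch below uses the textbook definitions and should be read as an illustration rather than a transcription of Equations (1)-(6): it assumes the error is taken as ground minus satellite, it normalizes MAPE by the ground value and skips zero ground readings, and it reports $R^2$ as the square of the Pearson coefficient, which holds for a simple linear fit. The function name and sample values are hypothetical.

```python
import math


def ghi_error_metrics(ground, satellite):
    """Return RMSE, MAE, MBE, MAPE (%), R and R2 for paired GHI series."""
    n = len(ground)
    if n == 0 or n != len(satellite):
        raise ValueError("series must be non-empty and of equal length")

    # Assumed sign convention: error = ground (a_n) minus satellite (b_n).
    diffs = [a - b for a, b in zip(ground, satellite)]
    rmse = math.sqrt(sum(d * d for d in diffs) / n)
    mae = sum(abs(d) for d in diffs) / n
    mbe = sum(diffs) / n

    # Zero ground readings (e.g. night) are skipped to avoid dividing by zero.
    ratios = [abs(d) / abs(a) for d, a in zip(diffs, ground) if a != 0]
    mape = 100.0 * sum(ratios) / len(ratios) if ratios else float("nan")

    mean_a = sum(ground) / n
    mean_b = sum(satellite) / n
    cov = sum((a - mean_a) * (b - mean_b) for a, b in zip(ground, satellite))
    var_a = sum((a - mean_a) ** 2 for a in ground)
    var_b = sum((b - mean_b) ** 2 for b in satellite)
    r = cov / math.sqrt(var_a * var_b) if var_a > 0 and var_b > 0 else float("nan")

    return {"RMSE": rmse, "MAE": mae, "MBE": mbe,
            "MAPE": mape, "R": r, "R2": r * r}


# Hypothetical daily mean GHI values in W/m^2, for illustration only.
metrics = ghi_error_metrics([100.0, 200.0, 300.0], [110.0, 190.0, 330.0])
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

With this sign convention a negative MBE means the satellite series is higher than the ground series on average.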

Global Horizontal Irradiance Comparisons
GHI values obtained from PVGIS-SARAH2 and ground-based measurements at six locations in Ireland were compared. The values are expressed as "errors", meaning the difference between ground-based values and corresponding satellite-derived values, evaluated across various temporal averaging windows, including hourly, daily, monthly, and annual averages. The daily average GHI was calculated over a year, as well as MAE and correlation coefficients (R values) for each temporal scale. Additionally, the study simulated the energy production of a solar PV system by employing Python to predict the energy output based on GHI data from both ground-based and PVGIS-SARAH2 sources. MAE and R values for the simulation results are presented to provide a clearer picture of the comparative performance of solar PV systems based on different irradiance data sources. This comparison highlights the strengths and limitations of each data source, so that stakeholders can make more informed decisions regarding the selection of solar irradiance data for their specific applications.
Figure 4 compares the daily and monthly averages of GHI in 2011 between the ground-based and satellite databases for six locations. From the graphs alone, it is challenging to ascertain the absolute accuracy of the values, although differences between the ground-based measurements and the satellite database are visible. To evaluate the accuracy of the satellite GHI data relative to the ground-based records, daily averages, monthly averages, and annual averages were examined across all locations. Table 3 presents both the highest and lowest examples of errors in daily mean GHI. As an illustrative example, in the city of Dublin, on 25 May (day 145), the satellite database reported a daily average GHI value of 104.12 W/m², while the ground station recorded a notably higher daily average of 244.9 W/m², resulting in the highest error of 140.83 W/m². Much better accuracy was observed at the beginning of April, specifically on day 93, where the ground station registered a daily average GHI value of 154.97 W/m², closely aligned with the satellite value of 154.91 W/m². These disparities in recorded GHI values between the ground station and satellite data underscore the importance of validating solar irradiance data. The highest mean errors in the table may not necessarily represent frequent daily discrepancies, as these values were unique to specific days, influenced by various factors, including potential inaccuracies in the ground-measured hourly data recorded for those days.
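The daily comparison described above amounts to averaging each series by calendar day and then ranking the days by the size of the ground-satellite difference. The following standard-library sketch illustrates that procedure under simple assumptions: the function names are hypothetical, the input is a list of (timestamp, GHI) pairs, and the few sample readings are invented for illustration, whereas real records would cover every interval of each day.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def daily_means(records):
    """Average (timestamp, ghi) pairs by calendar day."""
    sums = defaultdict(lambda: [0.0, 0])
    for timestamp, ghi in records:
        acc = sums[timestamp.date()]
        acc[0] += ghi
        acc[1] += 1
    return {day: total / count for day, (total, count) in sums.items()}


def daily_abs_errors(ground, satellite):
    """Absolute difference of daily means on days present in both series."""
    g = daily_means(ground)
    s = daily_means(satellite)
    return {day: abs(g[day] - s[day]) for day in sorted(set(g) & set(s))}


# Invented sample: 15 min ground readings and hourly satellite values.
day1 = datetime(2011, 5, 25, 12, 0)
day2 = datetime(2011, 5, 26, 12, 0)
ground = [(day1 + timedelta(minutes=15 * i), v)
          for i, v in enumerate([100.0, 200.0, 300.0, 400.0])]
ground += [(day2 + timedelta(minutes=15 * i), 50.0) for i in range(4)]
satellite = [(day1, 240.0), (day2, 50.0)]

errors = daily_abs_errors(ground, satellite)
worst_day = max(errors, key=errors.get)
best_day = min(errors, key=errors.get)
print("highest error:", worst_day, errors[worst_day])
print("lowest error:", best_day, errors[best_day])
```

Averaging by day before differencing lets series with different native resolutions, such as 15 min ground records and hourly satellite values, be compared on a common basis.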
Table 3 displays the highest and lowest errors in mean daily GHI for the six study locations. Variations in daily errors highlight the temporal and spatial fluctuations in solar irradiance prediction accuracy. Table 4 presents a comparison of the highest and lowest errors in monthly mean GHI for the six locations. The errors were measured in watts per square meter (W/m²) and varied across months and locations. The highest errors were observed in specific months, such as July for Dublin and June for Waterford, while Dundalk had the highest error of 21.54 W/m² in April. Similarly, Mayo and Limerick had their highest errors in August and July, respectively. On the other hand, Dublin had the smallest error of 0.07 W/m² in June, followed by Mayo, Dundalk, and Waterford. Wexford and Limerick also had their lowest errors in December. This analysis highlights the seasonal differences between the datasets across multiple locations.
Figure 4 illustrates a comparison of daily and monthly average GHI values obtained from both ground-based measurements and satellite data for the six study locations. The left subplots depict the daily average GHI values throughout 2011, illustrating the daily variations and patterns. The right subplots display the monthly average GHI values, offering a broader view of the seasonal trends and disparities in irradiance between the two data sources. The comparison reveals that, although there is general agreement between the satellite- and ground-based data, there are noticeable discrepancies, especially in the daily averages. These differences emphasize the significance of validating satellite-derived GHI data with ground-based measurements to ensure precise solar energy resource assessment and PV system design.
Figure 5 presents a monthly average comparison of simulated power output using ground-measured and satellite-recorded GHI values for 2011 in Dublin. The majority of ground-measured GHI values were higher than the corresponding satellite records, which explains why the average power output calculated from ground measurements was greater than that from satellite records, as shown in Figure 5. Differences observed in the daily averages of GHI when comparing satellite- and ground-based measurements can be attributed to several factors. Ground measurements were taken directly at the Earth's surface, while satellite data were recorded remotely. Variations in atmospheric conditions, such as clouds and aerosols, affect ground measurements differently than satellite data, which consolidate information across broader atmospheric columns [42]. In addition, the calibration of GHI measuring instruments could affect the accuracy of ground measurements, contributing to the differences between the two sources.
Figure 6 shows daily average GHI values for a sample month, March 2011, for ground-based measurements and satellite data. The graph illustrates the daily variations in GHI for both data sources throughout the month. The blue line signifies the ground-based GHI measurements, while the orange line signifies the satellite-derived GHI values. Both datasets exhibit similar trends, with peaks and troughs occurring at corresponding times, indicating general agreement between the ground and satellite measurements. Nonetheless, some noticeable discrepancies exist, where the satellite data either underestimate or overestimate the GHI in comparison to the ground-based measurements. These differences underscore the significance of validating satellite-derived data with ground-based observations to enhance the accuracy of solar irradiance assessments and ensure dependable performance predictions for solar PV systems.
Appendix A presents monthly statistical error calculations of GHI at six sites, comparing ground-based data with satellite data. The objective is to assess the performance of the satellite data in relation to the ground-based measurements on a monthly basis. A strong correlation coefficient is observed between the ground-based and satellite data, reaching 0.99 in some instances, such as in November at Waterford and Dundalk. This strong correlation signifies that the satellite data closely mirror the ground-based data, implying a reliable and accurate source of information for applications that only require monthly means. However, a weaker correlation coefficient is evident in certain months, such as April in Dundalk, where the value drops to 0.66. This lower value suggests greater variability or potential discrepancies between the measurements. A noticeable disparity in the percentage error between the ground-based and satellite data is also observed, which can be quantified using the MAPE, as defined in Equation (6).
For the Dublin, Waterford, Dundalk, and Mayo locations, which possess accurate ground-based data, MAPE values are calculated for reference, although MAPE may not be the most suitable metric for evaluating irradiance measurements. MAPE typically considers the peak electrical output as its denominator and might not provide the most meaningful insights when assessing solar irradiance. Therefore, it is advisable to rely primarily on other metrics for evaluating irradiance data. Nevertheless, MAPE values ranging from 6.71% to 31.68% are presented for these locations, indicating the relative accuracy of the satellite data across different sites and months. A lower MAPE still suggests a closer alignment between the ground-based and satellite data, which could translate to a smaller margin of error in the satellite measurements.
Typically, satellite datasets more closely match ground-based data when analyzed on a monthly rather than a daily basis. This is primarily attributed to the fluctuating nature of satellite data, which tend to display higher or lower values during a single day compared to the ground-measured data. The annual comparison in Appendix B and Figure 7 shows a lower level of error than the monthly averages. The MAPE for the four locations with highly accurate ground records (Dublin, Waterford, Dundalk, and Mayo) falls within a range of 2.68% to 7.70%. This indicates that the predictions are relatively accurate when considering the annual data. A relatively strong positive linear relationship exists between the datasets, substantiated by a correlation coefficient exceeding 0.98 across all cases. Moreover, the RMSE ranges from 0.45 to 5.71 W/m². These relatively low RMSE values indicate a minimal level of error, further supporting the overall accuracy of the satellite data on an annual basis and reinforcing their potential for use in annual energy resource assessments. It is worth noting that some MAPE values may appear very high, indicative of small values in the denominator of the formula. Additionally, the inclusion of bias as a performance metric provides insights into the mean difference between the measured and satellite-derived GHI values.
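The metrics named in this section (bias, MAE, RMSE, MAPE, and R) can be sketched as follows. This is an illustrative implementation under stated assumptions, not the paper's own: the function name `error_metrics` is hypothetical, the error is taken as ground minus satellite, and the MAPE here divides each absolute error by the corresponding ground value, which is the conventional definition and may differ from Equation (6). The input values are invented.

```python
import math


def error_metrics(ground, satellite):
    """Return bias, MAE, RMSE, MAPE (%) and Pearson R for paired series."""
    n = len(ground)
    diffs = [g - s for g, s in zip(ground, satellite)]  # ground minus satellite
    bias = sum(diffs) / n
    mae = sum(abs(d) for d in diffs) / n
    rmse = math.sqrt(sum(d * d for d in diffs) / n)
    # Assumption: each absolute error is divided by the ground (reference) value.
    mape = 100.0 * sum(abs(d) / abs(g) for d, g in zip(diffs, ground)) / n
    mean_g = sum(ground) / n
    mean_s = sum(satellite) / n
    cov = sum((g - mean_g) * (s - mean_s) for g, s in zip(ground, satellite))
    var_g = sum((g - mean_g) ** 2 for g in ground)
    var_s = sum((s - mean_s) ** 2 for s in satellite)
    r = cov / math.sqrt(var_g * var_s)
    return {"bias": bias, "mae": mae, "rmse": rmse, "mape": mape, "r": r}


# Hypothetical series (W/m^2): a typical case and a low-irradiance case.
typical = error_metrics([100.0, 200.0, 300.0, 400.0], [110.0, 190.0, 310.0, 390.0])
low_sun = error_metrics([2.0, 200.0], [4.0, 198.0])
```

In the low-irradiance case the MAE is only 2 W/m², yet the MAPE is 50.5% because one reference value is close to zero. This is the small-denominator effect mentioned above, and it is one reason to read MAPE alongside MAE, RMSE, and bias.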

Output Power Comparison
The PVLib-python package was used to simulate the power output of PV arrays from ground-based and satellite-based data, and the performance of the simulation model was assessed. The simulated power output for both satellite and ground databases was calculated for all six locations and analyzed comparatively to gauge the accuracy of the satellite data. In addition, the metered power measurements at the sites were compared with the power values simulated from ground-based data using the PVLib model to validate its accuracy. The findings revealed patterns in the seasonal variations of solar power results from the two datasets. Winter months exhibited higher levels of power output simulated with ground-based data compared to satellite data, while the opposite trend was observed during summer months. These seasonal discrepancies could be attributed to the differences in spatial resolution and data acquisition methods between satellite- and ground-based measurements. Further investigation into these factors is necessary to understand the underlying causes of these variations and improve the accuracy of solar power simulations.
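To make the comparison of simulated outputs concrete, the sketch below uses a simplified PVWatts-style DC model written in plain Python. It is a stand-in and not the PVLib-python configuration used in the study, which is not specified in this section. Several assumptions apply: GHI is used directly in place of plane-of-array irradiance, cell temperature follows a NOCT approximation, and the NOCT value (45 °C), the temperature coefficient (−0.004 per °C), and the sample data are hypothetical. Only the 1.3 kW rating comes from the text.

```python
def cell_temperature(ghi, t_ambient, noct=45.0):
    """NOCT approximation of cell temperature (deg C); noct is an assumed value."""
    return t_ambient + (noct - 20.0) / 800.0 * ghi


def dc_power(ghi, t_ambient, p_dc0=1300.0, gamma=-0.004):
    """PVWatts-style DC power (W) from irradiance (W/m^2) and ambient temperature.

    Assumption: GHI stands in for plane-of-array irradiance.
    """
    t_cell = cell_temperature(ghi, t_ambient)
    return max(0.0, p_dc0 * ghi / 1000.0 * (1.0 + gamma * (t_cell - 25.0)))


def mean_power(ghi_series, t_ambient):
    """Average simulated power (W) over a series of GHI values."""
    return sum(dc_power(g, t_ambient) for g in ghi_series) / len(ghi_series)


# Hypothetical GHI samples (W/m^2) from the two sources at 10 deg C ambient.
ground_ghi = [0.0, 200.0, 500.0]
satellite_ghi = [0.0, 180.0, 520.0]
ground_mean = mean_power(ground_ghi, 10.0)
satellite_mean = mean_power(satellite_ghi, 10.0)
```

Running the same model on both irradiance series and comparing each result with metered output mirrors the structure of the analysis in this section, although a full PVLib simulation would also account for array orientation, irradiance transposition, and inverter behavior.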
This study suggests that these observations warrant further exploration into the factors that influence solar irradiance and energy production. Identifying and mitigating the sources of error in both data types can lead to more reliable and consistent power output predictions, enhancing the overall performance of solar PV systems. This research emphasizes the need for continuous refinement of simulation models and data integration techniques to achieve optimal results in solar energy production. Figure 8 and Table 5 present the simulated power for ground- and satellite-based measurements in Dublin.
The graph in Figure 8 displays the monthly average power output for a 1.3 kW solar PV system in Dublin over the course of 2011. It compares three datasets: ground-based measurements, ground-based simulated values, and satellite-simulated values. The blue line represents the actual ground-based power measurements, the orange line shows the simulated power output based on ground-based GHI data, and the gray line indicates the simulated power output based on satellite-derived GHI data. The patterns across all three datasets are similar, with a peak in power output during the summer months (May through August) and a decline during the winter months (November through February). The close alignment of the ground-based measurements with both simulated datasets indicates that the PVLib-python model effectively captures the seasonal variations in solar power output. However, minor discrepancies are visible, particularly in the transition months, suggesting that while satellite data can provide reasonable estimates, ground-based data offer slightly more precise power output predictions. This comparison highlights the importance of utilizing accurate ground-based measurements for solar PV system simulations to ensure reliable performance predictions.
For Dublin, the monthly statistical analysis in Table 6 reveals a high correlation coefficient (R) of 0.999 and a corresponding coefficient of determination (R²) of 0.999 between the ground-based reference data and the ground-simulated data. The high R² value, which means that nearly 99.9% of the variance in the ground-based data can be accounted for by the ground-simulated data, indicates that the ground-based measurements were of good quality and that the modeling framework effectively captures the underlying factors influencing solar power generation on the ground. This level of explanatory power is a critical indicator of the model's accuracy. Furthermore, the low mean absolute percentage error (MAPE) of 2.1% signifies that the average absolute percentage error of the ground-simulated data is minimal when compared to the metered power measurements. This demonstrates that the ground-simulated data closely approximate the actual ground-based measurements when monthly average values are required, and it validates the modeling approach.
In practical terms, these results indicate that PVLib is a highly accurate tool for predicting and simulating ground-based solar power output. This level of accuracy can be valuable for a range of applications, such as solar energy system design, performance evaluation, and forecasting. Decision-makers and researchers can have confidence in the model's ability to provide reliable estimates of solar power generation, allowing for more informed and efficient solar energy-related decisions. While there is a substantial positive correlation between the satellite-simulated and ground-based power outputs (R = 0.996, R² = 0.993), the slightly higher mean absolute percentage error for the satellite data (MAPE = 4.2%) compared to the ground-based data (MAPE = 2.1%) suggests that using ground-based data to simulate power output is more accurate. This reinforces the notion that ground-based GHI data are the most dependable source for simulating solar power output, especially when precise and trustworthy data records are available, as demonstrated in the cases of four locations (Dublin, Waterford, Dundalk, and Mayo). However, this does not apply to Limerick and Wexford due to accuracy issues with their data records, as indicated in the SEAI report. The report highlights that pyranometer data at the Limerick site were unreliable, while at the Wexford site, over-shading from tall trees resulted in inaccurate records. These site-specific issues underline the necessity of maintaining high-quality measurement equipment and site conditions to ensure data reliability. When such conditions are not met, satellite databases can provide a useful alternative for solar energy assessments, ensuring continuous and reliable data coverage.
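Table 6 contains a comprehensive statistical analysis of the monthly power output data for six different locations in Ireland, namely Dublin, Waterford, Dundalk, Mayo, Limerick, and Wexford. The table compares ground-based reference measurements with satellite-simulated values, as well as ground-based reference measurements with ground-simulated values. The results show a stronger correlation and lower MAPE for ground-based (reference) data compared to ground-simulated data, as opposed to ground-based (reference) data compared to satellite-simulated data. This suggests that ground-simulated data closely match the reference measurements, underscoring the superior accuracy and dependability of ground-based data for simulating solar power output.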

Discussion
The findings of this study underscore the important role of data source accuracy in predicting solar power output, which is critical for various applications in solar energy utilization. The comparison between ground-based measurements and satellite-derived data reveals nuances in their accuracy and reliability across different temporal scales and geographical locations. The analysis indicates that while daily assessments of satellite data may exhibit considerable fluctuations, when values are presented as monthly averages, the underlying variability is smoothed out and apparent errors appear much lower. This smoothing effect is beneficial for long-term planning and performance assessment, as it provides a more stable and reliable basis for decision-making. Understanding these temporal dynamics is essential for effectively utilizing satellite data in solar energy projects.
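One way to see why monthly averages show smaller apparent errors is the triangle inequality. As an illustration, and using symbols introduced here rather than taken from the paper, let $e_d$ denote the daily error (ground minus satellite) on day $d$ of a month with $N$ days. The error of the monthly mean is the mean of the daily errors, so:

```latex
\left| \frac{1}{N} \sum_{d=1}^{N} e_d \right|
\;\le\;
\frac{1}{N} \sum_{d=1}^{N} \left| e_d \right|
```

Equality holds only when no two daily errors have opposite signs. Whenever the satellite data overestimate on some days and underestimate on others, the signed errors partly cancel, and the monthly error is strictly smaller than the average daily error magnitude.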
The study highlights the significance of ground-based GHI data, especially in locations with precise and trustworthy records, such as Dublin, Waterford, Dundalk, and Mayo. However, caution is warranted when relying solely on ground-based data, as evidenced by the accuracy issues in Limerick and Wexford, where unreliable or problematic records were identified. These two sites were included specifically to assess how the satellite data would behave when faced with missing or erroneous recorded data. The performance metrics, such as MAPE and RMSE, highlight the overall accuracy of the predictions. MAPE values within the range of 2.68% to 7.70% for locations with reliable ground records signify acceptable accuracy. The correlation analysis indicates a robust positive linear relationship between the datasets, reinforcing the association between ground-based measurements and satellite-derived data. However, the slightly higher MAPE for the satellite data suggests that, overall, using ground-based data for simulating power output is more accurate. These findings emphasize the need for high-quality, reliable ground-based data to achieve the most accurate solar power predictions. Where such data are unavailable, satellite data can still provide valuable insights, albeit with slightly reduced precision.
The simulations performed with the PVLib-python package revealed a notable correlation between ground-based reference data and ground-simulated power outputs across the aforementioned locations, particularly if coarse time resolution data such as monthly averages are sufficient for the end user. For example, in Dublin, good agreement was observed between the model outputs and the metered outputs of the PV system, with R = 0.999, R² = 0.999, and a low mean absolute percentage error (MAPE = 2.1%). This high level of accuracy is crucial for solar energy system design, performance evaluation, and forecasting, instilling confidence in the reliability of the PVLib model. Satellite-simulated data also exhibited a positive correlation (R = 0.996, R² = 0.993) with ground-based power outputs, although the slightly higher MAPE (4.2%) indicates a slightly less accurate estimation compared to the ground-based simulations. Despite their marginally higher MAPE, satellite databases provide a more reliable option in areas where ground-based data quality is compromised or unavailable. These results highlight the versatility and robustness of the PVLib-python tool in different data contexts, making it a valuable asset for solar energy research and applications. The ability to use both ground-based and satellite data effectively broadens the applicability of the PVLib-python tool across diverse geographic and environmental conditions.
This study also highlights the limitations of ground-based data in instances where unreliable pyranometer data and shading from tall trees resulted in inaccuracies. In such cases, satellite databases emerge as potential alternatives for solar energy assessment. Indeed, accurate data sources are crucial when assessing solar energy. While ground-based measurements, when reliable, remain the gold standard for precise estimations, the PVLib model shows commendable accuracy in replicating ground-based solar power outputs. Moreover, the study emphasizes the feasibility of utilizing satellite databases as viable alternatives in situations where ground-based data reliability is compromised. This dual approach ensures continuity and reliability in solar power output predictions, even in the face of data quality challenges. Future work should focus on integrating and refining these data sources to enhance overall accuracy and reliability in solar energy assessments.
By incorporating detailed technical characteristics of the instruments used for terrestrial measurements, explaining the high quality and reliability of terrestrial meteorological data for the selected period, and including a comparative analysis of different software tools, this study provides a comprehensive evaluation of data sources.This ensures a clear understanding of their respective advantages and limitations, thereby guiding the effective design and optimization of solar PV systems.

Conclusions
This study aimed to evaluate the reliability and suitability of satellite-derived solar irradiance data for designing optimal solar PV systems in northern latitudes. A comprehensive assessment was conducted using the PVGIS-SARAH2 database for the years 2011 and 2012. The study validated satellite databases against ground-based measurements of solar irradiance across six specific locations in Ireland, focusing on Global Horizontal Irradiance as a key parameter. The analyses involved daily, monthly, and annual comparisons between satellite-derived and ground-based data. Statistical measures such as

Figure 2. Flowchart of the method used in this study.

Figure 4. Daily and monthly averages of GHI (W/m²) for ground-based and satellite records.

Figure 5. Monthly power output comparison for a 1.3 kW PV system based on simulations using ground-measured and satellite database GHI.


Sustainability 2024, 16, x FOR PEER REVIEW 18 of 29

Figure 6. Daily average of GHI in March 2011.

Figure 7. Comparison of annual averages of GHI from satellite- and ground-based measurements.

Figure 8. Monthly power output comparison for a 1.3 kW system in Dublin.

Table 1. Inventory of solar resource data sources.

Columns: Starting Period; Ending Period; Model/Measured; Underlying Data Source; Time Resolution; Spatial Coverage; Spatial Resolution; Calculation Method; DHI; GHI; DNI; Ambient Temp.
• NASA/POWER CERES/MERRA2: This dataset, provided by NASA, covers the period from 1 January 1981 to 31 December 2020 and contains 40 years of Typical Meteorological Year (TMY) data. The data have a 30 min temporal resolution and are available at a spatial resolution of either 1° × 1° or 10 km × 10 km, with global coverage. The dataset includes DHI, GHI, DNI, and ambient temperature data, all accessible in CSV format at no cost.
• NSRDB: The dataset, provided by NREL, spans from 1 January 2017 to 31 December 2019 and contains hourly and half-hourly meteorological data. It encompasses longitudes 25° W to 175° W and latitudes 20° S to 60° N, with a spatial resolution of 4 km × 4 km. These freely available data include DHI, GHI, DNI, and ambient temperature.
• Solcast: The dataset provided by Solcast covers the period from 1 January 2007 to 26 November 2021 and provides half-hourly meteorological data on a global scale with a spatial resolution of 1-2 km. The dataset includes DHI, GHI, DNI, and ambient temperature. It is available for free in CSV format.
• PVGIS-SARAH: The dataset is from PVGIS and spans from 1 January 2005 to 31 December 2016. It offers hourly data for Europe, Africa, Asia, and parts of South America, with a spatial resolution of 5 km. The data include DNI, GHI, and ambient temperature and are accessible in CSV format at no cost.
• PVGIS-CMSAF: Another dataset sourced from PVGIS, covering the period from 2 January 2007 to 1 January 2017. It offers hourly data for Europe, Africa, Asia, and portions of South America, with a spatial resolution of 2.5 km. The dataset encompasses DHI, GHI, and ambient temperature, and is available in CSV format at no cost.
• PVGIS-ERA5: The dataset, sourced from PVGIS, spans from 1 January 2005 to 31 December 2016 and offers hourly information for Europe at a spatial resolution of 25 km. It encompasses DHI, GHI, and ambient temperature. This dataset is freely accessible in CSV format.
• PVGIS-COSMO: The dataset provided by PVGIS covers the period from 2 January 2005 to 1 January 2015 and offers hourly data for Europe at a spatial resolution of 5 km. The dataset includes DHI, GHI, and ambient temperature, all available for free in CSV format.
• HelioClim-3 Solar Radiation: Supplied by SoDa, the dataset covers the period from 1 February 2004 to 31 December 2006. It offers 15 min data spanning from −66° to +66° latitude and longitude, with a spatial resolution of 3 km. The dataset includes DNI, GHI, DHI, and ambient temperature. Access to the data is free during certain periods and paid during others.
• Copernicus Atmospheric Monitoring Service (CAMS) McClear: SoDa provides 1 min data coverage from 1 February 2004 to 6 February 2022, encompassing Europe, Africa, the Atlantic Ocean, and the Middle East. The data include DHI, GHI, DNI, and ambient temperature and are available in CSV format. The spatial resolution varies, and the data are freely accessible.

Table 3. Highest and lowest differences between ground-measured and satellite mean daily GHI for six locations.

Table 4. Highest and lowest differences in monthly mean GHI between the two data sources for six locations.

Table 5. Monthly average power output comparison for a 1.3 kW system in Dublin.

Table 6. Monthly statistical models analysis.

Table 6 contains a comprehensive statistical analysis of the monthly power output data for six different locations in Ireland, namely Dublin, Waterford, Dundalk, Mayo, Limerick, and Wexford. The table compares ground-based reference measurements with satellite-simulated values, as well as ground-based reference measurements with ground-simulated values. The results show a stronger correlation and lower MAPE for ground-based