Generation of Data-Driven Expected Energy Models for Photovoltaic Systems

Abstract: Although unique expected energy models can be generated for a given photovoltaic (PV) site, a standardized model is also needed to facilitate performance comparisons across fleets. Current standardized expected energy models for PV work well with sparse data, but they have demonstrated significant over-estimations, which impacts accurate diagnoses of field operations and maintenance issues. This research addresses this issue by using machine learning to develop a data-driven expected energy model that can more accurately generate inferences for energy production of PV systems. Irradiance and system capacity information from 172 sites across the United States was used to train a series of models using Lasso linear regression. The trained models generally perform better than the commonly used expected energy model from the international standard (IEC 61724-1), with the two highest-performing models ranging in complexity from a third-order polynomial with 10 parameters (adjusted R² = 0.994) to a simpler, second-order polynomial with 4 parameters (adjusted R² = 0.993), the latter of which is subject to further evaluation. The trained models thus provide a more robust basis for identifying potential energy anomalies for operations and maintenance activities, as well as for informing planning-related financial assessments. We conclude with directions for future research, such as using splines to improve model continuity and to better capture systems with low (≤1000 kW DC) capacity.


Introduction
The increasing penetration of photovoltaic (PV) systems within the energy markets has established the need for evaluating and ensuring high system reliability. In particular, a large emphasis has been placed on monitoring algorithms that can contextualize observed energy generation at a site with information about how the system would have performed in a nominal state [1]. The latter are commonly estimated through expected energy models. Expected energy models are incorporated into many PV performance monitoring tasks, including anomaly detection [2][3][4][5], financial planning [6], fleet-level (site vs. site) comparisons [7], degradation analysis [7], and the evaluation of extreme weather effects [8]. The comparison of observed energy values to those derived from expected energy models serves as the basis for informing both tactical (i.e., short-term tasks such as field repair) and strategic (i.e., long-term activities such as site planning) operations and maintenance (O&M) activities.
Expected energy models can vary from asset-level to site-level estimates [9]. Asset-level models typically focus on using parameters provided by the manufacturer (e.g., maximum power) [9,10]. However, such approaches do not always work well for in-field performance since the parameters were developed under standardized test conditions and thus do not reflect operational conditions [11]. In response to these limitations, empirical methods that use field observations and regression methods have emerged to derive parameters across non-standardized test conditions (e.g., [12,13]). At the site level, most expected energy models leverage the correlation between power production and meteorological covariates [14,15]. For example, the standard expected energy model from the International Electrotechnical Commission (IEC) uses irradiance and site capacity information to develop an expected energy estimate [15]. Similarly, the PVUSA model trains a regression model for a given site by estimating power production using local irradiance, temperature, and wind speed conditions [16]. Industry research shows that most expected energy estimates tend to overestimate production by a median of 3%, but overestimates can reach 20% [17]. Although the mismatch between observed and expected generation is well-recognized [18], limited attention has been given to date to improving the accuracy of expected energy models at the site level, especially for models suited to fleet-level (i.e., site vs. site) comparisons.
This work aims to address this knowledge gap by generating a standardized, interpretable, data-driven expected energy model that can be used for fleet-level comparisons. Although gradient-boosted and neural network-based methods have demonstrated significant successes in output performance [19][20][21], they often lack interpretability. In particular, models with high complexity can hide prediction biases or other vulnerabilities [22]. Thus, for this work, we opted for more interpretable, regression-based models to increase the transparency of the implemented methods. In addition to identifying a more robust alternative for expected energy modeling, the associated publication of the code used for training models (in the open source software pvOps) enables the extension of these methods to develop site-specific expected energy models for PV systems anywhere in the world or for other renewable energy systems. Such advancements in expected energy model estimates are needed to continue supporting better planning and field O&M activities, both of which ultimately influence the sustainability of PV sites. The following sections describe the data processing and model construction activities (Section 2), the performance of the trained models (Section 3), and the primary findings (Section 4).

Methodology
The data-driven expected energy model training activities were supported by Sandia National Laboratories' PV Reliability, Operations, and Maintenance (PVROM) database [23]. Information about the PVROM database, as well as the data processing, model training, and model evaluation activities, is described in the following sections.

Data
The PVROM database contains 1.3 million data points of hourly production data across 176 sites in the United States [23], spanning multiple states (Figure 1) and generally ranging between 2017 and 2020. The database contains hourly measurements of expected energy in kilowatt-hours (kWh), irradiance (watts per square meter; W/m²), ambient temperature, and module temperature; site-level direct current (DC) capacity is provided by the industry partners. The DC capacity (C_DC) of the sites within the database spans from 37.8 kilowatts (kW) to 130,000 kW; a majority of the sites (140) are under 10,000 kW, with 67 of those sites under 1000 kW. A subset of the sites (100) contains industry-partner-provided expected energy estimates generated from proprietary models; these values serve as a basis for model validation activities (see Section 2.5).

Preprocessing
Data quality issues stemming from measurement errors and anomalous system conditions (reflecting local field failures, such as communication loss) can introduce signal variations in field data that hinder model performance. Problematic data convolute the relationships between features, making it more difficult to measure the true parameter estimates; these potential irreducible errors are decreased through numerous data quality filters (Figure 2). Missing values (i.e., NaN or None values) were removed prior to applying data quality filters. An evaluation of these missing values revealed that a majority of them (~88%) occurred during nighttime hours (~7 p.m. to 8 a.m.), indicating that some sites captured nighttime entries as null (Figure 2). After removing these missing values, ~900 K data points remained, which were then subject to a series of data quality filtering steps. Data preprocessing activities included both data quality- and anomaly-related filters. Data quality filters were conducted independently; only data points that passed all quality-based filters were subject to the anomaly-based filters.
Data were filtered to ensure they fall within nominal sensor ranges, using thresholds following [24] and the IEC 61724-1 standard [15]. Specifically, we retained data that met the following criteria:
• Energy (E) > 0 kWh;
• Ambient temperature (T_amb) ≤ 50 °C and module temperature (T_mod) ≤ 90 °C.
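The range checks above amount to a simple row mask. A minimal sketch with pandas is shown below; the column names (`energy_kwh`, `t_amb_c`, `t_mod_c`) are hypothetical, not the PVROM schema:

```python
import pandas as pd

def apply_quality_filters(df):
    """Retain rows within nominal sensor ranges (IEC 61724-1-style thresholds)."""
    mask = (
        (df["energy_kwh"] > 0)     # energy must be positive
        & (df["t_amb_c"] <= 50)    # ambient temperature <= 50 C
        & (df["t_mod_c"] <= 90)    # module temperature <= 90 C
    )
    return df[mask]

data = pd.DataFrame({
    "energy_kwh": [120.0, 0.0, 95.5],
    "t_amb_c": [25.0, 30.0, 55.0],
    "t_mod_c": [40.0, 45.0, 60.0],
})
clean = apply_quality_filters(data)  # only the first row passes all checks
```

Applying the filters independently, as described above, simply means computing each condition on the full dataset before combining them.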
Wind speed was not consistently available from partners and thus was excluded from the analysis. Although available temperature data were used in the preprocessing steps, they are not used as a predictor variable in the regression models, since they are not included in current standard models [15].
Flatlining values, determined by periods where consecutive data changed by less than a threshold, were flagged for removal using the pecos package [25], which follows the IEC 61724-3 standard [26]. Specifically, four consecutive hours with either ∆E < 0.01% of the site's capacity or ∆I < 10 W/m² were filtered. Lastly, inverter clipping, which occurs when the DC energy surpasses an inverter's DC energy rating, was addressed by mathematically detecting plateaus in the energy signal using the pvanalytics package [27]. Dropping energy measurements during inverter clipping, which manifests as a static value across high irradiance levels, yields a better linear fit. After the data quality checks, 429 K data points across 150 sites remained (Figure 2).
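The flatline check can be illustrated with a simplified sketch: flag any point that ends a run of four consecutive changes smaller than a tolerance. This is a hand-rolled approximation for illustration, not the pecos implementation, and the threshold here is a generic placeholder rather than the capacity-scaled 0.01% used in the study:

```python
import pandas as pd

def flatline_mask(series, window=4, tol=1e-6):
    """True where the last `window` consecutive changes were all below `tol`."""
    small_change = series.diff().abs() < tol
    # a point is flagged if it closes a run of `window` near-zero changes
    return small_change.rolling(window).sum() == window

energy = pd.Series([5.0, 5.0, 5.0, 5.0, 5.0, 8.0])
mask = flatline_mask(energy)  # only the fifth point closes a 4-hour flat run
```

In practice, pecos applies this kind of delta test per the IEC 61724-3 quality-check framework, with thresholds set relative to site capacity.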
Data points that passed all quality checks were also assessed for system-level anomalies. These anomalies likely reflect abnormal operating conditions (i.e., local failures) and thus require removal to ensure the trained baseline energy models reflect nominal system performance. Anomalous entries were detected by comparing observed energy to irradiance and to site capacity (Figure 3). The energy-irradiance filter removes data where the E-I ratio (λ) falls outside its nominal distribution by 3 standard deviations, {λ : λ < µ_λ − 3σ_λ ∪ λ > µ_λ + 3σ_λ}, where µ_λ and σ_λ are the mean and standard deviation of the E-I ratio, respectively [28]. This filter was implemented for each site separately to capture site-specific variations (including system capacity) and resulted in the removal of 70 K data points (Figure 2). The second system anomaly filter focused on removing sites with mismatches between observed energy and site capacity. Namely, if a site's maximum recorded energy was over 1.2 × C_DC or under 0.7 × C_DC, then all data points for that site were excluded from subsequent analysis. This method filtered 23 sites; more than 50% of these sites were under 1000 kW, and only 1 was over 10,000 kW. Approximately 26 K data points were removed with this filter, resulting in a final dataset of 332 K data points across 127 sites for model training and testing activities (Figure 2). The age of the sites within the final dataset ranged from newly installed up to 10 years, with a majority being less than 5 years old (Figure A2). As shown in Figure 3, anomalous data points (visualized as Xs) are often lower than non-anomalous values within the distribution-derived bands (red lines).
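The per-site 3-sigma filter on the E-I ratio can be sketched as follows; the column names (`site_id`, `energy_kwh`, `irradiance_wm2`) are illustrative assumptions:

```python
import pandas as pd

def filter_ei_ratio(df, k=3.0):
    """Per-site filter: drop rows whose E-I ratio lies outside mean +/- k*std."""
    kept = []
    for _, group in df.groupby("site_id"):
        lam = group["energy_kwh"] / group["irradiance_wm2"]  # E-I ratio
        mu, sigma = lam.mean(), lam.std()
        kept.append(group[(lam >= mu - k * sigma) & (lam <= mu + k * sigma)])
    return pd.concat(kept)

# one site with 20 nominal hours and a single anomalously high reading
df = pd.DataFrame({
    "site_id": ["A"] * 21,
    "energy_kwh": [10.0] * 20 + [1000.0],
    "irradiance_wm2": [10.0] * 21,
})
kept = filter_ei_ratio(df)  # the outlier row is removed
```

Grouping by site before computing the statistics is what makes the filter capture site-specific variation, including capacity differences.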

Variable Standardization
The specific inputs used for model training mimic commonly available parameters used in current expected energy models (e.g., [15]), such as irradiance and site capacity. However, with covariates at different scales (e.g., {0 W/m² < I < 1.2 × 10³ W/m²} while {1 × 10² kW < C_DC < 1.3 × 10⁵ kW}), variable standardization is required to reduce model sensitivity to parameter scales. In particular, without standardization, the weights generated for each parameter are more likely to reflect scalar nuances than the relative importance of the parameter to the outcome of interest. Variable standardization centers the data by subtracting the feature's mean value (µ) from each data point and then scales the data by dividing by the associated standard deviation (σ), i.e., Z = (X − µ)/σ. The resulting standardized variables have a mean of zero and a standard deviation of one. This process makes parameters easier to rank in terms of influence; the variable with the larger coefficient has the more important effect on the output response. Thus, variable standardization also aids the interpretability of the derived parameters, especially when variable interactions (e.g., I × C_DC) are introduced. The mean and standard deviation parameters used to standardize the irradiance, capacity, and energy values are captured in Table 1.
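The z-score transform Z = (X − µ)/σ is a one-liner; a small sketch that also returns µ and σ (needed later to de-standardize model predictions back to kWh):

```python
import numpy as np

def standardize(x):
    """Z-score a feature: subtract the mean, divide by the standard deviation."""
    mu, sigma = np.mean(x), np.std(x)
    return (x - mu) / sigma, mu, sigma

z, mu, sigma = standardize(np.array([100.0, 200.0, 300.0]))
# z now has mean 0 and unit standard deviation
```

The returned (µ, σ) pairs correspond to the standardization parameters the study records in Table 1.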

Model Design and Training
Similar to other machine learning models, regression techniques leverage input data to learn relationships and use those relationships to predict unseen quantities. These relationships are generally contained in model parameters (β), which map predictors, as summarized in a design matrix X, to an output Ŷ = Xβ + ε, with residual model error ε. Many different regression techniques exist; these techniques typically vary in the structure of the cost function, which quantifies the error between predicted and expected values. This cost function (C) is usually captured as a summation of loss functions (calculated on each data point) across the training set. The set β̂ that renders the smallest cost is defined as the learned parameters, mathematically notated as:

β̂ = arg min_β C(β)

A popular regression model is ordinary least squares (OLS), which defines its best model (β̂_OLS = arg min_β SSE) with an objective function equal to the sum of squared errors (SSE):

SSE = Σ_{i=1}^{n} (y_i − ŷ_i)², where ŷ_i = β₀ + Σ_{j=1}^{p} β_j x_ij

where n is the number of samples, p is the number of predictors, and x_ij is the i-th value of the j-th explanatory variable. As shown in the equation, the SSE sums the squared difference between each sample (y_i) and its associated model estimate (ŷ_i). High emphasis is naturally placed on reducing high-error samples; outliers can therefore have a large effect on the learned parameters, so data preprocessing steps are required for robust model development. Additionally, OLS renders non-zero coefficients for all β̂, which can create small, insubstantial parameters that reflect idiosyncrasies of the training dataset, contribute to model overfitting, and should therefore be removed from the model. Alternate approaches to OLS include the Theil-Sen regressor [29], which is robust against outliers since it chooses the median of the slopes of all lines between pairs of points, as well as techniques such as Lasso regression [30] that explicitly address model overfitting by reducing model complexity (i.e., the number of parameters used).
For this analysis, the latter was selected since Lasso regression models are able to incorporate both parameter regularization and the residual sum of squares into the loss function. The cost function for Lasso regression, β̂_lasso = arg min_β (SSE + α Σ_{j=1}^{p} |β_j|), incorporates an L1 regularization term, α Σ_{j=1}^{p} |β_j|, which penalizes the magnitude of the β terms. This penalization tends to shrink coefficients to zero, rendering a more parsimonious model; we use α = 0.003 to define the impact of the regularization on the regression kernel. Specifically, the penalization acts as a bias, which in turn can reduce overall error due to the bias-variance tradeoff [31].
Standardized variables are passed into Lasso regression to learn a linear model, which relates the input variables to energy. Multiple combinations of input variables were used to train the regression models (more details below). For all models, a randomized (80-20%) split is utilized to partition the preprocessed, standardized data into train and test partitions, respectively.
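The pipeline described above (standardized inputs, a randomized 80-20% split, and Lasso with α = 0.003) can be sketched with scikit-learn. The synthetic data and coefficients below are illustrative only, not the fitted PVROM values:

```python
import numpy as np
from sklearn.linear_model import Lasso
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 500
i = rng.normal(size=n)  # standardized irradiance (synthetic)
c = rng.normal(size=n)  # standardized capacity (synthetic)

# design matrix with an additive interaction term i*c
X = np.column_stack([i, c, i * c])
y = 0.9 * i + 0.5 * c + 0.3 * i * c + rng.normal(scale=0.05, size=n)

# randomized 80-20% train-test split, as in the study
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

model = Lasso(alpha=0.003)  # L1 penalty strength used in the paper
model.fit(X_train, y_train)
r2 = model.score(X_test, y_test)  # R^2 on the held-out partition
```

Because the L1 penalty shrinks small coefficients toward zero, inspecting `model.coef_` after fitting shows which terms survive regularization.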
In addition to individual parameter influences, interactions and temporal factors were incorporated as input features to capture nuances within the datasets. Interaction parameters, which allow the effect of one parameter on the response variable to be weighted by the value of another variable, are introduced by including terms that are the product of two or more predictor variables. For example, Figure 4 shows that the relationship between E and I varies across C_DC. Thus, the inclusion of an I and C_DC interaction term may be helpful in predicting the generated energy. The suite of interaction combinations is instantiated using polynomial models up to the third order (i.e., degree d = 3). For a model with d = 2 and two covariates, the initiated regression model takes the following form:

ŷ = β₀ + β₁x₁² + β₂x₁x₂ + β₃x₂² + β₄x₁ + β₅x₂

Notice that a model with d = 2 also includes the d = 1 parameters (i.e., β₄x₁ and β₅x₂). This remains true for all values of the polynomial power (e.g., for a model initiated with d = 3, terms from d = 2 and d = 1 are also included). Two interaction polynomial orders are tested: second-order (d = 2) and third-order (d = 3) (Table 2). The particular interaction noted above (I × C_DC) is captured in multiple models, including an additive model with a single interaction term (Table 2).
In addition to interactions, temporal factors are used to capture a variable's changing effect on the generated energy over time. For instance, the correlation between I and E changes over the course of the year due to spectral irradiance effects [32,33]. Therefore, allowing the model to capture time-variant nuances may be important for capturing such nonlinearities. Three temporal conditions were explored: seasonal (four per year), monthly, and hourly. A model with two predictor variables and monthly temporal conditions would be instantiated as:

ŷ = β₀ + Σ_{t ∈ {jan, ..., dec}} (a_t 1_t x₁ + b_t 1_t x₂)

where the a and b parameters are coefficients describing the effects of parameters x₁ and x₂, respectively, when conditioned on a month of the year. For instance, a_jan describes the effect of x₁ on the response variable y during the month of January. The indicator function 1_t masks the predictor variable to ensure it is within its timeframe. With the various combinations of interactions and temporal conditions, a total of 13 regression kernels were evaluated (Table 2; see Appendix A for some of the mathematical formulations).
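Masking a predictor with a monthly indicator amounts to multiplying it by one-hot month columns. A minimal pandas sketch (column names `timestamp`, `i`, and `c` are illustrative assumptions):

```python
import pandas as pd

df = pd.DataFrame({
    "timestamp": pd.to_datetime(["2020-01-15 12:00", "2020-06-15 12:00"]),
    "i": [0.8, 1.1],  # standardized irradiance
    "c": [0.2, 0.2],  # standardized capacity
})

# one indicator column per month present in the data (the 1_t masks)
month = df["timestamp"].dt.month_name().str.lower().str[:3]
dummies = pd.get_dummies(month)

# interact each predictor with each month indicator: a_t*1_t*x1, b_t*1_t*x2
features = pd.concat(
    [
        dummies.mul(df["i"], axis=0).add_prefix("i_"),
        dummies.mul(df["c"], axis=0).add_prefix("c_"),
    ],
    axis=1,
)
# e.g., i_jan holds the irradiance value only for January rows, else 0
```

A linear model fit on these masked columns learns a separate coefficient per month, which is exactly the conditioning described above.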
Figure 4. Correlation between energy production and irradiance for raw data (a) and preprocessed data (b) for different site capacities. Higher correlations in the preprocessed data indicate an interaction between DC capacity and irradiance for energy production.

Model Evaluation
Three metrics were used to evaluate the performance of the trained expected energy models: logarithmic root mean squared error (log RMSE), coefficient of determination (R 2 ), and percent error (δ). Both partner-provided expected energy values and those calculated by the leading standardized expected energy model (i.e., IEC 61724) were used as reference values for model evaluations.
The root mean squared error (RMSE) is a common goodness-of-fit statistic used for model evaluation. The RMSE is expressed as:

RMSE = √((1/n) Σ_{i=1}^{n} (y_i − ŷ_i)²)

where y_i and ŷ_i are the measured and predicted values of the response variable, and n is the number of samples. RMSE is in the same units as the response variable (i.e., kWh).
Lower RMSE values indicate lower prediction error. Because the error can be quite large in magnitude (10⁰ to 10¹⁰), a logarithmic transform is applied to facilitate evaluations. Because the magnitude of the error is closely connected to a site's capacity, the log RMSE cannot be used to compare model performance between sites unless the sites are similar in size. The coefficient of determination (R²), however, can be used to compare model performance across different site sizes. Specifically, R² is calculated as:

R² = 1 − Σ_{i=1}^{n} (y_i − ŷ_i)² / Σ_{i=1}^{n} (y_i − ȳ)²

where ȳ is the average of the y values. R² denotes the proportion of variability in the response explained by the model, with a value of 1 indicating a perfect fit. R² was used to compare trained model outputs with partner-generated expected energy values, whose underlying model structures were unknown. Generally, however, R² is not well-suited for comparing models with varying numbers of parameters. Thus, when comparing the 13 trained models to one another, we utilize the adjusted R² (R²_adj) metric, which checks whether added parameters contribute to the explanation of the response variable and penalizes models with unnecessary complexity [34]. Low-effect parameters (i.e., β ≈ 0) reduce the model's overall fit score. The adjusted R² is calculated as follows:

R²_adj = 1 − (1 − R²)(n − 1)/(n − p − 1)

where n is the number of samples and p is the number of predictors.
Finally, δ was used to capture the directionality of error (i.e., overprediction vs. underprediction):

δ = 100 × (Σ_i ŷ_i − Σ_i y_i) / Σ_i y_i

The log RMSE and R² were implemented to evaluate model performance at both the site and fleet (i.e., across multiple sites) levels, while δ was only implemented at the fleet level; all metrics were reported on the test dataset. T-tests were used to evaluate the significance of performance variations between the trained and reference values.
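The three metrics can be sketched directly with NumPy. The percent-error formula here follows the convention that positive δ means overprediction, consistent with the description above, though the paper's exact aggregation is an assumption:

```python
import numpy as np

def log_rmse(y, yhat):
    """Log of the root mean squared error (assumes nonzero error)."""
    return np.log(np.sqrt(np.mean((y - yhat) ** 2)))

def r2(y, yhat):
    """Coefficient of determination: share of variance explained."""
    ss_res = np.sum((y - yhat) ** 2)
    ss_tot = np.sum((y - np.mean(y)) ** 2)
    return 1.0 - ss_res / ss_tot

def adjusted_r2(y, yhat, p):
    """R^2 penalized for the number of predictors p."""
    n = len(y)
    return 1.0 - (1.0 - r2(y, yhat)) * (n - 1) / (n - p - 1)

def percent_error(y, yhat):
    """Signed fleet-level percent error; positive = overprediction."""
    return 100.0 * (np.sum(yhat) - np.sum(y)) / np.sum(y)

y_true = np.array([1.0, 2.0, 3.0, 4.0])
pe = percent_error(y_true, 1.1 * y_true)  # a uniform 10% overprediction
```

These definitions mirror the equations given above; only the aggregation level (per site vs. across the fleet) changes between uses.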

Results and Discussion
Data processing activities generally increased the correlations between the predictor variables (i.e., irradiance and capacity) and the response variable (i.e., energy) (Table A1). The processed data were input into a total of 13 trained models, ranging in pre-Lasso model complexity from 3 parameters for the 'simple additive' model to 151 parameters for the 'third-order-hour' model (see Table 3). Generally, the number of parameters was lower for all models after the Lasso fit, except for the 'simple additive' and 'additive interaction' models, likely reflecting the already sparse construction of these models.

Table 3. Parameterization and performance of all of the models evaluated in this paper. Wins are summarized as the percentage of sites where a given model was the top performer according to the associated goodness-of-fit metric (adjusted R² or log RMSE); the IEC model was used as the reference value for the log RMSE calculations. The third-order interactions model and the basic model perform consistently well, a conclusion also found in the heatmaps (Figure 5). Additionally, because Lasso regression was leveraged, the models decrease in size after training.

Initially, the various models were trained using data across all system sizes. However, this approach demonstrated systemic underperformance for low-capacity systems (<1000 kW DC capacity). Specifically, the best trained models (i.e., 'third-order interactions' and 'additive interaction') outperformed the IEC model in terms of log RMSE when tested on every system above 1300 kW DC capacity; however, 12 of the 34 systems below 1300 kW DC capacity underperformed relative to the IEC model. This result likely reflects the varying relationships between site DC capacity and generated energy; systems of higher capacity tend to generate a higher maximum energy per unit of DC capacity (Figure A1).
To better address this varying linearity, two separate sets of models were trained: one for systems under 1000 kW DC capacity and another for systems at or above that capacity.
Across both high-capacity and low-capacity systems, models with the I × C_DC interaction term perform better than those without it (i.e., the 'hour', 'month', 'seasonal', and 'simple additive' models). For example, two of the top-performing models (across both high-capacity and low-capacity systems) are the 'additive interaction' and 'third-order interactions' models, both of which contain this interaction term (Table 3 and Figure 5). The 'additive interaction' trained (AIT) model has four parameters:

ê = β₀ + β₁i + β₂c + β₃(i × c)    (8)

where i and c denote the standardized irradiance and capacity variables, respectively, as defined in Table 1. The 'third-order interactions' model, on the other hand, contains these four terms as well as higher-order interactions (e.g., irradiance² × capacity, capacity³). Although the variables within both of these models are similar to those in the IEC standard, the inclusion of the interaction term, which highlights that the linear relationship between I and E is moderated by C_DC (Figure 4), likely explains the superior performance of these models relative to that standard. The heatmaps of the log RMSE values highlight the evaluation metric's dependence on site capacity (Figure 5a,c), while the adjusted R² heatmaps show consistent performance across site capacities (Figure 5b,d). The vertical concentration of dark bars likely reflects data quality issues not addressed by the data preprocessing steps. A comparison of the partner-generated expected energy estimates to those predicted by the models also demonstrates that the AIT-derived estimates have lower average percent errors than the other models and the IEC (Table A2). Further evaluation of 2 years of records at a single site demonstrates that the AIT-derived estimates have a lower standard deviation and do not overestimate as much as the IEC-derived estimates (Figure A3). Given its parsimonious nature, the AIT model is subjected to further evaluation for both high- and low-capacity systems.
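The four-parameter additive-interaction form described above is straightforward to evaluate once the coefficients are known. The coefficients in the sketch below are placeholders for illustration, not the fitted PVROM values:

```python
def ait_expected_energy(i, c, beta):
    """'Additive interaction' form: e_hat = b0 + b1*i + b2*c + b3*(i*c).

    i, c: standardized irradiance and capacity; beta: (b0, b1, b2, b3).
    """
    b0, b1, b2, b3 = beta
    return b0 + b1 * i + b2 * c + b3 * i * c

# hypothetical coefficients, for illustration only
e_hat = ait_expected_energy(1.0, 2.0, (0.0, 1.0, 1.0, 1.0))
```

Because the model operates on standardized variables, the prediction ê must be de-standardized (ê·σ_E + µ_E, using the Table 1 parameters) to recover an energy estimate in kWh.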

High-Capacity Systems
For high-capacity systems, a significant (almost uniform) difference is found in both the log RMSE and R² values between the AIT and the IEC reference models (Figure 6a,b). Across the sites, the AIT model improves the goodness of fit by 0.42 in R² (IEC: 0.501; AIT: 0.93) and 1.16 in log RMSE (IEC: 6.99; AIT: 5.83). Generally, there are very few systems for which the IEC model performs better than the trained models (Table 3). The average percent error (δ) of the AIT model (3.65) is significantly lower than that of the IEC model (20.86) for high-capacity systems. An evaluation of percent error shows that the AIT generally performs well (i.e., δ ≈ 0 and thinner standard deviation bars) for most irradiance levels, except at the two extremes (i.e., <200 and >1100 W/m²) (Figure 7). The difference in performance relative to the IEC model is especially pronounced for larger system sizes (Figure 6a,b). Although the AIT model appears to show a small improvement over the partner-provided values (Figure 6a,b), a t-test concluded that the distributions of both the log RMSE (p-value: 0.78) and R² (p-value: 0.48) are not significantly different.
Figure 6. Model evaluation using log RMSE and R² metrics for high-capacity systems (a) and low-capacity systems (b). Data points reflect site-level summaries of the associated test data, while dotted lines reflect best line fits to support visual pattern identification. The R² metric was used for this analysis (vs. adjusted R²) since the partner-provided model architectures are unknown. The 'additive interaction' regression model (in red) is comparable to the partner-provided proprietary values (green) and consistently performs better than the IEC standard (blue), especially at higher capacity values.
Figure 7. Percent error as a function of irradiance shows that the 'additive interaction' model outperforms the IEC standard across both high-capacity (a) and low-capacity (b) systems. Lines indicate mean values, while the shaded region captures one standard deviation. The 'additive interaction' model performs best (δ ≈ 0) at 500-1100 W/m² and at 200-1000 W/m² for high-capacity and low-capacity systems, respectively.

Low-Capacity Systems
The model performance of the AIT for low-capacity systems was generally comparable to that for high-capacity systems, although the improvements were not as large. Across all low-capacity sites, the AIT model's goodness of fit improved by 0.165 in R² (IEC: 0.74; AIT: 0.90) and 0.61 in log RMSE (IEC: 3.35; AIT: 2.75) (Figure 6a,b). Out of the 31 low-capacity sites (comprising 50 K hours), the IEC-based model outperformed the trained models in 4 systems (Table 3). In some of the low-capacity systems, the measured energy is much higher than expected (Figure 5). The tendency of the IEC model to overpredict likely explains why this model performs better for some of the lower-capacity systems.
The δ, on average, is 4.42 and 15.37 for the AIT model and IEC model, respectively. Similar to the high-capacity systems, the percent error values are greater at the extremes (Figure 7b). However, the error variability is generally higher in the low-capacity systems, as evidenced by wider standard deviation bars across the irradiance levels (Figure 7b).

Limitations and Future Work
The methodological approach of this analysis was strongly guided by the available data. However, future work could extend these methods to consider: (1) energy generation at finer resolutions, (2) additional covariates, and (3) alternate model formulations. For example, the models could be rescaled post-evaluation to consider alternate frequencies beyond the hourly intervals considered in this study. The methods used in this analysis explicitly omitted variables not included in current standard models (e.g., [15]); however, future assessments could more explicitly incorporate covariates such as temperature, wind speed, and even site age. The latter would especially enable active consideration of degradation, which can influence the long-term energy generation of PV sites [35]. Additional covariates (such as inverter and module types) could also be included in subsequent iterations to capture more subtle impacts associated with differing site designs. Finally, future work could consider alternate model formulations (e.g., splines) to improve model continuity and better capture energy generation for smaller system sizes.

Conclusions
This work demonstrates the opportunities for leveraging data-driven, machine learning methods to generate more robust expected energy models. Generally, the trained regression models outperform the IEC standard and are comparable to partner-provided values, especially in high-capacity systems. Detailed evaluation of the parsimonious 'additive interaction' (AIT) model, in particular, demonstrated significant potential for use as a standardized, fleet-level expected energy model. The specific code used to train the regression models, as well as the AIT model, has been integrated with pvOps, an open source Python package that supports the evaluation of field data by PV researchers and operators; pvOps can be accessed at https://github.com/sandialabs/pvOps, accessed on 22 December 2021. Although this work presents findings specific to PV systems, the general methodologies can be applied to any domain that uses expected energy models to support site planning and O&M activities. Ongoing evaluations and improvements of these standardized expected energy models will continue to increase the accuracy and precision of site-level PV performance evaluations, which is critical to supporting reliability and economic assessments of PV operations and maintenance.

Institutional Review Board Statement: Not applicable.

Informed Consent Statement: Not applicable.
Data Availability Statement: The raw data were procured under non-disclosure agreements and thus cannot be shared. However, an anonymized version of the filtered dataset used in this study can be found within the article and Supplementary Information.

Acknowledgments:
The authors would like to thank our industry partners for sharing data in support of this analysis as well as Sam Gilletly for their assistance with reviewing an earlier version of this manuscript. Sandia National Laboratories is a multimission laboratory managed and operated by National Technology and Engineering Solutions of Sandia LLC, a wholly owned subsidiary of Honeywell International Inc. for the U.S. Department of Energy's National Nuclear Security Administration under contract DE-NA0003525. The views expressed in the article do not necessarily represent the views of the U.S. Department of Energy or the United States Government.

Conflicts of Interest:
The authors declare no conflict of interest.

Abbreviations
The following abbreviations are used in this manuscript:
e: standardized E (see Table 1)
i: standardized I (see Table 1)
c: standardized C (see Table 1)

Table A1. Correlations with energy generation and regression model parameters pre- and post-data processing. Correlations with energy production are generally comparable for irradiance across raw and filtered data, while correlations for site capacity are significantly higher for the filtered data than for the raw data.

Figure A1. Relationship between DC capacity (kW) and hourly generated energy (kWh). The blue dots show a site's DC capacity versus its maximum recorded energy generated in a single hour. Although the trends are largely linear, the slopes differ for sites smaller than 1000 kW (blue dashed lines) and sites larger than 1000 kW (orange dashed lines). Since slight deviations in slope can render large prediction errors, we train two separate models based on site size.

Appendix A.2. Top-Performing Trained Models
This appendix presents the mathematical equations associated with the top-performing trained regression models. In general, these model formulations contain more parameters and do not perform as well as the additive interaction model. High-capacity refers to systems with C_DC greater than or equal to 1000 kW, while low-capacity refers to systems with C_DC smaller than 1000 kW.