Calibration and Validation of the EPIC Model for Maize Production in the Eastern Cape, South Africa

Abstract: Crop models are useful tools to evaluate the effects of agricultural management on ecosystem services. However, before they can be applied with confidence, it is important to calibrate and validate crop models in the region of interest. In this study, the Environmental Policy Integrated Climate (EPIC) model was evaluated for its potential to simulate maize yield using limited data from field trials on two maize cultivars. Two independent fields at the Cradock Research Farm were used, one for calibration and one for validation. Before calibration, mean simulated yield was 8 t ha⁻¹ while mean observed yield was 11.26 t ha⁻¹. Model calibration improved mean simulated yield to 11.23 t ha⁻¹ with a coefficient of determination (r²) = 0.76 and a model efficiency (NSE) = 0.56. Validation with grain yield was satisfactory with r² = 0.85 and NSE = 0.61. Calibration of potential heat units (PHUs) and soil-carbon related parameters improved model simulations. Although the study only used grain yield to calibrate and evaluate the model, results show that the calibrated model can provide reasonably accurate simulations. It can be concluded that limited data sets from field trials on maize can be used to calibrate the EPIC model when comprehensive experimental data are not available.


Introduction
With the world population projected to reach over 9 billion people by the year 2050, agriculture is expected to meet the rising demand for food and fiber from limited land and water resources [1,2]. However, increases in agricultural production should not come at the expense of land and water resources as they are critical resources needed for food production. A key area of interest in agronomic research, therefore, is to find agricultural land management strategies that maximize food production without degrading land and water resources (also known as sustainable intensification). In order to develop such strategies, field experiments have been used to investigate the impacts of different management strategies on crop yields and the environment [3-5]. Field experiments are sources of reliable data for establishing causal relationships between agricultural land management practices and real-world observed measurements [6,7]. However, field experiments are often expensive, time-consuming, and labor-intensive.
Crop growth simulation models are alternative methods that offer a quicker and less expensive way of investigating the effects of agricultural land management practices on crop yields and the environment. A modelling approach can provide reasonably reliable results in developing agricultural land management strategies, provided the models are calibrated and validated using reliable observed field data [8,9]. For example, crop models have been applied to refine management practices, such as fertilizer application and water usage at the farm and plot scales [10]. Further, crop models have been [...]

The Cradock Research Farm is situated within the Great Fish River valley catchment (Figure 1), where intensive irrigation and commercial farming are practiced.
Soils and climate within the research farm are representative of the broader catchment. The soil is classified as a fine-loamy mollic ustifluvent [39] of alluvial origin, characterized by high sand and silt contents of the alluvial and colluvial material derived from Beaufort sediments in the Eastern Cape. Table 1 shows the general characteristics of the soil profile used in the study. Cradock has an average annual rainfall of 341 mm with the majority of the rainfall occurring in February and March (late summer). Cradock is mostly a farming town, situated along the Great Fish River where water from the river is used for irrigation purposes. Declining soil productivity and increasing water scarcity due to low rainfall and frequent droughts have led to an increase in the use of irrigation water and fertilizer inputs on most farms to maintain agricultural productivity. Consequently, the increased use of fertilizer and irrigation water has led to a deterioration of environmental water quality in commercial farming areas [40].

Field Experiment
Field trials on maize were carried out at Cradock Research Farm for the Agricultural Research Council (ARC) by the Cradock research manager from 1999 to 2003. The field trials were carried out to evaluate the potential yield and cultivar stability of different high-yielding maize hybrid cultivars under the semi-arid conditions of the Eastern Cape. Two maize hybrid cultivars, CRN 3760 and PHB 30H22, were chosen for the modelling study as they had complete records for grain yield and agricultural management for the period 1999 to 2003. The maize varieties selected for the modelling study were grown on two independent fields with similar soils at the Cradock Research Farm and were managed according to the same irrigation and fertilizer regime. A randomized block design (RBD) [41], with three replications, was used throughout. Plant population was 50,000 plants per hectare with a row spacing of 0.9 m.
A standard management plan developed by ARC was used to schedule agricultural management practices, including irrigation amount and timing, fertilizer amount, and planting densities. Irrigation type was flood irrigation with the crop receiving a maximum of 600 mm irrigation water per growing season. Nitrogen fertilizer was applied at a rate of 195 kg N ha⁻¹ season⁻¹. Soil tillage was done using a power plow, and routine weed and pest control was carried out as needed. Table 2 shows the typical irrigation and fertilizer amounts used during the trial period. This agricultural management plan was used throughout the trial period from 1999 to 2003, with the same management practices performed around the same time each year and only minor changes according to prevailing local weather conditions. It should be noted that although agricultural management practices were recorded, most variables that would be useful in evaluating model performance, such as soil organic carbon content, leaf area index, and nitrogen content in grain, were not. Only the final grain yield was recorded, and this limited the observed data that could be compared with model outputs.

Model Description
The EPIC model simulates approximately 80 crops, using unique parameter values for each crop [18]. In the crop growth routine, crop yield is estimated as a function of the potential and water-limited harvest index (HI, WSYF), biomass to energy ratio (WA), planting density (PD), photosynthetically active radiation (PAR), and vapor pressure deficit (VPD) [26]. Potential biomass is adjusted to actual biomass through daily stress factors caused by extreme temperature, water and nutrient stress, or inadequate aeration. Values of context-specific parameters, such as the potential heat units accumulated by a crop from sowing to maturity (PHU), harvest index (HI), and optimum temperature (OT), need to be adjusted according to the region and context in which the model is to be applied.

Data Sources
Daily weather data for the Eastern Cape, which included precipitation, maximum and minimum temperature, solar radiation, and relative humidity for the years 1980-2010, were obtained from the publicly available AgMERRA [42] climate dataset at 0.5 × 0.5 arc-degree spatial resolution. Soil data (bulk density, cation exchange capacity, texture, and electrical conductivity) were obtained from records of previous soil analyses done at the Cradock Research Farm. However, some soil parameters required to set up the EPIC model were missing, and these were obtained from the Harmonized World Soil Database (HWSD) [43]. The missing soil parameters in the EPIC soil file were then adjusted with values from the HWSD based on expert opinion [44]. Agricultural management data, such as fertilizer application, irrigation amount, and planting and harvesting dates during the trial period, were obtained from the Cradock Research Farm manager [44].
Initial PHUs were estimated from long-term (1980-2010) daily maximum and minimum temperature values and the optimal temperature for maize growth using the PHU calculator at Purdue University, Indiana, USA. The base temperature (minimum temperature for plant growth) and OT were set to 8 °C and 25 °C, respectively, according to values in the ARC maize production guideline [45]. The number of days from planting to maturity was obtained from the maize hybrid variety producer [46]. Planting and harvesting dates recorded during the trial period were also used as inputs. Together, the long-term maximum and minimum temperature data, base and optimum temperature, and the number of days from planting to maturity were used to estimate the potential heat units required from planting to maturity based on the following heat unit formulas:

$$HU_k = \frac{T_{mx,k} + T_{mn,k}}{2} - T_b, \qquad HU_k \ge 0 \qquad (1)$$

$$PHU = \sum_{k=1}^{N} HU_k \qquad (2)$$

where $HU_k$, $T_{mx,k}$, and $T_{mn,k}$ are the heat units, maximum temperature, and minimum temperature (°C) on day $k$, $T_b$ is the crop base temperature (°C), and $N$ is the number of days from planting to maturity.

Based on this calculation, initial potential heat units were set to 2340 and the duration of the growing season was set to 180 days.
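The heat-unit accumulation described above can be illustrated with a short Python sketch. It assumes the common daily form in which heat units equal the mean of the daily maximum and minimum temperature minus the base temperature, floored at zero. The function names and the constant-temperature season are hypothetical and serve only to make the arithmetic easy to follow; this is not the Purdue PHU calculator.

```python
from typing import Sequence


def daily_heat_units(t_max: float, t_min: float, t_base: float = 8.0) -> float:
    """Heat units for one day: mean temperature above the base, never negative."""
    return max(0.0, (t_max + t_min) / 2.0 - t_base)


def potential_heat_units(
    t_max: Sequence[float],
    t_min: Sequence[float],
    t_base: float = 8.0,
    season_days: int = 180,
) -> float:
    """Sum daily heat units from planting (index 0) over the growing season."""
    if len(t_max) != len(t_min):
        raise ValueError("t_max and t_min must have the same length")
    days = min(season_days, len(t_max))
    return sum(daily_heat_units(t_max[k], t_min[k], t_base) for k in range(days))


# Synthetic season (an assumption, not AgMERRA data): 28 °C / 14 °C every day.
season_t_max = [28.0] * 180
season_t_min = [14.0] * 180
phu = potential_heat_units(season_t_max, season_t_min)
print(phu)  # (21 - 8) * 180 heat units
```

With a base temperature of 8 °C, a season averaging 21 °C over 180 days accumulates 2340 heat units, which is the same order of magnitude as the initial PHU value used in the study.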
Annual grain yield and management practices including tillage, fertilization, sowing, planting, irrigation, and harvesting dates were recorded on site. Data for the period 1999-2003 from one field site were used to calibrate the model while data from the second field site were used for model validation. The agricultural management plan provided by the Cradock Farm manager was used as input for fertilizer application timing, irrigation scheduling, and planting and harvesting times. Based on the management plan, the corresponding crop operation schedules, including tillage, fertilizer application, irrigation timing, planting, and harvesting dates were designed in EPIC's Operations Schedule file for each site.

Model Setup
The EPIC-IIASA modelling framework [47] was used in this study. Obtained data sets were converted to simulation grids at a resolution of 5 × 5 arc-min. The modelling scheme was set up by combining available GIS layers on soil, relief, and weather [12,19]. The model was constructed for the whole of the Eastern Cape and divided into homogeneous response units according to physical properties given by the intersection of site properties, such as elevation and soil texture. Subsequently, a zone raster was defined, consisting of homogeneous simulation units and weather grids upon which the model was run [47]. For this study, the simulation grid in which Cradock was located was chosen for model simulations. One soil profile adjusted for soil properties experienced in the study area (see Section 2.1) was therefore used to run the simulations in the model. The Priestley-Taylor method was used in the model to estimate potential evapotranspiration (PET). It was chosen because, compared to other methods of estimating PET, it gave values close to those previously reported for the region [48].
The model was run for 31 years from 1980 to 2010, corresponding to the length of the weather records. Simulated crop yields were compared to observed yields from the period 1999 to 2003, with the initial 19 years serving as a warm-up period for equilibrating soil functions, water erosion, and soil nutrient depletion. Irrigation and fertilizer application were set to manual scheduling and input into the operations schedule file based on dates recorded during the field trials (Table 2).
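The warm-up logic can be sketched in Python as follows: the full 31-year run is kept, but only years after the 19-year warm-up that also have an observation are paired for evaluation. The function name and all yield values below are hypothetical placeholders, not trial data or EPIC output.

```python
from typing import Dict, List, Tuple


def evaluation_pairs(
    simulated: Dict[int, float],
    observed: Dict[int, float],
    first_year: int = 1980,
    warmup_years: int = 19,
) -> List[Tuple[int, float, float]]:
    """Return (year, observed, simulated) for years after the warm-up period."""
    first_eval_year = first_year + warmup_years  # 1980 + 19 = 1999
    return [
        (year, observed[year], simulated[year])
        for year in sorted(observed)
        if year >= first_eval_year and year in simulated
    ]


# Placeholder series: 31 simulated years and 5 observed years (made-up values).
simulated = {year: 10.0 + 0.1 * (year - 1980) for year in range(1980, 2011)}
observed = {1999: 11.0, 2000: 14.0, 2001: 11.5, 2002: 10.8, 2003: 9.0}

pairs = evaluation_pairs(simulated, observed)
print([year for year, _, _ in pairs])
```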

Parameters Identification
During calibration, a few adjustments were made to the default parameters to reflect local crop cultivars and site conditions. Earlier studies in semi-arid conditions by [49], [50], and [20] have found simulated crop yields to be sensitive to: (i) potential heat units (PHU, Equation (2)); (ii) planting density (PD), the number of plants per unit area; (iii) biomass to energy ratio (WA), defined as the potential growth per unit of intercepted photosynthetically active radiation; (iv) harvest index (HI), the ratio of economic yield to above-ground biomass; and (v) microbial decay rate. These parameters were selected for calibration to adjust simulated yields to correspond to observed yields as closely as possible. The choice of parameters to calibrate was based mainly on the observed data available and also on suggestions from EPIC developers following [51].

Calibration Procedure
Calibration was done according to the steps in Figure 2 adapted from [12].

Simulation 0: Simulation with Default Parameters
The default maize crop parameter dataset (provided with EPIC version 0810) was used as the starting basis to establish a modified parameter set for maize yield simulation. The default maize parameters were modified using data from the calibration period (1999-2003) and values from literature to account for the local context. The modified parameter set was then used to run a simulation for the period 1980 to 2010. HI and WA were set to their default values of 0.5 and 40 kg ha⁻¹ MJ⁻¹ m², respectively. PHUs were set to 2340 according to calculations using the growing season length and the maximum and minimum temperatures. Irrigation and fertilizer application were both set to manual and input into the operations schedule file according to the management records obtained from the Cradock Station manager. Planting and harvesting dates were also taken from management records. Planting density was set to 5 plants per square meter based on management records.

Simulation 1: Parameter Adjustment
Model parameters influencing soil organic carbon and crop growth were adjusted based on literature, site history [44], and expert knowledge [52]. The value that gave the lowest root-mean-square error (RMSE, Equation (3)) between observed yields and simulated yields was selected as the final calibration value. Table 3 shows the parameters adjusted and the values before and after calibration. The microbial decay rate coefficient (Parm 20) was set to 1, following values reported in previous EPIC modelling frameworks, such as EPIC-BOKU [53]. The microbial decay rate coefficient impacts carbon mineralization, which affects crop yield [49]. The Century slow humus transformation rate (Parm 47) and the exponential coefficient in the equation expressing the tillage effect on residue decay rate (Parm 52) affect carbon dynamics and must be estimated to simulate nitrogen supply correctly. The minimum HI under water stress (WSYF) was reduced from 0.4 to 0.01, which gives a stronger weight to water stress in the model's calculation of HI [11].

Figure 2. Steps followed in the calibration process (adapted from [12]).

Simulation 2: PHU Adjustment
PHUs were adjusted in steps of 5 to match observed yields as closely as possible. The PHUs that gave the lowest RMSE between observed and simulated yields were selected as the final calibrated PHU.
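The PHU adjustment step amounts to a one-dimensional grid search on RMSE. The Python sketch below illustrates the idea under stated assumptions: `run_model` is a hypothetical stand-in for an EPIC run that returns simulated yields for a given PHU, and `toy_model` is a purely illustrative linear response constructed so that its minimum falls at 2480, which makes the example easy to check. None of this is EPIC code or trial data.

```python
import math
from typing import Callable, List, Sequence, Tuple


def rmse(observed: Sequence[float], simulated: Sequence[float]) -> float:
    """Root-mean-square error between paired observed and simulated yields."""
    n = len(observed)
    return math.sqrt(sum((o - s) ** 2 for o, s in zip(observed, simulated)) / n)


def calibrate_phu(
    run_model: Callable[[int], List[float]],
    observed: Sequence[float],
    phu_start: int,
    phu_stop: int,
    step: int = 5,
) -> Tuple[int, float]:
    """Try PHU values in fixed steps and keep the one with the lowest RMSE."""
    best_phu, best_rmse = phu_start, math.inf
    for phu in range(phu_start, phu_stop + 1, step):
        err = rmse(observed, run_model(phu))
        if err < best_rmse:  # strict comparison keeps the lowest PHU on ties
            best_phu, best_rmse = phu, err
    return best_phu, best_rmse


# Made-up observed yields (t/ha) and a toy response, both assumptions.
observed_yields = [9.0, 12.0, 10.0]


def toy_model(phu: int) -> List[float]:
    """Illustrative stand-in: yields scale linearly with PHU around 2480."""
    return [y * phu / 2480 for y in observed_yields]


best_phu, best_err = calibrate_phu(toy_model, observed_yields, 2340, 2600)
print(best_phu, best_err)
```

In practice each call to `run_model` would be a full model run over the evaluation years, so the step size and search range determine how many runs the calibration costs.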

Simulation 3: HI Adjustment
HI has been shown to vary across locations and management practices [59]. In the HI adjustment simulation, HI was adjusted from 0.4 to 0.8 in steps of 0.05 to explore the effects of varying HI on crop yields.

Simulation 4: WA Adjustment
WA is used in the model for converting energy to biomass. WA was varied in steps of 5 to explore its influence on crop growth. WA has been shown to significantly affect crop yield and should be one of the last parameters to be adjusted [18].

Statistical Analyses
To evaluate model efficiency in predicting observed yields, the following statistics were computed: root-mean-square error (RMSE), the coefficient of determination (r²), Nash-Sutcliffe efficiency (NSE), and percent bias (PBIAS):

$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(O_i - S_i\right)^2} \qquad (3)$$

$$r^2 = \left[\frac{\sum_{i=1}^{n}\left(O_i - O_{mean}\right)\left(S_i - S_{mean}\right)}{\sqrt{\sum_{i=1}^{n}\left(O_i - O_{mean}\right)^2}\,\sqrt{\sum_{i=1}^{n}\left(S_i - S_{mean}\right)^2}}\right]^2 \qquad (4)$$

$$\mathrm{NSE} = 1 - \frac{\sum_{i=1}^{n}\left(O_i - S_i\right)^2}{\sum_{i=1}^{n}\left(O_i - O_{mean}\right)^2} \qquad (5)$$

$$\mathrm{PBIAS} = 100 \times \frac{\sum_{i=1}^{n}\left(O_i - S_i\right)}{\sum_{i=1}^{n} O_i} \qquad (6)$$

where $n$ is the sample number, $O_{mean}$ and $S_{mean}$ are the observed mean and simulated mean values, respectively, and $O_i$ and $S_i$ are the observed and predicted values of the $i$th observation ($i$ = 1 to $n$), respectively. For RMSE, values closer to zero imply a good fit between observed and simulated yields [60]. A value of zero for RMSE means that the model predicts the observations with perfect accuracy. The coefficient of determination, r², ranges from 0 to 1, with higher values indicating less error variance [16]. NSE ranges from negative infinity to 1. A value of NSE equal to 1 represents a perfect model fit, and negative NSE values indicate that the mean observed value is a better predictor than the simulated value [16]. PBIAS measures the tendency of simulated data to be larger or smaller than the observed data. It has an optimal value of 0, with positive values indicating underestimation and negative values indicating overestimation [61]. Differences in mean values between observed and simulated values were evaluated using the Student's t-test in Excel 2016. Model performance was considered satisfactory if r² ≥ 0.6, |PBIAS| ≤ 25%, and NSE ≥ 0.4 following [62].
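The four statistics can be computed with a few lines of Python. The sketch below uses the usual definitions (PBIAS taken as 100 × Σ(O − S) / ΣO, so that positive values indicate underestimation, as stated above). The function names and the three-point example series are illustrative assumptions, not trial data.

```python
import math
from typing import Sequence


def rmse(obs: Sequence[float], sim: Sequence[float]) -> float:
    """Root-mean-square error; zero means a perfect fit."""
    return math.sqrt(sum((o - s) ** 2 for o, s in zip(obs, sim)) / len(obs))


def r_squared(obs: Sequence[float], sim: Sequence[float]) -> float:
    """Squared Pearson correlation between observed and simulated values."""
    o_mean = sum(obs) / len(obs)
    s_mean = sum(sim) / len(sim)
    cov = sum((o - o_mean) * (s - s_mean) for o, s in zip(obs, sim))
    var_o = sum((o - o_mean) ** 2 for o in obs)
    var_s = sum((s - s_mean) ** 2 for s in sim)
    return cov ** 2 / (var_o * var_s)


def nse(obs: Sequence[float], sim: Sequence[float]) -> float:
    """Nash-Sutcliffe efficiency; 1 is perfect, below 0 is worse than the mean."""
    o_mean = sum(obs) / len(obs)
    residual = sum((o - s) ** 2 for o, s in zip(obs, sim))
    return 1.0 - residual / sum((o - o_mean) ** 2 for o in obs)


def pbias(obs: Sequence[float], sim: Sequence[float]) -> float:
    """Percent bias; positive values mean the model underestimates."""
    return 100.0 * sum(o - s for o, s in zip(obs, sim)) / sum(obs)


# Made-up yields (t/ha) chosen so the results are easy to verify by hand.
obs = [10.0, 12.0, 14.0]
sim = [11.0, 12.0, 13.0]
print(rmse(obs, sim), r_squared(obs, sim), nse(obs, sim), pbias(obs, sim))
```

In this example the simulated series is perfectly correlated with the observations (r² = 1) yet NSE is only 0.75, which shows why r² alone is not a sufficient measure of fit: it ignores systematic differences in the spread of simulated and observed values.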

Simulation with Default Parameters
The simulation with default parameters showed an overall underestimation of observed yields with PBIAS = 17.6, r² = 0.02, RMSE = 3.65 t ha⁻¹, and NSE = −3.3. Simulated yields ranged from 7 tonnes per hectare (t ha⁻¹) to 8.3 t ha⁻¹ while observed yields ranged from 9 t ha⁻¹ to 14 t ha⁻¹. The model underestimated crop yields for all years, as shown in Figure 3.

PHU Calibration
The initial PHU calculated from long-term weather records gave an RMSE of 1.86 t ha⁻¹ between the observed and simulated yields. Increasing the PHU value improved model simulations, with a PHU value of 2480 giving the lowest RMSE of 1.17 t ha⁻¹. Further adjustments of PHU above 2480 did not yield any improvement in RMSE. Following up on the parameter adjustment in the previous section, calibrating PHU brought model simulations within the criteria set for satisfactory model calibration (r² > 0.6 and PBIAS within ±25%), and further calibration of the crop parameters HI and WA was not performed. In this calibration simulation with PHU = 2480, simulated crop yields ranged from 10 t ha⁻¹ to 12 t ha⁻¹ while observed yields ranged from 9 t ha⁻¹ to 14 t ha⁻¹ (Figure 4).

Final PHU calibration results showed a coefficient of determination (r²) between simulated and observed yields of 0.76 (Figure 5). A Nash-Sutcliffe efficiency of 0.56 and a PBIAS = 0.31% were considered to be satisfactory and did not require further efforts in calibrating HI and WA in the model. RMSE decreased from 3.65 t ha⁻¹ in the default simulation to 1.17 t ha⁻¹ in the PHU calibrated simulation (Table 4).

Validation
In the validation site, observed maize yields ranged from 9 t ha⁻¹ to 14 t ha⁻¹ while simulated yields ranged from 10 t ha⁻¹ to 12 t ha⁻¹. The model slightly overestimated maize yields for three out of the five validation years. The year 2000 had exceptionally high observed yields (14.01 t ha⁻¹), which were underestimated by the model. The year 2003 had low observed yields, which were overestimated by the model (Figure 6).
Figure 6. Comparison of observed yields in tonnes per hectare (t ha⁻¹) and simulated yields in the validation simulation with the calibrated model.

The coefficient of determination, r², between observed and simulated yields was 0.85, as shown in Figure 7. Model performance was satisfactory with NSE = 0.61, RMSE = 1.06 t ha⁻¹, and PBIAS = −1.02. Table 5 shows a summary of the model statistics for the validation site. A Student's t-test comparing the observed and simulated mean grain yields showed that the observed mean yield was not significantly different from the simulated mean yield (p = 0.9) at the 5% significance level.
Table 5. Mean simulated and observed maize grain yield in tonnes per hectare (t ha⁻¹), Nash-Sutcliffe efficiency (NSE), root-mean-square error (RMSE), and PBIAS for the validation simulation.
Figure 7. Linear regression of simulated crop yields (t ha⁻¹) on observed maize yields for the validation period.
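As a cross-check, the acceptance criteria from the Statistical Analyses section (r² ≥ 0.6, |PBIAS| ≤ 25%, NSE ≥ 0.4) can be applied to the statistics reported for the default, calibrated, and validation simulations. The Python sketch below does this; the function name and dictionary layout are hypothetical, while the numbers are those quoted in the text.

```python
def is_satisfactory(
    r2: float,
    nse: float,
    pbias: float,
    r2_min: float = 0.6,
    nse_min: float = 0.4,
    pbias_max: float = 25.0,
) -> bool:
    """True when all three performance criteria are met."""
    return r2 >= r2_min and nse >= nse_min and abs(pbias) <= pbias_max


# Statistics as reported in the text for each simulation.
reported = {
    "default": {"r2": 0.02, "nse": -3.3, "pbias": 17.6},
    "calibration": {"r2": 0.76, "nse": 0.56, "pbias": 0.31},
    "validation": {"r2": 0.85, "nse": 0.61, "pbias": -1.02},
}

verdicts = {name: is_satisfactory(**stats) for name, stats in reported.items()}
print(verdicts)
```

The default simulation fails on r² and NSE even though its PBIAS is within the ±25% band, which illustrates why the three criteria are applied jointly.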

Discussion
In this study, the EPIC crop model was evaluated for its possible use as a decision support tool in irrigation and fertilizer management of crops under semi-arid conditions in South Africa. The success of crop growth simulation models depends on how accurately they simulate crop yields and other variables of importance under real-world conditions. The calibration results of this study showed reasonable agreement between observed and simulated crop yields.
Effective parameter estimation is essential in accurately reproducing field conditions. Wang [49] considered that, although the majority of models might be effectively used in many environments, uncertainty about many of the parameters persists, and their estimation is important in obtaining useful model results. In this study, the simulation with default parameters gave poor agreement between observed and simulated yields, indicating the necessity for calibration. Adjusting parameters related to carbon dynamics using site history and expert knowledge greatly improved model simulations, demonstrating the importance of calibration with site-specific parameters and supporting the assertions of [63] and [12] that detailed data on a local scale can improve the reliability and accuracy of model simulations. Model uncertainty can thus be reduced by using site-specific data.
Adjusting PHU brought simulated crop yields closer to observed yields. This improvement in model simulations is in agreement with studies by [12] and [64], which showed that refinement of PHUs to the specific region could significantly improve the agreement between simulated and observed yields. PHU is directly linked to the growth of biomass and its allocation to final yield, hence the significant effect of PHU adjustment on crop yields. The results of the present study showed that simulated yields were closest to observed yields when PHU was 2480. This value is within the range of PHUs reported in the literature for maize. For example, experiments conducted in the USA by [54] showed that the PHUs required for the maturity of maize vary between 1000 and 2900. The Agricultural Research Council's maize information guide states that maize normally takes 120 days from planting to maturity, but this value is generally for the warmer traditional maize-growing areas in South Africa, such as KwaZulu-Natal [65]. The Cradock region is cooler than other maize-growing areas, which may explain the long duration of the growing season.
By default, the potential HI of maize for the EPIC model is set to 0.5, which is typical for improved high-yield maize varieties [28]. The value of 0.5 was taken as the final calibrated value and is the same as the HI values used by [19] and [66]. The biomass to energy ratio (WA) is a known parameter influencing crop yields [49]. During PHU calibration, an RMSE of 1.17 t ha⁻¹ and PBIAS = 0.31 between observed and simulated yields were obtained, indicating that no further adjustment of WA and HI was needed as the criteria set for acceptable model performance had been satisfied. The default WA value of 40 kg ha⁻¹ MJ⁻¹ m² was therefore adopted as the final calibration value. This value is the same as that used by [67] and [19]. WA increases yield through biomass changes and should be adjusted last based on experimental data as it can significantly change the rate of crop growth and final crop yield [18]. The HI value of 0.5 is also close to values reported in studies in nine states in the USA [66] and the value of 0.48 reported by [49].
Although the model simulated observed yields reasonably well overall, it overestimated low yields in some years. This is in agreement with studies by [68] and [69] that found that EPIC tended to overestimate low yields. Kiniry [55] indicated that overestimation of plant available water at field capacity could cause EPIC to overestimate yields in dry years and suggested measuring the maximum depth of water extraction using local cultivars. However, this was beyond the scope of the present study. Management records show that the 2003 trials, in which low maize yields were observed, suffered heavy weed infestations. While EPIC successfully simulates water and fertilizer effects on plant growth, the model currently does not accurately account for competition from weeds [22]. This may explain why the model overestimated the lower yields observed in 2003. Although EPIC has a pest damage factor, it is only represented as an estimate rather than a detailed process in the model [18].

Conclusions
The results of the study suggest that limited data from field trials on maize that only include grain yield and agricultural management dates can be used for the calibration of the EPIC model under the semi-arid conditions of South Africa. The evaluation of the EPIC model with observed independent field trial data was reasonably accurate, given the limited data available for model evaluation. However, it is important to calibrate parameters related to carbon dynamics and PHUs according to local conditions, as site-specific soil and carbon-related parameters and PHUs can significantly improve model simulation results. Further studies using the calibrated model to evaluate different crop management options, such as deficit irrigation and fertilizer application timing, should be carried out in the Eastern Cape. Field trials on maize and other crops are also carried out across many sites in South Africa by seed producers to evaluate the stability and potential yield of crop varieties under different weather and soil conditions. The availability of such datasets presents opportunities for the calibration and validation of crop models before their application. Crop model users should make an effort to work with researchers who carry out field trials on crop varieties to ensure the collection of detailed data needed for model calibration and validation that are not usually collected by seed producers.