Evaluation of the ‘Irish Rules’: The Potato Late Blight Forecasting Model and Its Operational Use in the Republic of Ireland

Potato late blight caused by Phytophthora infestans is one of the most important plant diseases known, requiring high pesticide inputs to prevent disease occurrence. Disease development is highly dependent on weather conditions, and as such, several forecasting schemes have been developed worldwide which seek to reduce the inputs required to control the disease. The Irish Rules model, developed in the 1950s and calibrated to the meteorological network, the characteristics of potato production and the P. infestans population of the time, is still used operationally by the national meteorological agency, Met Éireann. However, numerous changes in the composition and dynamics of the pathosystem, and in the production and economic consequences associated with potato late blight outbreaks, have occurred since the inception of the Irish Rules model. Additionally, the model and decision thresholds appear to have been selected ad hoc and without clear criteria. We developed a systematic methodology to evaluate the model using empirical receiver operating characteristic (ROC) curve analysis and response surface methodology for the interpretation of the results. The methodology, written in the R language, is provided as an open, accessible and reproducible platform to facilitate the ongoing seasonal re-evaluation of the Irish Rules and the corresponding decision thresholds. Following this initial analysis, based on the available data, we recommend reducing the thresholds for relative humidity and the initial sporulation period duration from 90% and 12 h to 88% and 10 h, respectively. Contrary to recent reports, we found that the risk of blight epidemics remains low at temperatures below 12 °C. With the availability of more comprehensive outbreak data, and with greater insight into the founder population to confirm our findings as robust, the temperature threshold in the model could potentially be increased from 10 °C to 12 °C, providing more opportunities for reductions in pesticide usage. We propose a dynamic operational decision threshold between four and 11 effective blight hours (EBH), set according to the frequency of disease outbreaks in the region of interest. Although the risk estimated according to the new model calibrations is higher, the estimated chemical inputs are, on average, lower than under usual growers' practice. Importantly, the research outlined here provides a robust and reproducible methodological approach to evaluate a semi-empirical plant disease forecasting model.


Introduction
Potato late blight (PLB) caused by Phytophthora infestans (Mont.) de Bary [1] is amongst the most destructive diseases of potato crops [2]; due to its fast reproductive cycle and aggressiveness, if left untreated, it can rapidly lead to the total destruction of the crop, either in the field or in storage following harvest [3]. In Ireland, historical outbreaks of potato blight have had significant cultural and economic impacts; the mass starvation and subsequent migration of large portions of the population fleeing famine during the 1840s are partly attributed to the disease [4]. In Ireland alone, an estimated €5 million is spent annually on fungicides to control PLB, whilst globally the cost of control and losses is estimated to exceed €1 billion annually [5]. Although P. infestans can form overwintering oospores, under Irish conditions this is not believed to occur (Louise Cooke, personal communication), and typically the pathogen overwinters in infected tubers (in dumps, volunteers or infected seed) [6]. The rate of late blight epidemic progression is highly dependent on the weather, with temperature, relative humidity and precipitation being the most important variables, the latter two being closely related [7]. Prolonged periods of humid and cool weather provide conditions favorable for pathogen sporulation [8]; short-lived sporangia subsequently spread through a mixture of rain splash and wind dispersal [9]. The disease impacts yield both indirectly and directly; indirectly, by reducing the photosynthetic surface, and directly, when zoospores washed from the foliage infect tubers in the ground [6].
Since the late 1970s, increasing globalization has resulted in the worldwide migration of pathogen genotypes of both mating types, leading to the displacement of the dominant, older clonal lineage commonly referred to as US-1 [10]; this has facilitated the development and spread of new lineages, some of which demonstrate increased aggressiveness [11]. This rise of new genotypes has introduced changes in the ecology of P. infestans [12-15]. The increasing genetic variability of P. infestans is likely reducing the durability of late blight resistance based on R gene stacks [16,17]. Although the structure of the Irish P. infestans population shows little genetic variation, it is dominated by a few clonal genotypes comprising the more aggressive EU_13_A2 and EU_6_A1 strains [18,19]. New genotypes have established in Ireland and have been reported at higher frequencies in recent years [20-22]. In addition, the majority of potato production in Ireland is based on more susceptible potato cultivars, guided by market demand [23]. Population diversification, coupled with the influence of climate change [24], has led to increased difficulty in controlling PLB [25,26]. Presently, due to the high risk of PLB epidemics in high-input agriculture, associated with the increased aggressiveness of the pathogen, intensive fungicide regimes are routinely used; in Western Europe this equates to more than 10 applications per season [5,27], while in some countries crops can receive as many as 20 fungicide applications [28]. The need to develop late blight forecasting models for use as decision support tools has long been acknowledged as one of the few integrated pest management (IPM) approaches available for PLB management, motivated by both environmental and economic factors [29-31]. In response to the environmental challenges posed by increased pesticide usage, European Directive 2009/128/EC on the Sustainable Use of Pesticides provides strict guidelines for the sustainable use of plant protection products in order to reduce risks to human health and the environment [32]. Reliable disease forecasting offers the potential to reduce yield losses and crop inputs during unfavorable blight weather conditions, while also supporting an ex post facto justification for the use of plant protection products [3,31] in compliance with national and international regulations. Kessel et al. [33] have shown the necessity of environmental risk prediction to guide low-input chemical protection and prevent the breakdown of currently resistant potato cultivars. Forecasting systems that involve numerous alerts have been shown to be useful in that regard, when applied to a pathosystem involving a valuable crop and a rapidly developing disease [34].
At their core, crop disease forecasting systems employ algorithms, either mechanistic (fundamental) or empirical, to predict disease cycles. Mechanistic models are developed from laboratory experiments in controlled environment chambers, greenhouses or fields, and describe one or more segments of the host-parasite relationship as influenced by the environment [34]. Initially, the

Data and Methods
The paper is structured as follows: initially, we provide an overview of the site and the available biological and weather data. The Irish Rules model is then described, after which we present the evaluation of the model's parameter thresholds as currently employed operationally. Proposed model modifications and identified decision thresholds are further assessed by comparing treatment frequency and dose reduction. A schematic of the workflow is outlined in Figure 1. A list of frequently used abbreviations is presented in Table 1.

Site Description
Oak Park, Co. Carlow, Ireland (latitude: 52.8560, longitude: −6.9121), a Teagasc (the Irish agricultural advisory body) research center, is located in the south east of Ireland. Soils comprise light gravelly limestone soils and heavy-textured soils derived from limestone till.
Typical weather conditions calculated over the growing period (April to October) for the years 2007 to 2016 indicate that average daily relative humidity values were typically high throughout the potato growing season, a characteristic of Irish conditions more generally. The mean temperature over the period was 13 °C, the mean relative humidity was 80.2% and the average precipitation sum was 398.31 mm. Night-time temperatures during the early part of the potato growing season were low, with averages of 6.6 °C in April and 9.2 °C in May.

Biological Data
Planting dates and primary disease outbreak data were acquired from the Teagasc breeding program field trial records for the period 2007 to 2016. The breeding program trials consisted of 25-60 potato varieties in all years, representing all levels of susceptibility to potato late blight. Trials were laid out in a randomized complete block design, with six blocks and plots of 20 plants. Seed was propagated in accordance with the seed certification scheme of the Irish Department of Agriculture, Food and the Marine (DAFM) to ensure no latently infected tubers; the P. infestans inoculum originated from natural sources. Crop rotation was undertaken on a five-year cycle. Plots did not receive any fungicide treatments. All plots were visually inspected for disease occurrence on a weekly basis from crop emergence, and generally more frequently during periods of humid weather. These data provided the disease outbreak information used in the model analysis and evaluation outlined herein.
Planting dates in the biological data are somewhat later than is usual in Irish agricultural practice, which suits the analysis because healthy green tissue is then present throughout the summer. Outbreak dates vary from 26 June to 23 August.

Weather Data
Hourly weather data for the historical period under investigation were acquired from the Met Éireann synoptic weather station at Oak Park. The weather variables obtained included hourly air temperature (°C) and relative humidity (%) at 2 m, and total hourly precipitation (mm). The trial sites were within a 500 m radius of the weather station in all years and were located on flat ground with no physical barriers between them.
The availability of good quality weather and biological data is crucial for the successful calibration and evaluation of plant disease forecasting models [50]. Quality control of the available weather data and appropriate imputation of missing values are often disregarded in agriculture, which can lead to imprecise or incorrect results [47,51]. Post-processing of the weather data undertaken as part of this study included checking for duplicate entries and for recorded values outside of 'expected' ranges, determined using histograms. The data had only six missing values for both precipitation and temperature, and seven missing values for relative humidity, over the period of interest. These short intervals of consecutive hours of missing temperature and relative humidity data were imputed by spline interpolation using the Forsythe, Malcolm and Moler method [52], as suggested by Shah et al. [53].
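As an illustration, a minimal R sketch of this gap-filling step is shown below; the `wth` data frame and the `impute_fmm` helper are hypothetical names, and the authors' actual implementation is available in their archived repository.

```r
# Hypothetical sketch of the imputation step: fill short gaps in hourly
# series by spline interpolation with the Forsythe, Malcolm and Moler
# ("fmm") method, the default of stats::spline().
impute_fmm <- function(x) {
  idx <- seq_along(x)
  ok  <- !is.na(x)
  x[!ok] <- spline(idx[ok], x[ok], xout = idx[!ok], method = "fmm")$y
  x
}

# Toy hourly records with short gaps (not the study data)
wth <- data.frame(temp = c(11.2, 11.0, NA, NA, 10.1, 9.8),
                  rhum = c(88, 90, NA, 93, 95, 96))
wth$temp <- impute_fmm(wth$temp)
wth$rhum <- pmin(impute_fmm(wth$rhum), 100)  # RH cannot exceed 100%
```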

The IR Model and Its Operational Use
According to the original Irish Rules [41], illustrated in Figure 2, periods with temperatures ≥ 10 °C and relative humidity ≥ 90% provide the environmental conditions considered conducive for potato late blight. These periods are further split into:

• Sporulation period: the initial stage considered necessary for the formation of sporangia, set to a minimum of 12 consecutive hours;
• Infection period: starts after the 12-hour sporulation period is completed. If the surface of the plant is not wet at the beginning of the infection period, effective blight hours (EBH) begin accumulating from the 16th hour (12 h sporulation period + 4 h = 16 h); when the surface of the plant is wet at the beginning of the infection period, the start of EBH accumulation is brought forward by 4 h (16 h − 4 h = 12 h). Leaf (surface) wetness (LWt) is considered present if there was a considerable amount of precipitation (≥ 0.1 mm) during the time window of 3 h before and 3 h after the 12th consecutive hour of sporulation. The infection period lasts until the conducive conditions (temperatures ≥ 10 °C and relative humidity ≥ 90%) are broken for more than 5 consecutive hours, the maximum interruption spores are considered to survive.

Hours fulfilling these criteria are termed effective blight hours (EBH). The estimated risk of a potato late blight outbreak is based on the longevity of the infection period, expressed as the sum of the EBH.

Figure 2. Simplified presentation of the Irish Rules algorithm. The operational warning threshold scale is presented as a "traffic light" scheme, ranging from green (no warning), through yellow (warning considered), to red (warnings issued without delay).
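To make the algorithm concrete, a simplified R sketch of the default rules is given below. It is an illustrative reading of the text above, not the operational Met Éireann code: the 5-hour break tolerance is omitted for brevity, and the interpretation of the EBH start hour (the 16th conducive hour when dry, the 12th when wet) is our assumption.

```r
# Simplified sketch of the Irish Rules (break tolerance omitted).
# Inputs are aligned hourly vectors: temp (°C), rh (%), rain (mm).
irish_rules_ebh <- function(temp, rh, rain,
                            t_t = 10, rh_t = 90, sd_t = 12) {
  cond <- temp >= t_t & rh >= rh_t            # hours conducive to blight
  r <- rle(cond)
  ends   <- cumsum(r$lengths)
  starts <- ends - r$lengths + 1
  ebh <- 0
  for (i in which(r$values & r$lengths >= sd_t)) {
    s <- starts[i]; e <- ends[i]
    # leaf wetness: rain >= 0.1 mm within +/- 3 h of the 12th conducive hour
    w <- (s + sd_t - 1) + (-3:3)
    w <- w[w >= 1 & w <= length(rain)]
    wet <- any(rain[w] >= 0.1, na.rm = TRUE)
    # EBH assumed to start at hour 12 of the run if wet, hour 16 if dry
    first_ebh <- s + ifelse(wet, sd_t, sd_t + 4) - 1
    if (first_ebh <= e) ebh <- ebh + (e - first_ebh + 1)
  }
  ebh
}
```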
Currently, the warning system is used operationally by the national meteorological service, Met Éireann. The IR are utilized in their original form to support the blight warning service in issuing spray advice [44,46]. The decision on issuing a blight warning, and on its termination, is made by the meteorological officer on duty after visual inspection of the IR model outputs based on a 10-day numerical weather prediction (NWP) model forecast from the European Centre for Medium-range Weather Forecasts (ECMWF). Warnings are disseminated through the Met Éireann web portal, radio and television weather broadcasts and a mobile application. Operationally, a decision threshold to issue a blight warning is considered at an accumulation of 12 EBH. Additionally, if a continuous spell of mild, humid and damp weather lasting 24 h or more is expected, a blight warning may be considered even if it does not explicitly meet the warning criteria. Blight warnings are typically issued 2 to 6 days in advance and include information about the areas likely to be affected, the duration of the spell and, where possible, opportunities for spraying. The decision threshold of 12 EBH was established from operational experience since the 1950s (for example, between 1950 and 2000, a network of blight scouts reported on regional blight outbreaks and the progress of epidemics), although it was not systematically documented. Currently, 1 May is the 'zero date', a date threshold after which warnings are considered valid.

Model Thresholds under Evaluation
The IR is a set of processes mimicking the 'behavior' of a mechanistic model. The transition between these processes is determined by an empirically derived set of thresholds, which ultimately influence the risk estimation expressed as the duration of an infection period. Four of these primary thresholds were subjected to a sensitivity analysis. The environmental thresholds for relative humidity (RHt) and temperature (Tt), and the duration of the period considered necessary for inoculum production, the sporulation duration threshold (SDt), were varied from −3 to +3 units around their respective default values (Table 2). To assess the leaf surface wetness indicator, the default estimation using rain (> 0.1 mm) was compared to the combined rain and relative humidity indicator (rain > 0.1 mm and RH ≥ 90%). The model was run using all combinations of model variable thresholds. Outputs were then combined with the hourly weather data for further analysis.
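For illustration, the full factorial set of model runs could be enumerated as below; the object and column names are ours, not those of the published code.

```r
# All combinations of the evaluated thresholds: offsets of -3 to +3
# units around the defaults (Table 2) plus two leaf wetness indicators.
grid <- expand.grid(
  rh_t = 90 + (-3:3),          # relative humidity threshold (%)
  t_t  = 10 + (-3:3),          # temperature threshold (°C)
  sd_t = 12 + (-3:3),          # sporulation duration threshold (h)
  lw   = c("rain", "rain_rh")  # leaf wetness indicator
)
nrow(grid)  # 7 * 7 * 7 * 2 = 686 candidate parametrizations
```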

Analysis of Diagnostic Performance
The period considered in the sensitivity analysis ran from the planting date to the recorded disease outbreak in each season, and was further split into two segments (Figure 3):

- No infection period: the period when the healthy (susceptible) host was present but no infections were observed. This period lasted from emergence, which was estimated to start three weeks after planting, to 14 days prior to the first observation of the disease in the field. Specificity, or the true negative rate, was measured during this period. Each warning during this period was considered to activate a chemical treatment providing protection for the subsequent 7 days, and was counted as a false positive (FP). True negatives (TN) were calculated as a proportion of the remaining period, when fungicide protection was not recommended; this bookkeeping is sketched in the code below.
- Warning period: the period when infections occurred, assigned a 10-day time window starting 14 days and ending 4 days prior to the disease being observed in the field. A risk warning of a disease outbreak 10 days ahead has been reported as an optimal warning time [54], and a period of four days was considered a minimum incubation period. Sensitivity, or the true positive rate, was assessed during the warning period. A warning period in which the warning threshold was reached, which would trigger a fungicide treatment, was counted as a true positive (TP); if no warning was issued, it was counted as a false negative (FN).
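A toy R sketch of this false positive/true negative bookkeeping follows; `warn_days` and the handling of warnings that fall inside an already protected window are our simplifying assumptions, not the authors' exact accounting.

```r
# Sketch: each warning outside an already protected window is a false
# positive that triggers 7 days of protection; true negatives are the
# proportion of the 'no infection' period left unprotected.
fp_tn_split <- function(warn_days, period_len, protect = 7) {
  protected <- rep(FALSE, period_len)
  fp <- 0
  for (d in sort(warn_days)) {
    if (!protected[d]) {
      fp <- fp + 1
      protected[d:min(d + protect - 1, period_len)] <- TRUE
    }
  }
  c(FP = fp, TN_prop = mean(!protected))
}
fp_tn_split(warn_days = c(10, 12, 40), period_len = 60)  # FP = 2
```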

Figure 3. Simplified schematic illustrating the temporal split of the data for the diagnostic performance calculation (source: [55]).
Contingency tables were created with sensitivity and specificity values from a confusion matrix (as shown in Table 3) for each evaluated disease warning decision threshold for all model outputs. The range of decision thresholds used as cut-off points, or the level of risk leading to treatment, was from 1 to 18 EBH.
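For concreteness, a minimal helper computing the two rates from the confusion matrix counts might look as follows (the counts shown are illustrative only):

```r
# Sensitivity and specificity from confusion matrix counts
diag_perf <- function(tp, fn, fp, tn) {
  c(sensitivity = tp / (tp + fn),  # true positive rate
    specificity = tn / (tn + fp))  # true negative rate
}
diag_perf(tp = 8, fn = 2, fp = 3, tn = 17)  # toy counts
```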

Receiver Operating Characteristic (ROC) Curves
The performance of each model was assessed using receiver operating characteristic (ROC) curves. An ROC curve is a graphical technique for assessing model predictive ability through the relationship between specificity and sensitivity [56,57]. Empirical ROC curves were constructed with cut-off points for different thresholds on a discrete scale. For each cut-off point, 1 − specificity was plotted on the x-axis and sensitivity on the y-axis. The accuracy of the model was evaluated based on the area under the ROC curve (AUROC), serving as a single measure of the discriminatory ability of the model [58,59]. The AUROC was calculated for the model outputs using the trapezoidal rule [59]. In general, an AUROC of 0.5 suggests no discrimination (i.e., the model is no better than a random predictor); the closer the AUROC is to 1, the better the predictive ability of the model [60].
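A minimal sketch of the AUROC computation by the trapezoidal rule, assuming one sensitivity/specificity pair per decision threshold, is shown below; function and argument names are ours.

```r
# Empirical AUROC via the trapezoidal rule, anchoring the curve at
# (0, 0) and (1, 1) in (1 - specificity, sensitivity) space.
auroc <- function(sens, spec) {
  fpr <- 1 - spec
  o <- order(fpr, sens)            # order cut-off points along the curve
  x <- c(0, fpr[o], 1)
  y <- c(0, sens[o], 1)
  sum(diff(x) * (head(y, -1) + tail(y, -1)) / 2)
}
auroc(sens = c(0.9, 0.8, 0.5), spec = c(0.60, 0.80, 0.95))  # toy cut-offs
```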

Evaluation of Leaf Wetness Estimation
In order to evaluate the LWt estimation, two indicators were compared: hourly rain alone (rain > 0.1 mm), and rain combined with relative humidity (rain > 0.1 mm and RH ≥ 90%). The model runs were split into two groups, with each run in one group having a paired counterpart using the other LWt indicator. The differences between paired samples were calculated, and the normality of the sample distribution was visualized using a histogram and density plot and assessed with the Shapiro-Wilk test. Where samples did not conform to a normal distribution, a non-parametric paired two-sample Wilcoxon signed-rank test was carried out to assess the difference between the groups. Model outputs with the higher performing LWt indicator were kept for further analysis.
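A sketch of this comparison on simulated paired AUROC values (one per model run; the data here are synthetic placeholders, not the study results) could read:

```r
set.seed(1)
auroc_rain    <- runif(343, 0.60, 0.75)               # hypothetical AUROCs
auroc_rain_rh <- auroc_rain + rnorm(343, 0.04, 0.02)  # paired counterparts

d <- auroc_rain_rh - auroc_rain
shapiro.test(d)                    # check normality of paired differences
wilcox.test(auroc_rain_rh, auroc_rain,
            paired = TRUE)         # Wilcoxon signed-rank test
```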

Evaluation of Main Variable Thresholds
Boxplots were used for the initial visualization of the change in model accuracy with a change in each factor level of a single variable. A polynomial surface, using locally estimated scatterplot smoothing (LOESS), was fitted to the AUROC data using each variable as a predictor, with all possible interactions, to model the trend in the AUROC response with changes in each variable threshold individually.
An orthogonal polynomial regression model was used to study the sensitivity of the AUROC to changes in the model variable thresholds and their interactions. Polynomial response models have been shown to be useful for summarizing such relationships [61]. The response surface methodology [62] consists of a group of mathematical and statistical procedures for approximating the functional relationship between a selection of control variables and the response variable they influence [63]. The polynomial models were fitted sequentially, starting from the first order and adding higher-degree terms up to the fourth order. Model fits were assessed with their respective R² and adjusted R² values, a Shapiro-Wilk test of the residuals and examination of the fitted surface, until overfitting was indicated on the response surface plane. Additionally, non-parametric local regression (LOESS) was used to obtain predicted values for the four-dimensional response surface, using RHt, Tt and SDt, and all three- and two-way interactions, as the predictors. The extent of agreement between the polynomial regressions and the LOESS regression, measured using the concordance correlation coefficient [64], was then used to aid in choosing the degree of the polynomial regression. The lowest-degree polynomial that accomplished the required degree of approximation was subsequently adopted. Higher-degree polynomial models offer increased flexibility in the response surface, but need to be fitted with caution due to the potential to overfit [61].
The fitted polynomial equation was then expressed in the form of three-dimensional (3D) surface plots, in order to visualize the interaction between the changes in thresholds (Table 2) and the response variable. The graphical representation provides a method to visualize the relationship between the response and experimental levels of each variable, and the types of interactions between the test variables.
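A condensed sketch of this fitting sequence, on placeholder data, is given below; `runs` and its response are simulated for illustration only, and the concordance step would use a CCC implementation (for example, the one provided in the epiR package) rather than the crude correlation shown here.

```r
set.seed(42)
# Placeholder grid of threshold offsets with a synthetic AUROC response
runs <- expand.grid(rh_t = -3:3, t_t = -3:3, sd_t = -3:3)
runs$AUROC <- with(runs, 0.75 + 0.02 * t_t - 0.01 * rh_t - 0.015 * sd_t +
                     rnorm(nrow(runs), 0, 0.01))

# Third-order orthogonal polynomial response surface
fit3 <- lm(AUROC ~ polym(rh_t, t_t, sd_t, degree = 3), data = runs)
summary(fit3)$adj.r.squared
shapiro.test(residuals(fit3))       # normality of residuals

# Non-parametric LOESS surface for comparison with the polynomial fit
lo <- loess(AUROC ~ rh_t * t_t * sd_t, data = runs)
cor(fitted(fit3), fitted(lo))       # crude agreement; the paper uses the CCC
```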
Aware of the constraints of the limited data set, we relied on current knowledge of PLB disease epidemiology as a guide for the interpretation of the results. Hence, a suite of model versions was selected based on the results of the sensitivity analysis and subjected to further examination based on the position and grouping of the cut-off points in the ROC space. Defining an optimal decision threshold is not a trivial task [56]. The high cost of false negatives (FN) associated with the potential onset of PLB epidemics [14,65] dictates that the decision threshold should lie closer to the upper right-hand corner of the ROC curve, in order to minimize the associated risk of disease development [66].

Treatment Frequency and Dose Reduction
A crop risk prediction model is useful only if it provides the same level of protection as the standard practice while reducing the necessary costs and labor [32,34]. In this theoretical study, we do not account for differences in the active ingredient or the type of fungicide, but merely try to associate a reasonable estimate of the possible reductions in the number of treatments and/or dose with the predictive power at predefined decision thresholds. After defining the 'optimum' sets of model thresholds, it was necessary to compare the number of treatments and the pesticide usage recommended by the model versions to standard growers' practice, in order to determine whether the recommended model parametrizations are economically and environmentally viable. Currently, spray intervals range from 5 to 7 days under Irish conditions, which are the intervals we accounted for in this study. We evaluated three model parametrizations: the IR with the default parameters (Section 2.2.3, The IR Model and Its Operational Use), and the two improved parametrizations identified in the subsequent analysis.
We assume that planting starts the day after the daily average soil temperature has been greater than 8 °C for three consecutive days after 1 April. This is a common practice in Ireland, in line with recommendations from the national advisory body, Teagasc. Farmers typically start fungicide treatments as soon as emergence progresses beyond 50% and continue until the above-ground potato haulm has completely died off, typically three weeks after desiccation. Here, we assume that the growing season lasts 120 days, with pesticide protection continuing during the final three weeks until the haulm is desiccated.
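This planting rule can be expressed in a few lines of R; the `soil` data frame and the 2016 season used here are hypothetical.

```r
# Planting = the day after mean soil temperature first exceeds 8 °C
# for three consecutive days, counting from 1 April.
planting_date <- function(soil, t_min = 8, run_len = 3,
                          from = as.Date("2016-04-01")) {
  s <- soil[soil$date >= from, ]
  warm <- as.numeric(s$temp > t_min)
  run <- stats::filter(warm, rep(1, run_len), sides = 1)  # running sum
  idx <- which(run == run_len)[1]     # end of the first qualifying run
  s$date[idx] + 1                     # planting starts the following day
}

soil <- data.frame(
  date = seq(as.Date("2016-03-25"), by = "day", length.out = 40),
  temp = c(rep(6, 10), 7, 9, 9, 9, rep(10, 26))  # toy daily means
)
planting_date(soil)  # "2016-04-08" for this toy series
```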
The difference between standard growers' practice and the model versions is evaluated in two ways:

1. Reduction in the number of treatments, split into:
• Model guided: a fungicide treatment is applied every time the warning threshold is reached, with a minimum period of 5 days before the following treatment (see the sketch after this list);
• Model and calendar guided: a minimum of 5 and a maximum of 10 days between treatments.
The sum of the recommended treatments is calculated for all decision thresholds and seasons. The resulting summaries are presented visually as point graphs. A LOESS curve was fitted to estimate the minimum decision threshold at which the model recommends fewer treatments than the usual 5 or 7-day practice.

2. Dose reduction based on the 7-day calendar treatment. Currently, Irish growers do not rely on the operational warnings issued by Met Éireann, but do increase the dose, or use stronger and often less environmentally friendly formulations, during periods identified as at risk. Possible dose reductions are therefore calculated for the usual 7-day calendar treatment, based on the maximum risk calculated by the model during the 7-day period between treatments. The maximum dose is applied if the risk is over 12 EBH, which is the current warning decision threshold in Ireland.
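The 'model guided' strategy from point 1 can be sketched as follows; the daily EBH series, the assumption that accumulated risk resets after each treatment, and all names are illustrative rather than the authors' implementation.

```r
# Spray when cumulative EBH since the last treatment reaches the decision
# threshold, with at least `min_gap` days between treatments.
count_sprays <- function(ebh_daily, threshold, min_gap = 5) {
  sprays <- 0; acc <- 0; last <- -Inf
  for (d in seq_along(ebh_daily)) {
    acc <- acc + ebh_daily[d]
    if (acc >= threshold && d - last >= min_gap) {
      sprays <- sprays + 1
      acc <- 0                 # assume treatment resets accumulated risk
      last <- d
    }
  }
  sprays
}

set.seed(7)
ebh_daily <- rpois(120, lambda = 0.8)   # toy 120-day season of daily EBH
count_sprays(ebh_daily, threshold = 12)
```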
The full analysis can be reproduced using code and data archived at https://mladencucak.github.io/AnalysisPLBIreland/.

Results

Evaluation of Leaf Wetness Estimation
A Wilcoxon signed-rank test showed a significant difference (p < 0.001) between the AUROC values of the models: AUROC values using the combined estimators (RH ≥ 90% and rain > 0.1 mm) as the leaf wetness indicator were significantly higher than those using rain > 0.1 mm alone. The median AUROC for the method based on the rain and RH thresholds was 0.735, compared to 0.695 for the method using the rain indicator only (Figure 4).

Figure 4. Group median difference between models with leaf wetness estimated using rain (> 0.1 mm) alone or the combined rain and relative humidity (rain > 0.1 mm and RH ≥ 90%) indicator.

Evaluation of Main Variable Thresholds
Scatterplots with LOESS smoothing and boxplots indicated a non-linear relationship with change in each factor level. The AUROC increased when the thresholds for relative humidity and sporulation duration were reduced. Conversely, an increase in the temperature threshold resulted in an improvement in the predictive power of the IR. For all variables, the levels of predictor variables showing an increase in AUROC also showed higher levels of dispersion, indicating the necessity to investigate the interactions.
A statistically significant cubic polynomial model (F(3,323) = 105.9, p < 0.0001) was fitted to the AUROC data, with the proportion of variance explained by the model of 0.8617 and 0.8535 for the R² and adjusted R² values, respectively (Table 4). Diagnostic plots of residuals versus the order of the data and a histogram indicated no violation of the normality assumption. The fourth-order polynomial model showed only a slight increase in the R² and adjusted R² values, while the Shapiro-Wilk test indicated a lack of normality in the distribution of the residuals. Visual assessment of the response surface plotted with the fourth-order model indicated a potential overfitting problem. Linear and quadratic fits had lower R² and adjusted R² values and were considered unsuitable. In addition, the predictions from the third-order polynomial model agreed most closely with the local non-parametric regression (concordance correlation coefficient [64] of 0.9896; 95% CI: 0.9876, 0.9913); hence, this model was deemed to adequately reproduce the behavior of the response surface.
The 3D response surface for the AUROC against any two independent variables, keeping the third independent variable at its −3, 0 and +3 levels, respectively, is presented in Figure 5. In total, nine 3D response surfaces were obtained by considering all possible variable combinations. Figure 5a-c depicts the interaction between RHt and Tt, keeping SDt at its −3, 0 and +3 levels. Figure 5a shows that the AUROC increased with increasing Tt, up to 12 °C, and with RHt reduced to 88%, when SDt was set at 9 h. If SDt is kept at the threshold of 12 h, a decrease in the AUROC is evident (Figure 5b), while an increase in SDt to 15 h results in a significant reduction in model accuracy (Figure 5c).
It can be observed from Figure 5d-f that the accuracy of the model increases with an increase in Tt and a reduction in SDt. An AUROC above 0.85 is achieved by reducing the sporulation period by 2 h and increasing the temperature threshold by 2 °C. This effect is diminished, with the AUROC falling below 0.85, when SDt is at the default threshold (12 h), while increasing SDt results in a large reduction in the AUROC, to the level of an unacceptable prediction model. Figure 5g-i shows the interaction between RHt and SDt, keeping Tt at its −3, 0 and +3 levels. Increasing Tt positively influences model accuracy. Over the range of Tt factor levels, the highest AUROC values reached are 0.79, 0.83 and 0.87, with the temperature threshold reduced to 7 °C, at the default threshold, and increased to 13 °C, respectively.
Overall, the results indicate that reducing RHt to 88% and SDt to 10 h, and increasing Tt to 12 °C, results in the largest improvements in the overall predictive performance of the model (Figure 5a,b,i). Variations in Tt do not have the same magnitude of effect on model accuracy as the manipulations of RHt and SDt. Figure 6a-c depicts the ROC curves for the individually adjusted model variable thresholds for RHt (88%), SDt (10 h) and Tt (12 °C), respectively. Adjusted RHt and SDt provide an improvement in model specificity, with the grouping of cut-off points moving upwards in the ROC plane and having no associated FN; overall accuracy displays some improvement. Overall, adjusting RHt (Figure 6a) resulted in the greatest improvement in model accuracy, with a sensitivity of 0.8 and a corresponding decision threshold scale of 3-9 EBH. Practically, this means that a risk accumulation of up to 9 EBH was necessary for the onset of the disease in eight (out of 10) years. Adjusting Tt alone yielded a sensitivity similar to the default model variable thresholds, with two FN predictions, indicating that the change in Tt had the least impact on the improvement in model performance (Figure 6c).
The performance of the IR model with the default variable thresholds is presented in Figure 6d. ROC curves for the existing IR variable thresholds revealed a lack of specificity, with no risk accumulation in two years, while the current operational blight warning threshold was reached in only four out of ten years. A model with the variable thresholds recommended by the analysis of the response surface (SDt = 10 h, RHt = 88% and Tt = 12 °C), hereafter referred to as the optimized model (Figure 6e), shows improved performance, with no FP. The disease outbreak was correctly indicated in all years of the study, although the sensitivity dropped significantly in six years (corresponding to decision thresholds higher than 5 EBH), indicating that the maximum acceptable decision threshold for this model variation is 4 EBH, corresponding to a sensitivity of 0.9.
An additional model variation, hereafter referred to as the low risk model, was chosen for further analysis, with optimized SDt (10 h) and RHt (88%) and Tt kept at the original default threshold of 10 °C (Figure 6f). This choice was guided by the limited impact of changing the temperature threshold on the specificity of the model (Figure 6c), limitations related to the size of the biological data set used in the evaluation, a lack of knowledge of the pathogen founder population and the risk associated with a possible disease outbreak. The ROC curve for this model showed an improvement in sensitivity, with eight years having accumulations of up to 11 EBH. While a drop in the AUROC value is evident due to the loss in specificity, the grouping of decision threshold points higher in the ROC plane allows consideration of a decision threshold as high as 11 EBH with a sensitivity of 0.8.

Treatment Frequency and Dose Reduction
Assuming the usual calendar spray practice was followed during the period investigated, the numbers of treatments calculated for the seven- and five-day calendar spray programs were 15 and 22, respectively. The decrease in the number of recommended treatments with increasing decision threshold, approximated with the LOESS curve, is presented in Figure 7. All model versions provide a reduction in the number of treatments compared to the standard five-day calendar treatment.
The number of treatments according to the default parametrization of the IR is lower than the calendar practice across the range of decision thresholds (Figure 7a,d). In the case of the optimized model, the LOESS curve does not intersect any of the growers' practice lines, indicating that the average number of treatments recommended by the model is lower than any grower practice schedule across the range of decision thresholds (Figure 7b,d). Given that the optimum decision threshold should not be lower than 4 EBH, the optimized model still provides an opportunity for a reduction in the number of treatments in all but one year, and to as few as five per season when compared to the seven-day program.
The number of treatments advised by the low risk model, with the decision threshold set to 3 EBH (the lowest observed risk accumulation prior to a disease outbreak), is lower than under the 7-day treatment interval on average. However, this is not the case in years such as 2012 or 2007, when the number of treatments with a decision threshold of 7 EBH is close to the seven-day treatment frequency. Nevertheless, the possibility of setting a higher decision threshold, in the range from 5 to 11 EBH, provides more opportunities for reducing the number of treatments, with the average ranging from 12 down to as few as six compared to the five-day strategy, and from 13 to 10 compared to the five-to-ten-day strategy.
The cumulative proportion of the total fungicide applied using the model guided strategy and the number of treatments are compared to the 7-day calendar practice in Figure 8. All model versions provide reductions in both the total dose and the number of treatments applied. The reductions are lowest for the year 2012, which was one of the most severe 'blight years' on record [86]. Overall, the highest mean dose reduction is achieved by the default IR (0.248), followed by the optimized IR (0.33); the lowest mean dose reduction is, expectedly, associated with the low risk IR (0.436).

Discussion
We presented an evaluation of the operational algorithm for potato late blight risk forecasting in Ireland. To evaluate the selected algorithm, a sensitivity analysis of the threshold values associated with the most important variables was carried out using empirical ROC curves derived from 10 years of historical weather and disease observation data. Guided by the results of the sensitivity analysis, current epidemiological knowledge and PLB risk awareness, we identified two improved sets of model parameters and a range of operational thresholds. Finally, three disease control strategies, two based on these improved model thresholds and one on the currently used model parametrization, were compared to standard growers' practice.
Crossovers between empirical and mechanistic models are a common approach in crop disease forecasting [42]; the IR model is one example. Mechanistic algorithms are a function-based estimation of the conditions for the development and completion of several (or single) segments of disease development, while in the IR these segments are limited to a threshold-based prediction of their completion. The threshold selection is often based on estimates by the model developer and may not be an accurate representation of the complex nature of biological processes [87]. Such algorithms have their appeal in their simplicity, although biological processes, such as the development of disease epidemics, do not have a binary state but are part of a complex system that encompasses soft transitions between minimum, optimum and maximum states [88]. The semi-mechanistic form of the IR adopted at the time for operational use in Ireland required a number of simplifying assumptions. These favored more "conservative" variable thresholds, to reduce the frequency of warnings. Based on the available data, our results indicate that the previously defined default thresholds of the Irish Rules are no longer fit for risk prediction in the new PLB pathosystem.
This study agrees with older reports stating that blight epidemics in Ireland are not initiated before the second half of June [27] due to low night temperatures [89]. The average minimum daily temperature in Oak Park was low in April and May, at 4.5 °C and 7.2 °C respectively, providing a potential explanation for the low pathogen activity during this period. Lower temperatures in the early stages of potato development can provide a certain level of protection until the plants reach a level of maturity at which they are more resistant to attack [90]. This has been challenged in recent times due to the rise in aggressiveness of newer pathogen strains, which are active over a wider range of environmental conditions [12,28,86]. The Irish Rules model uses a hypothetical lower temperature threshold of 10 °C without an upper boundary, consistent with a number of early prediction models employed in Northern Europe [37,40,91-93]. Our results indicate that the development of P. infestans under typical Irish weather conditions is low if the temperature is less than 12 °C. However, considering the relatively small gain in overall model accuracy, a more comprehensive evaluation would be necessary before recommending an increase in the current temperature threshold. Previous research from areas with a diverse pathogen population cautions that blight epidemics will progress even at temperatures lower than 10 °C under extended humid periods, although the rate of this progress is low [94,95]. Additional years of data and knowledge of the founder population would be required to ensure that this is a robust conclusion, suitable for deployment on an operational basis.
Evidence exists for reducing the relative humidity threshold and the duration of the initial sporulation period; the diagnostic performance of the model versions optimized for these factors provides a 'safer sleep' for the farmer. Our results are in agreement with the report from Fennoscandia rejecting a relative humidity threshold of 90% as a development threshold [96]. This threshold has been adjusted in a number of models used throughout Europe; for example, the French model, Milsol, uses a threshold of 86% (Gaucher, personal communication) and the Danish blight management system uses a threshold of 88% (Hansen, personal communication). There are a number of reasons to opt for lower risk when deciding which reported relative humidity threshold should be considered blight favorable, such as the accuracy of measurements, the distance between the weather data source and the production area, the topography of the area, physiological and phenological differences in crop haulm density, and shaded areas of the production fields [49,97,98].
Leaf wetness is one of the key parameters in agricultural meteorology, controlling pathogen infection and determining disease development rates [99,100]. In agricultural field conditions, leaf wetness may result from rain, fog, irrigation or distillation from the soil [101]; our results indicate that the simple use of a precipitation threshold is not an appropriate estimator of leaf wetness in this context and should be supplemented with an additional estimator based on a simple empirical model for RH. Due to the lack of in-field measurements, we used a 'reverse' approach to test the validity of the proposed estimation method, by comparing the leaf wetness estimation to the disease occurrence [102]. This estimation method has been successfully employed in a number of DSSs worldwide [99,103].
A low risk of three and four EBH was predicted by both the optimized and low risk models prior to the disease onset during two of the 10 years studied, 2011 and 2014. Possible reasons for this are the proximity and strength of the inoculum source, or the aggressiveness of the pathogen lineage initiating the epidemics. The specific P. infestans lineages that initiated the epidemics in our data are not known, but we can hypothesize that these infections were initiated by the more aggressive strains. Additionally, the epidemics in both years were initiated later in the season, on 28 July and 1 August, possibly coinciding with a shift in the structure of the pathogen population and increasing the probability that the infections were initiated by a more aggressive strain. Limited findings from our monitoring of the founder population at Oak Park, from 2016 to 2018, show that epidemics are predominantly initiated by the older clonal EU_8_A1 genotype, while the population structure changes in favor of the newer genotypes EU_6_A1 and EU_13_A2 over the course of the season. This is in agreement with recent experimental evidence regarding the establishment of the new P. infestans genotypes under Irish conditions [20-22], exhibiting an increase in aggressiveness [12-15,17]. Hence, we can recommend 4 EBH as the minimum decision threshold to be considered under conditions of high disease pressure or if an outbreak of an aggressive strain of the pathogen is reported.
The optimized model offers significant potential to increase the model specificity and, consequently, to reduce the number of required treatments compared to the low risk model in the high sensitivity range of the ROC curve, between 0.9 and 1. The difference between the optimized and low risk models, calling for caution, lies in the grouping of the cut-off points corresponding to decision thresholds above 5 EBH. A number of decision thresholds for the low risk model lie closer to the higher sensitivity area (5-11 EBH at 0.8 sensitivity) compared to the optimized model (all cut-off points higher than 5 EBH correspond to 0.6 sensitivity). Thus, setting a higher decision threshold, which requires fewer treatments, is possible for the low risk model at 0.8 sensitivity, although defining an exact threshold is difficult since values from 5 to 11 EBH correspond to the same sensitivity value. In the case of small sample sizes, the crude empirical estimate has the disadvantage of providing the same sensitivity values for different specificity values. The robust methodology and highly reproducible coding example allow for the regular updating and evaluation of the model, leading to clearer definitions of the risk and/or benefit associated with each decision threshold as new data become available.
We have shown that, on average, the use of risk prediction models offers the possibility of reducing fungicide inputs compared to standard Irish growers' practice. The possible reductions in the dose and the number of treatments vary across the period studied. This reflects the nature of agricultural production and further reinforces the need for an IPM approach to defining treatment intervals. While spray intervals could justifiably be longer than seven days for most of the season, they may need to be shortened during parts of a 'blight year'. Currently, operational decision thresholds for issuing blight warnings are not clearly defined and are based on experience. Here we provide an estimation of the risk associated with decision thresholds in the higher sensitivity range. The accumulation of EBH needed to issue a warning at sensitivity levels of 0.9 and 1 is the same for both the optimized and low risk models, with the optimized model providing greater opportunities for reducing the number of fungicide treatments and/or the dose. However, an important advantage of the low risk model is related to its sensitivity of 0.8, providing more certainty in the model outputs if a warning is considered at a higher decision threshold, from 5 to 11 EBH. Such situations may be considered when other factors necessary for disease development are estimated to be lower, such as in the earlier part of the season, with more resistant varieties, or with a low number of reported disease outbreaks in the region. The adoption of decision support systems and their utilization in everyday practice could have numerous benefits for growers, such as the optimization, as well as the justification, of fungicide inputs [54]. Our findings indicate that the original Irish Rules model parameters need to be altered for two model variables, which will inevitably result in an increase in the frequency of warnings. Optimization of the control program does not necessarily mean a reduction in the number of treatments, and an effective forecasting scheme could advise at least as many fungicide treatments as standard growers' practice during seasons with blight favorable conditions [104], which often occur under the typical weather conditions experienced in Ireland. The decision on the level of risk acceptable to a grower is a complex one, made according to the price of treatment, the value of production, legislative restrictions [96] and the need to reduce inputs to prevent the development of fungicide resistance [33]. Hence, we do not make a recommendation for an exact decision threshold here, but elaborate on the possible reductions and the varying levels of risk deemed acceptable by a producer. Met Éireann issues regional warnings, and the quality of these warnings could be improved with information regarding disease outbreaks and rapid identification of the pathogen lineage, for the reasons outlined above. Decision support at the synoptic level is not a silver bullet providing an ultimate solution for optimal, environmentally friendly disease control, but merely another tool to get closer to it; if it is not utilized as such, and is applied in an inappropriate manner, it can have the opposite effect. Plant disease models are often parochial in nature, evaluated by the researchers who developed them, and are often used without calibration when employed in agroecosystems different from those they were developed for [31,105].
The interdisciplinary nature of the work related to decision support in crop protection, requiring skills and knowledge in informatics, mathematics, meteorology, agronomy and biology, is often a limiting factor for the sustainable development of this branch of plant disease epidemiology [49]. One possible way to overcome some of these obstacles is the acceptance of open and reproducible methods. The importance of, and need for, open science in the field of phytopathology has been reported as a way to recruit experts from different fields, apply cutting-edge methods and replicate data analyses in a timely manner, increasing the robustness of the findings [106]. Some relevant examples come from other fields of research related to potato late blight. The development of our understanding and knowledge of P. infestans population diversity has been empowered by poppr, a widely used R package enabling easier genetic analysis of clonal populations [107]. Moreover, Sparks et al. [108] evaluated the possible implications of climate change for potato late blight in the future. These contributions have not only scientific value, but also represent a significant contribution to the education of a new generation of phytopathologists, who will need to be equipped with such knowledge and skillsets to keep up with the 'fight' against ever-evolving plant pathogens.
Easily accessible tools are necessary for the validation and calibration of risk models using historical data, prior to field evaluation in other climatic regions or re-evaluation in the original ecosystem; this could potentially save a considerable amount of time and money and lead to the more sustainable use of decision support in plant protection. To the best of our knowledge, this is the first completely reproducible evaluation of a crop disease risk prediction model, implemented in a single computing environment within a freely accessible software language. Such work, it is hoped, will empower the sustainable development of forecasting for potato late blight and for crop diseases in general.

Conclusions
The results have shown that there is a need to revisit the parameters of the Irish Rules model, which were proposed for a different ecosystem and the operational capabilities of the time, as well as the operational use of the model. On the basis of the work presented here, we recommend reducing the variable thresholds for relative humidity from 90% to 88% and for sporulation duration from 12 to 10 h, and adopting an additional leaf wetness indicator incorporating both precipitation (≥ 0.1 mm) and relative humidity (≥ 90%). Our analysis indicated that very little blight development occurs at temperatures lower than 12 °C; however, we do not recommend this increase operationally, due to the lack of certainty associated with the small data sample size, the high risk related to possible disease outbreaks and the wider decision threshold range in the high sensitivity area of the ROC space for the low risk model. The thresholds identified here should be continuously re-evaluated after each growing season, facilitated by the methodology and associated model evaluation code developed here. Our recommendation for the operational application of the model is to use the range of 4 to 11 EBH and to set the threshold dynamically during the season, based on reports of the frequency of disease outbreaks in the region of interest. Future development of the Irish PLB warning system should include rapid in-season identification of the pathogen genotype distribution, to be used as a guide for the selection of the decision threshold.
The representation of the complex aetiology of P. infestans, and of other components of this pathosystem, such as pesticide protection status, crop resistance [7], the quality of meteorological network coverage and the distances between production fields and weather stations [54], crop phenological stage [6] and pathogen genotype [11,28], is omitted or generalized in synoptic empirical prediction algorithms. Understanding the complexities of the agroecological system under investigation is crucial for interpreting the results of the analysis we have implemented. Small data sets may carry high variability due to a limited number of observations [57]. Hence, we add a note of caution when employing the model proposed here.
The exact methodology used in the development of early models such as the IR is not always clear, but the assumption is that they were a product of empirical, often trial-and-error based, methodologies and the weather data available at the time (Yuen and Mila, 2015). Hence, our recommendation for future development is to explore the possibility of redesigning currently employed models, facilitating the transition from a threshold-based, binary estimation of the stages of the host-parasite interaction to a more realistic one, based on functional relationships between the host, the parasite and the environment. Future work on the development of risk prediction algorithms should also take into consideration the additional uncertainty introduced by forecasted weather data, avoiding the usual practice in crop disease modelling whereby models are developed with observed weather data and applied to forecasted weather with no evaluation of the impact of weather-forecast uncertainty on model predictions [49]. Approaches in IPM cannot be limited to a single discipline's efforts. The vast amount of data available nowadays, much of it currently under-utilized, provides a number of opportunities for smarter farming [109]. The challenge remains for the end user to adequately employ the information provided in the decision-making process, with an awareness or knowledge of the characteristics of the variety grown, the growth stage, the control measures used, the risk from surrounding areas, the accessibility of active ingredients, etc. The often hard-earned confidence of the final user can be maintained through constant evaluation of the system and adequate education regarding the appropriate use of decision support tools.