Modeling Regulatory Threshold Levels for Pesticides in Surface Waters from Effect Databases

Regulatory threshold levels (RTL) represent robust benchmarks for assessing the risks of pesticides, e.g., in surface waters. However, comprehensive scientific risk evaluations comparing RTL to measured environmental concentrations (MEC) of pesticides in surface waters have so far been restricted to a small number of pesticides, as RTL are only available after extensive review of regulatory documents. Thus, the aim of the present study was to model RTL equivalents (RTLe) for aquatic organisms from publicly accessible ecotoxicological effect databases. We developed a model that applies validity criteria in accordance with official US EPA review guidelines and validated the model against a set of manually retrieved RTL (n = 49). Model application yielded 1283 RTLe (n = 676 for pesticides, plus 607 additional RTLe for other use types). In a case study, the usability of RTLe was demonstrated for a set of 27 insecticides by comparing RTLe and RTL exceedance rates for 3001 MEC from US surface waters. The provided dataset enables thorough risk assessments of surface water exposure data for a comprehensive number of substances. Regions without established pesticide regulations in particular may benefit from this dataset by using it as baseline information for pesticide risk assessment and for the identification of priority substances or potential high-risk regions.

Dataset: The dataset can be found in the Supplemental Material (RTLe_V01.xlsx).

Dataset License: CC-BY-SA


Summary
In modern agriculture, pesticides are frequently applied to minimize crop losses due to pest organisms, fungal diseases, or the growth of weeds (see Oerke [1], for review). To achieve these agronomic benefits (see Popp et al. [2], for review), pesticides, as biologically active substances, are regularly released into the environment worldwide to treat large areas of cropland [3]. From there, pesticides are transported to non-target freshwater systems [4][5][6]. Hence, it is important to regulate pesticides and to assess the likelihood of unacceptable ecological effects in aquatic ecosystems. For instance, in the US, elaborate environmental risk assessments (ERA) are conducted for pesticides prior to granting permission of use and, at regular intervals thereafter, for reregistration (FIFRA 40 CFR [7], Part 152; [8]). For risk characterization, estimated environmental concentrations are compared to toxicity endpoints from peer-reviewed and non-peer-reviewed laboratory effect studies. Thereby, a pesticide is characterized using different ecologically relevant surrogate species (FIFRA 40 CFR [7], Part based on two publicly accessible databases maintained by the US EPA: the US EPA ECOTOXicology knowledgebase (ECOTOX, https://cfpub.epa.gov/ecotox/index.cfm) [31] and the Office of Pesticide Programs Pesticide Ecotoxicity Database (OPP, https://ecotox.ipmcenters.org) [32]. Both databases contain a large number of effect studies (ECOTOX: ~940,000 entries; OPP: ~32,000 entries) and are relevant for the US ERA (also see Section 3.1). However, the quality of the studies listed in the databases varies substantially, so that not all studies reliably describe the toxicity of pesticides. To exclude studies that are unlikely to be used in ERA due to low reliability, filter criteria were formulated based on the information provided on test characteristics (see Tables A1 and A2) and in accordance with official validity criteria [10][11][12][13][14][33].
One rather theoretical approach to systematically filtering large aquatic toxicity databases was published, for instance, by Beasley et al. (i.e., the SIFT method) [34] and was recently applied to a large set of aquatic toxicity data with a focus on industrial chemical risk assessment [35]. In the present study, we developed a similar approach focusing on pesticide risk assessment by translating official validity criteria from standardized test protocols [10][11][12][13][33] and the US EPA guideline for the review of open literature studies [14] into SQL code. Criteria taken from the test protocols included, e.g., study duration, measured effect type, and endpoint type for the respective organism group. From the EPA guideline, criteria such as proper controls, chemical analysis, and physical units were extracted. The SQL code was used to query local copies of the ECOTOX and OPP databases. After filter application, the remaining studies were used for threshold estimation (i.e., RTLe) based on the most sensitive available endpoint, as done during manual RTL derivation [16]. To set up the model, different combinations of filter criteria were tested and evaluated in terms of the precision of estimates (basic, mid, and full model; Tables A1 and A2, Figures A1-A3). To evaluate the precision of the modeled RTLe, the estimated threshold levels were compared to 94 manually retrieved RTL (i.e., the calibration dataset). The model was validated by comparing the RTLe/RTL ratios of the calibration data (n = 94, see Table S1) to the RTLe/RTL ratios of an independent set of pesticides (i.e., validation dataset, n = 49, see Table S1). Furthermore, we demonstrate in a case study using 27 insecticide compounds how RTLe can be used in actual risk evaluations. The RTLe model yielded 1283 RTLe: 676 for pesticides (i.e., fungicides, herbicides, insecticides) plus 607 for other chemicals (see Section 2.1, Table S2) that might also be of interest for environmental evaluations.
This list will be regularly updated to reflect potential changes in the underlying effect databases.
With the present study, we have developed, validated, and applied a dynamic model that estimates a large number of RTLe for freshwater systems based on reproducible validity criteria. With the compiled list of RTLe, comprehensive evaluations of pesticide monitoring data and associated risks in surface waters are enabled for the first time for a large number of pesticides, as well as further chemicals with potential environmental impacts.

The Dataset
The dataset contains RTLe for freshwater organisms for 1283 chemicals in µg/L (Table S2). A classification into use types using the MAGIC graph [36] revealed that 676 of these substances are used as pesticides (herbicides, fungicides, or insecticides). Further, 260 substances were classified as other chemicals (e.g., microbiocides, solvents, plant growth regulators, repellents), and 347 substances remained unclassified. For each substance, the CAS number, RTLe, source database, and use type classification are provided (Table 1). The most recent version of the dataset is publicly available as a Microsoft Excel® spreadsheet (https://static.magic.eco/rtle2). In addition, all RTLe will be added to the MAGIC graph [36], where they are available through the chemical search interface (https://static.magic.eco/rtle1) and are updated regularly.

Data Quality
To test the model's performance, the RTLe from the calibration dataset (n = 94, Table S1) were validated against an independent sample of RTL (n = 49, Table S1). Non-parametric hypothesis testing supported model transferability (Wilcoxon rank sum test: ECOTOX database: W = 2148.5, p = 0.628; OPP database: W = 1527.5, p = 0.474), showing no significant difference in predictive accuracy between calibration and validation data (Figure 1, Figure A5). After combining the query results of both databases (Figure 1), 48.9% of RTLe estimated the RTL correctly. Further, 95.6% of RTLe were within ±1 order of magnitude of the RTL, and 87.6% lay within ±0.5 orders of magnitude (i.e., a factor of 3.2). Thus, for the vast majority of substances, only minor deviations from the official RTL can be assumed, underlining the robustness of the presented model. Estimates for only a few substances (n = 5, i.e., tebufenozide, fenthion, heptachlor, methyl-parathion, sulfometuron-methyl) underestimated the RTL by more than 0.5 orders of magnitude. An underestimated RTL means that the databases listed a more sensitive endpoint that was not removed by any of the applied exclusion criteria. Such endpoints were probably omitted during regulatory risk characterization based on expert judgement, for reasons not currently reflected in the model (as not all relevant test characteristics are consistently encoded in the databases). As this was the case for only a few substances (Figure 1, RTLe/RTL < 1), the overall probability of underestimating the RTL remains low.
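The precision bands above are computed on a log10 scale, where ±0.5 orders of magnitude corresponds to a factor of 10^0.5 ≈ 3.2. The following minimal Python sketch shows how such shares could be derived from paired RTLe and RTL values; the function name `precision_summary`, the tolerance used for an exact match, and the example values are illustrative assumptions, not the study's code or data.

```python
import math

def precision_summary(rtle, rtl, tol=1e-9):
    """Shares of RTLe that match the RTL or lie within +/-0.5 and +/-1 orders
    of magnitude of it, based on the deviation log10(RTLe/RTL)."""
    if not rtle or len(rtle) != len(rtl):
        raise ValueError("rtle and rtl must be non-empty and of equal length")
    # 0 = exact match, < 0 = RTL underestimated, > 0 = RTL overestimated
    devs = [math.log10(e / r) for e, r in zip(rtle, rtl)]
    n = len(devs)
    return {
        "correct": sum(abs(d) <= tol for d in devs) / n,
        "within_0.5": sum(abs(d) <= 0.5 + tol for d in devs) / n,
        "within_1": sum(abs(d) <= 1.0 + tol for d in devs) / n,
        "under_by_more_than_0.5": sum(d < -0.5 - tol for d in devs) / n,
    }

# Hypothetical thresholds in ug/L (not values from the study)
summary = precision_summary([1.0, 2.0, 0.1, 50.0], [1.0, 1.0, 1.0, 10.0])
print(summary)
```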
Although model estimates were on average significantly higher than the original RTL (based on 95% CIs; bootstrapped mean ratio RTLe/RTL = 1.21, 95% CI 1.04-1.42; median = 1), RTLe would, if anything, underestimate the risk when used in risk evaluations, resulting in a low likelihood of false-positive risk indications. Since only 88 (ECOTOX database) and 78 (OPP database) of the relevant, most sensitive endpoints for the 143 manually retrieved RTL were included in the respective databases (i.e., exact match between the endpoints used for RTL and RTLe compilations), correct estimates were inherently not achievable for all pesticides. However, for some pesticides, deviations were only marginal (deviation < ±0.1 orders of magnitude, excluding 0; n = 16 (ECOTOX database) and n = 26 (OPP database)). In some cases, the underlying endpoints may still originate from the same study (identifiable by the reported study ID and the same test organism), e.g., for tribufos, methiocarb, and formetanate HCl. Such minor deviations could result from rounding if the regulatory document and the database report endpoints with different numbers of decimal places (as is likely for formetanate HCl: 0.09 parts per million (ppm) in the document [37] and 0.0868 ppm in the OPP database). Larger deviations could, however, be based on recalculations of endpoints by EPA staff (e.g., to account for the percentage of active ingredient if the original studies were conducted with formulations). Additionally, the newer the regulatory document, the less likely its endpoints were included in the respective database (see Figures A6 and A7). Hence, there appears to be a time lag between the publication of regulatory documents and the update of the databases.

Figure 1. A ratio < 1 indicates an underestimation of the RTL by the RTLe, a ratio > 1 an overestimation. Triangles indicate RTLe based on data from the Office of Pesticide Programs Pesticide Ecotoxicity Database (OPP); circles refer to the US EPA ECOTOXicology knowledgebase (ECOTOX). Not all relevant endpoints listed in the regulatory documents were also listed in the respective databases. Pesticides whose most sensitive, relevant endpoint is included in the respective database are displayed in blue; pesticides without this endpoint in the respective database are displayed in red.
If only pesticides whose relevant, most sensitive endpoint was also included in the database were considered, the precision of estimates improved: the bootstrapped mean ratio approached 1 (mean ratio RTLe/RTL of 0.94, 95% CI 0.84-1.05; median = 1), indicating close agreement between RTLe and RTL, and deviations were minimal for the majority of pesticides (Figure 1). Conversely, RTLe for pesticides whose relevant, most sensitive endpoint for RTL compilation was not included in the databases tended to overestimate the RTL (i.e., RTLe/RTL > 1), with a bootstrapped mean ratio of 1.75 (95% CI 1.27-2.48; median = 1.16). These additional analyses demonstrate good model performance, particularly for pesticides whose relevant endpoint for RTL compilation is included in the databases. Where the relevant, most sensitive endpoint used for RTL compilation was missing from the respective database, correct estimates were inherently impossible, but RTLe tended to overestimate the RTL. Hence, the use of these RTLe would be unlikely to produce false-positive risk estimates. Overall, model precision is high, with a tendency to slightly overestimate the RTL. Consequently, given the uncertainty associated with these modeled thresholds, ecological effects in surface waters could also occur at concentrations below the respective RTLe and are very likely if RTLe are exceeded (cf. Stehle and Schulz [16]).

Case Study
The model was applied to 27 insecticides from an extensive dataset of quantified (i.e., ≥ limit of quantification) insecticide surface water concentrations (i.e., MEC, n = 3001, 1962-2017) in the United States, described in detail in Wolfram et al. [15]. In terms of model precision, the RTLe for the insecticides used in this case study are a representative sample of all RTLe for which the model was calibrated and validated (Wilcoxon rank sum test: W = 1429, p = 0.749). Risk evaluation (i.e., the calculation of exceedance rates) of MEC using RTLe differed only marginally, by 3.8 percentage points (95% CI 2.2-5.4), from the evaluation of MEC using RTL. In detail, bootstrapped median exceedance rates were 55.8% (95% CI 51.8-59.8) and 52.0% (95% CI 48.0-56.0) when insecticide MEC were compared to RTL and RTLe, respectively (Figure 2). The 95% CIs of the two exceedance rates overlapped, i.e., no statistically significant difference between the exceedance rates based on RTL and RTLe was indicated. This underlines the model's ability to identify environmental risks accurately. However, with RTLe, the risk was slightly underestimated, such that using the modeled thresholds resulted in a less conservative risk evaluation (blue line, Figure 2). In conclusion, the insecticide subset was representative of the entirety of RTLe, and RTLe provided risk evaluations of actual field measurements comparable to those based on RTL.

The Databases
The present study was based on two publicly accessible effect databases maintained by the US EPA (i.e., ECOTOX database and OPP database, see Figure A1), as introduced in Section 1. The ECOTOX database contains predominantly peer-reviewed, open literature studies of ecotoxicological effects of individual substances on a variety of test organisms. The OPP database contains data from peer-reviewed and non-peer-reviewed studies that were previously reviewed by the US EPA for risk characterizations within the pesticide regulation process, also covering a variety of test organisms. The OPP database is more aggregated than the ECOTOX database (i.e., it contains less information on individual studies). Therefore, the formulation of filter criteria (Tables A1 and A2) to select only valid endpoints was limited by the information provided. However, since EPA staff have already classified these studies as invalid, supplemental, or core, relevant endpoints can be selected by excluding invalid studies, as only supplemental and core studies are used for ERA by the US EPA [9]. Both databases were downloaded and imported into a relational database management system (PostgreSQL, pgAdmin 4.1; latest database updates: ECOTOX: 06/2019; OPP: 03/2017). Data were analyzed using R [38] and the R packages [39,40].
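The filters were implemented as SQL queries against local PostgreSQL copies of the databases. As a self-contained illustration only, the sketch below uses Python's built-in `sqlite3` module with a hypothetical, heavily simplified table (`opp_endpoints` and its columns are invented here and do not reflect the actual OPP schema; the `medium` column is assumed to be already assigned, since OPP itself does not report it). It shows the core idea: exclude invalid studies, keep freshwater tests with a positive concentration, and take the lowest remaining endpoint per substance.

```python
import sqlite3

# Hypothetical, simplified layout; the real OPP and ECOTOX schemas differ.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE opp_endpoints (
        cas TEXT, species TEXT, medium TEXT,
        study_category TEXT,  -- 'core', 'supplemental' or 'invalid'
        conc_ug_l REAL
    )
""")
rows = [
    ("substance_A", "species_1", "freshwater", "core", 6900.0),
    ("substance_A", "species_2", "freshwater", "supplemental", 37.0),
    ("substance_A", "species_3", "freshwater", "invalid", 1.0),   # invalid study
    ("substance_A", "species_4", "saltwater", "core", 10.0),      # not freshwater
    ("substance_B", "species_1", "freshwater", "core", 2.5),
    ("substance_B", "species_2", "freshwater", "core", 0.0),      # concentration not > 0
]
conn.executemany("INSERT INTO opp_endpoints VALUES (?, ?, ?, ?, ?)", rows)

# Keep core/supplemental freshwater studies with a positive concentration,
# then take the most sensitive (lowest) remaining endpoint per substance.
query = """
    SELECT cas, MIN(conc_ug_l) AS most_sensitive_ug_l
    FROM opp_endpoints
    WHERE study_category IN ('core', 'supplemental')
      AND medium = 'freshwater'
      AND conc_ug_l > 0
    GROUP BY cas
    ORDER BY cas
"""
result = conn.execute(query).fetchall()
print(result)
```

The query returns only the most sensitive endpoint; how it is converted into a threshold (e.g., via a level of concern) is a separate step.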
The test medium was used as a criterion to distinguish freshwater organisms from saltwater/estuarine organisms, enabling the estimation of RTLe for freshwater systems. Unlike the ECOTOX database, which specifies the test medium (e.g., freshwater, saltwater), the OPP database provides no information on it. Therefore, the typical, predominantly used test medium of each species was retrieved from the ECOTOX database by joining the species and media tables by species name. For species that occurred in the OPP database but were not listed in the ECOTOX database (n = 61), the test medium was inferred from the predominant habitat specified in additional sources (e.g., WoRMS database (http://www.marinespecies.org), AlgaeBase database (https://www.algaebase.org), Stephen et al. [41][42][43]).
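Joining on species name to obtain each species' predominant test medium can be sketched as follows. This is a hypothetical Python illustration (function names and example species are invented); breaking ties between media alphabetically is an assumption, and unmatched species are returned for a manual habitat lookup as described above.

```python
from collections import Counter

def predominant_media(ecotox_tests):
    """Map each species to its most frequently used test medium in ECOTOX.

    ecotox_tests: iterable of (species, medium) pairs.
    Ties are broken alphabetically so that the result is deterministic.
    """
    counts = {}
    for species, medium in ecotox_tests:
        counts.setdefault(species, Counter())[medium] += 1
    return {
        species: min(c.items(), key=lambda kv: (-kv[1], kv[0]))[0]
        for species, c in counts.items()
    }

def assign_media(opp_species, media_by_species):
    """Attach a medium to OPP species; species without an ECOTOX match are
    returned separately for manual lookup (e.g., in WoRMS or AlgaeBase)."""
    assigned, unmatched = {}, []
    for sp in opp_species:
        if sp in media_by_species:
            assigned[sp] = media_by_species[sp]
        else:
            unmatched.append(sp)
    return assigned, unmatched

ecotox = [("species_1", "freshwater"), ("species_1", "freshwater"),
          ("species_1", "saltwater"), ("species_2", "saltwater")]
media = predominant_media(ecotox)
assigned, unmatched = assign_media(["species_1", "species_2", "species_3"], media)
print(assigned, unmatched)
```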

Collection of RTL
A list of 143 RTL (status as of October 2018), used for model calibration and validation, was compiled by reviewing the most recent regulatory documents provided by the US EPA from different sources [26][27][28][29]. Active substances were searched by either CAS number or name to identify each test substance unambiguously. Pesticides for the present study were chosen based on previous work [15,16], use data [44], and the availability of regulatory documents in the searched sources, covering different important use types (i.e., fungicides, herbicides, and insecticides). Once the most recent documents were identified and downloaded, RTL were computed similarly to the methods described in Stehle and Schulz [16], as follows: First, the valid ecological effect data for freshwater organisms that were used for risk quotient (RQ) calculations within the respective document were identified. As the model was to be calibrated for active ingredients, only studies performed with the active ingredient of a pesticide were selected. Further, only definite endpoints (without the qualifiers ">" and "<") were considered relevant. If no information on RQ calculations was provided but sufficient ecological effect data were listed, the relevant endpoint was chosen based on the US EPA's classification of endpoints as supplemental and core (excluding invalid studies). This follows the procedure of endpoint selection by the US EPA during the screening of available data for ERA [9]. As a final verification of the manually selected endpoint, written justifications of endpoint selections, if available, were checked to confirm the identified species and effect endpoint. Finally, as the most sensitive species drives regulatory decisions [9], the lowest valid endpoint was selected for RTL computation.
These RTL were calculated by multiplying the selected, most sensitive, relevant endpoint by the respective level of concern (LOC; 0.5 for acute aquatic animal studies, 1 for aquatic plant studies) [9]. The final dataset of RTL was divided randomly into a calibration dataset (n = 94) and a validation dataset (n = 49) to set up the model.
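The RTL computation and the random split can be sketched in Python as follows. The record layout, the function names, and the fixed random seed are assumptions for illustration. The LOC values (0.5 for acute aquatic animal studies, 1 for aquatic plant studies) are those given in the text, and, following its wording, the lowest definite endpoint is selected first and then multiplied by its LOC.

```python
import random

# Levels of concern as stated in the text; other study types are not covered here.
LOC = {"acute_animal": 0.5, "plant": 1.0}

def compute_rtl(endpoints):
    """endpoints: list of dicts with 'value_ug_l', 'qualifier' ('', '>' or '<')
    and 'type' ('acute_animal' or 'plant'). Returns None if no definite endpoint."""
    definite = [e for e in endpoints if e["qualifier"] == ""]
    if not definite:
        return None
    lowest = min(definite, key=lambda e: e["value_ug_l"])  # most sensitive endpoint
    return lowest["value_ug_l"] * LOC[lowest["type"]]

def split_calibration_validation(substances, n_validation, seed=1):
    """Random split into calibration and validation sets (seed is hypothetical)."""
    rng = random.Random(seed)
    shuffled = list(substances)
    rng.shuffle(shuffled)
    return shuffled[n_validation:], shuffled[:n_validation]

endpoints = [
    {"value_ug_l": 40.0, "qualifier": "", "type": "acute_animal"},
    {"value_ug_l": 5.0, "qualifier": ">", "type": "acute_animal"},  # not definite
    {"value_ug_l": 120.0, "qualifier": "", "type": "plant"},
]
print(compute_rtl(endpoints))  # 40.0 * 0.5 = 20.0

calibration, validation = split_calibration_validation(
    [f"pesticide_{i}" for i in range(143)], n_validation=49)
print(len(calibration), len(validation))  # 94 49
```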

Model Building and Calibration
To set up the model, a basic, a mid, and a full model, differing in the strictness of the applied filter criteria, were tested for each database separately. The basic model contained elementary filter criteria to define a basic dataset (see Tables A1 and A2, and Figure 3). Then, based on the most sensitive remaining endpoint per substance, RTLe were calculated for the substances of the calibration dataset (see Section 1). Applying the basic models would result in a large underestimation of the RTL (i.e., an overestimation of risks), as for 90% and 66% of the substances for the ECOTOX and OPP database, respectively, the RTLe would be lower than the RTL. At the same time, the proportions of correct estimates (2% and 21%) and overestimates (8% and 13%) were low, such that this query would not yield accurate RTL estimates. To improve model precision, additional filter criteria (Tables A1 and A2) were formulated based on quality standards [10][11][12][13][14][33], as introduced in Section 1, to identify and exclude data of low reliability (i.e., data likely to be rejected during regulatory risk assessments). A mid model containing essential criteria (mid model, n = 4, for the ECOTOX and OPP database, respectively; see Tables A1 and A2) was set up, and RTLe were calculated and compared to RTL to evaluate the precision of estimates. Applying the mid model improved model precision: 28% and 42% of the RTLe were estimated correctly, 54% and 34% of RTLe underestimated their RTL, and 17% and 24% of RTLe exceeded their RTL, for the ECOTOX and OPP database, respectively. For further refinement, additional criteria were formulated to build a full model, which contains a total of 17 and 8 consecutive filter criteria for the ECOTOX and the OPP database, respectively (full model, Tables A1 and A2).
Here, model performance improved again, as 43% and 47% of the RTLe were estimated correctly, only 25% and 24% of RTLe underestimated their RTL, and 32% and 29% of RTLe exceeded their RTL, for the ECOTOX and OPP database, respectively (see Section 2.2 Data Quality and Section 2.3 Case Study for further details on the precision of estimates). Overall, calibration aimed at increasing the proportion of correct estimates (RTLe/RTL = 1) while decreasing the proportion of substances for which the RTLe would underestimate the true RTL (RTLe/RTL < 1, see Figure 3). Consequently, the comparison of RTLe to MEC would be unlikely to result in a false-positive overestimation of the risk. As a consequence of the filter refinements, each additional filter step removed a small number of endpoints of lower reliability (see Figures A2-A4). In this process, all data were lost for some substances, such that no final RTLe could be estimated (calibration dataset: n = 94 pesticides; null model estimates: n = 93 and 91 pesticides; full model estimates: n = 91 and 87 pesticides for the ECOTOX and OPP database, respectively). This was a deliberate trade-off in which improving model precision was preferred over maximizing the number of RTLe.
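The calibration criterion described above amounts to classifying each ratio and comparing the resulting shares between model variants. The aim is to increase the share with RTLe/RTL = 1 and to reduce the share with RTLe/RTL < 1. A minimal Python sketch with hypothetical names and example ratios:

```python
from collections import Counter

def classify(ratio, tol=1e-9):
    """Label an RTLe/RTL ratio as an under-, correct or overestimate of the RTL."""
    if abs(ratio - 1.0) <= tol:
        return "correct"
    return "under" if ratio < 1.0 else "over"

def estimate_shares(ratios):
    """Proportion of substances per class for one model variant and database."""
    counts = Counter(classify(r) for r in ratios)
    return {label: counts[label] / len(ratios) for label in ("under", "correct", "over")}

# Hypothetical ratios for two model variants on the same calibration substances
basic = [0.2, 0.5, 0.1, 1.0]
full = [1.0, 1.0, 0.5, 2.0]
print(estimate_shares(basic))  # {'under': 0.75, 'correct': 0.25, 'over': 0.0}
print(estimate_shares(full))   # {'under': 0.25, 'correct': 0.5, 'over': 0.25}
```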

Model Validation and Join of ECOTOX and OPP Query Results
The model was validated on two levels: (i) The measure of precision (log10(RTLe/RTL)) was compared between calibration and validation data by hypothesis testing to test whether the precision of estimates differed statistically when the model was applied to an independent set of pesticides. In case of non-normality and heteroscedasticity, non-parametric testing was conducted (i.e., Wilcoxon rank sum test, see Section 2.2 for results). Then, (ii) we tested whether external factors (i.e., number of available endpoints prior to filter application, publication year of the most recent regulatory document, use type of the pesticide) influenced model precision, using either hypothesis tests or correlation tests. The number of available endpoints per substance had no significant influence on the precision of estimates (Spearman's rank correlation: ECOTOX database: S = 450,422, p = 0.389, rho = −0.074; OPP database: S = 308,287, p = 0.558, rho = 0.052). Further, testing for an association between the publication year of the regulatory document and the measure of model precision revealed that RTL from newer regulatory documents were more likely to be overestimated (significant for ECOTOX database estimates: S = 316,508, p = 0.004, rho = 0.245; not significant for OPP database estimates: S = 300,311, p = 0.391, rho = 0.077). In addition, endpoints underlying RTL from newer publications were less likely to be included in the respective database (see Figures A6 and A7). Thus, there appears to be a delay in the update of the databases. Finally, testing for differences between the estimates for different use types (i.e., herbicides, fungicides, and insecticides) revealed no significant differences (Kruskal-Wallis rank sum test: chi-squared = 3.64, df = 2, p = 0.16). This was confirmed by a post-hoc pairwise Wilcoxon rank sum test.
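The association between external factors and model precision was assessed with Spearman's rank correlation. The dependency-free Python sketch below computes Spearman's rho as the Pearson correlation of tie-averaged ranks. The example years and precision values are hypothetical. For reference, R's `cor.test(..., method = "spearman")` reports the statistic S = (n³ − n)(1 − ρ)/6 together with ρ.

```python
import math

def average_ranks(values):
    """1-based ranks; tied values receive the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of positions i..j, converted to 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def spearman_rho(x, y):
    """Spearman's rho as the Pearson correlation of the (tie-averaged) ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)

# Hypothetical: publication year of the regulatory document vs. log10(RTLe/RTL)
years = [2005, 2008, 2011, 2014, 2017]
precision = [-0.2, 0.0, 0.0, 0.3, 0.5]
print(round(spearman_rho(years, precision), 3))  # 0.975
```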
Thus, we concluded that model precision depends neither on the number of endpoints available for the respective pesticide nor on its use type. However, dependencies became apparent for the publication year and for the presence of the most sensitive, relevant endpoint used for RTL compilation in the respective database: RTL were more likely to be overestimated for newer publications and for endpoints not included in the databases (see Figures A6 and A7).
After the model was successfully validated, the final dataset of RTLe (Table S2) was compiled by pooling the calibration and validation data and merging the estimates from the two databases: the ECOTOX data were joined to the OPP data, whereby all RTLe from the OPP database were kept. OPP estimates were preferred over ECOTOX estimates because the classification into core, supplemental, and invalid studies allowed invalid studies to be excluded with a higher degree of confidence than in the ECOTOX database. Thus, the OPP dataset was complemented with ECOTOX-based RTLe for substances without an OPP-based RTLe.
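The merge rule keeps every OPP-based RTLe and uses ECOTOX-based RTLe only where no OPP estimate exists. It can be sketched as a dictionary merge keyed by CAS number. The function name and example keys are hypothetical:

```python
def merge_rtle(opp, ecotox):
    """Combine per-substance RTLe (dicts keyed by CAS number).

    All OPP estimates are kept; ECOTOX estimates only fill substances
    without an OPP-based RTLe. Returns {cas: (rtle, source)}.
    """
    merged = {cas: (value, "OPP") for cas, value in opp.items()}
    for cas, value in ecotox.items():
        merged.setdefault(cas, (value, "ECOTOX"))  # never overwrites an OPP value
    return merged

merged = merge_rtle({"cas_A": 1.2, "cas_B": 30.0}, {"cas_B": 25.0, "cas_C": 0.8})
print(merged)
```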

Bootstrapped Model Precision
Model precision was estimated by bootstrapping mean ratios (RTLe/RTL, n = 1000 repetitions), where a mean close to 1 indicates high model precision; the larger the departure from 1, the lower the precision. A mean ratio < 1 indicates that estimates tend to be too low, so that the model would yield overly sensitive thresholds, whereas a mean ratio > 1 indicates thresholds that tend to be too high, which lowers the likelihood of false-positive risk indications. Ratios were sampled from the final dataset (combination of OPP and ECOTOX estimates) with replacement, with samples of the same size as the original dataset and 1000 repetitions (Monte Carlo resampling), following methodologies outlined in [45,46]. Then, from the bootstrapped samples of ratios, the mean ratio, the median ratio, and the 95% confidence limits as the 0.025 and 0.975 quantiles (percentile method, e.g., [45]) were computed.
Further, we tested whether the endpoints leading to the RTL were included in the respective databases for all substances, i.e., whether correct estimates would be theoretically possible. By again bootstrapping mean ratios of RTLe to RTL, we tested whether model precision improved if only pesticides whose relevant endpoint actually occurred in the respective database were considered.
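The bootstrap of the mean ratio described above can be sketched with the Python standard library. This is an illustrative sketch, not the authors' R code. The function name, the seed, and the example records are assumptions, and the 95% limits are taken as the 2.5th and 97.5th percentiles of the bootstrapped means via `statistics.quantiles` (default exclusive method). The same function is applied to all ratios and to the subset of pesticides whose relevant endpoint is included in the database.

```python
import random
import statistics

def bootstrap_ratio(ratios, n_boot=1000, seed=42):
    """Bootstrap the RTLe/RTL ratio (percentile method).

    Each replicate resamples the ratios with replacement at the original
    sample size; returns the mean of the bootstrapped means, the median of
    the bootstrapped medians, and 95% limits of the bootstrapped means.
    """
    rng = random.Random(seed)
    n = len(ratios)
    means, medians = [], []
    for _ in range(n_boot):
        sample = rng.choices(ratios, k=n)  # resample with replacement
        means.append(statistics.fmean(sample))
        medians.append(statistics.median(sample))
    cuts = statistics.quantiles(means, n=40)  # cut points in 2.5 % steps
    return {
        "mean": statistics.fmean(means),
        "median": statistics.median(medians),
        "ci_low": cuts[0],    # 2.5th percentile
        "ci_high": cuts[-1],  # 97.5th percentile
    }

# Hypothetical records: (RTLe/RTL ratio, RTL endpoint present in the database?)
records = [(1.0, True), (1.0, True), (0.8, True), (1.0, True), (2.0, False), (1.6, False)]
all_ratios = [r for r, _ in records]
included = [r for r, in_db in records if in_db]
print(bootstrap_ratio(all_ratios))
print(bootstrap_ratio(included))
```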

Model Application
For the final model queries, the seventeenth and eighth filter criteria for the ECOTOX and OPP databases, respectively, were removed, as it was no longer necessary to restrict the results by publication year. These filter criteria were only used for calibration and validation of the model: endpoints introduced into the databases after the publication of the regulatory documents could not have been considered in the regulatory risk characterizations and would thus have biased the estimate of model precision.
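Applying the filter criteria as consecutive steps can be sketched as follows. The publication-year criterion is used only during calibration and validation. The record fields, the criteria, and the cutoff year are hypothetical stand-ins for the entries of Tables A1 and A2. In the study, the cutoff would correspond to the publication year of the respective regulatory document.

```python
def run_filters(records, filters):
    """Apply filter criteria consecutively; a record is kept only if it passes all of them."""
    for keep in filters:
        records = [r for r in records if keep(r)]
    return records

# Hypothetical criteria standing in for entries of Tables A1 and A2
def positive_concentration(record):
    return record["conc_ug_l"] > 0

def freshwater(record):
    return record["medium"] == "freshwater"

def published_by(cutoff_year):
    """Calibration-only criterion: keep endpoints published no later than
    the regulatory document (cutoff_year)."""
    return lambda record: record["year"] <= cutoff_year

records = [
    {"conc_ug_l": 3.0, "medium": "freshwater", "year": 2012},
    {"conc_ug_l": 0.5, "medium": "freshwater", "year": 2019},  # added after the document
    {"conc_ug_l": 1.0, "medium": "saltwater", "year": 2010},
]
calibration_filters = [positive_concentration, freshwater, published_by(2015)]
application_filters = calibration_filters[:-1]  # final query: last criterion removed

calibration_hits = run_filters(records, calibration_filters)
application_hits = run_filters(records, application_filters)
print(min(r["conc_ug_l"] for r in calibration_hits))  # 3.0
print(min(r["conc_ug_l"] for r in application_hits))  # 0.5
```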
The final list of RTLe will be updated regularly and added to the MAGIC graph (see https://static.magic.eco/rtle1 and https://static.magic.eco/rtle2), as the ECOTOX database is updated four times a year. Whenever the ECOTOX database is updated, the OPP database is also checked for updates, such that the data on which the model, and thereby the RTLe, are based cover the latest studies added to the effect characterizations.
Here, the RTLe derivation procedure was applied to all substances listed in the ECOTOX and OPP databases, as both databases are relevant to the risk assessment process for pesticides. Consequently, besides the 676 RTLe for pesticides, the model returns additional RTLe for chemicals of other use types. These might be of special interest to complement scientific risk evaluations, as the model enables the direct comparison of pesticides with other use types based on uniformly derived thresholds.

Case Study
To evaluate the likelihood of over- or underestimating the risk when exceedance rates are calculated by comparing MEC to RTLe instead of RTL, we performed a case study based on MEC ≥ quantification limits as reported by Wolfram et al. [15] (n = 3001 MEC for 27 insecticides). First, risk distributions [16] obtained using RTL and RTLe, as well as the resulting exceedance frequencies, were compared descriptively. In addition, to test for significant differences in median exceedance rates, we used bootstrap methods to calculate medians with 95% confidence limits [46,47]. For this, we sampled 500 MEC with replacement from this dataset, with 1000 repetitions, to estimate median exceedance rates. For each resampled dataset, the proportion of MEC exceedances was calculated for RTL and RTLe, respectively. From the resulting bootstrapped distributions of exceedance rates, the median and confidence limits were calculated as the 0.5, 0.025, and 0.975 quantiles (i.e., percentile method, e.g., [45]). Additionally, the median difference (in percentage points) between the exceedance rates based on RTL and RTLe was bootstrapped with confidence limits.
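The resampling scheme for exceedance rates can be sketched in Python as follows: 500 MEC drawn with replacement, 1000 repetitions, and percentile limits. The data layout is an assumption for illustration: one `(mec, rtl, rtle)` tuple per measurement, all in µg/L. The strict `>` comparison for an exceedance, the seed, and the example values are also assumptions.

```python
import random
import statistics

def bootstrap_exceedance(data, sample_size=500, n_boot=1000, seed=1):
    """Bootstrap exceedance rates (%) of MEC for RTL and RTLe.

    data: list of (mec, rtl, rtle) tuples, all in the same unit.
    Returns (median, 2.5th percentile, 97.5th percentile) for the RTL-based
    rate, the RTLe-based rate, and their paired difference in percentage points.
    """
    rng = random.Random(seed)
    rates_rtl, rates_rtle, diffs = [], [], []
    for _ in range(n_boot):
        sample = rng.choices(data, k=sample_size)  # resample MEC with replacement
        r_rtl = 100 * sum(mec > rtl for mec, rtl, _ in sample) / sample_size
        r_rtle = 100 * sum(mec > rtle for mec, _, rtle in sample) / sample_size
        rates_rtl.append(r_rtl)
        rates_rtle.append(r_rtle)
        diffs.append(r_rtl - r_rtle)

    def summarize(values):
        cuts = statistics.quantiles(values, n=40)  # cut points in 2.5 % steps
        return statistics.median(values), cuts[0], cuts[-1]

    return {"RTL": summarize(rates_rtl), "RTLe": summarize(rates_rtle),
            "difference": summarize(diffs)}

# Hypothetical measurements (ug/L); RTLe is set above RTL for illustration
data = [(2.0, 1.0, 1.5), (0.5, 1.0, 1.5), (1.2, 1.0, 1.5), (3.0, 1.0, 1.5)]
result = bootstrap_exceedance(data)
print(result["RTL"], result["RTLe"], result["difference"])
```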
To ensure that the RTLe used in this case study (n = 27) are representative of the entirety of RTLe from the pooled data (i.e., calibration and validation data, n = 137), we used a Wilcoxon rank sum test to test for significant differences between this subset and the pooled dataset.
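The Wilcoxon rank sum statistic W reported in this study follows R's convention, W = R₁ − n₁(n₁ + 1)/2, where R₁ is the rank sum of the first sample in the pooled data. A minimal Python sketch using the normal approximation with tie and continuity correction is shown below. R's `wilcox.test` switches to an exact p-value for small samples without ties, so p-values from this sketch may differ in that case. The function name and example data are hypothetical.

```python
import math
from collections import Counter

def rank_sum_test(x, y):
    """Wilcoxon rank sum (Mann-Whitney) test, normal approximation with tie
    and continuity correction. Returns (W, two-sided p)."""
    n1, n2 = len(x), len(y)
    pooled = sorted(list(x) + list(y))
    # Mean 1-based rank of each distinct value (ties share their average rank)
    first = {}
    for pos, v in enumerate(pooled, start=1):
        first.setdefault(v, pos)
    counts = Counter(pooled)
    rank_of = {v: first[v] + (counts[v] - 1) / 2 for v in counts}
    w = sum(rank_of[v] for v in x) - n1 * (n1 + 1) / 2
    n = n1 + n2
    tie_term = sum(t ** 3 - t for t in counts.values()) / (n * (n - 1))
    sigma = math.sqrt(n1 * n2 / 12 * ((n + 1) - tie_term))
    diff = w - n1 * n2 / 2
    # Continuity correction of 0.5 towards zero (none if diff is exactly 0)
    z = (diff - math.copysign(0.5, diff)) / sigma if diff != 0 else 0.0
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided p from the standard normal
    return w, min(1.0, p)

print(rank_sum_test([1, 2, 3], [4, 5, 6]))  # W = 0.0, p approx. 0.081
print(rank_sum_test([1, 2, 3], [1, 2, 3]))  # (4.5, 1.0)
```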

User Notes
RTLe can be used to evaluate pesticide MEC and to calculate exceedance rates (for methods see [15,16]). They are intended for large-scale analyses across many substances, where, despite deviations for individual substances, reasonable statistical certainty can be achieved (see Section 2.2 Data Quality). For studies covering only a few substances, consulting the respective regulatory documents should be considered.
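A basic exceedance evaluation with the RTLe list can be sketched as follows. The sketch assumes that MEC and RTLe are in the same unit (µg/L) and keyed by CAS number. Substances without an RTLe are reported rather than silently dropped. The function name and example values are hypothetical.

```python
def exceedance_summary(measurements, rtle_by_cas):
    """Share (%) of MEC exceeding the substance's RTLe.

    measurements: iterable of (cas, mec) pairs; rtle_by_cas: {cas: RTLe},
    both in ug/L. Substances without an RTLe are skipped and reported.
    """
    evaluated = exceeded = 0
    missing = set()
    for cas, mec in measurements:
        rtle = rtle_by_cas.get(cas)
        if rtle is None:
            missing.add(cas)
            continue
        evaluated += 1
        if mec / rtle > 1:  # risk quotient MEC/RTLe above 1 = exceedance
            exceeded += 1
    rate = 100 * exceeded / evaluated if evaluated else float("nan")
    return rate, sorted(missing)

rtle = {"cas_A": 0.1, "cas_B": 5.0}  # hypothetical RTLe in ug/L
mec = [("cas_A", 0.3), ("cas_A", 0.05), ("cas_B", 1.0), ("cas_C", 2.0)]
rate, missing = exceedance_summary(mec, rtle)
print(round(rate, 1), missing)  # 33.3 ['cas_C']
```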

Conflicts of Interest:
The authors declare no conflict of interest.
Appendix A

Table A1. ECOTOX database filter criteria.

Filter criterion | Description | Model
0 | Basic filter criteria that select studies with a concentration larger than 0, convertible to µg/L, and conducted in freshwater medium. Further, a concentration must be specified and must not include "NR", "ca", ">", "~", or "x". | Basic
1 | The test type must be specified as "acute" or "not coded". | Mid