Balance between the Reliability of Classification and Sampling Effort: A Multi-Approach for the Water Framework Directive (WFD) Ecological Status Applied to the Venice Lagoon (Italy)

Abstract: The Water Framework Directive (WFD) requires Member States to assess the ecological status of water bodies and provide an estimation of the classification confidence and precision. This study tackles the issue of the uncertainty in the classification due to the spatial variability within each water body, proposing an analysis of the reliability of classification that uses the results of macrophyte WFD monitoring in the Venice Lagoon as a case study. The level of classification confidence, assessed for each water body, was also used as a reference to optimize the sampling effort for subsequent monitoring campaigns. The ecological status of macrophytes was calculated by the Macrophyte Quality Index at 114 stations located in 11 water bodies. At water body scale, the level of classification confidence ranges from 54% to 100%. After application of the multi-approach (inferential statistics, spatial analyses, and expert judgment), the optimization of the sampling effort resulted in a reduction of the number of stations from 114 to 84. The decrease of sampling effort was validated by assessing the reliability of classification after the optimization process (54–99%) and by spatial interpolation of data (Kernel standard error of 22.75%). The multi-approach proposed in this study could be easily applied to any other water body and biological quality element.


Introduction
The Water Framework Directive (WFD) [1] is a framework by the European Commission that requires Member States to monitor each relevant biological quality element (BQE) in order to assess the ecological status of each water body (WB), which represents the classification and management unit of the WFD [2]. Macroalgae, phanerogams, macroinvertebrates, and fish fauna are the BQEs to be evaluated for the assessment of the ecological status of European transitional WBs under the WFD. In addition to the BQEs, the physicochemical and hydromorphological supporting elements contribute to the ecological classification, and may or may not confirm the classification provided by the BQEs. Biological monitoring results have to be expressed as ecological quality ratios (EQRs), by comparing sampled data with equivalent data from undisturbed or minimally disturbed reference sites. Depending on [...] spread within the 11 natural WBs of the Venice Lagoon, were sampled to assess the ecological status by macrophyte assemblages (Figure 1). The dataset analyzed during the current study is available online at the web portal of the Environmental Prevention and Protection Agency of Veneto Region (ARPAV) [17]. For the classification of the BQE macrophytes, the MaQI [9] was used.
Water 2019, 11, x FOR PEER REVIEW 3 of 14

Figure 1. The Venice Lagoon map with its morphological and hydrological characteristics, and the WFD (Water Framework Directive) MaQI (Macrophyte Quality Index [8]) classification in 2011. Names of water bodies (WBs) are indicated in uppercase bold characters; place names are in lowercase. Final assessment of each WB is labeled like for stations: blue for high, green for good, yellow for moderate, orange for poor, red for bad status classes.

Reliability of Classification at Water Body Scale
The classification of the macrophyte assemblage at WB scale was obtained by averaging the MaQI EQRs of all stations within each WB (j).
First, the confidence interval (L) was calculated for each WB (j) by the following formula:

$$L_j = t_{N_j-1,\,\alpha/2}\,\frac{S_j}{\sqrt{N_j}} \qquad (1)$$

where $t_{N_j-1,\,\alpha/2}$ is the critical value of Student's t-distribution for the confidence level $1-\alpha$ (two-tailed distribution); $S_j$ is the standard deviation of the EQRs within each WB; and $N_j$ is the number of stations within each WB. In this study, the confidence intervals were stated at the 95% confidence level ($\alpha$ = 0.05).
The L value provides an overall view of the confidence of the mean EQRs, but it does not take into account the closeness of the face value to the class boundaries, since it is independent of the WFD classification system. Therefore, the reliability of the classification was assessed in terms of the probability that the observed ecological status classification (sample mean) lies within the right class. The cumulative probability that the actual (population) mean value of MaQI fell in each status class was assessed by Student's t-distribution. Data normality was tested by the Kolmogorov-Smirnov test, whilst the absence of autocorrelation was verified by variogram analyses and Moran's test of autocorrelation. Classes were identified by the WFD boundaries reported by the Italian Ministry Decree 260/2010 (high/good = 0.8, good/moderate = 0.6, moderate/poor = 0.4, poor/bad = 0.2). The confidence related to the critical good/moderate boundary was also calculated, by summing the probabilities of the classes lower than good and of the classes equal to or higher than good.
Statistics were carried out using the R software [18].
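As a rough illustration of the two calculations described in this section, the following Python sketch computes the confidence interval L for one WB and the probability that the population mean EQR falls within each WFD class. It is not the authors' R code: the function names, the numerical integration of the t-density, the assumed form L = t · S / √N for Equation (1), and the example EQR values are all assumptions made here for illustration.

```python
import math
from statistics import mean, stdev

# WFD class boundaries for the MaQI EQR (Italian Ministry Decree 260/2010).
BOUNDARIES = [0.2, 0.4, 0.6, 0.8]
CLASSES = ["bad", "poor", "moderate", "good", "high"]


def t_pdf(x, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)


def t_cdf(x, df, steps=2000):
    """CDF via Simpson's rule on [0, |x|], using the symmetry of the density."""
    a = abs(x)
    h = a / steps
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3
    return 0.5 + area if x >= 0 else 0.5 - area


def t_critical(df, alpha=0.05):
    """Two-tailed critical value, found by bisection on the CDF."""
    lo, hi = 0.0, 100.0
    for _ in range(50):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < 1 - alpha / 2:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def confidence_interval(eqrs, alpha=0.05):
    """Assumed Equation (1): L = t * S / sqrt(N) for the EQRs of one WB."""
    n = len(eqrs)
    return t_critical(n - 1, alpha) * stdev(eqrs) / math.sqrt(n)


def class_probabilities(eqrs):
    """Probability that the population mean EQR lies in each WFD class."""
    n = len(eqrs)
    m = mean(eqrs)
    se = stdev(eqrs) / math.sqrt(n)  # assumes the EQRs are not all identical
    cumulative = [t_cdf((b - m) / se, n - 1) for b in BOUNDARIES]
    edges = [0.0] + cumulative + [1.0]
    return {c: edges[i + 1] - edges[i] for i, c in enumerate(CLASSES)}


# Hypothetical station EQRs for one water body (not data from the study).
eqrs = [0.55, 0.65, 0.70, 0.60, 0.75, 0.65, 0.70]
probs = class_probabilities(eqrs)
l_j = confidence_interval(eqrs)
p_good_or_better = probs["good"] + probs["high"]
print(f"L = {l_j:.3f}, P(good or better) = {p_good_or_better:.3f}")
```

Summing the class probabilities on either side of 0.6, as in the last lines, gives the confidence related to the good/moderate boundary.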

Statistical Approach
The relationship between the sampling effort (N of stations) and the classification confidence (confidence interval "L" and level "α") was investigated by the following equation applied to each WB of the Venice Lagoon:

$$N_j = \left(\frac{t_{N_j-1,\,\alpha/2}\,S_j}{L}\right)^2 \qquad (2)$$

Three scenarios with different L values were investigated. The first two L values were selected considering the relationship between the width of the confidence interval, the distance between the class boundaries, and the maximum error of classification. Italian legislation (Italian Ministry Decree 260/2010) provides equidistant boundaries for the MaQI EQR (bad/poor = 0.2; poor/moderate = 0.4; moderate/good = 0.6; good/high = 0.8), therefore the width of the classes is 0.2. As a consequence, L = 0.1 ($L_{0.1}$) entails a maximum error of one ecological class in only one direction (1st scenario), whilst L = 0.2 ($L_{0.2}$) entails a maximum error of one ecological class in both directions (2nd scenario).
Finally, the mean confidence interval ($L_{mean}$) was calculated as the average L of all WBs and applied to each WB (3rd scenario), in order to obtain a more consistent (homogeneous) classification confidence between WBs within the Venice Lagoon.
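The effect of a target L on the sampling effort can be sketched numerically. The Python snippet below is illustrative only and is not the authors' code: it assumes that Equation (1) has the form L = t · S / √N and Equation (2) the form N = (t · S / L)², with t and S taken from the existing network, so that the required number of stations scales as N_old · (L_old / L)². The function name and the rounding to the nearest integer are assumptions; the default of three stations follows the note to Table 3.

```python
def stations_needed(n_old, l_old, l_target, default_n=3):
    """Stations needed to reach the confidence interval l_target.

    Assuming Equation (1) is L = t * S / sqrt(N), t * S = l_old * sqrt(n_old);
    substituting into N = (t * S / l_target) ** 2 gives
    n_old * (l_old / l_target) ** 2.
    """
    n_new = round(n_old * (l_old / l_target) ** 2)
    # Table 3 note: a default of 3 is given when Equation (2) yields N < 2.
    return default_n if n_new < 2 else n_new


# ENC2 figures quoted in the Discussion: 7 stations with L = 0.27.
for target in (0.1, 0.2):
    print(target, stations_needed(7, 0.27, target))
```

With the ENC2 values, the first scenario (L = 0.1) returns 51 stations, which matches the figure quoted in the Discussion.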

Expert Judgment Criteria
Starting from the three above-mentioned statistical scenarios, a two-step expert judgment analysis was applied to define the final optimization of monitoring effort (number of stations), based on level of confidence and hydromorphological features.
First of all, a suitable L value ($L_{opt}$) was defined in order to ensure higher homogeneity between WBs with an acceptable minimum level of reliability. Accordingly, to calculate N by Equation (2) for each WB, the following rules were adopted: (a) if $L_{mean}$ < $L_j$ < 0.2 (i.e., maximum error of one ecological class in both directions), then $N_{new}$ = $N_{old}$; (b) if $L_j$ > 0.2, then Equation (2) is calculated with $L_{opt}$ = 0.2; (c) if $L_j$ < $L_{mean}$, then Equation (2) is calculated with $L_{opt}$ = $L_{mean}$.
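Rules (a) to (c) amount to a small decision function. The Python sketch below is a hypothetical illustration of that logic: the function name is invented, and the treatment of values exactly equal to a threshold (handled here as rule (a)) is an assumption, since the text does not specify it. The L values in the example are those reported in the Results and Discussion.

```python
def choose_l_opt(l_j, l_mean, l_max=0.2):
    """Return the L value to use in Equation (2), or None to keep N unchanged."""
    if l_j > l_max:
        return l_max   # rule (b): cap the interval at one class in both directions
    if l_j < l_mean:
        return l_mean  # rule (c): relax towards the lagoon-wide mean
    return None        # rule (a): N_new = N_old


L_MEAN = 0.109  # lagoon-wide mean L reported in the Results
for wb, l_j in {"EC": 0.143, "ENC2": 0.27, "PC2": 0.066}.items():
    print(wb, choose_l_opt(l_j, L_MEAN))
```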
The risk of misclassification in relation to the critical G/M boundary was also considered. Finally, the overall size and the hydrological and morphological heterogeneity of each WB were also taken into account, to avoid an oversized number of stations in small WBs or, vice versa, an excessive reduction in large WBs. Table 1 summarizes the elements considered for each WB of the Venice Lagoon.

First, the reliability of classification related to the critical good/moderate boundary was recalculated for each WB using the dataset obtained from the optimization process. The analyses were carried out as reported in Section 2.2 by calculating the cumulative probability of Student's t-distribution.

Spatial Analyses
To estimate the error in the sampling effort reduction, spatial interpolations of the two (original and reduced) monitoring networks of the whole Venice Lagoon were performed by Kernel interpolation with barrier, available in the ArcMap toolbox of ArcGIS Desktop 10.1 (http://esri.com). Barriers were represented by islands and salt-marshes (Figure 1). Exponential equations were used as the Kernel function during the regression analysis, while the other parameters were optimized by default in ArcMap. Mean prediction errors and root-mean-square errors (cross-validation) were calculated to test the interpolation maps obtained. The relative error (RE) reduction was computed by a modified version of the formula reported by [19] (Equation (3)), in which $KBSE_{original}$ and $KBSE_{reduced}$ are the Kernel standard errors of the MaQI network of 2011 (original) and of the new one (reduced), respectively. Finally, for each WB, the ecological classes resulting from the original network and the reduced one were also calculated by averaging the interpolated EQR values of all Kernel interpolation grid cells. The significance of the differences between the two networks was tested using Student's t-test.
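The two cross-validation statistics mentioned above are simple to state explicitly. The following Python sketch shows how a mean prediction error and a root-mean-square error could be computed from measured and predicted EQRs; it is not ArcMap's implementation. The function name and the sample values are hypothetical, and the sign convention (predicted minus measured, so that a positive mean indicates overestimation) is an assumption.

```python
import math


def cross_validation_errors(observed, predicted):
    """Mean prediction error and RMSE, with error = predicted - observed."""
    errors = [p - o for o, p in zip(observed, predicted)]
    mean_error = sum(errors) / len(errors)
    rmse = math.sqrt(sum(e * e for e in errors) / len(errors))
    return mean_error, rmse


# Hypothetical EQRs at three stations (not data from the study).
observed = [0.50, 0.60, 0.70]
predicted = [0.60, 0.60, 0.80]
mean_error, rmse = cross_validation_errors(observed, predicted)
print(f"mean error = {mean_error:.3f}, RMSE = {rmse:.3f}")
```

A mean error close to zero suggests an unbiased interpolation, while the RMSE summarizes the typical size of the prediction errors.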

Reliability Assessment
Table 2 shows, for each WB: the probability that the actual mean value of the MaQI EQRs fell within each of the five WFD classes, assessed by Student's t-distribution; the MaQI EQRs and status classifications; the confidence interval ($L_j$), computed by Equation (1); and the cumulative probability related to the critical G/M boundary.
The Venice Lagoon mean L value ($L_{mean}$) used for the following analyses was 0.109. The WB ENC3 was left out of the analyses because of its very small surface, its urban features, and its low number of stations (N = 3), which limited the meaningfulness of the statistical analyses.

Optimization of Sampling Effort
The number of stations (N) resulting from the application of the statistical approach (Equation (2)) under the three scenarios (i.e., L = 0.1, L = 0.2, and L = $L_{mean}$) changed from 111 (2011 sampling) to 169 (L = 0.1), 54 (L = 0.2), and 147 (L = $L_{mean}$), respectively (Table 3). After revision by the two steps of expert judgment, following the criteria described in Section 2.3.2, 81 stations were proposed for future monitoring. These totals exclude the three stations of ENC3.

Table 3. Results of the application of the statistical approach and expert judgment criteria to optimize the sampling effort in the MaQI (Macrophyte Quality Index) monitoring program of the Venice Lagoon.

Columns: WB; N original; Statistical Approach; Expert Judgment Criteria.
Notes: see Table 1; '3*' = a default N of 3 was given when the application of Equation (2) resulted in N < 2, since the standard deviation tends to zero.
Briefly, no modifications of the number of stations were made to EC and ENC4, as their $L_j$ values (0.143 and 0.179, respectively) were higher than $L_{mean}$ but lower than 0.2 (rule (a)).
Rule (c) was strictly adopted at PC2, PC3, and PC4, since their L values (0.066, 0.057, and 0.049, respectively) were lower than $L_{mean}$. Accordingly, the four stations to be maintained at PC2 were selected on the basis of the presence of the salt-marshes and small canals that characterize the WB; the three stations deleted at PC3 were chosen from those in the southernmost part of the WB, where several restoration activities, such as the construction of artificial salt-marshes, are underway to replace the shallow waters previously present; and finally, the three stations maintained at PC4 were those located mostly in the middle of the WB.
The other WBs required expert judgment, as follows. A small or no reduction of monitoring effort was applied to ENC1, ENC2, PC1, PNC1, and PNC2 because of their dimensions (Table 1), habitat heterogeneity (Table 1 and Figure 2), and reliability of classification (Table 3). Accordingly, five stations with spatially redundant information were deleted at ENC1, the largest WB of the Venice Lagoon, which has several canals, two Lagoon inlets (Malamocco and Chioggia), and a 97.7% probability of being good or better. No modifications were made at ENC2, because of its very small dimension and high reliability of classification (84.3% probability of being less than good). Regarding PC1, three stations were not considered sufficient to represent the WB correctly, since it consists of large expanses of salt-marshes, the Dese river mouth, and several small canals that divide it mainly into four areas. No modifications were made for PNC1, which is located between the industrial area of Porto Marghera and the city of Venice; it is quite homogeneous ($L_j$ = 0.045), but it is separated into five sections by six canals. The morphological characteristics of PNC2 are quite similar to those of PNC1, with the airport of Venice on the north-west side and the Murano and Sant'Erasmo islands on the south-east side. Considering its high reliability of status classification (100% less than good), one station, located in the middle of the WB, was deleted. Finally, a special approach was adopted for ENC3, which was excluded from the optimization process because no more than three stations could be sampled there, owing to its very small dimension, the presence of Chioggia island in the middle of the WB, and the proximity to the Lagoon inlet of Chioggia.
After application of the optimization process, the final number of stations established for the subsequent monitoring programs of the 11 natural WBs (including 3 stations of ENC3) of the Venice Lagoon was 84 (Figure 2).

Validation Process
The classification reliabilities obtained with the reduced network and their comparison with the MaQI network of 2011 (original network) are shown for each WB in Table 4. The probability that the quality status calculated with the reduced network is less than good and its difference from the 2011 MaQI network are also reported.

Figure 3 shows the two maps produced by Kernel interpolation with barrier for the MaQI network of 2011 and the new one, respectively. Prediction errors are reported in Table 5. Equation (3) gave an RE of 22.75%.
Comparisons (Student's t-test) between the status classifications of each WB before and after the reduction of sampling effort are reported in Figure 4. To test the differences, values were extracted from each Kernel interpolation map.

Discussion
In this study, an analysis of the reliability of the ecological status classification obtained by monitoring macrophyte assemblages according to the WFD in the Venice Lagoon was proposed and discussed in relation to a review of the monitoring effort.
Previous theoretical studies proposed mixed models to estimate uncertainties in monitoring data, considering numerous sources of variation that could affect an indicator, from the uncertainty related to sampling and analysis to spatial and temporal variations and their interactions. However, when real data were considered, sources of uncertainty that were small were disregarded in the analysis [20–26]. In the current study, temporal variations were not considered, as the application of the MaQI requires merging the spring and autumn data collected in the year of monitoring, which avoids any contribution of intra-annual variations to uncertainty [9]. Uncertainties associated with sampling and analysis methodology were excluded, as the same laboratory staff carried out all activities of the whole campaign, strictly following national protocols. Accordingly, only spatial variations were considered, also taking into account the practical need to reduce monitoring effort. However, when uncertainty is estimated from a larger dataset that pools observations from multiple ecosystems with similar characteristics, with several spatial, temporal, and analytical method variations, it might be desirable to quantify every component at different scales [3,22]. In addition, as soon as data from subsequent monitoring cycles are available, inter-annual variability may also be assessed [7,24–26].
In the current study, the reliability of classification was assessed in terms of the probability that the observed ecological status classification (mean value at WB scale) lies inside the right class, i.e., inside the class assigned by the index. Spatial variations were estimated by the confidence interval: higher values were found at ENC2, a small WB characterized by strong hydromorphological and pressure gradients from the Lido inlet to Venice island. Lower values were mostly observed at polyhaline WBs (annual mean of salinity < 30), such as PC1, PC2, and PC3, which are mainly characterized by lower internal ecological variability. By Student's t-distribution, the probability that the actual mean value of the MaQI EQRs of each WB fell within the assigned WFD class ranged between 53.9% and 99.6%. However, it should be considered that the confidence interval itself is not sufficient to determine the risk of misclassification. The uncertainty is in fact determined both by the width of the confidence interval and by the proximity of the mean to the class boundary, in particular to the critical good/moderate threshold. Nevertheless, considering the critical good/moderate boundary, which is important for government decisions about measures, the results highlighted a satisfactory reliability of the WFD MaQI classification of 2011 (83–100%).
From the results of the estimation of confidence, it was possible to investigate where a reduction of the monitoring network effort could be allowed without excessively increasing the risk of misclassification. Recent studies proposed that monitoring programs assessing the status of WBs identify the optimal allocation of samples in time and space by quantifying the different uncertainty components affecting monitoring data [22,23]. As reported above, in this study, the main factors affecting the reliability of MaQI results were spatial variations; therefore, changes focused only on the number and location of sampling stations.
Statistical principles offer relatively simple and suitable tools to address the optimization of sampling effort, providing a quantitative and objective assessment of the impact of the sampling strategy on the risk of misclassification. On the other hand, the results of statistical analysis require careful examination before their application, and the operative choice is likely to benefit from a final revision by expert judgment. Indeed, this study highlighted that a purely statistical approach, based on the amplitude of the confidence interval, could lead to inapplicable or meaningless results. For instance, according to the first scenario (L = 0.1), to reduce the L value of the WB ENC2 from 0.27 to 0.1, the number of stations should increase from 7 to 51. Considering that the sampleable area of ENC2 is about 10 km², this means five stations per km², an effort unachievable in the framework of institutional monitoring and far from the concept of "optimization". Under scenario L = 0.1, more than one station per km² would also be required within WB ENC4, and similar inconsistencies, in case of strict application of the statistical approach, are observed in the scenario L = $L_{mean}$. Conversely, the second scenario (L = 0.2) was too permissive for some large WBs, especially those with lower standard deviations. For instance, under this scenario, the number of stations within the WB ENC1 decreased from 26 to 8 (1 station per 13.5 km²). All these evaluations are obviously linked to real data. However, there could be contexts where statistical results are confirmed even after expert judgment.
Accordingly, a revision process by expert judgment is essential. Expert judgment is intrinsically difficult to standardize, and it introduces subjectivity into the evaluation. Therefore, it is crucial to guarantee maximum transparency on the criteria followed and, later, to provide an objective estimation (validation) of the impact of the choices. In this study, the expert judgment followed the criteria described in Sections 2.3.2 and 3.2, and it aimed (i) to homogenize the reliability of classification between WBs, (ii) to ensure a minimum efficient reliability for all WBs, (iii) to ensure a high reliability of the status classification regarding the critical good/moderate boundary, and (iv) to consider other particular elements of each WB, such as dimensions and hydrological and morphological characteristics.
To validate the process, spatial interpolation of data was performed. Geostatistical techniques are based on the hypothesis that nearer observations are more similar to one another than to distant observations, and therefore allow not only the insertion of new points where knowledge is more approximate, but also the elimination of others where they are redundant [27]. Furthermore, mathematical variogram models can be used to assess reductions of sampling effort through interpolation. Previous attempts to validate optimal locations of monitoring networks adopted specific variogram models, such as ordinary Kriging or Bayesian maximum entropy, depending on the characteristics of their study areas [19,28]. The Venice Lagoon is a complex transitional area characterized by a composite mosaic of canals, salt-marshes, mud and sand intertidal flats, shoals, man-made structures, and islands [29,30]. Accordingly, spatial interpolations of the MaQI EQRs were performed by applying Kernel interpolation with barrier. This model was chosen over others in order to obtain interpolations that reflect the morphology of the Venice Lagoon, and therefore the presence of breaklines. Cross-validation results of both interpolations supported the choice of the model, as mean prediction errors tended to zero and the root-mean-square errors were quite small. Moreover, both mean prediction errors were positive, which indicated that both models slightly overestimated the data [31], with relatively better results for the MaQI network of 2011 than for the reduced one. To quantify the relative error of the reduced MaQI network, the standard errors of the two interpolations were related by the formula of [19], modified to use Kernel standard errors instead of Kriging standard errors.
In summary, the results of the monitoring effort review of the WFD MaQI network of the Venice Lagoon showed a relative error of 22.75%, which can be considered acceptable taking into account the total reduction of stations of 26.3%. To evaluate the optimization process, Adhikary et al. [19], taking the zero trend as a criterion, reported values below 30% as acceptable. In order to obtain a more objective assessment, we performed a further check to validate the optimization process. As the aim of this study was also to avoid the risk of misclassification, we extracted the EQR values of each WB from the Kernel interpolation maps of both the original and the reduced networks. Then, the results for each WB were tested by Student's t-test to verify that the reduction had not significantly modified the results of the original network. The comparisons showed no significant differences, except for PC1, PC3, and PNC1. Moreover, the classification results were not affected by the optimization process in any WB.

Conclusions
From the statistical analyses, it appears that the first results of the WFD status classes of the Venice Lagoon obtained from macrophyte assemblages showed a satisfactory reliability and provided information on the estimation of confidence, which can be useful to management organizations in optimizing resources for the adoption of remedial measures. The results also allowed a reduction of the monitoring sampling effort to be proposed, which definitely affects river basin management costs. For this purpose, a suitable multi-approach method based on inferential statistics, spatial analyses, and expert judgment was applied in this study, in order to review the sampling effort with the aim of ensuring and maintaining a high reliability of the status classification. Across the approaches described in this study, an increasing role of expert judgment could be observed: from a clear and strictly statistical approach based on L values defined a priori by the ecological class amplitude, to a mixed method where $L_{opt}$ was defined a posteriori for each WB and the resulting stations were further amended taking into account the site-specific characteristics. However, this study showed that a strictly statistical approach, although objective and standardized, could be unachievable with real data, since some results could lead to a waste of resources or, vice versa, to an excessive reduction of information. For these reasons, to optimize the monitoring effort, inferential statistics were applied as a guideline, complemented by criteria arising from expert knowledge of the area and the problem. Moreover, a robust validation process was also proposed and adopted, which verifies the spatial reductions and the reliability of the information.
Apart from the complex nature of the Venice Lagoon, the multi-approach proposed in this study could also be applied to any other water body to assess the WFD classification reliability and to optimize the monitoring effort in any other area.