Development of a Metric of Aquatic Invertebrates for Volunteers (MAIV): A Simple and Friendly Biotic Metric to Assess Ecological Quality of Streams

Abstract: Citizen science activities, involving local people in volunteer-supported and sustainable monitoring programs, are common. In this context, the objective of the present work was to develop a simple Metric of Aquatic Invertebrates for Volunteers (MAIV), including a user-friendly tool that can be easily accessed by volunteers, and to evaluate the efficiency of a volunteer monitoring program following an audit procedure. To obtain MAIV values, macroinvertebrate communities were reduced to 18 surrogate taxa, which represented an acceptable compromise between simplicity, efficiency, and reproducibility of the data, compared to the regular Water Framework Directive monitoring. When compared to results obtained with the National Classification System of Portugal, MAIV accurately detected moderate, poor, and bad ecological status. Thus, MAIV can be used by volunteers as a complement to the official monitoring program, as well as a prospective early warning tool for local problems related to ecological quality. Volunteers were students supervised by their teachers. Results obtained by volunteers were compared to results obtained by experts on macroinvertebrate identification to measure the efficiency of the procedure, by counting gains and losses on sorting and identification. Characteristics of groups of volunteers (age and school level) did not significantly influence the efficiency of the procedure, and generally results of volunteers and experts matched.


Introduction
The European Water Framework Directive (WFD) instructed all member states to achieve good ecological status in all water bodies [1]. The process required the implementation of monitoring programs to assess the ecological integrity of each water body in a continuous manner. Within this context, bioassessment became relevant in ecological monitoring, and substantial effort was put into developing multimetric indices that would best express human pressures on ecosystems [2].
Bioassessment programs need qualified expertise to make key decisions in the field and to identify organisms in the laboratory. One way to compensate for the scarcity of experts in ecological monitoring has been the development of citizen science (CS), that is, science done in conjunction with volunteers, under the direction of professionals and scientific institutions [3]. In addition to the benefits for science, CS produces important societal benefits through innovative thinking and by improving: (a) societal conditions through scientific outreach; (b) education for the promotion of conservation [4]; and (c) relationships among institutional actors, bringing together individuals from diverse backgrounds [5].
CS activities involving local people in volunteer-supported and sustainable monitoring programs are a common practice [6][7][8][9][10][11][12]. For example, in Canada, both federal and provincial governments have initiated aquatic biomonitoring networks such as the Ontario Benthic Biomonitoring Network (OBBN), accessible to both volunteers and scientists [3]. In general, CS has been used in hydrological observations and/or in the monitoring of surface water quality on all continents. This is the case in Europe [13], North America [14], Central America [15], South America [11], Oceania [16], Asia [17], and Africa [18], demonstrating that CS is a low-cost and crucial tool for raising awareness of the importance of good water quality [13], and is important in decision making [19]. When educators are involved in CS, the benefits for school environmental education are multiple [20,21]. The improvement of student learning skills and the professional development of science teachers through engagement in scientific inquiry and research are two such benefits. These ultimately improve literacy in Science, Technology, Engineering, and Mathematics (STEM). STEM literacy is defined as the ability to read and/or write science texts, tables, and graphics, and the derived skill to apply scientific knowledge [22].
The integration of results from CS volunteer programs into the National Classification System (NCS) must be done carefully, because conflicts might occur between environmental authorities and volunteers [23]. To prevent such conflicts, certain compromises are required [24], for example: (a) water authorities need valuable and precise results to support management actions, but volunteers prefer friendlier tools that are not necessarily precise; (b) water authorities want results that fulfil legislative requirements, whilst volunteers are most likely driven by self-interest (hobbies, leisure pursuits) [20,25]. Thus, a participative balance has to be reached in choosing the sites to be monitored [24], and an equilibrium must be set between friendly tools and realistic results, quick results and valuable data, and inexpensive and sufficiently precise results. The challenge is to come up with the friendliest tool with which volunteers can produce information useful for management purposes. For this, water authorities should know how volunteer results differ from expert results, because any metric of ecological quality is of little use without an understanding of the uncertainties in its estimation [26][27][28][29].
To develop a supported and sustainable volunteer monitoring program, it is necessary that volunteers have grasped the basic scientific concepts through adequate literature or through training courses [30]. To prevent errors, sampling and laboratory protocols need to be simple and standardized. Finally, an expert audit should be implemented to assess the quality of the results, detecting the main gaps and the improvements to be made, thus increasing the efficiency of the process.
Macroinvertebrates are probably the most popular biological quality element evaluated by volunteers in lotic ecosystems [29]. Macroinvertebrates are easy to collect with simple equipment and to identify with the naked eye in the field. They colonize all aquatic habitats and show a wide range of sensitivities to physical, chemical, organic, and morphological pressures. However, the general metrics and multimetric indices developed to assess ecological status based on macroinvertebrates require counting and identification of, at least, all sampled families, or even the identification of some species [2]. The amount of work and time required for these identifications could be tedious for volunteers, increasing the risk of low accuracy and dropout. Thus, to maintain the interest of the volunteers, it is necessary to simplify the metrics by reducing the taxonomic detail [29] and the counting process, thus obtaining quick results while maintaining accuracy.
The starting point of the present study was a set of tools developed by the European project on Conservation and Sustainable Development of Freshwater Ecosystems (CONFRESH, https://www.nhmc.uoc.gr/en/museum/programs/1814) to assess stream ecological quality based on macroinvertebrate communities. These tools were applied in a regional project conducted by the Regional Hydrographic Administration of Algarve of the Portuguese Environmental Agency (APA-ARHAlg). The objective was to evaluate the accuracy of the results obtained by students from 5th to 12th grades (volunteers), during extracurricular activities supervised by their teachers and following a participatory/contributory model of participation [31]. The following steps were used for this activity: (1) APA-ARHAlg selected study sites, (2) teachers were taught skills on stream ecology and monitoring during courses given by experts, (3) teachers and their students (volunteers) collected and examined the samples and determined on their own the ecological class of each sampling site, and (4) experts on macroinvertebrate taxonomy audited the results. During the audit procedure, experts reanalyzed samples and quantified the differences between the two analyses, expressing them as the accuracy of the results obtained by volunteers. It was hypothesized that the characteristics of the volunteers (age, number of students, and number of teachers) could influence the accuracy of the results and, conversely, that this accuracy would be independent of the ecological quality.
We intended to test whether simple tolerance metrics based on the reduction of the more than 120 macroinvertebrate families to just 18 surrogates (Class, Order, and Family), as proposed by the CONFRESH project, would significantly compromise the quality of the biological assessment when compared to the results of the National Classification System of Portugal. Thus, the aim of this paper is to propose a simple Metric of Aquatic Invertebrates for Volunteers (MAIV) through a reduction of the taxonomic detail, eliminating the counting process but maintaining the accuracy. We propose MAIV as an upgrade of the metric calculated by the CONFRESH project, since its score is the sum of the tolerance scores of all surrogates (summative metric) and not only the tolerance score of the most sensitive surrogate (nonsummative metric), thus increasing accuracy.

Materials and Methods
A synthetic overview of the methodological procedures adopted in this study is presented in Figure 1.

Study Area and Sampling Period
The study was developed in the Algarve region (southern Portugal), characterized by Mediterranean climate conditions, where the aquatic environment is shaped by sequential changes of annual flooding and drying, directly affecting the physical and biological characteristics of the streams [32]. Most of the streams are temporary, and urbanization, agriculture, and cattle raising are the most important human pressures. From 17 streams, selected by APA-ARHAlg in this region and covering the four river types present in the Algarve (southern rivers with medium to large dimensions, drainage area more than 100 km² (M-L); southern mountainous siliceous rivers (M-S); southern small siliceous rivers (S-S), drainage area less than 100 km²; and calcareous rivers of Algarve (C)), 36 samples were collected during spring 2010, after a period of intense precipitation (Figure 2).

Teacher Training Courses and Volunteer Skills
Skills to implement this volunteer monitoring program were developed in six similar teacher training courses on "Conservation and Sustainability of Freshwater Ecosystems", covering the entire Algarve region, between January and May 2010. These courses were supported by bibliography on aquatic ecology, field and laboratory procedures, and macroinvertebrate identification [33,34], developed by the CONFRESH project. Additional information regarding the WFD and basin management was provided [1]. The training courses were organized into four sequential parts: (1) theoretical lectures on aquatic ecology, basin management, and monitoring; (2) field activities preceded by laboratory practical lectures; (3) question-and-answer sessions to resolve doubts and to help with macroinvertebrate identification to family level; and (4) presentation of results. The institutions collaborating with the courses were the six Teacher Training Centers, the Regional Hydrographic Administration of Algarve of the Portuguese Environmental Agency (APA-ARHAlg), the Algarve Educational Authority, the University of Évora, and the University of Algarve.
In total, 87 teachers attended the courses and transferred the acquired skills to their 807 students, distributed among 23 secondary schools. This was done during extracurricular activities, forming 36 volunteer groups (students under teacher supervision). The students were from 36 classes, with sizes varying from 4 to 66 students per class, grades 5 and 7 to 12, ages from 11 to 16, and 1 to 3 teachers per class (Table S1). None of the students had previous experience with macroinvertebrate assessment.

Habitat Description and Composite Human Pressure Gradient Calculation
Site habitat description was done by filling in a form in the field. This form was produced by the APA-ARHAlg and covered four main categories of descriptors (land use, settlements on the river, water status, and river status), recorded as presence/absence on both banks of each river reach (Table S2).
Based on the site habitat descriptions, a Multiple Correspondence Analysis (MCA) was applied to a matrix of the 36 samples described by the presence (Yes) or absence (No) of 32 descriptors (Table S2), covering land use (11), settlements on the stream (7), state of the water (11), and stream characteristics (3). The composite human pressure gradient was established from the scores of the samples on the first dimension of the MCA.

Macroinvertebrate Sampling and Laboratory Procedures
In a 50-m-long reach at each site, the percent cover of the aquatic habitats was estimated (boulders, stones, sand, silt, macrophytes and algae, and coarse particulate organic matter (CPOM)), following the procedure adopted by the Portuguese monitoring program [35]. The composite sample consisted of six kick subsamples (each 1 m in length) collected with a hand net (0.5 mm mesh size, 25 cm front) (Figure 3a). These six kick subsamples were distributed along the reach covering different habitats, according to their percent cover. Each composite sample was sorted live in the field or immediately fixed after collection with formalin, kept in plastic flasks, and transported to the laboratory.
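As an illustration of distributing the six kick subsamples according to habitat percent cover, the following minimal sketch uses a largest-remainder rule. The rounding rule, the function name `allocate_kicks`, and the example cover values are assumptions made for illustration; the protocol itself only states that subsamples follow percent cover.

```python
def allocate_kicks(percent_cover, n_kicks=6):
    """Distribute n_kicks subsamples among habitats in proportion to cover.

    Assumption: fractional quotas are resolved with the largest-remainder
    rule; the sampling protocol does not prescribe a rounding method.
    """
    total = sum(percent_cover.values())
    if total <= 0:
        raise ValueError("percent cover must sum to a positive value")
    # Exact (fractional) share of kicks for each habitat.
    quotas = {h: n_kicks * c / total for h, c in percent_cover.items()}
    # Start from the whole-number part of each share.
    kicks = {h: int(q) for h, q in quotas.items()}
    leftover = n_kicks - sum(kicks.values())
    # Give the remaining kicks to the habitats with the largest fractions.
    by_remainder = sorted(quotas, key=lambda h: quotas[h] - kicks[h], reverse=True)
    for h in by_remainder[:leftover]:
        kicks[h] += 1
    return kicks


# Hypothetical reach; these cover values are illustrative, not study data.
cover = {"stones": 50, "sand": 30, "macrophytes": 15, "CPOM": 5}
print(allocate_kicks(cover))
```

With this rule, a habitat with very low cover may receive no subsample; a program that wants every habitat represented would need a different allocation.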
Macroinvertebrates were sorted and identified into 18 surrogates (Table S3), following the teaching materials developed by CONFRESH [33,34] (Figure 3b,c) and complemented by other more complex keys [36]. In order to simplify the identification process, a key was developed to identify the 18 surrogates (see Figure S1; Tables S4 and S5). The ecological quality class for each sample was the score of the most sensitive taxon present in the sample (Figure 3d; Table S3). All macroinvertebrate sampling and laboratory procedures were done by volunteers (students under constant supervision by teachers).

Audit Procedure
The objective of the audit procedure was to quantify the accuracy of the results obtained by the volunteers. To attain this objective, the volunteer results (primary analysis) were checked by experts (auditors) from the universities of Évora and Algarve. The accuracy of the volunteer results was measured by quantifying the differences between the primary and audited analyses, following an adaptation of the audit procedure implemented during the European STAR (Standardisation of River Classifications) project [37].
The primary analysis consisted of: (1) sorting the organisms in each sediment sample, (2) identification down to the established 18 surrogates (Table S3), and (3) establishment of the respective water quality class (Table S3). Once the primary analysis was completed, the remaining sediment from each sample was kept in plastic flasks and preserved in 70% alcohol. Likewise, the macroinvertebrates identified by the volunteers were kept in individual vials and preserved in 70% alcohol. Flasks and vials were labeled accordingly, affixed with the respective taxa list (primary taxa list) and ecological quality class (primary classification), and sent to the auditors.
Experts (auditors) checked the sorting and identification done during the primary analysis, termed the sorting audit and identification audit, respectively. In the sorting audit, the auditors re-sorted the sediment of each sample, detecting the remaining surrogates not collected during the primary analysis. Every surrogate (Table S3) found by the auditors and not detected in the primary analysis was considered a gain of the sorting audit. The sorting audit, for each sample, was quantified by the respective number of gains.
For the identification audit, the auditors checked the identifications of the primary taxa list. Two situations were possible during the identification audit: (a) gains of the identification audit were recorded if a new surrogate not previously found by the primary analysis was identified; and (b) losses of the identification audit were recorded when individuals of a given surrogate were allocated into different surrogates, thus increasing the number of surrogates present. The result of the identification audit was quantified by the number of gains and losses. The integration of the sorting and identification audits corresponded to the total audit. After the audit procedure, a new taxa list was obtained (audited taxa list), and the respective ecological quality class recalculated (audited classification).
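The bookkeeping of the total audit can be sketched as a comparison of the two taxa lists. This is a simplified illustration, not the STAR audit itself: it does not separate the sorting audit from the identification audit, and it treats a loss as a surrogate reported in the primary list that the auditors did not confirm, which is an assumption. The function name and the example lists are hypothetical.

```python
def total_audit(primary_taxa, audited_taxa):
    """Count gains and losses between a primary and an audited taxa list.

    Assumptions (simplifications of the audit described in the text):
      - a gain is a surrogate in the audited list but not the primary list;
      - a loss is a surrogate in the primary list that auditors did not confirm.
    """
    primary, audited = set(primary_taxa), set(audited_taxa)
    gains = sorted(audited - primary)
    losses = sorted(primary - audited)
    return {"gains": gains, "losses": losses,
            "n_gains": len(gains), "n_losses": len(losses)}


# Hypothetical sample; the lists below are illustrative, not study data.
primary_list = ["Diptera", "Ephemeroptera", "Gastropoda"]
audited_list = ["Diptera", "Ephemeroptera", "Plecoptera", "Coleoptera"]
print(total_audit(primary_list, audited_list))
```

The audited taxa list is then the input for recalculating the ecological quality class of the sample.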
Spearman correlations between the results of the total audits and volunteer characteristics were calculated to detect possible effects of the latter (Table S1) on the accuracy of the volunteer assessment. Spearman correlations between total audit results and water quality were also calculated to test the independence between the accuracy of the volunteers and water integrity. For these last correlations, water integrity was measured in two ways: (1) ecological quality class obtained by the audited classification, and (2) the composite gradient of human pressures obtained by the scores of the samples on the first dimension of the MCA.
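For readers who want to reproduce this kind of test, the sketch below computes Spearman's rank correlation coefficient in plain Python, using average ranks for ties. It returns only the coefficient and no p-value; the function names and the example numbers are illustrative assumptions, not the software or data used in the study.

```python
from math import sqrt


def average_ranks(values):
    """Rank values from 1..n, giving tied values their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho: the Pearson correlation of the two rank vectors."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sqrt(sum((a - mx) ** 2 for a in rx))
    sy = sqrt(sum((b - my) ** 2 for b in ry))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (sx * sy)


# Hypothetical values: audit gains per sample against a pressure score.
gains = [0, 1, 1, 3, 4]
pressure = [-1.2, -0.4, 0.1, 0.9, 1.5]
print(round(spearman_rho(gains, pressure), 3))
```

A significance test would additionally require a p-value, which a statistics package would normally provide.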

Accuracies of Simple New Tolerance Metrics Compared to the NCS
In order to select an accurate and simple tolerance metric for macroinvertebrates, two kinds of metrics based on the 18 surrogates were tested: (1) the CONFRESH metric [33], whose score (ranging from 2 to 5) is that of the most sensitive surrogate present in the sample, giving a very simplistic picture of the ecological state of the site (nonsummative metric, NS); and (2) a set of three metrics whose final score is the sum of the tolerance scores of all surrogates present in the sample (summative metrics S1, S2, and S3). These added scores give a better and more accurate depiction of the ecological state of each site. For S1, all surrogates included in the same group of tolerance had the same tolerance score adopted by the CONFRESH metric, ranging from 5 (the most sensitive surrogate) to 2 (the most tolerant surrogate). For S2, the tolerance score of each surrogate was based on the mean of the family tolerance scores established by the Iberian Biological Monitoring Working Party (IBMWP) [38] (e.g., the Trichoptera tolerance score is 8.5, corresponding to the mean of the tolerance scores established by IBMWP for its respective 20 families). In this S2 metric, all surrogates of each group of tolerance have the same score, calculated as the mean score of all their respective surrogates. For S3, surrogates from each group of tolerance have different scores, according to the mean of their taxa scores. All these mean scores were rounded to the nearest natural number. A synthetic overview of the tolerance scores of each surrogate included in each group of tolerances is presented in Table 1. For each summative metric (S1, S2, and S3), a similar procedure was adopted to establish the boundaries between ecological classes: (1) the highest possible score was calculated by adding the scores of all surrogates (59 for S1, 94 for S2, and 88 for S3); (2) the boundary between high and good classes was established as 35% of the maximal possible value of the score (approximately the lower level of the boundary of high-good classes established for the Portuguese monitoring program [35]); (3) the remaining 65% was divided into four equal intervals to establish the boundaries between the other ecological classes, following the procedure of the Portuguese monitoring program [35]; and (4) the boundaries were rounded up to the nearest natural number to prevent decimal numbers (Table 2).
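The difference between the nonsummative and summative approaches can be sketched in a few lines. The tolerance scores in the dictionary below are hypothetical placeholders on the 2 to 5 scale and are not the values of Table 1; the function names are also assumptions made for illustration.

```python
# Hypothetical tolerance scores (2 = most tolerant, 5 = most sensitive).
# Placeholders for illustration only; the real values are given in Table 1.
TOLERANCE = {
    "Plecoptera": 5,
    "Trichoptera": 4,
    "Gastropoda": 3,
    "Diptera": 2,
    "Oligochaeta": 2,
}


def nonsummative_score(taxa, scores=TOLERANCE):
    """NS-style score: tolerance score of the most sensitive surrogate present."""
    return max((scores[t] for t in set(taxa)), default=0)


def summative_score(taxa, scores=TOLERANCE):
    """S-style score: sum of the tolerance scores of all surrogates present.

    Presence only; each surrogate counts once, so no individuals are counted.
    """
    return sum(scores[t] for t in set(taxa))


# Two hypothetical samples sharing the same most sensitive surrogate.
poor_sample = ["Plecoptera", "Diptera"]
rich_sample = ["Plecoptera", "Trichoptera", "Gastropoda", "Diptera"]

for name, sample in [("poor", poor_sample), ("rich", rich_sample)]:
    print(name, nonsummative_score(sample), summative_score(sample))
```

In this example both samples receive the same nonsummative score, while the summative score separates the richer community from the poorer one, which is the behavior the summative metrics are intended to capture.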
To evaluate the accuracy of these tested metrics, the APA-ARHAlg provided the data set of the National Classification System (NCS) for the study area, consisting of 79 sites with the official classification already established (NCS classification), as well as the respective taxa lists (NCS taxa lists). For all 79 sites, classifications of the four tested metrics (NS, S1, S2, and S3) were calculated (metric classifications), based on the NCS taxa lists. NCS classification and metric classifications were compared, and their respective accuracies measured by the number of correct/incorrect classifications, with the NCS classification as the benchmark. These comparisons were done for ecological classifications based on five ecological quality classes (high, good, moderate, poor, and bad), and on two ecological quality classes (high + good and moderate + poor + bad). The two-class comparison reflects the WFD target, which is to attain at least good ecological status. These comparisons were done for all sites together and again separately for each river type (southern rivers with medium to large dimensions, drainage area more than 100 km² (M-L); southern mountainous siliceous rivers (M-S); southern small siliceous rivers (S-S), drainage area less than 100 km²; and calcareous rivers of Algarve (C)) [35]. The accuracy of the four tested metrics (NS, S1, S2, S3) was also assessed in a real volunteer context, because the selected metric had to be used by volunteers. To quantify their accuracies, primary and audited classifications were calculated based on the respective primary taxa lists and audited taxa lists. Primary and audited classifications were compared, and their accuracies measured by the number of correct classifications, assuming the audited analysis as the benchmark.
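The comparison against a benchmark can be sketched as a tally of correct classifications, underevaluations, and overevaluations, first with five classes and then with the two collapsed classes. The function names, class labels for the collapsed case, and the example classifications are illustrative assumptions, not study data.

```python
FIVE_CLASSES = ["bad", "poor", "moderate", "good", "high"]
TWO_CLASSES = ["below good", "good or better"]


def collapse(cls):
    """Map a five-class label onto the two classes split at good-moderate."""
    return "good or better" if cls in ("good", "high") else "below good"


def compare(benchmark, tested, order):
    """Tally tested classifications against the benchmark.

    An underevaluation is a tested class lower than the benchmark class;
    an overevaluation is a tested class higher than the benchmark class.
    """
    if len(benchmark) != len(tested):
        raise ValueError("both lists must classify the same sites")
    rank = {c: i for i, c in enumerate(order)}
    counts = {"underevaluation": 0, "correct": 0, "overevaluation": 0}
    for b, t in zip(benchmark, tested):
        if rank[t] < rank[b]:
            counts["underevaluation"] += 1
        elif rank[t] > rank[b]:
            counts["overevaluation"] += 1
        else:
            counts["correct"] += 1
    return counts


# Hypothetical classifications for five sites (not study data).
ncs = ["high", "good", "moderate", "poor", "good"]
metric = ["good", "good", "moderate", "moderate", "high"]

print(compare(ncs, metric, FIVE_CLASSES))
print(compare([collapse(c) for c in ncs],
              [collapse(c) for c in metric], TWO_CLASSES))
```

In this made-up example, disagreements that stay on the same side of the good-moderate boundary disappear after collapsing, so the number of correct classifications rises.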

Gradient of Human Pressures
The ordination obtained by the MCA shows a clear increasing gradient of human pressures along the first dimension (Figure 4). An opposition is evident between nondisturbed and disturbed sites (Figure 4a) and between land uses, e.g., forest versus agriculture (mainly explained by dimension Dim1, approximately 46% and 37%, respectively) (Figure 4b). Concerning the existence of settlements along the streams, an increase of impact intensity is observed along the first axis, from small dams and springs to houses, roads, and irrigation (mainly explained by Dim1; approximately 46%) (Figure 4c). An opposition is also evident between water states (Dim1 = 27.2%) (Figure 4d), denoting a clear relationship with the ecological status; for this reason, the first dimension was considered the composite gradient of human pressures.

Audit
The audit procedure applied to the 36 samples collected by volunteers showed that gains and losses are mostly associated with identification (Table 3). It is also evident that there was a higher number of gains than losses during the entire audit procedure (Table 3). Concerning the identification done by volunteers, there were gains in practically all surrogates, with Diptera, Ephemeroptera, Coleoptera, and Plecoptera being the highest contributors (Table 4).
Results included in Table 5 showed no significant correlations in terms of gains and losses with the characteristics of the groups of volunteers (p > 0.05). The only significant correlations (p < 0.05) were obtained for human pressures, expressed as the scores of the samples along the first MCA dimension (Figure 4). For the great majority of the sites (22), primary and audited classifications match (Table 6). When classifications do not match, primary classifications were underevaluated by one class (eight cases) or by two classes (four cases).

Table 6. Comparison of primary and audited classifications. Underevaluation occurs when the primary classification is lower than the audited classification, with the opposite for overevaluation. When both classifications match, it is considered as correct.

| | Underevaluation (two classes) | Underevaluation (one class) | Correct | Overevaluation (one class) | Overevaluation (two classes) |
|---|---|---|---|---|---|
| Primary to audited | 4 | 8 | 22 | 0 | 0 |

Accuracies of Simple New Tolerance Metrics Compared to the NCS
All metric classifications are significantly correlated (p < 0.01) with the NCS classification (Table 7). However, a more detailed analysis, based on the incorrect classifications for five and for two quality classes, gave different results. The NS shows a higher tendency for overevaluations than S1, S2, and S3 (Figure 5). The exception occurs with the southern mountains river type without any unimodal pattern, showing a relative minimum for the correct classifications. However, results for the southern mountains and the southern medium-large river types are based on a lower number of sites. Similar results occur when only two ecological quality classes are considered (graphs in the right column of Figure 5).
No more than 18% of the classifications obtained by S1, S2, and S3 for the two ecological quality classes, assuming an uncertainty of 5% around the boundary (Figure 6a-e), are incorrect. Once again, the river types with the lowest number of sites (southern mountains and medium-large river types) do not fit these trends.

Metric Accuracies in a Real Volunteer Context
The comparison of primary classifications and audited classifications, for the four tested metrics (NS, S1, S2, S3) and on the 36 sites collected by the volunteers (Table 8), shows that the number of correct classifications tends to be higher than the underevaluations and the overevaluations. When five quality classes are considered, the number of underevaluations is very close to the correct evaluations, but when two classes are considered, the correct classifications compose the great majority for the summative metrics (S1, S2, S3), with NS having the lowest number of correct classifications (Table 8).
Table 8. Comparison of quality classes between primary and audited analyses of 33 samples analyzed by the volunteers for the tested metrics (NS, S1, S2, S3). Underevaluation occurs when the quality class of the primary analysis is lower than that of the audited analysis; an overevaluation occurs in the opposite case. When both classifications match, it is considered as correct. Three samples were not included due to deficient preservation of the material sent to the auditors.

Discussion
Evaluating the accuracy with which volunteers determine the ecological quality class of a site using macroinvertebrates comprises four main components: (1) sampling, (2) sorting, (3) identification, and (4) acceptability of the final results (assignment to an ecological quality class). The sampling procedure, if well standardized, seems not to be a significant source of error [28]. Sorting seems to be the most interesting phase for volunteers. Curiosity stimulates the search for new (different) macroinvertebrates, and constant attention is devoted to this task, which is a possible reason for the absence of gains in the sorting audit. The most important source of error occurred during identification, leading to some gains and to a few losses. Gains probably resulted from two different sources: (1) errors in the identification of morphologically similar surrogates (e.g., Ephemeroptera and Plecoptera, Diptera and some larvae of Coleoptera, or Trichoptera) and (2) confusion with grains of gravel (e.g., Gastropoda) or small plant pieces (e.g., Oligochaeta). In any case, mean gains per sample were not too high. Thus, it can be concluded that these results are acceptable and can probably be enhanced with more experience and adequate materials to support the identification process [39].
Results showed independence of gains and losses from the characteristics of the groups of volunteers, thereby rejecting the initial hypothesis.Then, the adopted procedure of sorting and identification can be applied to a wide universe of possible volunteers.However, number of gains and losses seem to be dependent on human pressures, thus rejecting the hypothesis of independence.The most degraded sites tend to be dominated by a small number of more tolerant taxa [40], making it difficult to single out the few different ones; therefore, more gains were detected (positive significant correlation, Table 5).In contrast, sites with better quality tend to have richer communities [40], and individuals belonging to the same surrogate can be distributed by several surrogates.However, those gains and losses did not affect the ecological quality assessment, because the number of correct classifications (the same between primary and audited analyses) outweighed the incorrect ones, a fact that confirms other studies done with experts [37] and nonexperts [29,41].
Previous studies also assessed the reliability of stream monitoring by volunteers in relation to professionals (e.g., [6,16,26,27,29,42,43]), and most of them concluded that, with appropriate resourcing and robust protocols, volunteer data closely agree with the professional data used in government reporting and decision making. High correlations were also obtained between primary and audit analyses in New Zealand for %EPT (Ephemeroptera, Plecoptera, and Trichoptera) [16] and in the U.S.A. by the volunteer programs in Virginia [43] and Seattle [26].
Although significant correlations were obtained between the scores of the four tested metrics and the NCS, the three summative metrics gave better results when the quality classes were compared. This confirms the hypothesis that the summative metrics perform better. Generally, the NS tended to overevaluate the ecological assessment because it reduces the information to only the most sensitive taxa, regardless of community composition. The summative metrics, which account for the tolerance scores of all the surrogates present, were found to balance taxa richness and tolerances. Reducing the five quality classes of the WFD to two classes, separated by the good-moderate boundary, increased the number of correct classifications, making these summative metrics more acceptable for identifying water bodies that are below good quality status (the environmental target of the WFD).
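Why collapsing five classes into two raises the number of correct classifications can be illustrated with a minimal sketch. The class lists below are hypothetical and the helper names are assumptions made for this example; only the good-moderate boundary comes from the text.

```python
# WFD ecological quality classes ordered from worst to best.
CLASS_ORDER = ["bad", "poor", "moderate", "good", "high"]


def to_two_classes(quality_class: str) -> str:
    """Collapse the five classes at the good-moderate boundary."""
    at_least_good = CLASS_ORDER.index(quality_class) >= CLASS_ORDER.index("good")
    return "good or better" if at_least_good else "below good"


def count_matches(metric_classes, reference_classes, collapse=False):
    """Count sites where the metric class equals the reference class."""
    if collapse:
        metric_classes = [to_two_classes(c) for c in metric_classes]
        reference_classes = [to_two_classes(c) for c in reference_classes]
    return sum(m == r for m, r in zip(metric_classes, reference_classes))


# Hypothetical classifications of three sites, for illustration only.
metric = ["high", "moderate", "poor"]
reference = ["good", "poor", "poor"]

print(count_matches(metric, reference))                 # five classes
print(count_matches(metric, reference, collapse=True))  # two classes
```

In this made-up case, two of the three five-class mismatches fall on the same side of the good-moderate boundary, so they count as correct once only two classes are used.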
The accuracy of the volunteer procedure (difference between primary and audited analyses) is very similar among the tested metrics (summative and nonsummative) when five ecological quality classes are established. However, if only two quality classes are established (good/moderate boundary), the accuracy of the summative metrics is higher than that of the nonsummative one.
Since the results obtained by S1, S2, and S3 were very similar and more accurate than those obtained by NS, S2 was selected to be the Metric of Aquatic Invertebrates for Volunteers (MAIV) for two main reasons: (1) the tolerance scores of the surrogates are based on published results [38] and (2) the scores of all surrogates included in each of the four tolerance groups are equal and thus easier to use. A detailed calculation protocol for the MAIV is provided in Tables S4 and S5.
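The difference between the nonsummative and summative approaches can be sketched as follows. The nonsummative score is the highest surrogate score, and the summative score is the sum of the scores of all surrogates present. The tolerance scores used here are hypothetical placeholders, not the values of Table 1 or Tables S4 and S5, and the function names are assumptions for this example.

```python
# HYPOTHETICAL tolerance scores, for illustration only. These are NOT the
# published MAIV scores; the real values are given in the paper's tables.
SCORES = {
    "Plecoptera": 4,
    "Ephemeroptera": 3,
    "Gastropoda": 2,
    "Oligochaeta": 1,
}


def nonsummative(present) -> int:
    """NS-style score: the highest score among the surrogates present."""
    return max((SCORES[t] for t in set(present)), default=0)


def summative(present) -> int:
    """Summative-style score: sum of the scores of all surrogates present."""
    return sum(SCORES[t] for t in set(present))


# Two hypothetical sites: one with a single sensitive surrogate,
# one with a richer community.
site_a = ["Plecoptera"]
site_b = ["Plecoptera", "Ephemeroptera", "Gastropoda", "Oligochaeta"]

print(nonsummative(site_a), summative(site_a))
print(nonsummative(site_b), summative(site_b))
```

With these placeholder scores, both sites receive the same nonsummative score, while the summative score separates the richer community from the poorer one.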
The MAIV proved to be adequate for volunteers because it is based on a small set of surrogates easily recognized by nonexperts, saving time when compared to other volunteer monitoring experiences in which identification is done to the family level [26]. In addition, its application is independent of the characteristics of the volunteers, and the determination of the ecological quality class is simple and quick, keeping volunteers interested. However, the use of MAIV to complement the NCS raised some concerns that need to be pointed out. MAIV reduces the assessment to one tolerance metric, while the NCS [35] is based on an index composed of metrics of tolerance, composition, and diversity. This is the reason for the lower accuracy of MAIV in arranging water bodies into five ecological quality classes. Nevertheless, MAIV is still a good tool for detecting water bodies that are below good ecological quality status (the WFD environmental target).
The use of single volunteer data (MAIV classifications) for management purposes can be risky because of their lower precision. However, their use as part of a set of samples to detect geographical and/or temporal environmental patterns can be an important complement for water authorities, as already pointed out by Deutsch et al. [9], saving time and costs [30,44,45]. Thus, to prevent incorrect management decisions based on MAIV, it is advisable to keep a 5% interval of uncertainty around the good-moderate boundary (scores 41 to 49).
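The recommended uncertainty interval can be sketched as a simple screening rule. The limits 41 and 49 come from the text; the assumptions that higher scores indicate better quality, that the limits are inclusive, and the function and label names are made for this example only.

```python
# Limits of the uncertainty interval around the good-moderate boundary,
# as stated in the text (scores 41 to 49).
UNCERTAIN_LOW = 41
UNCERTAIN_HIGH = 49


def screen(score: float) -> str:
    """Screen a MAIV score against the good-moderate boundary.

    Assumes higher scores mean better quality and that both limits
    belong to the uncertainty interval.
    """
    if score < UNCERTAIN_LOW:
        return "below good"
    if score > UNCERTAIN_HIGH:
        return "good or better"
    return "uncertain"


# Hypothetical scores, for illustration only.
for s in (30, 41, 45, 49, 60):
    print(s, screen(s))
```

A site flagged as uncertain would then be a candidate for resampling or for a more detailed assessment before any management decision is taken.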
The results obtained in the present exercise show that MAIV can be applied with the same boundaries, and with acceptable results, to different Portuguese river types, such as the south limestone and south small river types. No definitive conclusions can be reached concerning the other two river types, due to the reduced number of sites used in the exercise. However, if MAIV is to be used as a first step by volunteers, it has to be kept simple. Different boundaries for different river types could introduce a complexity that is not important for the objective of MAIV. This metric could provide extensive coverage of river assessments, complementing the NCS. Thus, the results obtained by MAIV could act as a warning of human pressures on river sites that are not regularly assessed by the NCS, indicating where more detailed monitoring or management actions could be needed.
The involvement of the educational community in the training courses proved to be quite effective. Teachers had the chance to develop their professional profile by improving their STEM literacy through theoretical lectures that related to practical aspects (i.e., field and laboratory practical lectures) of the conservation and sustainability of freshwater ecosystems. Other benefits of this study were student citizenship, the connection of knowledge, awareness, and behavior, and the partnering of student biomonitoring research with community needs, as well as those of stakeholders and policy makers. In general terms, volunteers are strongly encouraged by interaction with scientists and identified learning as one of the main rewards of taking part in CS projects, as documented by other studies [46][47][48].
The training procedure confirmed that SC, providing only theoretical information, is insufficient to elicit the required behavioral changes, with training in identification being a key factor in the success of volunteer monitoring programs [27,49]. However, identification should be complemented with other tools [11,42,43], such as simplified identification keys [27]. Hence, if it is desired to elicit new perspectives on how the behavior of individuals connects them to the services provided by their ecosystem, and if the premise that "individuals act ethically as an integral part of an interconnected society and biosphere" is to be accomplished, it should be reached through personal belief rather than rational understanding [50].
Although there has been an increase in the number of studies on environmental monitoring research, precise tools to measure data quality are still scarce [25,51] compared to other volunteer practices, such as those in the health sector [25]. More effort is required to evaluate data quality and to adjust more specific and simple methodologies, in order to give credibility to volunteer data and enhance the effect on environmental community education [25]. A compromise is needed between fewer uncertainties in environmental data and environmental community education, supporting volunteers in a balanced way.
The dimension of the regional project reported herein, involving six teacher training centers, is too large to be repeated, mainly due to the limited available resources (time and budget). However, the skills obtained by citizens were the seed for the creation of several small groups of volunteers (environmental school clubs involving senior citizens) that are being trained specifically in sampling procedures and identification, a much lower effort than organizing six entire training courses.

Conclusions
After training, students are able to identify aquatic invertebrates using simplified invertebrate keys and indices, a skill that can be used to accurately detect ecological status lower than good. Protocols and methods used for the monitoring of streams proved to be well accepted by volunteers.

Figure 1. Flow diagram summarizing the methodology adopted for the present study. Inside white boxes are intermediate and final results, the latter being highlighted in grey. Arrows indicate the sequence of the procedures. Metric of Aquatic Invertebrates for Volunteers (MAIV), National Classification System (NCS), summative metrics (S1, S2, and S3), nonsummative metric (NS).

Figure 2. Geographical distribution of the sampling sites complemented with the location of the secondary schools involved in the study. Dams and municipality boundaries are also depicted.

Figure 3. (a) Habitat assessment and macroinvertebrate sampling, (b) screening, (c) identification, and (d) classification of macroinvertebrates by volunteers in the Algarve region, southern Portugal.
ecological status, and for this reason, considered the composite gradient of human pressures.

Figure 4. Multiple Correspondence Analysis (MCA) showing the first two dimensions (Dim1 and Dim2) with samples arranged according to data in field forms. Four groups of descriptors are represented separately to avoid overlap of the vectors: (a) stream characteristics, (b) land use, (c) settlements on the stream, and (d) state of the water. The importance of contributing variable categories is shown by the color scale (from blue to red). Yes = Presence, No = Absence, wwtp = wastewater treatment plant.

Table 7.

Figure 5. Comparison of metric classifications (NS, S1, S2, S3) with the NCS classification. Negative numbers correspond to underevaluations, 0 corresponds to correct evaluations, and positive numbers correspond to overevaluations. The absolute values on the x-axis correspond to the number of classes by which the metric classification differs from the NCS classification. The y-axis corresponds to the frequency of the underevaluations, correct evaluations, and overevaluations. The graphs in the left column correspond to five ecological quality classes. Graphs on the right correspond to two ecological quality classes. The results are reported for all the sites and for each river type.

Figure 6. Comparison of summative metric classifications (S1, S2, S3) with the NCS classification for two ecological quality classes, showing the percentage of correct and incorrect classifications, and the 5% interval of uncertainty. The results are reported for (a) the total sites and for each river type: (b) south limestones, (c) south small, (d) south medium-large, and (e) south mountains.

Table 1. Tolerance scores established for surrogates for the different tested metrics. The nonsummative (NS) metric score corresponds to the highest surrogate score. Scores of the other metrics (S1, S2, S3) are obtained by the sum of the scores of all the surrogates present. The Mean IBMWP column shows the mean tolerance score attributed by IBMWP (Iberian Biological Monitoring Working Party) to families included in the surrogate. The number of families included in each mean is shown in parentheses.

Table 2. Score limits of the ecological quality classes for each tested metric.

Table 3. Gains and losses (totals and means per sample) for the three audits (sorting, identification, and total). A total of 36 samples were audited. Standard deviations appear inside parentheses.

Table 4. Gains and losses of the identification audit, distributed by the surrogates. The vernacular names of the taxa are inside parentheses.

Table 5. Pearson correlation of the total audit (gains and losses) with the volunteer group characteristics, the human pressures (composite gradient), and the quality classes.