Meta-Evaluation for the Evaluation of Environmental Management: Standards and Practices

Environmental management plays a key role in the sustainable development of cities. The effectiveness of environmental management is commonly examined through evaluation schemes, but the effectiveness of these evaluation schemes must itself be verified. In this study, therefore, meta-evaluation was introduced into environmental management to improve the evaluation of environmental management (EEM). Meta-evaluation is the evaluation of an evaluation scheme; it can verify and enhance evaluation quality. First, a new set of meta-evaluation standards and criteria was proposed based on the unique characteristics of environmental management, which made the meta-evaluation standards more adaptable and effective. Next, the efficacy of the proposed standards was verified by applying them to two evaluation schemes used in different fields of EEM. Suggestions for improving these two EEM schemes were also derived from the meta-evaluation. The major contributions of this study are to introduce meta-evaluation into environmental management, to establish new meta-evaluation standards, and to examine the efficacy of EEM. The research shows that it is critical to carry out meta-evaluation before and/or after the implementation of EEM.


Introduction
Meta-evaluation is "the evaluations of evaluation" [1]; it can identify the limitations of an evaluation process itself [2]. Researchers have argued that the reliability of evaluation processes should be ensured and enhanced by carrying out meta-evaluation [3], and meta-evaluation has now entered an era of application [4]. However, few studies have addressed meta-evaluation to date, especially in environmental science [5,6]. In fact, most evaluations of environmental management (EEM), which aim to identify problems in environmental management and promote sustainable development [7][8][9], presume that the evaluation process itself is sound.
EEM is a first-order evaluation method. According to evaluation theory, first-order evaluations share many common problems, so the results of EEM may be problematic [1]. For example, in Shaanxi province, China, key ecological function areas were designated as areas that play an important role in water retention, soil and water conservation, flood control and storage, windbreak and sand fixation, and biodiversity maintenance [10]. The local government improves the ecological quality of these areas through national key ecological function areas management (NKEFAM). Based on its evaluation of NKEFAM, the provincial government claimed that the national key ecological function areas were well managed and that the ecological quality of almost all of them had improved [11]. However, analyses of remote sensing data show that the ecological quality of some areas in Shaanxi is actually deteriorating [12,13]. Even the evaluators of NKEFAM admitted that there were problems in the evaluation process; for example, administrative intervention in environmental monitoring makes it difficult to ensure the objectivity and accuracy of monitoring data [14]. The evaluation of NKEFAM therefore does not reflect the actual situation of the key ecological function areas, which indicates that the quality of EEM itself has to be evaluated; otherwise, severe losses may arise in environmental management practice [15]. As a systematic tool for controlling evaluation quality, meta-evaluation can identify defects in EEM and improve the accuracy of environmental management. To improve the quality of EEM, meta-evaluation should therefore be introduced into the evaluation process of environmental management.
Many researchers have found that evaluators often show low familiarity with evaluation standards, so the question of which standards evaluators should use is vital and must be addressed in meta-evaluation design [6]. To introduce meta-evaluation into environmental management, this paper therefore first establishes specific meta-evaluation standards that can be used in practice. The proposed standards are then applied to two cases from two different fields of EEM. By meta-evaluating each case from multiple perspectives, the possible problems in the two evaluation schemes are identified and the efficacy of the meta-evaluation standards is demonstrated. In short, this research aims to introduce meta-evaluation into environmental management and to verify its efficacy there.
The remainder of this paper is outlined as follows. In Section 2, the background of meta-evaluation is introduced. The meta-evaluation standards are designed in Section 3. In Section 4, the applications of meta-evaluation in practice are provided. The conclusions are drawn in Section 5.

Background of Meta-Evaluation
It is common for defects to exist in the evaluation process itself. Stufflebeam, former chairman of the Joint Committee on Standards for Educational Evaluation (JCSEE), observed that "many things can and often do go wrong in evaluation work" [16]. Incorrect evaluation reports and unreliable evaluators can mislead the users of evaluation results and cause great losses. Evaluating the evaluation process therefore becomes necessary.
Meta-evaluation was first proposed by Scriven half a century ago [1]. Intuitively, it means evaluating an evaluation, including an evaluation system or an evaluation design [1]. "Meta-evaluation, only applies to evaluations; is frequently (and properly) applied to just one at a time; may assess an evaluand that is entirely qualitative; and refers to any process of evaluating it, or them, or the product of that process." [17]. Through meta-evaluation, potential problems of an evaluation, such as bias, administrative difficulty, and technical error, can be examined [16]. Meta-evaluation also provides a feedback mechanism: it can improve the quality of an evaluation and, in turn, improve the subject being evaluated [2]. The meta-evaluation procedure mainly consists of establishing the evaluation team, confirming the meta-evaluation standards, collecting information, and judging compliance with the meta-evaluation standards [18]. Importantly, the meta-evaluation team must represent the interests of stakeholders; this is key to implementing meta-evaluation [2,16].
To date, meta-evaluation has been successfully applied in several areas, such as the performance assessment system of Teach For America [18], the operating mechanisms of universities [19], the evaluation system of higher education institutions [20], the policy on rural teacher management systems [2], and a maritime traffic safety evaluation scheme [21]. Because the JCSEE program evaluation standards can be used directly in similar fields [2,22], most previous meta-evaluation research has focused on education. The JCSEE program evaluation standards comprise several standards, each of which includes many criteria; the specific criteria can be found on the official JCSEE website [23]. These criteria can be used to evaluate educational evaluation programs from many angles, such as the value of the programs, evaluation ethics, and the responsibilities of evaluation participants. Some of these standards are general: for example, the reliability and validity of the information used for evaluation must be tested in any evaluation program. JCSEE was founded in 1975; it has more than 2 million members and has been recognized by 12-15 professional organizations [18]. Recently, the JCSEE program evaluation standards have also been recognized by authorities such as the American Evaluation Association and recommended as a guideline for designing other evaluation standards [3,24].
Sustainability 2021, 13, 2567
As the sustainable development of the environment becomes increasingly important, EEM is applied more and more frequently. However, effective meta-evaluation tools for environmental management are still lacking. Since the JCSEE program evaluation standards have been tested by authoritative institutions [17], they provide a reference for establishing EEM meta-evaluation standards and make it possible to apply meta-evaluation in environmental management.
Although it is feasible to establish meta-evaluation standards for EEM with reference to the JCSEE program evaluation standards, those standards cannot simply be copied and applied to EEM, because they were formulated for educational evaluation. New meta-evaluation standards for EEM should therefore be established and adapted to the unique characteristics of environmental management to increase their efficacy [16]. This work extends the application of meta-evaluation to the new area of environmental management and provides a basis for guiding related environmental management practice.

Study Design
The standards of meta-evaluation must be widely recognized by stakeholders; this is the starting point for developing a meta-evaluation method [16]. To improve the acceptability of the meta-evaluation standards and make them represent the interests of all stakeholders, an evaluation team has to be organized that includes environmental experts, environmental officers, residents, members of environmental non-governmental organizations, and government employees, who participate in the entire evaluation process. The evaluation process consists of five steps:
Step 1: Recommendations for adapting the JCSEE program evaluation standards to EEM are made, then reviewed and confirmed by the evaluation team.
Step 2: The meta-evaluation standards for EEM are established, then reviewed and confirmed by the evaluation team.
Step 3: The evaluation team judges the relative importance of the criteria, and criterion weights are assigned according to these judgments.
Step 4: The evaluation schemes are analyzed against the meta-evaluation standards; in this step, environmental experts confirm whether each criterion is achieved.
Step 5: The meta-evaluation results are examined further to identify problems in the evaluation schemes, and suggestions for improving EEM are then provided.

The Standards of Meta-Evaluation
At present, the JCSEE program evaluation standards are the basis for formulating other standards and have been recognized by professional evaluation organizations [24]. They are now in their third edition and are still being developed [25]. JCSEE has always encouraged users to comment on and suggest improvements based on their own applications [18,19]. Building on the JCSEE program evaluation standards, we developed the following standards to improve evaluation schemes and enhance evaluation capability, especially for EEM.

Utility Standard
The utility standard is intended to ensure that an evaluation meets the information needs of its prospective users [18]. Since the goal of EEM is to improve environmental performance, prospective users need credible environmental recommendations. However, the JCSEE program evaluation standards, which focus on education, contain no corresponding criterion [16]. We therefore propose a new criterion requiring that the results of EEM provide effective recommendations for protecting the environment. Because human activities often cause environmental change [26,27], humans have a responsibility to protect the environment; this criterion is therefore named "human responsibility".

Feasibility Standard
The feasibility standard requires that an evaluation be realistic, prudent, and comprehensive [18]. However, the continuity of evaluation is often overlooked. In practice, an evaluation may take several weeks or may need to continue intermittently; for example, the government requires that NKEFAM in Shaanxi be evaluated every year. More and more researchers have emphasized the importance of continuous evaluation [28]. If, for example, the health and safety of evaluators are put at risk because relevant provisions are lacking, the evaluation may be discontinued in the next round, and an evaluation that cannot be carried out continuously may lose accuracy. Continuity is therefore added as a consideration in the meta-evaluation standards, which constitute a second-order evaluation [16].
The criteria "health assurance" and "collaboration mechanism" are designed to ensure the continuity of evaluation. As mentioned above, "health assurance" ensures that the health and safety of evaluators are not endangered (for example, when working near chemical plants or radioactive sources). The "collaboration mechanism" facilitates the next round of evaluation [29]. Both criteria are assessed mainly on the design of the evaluation scheme, which should contain clear health and safety provisions and an effective mechanism for cooperation among evaluators.

Propriety Standard
The propriety standard supports proper, fair, right, and just evaluations [23]. In EEM, evaluators may directly or indirectly have a negative impact on the environment [30,31]. This should be avoided, but it is often ignored when evaluation schemes are designed. For example, the evaluation of groundwater resource management requires extensive engineering geological drilling; if the drilling is substandard or the sites cannot be restored, it creates new risks to the safety of groundwater resources. The criterion "respect to the environment" was therefore added to the propriety standard; it requires that the EEM process minimize, and ideally eliminate, negative impacts on the environment.
The above criteria, designed to improve EEM, have been approved by the evaluation team. They are based on the possible interactions between evaluators and the environment and among the evaluators themselves.
Note that both the process and the results of EEM emphasize the interaction between humans and the environment, which is a major difference between EEM and other evaluation scenarios [32]. During EEM, the evaluator may affect the environment, while the environment may also affect the evaluator. The EEM results mainly reflect the evaluator's impact on the environment. In addition, interactions among evaluators may indirectly affect the environment and partly determine the quality of EEM. The criteria above are therefore proposed to reflect all these possible interactions and to match the unique characteristics of environmental management, as shown in Figure 1.

Summary of the Standards
Besides the new criteria, we selected some widely applicable standards from the JCSEE program evaluation standards to form a complete set of meta-evaluation standards for EEM. Table 1 summarizes all the standards used for the meta-evaluation of EEM, all of which have been approved by the evaluation team.

Weights of Criteria
Since it is difficult for an evaluation scheme to meet all the meta-evaluation standards, the most important criteria have to be identified. To facilitate the use of the meta-evaluation standards in practice, the priority of the criteria therefore needs to be considered and criterion weights calculated. In this paper, the analytic hierarchy process (AHP) was used to assign weights to the criterion system. AHP is a flexible and practical multi-criteria decision-making method that estimates criterion weights from pairwise comparisons, and it has been successfully applied to environmental management [33]. Following a previous study [34], the weights are calculated as follows. First, the relative importance of every pair of sub-criteria under a higher-level criterion (whether criterion A is more or less important than criterion B) is obtained through discussion within the evaluation team.
Second, based on the evaluation team's discussion, the Saaty 1-9 scaling method (1, equally important; 9, extremely more important), shown in Table 2, was used to quantify the relative importance b_ij of criterion i over criterion j, from which the judgment matrix was formed, as shown in Appendix A.
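The second step can be illustrated with a minimal sketch of how a reciprocal judgment matrix might be assembled from the team's pairwise judgments. The function name `judgment_matrix`, the dictionary input format, and the example judgments are hypothetical. We also assume the usual Saaty convention that the intermediate values 2, 4, 6, and 8 and all reciprocals are allowed; the text above does not spell this out.

```python
from fractions import Fraction

# Assumption: integers 1-9 and their reciprocals are allowed (the usual Saaty
# convention, including the intermediate values 2, 4, 6, 8).
SAATY_VALUES = {Fraction(k) for k in range(1, 10)} | {Fraction(1, k) for k in range(1, 10)}


def judgment_matrix(n, upper):
    """Build the n x n reciprocal judgment matrix B = (b_ij).

    `upper` maps each pair (i, j) with i < j to b_ij, the team's judgment of
    how much more important criterion i is than criterion j.
    """
    if len(upper) != n * (n - 1) // 2:
        raise ValueError("every pair i < j needs exactly one judgment")
    B = [[Fraction(1)] * n for _ in range(n)]  # diagonal: b_ii = 1
    for (i, j), value in upper.items():
        if not 0 <= i < j < n:
            raise ValueError(f"pair {(i, j)} must satisfy 0 <= i < j < n")
        v = Fraction(value)
        if v not in SAATY_VALUES:
            raise ValueError(f"{value} is not on the Saaty 1-9 scale")
        B[i][j] = v
        B[j][i] = 1 / v  # reciprocal entry: b_ji = 1 / b_ij
    return B


# Hypothetical judgments for three criteria.
B = judgment_matrix(3, {(0, 1): 3, (0, 2): 5, (1, 2): "1/2"})
```

Only the upper triangle is entered, so the team never has to state both b_ij and b_ji. Exact fractions keep the reciprocal property b_ij · b_ji = 1 free of rounding error.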
Third, the largest eigenvalue (λmax) of the judgment matrix and the corresponding eigenvector w are calculated. Each element w_i of this eigenvector is the required weight of a criterion within its hierarchy. The equation is as follows:
$$\sum_{j=1}^{n} b_{ij} w_j = \lambda_{\max} w_i, \quad i = 1, \dots, n,$$
where b_ij is the quantitative value of the relative importance of criterion i over criterion j.

Table 1. The standards of meta-evaluation for the evaluation of environmental management.

Standard | Label | Criterion | Description
Utility | U1 | Stakeholder Identification | Persons who are involved in or affected by EEM should be identified.
Utility | U2 | Evaluator Credibility | The persons performing EEM should be competent.
Utility | U3 | Information Scope | The collected EEM information should be extensive.
Utility | U4 | Values Identification | The perspectives, procedures, and rationale of EEM should be carefully described.
Utility | U5 | Report Clarity | The program being assessed should be clearly described, including its context, purposes, procedures, and findings.
Utility | U6 | Report Timeliness and Dissemination | The reports of EEM should be disseminated to target users.
Utility | U7 | Evaluation Impact | The results of EEM should be reported and applied.
Utility | U8 | Human Responsibility | The results of EEM should provide clear and reliable recommendations for environmental protection.
Feasibility | F1 | Practical Procedures | The procedures of EEM should be practical.
Feasibility | F2 | Political Viability | EEM should be planned and conducted with the positions of stakeholders taken into account.
Feasibility | F3 | Cost Effectiveness | EEM should be efficient and produce information of sufficient value.
Feasibility | F4 | Health Assurance | The health and safety of the evaluators should be fully protected throughout the procedures of EEM.
Feasibility | F5 | Collaboration Mechanism | EEM should provide a mechanism to ensure the cooperation of all participants.
Propriety | P1 | Service Orientation | EEM should be designed to serve the needs of stakeholders effectively.
Propriety | P2 | Formal Agreements | The obligations of EEM participants should be agreed to in writing.
Propriety | P3 | Human Interactions | Evaluators should respect human dignity in their interactions with other persons associated with EEM.
Propriety | P4 | Complete and Fair Evaluation | EEM should be complete and fair in its examination of the program.
Propriety | P5 | Disclosure of Findings | Persons affected by the evaluation, and any others with expressed legal rights, should be able to receive the results.
Propriety | P6 | Conflict of Interest | Conflicts of interest within EEM should be dealt with openly and honestly.
Propriety | P7 | Fiscal Responsibility | The evaluator's allocation and expenditure of resources should reflect sound accountability procedures.
Propriety | P8 | Respect to the Environment | EEM should be designed and conducted to avoid negative impacts on the environment.
Accuracy | A1 | Program Documentation | The program being evaluated should be described and documented clearly and accurately.
Accuracy | A2 | Context Analysis | The context in which the program exists should be examined in sufficient detail.
Accuracy | A3 | Described Purposes and Procedures | The purposes and procedures of EEM should be monitored and described in sufficient detail.
Accuracy | A4 | Defensible Information Sources | The sources of information used in EEM should be described in sufficient detail.
Accuracy | A5 | Valid and Reliable Information | The information-gathering procedures should be chosen or developed and then implemented.
Accuracy | A6 | Systematic Information | The information collected, processed, and reported in EEM should be systematically reviewed.
Accuracy | A7 | Analysis of Quantitative Information | Quantitative information in EEM should be appropriately and systematically analyzed.
Accuracy | A8 | Analysis of Qualitative Information | Qualitative information in EEM should be appropriately and systematically analyzed.
Accuracy | A9 | Justified Conclusions | The conclusions reached in EEM should be explicitly justified.
Accuracy | A10 | Impartial Reporting | Reporting procedures should guard against distortion caused by the personal feelings and biases of any participant.
Note: Researchers have stated that certain criteria of the JCSEE program evaluation standards may not be applicable in some situations [18]; such criteria were removed or modified in our study. "Human interactions" differs from "collaboration mechanism" in that the former is a necessary prerequisite for the latter; the collaboration mechanism places more emphasis on the design of rules.

Table 2. The Saaty 1-9 scaling method [35].

Scale | Meaning
1 | Criterion b_i is as important as criterion b_j.
3 | Criterion b_i is slightly more important than criterion b_j.
5 | Criterion b_i is more important than criterion b_j.
7 | Criterion b_i is much more important than criterion b_j.
9 | Criterion b_i is extremely significantly more important than criterion b_j.
1, 1/2, 1/3, 1/4, 1/5, 1/6, 1/7, …

Fourth, the weight coefficient a_i is obtained by normalizing w_i:
$$a_i = \frac{w_i}{\sum_{j=1}^{n} w_j}.$$
Fifth, based on the above calculation, λmax is obtained as
$$\lambda_{\max} = \frac{1}{n} \sum_{i=1}^{n} \frac{\sum_{j=1}^{n} b_{ij} a_j}{a_i},$$
where n is the order of the judgment matrix corresponding to the criterion. Finally, the consistency of each individual criterion and of the overall criterion system must be checked. The equation is as follows:
$$CR = \frac{\lambda_{\max} - n}{(n - 1)\, RI},$$
where CR is the consistency ratio and RI is the average random consistency index. MATLAB 7.0 was used to calculate the above quantities, which are shown in Table 3.
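The remaining AHP steps (principal eigenvector, normalization to weights a_i, λmax, and the consistency ratio) can be condensed into a short sketch. It approximates the principal eigenvector by power iteration. The authors used MATLAB 7.0, and we do not know which routine they called; any correct eigen-solver should give the same vector for a positive reciprocal matrix. The name `ahp_weights` is hypothetical. The RI values are Saaty's commonly tabulated ones for n ≤ 10 and are assumed here because the paper's own RI table is not reproduced in this section.

```python
# Saaty's commonly tabulated random consistency index RI by matrix order n
# (an assumption: the paper's own RI values are not reproduced here).
RANDOM_INDEX = {1: 0.0, 2: 0.0, 3: 0.58, 4: 0.90, 5: 1.12,
                6: 1.24, 7: 1.32, 8: 1.41, 9: 1.45, 10: 1.49}


def ahp_weights(B, tol=1e-12, max_iter=1000):
    """Return (a, lambda_max, CR) for a positive reciprocal judgment matrix B."""
    n = len(B)
    # Step 3: approximate the principal eigenvector of B by power iteration.
    a = [1.0 / n] * n
    for _ in range(max_iter):
        Ba = [sum(B[i][j] * a[j] for j in range(n)) for i in range(n)]
        total = sum(Ba)
        new_a = [x / total for x in Ba]  # Step 4: normalise so that sum(a_i) = 1
        converged = max(abs(x - y) for x, y in zip(new_a, a)) < tol
        a = new_a
        if converged:
            break
    # Step 5: lambda_max = (1/n) * sum_i (B a)_i / a_i
    Ba = [sum(B[i][j] * a[j] for j in range(n)) for i in range(n)]
    lambda_max = sum(Ba[i] / a[i] for i in range(n)) / n
    # Consistency check: CR = (lambda_max - n) / ((n - 1) * RI)
    if n <= 2:
        return a, lambda_max, 0.0  # 1x1 and 2x2 reciprocal matrices are always consistent
    cr = (lambda_max - n) / ((n - 1) * RANDOM_INDEX[n])
    return a, lambda_max, cr
```

Take the hypothetical matrix [[1, 3, 5], [1/3, 1, 3], [1/5, 1/3, 1]]. The sketch gives weights of roughly 0.64, 0.26, and 0.10 and a CR of roughly 0.03. That CR is below the threshold of 0.1 commonly used in AHP practice to accept a judgment matrix; that threshold is a convention and is not stated in this section.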

Sample
In this study, evaluation schemes from two different fields of EEM were taken as application cases. Each case was meta-evaluated from a different perspective to reveal possible problems in the schemes, which also demonstrates the overall efficacy of the meta-evaluation standards.

Urban Sewage Treatment Management Evaluation Scheme
Urban sewage treatment management (USTM) is closely related to the sustainability of urban development [36,37], and its performance has received great attention from the public and local governments. In addition, since 2010 the Ministry of Housing and Urban-Rural Development of China has required provinces to evaluate USTM annually [38]. The evaluation data on USTM are therefore relatively complete and suitable for meta-evaluation.
In addition, the sustainable planning of water resources is an important issue for water resource conservation [39]. In the long term, water shortage may be a problem that all regions have to face. Currently, the sustainability component of USTM evaluation schemes aims only at alleviating water shortages in water-scarce areas. However, the potential imbalance between water supply and demand in water-rich areas should also be considered in the evaluation scheme [40]. Meta-evaluation should therefore be applied to check whether the water resource evaluation schemes of all regions embody the concept of sustainable development, regardless of whether a region is rich in water resources.
To this end, as shown in Figure 2, the USTM evaluation schemes of three Chinese provinces, Guangxi, Shaanxi, and Inner Mongolia, were selected for meta-evaluation, since they lie at similar longitudes and have similar GDP levels [41]. Information on the evaluation schemes was obtained from the official websites of the local governments [42][43][44]. (Note that the above USTM schemes were implemented in 2010, although they were officially announced later.)

Note to Table 3: … [35]; the sum of the adjusted criterion weights is 100; λmax is the largest eigenvalue; w_i is the weight value within a given hierarchy; CR is the consistency ratio; adjusted w_i is the weight of each meta-evaluation criterion.

National Key Ecological Function Areas' Management Evaluation Scheme
The management of national key ecological function areas is very important to China's national ecological security [46]. In 2010, the State Council of China identified 25 national key ecological function zones, two of which (the relevant areas of the Loess Plateau and of the Qinling-Daba Mountains) are located in Shaanxi province. As required by the central government, Shaanxi issued an evaluation scheme for NKEFAM [47]. However, research-based evidence has called the evaluation process and its conclusions into question [12][13][14]. We therefore apply meta-evaluation to reveal possible problems in the NKEFAM evaluation scheme.

Meta-Evaluation
The Delphi method can indicate whether a group of respondents reaches correct judgements [48]. To reduce response error, it was therefore used for the meta-evaluation, as shown in Figure 3. The evaluation team selected scholars from environmental departments of different universities as environmental experts. The evaluation schemes were sent anonymously to five environmental experts, who analyzed them independently against the meta-evaluation standards. Where the experts' results agreed, they were retained. Where they disagreed, the result shared by more than half of the experts was retained; if no such majority existed, the evaluation schemes were sent to additional experts until an acceptable result was obtained. The final meta-evaluation results were then compiled.
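The retention rule above can be expressed as a small function applied to one criterion of one scheme. This is a sketch under two assumptions of ours. First, each expert gives a yes/no judgement on whether the criterion is met. Second, judgements from additional experts are added to the existing pool rather than replacing it. The function name `retained_judgement` is hypothetical.

```python
from collections import Counter
from typing import Iterator, List, Optional


def retained_judgement(judgements: List[bool], extra_experts: Iterator[bool]) -> Optional[bool]:
    """Retain one expert judgement (True = criterion met) for a single criterion.

    Unanimous or majority (> half) judgements are retained directly; otherwise,
    judgements from additional experts are added until a majority emerges.
    Returns None if the pool of additional experts runs out first.
    """
    votes = list(judgements)
    if not votes:
        raise ValueError("at least one expert judgement is required")
    while True:
        value, count = Counter(votes).most_common(1)[0]
        if 2 * count > len(votes):  # unanimity or a strict majority
            return value
        extra = next(extra_experts, None)
        if extra is None:
            return None  # no acceptable result could be reached
        votes.append(extra)
```

With five experts and a yes/no judgement, a strict majority always exists. The escalation branch therefore only matters when an even number of judgements is available, for example if an expert abstains.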

Results and Discussion
The utility standard is the most important part of the meta-evaluation standards, for two reasons. First, the evaluation team assigned the utility standard the largest weight (see Table 3). Second, researchers hold that the utility standard should be ensured before the feasibility, propriety, and accuracy standards [18]. Whatever its performance on the other standards, an EEM that lacks utility cannot bring about any change in environmental management. Poor performance on the utility standard therefore leads directly to the failure of the EEM scheme. In previous studies, the utility standard was considered unmet when more than half of the utility criteria were not met [2,18]. However, this rule ignores the different weights of the criteria. In this study, the utility standard is therefore scored using the assigned weights. If the weighted score of the satisfied utility criteria is less than half of the total utility weight, the standard is considered to have failed, and the corresponding EEM scheme is deemed to have serious defects.
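The difference between the earlier count-based rule and the weighted rule can be seen in a minimal sketch. We read "less than half" as less than half of the total utility weight. The weights below are hypothetical placeholders, not the values in Table 3, and the function names are ours.

```python
def utility_fails_weighted(weights: dict, met: dict) -> bool:
    """Weighted rule (this study): the utility standard fails if the weight of
    the satisfied utility criteria is less than half of the total utility weight."""
    total = sum(weights.values())
    achieved = sum(w for criterion, w in weights.items() if met.get(criterion, False))
    return achieved < total / 2


def utility_fails_by_count(met: dict) -> bool:
    """Earlier unweighted rule [2,18]: the utility standard fails if more than
    half of the utility criteria are not met."""
    unmet = sum(1 for ok in met.values() if not ok)
    return unmet > len(met) / 2


# Hypothetical weights for U1-U8 (NOT the values in Table 3).
WEIGHTS = {"U1": 6, "U2": 3, "U3": 3, "U4": 2, "U5": 2, "U6": 2, "U7": 3, "U8": 5}
# Hypothetical scheme that satisfies only the heavily weighted U1, U7, and U8.
MET = {c: c in {"U1", "U7", "U8"} for c in WEIGHTS}
```

Here five of the eight criteria are unmet, so `utility_fails_by_count(MET)` returns True. The satisfied criteria carry 14 of the 26 weight units, however, so `utility_fails_weighted(WEIGHTS, MET)` returns False. This is exactly the kind of case in which the two rules disagree.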

The Meta-Evaluation for Urban Sewage Treatment Management Evaluation Scheme
The meta-evaluation results for the USTM evaluation schemes of Guangxi, Shaanxi, and Inner Mongolia are shown in Table 4. Overall, all three provinces neglected several meta-evaluation criteria, including stakeholder identification, information scope, political viability, cost effectiveness, conflict of interest, valid and reliable information, justified conclusions, respect to the environment, and collaboration mechanism. Some of these are important, such as stakeholder identification, since the public is undoubtedly an important stakeholder in urban wastewater management [49]. A lack of public participation leads to several problems: (i) the conclusions drawn from the evaluation may be unfair; (ii) the monitoring of the evaluation may be questionable; and (iii) the proposed strategies for improving urban sewage treatment may not meet the actual needs of society. Indeed, sewage treatment problems in these three provinces have existed for a long time [50][51][52][53]. The value of extensive public participation in promoting environmental conservation is now widely recognized [54], and this meta-evaluation provides further scientific support for it. An evaluation scheme that does not emphasize the criterion "stakeholder identification" should therefore be considered seriously deficient, and the government should encourage public participation in EEM.
Some criteria were taken into account in all three provinces, including values identification, practical procedures, formal agreements, complete and fair evaluation, fiscal responsibility, program documentation, defensible information sources, analysis of quantitative information, and human responsibility. However, these criteria account for no more than one third of the total, and most criteria were not considered (see Table 4). The USTM evaluation schemes in Guangxi, Shaanxi, and Inner Mongolia are internal, administrative, and bottom-up: the Departments of Housing and Urban-Rural Development of the three provinces required their subordinate units, by administrative order, to evaluate USTM and then submit the evaluation results to them [42][43][44]. As a result, the evaluation schemes focus on operational steps, responsibilities, completed conclusions, and information records. Although such an approach can ensure that EEM activities are completed, it has several weaknesses: (i) internal evaluation lacks public participation and disseminates information insufficiently; (ii) evaluation by administrative order may make it difficult to ensure the personal dignity of lower-level evaluators during the evaluation process; (iii) bottom-up evaluation may turn the evaluator into a mere transmitter of information rather than an evaluator; and (iv) the continuity of evaluation is limited. In addition, final evaluation results based on the self-evaluation of lower-level administrative departments may be unreliable.
To better understand our conclusions, we interviewed some evaluators and found that (i) most evaluators treated the evaluation information as internal information, prescribed by higher-level authorities, that could not be released; (ii) most evaluators cared more about whether they had completed the assigned tasks than about the reliability of the evaluation information and environmental concerns; and (iii) a lack of cooperation among evaluators made most of them unwilling to conduct the next round of evaluation. These problems are consistent with, and further explained by, our meta-evaluation results, particularly the lack of consideration of the criteria "respect to the environment" and "collaboration mechanism". This agreement suggests that the EEM meta-evaluation standards we designed are reliable for problem identification. To solve these problems, the criteria "service orientation" and "evaluation continuity" should be integrated into the evaluation schemes, and the administrative management approach should be improved. Introducing multiple evaluation subjects, peer review, and random sampling evaluation are feasible administrative solutions that together can improve the quality of EEM.
The problems mentioned above are common to the three provinces. To show the quality of each scheme more intuitively and to facilitate comparison among provinces, we calculated the weighted scores of each province's meta-evaluation standards, as shown in Figure 4. The utility standard of Guangxi scored less than half of its maximum (score < 23.34), which shows that Guangxi's USTM evaluation scheme has serious defects. The evaluation scheme of Shaanxi is qualified (utility standard score = 25.85), and that of Inner Mongolia performed even better (utility standard score = 37.76). Another finding worth noting is that, among these three provinces, the more abundant a province's water resources are, the worse its USTM evaluation scheme appears to be: Guangxi has abundant water resources, but its meta-evaluation scores are the lowest of the three. This suggests that an abundance of natural resources may cause local people to lack awareness of sustainable development. Researchers have stated that a lack of natural replenishment could threaten the sustainability of water resources [55], and studies have found that sustainable development has been highly undervalued, or even overlooked, in resource-endowed areas [56]. Without awareness of sustainable development, even abundant resources will eventually be exhausted [40]. Therefore, the efficiency of natural resource management should be improved in all regions, especially in those with rich natural resources, in order to maintain sustainable development.
In summary, the meta-evaluation revealed many problems with the existing USTM evaluation schemes in Guangxi, Shaanxi, and Inner Mongolia, since each province obtained only a low score (the full score is 100), with Guangxi scoring lowest. According to the backward mechanism of meta-evaluation [2], the performance of an evaluation directly determines the effect of USTM, which in turn determines the performance of urban sewage treatment. In other words, the meta-evaluation results indirectly reflect the performance of urban sewage treatment. Meta-evaluation has played an important role in improving the accuracy of EEM. To examine this effect, we present the situation of untreated urban sewage in the three provinces [57], as shown in Figure 5. Since the Ministry of Housing and Urban-Rural Development of China launched the urban sewage management evaluation in 2010 [38], the proportion of untreated urban sewage in the three provinces has declined. However, urban sewage treatment is still far from ideal. For example, although the untreated rate of urban sewage in Inner Mongolia dropped to about 5% in 2018, more than 17,200,000 cubic meters of urban sewage were still left untreated [57]. This shows that the quality of USTM in the three provinces needs to be improved, which is one of the conclusions drawn from our meta-evaluation. Moreover, sewage treatment performance differed among the provinces. Guangxi had the largest number of sewage treatment plants of the three provinces [57], but its urban sewage treatment efficiency was the lowest. This indicates that urban sewage management in Guangxi was the least effective, which is also consistent with our meta-evaluation results.
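The Inner Mongolia figures for 2018 also convey the scale involved. As a rough check, assuming that the "about 5%" untreated rate and the 17,200,000 m³ of untreated sewage refer to the same total volume $V$ (the symbols below are introduced here for illustration), that total is approximately:

$$
V \approx \frac{V_{\text{untreated}}}{r_{\text{untreated}}} = \frac{1.72 \times 10^{7}\ \text{m}^3}{0.05} \approx 3.44 \times 10^{8}\ \text{m}^3
$$

Because the 5% rate is itself approximate, this is only an order-of-magnitude estimate, not a reported statistic. It illustrates why even a small untreated share still leaves a large absolute volume of sewage untreated.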

The Meta-Evaluation for National Key Ecological Function Areas Management Evaluation Scheme
In this case, meta-evaluation was used to assess EEM from the perspective of the evaluation process. According to the JCSEE guidelines [18], the evaluation process contains eight sections: defining evaluation issues, collecting information, analyzing information, proposing an evaluation report, making an evaluation budget, making an evaluation agreement, managing an evaluation, and hiring an evaluator. Each section corresponds to specific meta-evaluation criteria, so each part of the evaluation process can be assessed against its own subset of criteria. On this basis, a matrix of meta-evaluation criteria versus evaluation-process sections was constructed, and the NKEFAM evaluation scheme of Shaanxi was meta-evaluated. The results are shown in Table 5.
Note: • indicates that the criterion is reached; indicates that the criterion is not reached; indicates the criterion that needs to be evaluated for a specific evaluation section.
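The criteria-by-section matrix can be sketched as a mapping from each evaluation section to the criteria checked for it, from which the share of reached criteria per section follows directly. The section names follow the JCSEE process listed above. However, the criterion assignments, the reached/not-reached flags, the helper names `section_coverage` and `weak_sections`, and the 0.5 threshold are all hypothetical. The flags only loosely mirror the pattern reported for Shaanxi (information work relatively good, managing and hiring weak) and do not reproduce Table 5.

```python
from typing import Dict, List, Set

# Hypothetical section-to-criteria mapping (four of the eight sections); NOT Table 5.
SECTION_CRITERIA: Dict[str, List[str]] = {
    "collecting information": [
        "defensible information sources",
        "program documentation",
        "valid and reliable information",
    ],
    "analyzing information": [
        "analysis of quantitative information",
        "analysis of qualitative information",
        "justified conclusions",
    ],
    "managing an evaluation": [
        "formal agreements",
        "fiscal responsibility",
        "collaboration mechanism",
        "evaluation continuity",
    ],
    "hiring an evaluator": ["evaluator credibility", "conflict of interest"],
}

# Hypothetical set of criteria judged as reached for the scheme under review.
REACHED: Set[str] = {
    "defensible information sources",
    "program documentation",
    "analysis of quantitative information",
    "analysis of qualitative information",
    "formal agreements",
}


def section_coverage(matrix: Dict[str, List[str]], reached: Set[str]) -> Dict[str, float]:
    """Share of the criteria mapped to each section that were reached."""
    return {
        section: sum(c in reached for c in criteria) / len(criteria)
        for section, criteria in matrix.items()
        if criteria  # skip sections with no mapped criteria
    }


def weak_sections(coverage: Dict[str, float], threshold: float = 0.5) -> List[str]:
    """Sections in which fewer than `threshold` of the mapped criteria were reached."""
    return sorted(s for s, share in coverage.items() if share < threshold)


if __name__ == "__main__":
    coverage = section_coverage(SECTION_CRITERIA, REACHED)
    for section, share in coverage.items():
        print(f"{section}: {share:.0%} of criteria reached")
    print("weak sections:", weak_sections(coverage))
```

Keeping the matrix as data means a criterion can be listed under several sections, and the same scoring code applies to any scheme once its reached set is recorded.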
After reviewing the evaluation process, we found that the parts of the evaluation scheme dealing with collecting and analyzing information were relatively good. However, there were obvious deficiencies in managing the evaluation and hiring an evaluator, since most of the corresponding criteria were not met. Without effective management, evaluation becomes passive and lacks direction in practice, as evidenced by the unmet criteria and poor work in defining evaluation issues. One of the problems of the Shaanxi NKEFAM evaluation program stemmed from the evaluators, who worked in government departments. Being stakeholders themselves, the evaluators tended to demonstrate the positive effects of public services [58]. This led to unreliable evaluation data and reports (the criteria "valid and reliable information" and "impartial reporting" were not met), which is consistent with the findings of Ding et al. [14]. In addition, budgeting for the evaluation should be improved, since insufficient investment can also limit the effectiveness of the evaluation. To address these challenges, the management of the evaluation system should first be strengthened and the qualification requirements for evaluators raised.
In summary, the reliability and practicability of the meta-evaluation have been verified by the above two cases. Since meta-evaluation could discover important problems in environmental management and put forward targeted improvement measures, it can be widely used in the EEM field. In particular, carrying out meta-evaluation before implementing an evaluation scheme can provide diagnostic feedback to improve the EEM activities.

Conclusions
This paper creatively introduced meta-evaluation into environmental management to ensure the quality of evaluation schemes and promote the sustainable development of cities. To make meta-evaluation applicable, we established meta-evaluation standards comprising utility, feasibility, propriety, and accuracy, and used an analytic hierarchy process to assign weights to the criterion system. The proposed meta-evaluation standards proved to be practical and feasible. To demonstrate their application in practice, two types of evaluation schemes from two different fields of EEM were analyzed.
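The weighting step can be illustrated with a minimal analytic hierarchy process (AHP) sketch. The paper states only that AHP was used. This sketch assumes the common row geometric-mean approximation of Saaty's principal-eigenvector weights, together with Saaty's random index values for the consistency ratio. The 4 × 4 judgment matrix for the four standards is hypothetical. It was chosen only so that utility receives the largest weight, as reported, and it is not the evaluation team's matrix.

```python
import math
from typing import List

# Saaty's random consistency index (RI) for matrix orders 1..10.
RANDOM_INDEX = {1: 0.0, 2: 0.0, 3: 0.58, 4: 0.90, 5: 1.12,
                6: 1.24, 7: 1.32, 8: 1.41, 9: 1.45, 10: 1.49}


def ahp_weights(matrix: List[List[float]]) -> List[float]:
    """Approximate the principal eigenvector by normalized row geometric means."""
    n = len(matrix)
    geo = [math.prod(row) ** (1.0 / n) for row in matrix]
    total = sum(geo)
    return [g / total for g in geo]


def consistency_ratio(matrix: List[List[float]], weights: List[float]) -> float:
    """CR = CI / RI, where CI = (lambda_max - n) / (n - 1)."""
    n = len(matrix)
    if n <= 2:
        return 0.0  # matrices of order 1 or 2 are always consistent
    # Estimate lambda_max as the mean of (A w)_i / w_i.
    aw = [sum(a * w for a, w in zip(row, weights)) for row in matrix]
    lambda_max = sum(x / w for x, w in zip(aw, weights)) / n
    ci = (lambda_max - n) / (n - 1)
    return ci / RANDOM_INDEX[n]


STANDARDS = ["utility", "feasibility", "propriety", "accuracy"]
# Hypothetical pairwise judgments (row standard vs column standard), Saaty 1-9 scale.
JUDGMENTS = [
    [1,   3, 2,   2],
    [1/3, 1, 1/2, 1/2],
    [1/2, 2, 1,   1],
    [1/2, 2, 1,   1],
]

weights = ahp_weights(JUDGMENTS)
cr = consistency_ratio(JUDGMENTS, weights)

if __name__ == "__main__":
    for name, w in zip(STANDARDS, weights):
        print(f"{name}: {100 * w:.2f} points")  # scaled to a 100-point total
    print(f"consistency ratio: {cr:.4f}")
```

A consistency ratio below 0.1 is the conventional threshold for accepting a judgment matrix; above it, the pairwise judgments would usually be revised before the weights are used.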
For the USTM evaluation schemes in Guangxi, Shaanxi, and Inner Mongolia, the following problems were discovered: (i) the public is not treated as a stakeholder in any of the three provinces' evaluation schemes; (ii) all three schemes are internal, administrative, and bottom-up; (iii) the continuity of evaluation is restricted; and (iv) the more abundant a province's water resources are, the worse its USTM evaluation scheme appears to be. These major problems lead to a series of further problems. For example, the evaluation conclusions may be unfair, the evaluation cannot be properly monitored, the evaluation results are not made public, and there is no cooperation mechanism among the evaluators. These problems indicate that the management of urban sewage treatment is far from perfect and that urban sewage treatment in the three provinces has low efficiency. These conclusions are supported by the consistency among the meta-evaluation outcomes, the survey results, and the empirical data. The proposed potential solutions are as follows. First, the criterion "stakeholder identification" should be added to the evaluation schemes, and the government should encourage public participation in the EEM process. It is recommended that the environmental management department set up an evaluation committee and invite public representatives to participate in the evaluation and monitoring process as committee members. Second, the criteria "service orientation" and "evaluation continuity" must also be considered. Feasible measures include strengthening the training of evaluators and allowing evaluators to participate in formulating the rules of the cooperation mechanism in the evaluation scheme. The reliability of EEM can also be improved by introducing multiple evaluation subjects, peer reviews, and random sampling evaluations.
Third, the government should cultivate the concept of sustainable development [59], especially in resource-rich areas. Since green finance plays an important role in the environmental management control system [60,61], the government should effectively encourage green financial services and guide the green production behavior of enterprises so as to promote the achievement of sustainable development goals.
For the NKEFAM evaluation scheme in Shaanxi province, the parts dealing with collecting and analyzing information were found to be relatively good. However, the methods of managing the evaluation and hiring evaluators need further improvement, and budgeting for the evaluation should be improved as well. To solve these problems, the management of the evaluation system needs to be strengthened, for example by establishing a platform for managing the evaluation system and assigning dedicated professionals to run it. In addition, EEM should set higher standards for evaluators through institutional means, such as rules and regulations.
Overall, the main contribution of this study is to introduce meta-evaluation into environmental management and examine its effectiveness in practice. Meta-evaluation can provide diagnostic feedback to improve EEM activities before the evaluation is actually implemented, and can help users verify the reliability of the results after the corresponding evaluation scheme has been adopted.
Note that the meta-evaluation standards we proposed are applicable to general EEM. However, for some specific EEMs, certain criteria may not be applicable and should be removed, while new criteria may need to be developed. In addition, we found that the professional competence and reliability of evaluators are issues worthy of in-depth study. Future meta-evaluation research on EEM should pay more attention to the evaluation of evaluators' qualifications and the inspection of the operation process. Given the relevance of green finance to environmental management, it would be valuable to establish financial evaluation criteria in the meta-evaluation standards in order to promote sustainable development.

Data Availability Statement: All the data are publicly accessible, included in the manuscript, and available on the official websites of the local governments of the consulted provinces, as well as in the press.

Conflicts of Interest:
The authors declare no conflict of interest.