The Evaluation Framework in the New CAP 2023–2027: A Reflection in the Light of Lessons Learned from Rural Development

The new Common Agricultural Policy (CAP) proposal includes a few improvements compared to previous programming periods which may reinforce future evaluation, but we can also observe elements that may weaken the assessment, with the risk of repeating past failures. The objective of this essay is to analyse the new framework proposed for evaluation in the future CAP and to promote a collective discussion on how to make evaluations more usable, useful and reliable for users and practitioners. The first part of the paper analyses the main elements of evaluation during the different rural development programming cycles. A second part is dedicated to an examination of the current programming period (2014–2020) and the implications of the introduction of the Common Monitoring and Evaluation Framework (CMEF) and the evaluation plan. In a third part, we critically discuss the proposals for the next programming period and offer some concluding reflections and two main open questions. The analyses carried out bring out many elements that invite discussion on the role that evaluation has played and can play, and on the critical points to be addressed. The experiences in rural development policies have introduced important changes in theoretical and implementation terms. In particular, they helped to build evaluation capacity and enabled the involvement of civil society. However, it is also clear that the path designed by the European Commission (EC) has often led to increased rigidity and orthodoxy in compliance with common frameworks.


Introduction
The European Rural Development (RD) policies were introduced in the second half of the 1980s, as part of the reform of European structural policies, and are dedicated to farm competitiveness and sustainable growth in rural areas [1][2][3][4]. With the adoption of the LEADER (Liaison entre actions de développement de l'économie rurale) approach, now fully integrated into rural development intervention, these policies combine a strategic territorial vision with integrated participation at local level [5,6].
Over 30 years on from its conception, RD is about to face a new programming period. In June 2018, the EC presented legislative proposals on the future CAP aiming to make it more responsive to future challenges [7] while continuing to support European farmers for a sustainable and competitive agricultural sector. In October 2020, the Council and the European Parliament agreed to open the final stage of the approval process, the so-called trilogue, to reach a final agreement on the CAP legislation, which places the first and second pillars within a single framework.
One of the main innovations is the New Delivery Model (NDM), a new governance structure which, in the Commission's thinking, should steer the CAP from the current compliance-based approach to a performance-based one. From this perspective, the introduction of the National Strategic Plan (NSP) can be read as a tool to simplify the implementation of the CAP. However, it requires greater organisational effort from Member States (MSs), especially where a regionalised model is in place.
Like other authors [8][9][10][11][12][13][14][15][16], we want to go beyond a merely critical description of the Commission proposals and therefore conduct the analysis in the light of concrete programming and operational delivery experience.
This essay analyses the implications of the new model, with the aim of promoting a collective discussion on how to make evaluations more usable, useful and reliable for users and practitioners. First, the paper presents a summary analysis of the evolution of evaluation in rural development. This part highlights and compares the main distinguishing elements of evaluation in the different programming periods, drawing especially on the lessons learnt from the Italian Rural Development Programmes (RDPs). Then, the paper presents the current evaluation framework, focusing primarily on two very important elements: the CMEF and the Evaluation Plan (EP). Finally, we analyse the opportunities and critical points that could derive from the Commission's new proposal and suggest some conclusions and improvements, as well as two open questions.

Information Sources and Materials
The development of evaluation research is characterised not only by the emergence of different evaluation practices but also by its relationship with politics. In explaining the relationship between policy and evaluation, Martini points out that the purpose of the latter is to "offer policy-makers elements for judging the success of policies" [17]. This is an important point, which explains the path followed by the different evaluation approaches. This path should be guided by approaches focused on the effective use of evaluation results, through the broadening of evaluation topics and the definition of specific evaluation questions [18]. The goal should be the dissemination of an evaluation culture and the development of responsive administrative contexts (management responsive systems in Bandstein and Hedbolm [19]), in which administrations learn and develop the adaptive capacity to implement evaluation results [20,21]. This means critically analysing the evaluations carried out in order to clarify research aims and dimensions, especially to avoid confining evaluation to the function of accountability for public funding [22]. The construction of tailor-made evaluations [23], whose design is built on an analysis of the purposes of the evaluation itself, thus becomes relevant again. Against this theoretical background, the article critically analyses the evolution of CAP evaluation approaches in relation to their use and usefulness for decision-makers and all possible intended users [24], and examines the critical issues that the concrete application of evaluation has revealed. This article is based on a literature review relating to the European Structural Funds programming periods.
The article aims to provide a synthesis, as complete and exhaustive as possible, of the existing literature on the evaluation of rural development policies, in order to give a broad description of how evaluation approaches have evolved over time in relation to changes in evaluation requirements as stated in EU regulations. To this end, the reference documents for this essay were selected based on their contribution to understanding the specificities of evaluation in the field of rural development (CAP Pillar II), both theoretically and practically, with particular attention to the operational results of evaluations, highlighting the challenges faced by evaluators from the 2000-2006 programming cycle up to the forthcoming 2021-2027 period. It should be noted that only in 2014-2020 did a common monitoring and evaluation framework cover the whole CAP (both pillars) for the first time. Accordingly, this essay focuses on the various evaluation approaches proposed for CAP Pillar II, for which responsibility lies at MS level, while responsibility for assessing Pillar I lies with the EC. However, the lessons learnt from the present investigation are useful for facing the future challenge of assessing the NSP, which will integrate interventions under the two pillars of the CAP into a single strategy.
In line with Ramdhani et al. [25] (p. 47), "a literature review of a mature topic addresses the need for a critique of, and the potential reconceptualisation of, the expanding and more diversified knowledge base of the topic as it continues to develop". Therefore, the selection of relevant documents for review was based on a critical analysis, which offers a view of the state of knowledge on a subject, provides analytical frameworks and indicates avenues for future research. This method enabled us to examine the evolution of the main concepts, ideas and relations observed in the field of European RD Policy evaluation. Critical analysis is a useful method, since it provides a general view of the state of knowledge on a given theme; however, we are also aware of its main limitation, namely the authors' discretion in choosing the literature to be examined, as pointed out by Nakano and Muniz [26]: "In that sense, it does not intend to be exhaustive, authors are concerned to give the most accurate picture of the field from their point of view" [26] (p. 2). To overcome this limitation and build a robust critical analysis, we took into account the indications of several guidelines [27][28][29], in particular:
• several information sources and journals were explored;
• interpretations are presented in a clear way;
• assertions are logically supported.
Among the information sources used for this analysis, the EC website deserves mention; it was consulted to collect insights into the RD policy regulatory framework. Moreover, the review of existing studies [30][31][32][33][34], drafted on behalf of the EC, offered a comprehensive picture of the evolution of the European RD Policy.
Finally, the desk analysis was complemented by an overview of many documents, reports and guidelines published by the European Evaluation Network for Rural Development at EU level and by the National Rural Network in Italy, which can easily be downloaded from their websites.

The Evolution in the Assessment of Rural Development
Throughout the different programming periods of rural development, approaches to and demands for evaluation have changed, sometimes radically. Clearly, there should be a relationship between the evolution of a policy approach and the evolution of the system called upon to evaluate it. In some respects, this can be seen in the evolution of rural development policies, but this evidence cannot be extended to other contexts.
During the process of reforming the Structural Funds (SF), the focus on evaluation grew significantly and it became a central element in programming processes. The regulations defined very precisely the responsibilities, procedures and main evaluation approaches, albeit from an accountability point of view [9,12]. It is worth recalling that evaluation standards were developed by the Directorate-General for Budget to guide the Directorates-General in their evaluation work [16,35]. Regulation (EEC) No. 2052/88 established evaluation as a fundamental element and tool in programming and implementing interventions. This Regulation, accompanied by several working documents and guidelines, began to define a structured framework with questions, indicators and criteria [12,36,37]. The introduction of ex ante and ex post evaluations dates back to this period, with a view to making evaluation more useful and better aligned with the programme cycle.
It was an experimental phase, in which evaluation was still immature, with different views and very different levels of awareness of its usefulness and potential [9,38,39]. There was a need to build a shared vision and co-ordination, both at Community level (EC and MSs) and within individual MSs, and this pushed the Commission to draw up a set of orientations and guidelines (i.e., MEANS, STAR VI/7676/98) with a view to building a common approach and standardised procedures. The Commission identified the so-called intervention logic (or logical framework) as the central point of this process of standardisation [12,40]. However, all these efforts, although worthy, did not lead to a homogeneous framework or strengthen evaluation practices. Indeed, the evaluation reports of this period were often of low quality, and the use of evaluation results was substantially poor [38,39]. Little has been written about how evaluation contributed to designing better programmes [12]. As mentioned, this was also a transitional phase in policy formulation towards rural development, with the closure of the so-called structural policies and the conception of the new rural policy, between CAP reform and cohesion policy [4]. Accordingly, the guidance on evaluation was affected by this to some extent: the main indications came from Directorates-General other than the Directorate-General for Agriculture and Rural Development and were not always able to address the specificities of rural development.
Sustainability 2021, 13, 5528 4 of 18
Owing to the need to demonstrate the results achieved, the Regulations for the 2000-2006 cycle strengthened the monitoring and evaluation framework within a predominant logic of control and reporting; the Commission introduced relevant changes in the evaluation of both the SF (Reg. (EC) No. 1260/99) and RD (Reg. (EC) No. 1257/99). During this period, the Commission reinforced a common evaluation model, based on questions, criteria and indicators, with two documents: STAR VI/8865/99 and STAR VI/12004/00. In the latter, the evaluation questions developed by the EC in the Common Evaluation Questionnaire take on particular relevance. However, the Commission also introduced some innovations in terms of the use and governance of evaluation. Regarding the evaluation cycle, in addition to the ex ante and ex post evaluations, MSs were requested to provide an in itinere assessment to support RDP implementation and possible reprogramming in a timely manner [9]. In the period 2000-2006, evaluation had to take account of the introduction of the RDPs and Agenda 2000. This type of programming combined existing instruments, such as those deriving from Regulations (EC) 950/96 and 951/96, agri-environmental measures, deriving from the first pillar and of a compulsory nature, and new instruments of a territorial nature, such as those introduced by Art. 33 of Regulation 1257/99. The resulting RDPs were very complex and sometimes mosaic-like. Consequently, evaluation guidelines, such as STAR document VI/12004/00, attempted to provide a systematic approach to the evaluation of these interventions, reconstructing the logic of each intervention and proposing criteria and indicators. Governance-wise, the Commission promoted different solutions for managing the evaluation process (e.g., steering groups), suggesting a more active involvement of different actors (local authorities, socioeconomic partnerships, scientific community).
In Italy, for example, the good experience of the National Evaluation System (Sistema Nazionale di Valutazione), which aimed to share evaluation procedures and practices between evaluators and public administrations, is worth highlighting.
The Regulation for the 2007-2013 period presented a simpler approach to RD than that adopted in the previous period, placing more emphasis on the so-called strategic approach, which establishes a closer and more targeted relationship among the needs assessment, the identification of objectives and the choice of measures. In this light, evaluation became a strategic knowledge tool that could improve programming and implementation, helping to achieve RD objectives [4,13]. Regulation (EC) No. 1698/2005 reinforced both homogeneity and methodology, with the aim of guiding MSs towards a more effective assessment of RDPs, but also of ensuring better accountability of public spending [41]. According to Smismans [16], the key features derive from three Communications, "Reinforcing Evaluation" (2007), "Smart Regulation" (2010) and "Improving evaluation" (2013), and the related guidelines.
In this period, the most important step was the adoption of the CMEF, which provided a common design not only in terms of orientation but also in terms of commitments. The CMEF consisted of a set of documents drawn up by the Commission and agreed with the MSs, presenting a common understanding of evaluation and its uses, a set of common questions and an articulated system of indicators (common baseline, output, result and impact indicators). The rationale of the CMEF was to harmonise MSs' evaluations through common methodological and procedural references, to ensure their quality and to enable aggregation and comparison at European Union (EU) level [40,42]. However, there were no mandatory indications about judgment criteria, procedures and techniques for answering the evaluation questions, and this guaranteed evaluators a certain degree of autonomy [43,44].
The on-going approach was the second novelty of the 2007-2013 programming period, introduced to make assessment more relevant within the programming cycle [35]. As noted by Cristiano and Licciardo [45], the on-going approach configures evaluation as a process that starts from the initial stages of programming and provides the Managing Authorities (MAs), at every step, with the knowledge needed to improve the implementation of RDPs (see Figure 1); evaluation is therefore considered more as a learning process than as a mere product presented in an evaluation report [9,46-49].
A third element worth mentioning is the strengthening of networking activities at both EU level (Evaluation Helpdesk, European Evaluation Network for Rural Development, Evaluation Expert Committee) and national level (National Rural Networks), to build evaluation capacity and share experiences and knowledge. Table 1 provides a synoptic overview of the main indications in rural development evaluation across the different programming periods. Columns two and three list the official regulations and guidelines; columns four and five present the key elements and orientations set by the regulatory framework; column six, "Notes", summarises the main features of the evaluation exercises and the lessons learned from their actual implementation during the programming cycles analysed.

The Innovations Adopted for the Evaluation in 2014-2020 Programming Period
The current programming period is characterised by some important policy changes, such as the linkage between rural development and the SF within a single Common Strategic Framework. Nonetheless, the approach to evaluation shows path dependence on previous programming periods.
In 2014-2020, the CAP contributes to three general objectives, which together feed into the Europe 2020 objectives for smart, sustainable and inclusive growth: viable food production; sustainable management of natural resources and climate action; and balanced territorial development. Both CAP pillars contribute to these general objectives, which are broken down into specific objectives (priorities), some of which are common to both Pillars, whereas others are linked either to Pillar I or to RD (Figure 2). More specifically, Pillar II contributes to ensuring viable food production by enhancing farm viability and the competitiveness of all types of agriculture (priority 2) and by promoting food chain organisation (priority 3). At the same time, the sustainable management of natural resources and climate action is pursued by restoring, preserving and enhancing ecosystems related to agriculture and forestry (priority 4), as well as by promoting resource efficiency and supporting the shift towards a low-carbon and climate-resilient economy in the agriculture, food and forestry sectors (priority 5). To achieve balanced territorial development, Pillar II focuses on the multipurpose objective of promoting social inclusion, reducing poverty and fostering economic development in rural areas (priority 6). Alongside these, Pillar II foresees a cross-cutting objective on fostering knowledge transfer and innovation in agriculture, forestry and rural areas (priority 1). Owing to this common logical framework, the monitoring and evaluation system covers the whole CAP (both pillars) for the first time, under a CMEF [50]. In the 2014-2020 programming period, the CMEF and the Common Monitoring and Evaluation System (CMES) are often confused. The CMEF is the set of rules and procedures needed to evaluate the whole CAP, while the CMES comprises the rules and procedures within the CMEF that relate only to Pillar II. Together, the CMEF and CMES help to measure the overall performance of the CAP [51].
The most important elements concern the connection between monitoring and evaluation, which gives rise to a dichotomy between rigidity and flexibility in the proposed system. They can be summarised as follows:
1. the introduction of the new CMES for rural development as part of the CMEF;
2. the focus of the evaluation process via the EP;
3. the inclusion of evaluation results in Chapter 7 of the "enhanced" Annual Implementation Reports (AIR).
Over the different programming periods, the evaluation of rural development has undergone a substantial process of definition and systematisation, including strong standardisation. With the 2014-2020 CMES, the EC reinforces the vision of a "one-size-fits-all" system: a rigorous framework that is, in theory, able to meet Europe-wide needs in terms of evaluation and knowledge [38]. The new CMES can be seen as intended to provide the definitive, structured and common reference for evaluation practices in rural development [52].
The regulations substantially reaffirm the importance of evaluation as a fundamental tool to improve programmes and to promote shared learning; however, the guidance documents (e.g., the Technical handbook [50]) propose a very rigid interpretation of the CMES, based primarily on the common indicator system. In the Commission's vision, the common framework is intended to provide key information on overall CAP implementation and on its outputs, results and impacts. It quantifies the actions in the different RDPs, describes their achievements, highlights which instruments are most efficient and verifies how objectives have been reached. To justify the resources committed, outcome information needs to be credible and timely and should be used effectively to improve implementation. In this picture, monitoring and evaluation remain different exercises, but they are strongly complementary, or even closely integrated.
Briefly, Article 14 (Monitoring and evaluation system) of Regulation (EU) No. 808/2014 sets out the Common Monitoring and Evaluation System, which includes the common context, result and output indicators and the targets to be used for the performance review (the so-called indicator plan), the Common Evaluation Questions and the EP. MSs are encouraged to complement the common framework with the specificities of each programme [42], for instance by developing programme-specific evaluation questions in addition to the common ones [50], especially for strongly territorial issues such as LEADER [53]. However, the common system turned out to be much more rigid than expected and very demanding in terms of compliance and complexity, partly because of the requirement to report evaluation results within the AIRs.
The EP is a mandatory component of the RDP; it builds on the concept of on-going evaluation but shifts to a "during the programme" approach. It has often been pointed out that programmes have a specific life cycle and that RDPs can progress at very different speeds [35]. The EP is the instrument intended to give evaluation a degree of autonomy and flexibility. It is a tool for planning all evaluation activities during the programming period, setting out how and when they will be conducted. Unlike the previous programming period, there is no Mid-Term evaluation, because experience showed that the timing of the Mid-Term assessment was inadequate for generating valuable information for decision-makers [9,54-59].
We expected that the introduction of the EP would give MSs greater autonomy in setting up the assessment activities to be undertaken because, as a formal part of the RDP, it could encourage them to go beyond the minimum drafting requirements (Figure 3), both with reference to the EU evaluation objectives and, more importantly, by complementing these with programme-specific evaluations [60]. The EP should have been used first by the MS to carry out and follow up the evaluation activities, and only secondly to report activities and results in the AIRs.
We consider this transition from an ongoing process to a "during the programme" one, via the EP, to be a very relevant point. In principle, this decision could make evaluation more efficient and effective within the life cycle of programmes. The system put in place, however, appears too complicated and costly to offer real improvements. In several respects, the current framework frustrates the purpose of evaluation, relegating it to a function ancillary to monitoring and reporting. This becomes clear when we analyse the evaluation section of the enhanced AIRs. In 2017 and 2019, the AIRs include Chapter 7 as an additional element, in which MSs are requested to assess programme results in the light of progress towards the RDP's objectives and, where possible, to provide an early assessment of impacts.
Placing evaluation within the AIRs raises several critical issues: it blurs the demarcation between monitoring and evaluation, and evaluation itself becomes constrained by rigid monitoring formats. The enhanced AIRs can be said to replace the Mid-Term evaluation, but in practice they double the former Mid-Term evaluation requirements for MSs, making the process even more formal and less useful.
Furthermore, the enhanced AIR format, especially the very detailed one proposed for 2017, pushes evaluators towards solutions with evident weaknesses simply to provide at least some element conforming to the required format. These efforts, as unreliable as they are mandatory, have raised the cost of evaluation and diverted resources from exercises that could be far more useful (in Italy, for example, funding for evaluation activities fell from EUR 20.1 million in 2007-2013 to around EUR 15 million in the current programming period). At least in Italy, Chapter 7 of the AIRs received the largest number of observations from the EC services [61], and the quality of the evaluation results appeared very poor [62,63].

Post-2020: The State of Play
The Commission's legislative package consists of three proposals: (i) a regulation covering the architecture and rules of National Strategic Plans (NSP) and the types of interventions to be implemented by MSs [63]; (ii) a regulation on the common market organisation [64]; and (iii) a horizontal regulation on financing, managing and monitoring the CAP [65]. The reform package is accompanied by an impact assessment [66]. These proposals aim to make the CAP more responsive to current and future challenges, such as climate change and generational renewal, while continuing to support European farmers towards a sustainable and competitive agricultural sector [67] through more focused implementation and delivery. We point out that, for the very first time, the two CAP pillars work together in a genuinely complementary way under one strategic framework (Figure 3), focused on three general objectives, nine specific objectives and one cross-cutting objective (linked to the Agricultural Knowledge and Innovation Systems, AKIS). On the basis of this approach, complemented by new CAP tools, the European Commission considers the CAP reform proposal compatible with the Green Deal. In this context, CAP strategic plans are called upon to pay particular attention to, and reflect, the objectives of the European Green Deal [68]. Based on the territorial diagnosis and needs assessment, Member States will have to make the appropriate choices so that their NSPs can contribute effectively. The European Commission will verify the consistency of the strategic plans with the aggregate objectives of the Green Deal, but progress towards the Green Deal objectives will be monitored through the indicators already proposed for the future CAP [69].
The draft regulations set out a new model of governance and a new division of responsibilities between the Union level and MSs [70,71], with the aim of increasing simplification and flexibility. The Commission proposal is intended to shift the focus of the CAP from compliance with rigid requirements towards the achievement of set results, but with stronger common performance elements. The so-called New Delivery Model (NDM) is the implementing mechanism for the shift from a compliance-based to a performance-based system [72]. Under the NDM, each MS draws up a single NSP, based on the synergic action of different operations under both pillars of the CAP (the European Agricultural Guarantee Fund and the European Agricultural Fund for Rural Development). Some authors [70][71][72][73][74][75][76] consider the introduction of the strategic plan at MS level a crucial element, since it is intended to give MSs greater freedom and responsibility to design and implement their own policies, while the objectives are still defined at EU level [75]. The MSs build the NSP on a comprehensive intervention logic, starting from an evidence-based needs assessment [73]. In the political debate, the Commission proposals on the NSP have sometimes been framed in terms of a 'renationalisation' of the CAP. However, most authors see the NSP as offering more flexibility, not a renationalisation of the CAP. As pointed out by De Castro et al. [77], "In concrete terms, this means that the EU defines a series of basic parameters (in terms of objectives, types of intervention and minimum requirements), while the Member States, within a common general framework, choose the most appropriate solutions for their specific contexts, to allow for, according to the ambitions expressed by the Commission, the maximisation of their contribution to the objectives of the Union".
A more result-oriented policy requires a solid Performance Framework (PF) which, based on a set of common indicators, makes it possible to assess the performance of the new CAP and to promote a learning process (Art. 116, COM(2018) 392 final). As outlined by the Commission [78] (pp. 3-4), "the result orientation of the future European Structural and Investment Funds is based on three pillars: a clear articulation of the specific objectives of programmes with a strong intervention logic (the result orientation of programmes) and result indicators with definitions and measurable targets; the introduction of ex ante conditionalities to ensure that the necessary prerequisites are in place for the effective and efficient use of Union support; and the establishment of clear and measurable milestones and targets to ensure progress is made as planned (performance framework)".
The PF covers the elements needed to monitor progress towards the targets (performance review and clearance), as well as to conduct a continuous evaluation process of the NSP. Art. 121 (COM(2018) 392 final) proposes an annual report on the implementation of strategic plans, called the Annual Performance Report.
A new Performance Monitoring and Evaluation Framework is proposed. It will cover the performance of all instruments of the future CAP and represents an evolution of the 2014-2020 CMES, as highlighted in the impact assessment accompanying the regulatory proposals [66]. In relation to the EU specific objectives, the Commission proposes a set of common indicators (context, output, result and impact indicators) as the basis for monitoring, evaluation and annual performance reporting (Figure 4). The Performance Monitoring and Evaluation Framework will be organised around the following principles:
• context indicators remain pertinent for setting up and following up the intervention logic;
• common output indicators will annually link expenditure with implementation performance (performance clearance);
• a set of result indicators will be used to reflect whether the supported interventions contribute to achieving the EU specific objectives;
• annual performance follow-up will rely on a limited, but more targeted, list of common result indicators (performance review); at present, only one indicator is envisaged for each EU specific objective;
• multiannual assessment of the overall policy is proposed based on common impact indicators (evaluation).
In a nutshell, the New Delivery Model proposes a clearly defined and enforced framework for reporting NSP performance through the operations identified in the intervention logic in relation to the Specific Objectives. The ambition is simplification, greater result orientation and greater efficiency and effectiveness of the CAP; however, the framework appears much more rigid and mechanical than in the past, and it seems difficult to handle. Since the NDM is evidently based mainly on a performance review, what is the function and role of evaluation?
While, operationally, the common evaluation elements are to be established in the implementing regulations, the NSP regulation proposal contains provisions for the ex ante and ex post evaluations, as well as all other evaluation activities, and states that MSs shall draw up an EP indicating all intended activities. Unlike in current RDPs, including evaluation results in the annual performance report is not mandatory: only the report for the final year includes a summary of all evaluations conducted; however, the monitoring committee shall examine the progress made in evaluation activities. MSs will send only ex post evaluations to the EC, whereas all evaluations should be public and available. As in the past, functionally independent experts will carry out these evaluations, and MSs shall ensure that procedures are in place to produce and collect the data necessary for evaluation activities.
For its part, the Commission shall establish a multiannual evaluation plan of the CAP to be carried out under its responsibility (Art. 127). The Commission will undertake an overall interim evaluation by the third year of implementation, to examine the effectiveness, efficiency, relevance, coherence and added value of the European Agricultural Guarantee Fund and the European Agricultural Fund for Rural Development. The Commission shall also carry out an ex post evaluation.
In February 2019, the Council issued a working paper [79] intended to give specific answers to MSs' questions on exactly what kind of evaluations they have to set up. MSs will carry out evaluations of the NSP to improve the quality of its design and implementation, in relation to its contribution to the CAP general, specific and cross-cutting objectives. The evaluation shall judge the Plan's effects based on the common indicators/targets and detail the degree to which the NSP can be considered relevant, effective and efficient, and able to generate Union added value. The evaluation shall include lessons learnt, to identify any gaps or problems and any element that could improve the actions or their results and impacts.

[Figure 4. Performance Monitoring and Evaluation Framework. Monitoring policy performance: Annual Performance Clearance (linking expenditure to output) and Annual Performance Review (checking progress towards targets, based on common result indicators). Evaluation: assessing performance towards objectives, based on common impact indicators.]

Final Remarks and Two Questions for the Future
The evaluation of the CAP is part of the broader picture of the EU's determination to strengthen evidence-based policy-making and good governance [80].
Over the last 30 years, the Commission has developed a systematic and integrated approach to evaluation, and its objectives have remained twofold: (i) to provide knowledge and inputs for setting priorities and improving the formulation and design of policy measures, and (ii) to enhance the accountability and legitimacy of European policy action. In rural development in particular, important elements have been introduced, in both theoretical and operational terms, contributing to building a common evaluation capacity and to involving stakeholders in evaluation processes [9,13,41,46,56,81]. However, several critical elements must be highlighted, especially in terms of conformity and homogeneity [66]. Our schematic overview and the post-2020 proposals suggest that the evaluation toolbox is filling up, but we also observe that it is still unable to guarantee sufficiently sound judgements, clear communication and support for policy choices, since it remains unclear how, and by what means, MSs could build on their specificities and use evaluation findings to meet their own specific knowledge needs. Moreover, the current picture is still rather uncertain, both for the Commission and for the MSs [6,73,82,83].
So far, we have only a partial view of what the Commission expects from evaluation in the next programming period, but we agree with the impact assessment accompanying the legislative proposals [66] that CAP evaluation should evolve to reflect changes in CAP instruments and priorities [80,84,85]. Evaluation under the second pillar has, on the one hand, strengthened a common culture and interpretation and, on the other hand, developed approaches and refined techniques (e.g., counterfactual analysis). Unfortunately, the demand for compliance has often been stronger than the willingness to understand and learn, undermining evaluation results and their use [9,86-91].
A priori, the shift to a performance-based policy is welcome, as is the proposed simplification, but our essay reads the proposed evaluation framework primarily in the light of past experience. The future evaluation provisions are more concise, but perhaps more generic than in the past, and some authors warn against overly general indications, seeing a risk of weakening the commitment to evaluation ("as seen in the past, this very often implies a reduced coverage of the evaluation", Naldini [14]).
Through its growing demands, the Commission has helped to build evaluation and monitoring systems focused mainly on EU specific objectives, and MSs have responded to varying degrees, often by aligning with common questions on specific and narrow aspects of RDPs rather than by developing their own questions at territory and programme level. Some authors, especially in relation to the Cohesion Funds [14], believe that MSs should return to programme and territorial evaluations, and a similar argument can perhaps be made for the CAP. With the introduction of the NSP, it would be of great interest to shift the focus of evaluation from individual components (e.g., the 18 common evaluation questions on the focus areas in 2014-2020) to the programme as a whole: what overall change have we been able to activate in our territories?
The Commission's development of evaluation questions reflects a very traditional approach, both in its choice of evaluation criteria (effectiveness, efficiency, relevance, coherence, performance and added value) and in its strict link with the common system of indicators. Together, these two aspects induce MSs to carry out evaluations in an over-conforming way, seeking compliance more than knowledge and learning, whereas what we see as necessary is to strengthen and improve the evaluation process: its quality and usefulness more than its conformity and comparability. To do this, we do not need a revolution in the framework set out in the legislative proposals for the new CAP, but a clarification in the light of lessons learnt from the past.
As we have learnt, a very rigid monitoring and evaluation framework, however useful it may be for monitoring, ends up stifling evaluation [9,12,16,38,61]. The resources for evaluation, both budgetary and human, are finite, and too often they have been consumed answering the common compulsory parts, while specific evaluation issues addressing MS needs have been increasingly squeezed. A common framework at EU level is obviously necessary, but it cannot be a kind of black hole that absorbs all the resources. This problem, which has always been present, has been exacerbated in 2014-2020, when specific evaluation activities at MS level have been minimised despite the Commission's statements on evaluation specificity and flexibility [63,64]. In the current period, we have seen precisely this: the mandatory requirement to compile the highly structured Chapter 7 of the AIRs has absorbed too many resources, squeezing out RDP-specific evaluation questions and activities. For instance, the synthesis of the evaluation components of the 2017 enhanced AIRs [63] shows that the list of programme-specific evaluation questions is very short: none in Germany, three in France and only one in Italy.
We consider the EP an important and powerful tool for evaluation governance, but making it mandatory in RDPs has crystallised it, whereas it was expected to offer flexibility and an opportunity to tailor evaluation to MSs. It is therefore necessary to make EPs genuinely flexible and supportive, within a broader evaluation design.
The evaluation design is the right place to manage the process and to address the need for a robust methodological framework, with the concrete goal of understanding the change that the RDP has been able to trigger, if any. The challenge is to find the best approach to handling high levels of complexity, on the one hand, and to identify standards ensuring the quality, credibility and usability of evaluation results, on the other.
Finally, to stimulate debate, we suggest two main questions.
To Be Performing or Not to Be Performing?
The word performance is certainly the term that recurs most often in the proposals on the future framework, starting from the name itself (Performance Monitoring and Evaluation Framework), but performance cannot be the only focus. We do not underestimate the importance of ensuring the performance and accountability of public spending but, judging from the Commission proposals, it seems to be the main goal of evaluation.
We cannot agree. Evaluation should, first and foremost, identify evaluation questions within the appropriate institutional setting: actors, interests, approaches and focuses. However, what we can see now (regulation drafts, non-papers, presentations, comments) is that evaluation is mainly labelled as mere performance analysis. This view of evaluation is closer to the mainstream of EU regulation, and it is a rather narrow way of thinking about the rural development evaluation system. In this respect, most academics and think tanks question the relevance and effectiveness of the performance indicators proposed by the EC, also in the light of the European Green Deal [68]. Evaluation has wider boundaries than performance analysis and includes thematic and territorial readings, which are a requirement in a complex evaluation process, especially from the perspective of a common evaluation culture based on an in-depth understanding of policy making and implementation [40,92].
The shift to a result-based approach should not hinder the identification of a set of specific evaluation questions for each territory, beyond the common EU-level questions and the performance framework, nor should it drown evaluation in excessive obligations and monitoring reporting. In the case of the new CAP, however, we expect the opposite: we believe that the increase in monitoring and reporting requirements and an overemphasis on performance analysis will reverberate directly on evaluation. This will lead to additional complexity and will further increase the administrative burden for MSs, reducing the use and usefulness of evaluation.
In our opinion, the evaluation design is the right place to address the question of methodological robustness and the capacity to tackle high levels of complexity. In addition to answering common questions that enable comparisons at EU level, evaluation designs should be flexible enough to address programme needs, taking into account their specificities, and to foster wider and more active participation of programme bodies, stakeholders and beneficiaries in evaluation exercises, with the twofold objective of enhancing ownership of evaluation findings and thereby improving decision-making processes. This can be done by tailoring evaluation designs to programme strategies through logic models, and by leaving more room for participatory methods and tools (or mixed approaches) that enable a fruitful mutual exchange between programme actors and evaluators. Based on our practical experience, involving all the subjects contributing to programme implementation (through brainstorming, focus groups and similar methods), followed by regular communication of evaluation findings, makes it possible to build a shared vision of programme objectives and of possible corrective measures aimed at ensuring higher-quality programme management and implementation. The primary objective is to allow evaluation users to focus on their daily needs, perceiving evaluation not as the mere fulfilment of formal obligations but as an efficient instrument to highlight their achievements and to overcome the bottlenecks met during implementation, based on sound and robust methodologies and on lessons learnt.
This leads to the final point: considering the final users of evaluation findings. Keeping them in mind while designing the new evaluation framework would give greater meaning to the evaluation criteria and steer assessment towards more concrete knowledge.

Sound Evidence and Learning, but for Whom?
This question is intended to reflect on the meaning of the proposed evaluation criteria, such as relevance and effectiveness, and to steer assessment towards genuinely policy-relevant knowledge. In this sense, the debate in the rural development evaluation community should focus less on methodology, where the counterfactual approach cannot be the only point of discussion; in our opinion, it is time to return to the main question of how to make evaluation usable, useful and used, as well as reliable, for rural development actors rather than for academics. We maintain that the objective of reinforcing evaluation capacity and quality through the joint efforts of the EC and the MSs is still crucial today. It is possible, and desirable, to reduce the regulatory burden and let evaluations focus on the needs of the real stakeholders, even at the expense of an orthodoxy that is now more useful for theory than for practice, and of an overly stringent analysis of policy performance. The focus remains on understanding how a policy may have triggered the expected change in an area, in order to provide useful and robust guidance to those who live in and manage that territory.

Data Availability Statement: More information and the full data can be requested from the authors of the present work.

Acknowledgments:
The authors thank the Italian Ministry of Agricultural, Agri-food and Forestry Policies for supporting this study and all the multifunctional farms that were involved in the survey for providing the fundamental information basis of this work.

Conflicts of Interest:
The authors declare no conflict of interest.