Article

Enhancing Rural Economies Through Young Farmer Support: A Romanian Case Within the European Union Policy Framework

by Aurelia Ioana Chereji 1, Nicolae Bold 2,*, Monica Angelica Dodu 1, Ioan Chereji 1, Cristina Maria Maerescu 1,*, Doru Anastasiu Popescu 2 and Irina Adriana Chiurciu 3
1 Department of Animal Husbandry and Agrotourism, Faculty of Environmental Protection, University of Oradea, 26, Gen. Magheru Street, 410048 Oradea, Romania
2 Department of Mathematics and Computer Science, Faculty of Sciences, Physical Education and Computer Science, Pitesti University Center, National University of Science and Technology POLITEHNICA Bucharest, 1, Târgul din Vale Street, 110040 Pitesti, Romania
3 Faculty of Management and Rural Development, University of Agronomic Sciences and Veterinary Medicine Bucharest, 59, Mărăști Boulevard, 011464 Bucharest, Romania
* Authors to whom correspondence should be addressed.
Land 2026, 15(1), 131; https://doi.org/10.3390/land15010131
Submission received: 28 October 2025 / Revised: 18 December 2025 / Accepted: 27 December 2025 / Published: 9 January 2026

Abstract

The establishment of a young farmer in the rural economy is a key stage in the process of farm succession in the rural development environment. In this respect, Pillar II of the Common Agricultural Policy (CAP) takes a distinct approach to financing such establishment initiatives. A young farmer can obtain funds for their agricultural activity by submitting a funding project proposal to the national agency. The success of a funding project proposal depends on various factors. In this paper, prediction and classification models based on supervised learning algorithms, primarily Random Forest (RF) and Logistic Regression (LR), were developed to predict project selection outcomes and identify the key determinants of success. The models were built on historical data for proposals submitted in the period 2014–2021 through Sub-Measure 6.1 and, for the period 2023–2027, through the young farmer installation intervention under the 2023–2027 CAP Strategic Plan (DR-30, Young Farmer Installation, indicated in this paper as DR 30). In addition, a clustering analysis was used to identify patterns in the proposal data, yielding several project proposal clusters with common characteristics. The variables and selection criteria with the greatest impact on the final score and probability of acceptance were identified, highlighting the differences between sub-measures and the implications for generational renewal policies in rural areas. The novelty of this study lies in the integration of predictive modeling, classification, and clustering within a unified, policy-oriented analytical framework applied to real administrative data.
The results reveal that project selection outcomes are driven primarily by formal scoring components, while structural characteristics such as farm economic size and planned investment play a secondary but consistent role across programming periods. These findings provide actionable insights for refining selection criteria and advisory mechanisms under the Common Agricultural Policy.

1. Introduction

1.1. The Context of the Funding Process

1.1.1. Contextual Information

Generational renewal among farmers is essential for improving the performance and competitiveness of Romanian agriculture [1]. Romanian agriculture faces a major challenge: the aging of farmers. Due to limited resources and outdated equipment, older farmers have difficulty producing for the market. As a result, a large part of their production is used for their own consumption [2].
Across the European Union, rural areas face simultaneous pressures of demographic aging, declining population, and shrinking labor availability, all of which threaten the long-term viability of agricultural systems. Young farmer support schemes have therefore become central instruments in addressing structural decline by encouraging generational renewal, investment capacity, and the continuity of family farms. The demographic context is widely documented in both economic and social research, highlighting the need for policy measures that can counteract depopulation and strengthen rural resilience.
Therefore, this paper analyzes the programming periods 2014–2020 and 2023–2027 from the perspective of the funds allocated for the first installation of young farmers as managers of a holding. This support was provided through two measures: under the National Rural Development Programme (PNDR) 2014–2020, Sub-Measure 6.1 (support for the installation of young farmers), with a financial allocation of EUR 580,612,526; and, for the current programming period, under the Common Agricultural Policy Strategic Plan 2023–2027 (PS PAC 2023–2027), Intervention DR 30 (support for the installation of young farmers), with an allocated value of EUR 250,691,764 [3,4].
Based on these data, this study uses machine learning-based analysis methods (predictive, classification, and cluster analysis models), applied to the mentioned data, provided by the Agency for Rural Investment Financing (AFIR) for measures 6.1 and DR-30, in order to identify the determinants of the score and the probability of selection. The obtained results enable the formulation of concrete recommendations regarding the criteria and conditions that can support the installation of young farmers and, implicitly, the renewal of generations in the Romanian agricultural environment. Generational renewal refers to the replacement of aging farm holders by younger entrants and represents a core objective of the Common Agricultural Policy.

1.1.2. Funding Dynamics

Established in 1957, the Common Agricultural Policy (CAP) has undergone numerous reforms, with Pillar II (rural development) playing a crucial role in encouraging investments, environmental protection, and, particularly, supporting young farmers, facilitating generational renewal in agriculture. The Common Agricultural Policy is one of the most significant drivers of agricultural land prices across Europe [5]. It is important to emphasize that the EU membership and the adoption of the Common Agricultural Policy have had different consequences for structural change in different Member States [6]. The continuous depopulation and land abandonment in rural areas of the EU, coupled with the predominance of an aging agricultural workforce, have brought generational renewal to the forefront of policy discussions since the 1970s, highlighting its crucial role in revitalizing both the farming sector and the broader rural landscape [7,8].
According to some authors [9,10], the participation of local communities, the recognition of regional specificity, and knowledge of local conditions are the starting points for formulating a comprehensive rural development strategy. Thus, for Romania, an essential aspect of Pillar II of the Common Agricultural Policy is the support of young farmers and the facilitation of generational renewal in agriculture [11].
In Romania, the aging of the rural population and the lack of attractiveness of agriculture for young people are major challenges. In order to stimulate the installation of young farmers, the Common Agricultural Policy 2023–2027, transposed at the level of Romania through the Common Agricultural Policy Strategic Plan 2023–2027 (PS PAC 2023–2027), offers the following: support for the installation of young farmers, measures to facilitate access to agricultural land and credit, vocational training programs, and innovation for the modernization of farms [12]. Rather than representing a first-time intervention, the current measures are a policy adjustment, building upon foundations laid under PNDR 2007–2013 (such as Measure 4.1) that provided instruments facilitating land acquisition or leasing. The current framework places renewed emphasis on land access as part of a broader strategy to support generational renewal. Romania, as a Member State of the European Union (EU), has benefited from financial support through the Common Agricultural Policy (CAP) to address a major problem for agriculture, namely the establishment of young farmers. The rural development programs implemented in Romania in the periods 2007–2013 and 2014–2020 and the current Common Agricultural Policy Strategic Plan 2023–2027 (PS PAC 2023–2027) all included attracting a new generation of farmers to rural areas among their areas of intervention.

1.1.3. Young Farmers

“Young farmer” is used here to refer to a person who is under 40 years of age at the time of application, who has the appropriate professional skills and qualifications, and who is setting up for the first time in an agricultural holding as the head of that holding [13,14]. At the same time, the beneficiary of young farmer support can also be a legal entity with several shareholders, where a young farmer (as defined in art. 2 of Regulation (EU) No 1305/2013) is set up and exercises effective long-term control over decisions relating to the management, benefits, and financial risks related to the holding, while holding at least 50% + 1 of the shares [1,15,16,17,18]. Financial support for young farmers was introduced starting in 2014 through The Young Farmers’ Payment Scheme, which has become a mandatory Programme for all Member States of the European Union [19].
The biggest problems faced by young farmers in Romania are insufficient funding within rural development programs (for example, under PS PAC 2023–2027, Intervention DR 30, 5611 funding applications were submitted, of which 3580 were selected to cover the amount allocated for the entire funding period), land fragmentation (holdings with dispersed land, which prevents the practice of high-performance agriculture), the lack of a cadastre, problems related to the marketing of production, and climate change factors that require modifications to crop plans and to the varieties used. The age structure of farm managers in Romania is not favorable, reflecting the general trends of population aging, a phenomenon also present in other European Member States [20]. The new Common Agricultural Policy highlights these challenges and was developed with the aim of being the primary support for farmers.
The support is granted to facilitate the start of agricultural activities for the establishment of the young farmer. Thus, by fulfilling the objectives proposed in the Business Plan, the young farmer is considered to have established himself, and the support granted through the EAFRD has achieved its strategic objective [16]. The support for the establishment of young farmers in Romania was very well received by potential beneficiaries, so that the number of funding applications submitted in the periods analyzed far exceeded the amount of support allocated according to the official reports of the Agency for Rural Investment Financing (AFIR). During the previous financial framework period, the reluctance toward European funds was greater, but over time, farmers’ confidence has increased [21].
Some authors consider that the apparent shortage of young farmers occurs in Member States where small-scale holdings are more prevalent, particularly Portugal, Italy, Romania, and Greece [22]. At the same time, farms that have been managed by the same family over several generations are more likely to be transferred to new generations of the same family [23]. This is also the case in Romania, as young people choose to remain in rural areas and take over agricultural operations through the support offered for the installation of young farmers. Intergenerational farm transfer in particular is increasingly viewed as fundamental to the sustainability and development of global agriculture, and financial support is very important [24,25]. In Romania, intergenerational farm transfer is constrained by fragmented land ownership, limited succession planning, and demographic pressures, as documented by national empirical studies on farm structure and demographic aging [8,26,27]. These structural patterns reduce the pool of potential successors and reinforce the vulnerability of small and aging farms. Young farmer support serves not only as a precondition for increasing the educational level of farmers but also as a tool to stop emigration from the rural regions of new EU Member States [21,28,29]. It is also important to note that, in Romania, this support included a dedicated allocation for young people returning from the Diaspora, and interest was considerable, with applicants requesting the allocated amount.
The type of support provided to young farmers in Romania can be compared with examples from other European Union Member States that have similar agricultural structures. According to Eurostat data, in Romania, over 92% of farms are under 5 hectares, which indicates a fragmented agricultural structure. Similar situations can be found in Bulgaria, as well as in Poland, where approximately 75% of farms are under 10 hectares. However, Poland has managed to improve this situation through dedicated measures within the Common Agricultural Policy (CAP), in particular through land consolidation policies and institutional facilities created for young farmers. Thus, mechanisms have been implemented through which young farmers can access land through lease or purchase with the support of state agencies. In Romania, the lack of these mechanisms has led to increased difficulty in accessing land, which affects the competitiveness and economic viability of young farmers’ farms. Therefore, Poland’s experience could serve as a model for formulating complementary national policies that would facilitate access to land resources and support the transition to economically and agronomically viable farms. This study considers similar aspects in relation to comparable situations in other parts of the European Union [30,31,32].
Regarding the eligibility of young farmers to receive support for taking over agricultural holdings, applicants must meet specific conditions related to age, professional skills, and farm management. Other authors also highlight support for the setting up of young farmers, noting that generational renewal is a key element in implementing the specific objectives of the Common Agricultural Policy 2021–2027 [33]. The importance of the measure in attracting young, educated farmers to agriculture has also been emphasized in previous studies [34,35].
This study investigates the factors influencing the success of young farmers’ funding proposals within Romania’s rural areas, specifically focusing on the financial aid provided by Common Agricultural Policy measures 6.1 (2014–2021) and DR 30 (2023–2027). Our methodology analyzes key operational data from project proposals submitted during these periods, sourced from national agency databases. This selection ensures alignment with the relevant implementation phases of the analyzed policies, offering a framework for understanding recent developments and enabling a structured comparison between two successive funding cycles. While the study’s scope is limited to these specific operational intervals, future research could expand the analysis to include broader historical data and subsequent results, as they become available, to identify longer-term trends.
While numerous studies have explored the implementation of the Common Agricultural Policy (CAP) and the challenges faced by young farmers across the European Union, there remains a notable gap in empirical research using quantitative, data-driven approaches, particularly those involving predictive modeling, to evaluate the determinants of project success at the national level. This is especially true in the case of Romania, where programmatic evaluations often lack analytical depth and fail to systematically assess how selection criteria influence outcomes.
While similar machine learning frameworks have been applied to young farmer support in other Member States, such as Lithuania (e.g., [28]), such approaches remain limited in scope and country coverage. This study extends the existing evidence by applying a comparable methodological framework to the Romanian context, characterized by distinct institutional arrangements and scoring mechanisms. This study addresses this gap by developing and applying predictive and classification models to real-world project data submitted under Measures 6.1 (2014–2021) and DR-30 (2023–2027) in Romania. Through this approach, we identify the most influential criteria that shape the probability of funding success while also uncovering distinct clusters of applicants with common characteristics. The results not only contribute to a better understanding of the current policy performance but also offer actionable insights that can inform the redesign of evaluation criteria, the development of support tools for applicants, and the strategic alignment of future rural development programs. Building on this foundation, future research could extend the model by integrating environmental indicators, socio-demographic profiles of applicants, and longitudinal success rates, offering a more holistic view of farm sustainability and the dynamics of generational renewal.

1.2. Conceptual Background and Research Objectives

The evaluation of support schemes for young farmers is a central theme in the wider discourse on agricultural policy, generational renewal, and rural development in the European Union. Young farmers face well-documented structural barriers related to land access, capital constraints, limited professional experience, and regional disparities, making targeted policy instruments a key component of Common Agricultural Policy objectives. Understanding how these instruments function in practice, and which factors shape access to funding, is therefore essential for assessing the effectiveness and fairness of the support mechanism.
Research on generational renewal has examined the socio-economic characteristics of young farmers [22], the institutional and structural determinants of farm succession, as well as the performance and targeting of rural development measures [23]. While Common Agricultural Policy Pillar II includes important instruments aimed at supporting young farmers and facilitating generational renewal, several studies have shown that its overall impact on demographic turnover remains limited, with structural constraints and market dynamics continuing to hinder succession processes. However, empirical evidence on how scoring criteria, applicant attributes, and contextual factors interact to influence selection outcomes remains limited, particularly in the case of Romania. This gap highlights the need for analytical tools capable of integrating complex relationships while maintaining interpretability for policy discussion.
This study contributes by shifting from traditional regression-based assessments to supervised classification models that can better capture nonlinear relationships and complex interactions underlying selection outcomes. This approach is particularly suitable for the Romanian context, where administrative scoring rules and applicant heterogeneity create decision boundaries that are not well modeled by linear specifications.
This analytical approach integrates three complementary components: descriptive statistical assessment (STAT), supervised predictive modeling (PRED), and clustering-based profiling (CLASS). Together, these components form a diagnostic tool designed not merely to predict funding outcomes but to reveal the mechanisms through which selection criteria, project characteristics, and applicant profiles influence access to support. Unlike standard predictive workflows, the framework is explicitly grounded in Common Agricultural Policy logic, connecting model outputs to generational renewal objectives, scoring structures, and structural constraints affecting young farmers.
The present study addresses this gap by proposing and applying the SPCC–CAP framework, a structured approach designed to analyze the determinants of project success within the young farmer support scheme. The framework integrates descriptive statistics (STAT), supervised prediction (PRED), and clustering analysis (CLASS), enabling the examination of scoring dynamics, selection probabilities, and applicant profiles within a coherent evaluative logic. It is conceived not as a purely technical modeling exercise but as a diagnostic tool that links data-driven evidence with mechanisms relevant to Common Agricultural Policy design, such as score formation, timing effects, and structural constraints affecting young farmers.
The research objective is to identify the key factors that shape access to support for young farmers and to assess whether the functioning of the scheme aligns with its intended goals of generational renewal, competitiveness, and territorial balance. The analysis is guided by the following research questions:
1. Which applicant and farm characteristics most strongly influence score formation and selection outcomes?
2. Do these determinants reflect the priorities stated in the young farmer support measure?
3. Are there identifiable groups of applicants who systematically benefit from or are disadvantaged by the selection mechanism?
By articulating these objectives and questions, and by situating the SPCC–CAP framework within the relevant policy and academic literature, this study clarifies the conceptual purpose of the framework and establishes its contribution beyond descriptive summaries or isolated modeling components.
Positioning the analysis within this demographic context clarifies the relevance of the SPCC–CAP framework, as understanding which applicants achieve higher scores and which profiles face structural disadvantages provides insight into how effectively young farmer measures can contribute to reversing rural depopulation trends.
The use of predictive analytics and machine learning techniques in evaluating agricultural policy instruments is growing, and several recent studies have applied such approaches, including in the context of young farmer support and structural change. The present analysis builds on this literature rather than claiming methodological novelty.
In addition to offering analytical guidance for applicants, this study is primarily intended to support paying agencies and policymakers by providing evidence on how scoring rules, applicant characteristics, and structural disparities shape access to young farmer support. The insights generated through the SPCC–CAP framework aim to inform program design and facilitate more effective targeting within Common Agricultural Policy implementation.

2. Materials and Methods

2.1. Purpose and Objectives

The main goal of this work is to use analysis methods based on machine learning and predictive statistical modeling to identify the factors and criteria that significantly influence the score and probability of selection of projects submitted by young farmers under Sub-Measure 6.1 and Intervention DR-30, in order to formulate recommendations for improving support policies and facilitating generational renewal in the agricultural environment.
Beyond applying prediction methods, the work aims to better understand the factors influencing project selection within the Common Agricultural Policy measures for young farmers.

2.2. Data Description

2.2.1. Data Source

To this end, two distinct databases of projects, one for each sub-measure (DB6 for the data related to Sub-Measure 6.1 and DB30 for the data related to Intervention DR-30), were compiled from official data sources [36]. The datasets contain 16,129 instances (DB6) and 5827 instances (DB30).

2.2.2. Data Structure

The input data consisted of a compilation of the main characteristics of the project funding proposals. Among these characteristics, several can be enumerated:
1. identification data related to the project proposal, such as the measure and sub-measure code;
2. geographical data, such as the NUTS-2 region code (from 1 to 7), the county classification code (from 1 to 42), the county name according to the county classification code, and the administrative unit (municipality);
3. temporal data, related to the calendar date of funding proposal submission;
4. legal representative data, such as the legal form of the organization on whose behalf the proposal was made (e.g., sole proprietorship, Limited Liability Company (LLC), etc.), its legal representative (surname and name), Value-Added Tax (VAT) code, and the Standard Output (SO) value of the organization;
5. technical assessment data, such as the proposal pre-scoring (self-estimated score), the official assessment score calculated by the Agency for Rural Investment Financing (AFIR), the values of the scores for the specific criteria used in the computation of the official assessment score, the selection status, and the criteria of rejection (if the proposal was denied);
6. financial data, related to the funding indicators: the eligible project value, the public value, and the total cumulated value of the project.
An extraction of several characteristics was made for the model. The selected characteristics that form the set of independent variables or factors are presented in Table 1.
A short observation must be made on the eligibility criteria for both sub-measures. Table 2 and Table 3 explain the criteria for each sub-measure. The abbreviations NM and M used in the tables stand for “non-mountainous” and “mountainous”, according to the zonal classification used by AFIR within the PNDR. Furthermore, the 8000 threshold for SO reflects the minimum standard output required by AFIR eligibility criteria for young farmer projects, as defined in the national implementation guidelines for Sub-Measure 6.1. This program rule determines the minimum economic size necessary for an application to qualify. SelC represents the original categorical selection outcome (“Selected”/“Not selected”) as recorded in the AFIR database. SelN is the binary numerical encoding (1/0) required for machine learning algorithms that operate only on numeric inputs. SelN is therefore not an analytical duplicate of SelC but a technical transformation used exclusively for model training and evaluation.
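The SelC to SelN transformation and the SO threshold check can be illustrated with a minimal Python sketch. The function names, the record layout, and the inclusive comparison (`>=`) are assumptions made for illustration only; the actual preprocessing used in the study is not documented here.

```python
# Minimal sketch (assumed names and record layout, not the authors' pipeline).
SO_MIN_THRESHOLD = 8000  # minimum standard output stated in the text


def encode_selection(sel_c: str) -> int:
    """Map the categorical outcome SelC to the binary SelN (1 = selected)."""
    mapping = {"selected": 1, "not selected": 0}
    key = sel_c.strip().lower()
    if key not in mapping:
        raise ValueError(f"Unexpected SelC value: {sel_c!r}")
    return mapping[key]


def meets_so_threshold(so_value: float, threshold: float = SO_MIN_THRESHOLD) -> bool:
    """Assumption: the threshold is inclusive (SO >= 8000 qualifies)."""
    return so_value >= threshold


# Hypothetical records, for illustration only.
records = [
    {"SelC": "Selected", "SO": 12500.0},
    {"SelC": "Not selected", "SO": 7900.0},
]
for record in records:
    record["SelN"] = encode_selection(record["SelC"])
    record["SO_ok"] = meets_so_threshold(record["SO"])
```

Raising an error on unexpected labels, rather than silently mapping them to 0, keeps data-entry inconsistencies visible before model training.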

2.2.3. Expected Data

Table 4 summarizes the expected output data generated by the predictive, classification, and clustering models, including the performance metrics, the importance of the explanatory variables, and the interpretation formats used for policy-relevant insights.
The model data will be obtained and interpreted in Section 3.

2.3. Methods

2.3.1. Descriptive Statistics—STAT

Descriptive statistics analysis (STAT) was used to provide an overview of the dataset, highlighting the distribution, central tendency, dispersion, and completeness of the variables used in the modeling process. This method employed the following tools:
  • Measures of central tendency: mean, median, and mode for numeric variables.
  • Dispersion indicators: standard deviation, minimum, maximum, and range.
  • Distribution analysis: histograms and boxplots for key variables.
  • Categorical analysis: frequency tables for county, region, and selection status.
  • Missing values assessment: proportion of missing data per variable.
  • Data quality check: identification of potential outliers and inconsistencies.
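The STAT tools listed above can be sketched with the Python standard library. This is an illustrative sketch only: the sample values are invented, the function names are hypothetical, and the 1.5 × IQR outlier rule is an assumption, since the text does not state which outlier criterion was applied.

```python
import statistics
from collections import Counter


def describe_numeric(values):
    """Central tendency, dispersion, and missing share for one numeric variable."""
    present = [v for v in values if v is not None]
    return {
        "n": len(present),
        "missing_share": 1 - len(present) / len(values),
        "mean": statistics.mean(present),
        "median": statistics.median(present),
        "mode": statistics.mode(present),
        "std": statistics.stdev(present),
        "min": min(present),
        "max": max(present),
        "range": max(present) - min(present),
    }


def iqr_outliers(values, k=1.5):
    """Flag values outside [Q1 - k*IQR, Q3 + k*IQR] (assumed rule)."""
    present = sorted(v for v in values if v is not None)
    q1, _, q3 = statistics.quantiles(present, n=4)
    low, high = q1 - k * (q3 - q1), q3 + k * (q3 - q1)
    return [v for v in present if v < low or v > high]


# Invented example data: scores with one missing value, and region labels.
scores = [30, 35, 35, 40, 45, None, 95]
summary = describe_numeric(scores)
outliers = iqr_outliers(scores)
region_counts = Counter(["NE", "NE", "S", "W"])  # frequency table
```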

2.3.2. Predictive and Classification Models (PRED)

Supervised machine learning methods were applied to address two tasks: predicting the official AFIR score (OS) as a continuous target variable and classifying the project selection status (SelN) as a binary target variable. The methods (also referred to as models) used are:
  • Linear Regression—models the relationship between predictors and a continuous target using a linear equation.
  • Logistic Regression—estimates the probability of a binary outcome based on predictor variables.
  • Random Forest—ensemble method using multiple decision trees to improve prediction accuracy.
  • Gradient Boosting—sequential ensemble method that builds models to correct the errors of previous ones.
  • k-Nearest Neighbors (kNN)—classifies or predicts based on the closest data points in the feature space.
  • Neural Networks—computational models inspired by the human brain, capable of learning complex patterns.
  • Stacking Ensemble—combines predictions from multiple models to produce a more accurate final result.
The predictive framework relies on supervised classification algorithms implemented in Orange Data Mining (University of Ljubljana, version 3.38), with Random Forest and Logistic Regression serving as the primary models. Random Forest was used with its default configuration (100 trees, bootstrap sampling, Gini impurity splitting), which is suitable for capturing nonlinear relationships and variable interactions commonly present in administrative scoring data. Logistic Regression was included as a benchmark linear classifier, estimated using an L2 regularization scheme and the default solver.
Model performance was evaluated using Orange’s Test and Score widget, which provides a standardized and reproducible evaluation pipeline. The following validation metrics were used: Accuracy (overall classification correctness), AUC (discriminatory capacity), Precision and Recall (error distribution across classes), F1-score (harmonic mean of precision and recall), and the Matthews Correlation Coefficient (MCC), which is robust to class imbalance. These metrics were used consistently across all tested classifiers to ensure a comparable assessment of predictive capacity.
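For reference, the threshold-based metrics named above can be computed from the confusion-matrix counts as in the following sketch. It treats SelN = 1 as the positive class, which is an assumption (Orange can also report class-averaged values), and it omits AUC because that metric requires predicted probabilities rather than hard labels. The labels are invented.

```python
import math


def confusion_counts(y_true, y_pred):
    """Return (TP, TN, FP, FN) for binary labels, with 1 as the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn


def classification_metrics(y_true, y_pred):
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    # MCC uses all four cells, which makes it informative under class imbalance.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "mcc": mcc,
    }


# Invented labels: 3 true positives, 3 true negatives, 1 of each error type.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 1]
metrics = classification_metrics(y_true, y_pred)
```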
Model performance was assessed using standard regression and classification metrics (detailed descriptions are provided in Appendix A). The Neural Network model corresponds to the default multilayer perceptron (MLP) implementation in Orange Data Mining, which uses the scikit-learn backend. The default architecture consists of a single hidden layer with 100 neurons and ReLU activation, trained with the Adam optimizer and early stopping enabled. No additional regularization (e.g., dropout) is applied in the default configuration. All predictive models were trained using Orange Data Mining’s default evaluation procedure, which applies a 70/30 train–test split with automatic stratification for the imbalanced SelN outcome. Orange performs feature normalization automatically for algorithms that require it (e.g., Neural Networks, SVM), relying on standard z-score scaling. These default settings were retained to ensure transparency and reproducibility. Confidence intervals or standard errors are not available in Orange’s default evaluation framework and are therefore not reported here. Given the large size of the datasets, small differences between model metrics should be interpreted with caution, as they may not reflect substantive performance gaps.
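The two preprocessing ideas mentioned here, a stratified 70/30 split and z-score scaling, can be sketched in plain Python as follows. This is not Orange's internal code: the function names, the fixed seed, and the use of the population standard deviation are assumptions made for illustration.

```python
import random
import statistics
from collections import defaultdict


def stratified_split(labels, test_share=0.3, seed=0):
    """Split indices so each class keeps roughly the same share in both parts."""
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    rng = random.Random(seed)  # fixed seed (assumed) for reproducibility
    train_idx, test_idx = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        n_test = round(len(indices) * test_share)
        test_idx.extend(indices[:n_test])
        train_idx.extend(indices[n_test:])
    return sorted(train_idx), sorted(test_idx)


def zscore_fit(values):
    """Estimate scaling parameters (population standard deviation assumed)."""
    return statistics.mean(values), statistics.pstdev(values)


def zscore_apply(values, mean, std):
    """Scale values to zero mean and unit variance using fitted parameters."""
    if std == 0:
        return [0.0 for _ in values]
    return [(v - mean) / std for v in values]


# Invented imbalanced outcome: 70 selected (1) and 30 not selected (0).
labels = [1] * 70 + [0] * 30
train_idx, test_idx = stratified_split(labels)

mean, std = zscore_fit([2.0, 4.0, 6.0])
scaled = zscore_apply([2.0, 4.0, 6.0], mean, std)
```

In this sketch the scaling parameters would be fitted on the training part and then applied unchanged to the test part, which avoids leaking test-set information into the model.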
In the context of this study, these models are not used only for prediction, but primarily as analytical tools to uncover the factors that shape access to support for young farmers. By modeling both the official score and the selection outcome, the PRED component allows us to quantify how specific criteria, farm characteristics, and temporal patterns influence the probability of receiving funding. This provides a structured empirical basis for evaluating the functioning of the support scheme and its alignment with the objectives of generational renewal in Romanian agriculture.

2.3.3. Clustering Analysis—CLASS

Unsupervised machine learning methods were applied to group projects with similar characteristics, without using a target variable. The following algorithms were used:
  • K-means —partitions data into k clusters by minimizing within-cluster variance.
  • Hierarchical Cluster Analysis (HCA)—builds a hierarchy of clusters based on data similarity, visualized as a dendrogram.
  • Silhouette Score—metric used to evaluate the quality and separation of the resulting clusters.
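The Silhouette Score mentioned in the list above can be illustrated with a short plain-Python sketch. The data points are hypothetical, already-scaled pairs and are not taken from the AFIR datasets; the function is a direct transcription of the standard definition s = (b − a) / max(a, b) and is not Orange's implementation.

```python
import math

def silhouette_score(points, labels):
    """Mean silhouette coefficient s = (b - a) / max(a, b) over all points."""
    clusters = sorted(set(labels))
    scores = []
    for i, p in enumerate(points):
        mean_dist = {}
        for c in clusters:
            members = [q for j, q in enumerate(points) if labels[j] == c and j != i]
            if members:
                mean_dist[c] = sum(math.dist(p, q) for q in members) / len(members)
        own = labels[i]
        if own not in mean_dist:          # singleton cluster: silhouette is 0 by convention
            scores.append(0.0)
            continue
        a = mean_dist[own]                                     # cohesion
        b = min(d for c, d in mean_dist.items() if c != own)   # separation
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)

# Hypothetical two-feature records forming two tight, distant groups.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
          (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
good = silhouette_score(points, [0, 0, 0, 1, 1, 1])   # labels follow the groups
bad = silhouette_score(points, [0, 1, 0, 1, 0, 1])    # labels ignore the groups
```

Values close to 1 indicate compact, well-separated clusters, while values near or below 0 indicate that points sit closer to another cluster than to their own.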
Clustering was performed exclusively on accepted projects because the aim was to identify internal structural patterns among successful applications. This approach inherently introduces a selection bias, as it does not reflect the full diversity of all applicants. The analysis therefore characterizes only the profiles of projects that met the program’s selection thresholds. Extending the clustering to the entire applicant pool is a relevant avenue for future research and would provide a more comprehensive representation of applicant heterogeneity.

2.3.4. Model Interpretation and Explainability (EXPL)

To interpret and explain the results of the predictive and classification models, we place particular emphasis on their ability to provide transparent evidence about how the support scheme operates in practice. Two complementary approaches are used:
  • Feature importance—measures the relative contribution of each input variable to the model’s predictions, indicating which criteria and farm attributes most strongly affect score formation and selection.
  • SHAP (Shapley Additive Explanations)—calculates the marginal contribution of each variable to individual predictions, based on cooperative game theory, and shows how specific combinations of characteristics increase or decrease the likelihood of funding for young farmers.
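The game-theoretic idea behind SHAP can be made concrete with an exact Shapley computation on a toy scoring function. This is a sketch under explicit assumptions: the function `toy_score`, its coefficients, and the all-zero baseline are hypothetical and unrelated to the AFIR grid or the fitted models, and practical SHAP tools rely on approximations rather than the full enumeration shown here.

```python
from itertools import combinations
from math import factorial

def shapley_values(predict, instance, baseline):
    """Exact Shapley values; features outside a coalition take their baseline value."""
    names = list(instance)
    n = len(names)

    def value(coalition):
        x = {f: (instance[f] if f in coalition else baseline[f]) for f in names}
        return predict(x)

    phi = {}
    for f in names:
        others = [g for g in names if g != f]
        total = 0.0
        for size in range(n):
            # Weight of a coalition of this size in the Shapley average.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                with_f = value(set(subset) | {f})
                without_f = value(set(subset))
                total += weight * (with_f - without_f)
        phi[f] = total
    return phi

# Hypothetical scoring function with one interaction term (ES x VC).
def toy_score(x):
    return 0.8 * x["ES"] + 0.5 * x["SO"] + 0.1 * x["ES"] * x["VC"]

instance = {"ES": 10.0, "SO": 4.0, "VC": 2.0}
baseline = {"ES": 0.0, "SO": 0.0, "VC": 0.0}
phi = shapley_values(toy_score, instance, baseline)
```

In this toy case the contributions sum to the difference between the prediction for the instance and the prediction for the baseline, and the interaction term is shared equally between the two variables involved.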
Within the SPCC–CAP framework, these explainability tools are not presented as purely technical diagnostics. They are used to evaluate whether the scoring grid and selection mechanism effectively prioritize young farmers with structural constraints, whether certain profiles are systematically advantaged or disadvantaged, and to what extent the observed patterns are consistent with the stated objectives of Common Agricultural Policy generational renewal policies.

2.4. Methodology

The methodological approach consisted of the following main steps:
Step 1:
Compilation of the input data: Data related to sub-measures 6.1 and DR-30 were collected from AFIR project records (2014–2020, 2021–present). Variables included official AFIR scores (OS), selection status (SelC/SelN), selection criteria scores (CS), financial indicators, and geographical identifiers.
Step 2:
Design and implementation of the components: The analysis was structured into three main components:
  • STAT—descriptive statistical analysis of the dataset.
  • PRED—supervised learning models for regression (OS) and classification (SelN), structured in the following sub-steps:
    (a)
    Preprocessing—handling missing values, encoding categorical variables, and scaling numerical features where applicable.
    (b)
    Train–test splitting—dividing the dataset into training and testing subsets to enable model evaluation on unseen data.
    (c)
    Model training—fitting each algorithm to the training dataset.
    (d)
    Model application—generating predictions and probabilities for the test dataset.
    The models applied in this step include:
    Regression models: Linear Regression, Random Forest, Gradient Boosting, k-Nearest Neighbors (kNN), Neural Networks, Stacking Ensemble.
    Classification models: Logistic Regression, Random Forest, Gradient Boosting, k-Nearest Neighbors (kNN), Neural Networks, Stacking Ensemble.
  • CLASS—unsupervised clustering of projects using K-means and HCA, including cluster number determination (Silhouette score) and cluster profiling.
Step 3:
Determination of the performance of the models: Model performance was assessed using standard evaluation metrics:
  • Regression—MSE, RMSE, MAE, and R².
  • Classification—AUC, CA, F1, Precision, Recall, and MCC.
  • Clustering—Silhouette score.
All modeling results were generated using the default workflows implemented in Orange Data Mining.
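For reference, the regression metrics named in Step 3 can be written out in a few lines of plain Python. The score values below are hypothetical and serve only to show how the four quantities relate; this is a sketch, not the Orange implementation.

```python
import math

def regression_metrics(y_true, y_pred):
    """MSE, RMSE, MAE and R^2 for paired observed and predicted values."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mse = sum(e * e for e in errors) / n
    mae = sum(abs(e) for e in errors) / n
    mean_true = sum(y_true) / n
    ss_res = sum(e * e for e in errors)                   # residual sum of squares
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)    # total sum of squares
    return {"mse": mse, "rmse": math.sqrt(mse), "mae": mae, "r2": 1 - ss_res / ss_tot}

# Hypothetical official scores and model predictions.
official = [50.0, 60.0, 70.0, 80.0]
predicted = [52.0, 58.0, 71.0, 79.0]
reg = regression_metrics(official, predicted)
```

RMSE and MAE are expressed in score points, whereas R² is the unit-free share of the variance in the observed scores that the predictions account for.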
Step 4:
Validation of the obtained data: Cross-validation techniques and consistency checks were applied to verify the stability and robustness of the models. Cross-validation was performed using Orange’s default random k-fold procedure. Because this method does not preserve the temporal ordering of applications, it does not eliminate the risk of look-ahead bias. A time-series cross-validation approach would be more appropriate for strictly longitudinal settings, and we acknowledge this as a methodological limitation of the present study.
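The time-ordered alternative referred to in Step 4 can be sketched as an expanding-window splitter. This is a hypothetical illustration, not a procedure used in the study: the function name, the number of splits, and the submission years are assumptions.

```python
def expanding_window_splits(timestamps, n_splits=3):
    """Yield (train_idx, test_idx) pairs in which no test item precedes a training item."""
    order = sorted(range(len(timestamps)), key=lambda i: timestamps[i])
    fold = len(order) // (n_splits + 1)
    # Items beyond (n_splits + 1) * fold are left unused in this simple sketch.
    for k in range(1, n_splits + 1):
        train = order[: k * fold]                 # everything up to the cut-off
        test = order[k * fold : (k + 1) * fold]   # the next block in time
        yield train, test

# Hypothetical submission years for eight applications (unsorted on purpose).
years = [2017, 2015, 2019, 2016, 2018, 2020, 2015, 2016]
splits = list(expanding_window_splits(years, n_splits=3))
```

Unlike random k-fold, each model in this scheme is evaluated only on applications submitted after those it was trained on, which is what removes look-ahead bias.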
Step 5:
Interpretation of the results: Feature importance and SHAP values were used to identify the most influential variables for prediction and classification, and to understand how they shape access to funding for young farmers. Clustering results were analyzed to profile typical project categories and applicant types, highlighting groups that face similar structural constraints or advantages. Together, these outputs were interpreted in the light of Common Agricultural Policy objectives, in order to assess the effectiveness, fairness, and targeting of the support scheme for young farmers.
To make the interpretative logic of the SPCC–CAP framework more explicit, Table 5 summarizes how each methodological component contributes to understanding the functioning, fairness, and effectiveness of the support scheme for young farmers. This structure ensures that machine learning models are used not only for prediction but also for generating policy-relevant insights.

3. Results

The results integrate statistical methods, supervised learning algorithms, and clustering techniques in order to offer a full data analysis of the project proposals within this framework.

3.1. Descriptive Statistics—STAT

The statistical component (STAT) of the model has the role of describing the data within the compiled dataset, providing insight for prediction and classification. The statistical analyses of the 6.1 and DR-30 sub-measures are presented, focusing on key metrics like mean, mode, median, dispersion, min/max values, and missing data. Data are visually represented through charts and geocharts to show geographical distribution and trends. Pearson correlation analysis is used to assess linear relationships between variables, such as the official and estimated scores. The dataset includes 16,129 projects for sub-measure 6.1 and 5827 for DR-30. These datasets represent all applications submitted to AFIR under the respective sub-measures during the analyzed programming periods, including both accepted and rejected proposals. The 16,129 records for Sub-Measure 6.1 correspond to approximately 10% of all applications submitted nationally between 2014 and 2020, while the 5827 records for DR-30 reflect 20% of all applications registered in 2023–2024. As the datasets include the full population of processed applications, they ensure complete representativeness and avoid sampling-related external validity concerns.

3.1.1. Descriptive Statistics (STAT)—Sub-Measure 6.1

Table 6 presents the descriptive statistics for the 16,129 projects submitted under Sub-Measure 6.1. The variables include project identifiers, administrative and temporal attributes, scoring criteria, financial values, and selection status. Missing data percentages reflect the fact that certain scoring criteria (e.g., CS6) apply only to specific project types.
The average official AFIR score (OS) is slightly lower than the self-reported estimated score (ES), suggesting a moderate degree of overestimation in applicants’ self-assessments. On average, the estimated score (ES) exceeded the observed AFIR score (OS) by a mean absolute difference of 1.01 points. As Orange Data Mining software does not provide confidence intervals for this comparison, these values should be interpreted descriptively. Criteria CS1 and CS4 tend to have the highest contributions to OS, while CS6 values are missing for most projects, indicating selective applicability. The majority of projects fall into the non-mountainous category (61.2%), and about 64.8% were selected for funding. These descriptive patterns reveal structural differences among applicants that later translate into disparities in score formation and selection outcomes. At the same time, these patterns provide the structural baseline needed to interpret scoring and selection disparities, clarifying which applicant characteristics matter most for young farmer support.
As shown in Figure 1, these distributions provide essential context for interpreting the model results, as they reflect the structural diversity of applicants and projects underlying the observed selection outcomes. The data for Sub-Measure 6.1 show that most projects come from small and medium-sized farms, with an average standard output of around EUR 17,000 and a typical eligible value of EUR 40,000. The high share of projects from non-mountainous areas and their concentration in certain counties reflects both the distribution of agricultural potential and the accessibility of the measure. The structure of the scores obtained by criteria suggests a relatively balanced competition between applicants, with greater variations on the criteria related to qualification and investment. These results confirm the role of Sub-Measure 6.1 as an essential support mechanism for the installation of young farmers and the stimulation of generational renewal in agriculture. These distributions highlight structural and regional disparities that shape how young farmers engage with the support scheme and influence their scoring outcomes.

3.1.2. Descriptive Statistics (STAT)—Sub-Measure DR-30

The data for Sub-Measure DR-30 include a total of 5827 projects, characterized by a wide range of administrative, technical-economic, and geographical indicators. The information covers calendar aspects (year, month, day) and evaluation elements (estimated score and AFIR score), selection criteria (CS1–CS6 and sub-criteria), financial data (eligible value, public value, EAFRD funding), and geographical distribution at county and regional level. The detailed structure of these variables is presented in Table 7.
The variability in economic size, criteria scoring, and investment values highlights heterogeneity in applicant readiness and capacity across DR-30.
Compared to DB6, the DR-30 distributions (shown in Figure 2) reflect differences in applicant profiles and project characteristics, which are subsequently captured by the predictive and classification models. The statistical analysis for Sub-Measure DR-30 highlights a recent competition, held between November 2023 and January 2024, with 5827 projects submitted. The regional distribution is relatively balanced, but certain counties, such as Bihor, concentrate a higher number of applications. The average AFIR score is 59.68 (minimum 10, maximum 97.94), and the estimated one is 61.49 (minimum 30, maximum 99.5), reflecting a moderate variation. The eligible and public values are uniform, at EUR 70,000, indicating a standardized support ceiling. The selection criteria show variable applicability, with some sub-criteria having up to 36% missing values, which confirms the different focus of the projects. These data underline the role of the measure in supporting agricultural investments and the installation of young farmers. These patterns show how applicant structure and timing influence scoring and access to young farmer support under DR-30.

3.1.3. Comparative Descriptive Analysis of Sub-Measure 6.1 and DR-30

Table 8 presents the general structure of the datasets for both measures, including size, selection rate, missing data, time frame, and geographical coverage.
The data for 6.1 represent a significant volume of applications collected over several years, while DR-30 represents a recent, high-demand call with a narrow time frame. Missing data is more prevalent in 6.1, reflecting optional or inapplicable criteria. The density of projects per month highlights a major difference between the two sub-measures, with approximately 207 projects/month for 6.1, compared to almost 2914 projects/month for DR-30, reflecting the much more focused and intense nature of recent DR-30 sessions. Next, Table 9 summarizes the main financial indicators for both sub-measures.
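The monthly densities quoted above can be related back to the dataset sizes with a short calculation. The project counts and per-month figures are taken from the text; the window lengths are not stated in this passage and are back-calculated here, so they should be read as implied values only.

```python
# Figures quoted in the text.
projects = {"6.1": 16_129, "DR-30": 5_827}
per_month = {"6.1": 207, "DR-30": 2_914}

# Implied length of each submission window, in months (back-calculated).
implied_months = {m: projects[m] / per_month[m] for m in projects}

# How many times denser the DR-30 submissions were than those for 6.1.
intensity_ratio = per_month["DR-30"] / per_month["6.1"]

for measure, months in implied_months.items():
    print(f"{measure}: about {months:.1f} months")
print(f"DR-30 is about {intensity_ratio:.0f} times denser per month")
```

Under these figures, the 6.1 density corresponds to a window of roughly 78 months and the DR-30 density to roughly two months.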
According to the data, Sub-Measure 6.1 finances smaller and more heterogeneous projects, while DR-30 applies a fixed grant amount, reducing variability and simplifying financial evaluation. Next, Table 10 compares the scoring statistics and selection criteria patterns.
Certain criteria systematically reach their maximum values, indicating they are easier to achieve. OS distribution patterns suggest cut-off thresholds for project selection. The systematic overestimation of ES may be partly explained by behavioral factors. Applicants tend to form expectations under limited information and often rely on heuristic interpretations of the scoring rules, which can lead to optimism bias. In addition, uneven access to advisory services and varying familiarity with administrative requirements may contribute to inaccurate self-assessment. These mechanisms help explain why ES consistently exceeds OS, highlighting both informational asymmetries and behavioral patterns in the application process. Next, Table 11 presents the main categorical variables relevant to each measure.
Both measures maintain national coverage; however, DR-30 is significantly more time-concentrated. The detailed classification of applicants enables a more targeted diagnostic analysis of how support is distributed among young farmers.

3.2. Prediction Results—PRED

For both sub-measures, several machine learning algorithms were applied for regression (to predict the official AFIR score, OS) and for classification (to determine selection status, SelN), in order to capture performance differences between models with different architectures and learning mechanisms. The model performance results are reported because they underpin the subsequent feature importance analysis and, through it, the interpretation of project funding dynamics. The outputs of this step show how well the different models predict the official score and selection status of projects, and which variables (features) have the greatest influence on the final result.
Overall, the most influential determinants of project success were the applicant’s professional qualifications, the economic size of the farm, and the planned scope of the investment. These factors consistently appeared among the top predictors across both programming periods, supporting their central role in explaining selection outcomes.

3.2.1. Model Performance

The predictive models were evaluated separately for the two targets: the official AFIR score (OS) for Sub-Measure 6.1 and DR-30, and the binary selection status (SelN). Performance was assessed using standard regression metrics for continuous targets, detailed in Appendix A. Table 12 summarizes the performance scores.
To ensure methodological correctness, Linear Regression is reported only with regression metrics (MAE, RMSE, R²), as it predicts the continuous OS outcome. The stacking ensemble is presented solely as a robustness check and is not included in comparisons against the base learners. Overall, ensemble-based models show consistent performance advantages, although differences across algorithms remain moderate given the large sample size.
Table 12 summarizes the performance of multiple supervised learning models applied to both sub-measures for two tasks: regression (predicting OS) and classification (predicting SelN). Multiple algorithms were tested to capture different data patterns and assess robustness, ranging from simple linear models to non-linear ensemble methods. The Stacking ensemble, although tested, is excluded from this comparison as it was designed specifically to combine the strengths of the other models and thus does not serve as a baseline.
For regression, Random Forest consistently showed the highest R² and lowest RMSE among the standalone models for both datasets, indicating its superior capability in modeling non-linear relationships between selection criteria and OS. In classification, both Random Forest and Gradient Boosting achieved near-perfect AUC and high classification accuracy for Sub-Measure 6.1, while for DR-30, Gradient Boosting slightly outperformed others in terms of AUC. These results suggest that tree-based ensemble methods are best suited for this type of agricultural project selection data.
Using a diverse set of models allows identifying the most robust and accurate algorithm for each type of task, highlighting the trade-offs between accuracy, interpretability and complexity. In practice, running these models will allow us not only to estimate the results before the official evaluation but also to extract the importance of variables (feature importance), highlighting the criteria and factors that contribute most to the selection of projects.

3.2.2. Data Results—PRED

PRED—Criteria Ranking Analysis
This specific analysis was conducted to determine the influence of each variable (criteria CS1–CS6 for both sub-measures) on the final AFIR score, separately from the main prediction models. These variables were not included in the prediction models because, through their direct and strong correlation with the final score, they would have influenced the performance and accuracy of the estimates (data leakage). Thus, their importance was assessed independently, using three different methods (univariate regression, RReliefF, and multivariate linear regression) to provide complementary insight into the determinants of the final score and to support the interpretation of the FI and SHAP results. Values for Sub-Measure 6.1 are shown in Table 13.
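Of the three ranking methods, the univariate regression step is the simplest to illustrate. The sketch below ranks criteria by the R² of a one-predictor least-squares fit against the official score. The criterion names `CS_a` and `CS_b` and all values are hypothetical placeholders; RReliefF and the multivariate model are not covered here.

```python
def univariate_r2(x, y):
    """R^2 of a one-predictor least-squares fit (the squared Pearson correlation)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return (sxy * sxy) / (sxx * syy)

# Hypothetical criterion scores for five projects (placeholder names).
criteria = {
    "CS_a": [5, 10, 15, 20, 25],
    "CS_b": [5, 5, 10, 5, 10],
}
official_score = [30, 40, 55, 60, 75]

r2_by_criterion = {c: univariate_r2(v, official_score) for c, v in criteria.items()}
ranking = sorted(r2_by_criterion, key=r2_by_criterion.get, reverse=True)
```

A criterion that tracks the official score almost linearly, such as the hypothetical `CS_a`, ranks first. Because each criterion is evaluated on its own, this ranking ignores overlaps between criteria, which is one reason for complementing it with the multivariate model.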
The ranking indicates alignment with the formal scoring structure rather than intrinsic differences in farm performance. The analysis shows that CS4 (which assesses the project’s integration into the priority strategic objectives) has the strongest impact on the AFIR score, followed by CS3 (which reflects the degree of innovation and adaptation to market requirements) and CS2 (regarding the relevance of the investment in the regional and sectoral context). In contrast, CS6 (criterion related to optional additional features) has a low contribution, suggesting that its influence on the total score is limited. These results indicate that the final score is strongly influenced by the project’s compliance with the strategic priorities and innovative elements, while the marginal criteria have a reduced role in differentiating applications. Next, values for DR-30 are shown in Table 14.
Compared to Sub-Measure 6.1, the DR-30 rankings suggest increased concentration around core administrative criteria. The analysis of the relevance of the criteria for DR-30 indicates that CS2.1, which aims to adapt the investment to sectoral and regional needs, has the greatest influence on the final score, followed by CS6.1 and CS3.1, which reflect the size of the investment and innovation and operational efficiency, respectively. Criteria such as CS6.2 and CS2.2 have moderate importance, while additional or optional elements, such as CS3.2, have a minimal impact. The results suggest that alignment with strategic priorities and structural characteristics of the project are determinants in the OS, while additional details have less influence on the selection process.
PRED—Scoring
For the scoring variable OS of the PRED component, feature importance analysis and SHAP-based interpretation highlight the variables with the greatest influence on the official score (OS). Figure 3 presents the relative importance of the predictors in the model and their marginal influence for sub-measure 6.1, and Figure 4 illustrates the same for DR-30.
The results indicate that formal scoring components dominate the formation of the official score, while structural characteristics, such as farm size and investment value, exert a secondary but consistent influence. The combined FI and SHAP analysis confirms the dominance of the Estimated Score ES, with a major positive influence on the OS score. VC has a smaller but visible impact. The rest of the characteristics (Month, Year, County, SO, etc.) contribute marginally, indicating a predictive structure focused on a few key variables. SHAP shows that high values of the estimated score are correlated with significant increases in the predicted OS. The SHAP contributions show that applicants with higher ES, SO, and VC values benefit disproportionately in score formation, reflecting the structural priorities embedded in the evaluation grid.
Similarly, for the DR-30 sub-measure, equivalent diagrams were generated, which indicate the dominant variables and the direction of their contribution to the model results.
Compared to Sub-Measure 6.1, the DR-30 results suggest a stronger concentration of influence around core scoring components, indicating increased formalization of the selection criteria in the new programming period. The analysis for DR-30 shows that the variable ES has an overwhelming influence on the prediction of OS, followed at a distance by VC and SO (economic size of the farm). SHAP confirms that high values of the estimated score and the cumulative total value increase the prediction of OS, while other variables, such as year, month or region, have a marginal effect. This suggests that the selection process is strongly correlated with the initial self-assessment and the total value of the project. For DR-30, the success in obtaining a high OS score is strongly dependent on the realistic (or optimistic) self-assessment of the beneficiary and the size/value of the project. Administrative or calendar characteristics play a minor role, suggesting that the technical-economic assessment takes precedence over the regional or temporal context. The patterns indicate that larger and better-capitalized farms systematically obtain higher predicted scores, revealing structural advantages in the DR-30 selection process.
The SHAP contributions reveal the mechanisms that drive the scoring and selection process. For both sub-measures, high Estimated Score (ES) values systematically increase the predicted AFIR score and the probability of selection, confirming that applicants’ self-assessment closely aligns with the official evaluation logic. Farm economic size (SO) and project value (VC) further enhance predicted outcomes, indicating that structurally stronger farms benefit more from the scoring grid. Conversely, small holdings or modest investment plans generate negative SHAP contributions, highlighting persistent structural disadvantages for younger or newly established farmers. These patterns show that the selection process rewards economic robustness and strategic alignment with the scoring criteria while offering limited compensatory effects for structurally weaker applicants.
PRED—Acceptance
For the acceptance variable SelN of the PRED component, feature importance analysis and SHAP-based interpretation highlight the variables with the greatest influence on the selection status (SelN). Figure 5 presents the relative importance of the predictors in the model and their marginal influence for Sub-Measure 6.1, and Figure 6 illustrates the same for DR-30.
For Sub-Measure 6.1, the Random Forest model used for the SelN classification highlights, through feature importance (FI) analysis, that the variables CS1, Year, Month, and VC have the greatest contribution to the accuracy of predictions, while the rest of the features have a reduced impact. SHAP analysis confirms these results and provides additional information on the direction of influence: high values for CS1 and Year increase the probability of belonging to the target class, while the effects of the variables Month and VC depend on the combinations with other features. This indicates that the model performance is mainly supported by a narrow set of relevant factors, and the interpretation of SHAP helps to understand how they specifically influence the model’s decision. The SHAP effects reveal that alignment with specific criteria (e.g., CS1) and timing increase selection chances, while weaker structural indicators reduce them.
For sub-measure DR-30 (Gradient Boosting model, SelN classification), the analysis of the importance of features (FI) and SHAP values shows that the variables “Estimated score (self-assessment)”, “CS3.1” and “Cumulative total (public value)” have the greatest impact in predicting project selection. “Estimated score” directly influences the probability of selection, with higher values being associated with an increase in the chances. CS3.1 and CS2.2, specific selection criteria, contribute significantly to the discrimination of projects, while the temporal variables (month of submission) suggest the existence of seasonal effects. SHAP analysis highlights how high or low values of these variables modify the model prediction, confirming the importance of technical and temporal criteria in the financing decision. These effects highlight how technical criteria and project scale drive selection probabilities, confirming the strong role of structural competitiveness in DR-30 outcomes.
These predictive structures indicate that the current evaluation system rewards economically stronger and better-prepared applicants, raising questions about equitable access to support among young farmers.

3.3. Cluster Analysis Results—CLASS

3.3.1. Model Performance

To assess the latent structure of the projects in both sub-measures, we applied the K-Means method with KMeans++ initialization, run 10 times and limited to 300 steps, analyzing the Silhouette score for a different number of clusters. The results are shown in Table 15.
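The clustering configuration described above (k-means++ seeding, 10 restarts, at most 300 iterations per run) can be sketched in plain Python as follows. This is an illustrative reimplementation under stated assumptions, not the Orange code used in the study: the seed and the six data points are hypothetical.

```python
import math
import random

def kmeans_pp_init(points, k, rng):
    """k-means++ seeding: later centres are drawn with probability proportional
    to the squared distance from the nearest centre chosen so far."""
    centres = [rng.choice(points)]
    while len(centres) < k:
        d2 = [min(math.dist(p, c) ** 2 for c in centres) for p in points]
        centres.append(rng.choices(points, weights=d2, k=1)[0])
    return centres

def kmeans(points, k, n_init=10, max_iter=300, seed=0):
    """Return (labels, inertia) of the best of n_init runs, each capped at max_iter steps."""
    rng = random.Random(seed)
    best = None
    for _run in range(n_init):
        centres = kmeans_pp_init(points, k, rng)
        labels = None
        for _step in range(max_iter):
            new_labels = [min(range(k), key=lambda c: math.dist(p, centres[c]))
                          for p in points]
            if new_labels == labels:      # assignments stable: converged
                break
            labels = new_labels
            for c in range(k):
                members = [p for p, lab in zip(points, labels) if lab == c]
                if members:               # keep the old centre if a cluster empties
                    centres[c] = tuple(sum(v) / len(members) for v in zip(*members))
        # Within-cluster sum of squared distances, the quantity k-means minimizes.
        inertia = sum(math.dist(p, centres[lab]) ** 2 for p, lab in zip(points, labels))
        if best is None or inertia < best[1]:
            best = (labels, inertia)
    return best

# Hypothetical, already-scaled two-feature records forming two groups.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
          (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
labels, inertia = kmeans(points, k=2)
```

Repeating this for several values of k and comparing the resulting Silhouette scores mirrors the selection procedure described in the text.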
The results shown in Table 15 indicate a clear difference between the two sub-measures: for 6.1, the maximum Silhouette score (0.902) is obtained at 6 clusters, suggesting a more complex segmentation of projects; for DR-30, the optimal value (0.818) occurs at 2 clusters, indicating a simpler and less fragmented structure. This difference reflects the greater diversity of applications within 6.1 compared to DR-30, where projects are more homogeneous. Silhouette values above 0.8 are uncommon in real-world socioeconomic datasets, and in this case, they reflect the strong separation induced by a small number of dominant structural variables, particularly SO and VC. This indicates that clusters are well separated but potentially driven by a limited subset of features, which reduces overall cluster complexity. To assess internal consistency, within-cluster variability was examined and found to be low for SO and VC but higher for secondary variables, suggesting a simplified cluster structure. This limitation should be considered when interpreting the results.
The contrast between the two sub-measures indicates distinct underlying farm structures, affecting how applicants are segmented and compared.

3.3.2. Data Results—CLASS

The CLASS results present the clusters obtained after the run of the HCA algorithm and the optimal determination of the number of clusters using k-means. The model was run only for the accepted projects (SelN = 1). The next sections present the main clusters obtained as a result.
The next table (Table 16) presents the numerical characteristics of the six clusters obtained for Sub-Measure 6.1.
The cluster analysis for 6.1 highlights distinct farm and farmer profiles. C1 and C5 bring together small farms but with high scores (OS and ES above 80), indicating high readiness and compliance with the selection criteria. C2 has farms with a large average economic size (SO = 35k) and high eligible values, suggesting more developed holdings. C3 and C4 present lower scores but with small and medium-sized farms, possibly with less experience or resources. C6 is the numerically and financially dominant cluster, representing medium-sized farms and moderate scores, suggesting the main core of beneficiaries of the sub-measure. The cluster patterns show that high-scoring applicants tend to have stronger economic baselines, signaling competitive advantages within the scheme.
The differences described between clusters reflect descriptive contrasts rather than statistically validated effects. No formal significance testing (e.g., pairwise comparisons of OS distributions across clusters) was performed, and the observed patterns should therefore be interpreted as indicative structural profiles rather than statistically confirmed differences.
The next table (Table 17) presents the numerical characteristics of the two clusters obtained for the sub-measure DR-30.
In the case of DR-30, C1 brings together beneficiaries with very high AFIR scores (OS = 84.6) and moderate economic size (SO = 11.7k), indicating farms well prepared for the selection requirements. C2, which concentrates most of the projects and financial resources, has larger farms (SO = 14.1k) but with significantly lower scores (OS = 56), which may suggest greater diversity in the quality of the projects submitted and high competition within this segment. The DR-30 clusters reveal a strong divide between high-score, moderate-size farms and larger farms with more variable scoring performance.
Overall, the cluster analysis for Sub-Measure 6.1 and Intervention DR-30 highlights clear differences between farm profiles: Sub-Measure 6.1 presents a greater diversity of segments, with wide variations in economic size and scores, while DR-30 is more concentrated, with a dominant financial cluster, but with lower average scores. This structure suggests that 6.1 attracts both small and large farms, while intervention DR-30 predominantly targets farms with high investment capacity but more heterogeneous selection performance.
The cluster profiles reinforce the SHAP-based findings (as shown in Table 18). Segments dominated by larger farms and higher investment values exhibit consistently higher scores and selection probabilities, showing strong alignment with the program’s competitiveness objectives. In contrast, clusters representing small or newly formed farms display lower OS and VC levels and reduced likelihood of selection, suggesting that structural constraints limit their capacity to meet high-impact criteria. These disparities indicate that the support scheme primarily favors applicants with greater initial resources, underscoring the need for complementary advisory or financial instruments to ensure more equitable access for young farmers.
The cluster configurations reveal groups of applicants systematically disadvantaged in the selection process, highlighting specific target profiles that could benefit from tailored advisory or revised scoring rules.

4. Discussion

4.1. Study Limitations

The main limitations of this study are as follows:
  • The data analyzed come exclusively from applications for Sub-Measure 6.1 and Intervention DR-30, limiting the generalizability of the conclusions to other interventions or programs.
  • Certain variables (e.g., detailed selection criteria) were not included in the prediction models to avoid artificially influencing the results, reducing the degree of explainability.
  • The quality of the data depends on the completeness and accuracy of the information provided by the applicants; some fields present missing values or reporting errors.
  • Machine learning models can be affected by the distribution and structure of the data, as well as by the possible collinearity between variables.
  • The results regarding clustering and the importance of the characteristics reflect the reality of the analyzed period and may vary in the future depending on changes in agricultural and economic policies.

4.2. Results Interpretation in the Context of Common Agricultural Policy

The results of this study align closely with the key priorities of the Common Agricultural Policy (CAP), particularly the objectives of generational renewal, fostering viable farm incomes, and promoting balanced territorial development.
  • Generational renewal: Findings from STAT, PRED, and CLASS analyses confirm that young farmers under Sub-Measure 6.1 and Intervention DR-30 often operate smaller farms with lower economic size (SO) and reduced cumulative project values (VC), which limits their competitiveness in the selection process. This reflects a structural challenge for the CAP’s generational renewal objective, as smaller holdings require tailored support to achieve parity with more established farms.
  • Enhancing competitiveness and knowledge transfer: The predictive modeling (PRED) highlighted that project success is driven by a limited set of variables, most notably the Estimated Score (ES) and specific selection criteria (CS1, CS3.1, CS2.2), which are not fully exploited in many applications. This suggests gaps in technical knowledge and strategic alignment with the AFIR scoring grid, pointing to the need for targeted training and advisory services in line with the CAP’s knowledge transfer and innovation priority.
  • Balanced territorial development: STAT analysis revealed uneven regional participation, with certain areas submitting significantly fewer projects. This imbalance undermines the CAP’s objective of cohesion between rural regions and highlights the need for localized support measures, including mobile advisory units and targeted outreach in underrepresented areas.
  • Sustainability and resilience: While environmental or climate-related variables were not directly modeled in this study, the concentration of successful projects in larger, more capitalized farms suggests a potential risk of excluding small, diversified holdings that can contribute to environmental sustainability. The CAP’s environmental and climate goals could therefore be reinforced by ensuring that funding mechanisms remain accessible to smaller, sustainable farms.
Overall, the integration of CAP priorities into the interpretation of results demonstrates that improving young farmers’ success rates requires a combination of strategic alignment with selection criteria, enhanced advisory and training services, and targeted financial instruments to bridge structural gaps. This evidence-based approach supports the CAP’s overarching aim of a competitive, sustainable, and inclusive agricultural sector.
Taken together, the SHAP interpretations and cluster patterns show that the current scoring system strongly rewards self-assessment quality, economic scale, and investment ambition while offering limited compensatory mechanisms for structurally weaker young farmers. This evidence suggests that enhancing generational renewal may require adjusting selection criteria or strengthening advisory tools so that applicants with lower initial capacity can better align their proposals with program objectives.
The patterns identified through STAT, PRED, and CLASS clearly indicate which aspects of the current selection mechanism reinforce existing inequalities and which criteria most strongly contribute to successful applications. These insights can guide targeted adjustments to scoring rules, weighting schemes, and advisory support, enabling the program to better align with its generational renewal objectives.

4.3. Comparative Analyses

The results obtained based on the STAT component show clear differences between the two sub-measures in terms of the profile and size of the supported farms. For 6.1, the average estimated score (ES) is approximately 64.16, and that of the AFIR score (OS) is 63.35, with an average project value of RON 30.4 million and an average SO of EUR 17,072. These values confirm the uniform nature of the support, specific to small and medium-sized farms, often managed by young farmers at their first investment, where the selection criteria and high scores serve to prioritize their installation.
In the case of DR-30, the average ES is 63.73 and the average OS is 63.42, but with a higher average SO (EUR 13,525) and average project values almost double compared to 6.1 (RON 31.7 million). This structure indicates support for more consolidated farms, where young farmers are present in a smaller proportion, but can benefit indirectly through investments leading to the modernization of agricultural infrastructure.
Thus, STAT data highlights that Sub-Measure 6.1 plays a direct and pronounced role in supporting young farmers at the beginning of their journey, while DR-30 contributes to a broader investment framework, where the effects on young people appear mainly through the development of the general agricultural environment.
The FI and SHAP analysis shows that, for both Sub-Measure 6.1 and DR-30, ES (the estimated score at submission) has the greatest influence on the final OS score. Since the selection criteria (CS) are directly correlated with OS and would have generated data leakage, they were excluded from the predictive models. ES is provided by the applicants themselves prior to project submission: they are informed about the general scoring criteria (CS1 to CS6) and the corresponding point intervals published in the Applicant’s Guide, and they indicate the number of points they expect to receive for each criterion based on their own interpretation of the eligibility conditions. ES therefore represents the applicant’s self-assessed score rather than a reconstruction of the official AFIR scoring formula; it is not computed from the CS1 to CS6 components and does not embed the official weights or verification logic used by AFIR. Nevertheless, ES may be correlated with the underlying criteria, so its inclusion may introduce a form of indirect leakage, which we explicitly acknowledge.
However, a separate analysis of the criteria revealed that:
  • For Sub-Measure 6.1: the most influential are CS4 (degree of innovation and modernization elements), CS3 (level of qualification and experience in the field), and CS2 (economic size of the holding), followed by CS5 and CS1.
  • For DR-30: the top is dominated by CS2.1 (minimum economic size of the holding), CS6.1 (priority for young farmers), and CS3.1 (production diversification), followed by CS6.2 and CS2.2.
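The leakage rule described above (selection criteria excluded from the predictors, ES retained) can be illustrated with a short sketch. The column names, the `CS` prefix convention, and the helper `split_features` are assumptions made for illustration; the actual AFIR export may label its fields differently.

```python
# Hypothetical column conventions, assumed for illustration only.
LEAKY_PREFIX = "CS"  # selection criteria that directly compose the official score
TARGET = "OS"        # official AFIR score used as the regression target


def split_features(row, target=TARGET, leaky_prefix=LEAKY_PREFIX):
    """Split one record into (features, target value), dropping leakage-prone columns."""
    features = {
        name: value
        for name, value in row.items()
        if name != target and not name.startswith(leaky_prefix)
    }
    return features, row[target]


# Illustrative record; the values are invented.
row = {"ES": 70, "VC": 40000, "SO": 12500, "Month": 5, "CS1": 15, "CS2.1": 10, "OS": 65}
features, y = split_features(row)
print(features, y)
```

ES stays among the predictors in this sketch, mirroring the choice described in the text, so any indirect leakage through the self-assessed score remains and should be kept in mind when reading importance rankings.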
The results highlight the importance of a realistic and well-founded self-assessment by young farmers from the project preparation phase. The next variables in impact are VC (total project value) and SO (economic size of the holding), which suggests that larger projects and farms with higher economic potential have a higher chance of a competitive score.
For young farmers, this result sends a clear message: success depends largely on the ability to structure their project in such a way as to maximize the selection criteria with high impact, to support their self-assessment through a coherent investment plan, and to consolidate their economic dimension. Administrative or calendar factors (year, month, region) have little effect, which shows that the essence of the competition is more about the quality and technical-economic substantiation of the project than about the geographical or temporal context.
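How realistic a self-assessment is can be quantified by comparing ES with OS project by project. The following sketch, under the assumption that paired ES and OS values are available as plain lists, computes the mean gap, the share of overestimated projects, and the Pearson correlation. The function names and the sample values are hypothetical.

```python
from math import sqrt
from statistics import mean


def pearson(xs, ys):
    """Pearson correlation coefficient between two equally long sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def self_assessment_gap(es, os_):
    """Summarize how far the estimated scores (ES) lie from the official ones (OS)."""
    gaps = [e - o for e, o in zip(es, os_)]
    return {
        "mean_gap": mean(gaps),  # positive means applicants overestimate on average
        "share_overestimated": sum(g > 0 for g in gaps) / len(gaps),
    }


# Invented example values, not taken from the AFIR reports.
es_scores = [70, 65, 80, 55, 60]
os_scores = [65, 65, 70, 50, 62]

summary = self_assessment_gap(es_scores, os_scores)
r = pearson(es_scores, os_scores)
print(summary, round(r, 3))
```

A high correlation combined with a positive mean gap would describe applicants who rank their projects sensibly but score them too generously.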
Related to PRED—Acceptance (SelN), for Sub-Measure 6.1, the feature importance (FI) analysis of the Random Forest model shows that the variables CS1 (priority for young farmers), Year, Month, and VC (total project value) have the greatest impact in predicting project selection. The SHAP results confirm that high values of CS1 and Year significantly increase the chances of selection, while the effects of Month and VC depend on the combination with other variables. This indicates that for young farmers, the benefit of dedicated criteria and the timing of submission may be decisive factors, along with the value of the investment.
For DR-30, the Gradient Boosting model highlights the importance of ES (estimated score/self-assessment), CS3.1 (production diversification), and the cumulative public value of the project. ES exerts a direct influence: higher values increase the probability of selection. Criteria CS3.1 and CS2.2 (minimum economic farm size) contribute substantially to the differentiation of projects, while the month of submission suggests a possible seasonal effect. SHAP analysis shows how these variables modify the probability of selection, confirming that both technical-economic elements and the calendar of submission play a role in the funding decision.
In both cases, the results highlight that young farmers can increase their chances of success by focusing on the criteria with high impact, preparing a rigorous self-assessment, and choosing the right moment to submit the project.
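SHAP attributions are based on Shapley values, which average a feature's marginal contribution over all orderings in which features can be added. The sketch below computes exact Shapley values by enumeration for a small, invented scoring function. It is not the procedure used for the Random Forest or Gradient Boosting models in this study, where SHAP libraries rely on model-specific approximations. The function `toy_model`, the feature names `SO_k` and `VC_k` (values in thousands), and the baseline are assumptions for illustration.

```python
from itertools import permutations
from math import factorial


def shapley_values(model, x, baseline):
    """Exact Shapley values: average marginal contribution over all feature orderings.

    Features not yet "switched on" keep their baseline value.
    """
    names = list(x)
    totals = {name: 0.0 for name in names}
    for order in permutations(names):
        current = dict(baseline)
        previous = model(current)
        for name in order:
            current[name] = x[name]
            updated = model(current)
            totals[name] += updated - previous
            previous = updated
    n_orders = factorial(len(names))
    return {name: total / n_orders for name, total in totals.items()}


def toy_model(f):
    """Hypothetical score with an interaction between farm size and project value."""
    return 0.5 * f["ES"] + 2.0 * f["SO_k"] + 1.0 * f["VC_k"] + 0.1 * f["SO_k"] * f["VC_k"]


x = {"ES": 70.0, "SO_k": 14.0, "VC_k": 40.0}         # project being explained
baseline = {"ES": 60.0, "SO_k": 12.0, "VC_k": 30.0}  # reference project

phi = shapley_values(toy_model, x, baseline)
print(phi)
```

The attributions sum to the difference between the prediction for the project and the prediction for the baseline, and the interaction term is split between the two features involved, which is why the effect of a variable such as VC can depend on the values of other variables.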
The cluster analysis for Sub-Measure 6.1 identified six distinct project profiles, differentiated mainly by the economic size of the holding (SO), the estimated score (ES), and the final AFIR score (OS). Clusters with low SO and low VC (total project value) indicate small farms, typical of young farmers at the beginning of their activity, while clusters with high values for these indicators correspond to consolidated farms with high investment capacity. For DR-30, the analysis highlighted only two clusters, one representing projects with modest economic size and average scores, and the other large projects with significant investments and high scores. These results suggest that, in both sub-measures, there are structural differences between potential beneficiaries and those already consolidated, and support strategies for young farmers should specifically target clusters with limited resources in order to increase their competitiveness in the selection process.
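Whether six or two clusters describe the data well can be checked with the Silhouette Score listed in Appendix A. The following is a minimal from-scratch sketch using Euclidean distance. The sample points are invented, and in practice the features (for example SO and OS) would need to be standardized first, which is assumed here rather than shown.

```python
from statistics import mean


def silhouette_score(points, labels):
    """Mean silhouette coefficient with Euclidean distance (needs at least two clusters)."""

    def dist(p, q):
        return sum((a - b) ** 2 for a, b in zip(p, q)) ** 0.5

    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)

    scores = []
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1:
            scores.append(0.0)  # common convention for singleton clusters
            continue
        # a: mean distance to the other members of the same cluster
        # (the distance of p to itself is 0, so it does not affect the sum)
        a = sum(dist(p, q) for q in own) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster
        b = min(
            mean(dist(p, q) for q in other)
            for other_lab, other in clusters.items()
            if other_lab != lab
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return mean(scores)


# Invented, already standardized points: (economic size, official score).
example_points = [(-1.0, 1.1), (-0.8, 0.9), (0.9, -1.0), (1.0, -0.9)]
example_labels = ["C1", "C1", "C2", "C2"]
example_score = silhouette_score(example_points, example_labels)
print(round(example_score, 3))
```

Values close to 1 indicate compact, well-separated clusters, while values near 0 or below suggest overlapping or misassigned projects.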

4.4. Formulation of Support Measures

The integrated analyses (STAT, PRED, CLASS) show that the main bottlenecks (presented in Table 19) in the formulation of project proposals for young farmers are linked to both technical factors (selection criteria, economic size of the farm, time planning) and informational factors (access to consultancy and realistic self-assessment).
Digital tools could help bring applicants’ self-assessment closer to the official score, but the feasibility of implementing them must take into account existing disparities in rural digital access and literacy. Studies on Romania’s rural areas indicate persistent gaps in broadband availability, computer use, and digital skills, which may limit the immediate uptake of online advisory instruments. These constraints suggest that digital tools should be complemented by offline advisory channels and capacity-building initiatives to ensure equitable accessibility.
The proposed measures aim to reduce these barriers through targeted support, adapted to the profile and real needs of farmers at the beginning of their journey.
The integrated analysis of STAT, PRED, and CLASS results highlights that the performance of young farmers in accessing financing critically depends on the accuracy of self-assessment, the optimal use of high-impact criteria, the economic size of the holding, and the strategic timing of project submission. Although the identified barriers range from technical and financial limitations to information and planning deficiencies, all can be addressed through a combination of specific training, advisory support, and incentive measures adapted to the context of small and medium-sized farms. The implementation of these solutions would directly contribute to increasing the success rate, a more balanced distribution of funds, and strengthening the role of young farmers in the sustainable development of the agricultural sector.
The findings hold direct relevance for paying agencies and program designers, as they highlight which aspects of the current scoring system reinforce structural disparities and which elements support the intended objectives of generational renewal. By identifying these mechanisms, the analysis offers actionable insights for refining selection criteria and advisory interventions.
The empirical patterns identified here align with established findings on structural constraints faced by young farmers, including limited capitalization, reduced economic size, and uneven territorial development. By quantifying how these factors shape selection outcomes, the analysis contributes to ongoing discussions in agricultural economics and Common Agricultural Policy evaluation regarding the balance between competitiveness-oriented criteria and equitable access to support. The analytical structure of the SPCC–CAP framework can be adapted for use in other EU Member States; however, direct replication is not possible because scoring criteria and eligibility rules differ across national Common Agricultural Policy implementations. Cross-national applications would therefore require contextualization and re-specification of the relevant variables and scoring components.

5. Conclusions

This study combined three complementary analytical approaches—descriptive statistics (STAT), predictive modeling (PRED), and clustering (CLASS)—to investigate the determinants of project success under Common Agricultural Policy sub-measures 6.1 and DR-30, with a particular focus on young farmers.
Firstly, through the STAT analysis, we quantified the structural characteristics of the datasets, identifying disparities in project distribution by region, sub-measure, and time. We found that young farmers typically manage smaller farms, with lower economic size (SO) and cumulative project values (VC), and that certain regions show significantly lower participation rates.
The PRED analysis revealed that a small number of features explain most of the variation in the AFIR score (OS) and in selection outcomes (SelN). The most influential variable was the Estimated Score (ES), which strongly correlates with the final AFIR score, but often shows large overestimations by applicants. Specific selection criteria (CS1 for 6.1, CS3.1 and CS2.2 for DR-30) emerged as decisive yet not fully exploited in project proposals. Temporal factors such as month and year of submission also played a measurable role, indicating seasonal competition effects.
The CLASS analysis segmented applicants into distinct profiles. For 6.1, six clusters were identified, ranging from low-scoring, small-scale farms to large, well-capitalized and competitive ones. For DR-30, two main clusters emerged, with a clear divide between higher and lower scoring profiles. Clusters dominated by young farmers were consistently associated with smaller SO and VC, limiting their competitiveness. Lower scores among smaller farms reflect the structure of the program’s scoring system, which assigns higher weights to economic size and investment capacity. These results should not be interpreted as evidence of inherently lower competitiveness among small farms, but rather as an outcome shaped by the policy design and eligibility criteria.
The empirical patterns identified across STAT, PRED, and CLASS components show that the scoring structure consistently favors applicants with higher economic capacity, better training, and more ambitious investments. Such evidence can guide the refinement of selection criteria, ensuring that young farmer measures balance competitiveness with accessibility. These insights contribute directly to the ongoing debate on generational renewal and the effectiveness of Common Agricultural Policy implementation.
Overall, the results converge on a common finding: while the technical and economic viability of projects matters, success in accessing funding is strongly dependent on strategic alignment with the AFIR scoring grid, optimal timing of submission, and adequate project scale. A consistent pattern emerging from the analysis is that selection outcomes are driven primarily by the degree to which applicants align their proposals with the program’s scoring logic, rather than by the inherent development potential of the farms themselves. Young farmers often face disadvantages on all these fronts, which can be addressed through targeted support measures: training and scoring simulations, guidance on high-impact criteria, dedicated funding lines for small farms, strategic submission planning, and expanded advisory networks.
By integrating STAT, PRED, and CLASS results, this study not only identifies the barriers faced by young farmers but also formulates evidence-based recommendations to improve their success rates. Implementing these measures can contribute to the CAP’s broader goals of generational renewal, regional equity, and sustainable rural development.
Given these patterns, this study provides a practical diagnostic tool for institutions responsible for implementing young farmer measures. The SPCC–CAP framework can support paying agencies and policymakers in evaluating whether current selection criteria achieve their intended effects and where targeted adjustments could enhance program effectiveness.
Beyond its technical components, the SPCC–CAP framework provides a policy-relevant diagnostic of the selection mechanism, showing how structural disparities and scoring dynamics affect access to support for young farmers. In this way, this study contributes to debates on generational renewal by offering empirical evidence that can inform future adjustments to Common Agricultural Policy targeting and advisory measures.

Author Contributions

Conceptualization, A.I.C., M.A.D. and I.C.; methodology, N.B.; software, D.A.P.; validation, C.M.M.; formal analysis, N.B.; investigation, M.A.D.; resources, A.I.C., I.A.C. and I.C.; data curation, D.A.P.; writing—original draft preparation, A.I.C. and N.B.; writing—review and editing, M.A.D.; visualization, I.A.C., I.C. and C.M.M.; supervision, C.M.M.; project administration, A.I.C.; funding acquisition, A.I.C. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the University of Oradea.

Informed Consent Statement

Informed consent was obtained from all subjects involved in this study.

Data Availability Statement

The data presented in this study were derived from the following resources available in the public domain: https://www.afir.ro/rapoarte/rapoarte-feadr/selectie/ (accessed on 29 November 2024).

Conflicts of Interest

The authors declare no conflicts of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript; or in the decision to publish the results.

Abbreviations

The following abbreviations are used in this manuscript:
AFIR | Agency for Rural Investment Financing
AUC | Area Under the Curve
CAP | Common Agricultural Policy
C1–C6 | Clusters 1 to 6
DR | Intervention DR
EAFRD | European Agricultural Fund for Rural Development
FI | Feature Importance
GB | Gradient Boosting
kNN | k-Nearest Neighbors
LR | Logistic Regression
MAE | Mean Absolute Error
MCC | Matthews Correlation Coefficient
MSE | Mean Squared Error
NUTS-2 | Nomenclature of Territorial Units for Statistics, Level 2
PNDR | National Rural Development Programme
RF | Random Forest
SHAP | Shapley Additive Explanations
SP PAC 2023–2027 | Strategic Plan for the Common Agricultural Policy 2023–2027

Appendix A

Table A1. Description of performance metrics used for regression, classification, and clustering.
Metric | Type | Description
Mean Squared Error (MSE) | Regression | Average of the squared differences between predicted and actual values (lower is better).
Root Mean Squared Error (RMSE) | Regression | Square root of MSE, expressed in the same units as the target variable.
Mean Absolute Error (MAE) | Regression | Average of the absolute differences between predicted and actual values.
Coefficient of Determination (R²) | Regression | Proportion of variance in the target explained by the model (1 indicates perfect fit).
Area Under the ROC Curve (AUC) | Classification | Measures the model’s ability to discriminate between classes (1.0 indicates perfect classification).
Classification Accuracy (CA) | Classification | Proportion of correctly classified instances.
F1 Score (F1) | Classification | Harmonic mean of Precision and Recall, balancing false positives and false negatives.
Precision | Classification | Proportion of predicted positives that are actually positive.
Recall | Classification | Proportion of actual positives correctly identified.
Matthews Correlation Coefficient (MCC) | Classification | Balanced measure of classification quality, even for imbalanced datasets (ranges from −1 to 1).
Silhouette Score | Clustering | Measures how similar an object is to its own cluster compared to other clusters (ranges from −1 to 1).
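For readers who wish to reproduce the metrics in Table A1, the sketch below implements the regression metrics and the threshold-based classification metrics from their standard definitions. AUC is omitted because it requires predicted probabilities rather than class labels. The function names and sample values are illustrative assumptions, and zero denominators (for example, no predicted positives) are not handled.

```python
from math import sqrt


def regression_metrics(y_true, y_pred):
    """MSE, RMSE, MAE and R^2 for paired actual and predicted values."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    ss_res = sum(e * e for e in errors)            # residual sum of squares
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)  # total sum of squares
    mse = ss_res / n
    return {
        "MSE": mse,
        "RMSE": sqrt(mse),
        "MAE": sum(abs(e) for e in errors) / n,
        "R2": 1 - ss_res / ss_tot,
    }


def classification_metrics(y_true, y_pred):
    """Accuracy, precision, recall, F1 and MCC for binary labels (1 = selected)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return {
        "CA": (tp + tn) / len(pairs),
        "Precision": precision,
        "Recall": recall,
        "F1": 2 * precision * recall / (precision + recall),
        "MCC": (tp * tn - fp * fn)
        / sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)),
    }


# Invented values: official scores vs. predictions, and selection outcomes.
reg = regression_metrics([60, 70, 80, 90], [62, 68, 83, 87])
clf = classification_metrics([1, 1, 1, 0, 0, 0, 1, 0], [1, 1, 0, 0, 0, 1, 1, 0])
print(reg)
print(clf)
```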

Appendix B

Figure A1. Geographical distribution of several indicators across Romanian counties for DB6.
These spatial patterns illustrate territorial inequalities that influence applicant participation and the distribution of young farmer support.

References

  1. Ministerul Agriculturii si Dezvoltării Rurale. National-Rural-Development-Program-2014–2020-Version-19.0. Available online: https://madr.ro/docs/dezvoltare-rurala/2024/Programul-National-de-Dezvoltare-Rurala-2014-2020—versiunea-19.0.pdf (accessed on 19 September 2025).
  2. Raicov, M.; Feher, A.; Sălășan, C.; Goșa, V.; Băneș, A. Supporting the new generation of farmers through european funds. Sci. Pap. Agric. Manag. 2020, 22, 276. [Google Scholar]
  3. Ministerul Agriculturii si Dezvoltării Rurale. Situația Proiectelor Depuse 2014–2020. Available online: https://madr.ro/pndr-2014-2020/implementare-pndr-2014-2020/situatia-proiectelor-depuse-2014-2020.html (accessed on 19 September 2025).
  4. Ministerul Agriculturii si Dezvoltării Rurale. Situația Proiectelor Depuse 2023–2027. Available online: https://madr.ro/planul-national-strategic-pac-post-2020/implementare-ps-pac-2023-2027/situatia-proiectelor-depuse-ps-pac-2023-2027.html (accessed on 19 September 2025).
  5. Wasilewski, A.; Gospodarowicz, M.; Wasilewska, A. Agricultural Land Price Dynamics in Europe: Convergence, Divergence, and Policy Impacts Across EU Member States. Sustainability 2024, 16, 10982. [Google Scholar] [CrossRef]
  6. Kostov, P.; Davidova, S. Common Policy but Different Outcomes: Structural Change in Family Farms of Central and East European Countries after Their Accession to the EU. Agriculture 2021, 11, 1074. [Google Scholar] [CrossRef]
  7. Licciardo, F.; Henke, R.; Piras, F.; Zanetti, B. The Setting-Up Measure to Support Generational Renewal in Agriculture: The Italian Experience. World 2024, 5, 1130–1147. [Google Scholar] [CrossRef]
  8. Micu, M.M.; Dumitru, E.A.; Vintu, C.R.; Tudor, V.C.; Fintineru, G. Models Underlying the Success Development of Family Farms in Romania. Sustainability 2022, 14, 2443. [Google Scholar] [CrossRef]
  9. Zekic, S.; Matkovski, B. Development Opportunities for Rural Areas of Serbia. Zb. Matice Srp. Za Druš. Nauk. 2015, 153, 757–771. [Google Scholar] [CrossRef]
  10. Jurjevic, Z.; Matkovski, B.; Dokic, D.; Zekic, S. A Methodological Framework for Evaluation of Rural Settlements: Rural Index of Serbia. Land 2024, 13, 2183. [Google Scholar] [CrossRef]
  11. Sroka, W.; Dudek, M.; Wojewodzic, T.; Król, K. Generational Changes in Agriculture: The Influence of Farm Characteristics and Socio-Economic Factors. Agriculture 2019, 9, 264. [Google Scholar] [CrossRef]
  12. Strategic National Plan—PNS 2023–2027, Version 2. 2022. Available online: https://www.madr.ro/docs/dezvoltare-rurala/2022/PNS_2023-2027-versiunea_1.2-21.11.2022.pdf (accessed on 19 September 2025).
  13. Agency for Rural Investment Financing (AFIR). Applicant’s Guide—DR-30 Intervention “Support for the Installation of Young Farmers”; Ministry of Agriculture and Rural Development: Bucharest, Romania, 2023; Available online: https://www.afir.ro/ (accessed on 10 June 2025).
  14. Agency for Rural Investment Financing (AFIR). 2023. Available online: https://www.afir.ro/comunicate/publicare-versiuni-finale-ale-ghidurilor-solicitantului-pentru-dr-27-28-30-si-37/ (accessed on 10 June 2025).
  15. European Union. Regulation (EU) No 1305/2013, Title 1, Chapter 1, Article 2 Definitions. Available online: https://eur-lex.europa.eu/legal-content/RO/TXT/PDF/?uri=CELEX:32013R1305 (accessed on 19 September 2025).
  16. European Union. Regulation (EU) No. 2115/2021, Art.75, Point 2 Setting Up of Young Farmers and New Farmers and Setting Up of Rural Businesses. Available online: https://eur-lex.europa.eu/legal-content/RO/TXT/PDF/?uri=CELEX:32021R2115 (accessed on 19 September 2025).
  17. Ministerul Agriculturii si Dezvoltării Rurale. Applicant’s Guide for Accessing Sub-Measure 6.1 “Support for the Installation of Young Farmers”. Available online: https://www.afir.ro/api/file?url=/media/vcuepqen/ghidul_solicitantului_sm_61_-_2021.pdf&filename=Ghidul_Solicitantului_sM_6.1_-_2021&filetype=pdf (accessed on 19 September 2025).
  18. Ministerul Agriculturii si Dezvoltării Rurale. Applicant Guide for accessing Intervention DR-30—“Support for the Installation of Young Farmers”. Available online: https://www.afir.ro/api/file?url=/media/tvmcdzo1/ghidul-solicitantului-dr-30.pdf&filename=Ghidul%20Solicitantului%20pentru%20DR%2030&filetype=pdf (accessed on 19 September 2025).
  19. Badan (Voicila), D.N.; Fintineru, G. The New Payment Scheme for Romanian Young Farmers: Evolution and Territorial Characteristics. Sci. Pap. Ser. Manag. Econ. Eng. Agric. Rural Dev. 2021, 21, 149–158. [Google Scholar]
  20. Badan (Voicila), D.N.; Fintineru, G. Young Farmers—A Fundamental Factor in the Development of the Agricultural Sector. Sci. Pap. Ser. Manag. Econ. Eng. Agric. Rural Dev. 2022, 22, 73–80. [Google Scholar]
  21. Micu, M.M. Research on accessing European Funds for young farmers in Romania under the two National Rural Development Programs. In Proceedings of the 33rd International Scientific Conference on Economic and Social Development—“Managerial Issues in Modern Business”, Warsaw, Poland, 26–27 September 2018; pp. 184–190. [Google Scholar]
  22. Zagata, L.; Sutherland, L. Deconstructing the ‘young farmer’ problem in Europe: Towards a research agenda. J. Rural Stud. 2015, 38, 39–51. [Google Scholar] [CrossRef]
  23. May, D.; Arancibia, S.; Behrendt, K.; Adams, J. Preventing young farmers from leaving the farm: Investigating the effectiveness of the young farmer payment using a behavioral approach. Land Use Policy 2019, 82, 317–327. [Google Scholar]
  24. Leonard, B.; Kinsella, A.; O’Donoghue, C.; Farrell, M.; Mahon, M. Policy drivers of farm succession and inheritance. Land Use Policy 2017, 61, 147–159. [Google Scholar] [CrossRef]
  25. Chereji, A.I.; Chiurciu, I.A.; Chereji, I.; Maerescu, C.M.; Tutui, D. Preliminary study regarding DR 30 young farmers installation. Strategic Plan CAP 2023–2027 Romania. Lucr. Științifice Manag. Agric. 2024, 26, 25–30. [Google Scholar]
  26. Food and Agriculture Organization of the United Nations. Farm Succession in Romania. 2016. Available online: https://www.fao.org/family-farming/detail/en/c/1010757/ (accessed on 19 September 2025).
  27. Popa, A.M.P.; Turek Rahoveanu, A. Supporting young farmers and the sustainability of rural regions. Case study—Olt county, Romania. Sci. Pap. Ser. Manag. Econ. Eng. Agric. Rural Dev. 2021, 21, 651–657. [Google Scholar]
  28. Balezentis, T.; Ribasauskiene, E.; Morkunas, M.; Volkov, A.; Streimikiene, D.; Toma, P. Young farmers’ support under the Common Agricultural Policy and sustainability of rural regions: Evidence from Lithuania. Land Use Policy 2020, 94, 104542. [Google Scholar] [CrossRef]
  29. Kahanec, M.; Zimmermann, K. Labor Migration, EU Enlargement, and the Great Recession; Springer: Berlin/Heidelberg, Germany, 2016. [Google Scholar]
  30. Gkatsikos, A.; Natos, D.; Staboulis, C.; Mattas, K.; Tsagris, M.; Polymeros, A. An Impact Assessment of the Young Farmers Scheme Policy on Regional Growth in Greece. Sustainability 2022, 14, 2882. [Google Scholar] [CrossRef]
  31. Figurek, A.; Morphi, K.; Thrassou, A. A Sustainable Risk Management Model and Instruments for Young Farmers in EU Agriculture. Sustainability 2024, 16, 283. [Google Scholar] [CrossRef]
  32. Kovách, I.; Megyesi, B.G.; Bai, A.; Balogh, P. Sustainability and Agricultural Regeneration in Hungarian Agriculture. Sustainability 2022, 14, 969. [Google Scholar] [CrossRef]
  33. Kabadzhova, M. Attractiveness of the agricultural sector to achieving generational renewal. Bulg. J. Agric. Sci. 2022, 28, 3–9. [Google Scholar]
  34. Bournaris, T.; Moulogianni, C.; Manos, B. A multicriteria model for the assessment of rural development plans in Greece. Land Use Policy 2014, 38, 1–8. [Google Scholar] [CrossRef]
  35. Kouriati, A.; Tafidou, A.; Lialia, E.; Prentzas, A.; Moulogianni, C.; Dimitriadou, E.; Bournaris, T. A Multicriteria Decision Analysis Model for Optimal Land Uses: Guiding Farmers under the New European Union’s Common Agricultural Policy (2023–2027). Land 2024, 13, 788. [Google Scholar] [CrossRef]
  36. AFIR. Rapoarte FEADR: Selecție. Available online: https://www.afir.ro/rapoarte/rapoarte-feadr/selectie/ (accessed on 19 December 2024).
  37. Agency for Rural Investment Financing (AFIR). SO Platform—About Standard Output. 2025. Available online: https://so.afir.info/Home/About (accessed on 10 June 2025).
Figure 1. Data distributions for DB6. The figure illustrates the empirical distribution of key applicant and project characteristics, highlighting structural heterogeneity in farm size, investment value, and application timing.
Figure 2. Data distributions for DB30. The figure illustrates the empirical distribution of key applicant and project characteristics under the DR-30 intervention, highlighting structural patterns relevant for interpreting selection and scoring outcomes.
Figure 3. Feature importance and SHAP values derived from the Random Forest model for the official project score (OS) under Sub-Measure 6.1 supporting young farmers. In the figure, “Punctaj estimat” is ES, “Total cumulat” is VC, “Luna” is Month, “An” is Year, “Judet” is County, “Zi” is Day, “Regiune” is Region, “Valoare eligibila” is VE and “Valoare publica” is VP.
Figure 4. Feature importance and SHAP values derived from the Random Forest model for the project selection outcome (SelN) under the DR-30 intervention for young farmer support. In the figure, “Punctaj estimat” is ES, “Total cumulat” is VC, “Luna” is Month, “An” is Year, “Judet” is County, “Zi” is Day, “Regiune” is Region, “Valoare eligibila” is VE and “Valoare publica” is VP.
Figure 5. Feature importance and SHAP values derived from the Gradient Boosting model for the project selection outcome (SelN) under Sub-Measure 6.1 for young farmer support.
Figure 6. Feature importance and SHAP values derived from the Gradient Boosting model for the project selection outcome (SelN) under the DR-30 intervention for young farmer support.
Table 1. The variables included in the predictive and classification models.

| No. | Feature | Abbreviation | Data Type | Value Domain |
|---|---|---|---|---|
| 1 | AFIR official assessment proposal score | $OS$ | real | $0 \le OS \le 100$ |
| 2 | County | $GC$ | categorical | - |
| 3 | Region | $GR$ | categorical (integer) | $1 \le GR \le 7$ |
| 4 | CS1–CS6 Criteria (6.1) | $CS6_{i}$ | real | $0 \le CS_{i} \le$ max points |
| 5 | CS1–CS6 Criteria (DR-30) | $CS30_{i}$ | real | $0 \le CS_{i} \le$ max points |
| 6 | Standard Output ¹ | $SO$ | real | $SO \ge 8000$ |
| 7 | Eligible value | $VE$ | real | $VE \ge 0$ |
| 8 | Public value | $VP$ | real | $VP \ge 0$ |
| 9 | Cumulative total | $VC$ | real | $VC \ge 0$ |
| 10 | Selection status (categorical) | $SelC$ | categorical | {Yes, No} |
| 11 | Selection status (binary) | $SelN$ | integer (binary) | {0, 1} |

¹ Standard Output (SO) represents the value of the standard production of an agricultural holding, expressed in EUR [37]. Variable definitions follow the official program guidelines; abbreviations are explained at first occurrence in the text. OS denotes the official project score assigned by the paying agency, while SelN indicates the binary project selection outcome.

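The value domains in Table 1 can be read as simple validation rules. The following is a minimal sketch, assuming each proposal is held as a plain dictionary keyed by the abbreviations above and that no value is missing; the helper names `encode_selection` and `domain_violations` are hypothetical and are not taken from the paper.

```python
# Hypothetical helpers; only the value domains themselves come from Table 1.
SELECTION_CODES = {"Yes": 1, "No": 0}


def encode_selection(sel_c):
    """Map the categorical status SelC ('Yes'/'No') to the binary status SelN."""
    return SELECTION_CODES[sel_c]


def domain_violations(record):
    """Return the names of fields whose values fall outside their Table 1 domain."""
    problems = []
    if not 0 <= record["OS"] <= 100:      # official score lies in [0, 100]
        problems.append("OS")
    for name in ("VE", "VP", "VC"):       # monetary values are non-negative
        if record[name] < 0:
            problems.append(name)
    if record["SelN"] not in (0, 1):      # binary selection outcome
        problems.append("SelN")
    return problems


# Illustrative rows (made-up values, not records from the dataset).
valid_row = {"OS": 64.73, "VE": 40000.0, "VP": 40000.0, "VC": 8.0e6,
             "SelN": encode_selection("Yes")}
broken_row = {"OS": 104.0, "VE": -1.0, "VP": 40000.0, "VC": 8.0e6, "SelN": 2}

print(domain_violations(valid_row))
print(domain_violations(broken_row))
```

A check of this kind would typically run before model training, so that out-of-domain rows are inspected rather than silently fed to the learners.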
Table 2. Selection criteria for Sub-Measure 6.1.

| Abb. | Description | Max. Score |
|---|---|---|
| $CS6_{1}$ | Principle of farm consolidation, considering the number of fully acquired farms | 20 |
| $CS6_{2}$ | Principle of qualification level in the agricultural/veterinary/agricultural economics field | 20 |
| $CS6_{3}$ | Principle of agricultural potential targeting areas with potential determined from specialized studies | 5 |
| $CS6_{4}$ | Principle of integrating environmental protection and efficient resource use into business plans | 15 |
| $CS6_{5}$ | Principle of integrating the construction and modernization of agrifood facilities and the acquisition of equipment to enhance the farm’s economic performance into business plans | 25 |
| $CS6_{6}$ | Principle of membership in an associative organization with an economic role (cooperative, group, or producers’ organization) | 15 |

Table 3. Selection criteria for Intervention DR-30.

| Abb. | Description | Max. Score (first sub-column) | Max. Score (second sub-column) |
|---|---|---|---|
| $CS30_{1}$ | Principle of qualification level: applicant must have completed secondary, post-secondary, or higher education in the targeted agricultural branch (vegetal/livestock/mixed) | 15 | 15 |
| $CS30_{1.1}$ | Applicant has obtained a diploma in the relevant agricultural branch | 15 | 15 |
| $CS30_{1.2}$ | Applicant provides proof of graduation from an agricultural high school (even without baccalaureate) or proof of attending a qualification/training course above the minimum required level | 10 | 10 |
| $CS30_{2}$ | Principle of promotion of the livestock/vegetables sector | 30 | 25 |
| $CS30_{2.1}$ | Applicant holds a majority share (>50%) in the farm’s operating unit related to the livestock/vegetables sector | 30 | 25 |
| $CS30_{2.2}$ | Applicant’s farm production value from vegetables in protected areas is between EUR 2300 and 7100, with a proposal for heating system investments covering the entire area | 20 | - |
| $CS30_{3}$ | Principle of consolidation through the takeover of farms | 10 | 15 |
| $CS30_{3.1}$ | Applicant takes over at least one farm in full from a transferor aged at least 60 years | 10 | 15 |
| $CS30_{3.2}$ | Applicant takes over at least two farms in full | 7 | 10 |
| $CS30_{3.3}$ | Applicant takes over one farm in full | 5 | 7 |
| $CS30_{4}$ | Principle of membership in an associative organization with an economic role (cooperative, group, or producers’ organization) | 10 | 10 |
| $CS30_{4.1}$ | Applicant is part of an associative organization with an economic role | 10 | 10 |
| $CS30_{5}$ | Principle of ownership of the farm | 10 | 5 |
| $CS30_{5.1}$ | Applicant owns the agricultural land area of the farm and the total livestock | 10 | 5 |
| $CS30_{6}$ | Principle of promoting modern production technologies with reduced environmental impact and efficient use of natural resources | 25 | 30 |
| $CS30_{6.1}$ | Organic farming | 5 | 10 |
| $CS30_{6.2}$ | Precision agriculture, including automated systems for optimizing production flow | 10 | 10 |
| $CS30_{6.3}$ | Circular economy/use of renewable energy sources | 10 | 10 |

The two Max. Score sub-columns are labeled “NMM” (run together) in the table header.

Table 4. Expected output data from predictive, classification, and clustering models.

| Model Type | Output Description | Format |
|---|---|---|
| Predictive models (OS target) | Performance metrics (MSE, RMSE, MAE, R²), feature importance scores, SHAP value plots | Numeric + graphical |
| Classification models (SelN target) | AUC, CA, F1, Precision, Recall, MCC, confusion matrix, feature importance, SHAP plots | Numeric + graphical |
| Clustering models (K-means, HCA) | Optimal number of clusters (Silhouette score), centroid values, project distribution by cluster | Numeric + graphical |
| Aggregated policy insights | List of most influential criteria for OS and SelN, interpretation of results for policy design | Textual + graphical |

Note: OS = Official Score assigned by the paying agency; SelN = binary selection outcome (1 = selected, 0 = not selected); MSE = Mean Squared Error; RMSE = Root Mean Squared Error; MAE = Mean Absolute Error; AUC = Area Under the Receiver Operating Characteristic Curve; CA = Classification Accuracy; F1 = F1-score; MCC = Matthews Correlation Coefficient; SHAP = Shapley Additive Explanations; HCA = Hierarchical Cluster Analysis.

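For readers less familiar with the metrics listed in Table 4, the sketch below computes most of them from their textbook definitions on small made-up vectors. It is an illustration only: the function names are hypothetical, the toy data are not drawn from the study, Precision, Recall, and F1 are computed for the positive class (the paper's tooling may report averaged variants), and AUC is omitted because it requires predicted probabilities rather than labels.

```python
import math


def regression_metrics(y_true, y_pred):
    """MSE, RMSE, MAE and R^2 for a regression target such as OS."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    ss_res = sum(e * e for e in errors)                 # residual sum of squares
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)  # total sum of squares
    mse = ss_res / n
    return {
        "MSE": mse,
        "RMSE": math.sqrt(mse),
        "MAE": sum(abs(e) for e in errors) / n,
        "R2": 1 - ss_res / ss_tot,
    }


def binary_metrics(y_true, y_pred):
    """CA, Precision, Recall, F1 and MCC for a binary target such as SelN."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    mcc_den = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "CA": (tp + tn) / len(pairs),
        "Precision": precision,
        "Recall": recall,
        "F1": 2 * precision * recall / (precision + recall),
        "MCC": (tp * tn - fp * fn) / mcc_den,
    }


# Toy inputs (illustrative only).
reg = regression_metrics([60, 70, 80, 90], [62, 68, 83, 89])
clf = binary_metrics([1, 1, 1, 0, 0, 1, 0, 1], [1, 1, 0, 0, 1, 1, 0, 1])
print(reg)
print(clf)
```

MCC is worth reporting alongside accuracy because it uses all four cells of the confusion matrix, which matters when roughly two thirds of proposals are selected.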
Table 5. Role of the SPCC–CAP components in evaluating support for young farmers.

| Component | Analytical Function | Contribution to Evaluating Young Farmer Support |
|---|---|---|
| STAT | Describes structural patterns of applicants, farms, and regional distribution | Identifies structural constraints, dominant farm types, and territorial disparities relevant for assessing targeting and accessibility of the scheme |
| PRED | Models score formation and selection outcomes using supervised learning | Quantifies how specific characteristics (economic size, training, investment type, timing) influence the probability of receiving support |
| EXPL | Provides interpretability tools (feature importance, SHAP values) | Reveals mechanisms behind score allocation and selection; shows whether criteria favor or disadvantage certain young farmer profiles |
| CLASS | Segments applicants into meaningful clusters based on shared attributes | Identifies groups of young farmers with similar structural conditions, highlighting which profiles benefit most or least from the scheme |

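Table 6. Descriptive statistics for Sub-Measure 6.1 projects.

| Feature | Mean | Mode | Median | Min | Max | Missing (%) |
|---|---|---|---|---|---|---|
| ES | 63.50 | 65 | 65 | 25.00 | 100.00 | 0 |
| OS | 62.49 | - | 64.73 | 0.00 | 100.00 | 26 |
| CS1 | 28.24 | 30 | 30 | 0 | 30 | 26 |
| CS2 | 2.34 | 0 | 0 | 0 | 20 | 26 |
| CS3 | 11.08 | 10 | 10 | 0 | 35 | 26 |
| CS4 | 18.74 | 20 | 20 | 0 | 25 | 26 |
| CS5 | 1.70 | 0 | 0 | 0 | 25 | 26 |
| CS6 | 14.66 | - | 15 | 0 | 15 | 98 |
| SO | 17,012.32 | - | 14,359.02 | 12,002.20 | 51,704.13 | 32 |
| VE | 41,261.56 | 40,000 | 40,000 | 0 | 70,000 | 1 |
| VP | 41,261.56 | 40,000 | 40,000 | 0 | 70,000 | 1 |
| VC | $2.22 \times 10^{7}$ | - | $8.00 \times 10^{6}$ | 0 | $1.30 \times 10^{8}$ | 3 |
| Year | 2016.73 | 2017 | 2017 | 2015 | 2021 | 0 |
| Month | 7.18 | 6 | 7 | 2 | 10 | 0 |
| Day | 15.85 | 1 | 16 | 1 | 31 | 0 |
| Date | - | 1 June 2017 | 1 June 2017 | 7 April 2015 | 26 September 2021 | 0 |
| GR | 4.20 | 6 | 4 | 1 | 8 | 0 (0%) |
| GC | - | Dâmbovița | - | - | - | 0 |
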
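Table 7. Descriptive statistics for DR-30 dataset.

| Feature | Mean | Median | Mode | Min | Max | Missing (%) |
|---|---|---|---|---|---|---|
| ES | 61.49 | 60.00 | 50.00 | 30.00 | 99.50 | 0 |
| OS | 59.68 | 55.00 | 50.00 | 10.00 | 97.94 | 0 |
| CS1.1 | 1.29 | 0 | 0 | 0 | 7 | 20 |
| CS1.2 | 8.90 | 10 | 10 | 0 | 10 | 0 |
| CS2.1 | 8.78 | 0 | 0 | 0 | 30 | 0 |
| CS2.2 | 3.09 | 0 | 0 | 0 | 25 | 36 |
| CS3.1 | 5.86 | 0 | 0 | 0 | 15 | 0 |
| CS3.2 | 0.30 | 0 | 0 | 0 | 10 | 0 |
| CS3.3 | 0.76 | 0 | 0 | 0 | 7 | 0 |
| CS4.1 | 9.71 | 10 | 10 | 0 | 10 | 0 |
| CS5.1 | 0.79 | 0 | 0 | 0 | 10 | 0 |
| CS6.1 | 2.20 | 0 | 0 | 0 | 10 | 0 |
| CS6.2 | 9.46 | 10 | 10 | 0 | 10 | 0 |
| CS6.3 | 9.64 | 10 | 10 | 0 | 10 | 0 |
| SO | 13,543.14 | 13,168.85 | 2364.97 | 0 | 97,896.80 | 0 |
| VE | 70,000 | 70,000 | 70,000 | 70,000 | 70,000 | 0 |
| VP | 70,000 | 70,000 | 70,000 | 70,000 | 70,000 | 0 |
| VC | 31,747,400.72 | 23,730,000 | 70,000 | 70,000 | 106,750,000 | 0 |
| Year | 2023.56 | 2024 | 2024 | 2023 | 2024 | 0 |
| Month | 5.83 | 1 | 1 | 1 | 12 | 0 |
| Day | 6.18 | 4 | 5 | 1 | 31 | 0 |
| Date | 21 December 2023 | 1 January 2024 | 1 January 2024 | 2 November 2023 | 19 January 2024 | 0 (0%) |
| GR | 4.91 | 6 | 6 | 1 | 8 | 0 |
| County | - | - | BIHOR | - | - | 0 (0%) |
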
Table 8. General dataset structure comparison between Sub-Measure 6.1 and DR-30.

| Feature | 6.1 | DR-30 |
|---|---|---|
| Number of projects | 16,129 | 5827 |
| Selection rate (SelN = 1) | 64.8% | 67.4% |
| Missing OS (%) | 26% | 0% |
| Largest missing feature | CS6 (98%) | CS2.2 (36%) |
| Application years | 2015–2021 | 2023–2024 |
| Geographical coverage | All counties | All counties |
| Proposals per year | 2481 | 34,277 |
| Proposals per month | 207 | 2914 |

Table 9. Financial characteristics comparison.

| Feature | 6.1 | DR-30 |
|---|---|---|
| Mean SO | 17,012 | 13,543 |
| Median SO | 14,359 | 13,169 |
| SO range | 12,002–51,704 | 0–97,896 |
| Median eligible/public value (EUR) | 40,000 | 70,000 |
| Public value fixed | No | Yes |
| Cumulative total (EUR) | 0–129.85 M | 70 k–106.75 M |

Table 10. Scoring comparison between measures.

| Feature | 6.1 | DR-30 |
|---|---|---|
| Mean ES | 63.50 | 61.49 |
| Median ES | 65.00 | 60.00 |
| Mean OS | 62.49 | 59.68 |
| Median OS | 64.73 | 55.00 |
| Min–Max OS | 0–100 | 10–97.94 |
| Criteria with highest mean | CS4, CS3 | CS2.1, CS6.2, CS6.3 |

Table 11. Categorical variable comparison.

| Feature | 6.1 | DR-30 |
|---|---|---|
| Counties covered | All | All |
| NUTS-2 regions | All 8 | All 8 |
| Applicant type | Broad categories | Clear beneficiary type codes |
| Temporal concentration | Multi-year | 2 months |

Table 12. Performance comparison for regression (OS) and classification (SelN) across Sub-Measure 6.1 and DR-30. Columns MSE to R² report regression (OS); columns AUC to MCC report classification (SelN).

| Model | Dataset | MSE | RMSE | MAE | R² | AUC | CA | F1 | Prec | Recall | MCC |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Stacking | 6.1 | 13.714 | 3.703 | 1.438 | 0.909 | 0.999 | 0.982 | 0.982 | 0.982 | 0.982 | 0.961 |
| Stacking | DR-30 | 14.001 | 3.742 | 1.437 | 0.944 | 0.951 | 0.860 | 0.860 | 0.860 | 0.860 | 0.710 |
| Random Forest | 6.1 | 14.189 | 3.767 | 1.246 | 0.906 | 0.998 | 0.977 | 0.977 | 0.977 | 0.977 | 0.950 |
| Random Forest | DR-30 | 14.275 | 3.778 | 1.149 | 0.943 | 0.922 | 0.817 | 0.817 | 0.817 | 0.817 | 0.619 |
| Gradient Boosting | 6.1 | 16.566 | 4.070 | 1.920 | 0.890 | 0.998 | 0.979 | 0.979 | 0.979 | 0.979 | 0.955 |
| Gradient Boosting | DR-30 | 19.113 | 4.372 | 2.180 | 0.924 | 0.946 | 0.848 | 0.848 | 0.848 | 0.848 | 0.684 |
| Neural Network | 6.1 | 18.943 | 4.352 | 2.194 | 0.874 | 0.989 | 0.964 | 0.964 | 0.964 | 0.964 | 0.920 |
| Neural Network | DR-30 | 27.910 | 5.283 | 2.892 | 0.889 | 0.932 | 0.830 | 0.830 | 0.831 | 0.830 | 0.650 |
| kNN | 6.1 | 24.000 | 4.899 | 2.636 | 0.840 | 0.977 | 0.939 | 0.939 | 0.939 | 0.939 | 0.867 |
| kNN | DR-30 | 27.935 | 5.285 | 2.551 | 0.889 | 0.923 | 0.819 | 0.819 | 0.819 | 0.819 | 0.624 |
| Logistic Regression | 6.1 | n/a | n/a | n/a | n/a | 0.824 | 0.775 | 0.770 | 0.770 | 0.775 | 0.493 |
| Logistic Regression | DR-30 | n/a | n/a | n/a | n/a | 0.910 | 0.835 | 0.835 | 0.836 | 0.835 | 0.658 |

Notes: Linear Regression (OS) produced numerically unstable results due to probable multicollinearity or feature leakage. MAPE excluded due to division-by-zero artifacts in 6.1 (OS includes zeros). Stacking is reported but excluded from ranking as it is an ensemble of base learners. Performance metrics are reported for comparative purposes across models and programming periods. Differences should be interpreted in relative terms rather than as absolute predictive superiority.

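Because RMSE is by definition the square root of MSE, the two regression columns of Table 12 can be cross-checked against each other. The short sketch below does this with the reported values; the dictionary layout and variable names are illustrative choices, not part of the study's workflow.

```python
import math

# (MSE, RMSE) pairs as reported in Table 12, keyed by (model, dataset).
reported = {
    ("Stacking", "6.1"): (13.714, 3.703),
    ("Stacking", "DR-30"): (14.001, 3.742),
    ("Random Forest", "6.1"): (14.189, 3.767),
    ("Random Forest", "DR-30"): (14.275, 3.778),
    ("Gradient Boosting", "6.1"): (16.566, 4.070),
    ("Gradient Boosting", "DR-30"): (19.113, 4.372),
    ("Neural Network", "6.1"): (18.943, 4.352),
    ("Neural Network", "DR-30"): (27.910, 5.283),
    ("kNN", "6.1"): (24.000, 4.899),
    ("kNN", "DR-30"): (27.935, 5.285),
}

# Absolute gap between sqrt(MSE) and the reported RMSE for each row.
gaps = {key: abs(math.sqrt(mse) - rmse) for key, (mse, rmse) in reported.items()}
max_gap = max(gaps.values())

for key, gap in gaps.items():
    print(key, round(gap, 4))
print("largest gap:", round(max_gap, 4))
```

Every gap stays below 0.001, which is what three-decimal rounding of both columns would produce.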
Table 13. The ranking of the influence of the scoring criteria (CS1–CS6) on the official project score (OS).

| Feature | Univariate Regression | RReliefF | Linear Regression (Coefficient) |
|---|---|---|---|
| CS4 | 3391.479 | 0.009 | 7.222 |
| CS3 | 2705.996 | 0.063 | 6.033 |
| CS2 | 1758.571 | 0.052 | 4.236 |
| CS5 | 1422.131 | 0.046 | 6.147 |
| CS1 | 528.561 | 0.057 | 4.981 |
| CS6 | 6.098 | 0.212 | 2.012 |

Note: Rankings reflect relative importance within the predictive models and should not be interpreted as causal effects.

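The three importance measures in Table 13 do not order the criteria identically, which is easy to overlook when the rows are listed in a single order. As a small illustration using only the reported values, the sketch below sorts the criteria by each measure separately; the variable names are hypothetical.

```python
# Values copied from Table 13: (univariate regression, RReliefF, linear coefficient).
table13 = {
    "CS4": (3391.479, 0.009, 7.222),
    "CS3": (2705.996, 0.063, 6.033),
    "CS2": (1758.571, 0.052, 4.236),
    "CS5": (1422.131, 0.046, 6.147),
    "CS1": (528.561, 0.057, 4.981),
    "CS6": (6.098, 0.212, 2.012),
}
measures = ("Univariate Regression", "RReliefF", "Linear Regression")

# For each measure, order the criteria from most to least important.
rankings = {
    name: sorted(table13, key=lambda cs, i=i: table13[cs][i], reverse=True)
    for i, name in enumerate(measures)
}

for name, order in rankings.items():
    print(f"{name}: {' > '.join(order)}")
```

On these figures, CS6 ranks last under univariate regression and the linear coefficient but first under RReliefF, so any statement about the "most influential" criterion should name the measure it relies on.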
Table 14. Ranking of selection criteria for DR-30 based on multiple importance metrics.

| # | Feature | Univariate Regression | RReliefF | Linear Regression |
|---|---|---|---|---|
| 1 | CS2.1 | 8027.245 | 0.471 | 11.192 |
| 2 | CS6.1 | 1198.503 | 0.189 | 3.442 |
| 3 | CS3.1 | 1194.058 | 0.186 | 6.023 |
| 4 | CS6.2 | 325.290 | 0.164 | 1.999 |
| 5 | CS2.2 | 174.488 | 0.178 | 6.976 |
| 6 | CS6.3 | 66.684 | 0.283 | 1.836 |
| 7 | CS4.1 | 39.230 | 0.358 | 1.639 |
| 8 | CS1.2 | 22.072 | 0.135 | 3.159 |
| 9 | CS3.3 | 10.755 | 0.312 | 1.793 |
| 10 | CS1.1 | 6.281 | 0.169 | 4.235 |
| 11 | CS5.1 | 5.883 | 0.344 | 2.405 |
| 12 | CS3.2 | 5.371 | 0.477 | 1.429 |

Note: Importance rankings are model-dependent and reflect relative contributions under the DR-30 selection framework.

Table 15. Silhouette scores for different numbers of clusters for Sub-Measure 6.1 and DR-30.

| Number of Clusters | 6.1 | DR-30 |
|---|---|---|
| 2 | 0.814 | 0.818 |
| 3 | 0.756 | 0.817 |
| 4 | 0.862 | 0.758 |
| 5 | 0.898 | 0.740 |
| 6 | 0.902 | 0.731 |
| 7 | 0.856 | 0.643 |
| 8 | 0.860 | 0.618 |

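Table 4 lists the Silhouette score as the basis for the optimal number of clusters. A minimal sketch of that selection rule, applied to the scores in Table 15, is shown below; taking the plain maximum is an assumption about how the rule was applied, and the variable names are illustrative.

```python
# Silhouette scores from Table 15: k -> (Sub-Measure 6.1, DR-30).
silhouette = {
    2: (0.814, 0.818),
    3: (0.756, 0.817),
    4: (0.862, 0.758),
    5: (0.898, 0.740),
    6: (0.902, 0.731),
    7: (0.856, 0.643),
    8: (0.860, 0.618),
}

# Pick the number of clusters with the highest score for each dataset.
best_k_61 = max(silhouette, key=lambda k: silhouette[k][0])
best_k_dr30 = max(silhouette, key=lambda k: silhouette[k][1])

print("6.1:", best_k_61, silhouette[best_k_61][0])
print("DR-30:", best_k_dr30, silhouette[best_k_dr30][1])
```

The result (six clusters for 6.1, two for DR-30) agrees with the number of clusters profiled in Tables 16 and 17. For DR-30 the margin between two and three clusters is only 0.001, so that choice is less clear-cut than the one for 6.1.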
Table 16. Aggregate cluster profiles for Sub-Measure 6.1 (totals and averages).

| Cluster | Total N | Total ES | Total OS | Total VC | Average VC | Average SO |
|---|---|---|---|---|---|---|
| C1 | 282 | 83.78 | 82.81 | $2.06 \times 10^{9}$ | $7.30 \times 10^{6}$ | 17,168.13 |
| C2 | 1008 | 65.50 | 64.39 | $2.16 \times 10^{10}$ | $2.14 \times 10^{7}$ | 35,144.68 |
| C3 | 541 | 58.17 | 55.63 | $3.41 \times 10^{9}$ | $6.61 \times 10^{6}$ | 14,734.53 |
| C4 | 679 | 44.60 | 43.92 | $1.14 \times 10^{10}$ | $1.68 \times 10^{7}$ | 16,050.46 |
| C5 | 1037 | 80.41 | 80.02 | $2.41 \times 10^{9}$ | $2.36 \times 10^{6}$ | 15,631.12 |
| C6 | 6923 | 63.12 | 62.41 | $2.76 \times 10^{11}$ | $3.99 \times 10^{7}$ | 14,977.40 |
| Total | 10,470 | 64.16 | 63.35 | $3.17 \times 10^{11}$ | $3.04 \times 10^{7}$ | 17,072.43 |

Table 17. Aggregate cluster profiles for Sub-Measure DR-30 (totals and averages).

| Cluster | Total N | Average SO | Total ES | Total OS | Total VC | Average VC |
|---|---|---|---|---|---|---|
| C1 | 921 | 11,757.28 | 84.72 | 84.62 | $1.22 \times 10^{10}$ | $1.33 \times 10^{7}$ |
| C2 | 2557 | 14,161.78 | 56.18 | 55.78 | $9.80 \times 10^{10}$ | $3.84 \times 10^{7}$ |
| Total | 3478 | 13,525.05 | 63.73 | 63.42 | $1.10 \times 10^{11}$ | $3.17 \times 10^{7}$ |

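The "Total" row of Table 17 appears to be a cluster-size-weighted average of the two cluster rows, and the Total VC column appears to equal Total N times Average VC. The sketch below checks both readings against the reported figures; this interpretation is an inference from the numbers rather than something stated in the table, and the names are illustrative.

```python
# Cluster rows from Table 17: N, average SO, ES, OS, total VC, average VC.
clusters = {
    "C1": {"N": 921, "SO": 11757.28, "ES": 84.72, "OS": 84.62,
           "VC_total": 1.22e10, "VC_avg": 1.33e7},
    "C2": {"N": 2557, "SO": 14161.78, "ES": 56.18, "OS": 55.78,
           "VC_total": 9.80e10, "VC_avg": 3.84e7},
}

total_n = sum(c["N"] for c in clusters.values())

# Size-weighted mean of each per-cluster average.
weighted = {
    field: sum(c["N"] * c[field] for c in clusters.values()) / total_n
    for field in ("SO", "ES", "OS")
}

# Ratio of (N * average VC) to the reported total VC; close to 1 if consistent.
vc_ratio = {name: c["N"] * c["VC_avg"] / c["VC_total"] for name, c in clusters.items()}

print(total_n)
print({k: round(v, 2) for k, v in weighted.items()})
print({k: round(v, 3) for k, v in vc_ratio.items()})
```

The weighted means land within about 0.01 of the reported totals (13,525.05, 63.73, and 63.42), and both VC ratios are within 1% of one, so the small differences are attributable to rounding in the published values.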
Table 18. Feature importance ranking based on the Random Forest and Gradient Boosting models.

| Sub-Measure/Target | Most Influential Features (Ordered) |
|---|---|
| 6.1 – OS (Random Forest) | ES ≫ VC > (Month, Year, GC, SO, others) |
| DR-30 – OS (Random Forest) | ES ≫ VC > SO > (Year, Month, Region) |
| 6.1 – SelN (Random Forest) | CS1, Year, Month, VC (others marginal) |
| DR-30 – SelN (Gradient Boosting) | ES, CS3.1, VC, CS2.2 (Month seasonal effect) |

Note: Feature importance scores indicate relative contribution to model predictions and do not imply causality.

Table 19. Problems identified in the formulation of project proposals for young farmers, based on STAT, PRED, and CLASS analyses.

| Problem Identified | Analysis Interpretation | Impact on Young Farmers | Possible Support Measures |
|---|---|---|---|
| Self-assessment not correlated with AFIR grid | PRED (OS) shows that Estimated Score (ES) is the most important predictor; large differences between ES and the final OS score. | Promising projects are rejected or lose significant points; resources are wasted on unrealistic proposals. | Specific training on AFIR grid; scoring simulations; online tools for accurate self-assessment. |
| Weak use of high-impact criteria | FI and SHAP (SelN) show that CS1, CS3.1, CS2.2 bring major points, but are not maximized. | Low chances of selection even for technically viable projects; loss of competitive advantage. | Practical guides and examples of good practices; assistance in formulating strategic criteria. |
| Low economic size (SO) and cumulative value (VC) | CLASS shows clusters with small SO and VC, specific to young farms. | Limited co-financing; difficulties in competing with large holdings. | Funding lines dedicated to micro-farms; additional grants for increasing SO. |
| Calendar and seasonality issues | SelN indicates the influence of the month/year of submission on selection. | Application in highly competitive sessions decreases the chances of success. | Strategic planning of submissions; regular information on the intensity of the competition. |
| Limited access to technical and financial advice | STAT shows areas/regions with low density of submitted projects. | Low quality of projects; loss of financing opportunities. | Regional mobile advisory network; local information centers for young farmers. |

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
