Explainable AI and Fuzzy Linguistic Interpretation for Enhanced Transparency in Public Procurement: Analyzing EU Tender Awards

Cernăzanu-Glăvan, Cosmin; Bulzan, Andrei-Ștefan

doi:10.3390/math13132215


by Cosmin Cernăzanu-Glăvan † and Andrei-Ștefan Bulzan *,†

Department of Computer and Information Technology, Politehnica University Timișoara, 300223 Timișoara, Romania

* Author to whom correspondence should be addressed.
† These authors contributed equally to this work.

Mathematics 2025, 13(13), 2215; https://doi.org/10.3390/math13132215

Submission received: 9 June 2025 / Revised: 2 July 2025 / Accepted: 5 July 2025 / Published: 7 July 2025

(This article belongs to the Special Issue Multi-attribute Decision Making and Intelligent Computing in Smart Governance)


Abstract

Despite the ideal of a unified Single Market, a powerful “home bias” pervades EU public procurement, hinting at unseen barriers that conventional analysis fails to capture. This study introduces an interpretable AI framework to investigate these dynamics, pairing a LightGBM model with SHapley Additive exPlanations (SHAP) to examine the vast Tenders Electronic Daily (TED) database (2018–2023). Concretely, we propose a fuzzy linguistic layer that translates SHAP’s complex quantitative outputs into intuitive, human-readable terms. Our model effectively distinguishes local from non-local awards (AUC ≈ 0.855), revealing that while high-value contracts expectedly attract broader competition, the most potent predictors are a country’s own history of local awards and structural factors like the buyer’s type and location. This points not to isolated incidents, but, rather, to deep-seated patterns shaping market fairness. Our combined XAI-Fuzzy approach offers a new instrument for transparent governance, enabling policymakers to diagnose market realities and forge a more genuinely open and equitable European public square.

Keywords:

public procurement; Explainable AI (XAI); SHAP; fuzzy linguistic interpretation; smart governance; Tenders Electronic Daily (TED)

MSC:

68T01; 91-05

1. Introduction

Public procurement represents a cornerstone of the EU economy, with public authorities annually spending approximately 14% of the collective GDP—over EUR 2 trillion—on essential goods, works, and services [1]. This expenditure is governed by the core principles of the Single Market: transparency, equal treatment, non-discrimination, and open competition, all intended to foster robust cross-border activity [2]. However, despite this clear policy framework, a powerful “home bias” pervades the market. Direct cross-border procurement remains stubbornly low, with most contracts awarded to domestic suppliers, revealing unseen barriers that conventional analysis fails to capture [3]. Diagnosing the reasons for this gap between policy and reality—from administrative hurdles to implicit national preferences—is a paramount challenge for achieving the EU’s goal of smart, efficient, and transparent governance [4]. The vast and complex data within the Tenders Electronic Daily (TED) platform [5] offers a rich opportunity, but its scale and intricacy demand more powerful analytical tools than traditional statistics can provide.

To bridge this analytical gap, this study introduces an interpretable AI framework to dissect the drivers of public procurement outcomes. The “black box” nature of many advanced machine learning models hinders their adoption in public governance, where accountability and the ability to understand why a decision was made are non-negotiable [6]. We address this by pairing a high-performance LightGBM model with SHapley Additive exPlanations (SHAP), a leading Explainable AI (XAI) method that attributes model predictions to the underlying features in a fair and transparent manner [7]. Going a step further, we introduce a novel fuzzy linguistic interpretation layer that translates SHAP’s complex quantitative outputs into intuitive, human-readable terms. This innovation aims to make the intricate dynamics of market behavior accessible to non-technical stakeholders, such as policymakers and oversight bodies. By applying this framework to a comprehensive dataset of EU tenders from 2018 to 2023, we seek to answer the following research questions:

RQ1: What are the primary characteristics of public tenders (e.g., value, competition level, buyer type) that drive the likelihood of a contract being awarded to a local versus a non-local supplier?
RQ2: How can SHAP values, further clarified through a fuzzy linguistic framework, illuminate the specific ways these features influence outcomes, revealing patterns indicative of market integration levels or systemic biases?
RQ3: What practical insights for smart governance and evidence-based policymaking can be derived from this interpretable machine learning framework?

This study’s contribution is threefold. Empirically, we provide data-driven evidence identifying the key factors that shape the EU’s procurement landscape. Methodologically, we demonstrate a novel, integrated XAI-Fuzzy framework designed for enhanced transparency and accessibility in complex policy analysis. Practically, we offer a powerful diagnostic tool that enables policymakers to better understand market realities, monitor fairness, and forge a more genuinely open and equitable European public square. The remainder of this paper reviews the related literature in Section 2, details our methodology in Section 3, presents the model performance and explanatory findings in Section 4, discusses their implications in Section 5, and offers concluding remarks in Section 6.

2. Literature Review

This section synthesizes the literature on EU public procurement, the application of AI and XAI in its analysis, and the methodological foundations for enhancing interpretability, thereby contextualizing our study’s contribution.

2.1. Regulation and Market Challenges in EU Public Procurement

Public procurement is a cornerstone of the EU economy, vital for delivering public services and stimulating growth. Its governance rests on core principles of transparency, equal treatment, and open competition, enshrined in EU Treaties and a series of evolving directives [8,9,10]. The Tenders Electronic Daily (TED) platform serves as the central repository for procurement notices and the primary data source for analysis, although its utility is conditioned by known data quality issues that necessitate careful pre-processing [11].

Despite this robust regulatory framework, a significant gap persists between policy ideals and market realities. Direct cross-border procurement remains persistently low, often below 5% by value, indicating powerful “border effects” rooted in administrative complexities, national preferences, and information barriers [3,12,13]. These challenges are often magnified for small and medium-sized enterprises (SMEs), which face disproportionate hurdles related to procedural complexity and financial constraints, despite policies designed to support their participation [10,14]. Compounding these issues, overall competition has declined, marked by an increase in single-bid procedures and direct awards [1].

This complex landscape is further shaped by policy initiatives like the 2014 directives, which promoted the “Most Economically Advantageous Tender” (MEAT) criterion. This principle advocates for a holistic value assessment over simply the lowest price, encouraging the inclusion of quality, environmental, social, and innovation aspects [2]. However, in practice, a large number of contracts continue to be awarded based primarily on price [10,15]. This implementation gap not only hinders the achievement of strategic goals like innovation or Green Public Procurement [16,17,18], but also adds another layer of complexity to an already challenging environment. The intricate interplay of these legal, economic, and procedural factors suggests that subtle, systemic forces are at play, requiring analytical methods that can look beyond conventional statistics to uncover them.

2.2. From AI to XAI in Public Procurement

In response to this complexity, researchers and practitioners are increasingly turning to artificial intelligence (AI) and machine learning (ML) to extract deeper insights from large-scale procurement data. These technologies offer transformative potential for improving efficiency, generating cost savings, and enhancing decision-making [19,20]. Applications are diverse, ranging from using ML models on tender data to detect fraud and collusion [21] to leveraging natural language processing (NLP) for bias detection in tender documents [22]. Furthermore, ML is being applied to predictive analytics for tasks like award prediction, supplier risk assessment, and demand forecasting [23], while NLP also facilitates the automatic classification and analysis of tender texts [24].

However, the growing power of these models is often accompanied by a critical drawback: many advanced ML systems operate as “black boxes”, with their opaque internal logic hindering their adoption in high-stakes public domains where accountability, fairness, and transparency are non-negotiable [6,25,26]. This lack of transparency erodes public trust and critically complicates the processes of auditing models for errors or correcting for systemic biases [27]. In recognition of this challenge, international bodies like the OECD have made explainability a core tenet of their principles for trustworthy AI [28].

Explainable AI (XAI) has emerged as the critical field dedicated to developing methods that make AI model outputs interpretable to humans [29]. Among the various techniques, SHapley Additive exPlanations (SHAP) has become a leading approach due to its solid theoretical foundation for fairly attributing a model’s prediction to its input features [7]. For tree-based models such as the LightGBM used in this study, the TreeSHAP algorithm offers an efficient method to compute exact and consistent explanation values, a notable advantage over local approximation methods like LIME, which can sometimes suffer from instability [30,31]. The imperative for such explainability is not merely technical; it is fundamental for public policy, regulatory compliance, and democratic accountability in an era of algorithmic governance [27,32,33]. This is further underscored by emerging regulations like the EU AI Act, which imposes stringent transparency and oversight requirements for high-risk AI systems, implicitly necessitating XAI capabilities [34,35].

2.3. Fuzzy-Linguistic Interpretation of XAI Results

Applying this XAI paradigm to EU public procurement offers a powerful new method for assessing market integration and diagnosing potential biases. Traditional analyses have long identified persistent barriers to a fully integrated market, including regulatory divergence and information asymmetries [36], but have struggled to quantify the precise influence of specific tender characteristics. XAI moves beyond correlation to attribute outcomes directly to contributing factors, providing a lens to reveal implicit barriers or systemic biases that conventional methods may miss [26,37]. For instance, if certain buyer categories or procedural choices consistently favor domestic suppliers after controlling for economic variables, XAI can highlight and quantify these nuanced associations [22]. Such insights are vital for upholding the EU’s core principles of fairness and non-discrimination, offering a robust method to scrutinize whether automated systems maintain these values and mitigate biases stemming from data or models [38,39].

While XAI provides invaluable quantitative explanations, the numerical outputs of methods like SHAP can still pose an interpretive challenge for non-technical stakeholders such as policymakers and auditors. To bridge this final gap between technical precision and actionable insight, our study introduces a novel interpretation layer based on fuzzy set theory. Public procurement, especially under the MEAT framework, is inherently a multi-attribute decision-making (MADM) or multi-criteria decision analysis (MCDA) problem, where authorities must evaluate tenders against multiple, often conflicting, criteria [2,40,41]. While formal MADM methods like AHP and TOPSIS provide structured frameworks [42,43], their outputs can also lack intuitive transparency.

Fuzzy set theory, introduced by Zadeh [44], provides a mathematical framework for handling the uncertainty and vagueness inherent in human language. By creating “linguistic variables”—variables whose values are words like “Low”, “Medium”, or “High”—it becomes possible to map complex numerical outputs to intuitive, qualitative descriptors [45]. This approach has found success in various decision-making contexts, including MADM, by translating quantitative data into human-understandable terms [46,47]. By defining fuzzy sets over the range of SHAP values, we can categorize their impact using accessible labels like “Strong Positive Influence” or “Slight Negative Influence”. The choice of simple triangular membership functions for this task is common due to their interpretability and ease of implementation [48]. This layer of fuzzy linguistic interpretation, built upon the rigorous foundation of SHAP, aims to make the complex dynamics of public procurement more accessible, aligning with the principles of smart governance where clarity for a diverse audience is paramount.

2.4. Synthesis and Contribution

The literature demonstrates that while AI and ML are increasingly applied to public procurement, a primary focus on explanation for pan-European policy insight remains a developing area. Existing research has established the value of ML for predictive tasks and has begun applying XAI in specific national contexts. For instance, studies have used XGBoost and XAI to analyze contract variations in Italy [49] or developed ML systems to predict tender complaints [50]. Other work has focused on automating tender classification using large language models [51] or conducting structural analyses of market barriers for specific firm types [52]. To situate our study within this landscape, a comparative overview is presented in Table 1.

The existing body of work provides a strong foundation, yet a research gap persists in applying a dedicated XAI framework to the comprehensive pan-European TED dataset to explain the systemic drivers of market integration, rather than focusing on national-level prediction or classification. Furthermore, a significant opportunity exists to improve the accessibility of XAI findings for the non-technical stakeholders who shape policy. This study addresses these gaps by not only using an interpretable model to explain the drivers of local versus non-local awards across the EU Single Market but also by introducing a novel fuzzy linguistic layer designed to make these complex findings transparent, actionable, and policy-relevant.

3. Materials and Methods

This section outlines data sources, processing, modeling, and interpretability techniques for analyzing drivers of local vs. non-local contract awards.

3.1. Data Source and Cohort Definition

Data were obtained from the Tenders Electronic Daily (TED) database via the EU Open Data Portal [53], using Contract Award Notices (CANs) from 2018 to 2023. Cancelled tenders (25 notices) were removed. The target variable is_local_winner was derived by comparing the winner’s country code (WIN_COUNTRY_CODE) with the contracting authority’s country code (ISO_COUNTRY_CODE): 1 = local, 0 = non-local. Tenders missing either of the country codes needed for this derivation were excluded. We acknowledge TED data limitations such as missing fields and potential reporting biases. The final dataset comprised 4,274,324 notices, with 90.29% local and 9.71% non-local winners.
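The derivation of the target variable can be sketched as follows. This is a minimal illustration only, assuming each notice is available as a dictionary keyed by the TED column names mentioned above; the helper name `derive_is_local_winner` and the sample records are hypothetical and are not taken from the study's code.

```python
from typing import Optional

def derive_is_local_winner(notice: dict) -> Optional[int]:
    """Return 1 for a local winner, 0 for a non-local winner,
    and None when either country code is missing."""
    winner = (notice.get("WIN_COUNTRY_CODE") or "").strip().upper()
    buyer = (notice.get("ISO_COUNTRY_CODE") or "").strip().upper()
    if not winner or not buyer:
        return None  # such notices are excluded from the cohort
    return int(winner == buyer)

# Hypothetical sample records (not real TED data).
sample_notices = [
    {"ISO_COUNTRY_CODE": "RO", "WIN_COUNTRY_CODE": "RO"},
    {"ISO_COUNTRY_CODE": "RO", "WIN_COUNTRY_CODE": "DE"},
    {"ISO_COUNTRY_CODE": "FR", "WIN_COUNTRY_CODE": None},
]

labels = [derive_is_local_winner(n) for n in sample_notices]
cohort = [n for n, y in zip(sample_notices, labels) if y is not None]
```

Returning None rather than a default label keeps the exclusion of incomplete notices explicit, so the class balance is computed only over notices whose origin can actually be compared.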

3.2. Feature Engineering

To prevent data leakage, information about the winning entity (e.g., WIN_NAME, WIN_COUNTRY_CODE) was removed prior to feature engineering related to the winner. We generated 27 new features based on contract value, competition levels, historical country performance, and other characteristics. These were combined with 10 original categorical variables to form an initial candidate pool of 37 features. Key features influencing award outcomes were extracted or derived, including:

Numerical Features:

AWARD_VALUE_EURO_FIN_1_log: Log-transformed final contract value in Euros.
NUMBER_OFFERS_log: Log-transformed number of offers.
historical_local_win_rate: Country-level historical rate of local wins, calculated using time-aware features from DT_DISPATCH.

Categorical Features:

ISO_COUNTRY_CODE: Buyer country code.
MAIN_ACTIVITY: Main activity sector of the contracting authority.
CAE_TYPE: Type of contracting authority.
TYPE_OF_CONTRACT: Type of contract (e.g., works, services, supplies).
no_competition: Binary indicator for tenders receiving only one offer.

Categorical features were label encoded for compatibility with LightGBM (e.g., MAIN_ACTIVITY had 108 unique values, ISO_COUNTRY_CODE had 33). Choices were guided by domain knowledge, data availability, and initial feature importance screening. After an initial model run to assess feature importance, 26 features were selected for the final tuned model. The full list of features considered for the model is given in Table A1 in Appendix C.
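To illustrate what a time-aware country-level feature such as historical_local_win_rate might look like, the sketch below computes, for each notice, the share of local wins among earlier notices from the same buyer country. The exact construction used in the study is not specified beyond its reliance on DT_DISPATCH, so the expanding-window logic, the function name, and the ISO-formatted date strings are all assumptions made for illustration.

```python
from collections import defaultdict

def add_historical_local_win_rate(notices, default=None):
    """Attach to each notice the local-win share observed so far for its
    buyer country, using only notices that precede it in dispatch order.
    ASSUMPTION: an expanding window over DT_DISPATCH; not the paper's code."""
    wins = defaultdict(int)
    totals = defaultdict(int)
    enriched = []
    # ISO date strings (YYYY-MM-DD) sort chronologically as plain text.
    for notice in sorted(notices, key=lambda n: n["DT_DISPATCH"]):
        country = notice["ISO_COUNTRY_CODE"]
        rate = wins[country] / totals[country] if totals[country] else default
        enriched.append({**notice, "historical_local_win_rate": rate})
        # Update the counters only AFTER the feature is computed, so the
        # current outcome never leaks into its own feature value.
        wins[country] += notice["is_local_winner"]
        totals[country] += 1
    return enriched

# Hypothetical records (not real TED data).
toy = [
    {"DT_DISPATCH": "2018-01-05", "ISO_COUNTRY_CODE": "RO", "is_local_winner": 1},
    {"DT_DISPATCH": "2018-02-01", "ISO_COUNTRY_CODE": "RO", "is_local_winner": 0},
    {"DT_DISPATCH": "2018-03-10", "ISO_COUNTRY_CODE": "RO", "is_local_winner": 1},
    {"DT_DISPATCH": "2018-01-20", "ISO_COUNTRY_CODE": "LU", "is_local_winner": 0},
]
rates = [n["historical_local_win_rate"] for n in add_historical_local_win_rate(toy)]
```

The first notice of each country has no history, which is why the sketch exposes a `default` argument; how such cold-start cases were handled in the study is not stated.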

3.3. Predictive Modeling

LightGBM [54] was used for binary classification (is_local_winner) due to its efficiency and performance with large datasets and categorical features. Class imbalance was addressed using scale_pos_weight = 0.11 (ratio of non-local to local instances in the training set). Data was split into 70% train (2,992,026 instances)/30% test (1,282,298 instances) using stratification and random_state = 42.
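The class-weighting arithmetic and the stratified split can be sketched without any external library. The example below is illustrative only: it uses a toy label vector that mirrors the reported 90.29%/9.71% balance, and the helper names are hypothetical. In practice a library routine such as scikit-learn's `train_test_split` with its `stratify` argument would typically be used instead of this hand-written split.

```python
import random

def stratified_split(labels, test_fraction=0.30, seed=42):
    """Split indices into train/test while preserving the class ratio."""
    rng = random.Random(seed)
    train_idx, test_idx = [], []
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        n_test = round(len(idx) * test_fraction)
        test_idx.extend(idx[:n_test])
        train_idx.extend(idx[n_test:])
    return sorted(train_idx), sorted(test_idx)

def negative_to_positive_ratio(labels):
    """Ratio of non-local (0) to local (1) instances, the quantity the
    text reports as the scale_pos_weight setting."""
    positives = sum(labels)
    return (len(labels) - positives) / positives

# Toy labels mirroring the reported class balance (not the real data).
toy_labels = [1] * 9029 + [0] * 971
train_idx, test_idx = stratified_split(toy_labels)
weight = negative_to_positive_ratio([toy_labels[i] for i in train_idx])
```

Because the positive (local) class is the majority here, the resulting ratio is below 1, which down-weights local awards during training rather than up-weighting them.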

Hyperparameter tuning was performed using grid search with 3-fold cross-validation, achieving a best cross-validation AUC score of 0.8534. The best parameters found were n_estimators = 500, learning_rate = 0.1, max_depth = 8, num_leaves = 63, colsample_bytree = 0.8, and subsample = 0.8. The model was trained with objective = ‘binary’ and metric = ‘auc’, employing early stopping with 30 rounds on a validation set during the initial feature selection phase, though the final model used the fixed 500 estimators from grid search.

3.4. Explainability Method (SHAP)

SHAP was used to interpret the LightGBM model. Grounded in cooperative game theory, SHAP provides consistent and locally accurate feature attributions. We applied a SHAP TreeExplainer to a random sample of 5000 test instances, calculating SHAP values for the positive class (‘local winner’). Our analysis of the SHAP outputs focused on three aspects:

Global Feature Importance: The mean absolute SHAP value per feature, which ranks overall predictive impact.
Feature Effects: The distribution of SHAP values, showing how feature values affect prediction magnitude and direction.
Dependence and Interactions: The marginal effect of a feature on prediction, used to reveal non-linearities and potential interactions.

These quantitative analyses provide the basis for interpreting the model’s behavior.
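As a small illustration of the first aspect, global importance can be computed from a matrix of SHAP values (one row per instance, one column per feature) by averaging absolute values column-wise. The numbers below are invented for the example and do not come from the fitted model; the function name is likewise hypothetical.

```python
def global_importance(shap_matrix, feature_names):
    """Rank features by mean absolute SHAP value, largest first."""
    n_rows = len(shap_matrix)
    mean_abs = {
        name: sum(abs(row[j]) for row in shap_matrix) / n_rows
        for j, name in enumerate(feature_names)
    }
    return sorted(mean_abs.items(), key=lambda item: item[1], reverse=True)

feature_names = ["historical_local_win_rate", "AWARD_VALUE_EURO_FIN_1_log", "no_competition"]
# Hypothetical SHAP values for three instances (rows) and three features (columns).
toy_shap = [
    [0.6, -0.4, 0.1],
    [-0.5, 0.2, 0.1],
    [0.4, -0.3, -0.2],
]
ranking = global_importance(toy_shap, feature_names)
```

Taking the absolute value before averaging matters: a feature that pushes some predictions strongly up and others strongly down would otherwise appear unimportant because its signed contributions cancel.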

3.5. Fuzzy Linguistic Interpretation of SHAP Values

To enhance the interpretability of the quantitative SHAP values, we introduce a fuzzy linguistic interpretation layer. This involves defining linguistic variables and their corresponding fuzzy sets based on a data-driven heuristic analysis of the observed distribution of SHAP values from our model. This approach ensures that the linguistic categories are directly relevant to the scale of impacts produced by the model. A detailed justification for the threshold selection and a discussion of its robustness are provided in Appendix A, and details of fuzzy parameter selection for linguistic interpretation can be found in Appendix B.

The impact of an individual feature’s SHAP value (denoted $s_{j}$) on the prediction is categorized using a linguistic variable “Influence Strength”, with five fuzzy sets: “Very Low”, “Low”, “Moderate”, “High”, and “Very High”. These are defined over the range of absolute SHAP values $| s_{j} |$ using triangular membership functions

$\mu(x; a, b, c) = \max\left(0, \min\left(\frac{x - a}{b - a}, \frac{c - x}{c - b}\right)\right)$

with the following empirically determined parameters:

$P_{1} = 0.0, P_{2} = 0.1, P_{3} = 0.5, P_{4} = 1.0, P_{5} = 2.0, P_{6} = 3.0, P_{7} = 4.5$ .
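The triangular membership function and the breakpoints above can be turned into a short sketch. How the seven breakpoints map onto the five fuzzy sets is detailed in the appendices rather than here, so the code assumes, purely for illustration, that set k uses the consecutive triple $(P_{k}, P_{k+1}, P_{k+2})$; the names `INFLUENCE_SETS` and `memberships` are hypothetical.

```python
def triangular(x, a, b, c):
    """Triangular membership: 0 outside (a, c), rising linearly to 1 at b."""
    return max(0.0, min((x - a) / (b - a), (c - x) / (c - b)))

P = [0.0, 0.1, 0.5, 1.0, 2.0, 3.0, 4.5]
TERMS = ["Very Low", "Low", "Moderate", "High", "Very High"]

# ASSUMPTION: term k spans the consecutive breakpoints (P[k], P[k+1], P[k+2]).
INFLUENCE_SETS = {term: tuple(P[k:k + 3]) for k, term in enumerate(TERMS)}

def memberships(abs_shap):
    """Degree of membership of |s_j| in each 'Influence Strength' set."""
    return {term: triangular(abs_shap, *abc) for term, abc in INFLUENCE_SETS.items()}

example = memberships(0.8)  # a hypothetical |SHAP| value
```

Under this assumed layout neighbouring sets overlap, so a value such as 0.8 belongs partly to “Low” and partly to “Moderate”, and the memberships of adjacent sets sum to one between their peaks.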

The direction of influence is captured by the sign of the SHAP value $s_{j}$:

Positive SHAP value ( $s_{j} > ϵ$ , for a small threshold $ϵ$ ): Indicates “Positive Influence” (increased likelihood of a local winner).
Negative SHAP value ( $s_{j} < - ϵ$ ): Indicates “Negative Influence” (decreased likelihood of a local winner).
Near-zero SHAP value ( $| s_{j} | \leq ϵ$ ): Indicates “Neutral Influence”.

The global feature importance, calculated as the mean absolute SHAP value ($\bar{s}_{k}$), is categorized using the linguistic variable “Overall Impact”. Its terms (“Negligible”, “Low”, “Medium”, “High”, “Very High”) are defined with the following empirically determined parameters:

$Q_{1} = 0.0, Q_{2} = 0.02, Q_{3} = 0.07, Q_{4} = 0.13, Q_{5} = 0.20, Q_{6} = 0.35, Q_{7} = 0.60$ .

For individual predictions, the linguistic term for magnitude with the highest membership value is combined with its direction (e.g., “Moderate Positive Influence”). For global importance, features are categorized by their “Overall Impact” level. This provides a qualitative summary that is easier to communicate, with the choice of triangular membership functions being common due to their simplicity and interpretability [48].
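The labelling step described in this subsection can be sketched end to end as follows. Several choices in the code are assumptions rather than the study's documented settings: the value of the threshold ϵ (0.01 here), the assignment of consecutive breakpoint triples to terms, and the treatment of values at or beyond the last peak as belonging to the top term. Because the actual parameterization is given in the appendices, this assumed layout need not reproduce every label reported in the paper.

```python
def triangular(x, a, b, c):
    """Triangular membership function mu(x; a, b, c)."""
    return max(0.0, min((x - a) / (b - a), (c - x) / (c - b)))

STRENGTH_TERMS = ["Very Low", "Low", "Moderate", "High", "Very High"]
IMPACT_TERMS = ["Negligible", "Low", "Medium", "High", "Very High"]
P = [0.0, 0.1, 0.5, 1.0, 2.0, 3.0, 4.5]         # "Influence Strength" breakpoints
Q = [0.0, 0.02, 0.07, 0.13, 0.20, 0.35, 0.60]   # "Overall Impact" breakpoints
EPSILON = 0.01  # ASSUMPTION: the text only calls this "a small threshold"

def best_term(x, breakpoints, terms):
    """Pick the term with the highest membership value.
    ASSUMPTIONS: term k uses breakpoints[k:k+3]; values at or beyond the
    last peak are assigned to the top term."""
    if x >= breakpoints[-2]:
        return terms[-1]
    scores = {t: triangular(x, *breakpoints[k:k + 3]) for k, t in enumerate(terms)}
    return max(scores, key=scores.get)

def describe_influence(s_j):
    """Combine magnitude and direction for one SHAP value."""
    if abs(s_j) <= EPSILON:
        return "Neutral Influence"
    direction = "Positive" if s_j > 0 else "Negative"
    return f"{best_term(abs(s_j), P, STRENGTH_TERMS)} {direction} Influence"

def describe_overall_impact(mean_abs_shap):
    """Label a feature's mean absolute SHAP value."""
    return f"{best_term(mean_abs_shap, Q, IMPACT_TERMS)} Overall Impact"

labels = [describe_influence(s) for s in (0.8, -2.4, 0.005)]
```

Selecting the single term with the highest membership discards the partial memberships, which keeps the output short at the cost of hiding how close a value sits to a neighbouring category.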

4. Results

This section details empirical findings: model performance and SHAP insights into outcome drivers.

4.1. Model Performance Comparison

To evaluate the effectiveness of our chosen modeling approach, we benchmarked the LightGBM model against two standard baseline classifiers: a logistic regression and a random forest. All models were trained and tested on the identical feature set and data splits to ensure a fair comparison. The performance metrics, summarized in Table 2, demonstrate that the LightGBM model achieves superior predictive power across all key metrics. This superior performance is a crucial prerequisite for building a reliable explanatory model, as a model that better captures the underlying patterns in the data can provide more trustworthy insights.

4.2. Predictive Model Performance

The LightGBM model effectively distinguished local from non-local awards, achieving an AUC of 0.8552 on the test set, indicating good class separability.
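For readers less familiar with the metric, AUC can be read as the probability that a randomly chosen local award receives a higher predicted score than a randomly chosen non-local award. The sketch below computes it by direct pairwise comparison on invented scores; this quadratic-time formulation is chosen for clarity and is not how the metric would be computed on millions of notices.

```python
def roc_auc(y_true, scores):
    """AUC as the share of (positive, negative) pairs ranked correctly,
    counting ties as half a correct ranking."""
    positives = [s for y, s in zip(y_true, scores) if y == 1]
    negatives = [s for y, s in zip(y_true, scores) if y == 0]
    correct = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in positives
        for n in negatives
    )
    return correct / (len(positives) * len(negatives))

# Hypothetical labels and predicted probabilities of a local winner.
toy_labels = [1, 1, 1, 0, 0]
toy_scores = [0.9, 0.8, 0.3, 0.4, 0.1]
auc = roc_auc(toy_labels, toy_scores)
```

Because AUC depends only on the ranking of scores, it is unaffected by the decision threshold discussed next and by any monotone shift in predicted probabilities caused by class weighting.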

Classification performance was examined at the default 0.5 threshold and an optimal threshold. The optimal threshold to maximize accuracy was found to be 0.1387 (achieving an accuracy of 0.9129), while the optimal threshold to maximize the F1-score for the positive class was 0.1288 (achieving an F1-score of 0.9534). Figure 1 illustrates the performance of different metrics across various thresholds. For subsequent reporting of optimal performance, we use the threshold of 0.1387.
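The threshold search can be illustrated with a simple sweep over candidate cut-offs, keeping the one that maximizes the chosen metric. The data, the grid resolution, and the function names below are hypothetical; the sketch only shows the mechanics, not the study's actual procedure.

```python
def accuracy(y_true, y_pred):
    return sum(int(t == p) for t, p in zip(y_true, y_pred)) / len(y_true)

def f1_positive(y_true, y_pred):
    """F1-score for the positive (local) class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denominator = 2 * tp + fp + fn
    return 2 * tp / denominator if denominator else 0.0

def best_threshold(y_true, probabilities, metric, grid):
    """Return the first threshold in the grid that maximizes the metric."""
    def score(threshold):
        predictions = [int(p >= threshold) for p in probabilities]
        return metric(y_true, predictions)
    return max(grid, key=score)

# Hypothetical imbalanced sample: eight local awards, two non-local.
toy_labels = [1, 1, 1, 1, 1, 1, 1, 1, 0, 0]
toy_probs = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.15, 0.12, 0.05]
grid = [i / 100 for i in range(1, 100)]

best_for_accuracy = best_threshold(toy_labels, toy_probs, accuracy, grid)
best_for_f1 = best_threshold(toy_labels, toy_probs, f1_positive, grid)
default_accuracy = accuracy(toy_labels, [int(p >= 0.5) for p in toy_probs])
```

In this toy sample, as in the reported results, the best cut-off lies well below 0.5 because many genuine local awards receive modest predicted probabilities.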

Figure 2 shows the confusion matrices. The default threshold (Figure 2a) results in an accuracy of 0.7520. The optimal threshold of 0.1387 (Figure 2b) significantly improves overall accuracy to 0.9129 and the F1-score for the local class to 0.9534 by adjusting the decision boundary, which is particularly relevant given the class imbalance.

Table 3 details precision, recall, and F1-scores. The optimal threshold (0.1387) substantially improves recall for the minority non-local class (Class 0) compared to what it would be at default, while maintaining high performance for the majority local class (Class 1). The choice of threshold depends on specific application priorities (e.g., prioritizing detection of non-local winners vs. overall accuracy).
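Per-class precision, recall, and F1 follow directly from the four cells of a confusion matrix. The sketch below shows the arithmetic for both classes; the counts are invented for illustration and are not those of Figure 2 or Table 3.

```python
def per_class_report(tn, fp, fn, tp):
    """Precision, recall and F1 for class 1 (local) and class 0 (non-local)."""
    def prf(hits, false_alarms, misses):
        precision = hits / (hits + false_alarms) if hits + false_alarms else 0.0
        recall = hits / (hits + misses) if hits + misses else 0.0
        total = precision + recall
        f1 = 2 * precision * recall / total if total else 0.0
        return {"precision": precision, "recall": recall, "f1": f1}

    return {
        1: prf(tp, fp, fn),  # local: false alarms are FP, misses are FN
        0: prf(tn, fn, fp),  # non-local: the roles of FP and FN are swapped
    }

# Hypothetical confusion-matrix counts.
report = per_class_report(tn=40, fp=60, fn=30, tp=870)
```

Reporting both classes separately matters under imbalance, since a high overall accuracy can coexist with weak recall on the minority class.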

4.3. Explainability Insights (SHAP Results)

SHAP values were calculated from the LightGBM model using a sample of 5000 test instances. These quantitative insights are further clarified using the fuzzy linguistic framework defined in Section 3.5.

Global Feature Importance with Fuzzy Linguistic Labels: Figure 3 shows the mean absolute SHAP value for each feature, indicating overall predictive power. Applying our fuzzy linguistic variable “Overall Impact”, historical_local_win_rate (mean abs SHAP ≈ 0.53) and AWARD_VALUE_EURO_FIN_1_log (mean abs SHAP ≈ 0.31) demonstrate a “Very High Overall Impact”. Features like ISO_COUNTRY_CODE (≈0.18), MAIN_ACTIVITY (≈0.17), and CAE_TYPE (≈0.14) exhibit a “High Overall Impact”. Others such as no_competition (≈0.12), NUMBER_OFFERS_log (≈0.11), and TYPE_OF_CONTRACT (≈0.10) show “Medium Overall Impact”. The remaining features display “Low” to “Negligible Overall Impact”. A summary is provided in Table 4.

Feature Effects and Interactions (Beeswarm and Dependence Plots): The SHAP summary plot (Figure 4) details SHAP value distributions. The SHAP dependence plots in Figure 5 provide more granular insights into feature effects, which are interpreted here using our fuzzy linguistic framework:

historical_local_win_rate (Figure 5a): This top feature exhibits a very strong, almost linear positive relationship. Low historical local win rates (e.g., below 0.7) exert a “Strong to Very High Negative Influence” (SHAP values often <−2.0) on predicting a current local win. Conversely, high historical rates (e.g., above 0.9) show a “Strong to Very High Positive Influence” (SHAP values often >+2.0). This suggests powerful path dependency. The interaction with no_competition (color) shows that for any given historical rate, the presence of no competition (red dots) tends to slightly increase the SHAP value (more positive).
AWARD_VALUE_EURO_FIN_1_log (Figure 5b): As log-transformed award value increases, especially beyond a log value of ≈7–8, there is an increasingly “Strong to Very High Negative Influence” on predicting a local winner (SHAP values become more negative). For very high values (log > 12–15), this negative influence is substantial. Lower values show more dispersed, “Neutral to Slight Influence”. The interaction with ISO_COUNTRY_CODE (color) indicates that the general trend holds across countries, but the exact magnitude of SHAP values can vary.
ISO_COUNTRY_CODE (Figure 5c): This plot shows the SHAP values for different (label-encoded) buyer countries. There is considerable variation. Some countries (e.g., with codes around 25–30 in the plot) tend to have negative SHAP values, indicating a “Slight to Moderate Negative Influence” on local awards (i.e., more openness to non-local). Other countries show positive SHAP values, suggesting a “Slight to Moderate Positive Influence” towards local awards. The interaction with no_competition (color) shows that single-bid situations (red dots) generally push SHAP values higher (more positive/less negative) for most countries.
CAE_TYPE (Figure 5d): Contracting Authority Type shows varied influence. For instance, CAE_TYPE ‘5’ (European institutions) consistently shows “Very High Negative Influence” (large negative SHAP values). Other types like ‘1’ (National) or ‘3’ (Regional) often have “Slight to Moderate Positive Influence”. CAE_TYPE ‘4’ (Public Law Bodies) shows high variance. Interaction with NUMBER_OFFERS_log (color) indicates that higher competition (blue dots) tends to lower SHAP values (more negative/less positive) across most CAE types.
NUMBER_OFFERS_log: Based on the summary plot (Figure 4), very low numbers of offers (log value 0, i.e., single offer, related to no_competition) tend to have a “Moderate Positive Influence”. As the number of offers increases (log value > 1–1.5), the influence generally becomes negative (“Slight to Moderate Negative Influence”), suggesting that more competition favors non-local winners.
no_competition (Figure 5e): The presence of no competition (value 1) typically results in a “Moderate Positive Influence” on predicting a local winner (SHAP values mostly between 0 and +1.0). When there is competition (value 0), SHAP values are generally negative. Interaction with ISO_COUNTRY_CODE shows that this effect is fairly consistent across countries.
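To relate the log-scale thresholds quoted for AWARD_VALUE_EURO_FIN_1_log back to monetary amounts, the transform can be inverted. The base of the logarithm is not stated in the text; the sketch assumes a natural logarithm with no offset, which is consistent with a log value of 0 corresponding to a single offer, but this remains an assumption.

```python
import math

def value_from_log(log_value):
    """Invert the log transform.
    ASSUMPTION: natural logarithm without an offset, i.e. x = exp(log_value)."""
    return math.exp(log_value)

# Log-scale values mentioned in the discussion of Figure 5b.
log_to_euro = {v: round(value_from_log(v)) for v in (7, 8, 12, 15)}
single_offer = value_from_log(0)  # log value 0 maps back to one offer
```

Under this assumption, the point where the negative influence begins (log value of about 7 to 8) corresponds to contracts of roughly EUR 1,100 to 3,000, and the “very high” range (log value of 12 to 15) to roughly EUR 163,000 to 3.3 million.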

5. Discussion

This section discusses findings regarding market integration, potential biases, smart governance, limitations, and future research.

5.1. Interpreting Model Findings in the Context of Market Integration

RQ1 sought the key drivers of local vs. non-local awards. SHAP identified historical_local_win_rate and log-contract value (AWARD_VALUE_EURO_FIN_1_log) as paramount (Figure 3), both exhibiting a “Very High Overall Impact”. Higher contract values strongly decrease local award probability (Figure 5b), translating to a “Strong Negative Influence” on local awards, especially for very large contracts. This aligns with economic intuition: scale fosters integration by attracting international bidders and justifying the transaction costs of cross-border participation [55]. The historical_local_win_rate demonstrates a powerful path-dependency effect (Figure 5a), where past national tendencies strongly predict current outcomes.

Regarding competition, the data reveals nuanced patterns. The binary feature no_competition (single bids) shows a clear “Moderate Positive Influence” favoring local winners (Figure 5e). As competition increases (reflected partly by NUMBER_OFFERS_log, which has a “Medium Overall Impact”), the tendency generally shifts towards favoring non-local winners, as suggested by the SHAP summary plot (Figure 4).

Addressing RQ2, significant heterogeneity exists. The substantial importance of features like ISO_COUNTRY_CODE (Figure 5c), MAIN_ACTIVITY, and CAE_TYPE (Figure 5d) indicates uneven market openness. SHAP dependence plots reveal that these categorical features have distinct influence patterns. For instance, certain contracting authority types (e.g., ‘5’—European institutions) show a “Very High Negative Influence” toward local awards, while others (e.g., ‘1’—National authorities) often exert a “Slight to Moderate Positive Influence” favoring local winners. Similarly, different countries (ISO_COUNTRY_CODE) exhibit varying baseline tendencies.

The fuzzy linguistic layer aids in communicating these strengths of influence. For example, stating that contract value has a “Very High Overall Impact” and typically exerts a “Strong Negative Influence” on local awards for large contracts is more immediately understandable for policymakers than citing raw SHAP values alone.

5.2. Exploring Potential Biases and Fairness Implications

Observed heterogeneity, especially from features like CAE_TYPE or B_GPA after controlling for economic factors, raises questions about systemic biases (RQ2). The fuzzy interpretation helps to qualitatively grade the strength of these tendencies. If a buyer type consistently shows a “Moderate or High Positive Influence” towards local winners, it might suggest practices disadvantaging non-local bidders, conflicting with Single Market principles. Caution is crucial: SHAP reveals the drivers of model predictions based on correlations, not causality. A local award tendency might stem from legitimate unmodeled factors (e.g., specialized local needs) rather than overt bias. The model could also perpetuate historical data biases [39]. SHAP, with its fuzzy linguistic enhancement, offers diagnostics for fairness investigations but does not definitively prove bias without further analysis.

5.3. Implications for Smart Governance and Policy

This interpretable framework has significant practical implications for smart governance and evidence-based policy (RQ3).

First, it offers a powerful monitoring tool for procurement oversight. Authorities can compare model predictions and SHAP values for new tenders against actual outcomes to flag anomalous awards for scrutiny. In particular, tenders with high contract values (AWARD_VALUE_EURO_FIN_1_log) and numerous offers (NUMBER_OFFERS_log) that nevertheless go to local winners represent statistical outliers that might warrant closer examination, as these characteristics typically exert a “Strong Negative Influence” on local award probability.

Second, global feature importance and dependence plots offer actionable insights for policy interventions. Identifying specific buyer types (CAE_TYPE) or sectors (MAIN_ACTIVITY) with consistently low international openness allows for targeted capacity-building, training programs, or enhanced transparency requirements. For instance, contracting authority types showing a “Moderate to High Positive Influence” toward local winners might benefit from specialized support for managing international procurement processes. Understanding the impact of award criteria types (CRIT_CODE) can prompt reviews of whether certain evaluation approaches inadvertently disadvantage non-local bidders.

Third, the model quantifies the impact of existing regulations and procurement framework characteristics. For example, the influence of government procurement agreement coverage (B_GPA) can be described with terms like “Moderate Negative Influence” on local awards in certain contexts, providing accessible feedback on the effectiveness of international procurement agreements. This helps policymakers assess whether such frameworks are achieving their intended goals of market openness.

Fourth, the findings suggest concrete governance applications:

Targeted oversight: Monitoring efforts should focus on high-value contracts with many offers that unexpectedly go to local winners, as these represent statistical anomalies based on our model.
Benchmarking contracting authorities: The significant CAE_TYPE effect suggests creating a data-driven ranking of authority types by their tendency toward local awards, controlling for tender characteristics. This could help identify best practices from the most internationally open buyer categories.
Procedural reforms: The substantial influence of categorical features like award criteria (CRIT_CODE) suggests reviewing how qualification requirements and evaluation approaches might inadvertently advantage local firms.

Fifth, the fuzzy linguistic interpretation layer makes these insights more readily communicable to diverse stakeholders, fostering a shared understanding essential for collaborative governance. By translating complex SHAP values into intuitive terms like “Strong Positive Influence” or “Moderate Negative Influence”, the findings become accessible to non-technical policymakers, procurement professionals, and oversight bodies. This common interpretative framework supports more inclusive policy discussions around market integration and procurement fairness.

In summary, this XAI approach transforms raw procurement data into actionable intelligence for adaptive oversight and policy improvement. It moves beyond simple descriptive statistics to offer deeper, more nuanced insights into the complex factors driving cross-border procurement outcomes, while ensuring that these insights remain understandable to all stakeholders through the fuzzy linguistic framework.

5.4. Limitations

This study has several limitations. Data: The analysis relies on awarded CANs in TED and excludes unsuccessful or cancelled tenders, which potentially biases the sample. TED data contain inconsistencies and missing values; imputation addresses missingness only statistically. The lack of detailed bidder information (e.g., firm size, capabilities) limits control for firm-level factors. Model: LightGBM learns from the data, including any biases they contain, and its predictions are correlational, not causal. SHAP rests on its own assumptions and explains the model’s behavior, which is itself only an approximation of reality [29]. Scope: The analysis covers 2018–2023 and treats the EU market homogeneously (although ISO_COUNTRY_CODE captures some country effects). Deeper regional analysis, or a focus beyond winner nationality (e.g., on SMEs), could offer further insights.

Fuzzy threshold selection: The fuzzy linguistic framework relies on thresholds selected via a data-driven heuristic approach, as detailed in Appendix A. While this method ensures that the thresholds are relevant to our model’s output, their exact placement is a methodological choice. The qualitative conclusions of this study, however, are robust to minor variations in these breakpoints, as they are based on the large, underlying differences in SHAP values between high- and low-impact features. Future research could explore more formalized methods for threshold selection, such as statistical clustering or expert elicitation, to further refine the linguistic mappings.

5.5. Methodological Contribution

The comparative experiments (Section 4.2) serve to validate the choice of LightGBM as a robust foundation for our explanatory framework. Focusing on the threshold-independent AUC metric, which provides the fairest comparison of inherent predictive power, our model (0.855) demonstrates a significant advantage in discriminative capability over both logistic regression (0.717) and random forest (0.810). This superior predictive performance is a critical prerequisite; a model that cannot reliably capture the underlying patterns in the data cannot be trusted to provide meaningful explanations.

With the model’s high performance established, our primary methodological contributions unfold. Our approach moves beyond simple correlation to the attribution of effects, using SHAP to quantify the “local advantage” or disadvantage associated with specific procurement practices. We also address the practical challenge of class imbalance typical in procurement data through the optimization of the classification threshold, which allows for achieving high accuracy (0.9129) while understanding trade-offs in class detection (Table 3). Perhaps most significantly, we introduce a novel fuzzy linguistic interpretation layer over the SHAP analysis. By translating numerical SHAP values into qualitative descriptors like “Strong Positive Influence”, we bridge the gap between technical precision and human understanding, a combination particularly valuable in governance applications.
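To make the threshold optimization concrete, the sketch below scans a grid of candidate thresholds and keeps the one with the highest accuracy. It is a simplified illustration under stated assumptions: the helper names (`accuracy_at`, `best_threshold`), the grid resolution, and the toy labels and probabilities are all hypothetical, and the paper's actual search procedure may differ.

```python
def accuracy_at(threshold, y_true, p_local):
    """Accuracy when 'local' (1) is predicted for probabilities >= threshold."""
    correct = sum((p >= threshold) == bool(y) for y, p in zip(y_true, p_local))
    return correct / len(y_true)


def best_threshold(y_true, p_local, steps=1000):
    """Grid-search the threshold in [0, 1] that maximizes accuracy.

    On ties, the lowest candidate threshold is returned.
    """
    candidates = [i / steps for i in range(steps + 1)]
    return max(candidates, key=lambda t: accuracy_at(t, y_true, p_local))


# Toy imbalanced sample (1 = local winner); values are illustrative only.
y_true = [1, 1, 1, 1, 0, 0]
p_local = [0.9, 0.6, 0.3, 0.25, 0.1, 0.2]

t_star = best_threshold(y_true, p_local, steps=100)
print(t_star, accuracy_at(t_star, y_true, p_local), accuracy_at(0.5, y_true, p_local))
```

In this toy sample the default threshold of 0.5 misclassifies the two local awards with low predicted probability, while a lower threshold recovers them. The same scan can be repeated with F1-score in place of accuracy to examine the trade-off between the two classes.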

This methodological approach offers a template for analyzing other complex policy domains where administrative data contains mixed features, class imbalance challenges, and the need for both technical precision and interpretable outputs for non-technical audiences.

5.6. Future Research Directions

Future research could enrich datasets with bidder-specific information (e.g., from Orbis) and finer geographical data (NUTS levels). NLP on tender text could extract nuanced features from technical specifications and award criteria descriptions that might influence cross-border participation. Longitudinal analysis could track evolving patterns over time, especially in response to policy changes.

Several promising research extensions emerge from our findings:

Counterfactual analysis: The model could be used to simulate “what if” scenarios—for example, estimating cross-border procurement rates if all tenders followed the practices of the most internationally open buyer types or if certain policy changes were implemented.
Pre/post policy evaluation: This XAI framework could evaluate the effectiveness of EU procurement directives and initiatives by analyzing changes in feature importance and SHAP values over time, particularly before and after major policy implementations.
Tender text analysis: Combining our approach with natural language processing could reveal whether specific language patterns or terminology in tender documents correlates with local awards, potentially identifying subtle linguistic barriers to cross-border participation.
Network effects: Extending the analysis to include network relationships between contracting authorities and suppliers could reveal patterns of repeated interactions and their influence on cross-border procurement.

Methodologically, future work should move toward causal inference (e.g., propensity score matching, regression discontinuity designs) to isolate the impacts of specific policies or procurement characteristics. This would provide stronger evidence for policy interventions by distinguishing causation from correlation. Additionally, refining the fuzzy linguistic interpretation framework through stakeholder feedback could further enhance the usability of XAI outputs for diverse users in the procurement ecosystem.

6. Conclusions

This study aimed to identify factors influencing local versus non-local public contract awards in the EU, an indicator of Single Market integration, using interpretable machine learning (LightGBM with SHAP) on 2018–2023 TED data.

The model achieved robust predictive performance (AUC = 0.8552; accuracy = 0.9129 and F1-score for local class = 0.9534 at an optimal threshold of 0.1387). SHAP analysis revealed that historical_local_win_rate and higher contract values (AWARD_VALUE_EURO_FIN_1_log) are the most influential predictors, both categorized linguistically as having a “Very High Overall Impact”. Larger contracts significantly increase non-local winner likelihood, typically exerting a “Strong Negative Influence” on local awards.

Competition levels also play a key role, with single-bid procedures (no_competition) showing a “Moderate Positive Influence” toward local winners, while increased competition generally favors non-local suppliers. Structural factors like buyer country (ISO_COUNTRY_CODE), contracting authority type (CAE_TYPE), and main activity (MAIN_ACTIVITY) exert substantial influence (“High Overall Impact”) despite controlling for economic variables, indicating systemic differences in market openness.

The key contribution is demonstrating XAI’s value in public procurement governance, further enhanced by a fuzzy linguistic interpretation layer that translates complex quantitative outputs into accessible, human-understandable terms. This combined approach provides transparent, interpretable evidence to inform policy, assess market integration nuance, identify potential barriers or fairness concerns, and support data-driven monitoring.

As the EU pursues a more integrated, fair, and efficient Single Market, this interpretable AI approach, through its fuzzy linguistic framework, offers a potent path to enhanced transparency and accountability in public spending. It modernizes procurement oversight by moving beyond simple descriptive statistics to deeper, more nuanced understanding, while ensuring that these insights remain accessible to all stakeholders.

Author Contributions

Conceptualization, C.C.-G. and A.-Ș.B.; methodology, C.C.-G. and A.-Ș.B.; software, A.-Ș.B.; formal analysis, A.-Ș.B.; investigation, C.C.-G. and A.-Ș.B.; resources, C.C.-G. and A.-Ș.B.; data curation, A.-Ș.B.; writing—original draft preparation, C.C.-G. and A.-Ș.B.; writing—review and editing, C.C.-G. and A.-Ș.B.; visualization, A.-Ș.B.; supervision, C.C.-G.; project administration, C.C.-G. and A.-Ș.B. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The Tenders Electronic Daily (TED) dataset used in this study is publicly available from the EU Open Data Portal, https://data.europa.eu/data/datasets/ted-csv-archives (accessed on 28 March 2025). The code used for data processing, model training, and analysis is available from the corresponding author upon reasonable request.

Acknowledgments

The authors acknowledge the European Commission for providing access to the Tenders Electronic Daily (TED) CSV dataset used in this study.

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:

AI	Artificial Intelligence
AUC	Area Under the ROC Curve
CAN	Contract Award Notice
ECA	European Court of Auditors
EU	European Union
GDP	Gross Domestic Product
LightGBM	Light Gradient Boosting Machine
MADM	Multi-Attribute Decision Making
MCDA	Multi-Criteria Decision Analysis
MEAT	Most Economically Advantageous Tender
ML	Machine Learning
NLP	Natural Language Processing
OECD	Organisation for Economic Co-operation and Development
ROC	Receiver Operating Characteristic
RQ	Research Question
SHAP	SHapley Additive exPlanations
SME	Small and Medium-sized Enterprise
TED	Tenders Electronic Daily
XAI	Explainable Artificial Intelligence

Appendix A. Justification and Robustness of Fuzzy Thresholds

Appendix A.1. Rationale for Threshold Selection

The thresholds were established using a data-driven heuristic approach. This involved a qualitative analysis of the empirical distribution of SHAP values generated by the model, wherein we partitioned the observed value ranges into five distinct categories. The knot points were selected to align with discernible clusters and gaps in the data, ensuring that the linguistic categories reflect the natural grouping of feature impacts.

Influence Strength Thresholds (P1–P7): The absolute SHAP values for individual predictions (Figure 4 and Figure 5) exhibit a wide range. Many features have a subtle influence (e.g., $|s_{j}| < 0.5$), while key features can produce highly dominant effects ($|s_{j}| > 2.0$). The chosen P-parameters create bins that distinguish between these qualitatively different levels of impact.
Overall Impact Thresholds (Q1–Q7): The mean absolute SHAP values for global feature importance (Figure 3) operate on a much smaller scale (from near-zero to about 0.53). The Q-parameters were selected specifically to partition this distribution, allowing for a clear distinction between features with “Negligible”, “Low”, “Medium”, “High”, and “Very High” overall impact. For instance, the threshold for “Very High” impact ($Q_{5} = 0.20$) effectively isolates the top-tier features from the rest.

This empirical approach ensures that the linguistic framework is directly tied to the context of our model, rather than imposing arbitrary, predefined scales.

Appendix A.2. Sensitivity and Robustness of Linguistic Labels

An important consideration for this method is the sensitivity of the linguistic labels to the specific choice of thresholds. The purpose of the fuzzy layer is to provide an intuitive, qualitative grouping rather than a new form of precise measurement. The primary findings of this study are, therefore, dependent on the clear hierarchy of feature importance, which is a direct, quantitative output of the SHAP analysis.

For instance, the mean absolute SHAP values for historical_local_win_rate (≈0.53) and AWARD_VALUE_EURO_FIN_1_log (≈0.31) are substantially larger than those for features in the middle or lower tiers. Consequently, these features would be categorized as having “Very High” or “High” impact under any reasonable partitioning of the data. A minor adjustment to a breakpoint might re-classify a feature on the cusp of two categories (e.g., from “Medium” to “Low”), but it would not alter the fundamental conclusion that the top features are orders of magnitude more influential than the bottom ones.

Therefore, the overall narrative derived from the linguistic framework—identifying which features are dominant and the general nature of their influence—is robust and not contingent on the exact placement of the thresholds.
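The robustness argument can be checked numerically with a small sweep over the breakpoints. The sketch below is an illustration under assumptions, not the authors' procedure: it rescales all Q knot points from Appendix B.2 by a common factor, assigns the term with the highest triangular membership, and adds an assumed "shoulder" rule so that values at or beyond the outermost peaks still receive a label. The function names and the choice of scale factors are hypothetical.

```python
# Knot points Q1..Q7 and term names as listed in Appendix B.2.
Q = [0.0, 0.02, 0.07, 0.13, 0.20, 0.35, 0.60]
TERMS = ["Negligible", "Low", "Medium", "High", "Very High"]


def triangle(x, a, b, c):
    """Triangular membership: 0 outside (a, c), 1 at the peak b."""
    return max(0.0, min((x - a) / (b - a), (c - x) / (c - b)))


def overall_impact(mean_abs_shap, knots):
    """Label a mean |SHAP| value with its highest-membership term."""
    # Assumed shoulder rule: values at or beyond the outermost peaks
    # take the outermost term instead of falling to zero membership.
    if mean_abs_shap <= knots[1]:
        return TERMS[0]
    if mean_abs_shap >= knots[-2]:
        return TERMS[-1]
    degrees = [triangle(mean_abs_shap, *knots[i:i + 3]) for i in range(len(TERMS))]
    return TERMS[degrees.index(max(degrees))]


def labels_under_scaling(values, scales):
    """Recompute labels after multiplying every knot point by each scale."""
    result = {}
    for s in scales:
        knots = [q * s for q in Q]
        result[s] = {name: overall_impact(v, knots) for name, v in values.items()}
    return result


# Approximate mean |SHAP| values quoted in the text above.
TOP_FEATURES = {
    "historical_local_win_rate": 0.53,
    "AWARD_VALUE_EURO_FIN_1_log": 0.31,
}

for scale, labels in labels_under_scaling(TOP_FEATURES, [0.9, 1.0, 1.1]).items():
    print(scale, labels)
```

Under these assumptions both features keep the “Very High” label when the breakpoints shift by ±10%; a larger upward shift of 20% moves the contract-value feature to “High”, which is still within the range described above.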

Appendix B. Details of Fuzzy Parameter Selection for Linguistic Interpretation

The parameters $P_{1}, \dots, P_{7}$ for the linguistic variable “Influence Strength” (based on individual absolute SHAP values $|s_{j}|$) and $Q_{1}, \dots, Q_{7}$ for “Overall Impact” (based on mean absolute SHAP values ${\bar{s}}_{k}$) were determined by examining the empirical distribution of these SHAP values derived from the model. Specifically, individual SHAP value distributions were observed from SHAP summary and dependence plots (e.g., Figure 4 and Figure 5), while mean absolute SHAP values were taken from the global feature importance plot (Figure 3).

The knot points for the triangular membership functions were chosen to create five distinct linguistic categories for each variable, ensuring that the ranges covered by these categories are meaningful in the context of the observed data distributions. This data-driven approach aims to provide a balanced spread across the linguistic terms and to align the fuzzy categorization with the qualitative interpretations of feature influences and importances discussed in Section 4 (Table 4).

The chosen parameter values are as follows:

Appendix B.1. Parameters for “Influence Strength” ( $| s_{j} |$ )

The knot points $P_{1}, \dots, P_{7}$ define the triangular membership functions $\mu(x; a, b, c) = \max\left(0, \min\left(\frac{x - a}{b - a}, \frac{c - x}{c - b}\right)\right)$ for the linguistic terms describing the magnitude of an individual feature’s SHAP value:

$P_{1} = 0.0$.
$P_{2} = 0.1$.
$P_{3} = 0.5$.
$P_{4} = 1.0$.
$P_{5} = 2.0$.
$P_{6} = 3.0$.
$P_{7} = 4.5$.

The membership functions are:

$\mu_{\text{VeryLow}}(|s_{j}|)$: Triangle($P_{1}, P_{2}, P_{3}$).
$\mu_{\text{Low}}(|s_{j}|)$: Triangle($P_{2}, P_{3}, P_{4}$).
$\mu_{\text{Moderate}}(|s_{j}|)$: Triangle($P_{3}, P_{4}, P_{5}$).
$\mu_{\text{High}}(|s_{j}|)$: Triangle($P_{4}, P_{5}, P_{6}$).
$\mu_{\text{VeryHigh}}(|s_{j}|)$: Triangle($P_{5}, P_{6}, P_{7}$).
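The mapping from a single SHAP value to an “Influence Strength” phrase can be sketched as follows. This is a minimal illustration, assuming that the phrase is built from the highest-membership term plus the sign of the SHAP value; that selection rule, the “shoulder” handling of values outside the outermost peaks, and the function names are assumptions, not details confirmed by the paper. The knot points and the triangular formula are taken from this appendix.

```python
# Knot points P1..P7 as listed in Appendix B.1.
P = [0.0, 0.1, 0.5, 1.0, 2.0, 3.0, 4.5]

# Term i uses the three consecutive knots P[i], P[i+1], P[i+2].
TERMS = ["Very Low", "Low", "Moderate", "High", "Very High"]


def triangle(x, a, b, c):
    """Triangular membership: 0 outside (a, c), 1 at the peak b."""
    return max(0.0, min((x - a) / (b - a), (c - x) / (c - b)))


def influence_memberships(shap_value):
    """Membership degree of |shap_value| in each 'Influence Strength' term."""
    x = abs(shap_value)
    return {
        term: triangle(x, P[i], P[i + 1], P[i + 2])
        for i, term in enumerate(TERMS)
    }


def describe(shap_value):
    """Turn a signed SHAP value into a phrase such as 'High Negative Influence'."""
    x = abs(shap_value)
    if x <= P[1]:
        # Assumed shoulder: at or below the first peak counts as 'Very Low'.
        term = TERMS[0]
    elif x >= P[-2]:
        # Assumed shoulder: at or beyond the last peak counts as 'Very High'.
        term = TERMS[-1]
    else:
        degrees = influence_memberships(shap_value)
        term = max(degrees, key=degrees.get)
    direction = "Positive" if shap_value >= 0 else "Negative"
    return f"{term} {direction} Influence"


print(describe(-2.4))  # membership 0.6 in 'High', 0.4 in 'Very High'
print(describe(0.6))   # membership 0.8 in 'Low', 0.2 in 'Moderate'
```

Because adjacent triangles overlap, most values belong to two terms at once; reporting both degrees rather than only the dominant term would support hedged phrases such as “Slight to Moderate”.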

Appendix B.2. Parameters for “Overall Impact” ( ${\bar{s}}_{k}$ )

The knot points $Q_{1}, \dots, Q_{7}$ define the triangular membership functions for the linguistic terms describing the global importance of a feature:

$Q_{1} = 0.0$.
$Q_{2} = 0.02$.
$Q_{3} = 0.07$.
$Q_{4} = 0.13$.
$Q_{5} = 0.20$.
$Q_{6} = 0.35$.
$Q_{7} = 0.60$.

The membership functions are:

$\mu_{\text{Negligible}}({\bar{s}}_{k})$: Triangle($Q_{1}, Q_{2}, Q_{3}$).
$\mu_{\text{Low}}({\bar{s}}_{k})$: Triangle($Q_{2}, Q_{3}, Q_{4}$).
$\mu_{\text{Medium}}({\bar{s}}_{k})$: Triangle($Q_{3}, Q_{4}, Q_{5}$).
$\mu_{\text{High}}({\bar{s}}_{k})$: Triangle($Q_{4}, Q_{5}, Q_{6}$).
$\mu_{\text{VeryHigh}}({\bar{s}}_{k})$: Triangle($Q_{5}, Q_{6}, Q_{7}$).
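As a worked illustration, the two largest mean absolute SHAP values quoted in Appendix A.2 (approximately 0.53 and 0.31) can be evaluated with the membership functions above. The arithmetic below uses only the listed knot points and the triangular formula from Appendix B.1, rounded to two decimals; all terms not shown evaluate to zero.

```latex
\begin{align*}
\mu_{\text{High}}(0.31) &= \frac{Q_{6} - 0.31}{Q_{6} - Q_{5}} = \frac{0.35 - 0.31}{0.35 - 0.20} \approx 0.27, \\
\mu_{\text{VeryHigh}}(0.31) &= \frac{0.31 - Q_{5}}{Q_{6} - Q_{5}} = \frac{0.31 - 0.20}{0.35 - 0.20} \approx 0.73, \\
\mu_{\text{VeryHigh}}(0.53) &= \frac{Q_{7} - 0.53}{Q_{7} - Q_{6}} = \frac{0.60 - 0.53}{0.60 - 0.35} = 0.28.
\end{align*}
```

Both values therefore have “Very High” as their dominant term. For 0.53 it is the only term with non-zero membership, although its degree is below 1 because the value lies on the descending side of the triangle between $Q_{6}$ and $Q_{7}$.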

Appendix C. Candidate Feature Set

Table A1 details the full set of candidate features that were generated and considered for the predictive model. The final model utilized the top 26 features from this list, as determined by their initial importance scores (see Section 3.2).

Table A1. Full list of candidate features for the predictive model.

Feature Name	Description	Type
Value-Based Features
`AWARD_VALUE_EURO_FIN_1_log`	Log-transform of the final contract value.	Numerical
`AWARD_VALUE_EURO_FIN_1_sqrt`	Square-root transform of the final contract value.	Numerical
`AWARD_VALUE_EURO_FIN_1_quartile`	Quartile bin of the final contract value.	Categorical
Competition-Based Features
`NUMBER_OFFERS_log`	Log-transform of the number of offers received.	Numerical
`competition_level`	Categorical bin for number of offers (1, 2–3, 4–5, etc.).	Categorical
`no_competition`	Binary: 1 if the tender received exactly one offer.	Binary
`high_competition`	Binary: 1 if the tender received 5 or more offers.	Binary
Historical and Country-Level Features
`historical_local_win_rate`	The tender country’s historical rate of awarding to local winners.	Numerical
`country_tender_count`	Total number of tenders from the buyer’s country in the dataset.	Numerical
`tender_in_eu15`	Binary: 1 if buyer country is in the EU15 group.	Binary
`tender_in_eastern_eu`	Binary: 1 if buyer country is in the Eastern EU group.	Binary
Contract and Procedure Characteristics (Original)
`ISO_COUNTRY_CODE`	The 2-letter code of the contracting authority’s country.	Categorical
`CAE_TYPE`	The type of the Contracting Authority (e.g., National, Regional).	Categorical
`MAIN_ACTIVITY`	The main sector of activity for the contracting authority.	Categorical
`TYPE_OF_CONTRACT`	The primary type of the contract (e.g., Services, Works).	Categorical
`TOP_TYPE`	The type of procedure used (e.g., Open, Restricted).	Categorical
`CRIT_CODE`	Code for the award criteria (e.g., Price only, MEAT).	Categorical
`B_GPA`	Binary: Whether the tender is covered by the GPA.	Categorical
`B_SUBCONTRACTED`	Binary: Whether subcontracting is foreseen.	Categorical
Complexity and Temporal Features (Engineered)
`has_eu_funds`	Binary: 1 if contract is financed by EU funds.	Binary
`is_framework`	Binary: 1 if the tender is part of a framework agreement.	Binary
`complexity_score`	A composite score based on value, duration, funds, etc.	Numerical
`high_complexity`	Binary: 1 if `complexity_score` is high.	Binary
`DT_DISPATCH_year`	The year the tender notice was dispatched.	Categorical
`DT_DISPATCH_quarter`	The quarter the tender notice was dispatched.	Categorical
`DT_DISPATCH_month`	The month the tender notice was dispatched.	Categorical

References

European Court of Auditors. Public Procurement in the EU—Less Competition for Contracts Awarded for Works, Goods and Services in the 10 Years up to 2021; Technical Report; European Court of Auditors: Luxembourg, 2023. [CrossRef]
European Union. Directive 2014/24/EU of the European Parliament and of the Council of 26 February 2014 on Public Procurement and Repealing Directive 2004/18/EC; Technical Report; Publications Office of the European Union: Brussels, Belgium, 2014. [Google Scholar]
Herz, B.; Varela-Irimia, X.L. Border effects in European public procurement. J. Econ. Geogr. 2020, 20, 1359–1405. [Google Scholar] [CrossRef]
Pereira, G.V.; Parycek, P.; Falco, E.; Kleinhans, R. Smart governance in the context of smart cities: A literature review. Inf. Polity 2018, 23, 143–162. [Google Scholar] [CrossRef]
Publications Office of the European Union. About TED. Available online: https://ted.europa.eu/TED/main/HomePage.do (accessed on 16 May 2025).
Coglianese, C. Procurement and artificial intelligence. In Handbook on Public Policy and Artificial Intelligence; Edward Elgar Publishing Limited: Cheltenham, UK, 2024; pp. 235–248. [Google Scholar] [CrossRef]
Lundberg, S.M.; Lee, S.I. A Unified Approach to Interpreting Model Predictions. In Proceedings of the Advances in Neural Information Processing Systems 30 (NIPS 2017); Guyon, I., von Luxburg, U., Bengio, S., Wallach, H.M., Fergus, R., Vishwanathan, S.V.N., Garnett, R., Eds.; Curran Associates, Inc.: Red Hook, NY, USA, 2017; Volume 30, pp. 4765–4774. [Google Scholar]
European Parliament. Public Procurement Contracts. Available online: https://www.europarl.europa.eu/factsheets/en/sheet/40/public-procurement-contracts (accessed on 16 May 2025).
de Mars, S. General principles in EU public procurement law. In Research Handbook on General Principles in EU Law; Edward Elgar Publishing: Camberley, UK, 2022; pp. 462–479. [Google Scholar] [CrossRef]
European Commission. Public Procurement. Available online: https://single-market-economy.ec.europa.eu/single-market/public-procurement_en (accessed on 16 May 2025).
Csáki, C.; Prier, E. Quality issues of public procurement open data. In Proceedings of the Electronic Government and the Information Systems Perspective: 7th International Conference, EGOVIS 2018, Regensburg, Germany, 3–5 September 2018; Proceedings 7. pp. 177–191. [Google Scholar] [CrossRef]
Schoeberlein, J. Foreign Bidders in Public Procurement: Corruption Risks and Mitigation Strategies; Transparency International: Berlin, Germany, 2022. [Google Scholar]
BusinessEurope. Single Market Barriers Overview; Technical Report. Available online: https://www.eurocommerce.eu/2024-05-single-market-barriers-overview-2/ (accessed on 16 May 2025).
Mutangili, S.K. SME Participation Barriers in European Union Public Procurement Markets. Kabar. J. Res. Innov. (Was J. Procure. Supply Chain.) 2024, 8, 70–78. [Google Scholar] [CrossRef]
Flynn, A. Measuring procurement performance in Europe. J. Public Procure. 2018, 18, 2–13. [Google Scholar] [CrossRef]
Edler, J.; Georghiou, L. Public procurement and innovation—Resurrecting the demand side. Res. Policy 2007, 36, 949–963. [Google Scholar] [CrossRef]
Bechauf, R.; Casier, L.; Erizaputri, S. How Reforming the European Union’s Public Procurement Directive Can Help Drive a Green Transition. Available online: https://www.iisd.org/system/files/2025-04/european-union-green-public-procurement.pdf (accessed on 16 May 2025).
European Commission. Socially Responsible Public Procurement (SRPP). Available online: https://single-market-economy.ec.europa.eu/single-market/public-procurement/strategic-procurement/socially-responsible-public-procurement_en (accessed on 16 May 2025).
Andersson, P.E.; Arbin, K.; Rosenqvist, C. Assessing the value of artificial intelligence (AI) in governmental public procurement. J. Public Procure. 2025, 25, 120–139. [Google Scholar] [CrossRef]
McBride, K.; van Noordt, C.; Misuraca, G.; Hammerschmid, G. Towards a systematic understanding on the challenges of public procurement of artificial intelligence in the public sector. In Research Handbook on Public Management and Artificial Intelligence; Edward Elgar Publishing Limited: Cheltenham, UK, 2024; pp. 62–77. [Google Scholar] [CrossRef]
Rodríguez, M.J.G.; Rodríguez-Montequín, V.; Ballesteros-Pérez, P.; Love, P.E.; Signor, R. Collusion detection in public procurement auctions with machine learning algorithms. Autom. Constr. 2022, 133, 104047. [Google Scholar] [CrossRef]
Torres-Berru, Y.; Vivian, F.; Lopez-Batista, L.C.Z. A Data Mining Approach to Detecting Bias and Favoritism in Public Procurement. Intell. Autom. Soft Comput. 2023, 36, 3501–3516. [Google Scholar] [CrossRef]
Anurag, A.; Johnpaul, M. Predictive Analytics: Facilitating the Process of Supply Chain Automation. In Advancements in Intelligent Process Automation; Thangam, D., Ed.; IGI Global Scientific Publishing: Hershey, PA, USA, 2025; pp. 481–512. [Google Scholar] [CrossRef]
Modrusan, N.; Rabuzin, K.; Mrsic, L. Improving Public Sector Efficiency using Advanced Text Mining in the Procurement Process. In Proceedings of the 9th International Conference on Data Science, Technology and Applications-DATA, Web-Based Event, 7–9 July 2020; pp. 200–206. [Google Scholar] [CrossRef]
Amarasinghe, K.; Rodolfa, K.T.; Lamba, H.; Ghani, R. Explainable machine learning for public policy: Use cases, gaps, and research directions. Data Policy 2023, 5, e5. [Google Scholar] [CrossRef]
Barredo Arrieta, A.; Díaz-Rodríguez, N.; Del Ser, J.; Bennetot, A.; Tabik, S.; Barbado, A.; Garcia, S.; Gil-Lopez, S.; Molina, D.; Benjamins, R.; et al. Explainable Artificial Intelligence (XAI): Concepts, taxonomies, opportunities and challenges toward responsible AI. Inf. Fusion 2020, 58, 82–115. [Google Scholar] [CrossRef]
Kim, J.; Maathuis, H.; Sent, D. Human-centered evaluation of explainable AI applications: A systematic review. Front. Artif. Intell. 2024, 7, 1456486. [Google Scholar] [CrossRef]
OECD. OECD AI Principles Overview. Available online: https://oecd.ai/en/ai-principles (accessed on 16 May 2025).
Molnar, C. Interpretable Machine Learning. A Guide for Making Black Box Models Explainable, 3rd ed.; Self-Published Christoph Molnar: Munich, Germany, 2025. [Google Scholar]
Lundberg, S.M.; Erion, G.; Chen, H.; DeGrave, A.; Prutkin, J.M.; Nair, B.; Katz, R.; Himmelfarb, J.; Bansal, N.; Lee, S.I. From local explanations to global understanding with explainable AI for trees. Nat. Mach. Intell. 2020, 2, 56–67. [Google Scholar] [CrossRef]
Ribeiro, M.T.; Singh, S.; Guestrin, C. “Why Should I Trust You?”: Explaining the Predictions of Any Classifier. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, New York, NY, USA, 13–17 August 2016; pp. 1135–1144. [Google Scholar] [CrossRef]
Wirtz, B.W.; Weyerer, J.C.; Geyer, C. Artificial intelligence and the public sector—Applications and challenges. Int. J. Public Adm. 2019, 42, 596–615. [Google Scholar] [CrossRef]
Dwivedi, R.; Dave, D.; Naik, H.; Singhal, S.; Omer, R.; Patel, P.; Qian, B.; Wen, Z.; Shah, T.; Morgan, G.; et al. Explainable AI (XAI): Core Ideas, Techniques, and Solutions. ACM Comput. Surv. 2023, 55, 194. [Google Scholar] [CrossRef]
European Union. Regulation (EU) 2024/1689 of the European Parliament and of the Council of 13 June 2024 Laying down Harmonised Rules on Artificial Intelligence and Amending Regulations (EC) No 300/2008, (EU) No 167/2013, (EU) No 168/2013, (EU) 2018/858, (EU) 2018/1139 and (EU) 2019/2144 and Directives 2006/42/EC, 2009/48/EC, 2013/29/EU, 2014/33/EU, 2014/34/EU, 2014/35/EU, 2014/53/EU, 2014/68/EU and (EU) 2016/797 (Artificial Intelligence Act); L Series; Official Journal of the European Union: Brussels, Belgium, 2024. [Google Scholar]
Sanchez-Graells, A. Public Procurement of Artificial Intelligence: Recent Developments and Remaining Challenges in EU Law. LTZ (Leg. Tech J.) 2024, 2, 122–131. [Google Scholar] [CrossRef]
Committee on Internal Market and Consumer Protection (IMCO). Activity Report 2019–2024; Technical Report; European Parliament, 2024. Available online: https://www.europarl.europa.eu/cmsdata/283639/IMCOActivityReport-2019-2024.pdf (accessed on 16 May 2025).
Soares, H.d.A.; Moura, R.S.; Machado, V.P.; Paiva, A.; Lima, W.; Veras, R. The Detection of Spurious Correlations in Public Bidding and Contract Descriptions Using Explainable Artificial Intelligence and Unsupervised Learning. Electronics 2025, 14, 1251. [Google Scholar] [CrossRef]
Aboelazm, K.S. A new era of public procurement: Critical issues of procuring artificial intelligence systems to produce public services. Int. J. Law Manag. 2025. ahead-of-print. [Google Scholar] [CrossRef]
Barocas, S.; Hardt, M.; Narayanan, A. Fairness and Machine Learning: Limitations and Opportunities; MIT Press: Cambridge, MA, USA, 2023. [Google Scholar]
Wang, L.; Zhang, Y.; Wang, J. Multi-Attribute Decision-Making Methods for Additive Manufacturing: A Review. Processes 2023, 11, 497. [Google Scholar] [CrossRef]
Luís, V.; Arruda, P. A Multi-Criteria Model to Evaluate Public Services Contracts. Int. Bus. Res. 2022, 15, 85. [Google Scholar] [CrossRef]
UK Government Analysis Function. An Introductory Guide to Multi-Criteria Decision Analysis (MCDA). Available online: https://analysisfunction.civilservice.gov.uk/policy-store/an-introductory-guide-to-mcda/ (accessed on 16 May 2025).
Adil, M. A Decision Model for e-Procurement Decision Support Systems for the Public Sector Using Multi-Criteria Decision Analysis. Ph.D. Thesis, University of Sheffield, Sheffield, UK, 2015. [Google Scholar]
Zadeh, L.A. Fuzzy sets. Inf. Control 1965, 8, 338–353. [Google Scholar] [CrossRef]
Zadeh, L.A. The concept of a linguistic variable and its application to approximate reasoning—I. Inf. Sci. 1975, 8, 199–249.
Herrera, F.; Herrera-Viedma, E. Linguistic decision analysis: Steps for solving decision problems under linguistic information. Fuzzy Sets Syst. 2000, 115, 67–82.
Xu, Z. A survey of preference relations. Int. J. Gen. Syst. 2007, 36, 179–203.
Ross, T.J. Fuzzy Logic with Engineering Applications, 3rd ed.; John Wiley & Sons: Hoboken, NJ, USA, 2010.
Fatima, A.; Nai, A.; Meo, R. Machine Learning in Procurement with a View to Equity. In Artificial Intelligence—Social, Ethical and Legal Issues; IntechOpen: London, UK, 2025.
Nai, A.; Meo, R.; Corbo, R.; Sapino, A.; Torasso, P. Public tenders, complaints, machine learning and recommender systems: A case study in public administration. Expert Syst. Appl. 2023, 156, 103606.
Minghini, M. Semantic Annotation and Classification of EU Tendering Data on Open Geospatial Software, Standards and Data Using Large Language Models. Available online: https://av.tib.eu/media/68534 (accessed on 16 May 2025).
Westermann, J.; Klassen, G.; Bauer, L.T.; Fritzsche, R. Breaking Tech Monopolies: The Role of Public Procurement in Fostering SME-Led Innovation in Europe. In Proceedings of the 26th Annual International Conference on Digital Government Research, Porto Alegre, Brazil, 9–12 June 2025.
Publications Office of the European Union. TED (Tenders Electronic Daily) CSV. Available online: https://data.europa.eu/data/datasets/ted-csv (accessed on 16 May 2025).
Ke, G.; Meng, Q.; Finley, T.; Wang, T.; Chen, W.; Ma, W.; Ye, Q.; Liu, T.Y. LightGBM: A Highly Efficient Gradient Boosting Decision Tree. In Proceedings of the Advances in Neural Information Processing Systems (NIPS 2017), Long Beach, CA, USA, 4–9 December 2017; Volume 30, pp. 3146–3154.
Cernat, L. Participation of Foreign Bidders in EU Public Procurement: Too Much or Too Little? Technical Report; 2025. Available online: https://ecipe.org/publications/participation-foreign-bidders-eu-public-procurement/ (accessed on 16 May 2025).

Figure 1. Model performance metrics versus classification threshold, showing accuracy and F1-score. Vertical dashed lines mark the optimal thresholds for accuracy (0.1387, displayed as 0.14 in the plot) and for F1-score (0.1288, displayed as 0.13 in the plot).
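
A threshold sweep of the kind summarized in Figure 1 can be sketched as follows. This is a minimal illustration and not the authors’ code: the labels and scores are synthetic (drawn with a fixed seed), the function names are hypothetical, and the 0.01 grid spacing is an assumption. F1 is computed for the positive class, taken here to be the local winner (1).

```python
import random


def confusion_counts(y_true, scores, threshold):
    """Return (tp, fp, tn, fn), predicting class 1 when score >= threshold."""
    tp = fp = tn = fn = 0
    for label, score in zip(y_true, scores):
        predicted = 1 if score >= threshold else 0
        if predicted == 1 and label == 1:
            tp += 1
        elif predicted == 1 and label == 0:
            fp += 1
        elif predicted == 0 and label == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


def accuracy_and_f1(y_true, scores, threshold):
    """Accuracy and positive-class F1 at a single threshold."""
    tp, fp, tn, fn = confusion_counts(y_true, scores, threshold)
    accuracy = (tp + tn) / len(y_true)
    denominator = 2 * tp + fp + fn
    f1 = 2 * tp / denominator if denominator else 0.0
    return accuracy, f1


def sweep_thresholds(y_true, scores, thresholds):
    """Evaluate every threshold; return the best rows for accuracy and for F1."""
    rows = [(t, *accuracy_and_f1(y_true, scores, t)) for t in thresholds]
    best_by_accuracy = max(rows, key=lambda row: row[1])
    best_by_f1 = max(rows, key=lambda row: row[2])
    return best_by_accuracy, best_by_f1


# Synthetic, imbalanced data (assumption: roughly 90% positives).
rng = random.Random(0)
y_true = [1 if rng.random() < 0.9 else 0 for _ in range(5000)]
scores = [rng.betavariate(5, 2) if y == 1 else rng.betavariate(2, 5) for y in y_true]

grid = [i / 100 for i in range(1, 100)]
best_acc_row, best_f1_row = sweep_thresholds(y_true, scores, grid)
print("best accuracy (threshold, accuracy, f1):", best_acc_row)
print("best F1       (threshold, accuracy, f1):", best_f1_row)
```

Each returned row is a `(threshold, accuracy, f1)` tuple. The synthetic numbers carry no meaning for the TED data; the sketch only shows how two different optima, one per metric, arise from the same sweep.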

Figure 2. Confusion matrices for predicting local (1) vs. non-local (0) winners on the test set. (a) Default threshold (0.5). (b) Optimal threshold (0.1387).

Figure 3. SHAP global feature importance (mean absolute SHAP value). Features are ranked by the average magnitude of their impact on the model output for predicting a local winner.
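
The ranking statistic named in this caption, the mean absolute SHAP value per feature, can be computed as sketched below. The SHAP matrix is a toy example with invented numbers, the feature names are placeholders, and the helper `mean_abs_shap` is hypothetical; it is not the authors’ implementation.

```python
def mean_abs_shap(shap_matrix, feature_names):
    """Rank features by mean |SHAP| over all explained rows.

    shap_matrix: list of rows, one per explained instance; each row holds
    one signed SHAP value per feature, in the order of feature_names.
    """
    n_rows = len(shap_matrix)
    totals = [0.0] * len(feature_names)
    for row in shap_matrix:
        for k, value in enumerate(row):
            totals[k] += abs(value)  # sign is discarded: only magnitude counts
    importance = [(name, total / n_rows) for name, total in zip(feature_names, totals)]
    return sorted(importance, key=lambda item: item[1], reverse=True)


# Toy SHAP values (invented for illustration only).
toy_shap = [
    [0.5, -0.1],
    [-0.7, 0.2],
    [0.3, -0.3],
]
ranking = mean_abs_shap(toy_shap, ["feature_a", "feature_b"])
for name, score in ranking:
    print(f"{name}: {score:.3f}")
```

Because the absolute value is taken before averaging, a feature whose contributions push in both directions can still rank highly; the direction of the effect has to be read from a signed view such as the beeswarm plot.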

Figure 4. SHAP Summary Plot (Beeswarm). Shows the distribution of SHAP values for each feature across the explanation sample. Color indicates feature value (high/low). Positive SHAP values increase the likelihood of predicting a “Local Winner”.

Figure 5. SHAP dependence plots for key features. Each plot shows how a feature’s value (x-axis) affects the model’s prediction of a local winner (SHAP value, y-axis); positive SHAP values increase the likelihood of predicting a local winner. The color of each point represents the value of a secondary interaction feature, revealing three-way relationships. (a) historical_local_win_rate: the strong path dependency of a country’s historical local win rate. (b) AWARD_VALUE_EURO_FIN_1_log: the tendency for higher contract values to attract non-local competition. (c) ISO_COUNTRY_CODE: the varied baseline tendencies toward local awards across buyer countries. (d) CAE_TYPE: the influence of the contracting authority’s type. (e) no_competition: the impact of single-bidder tenders.

Table 1. Comparative overview of recent AI/ML applications in public procurement analysis.

Study	Methodology	Data Scope	Primary Objective	Distinguishing Focus
Current Work (2025)	LightGBM, SHAP, Fuzzy Linguistic Interpretation	Pan-European (TED)	Explain market integration drivers	Pan-European explanatory analysis of winner nationality using a fuzzy-enhanced XAI framework.
Fatima, Nai, & Meo (2025) [49]	XGBoost, XAI	National (Italy)	Predict contract variations	XAI application for fairness analysis of contract variations within a single country’s procurement system.
Minghini et al. (2024) [51]	GPT-3.5, Support Vector Machine (SVM)	Pan-European (TED)	Classify tenders by domain	Automation of tender classification in a specific technological sector using ML on TED data.
Nai et al. (2023) [50]	ML Classifiers, NLP, Recommender System	National (Italy)	Predict tender complaints	Predictive analysis of an administrative outcome (complaints) and development of a functional tool for bidders.
Westermann et al. (2025) [52]	Quantitative Analysis, Firm Classification	Pan-European (TED), enriched with Orbis	Analyze SME/startup barriers in ICT	Structural analysis of market access barriers for SMEs and startups in the European ICT sector.

Table 2. Comparative performance of predictive models on the test set.

Model	AUC	Accuracy	F1-Score
Logistic Regression	0.717	0.653	0.772
Random Forest	0.810	0.702	0.808
LightGBM (Proposed)	0.855	0.913	0.953

Note: Metrics for logistic regression and random forest are calculated at the default 0.5 classification threshold, whereas the LightGBM metrics are reported at its optimized threshold (0.1387) to represent its peak performance, as detailed in Section 4.2. Accuracy and F1-score are therefore not compared at a common threshold; AUC does not depend on the threshold.

Table 3. Classification Report Summary on Test Set.

Threshold	Class	Precision	Recall	F1-Score	Support
Default (0.50)	Non-Local (0)	0.253	0.793	0.383	124,516
Default (0.50)	Local (1)	0.971	0.748	0.845	1,157,782
Default (0.50)	Accuracy			0.752	1,282,298
Optimal (0.1387)	Non-Local (0)	0.647	0.226	0.335	124,516
Optimal (0.1387)	Local (1)	0.922	0.987	0.953	1,157,782
Optimal (0.1387)	Accuracy			0.913	1,282,298
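
The figures in Table 3 are linked by two identities: F1 is the harmonic mean of precision and recall, and overall accuracy is the support-weighted mean of per-class recall. The sketch below recomputes both from the rounded table entries as a consistency check. The function and variable names are hypothetical, and small discrepancies are expected because the inputs are rounded to three decimals.

```python
# Values transcribed from Table 3: (precision, recall, reported F1, support).
TABLE_3 = {
    "default (0.50)": {
        "non_local": (0.253, 0.793, 0.383, 124_516),
        "local": (0.971, 0.748, 0.845, 1_157_782),
        "accuracy": 0.752,
    },
    "optimal (0.1387)": {
        "non_local": (0.647, 0.226, 0.335, 124_516),
        "local": (0.922, 0.987, 0.953, 1_157_782),
        "accuracy": 0.913,
    },
}


def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


def accuracy_from_recalls(classes):
    """Overall accuracy as the support-weighted mean of per-class recall."""
    correct = sum(recall * support for _, recall, _, support in classes)
    total = sum(support for _, _, _, support in classes)
    return correct / total


def check_table(table):
    """Return {threshold: (recomputed accuracy, {class: recomputed F1})}."""
    results = {}
    for threshold, block in table.items():
        classes = [block["non_local"], block["local"]]
        accuracy = accuracy_from_recalls(classes)
        f1s = {
            name: f1_score(block[name][0], block[name][1])
            for name in ("non_local", "local")
        }
        results[threshold] = (accuracy, f1s)
    return results


recomputed = check_table(TABLE_3)
for threshold, (accuracy, f1s) in recomputed.items():
    print(threshold, round(accuracy, 3), {k: round(v, 3) for k, v in f1s.items()})
```

The recomputed values agree with the reported ones to within about 0.001. The check also makes the trade-off explicit: moving to the lower threshold raises recall on the majority local class from 0.748 to 0.987 while non-local recall falls from 0.793 to 0.226.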

Table 4. Fuzzy linguistic interpretation of top global feature importance (values are approximate from Figure 3).

Feature Name	Mean Abs SHAP	Fuzzy Overall Impact
`historical_local_win_rate`	≈0.53	Very High
`AWARD_VALUE_EURO_FIN_1_log`	≈0.31	Very High
`ISO_COUNTRY_CODE`	≈0.18	High
`MAIN_ACTIVITY`	≈0.17	High
`CAE_TYPE`	≈0.14	High
`no_competition`	≈0.12	Medium
`NUMBER_OFFERS_log`	≈0.11	Medium
`TYPE_OF_CONTRACT`	≈0.10	Medium
`country_tender_count`	≈0.09	Medium
`AWARD_VALUE_EURO_FIN_1_sqrt`	≈0.08	Low
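
A mapping from mean absolute SHAP values to linguistic terms, as in Table 4, can be sketched with trapezoidal membership functions. The breakpoints below are hypothetical: they were chosen only so that this sketch reproduces the labels in Table 4 from the approximate values listed there. The parameters actually used by the authors are those of Appendix B.2 and may differ, as may the shape of their membership functions.

```python
import math

# Hypothetical trapezoids (a, b, c, d) over mean |SHAP|; NOT the authors' parameters.
# Membership rises from a to b, equals 1 between b and c, and falls from c to d.
IMPACT_TERMS = {
    "Low": (0.0, 0.0, 0.08, 0.09),
    "Medium": (0.08, 0.09, 0.12, 0.14),
    "High": (0.12, 0.14, 0.20, 0.30),
    "Very High": (0.20, 0.30, math.inf, math.inf),
}


def trapezoid(x, a, b, c, d):
    """Degree of membership of x in the trapezoid (a, b, c, d)."""
    if b <= x <= c:
        return 1.0
    if x <= a or x >= d:
        return 0.0
    if x < b:
        return (x - a) / (b - a)
    return (d - x) / (d - c)


def memberships(x):
    """Membership degree of x in every linguistic term."""
    return {term: trapezoid(x, *params) for term, params in IMPACT_TERMS.items()}


def linguistic_label(x):
    """Term with the highest membership degree for x."""
    degrees = memberships(x)
    return max(degrees, key=degrees.get)


# Approximate values and labels transcribed from Table 4.
TABLE_4 = [
    ("historical_local_win_rate", 0.53, "Very High"),
    ("AWARD_VALUE_EURO_FIN_1_log", 0.31, "Very High"),
    ("ISO_COUNTRY_CODE", 0.18, "High"),
    ("MAIN_ACTIVITY", 0.17, "High"),
    ("CAE_TYPE", 0.14, "High"),
    ("no_competition", 0.12, "Medium"),
    ("NUMBER_OFFERS_log", 0.11, "Medium"),
    ("TYPE_OF_CONTRACT", 0.10, "Medium"),
    ("country_tender_count", 0.09, "Medium"),
    ("AWARD_VALUE_EURO_FIN_1_sqrt", 0.08, "Low"),
]

for name, value, reported in TABLE_4:
    print(f"{name}: {linguistic_label(value)} (reported: {reported})")
```

Under these assumed breakpoints, a value between two plateaus belongs partly to both terms; for example, 0.13 has a membership of about 0.5 in both Medium and High. That graded overlap is what distinguishes a fuzzy mapping from fixed cut-offs.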

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).