Review

Evaluating Trustworthiness in AI: Risks, Metrics, and Applications Across Industries

by Aleksandra Nastoska 1, Bojana Jancheska 1, Maryan Rizinski 1,2,* and Dimitar Trajanov 1,2,*
1 Faculty of Computer Science and Engineering, Ss. Cyril and Methodius University, 1000 Skopje, North Macedonia
2 Department of Computer Science, Metropolitan College, Boston University, Boston, MA 02215, USA
* Authors to whom correspondence should be addressed.
Electronics 2025, 14(13), 2717; https://doi.org/10.3390/electronics14132717
Submission received: 5 June 2025 / Revised: 30 June 2025 / Accepted: 1 July 2025 / Published: 4 July 2025

Abstract

Ensuring the trustworthiness of artificial intelligence (AI) systems is critical as they become increasingly integrated into domains like healthcare, finance, and public administration. This paper explores frameworks and metrics for evaluating AI trustworthiness, focusing on key principles such as fairness, transparency, privacy, and security. This study is guided by two central questions: how can trust in AI systems be systematically measured across the AI lifecycle, and what are the trade-offs involved when optimizing for different trustworthiness dimensions? By examining frameworks such as the NIST AI Risk Management Framework (AI RMF), the AI Trust Framework and Maturity Model (AI-TMM), and ISO/IEC standards, this study bridges theoretical insights with practical applications. We identify major risks across the AI lifecycle stages and outline various metrics to address challenges in system reliability, bias mitigation, and model explainability. This study includes a comparative analysis of existing standards and their application across industries to illustrate their effectiveness. Real-world case studies, including applications in healthcare, financial services, and autonomous systems, demonstrate approaches to applying trust metrics. The findings reveal that achieving trustworthiness involves navigating trade-offs between competing metrics, such as fairness versus efficiency or privacy versus transparency, and emphasizes the importance of interdisciplinary collaboration for robust AI governance. Emerging trends suggest the need for adaptive frameworks for AI trustworthiness that evolve alongside advancements in AI technologies. This paper contributes to the field by proposing a comprehensive review of existing frameworks with guidelines for building resilient, ethical, and transparent AI systems, ensuring their alignment with regulatory requirements and societal expectations.

1. Introduction

As artificial intelligence (AI) becomes increasingly integrated into sectors such as finance, healthcare, and public administration, ensuring the trustworthiness of AI-based systems is essential. Trustworthy AI extends beyond conventional performance metrics, such as accuracy, and requires rigorous evaluation across multiple dimensions, including fairness, transparency, privacy, and security. These additional layers of assessment help mitigate risks associated with biases, lack of transparency, and ethical concerns, thereby promoting reliable and ethical AI deployment.
With the widespread use of AI technology in various industries worldwide, the trustworthiness of AI-based systems is not only a technical requirement but also a critical factor in business and operational success. Trustworthy AI has significant implications for practical applications as it determines user acceptance, regulatory compliance, and the ethical deployment of AI solutions. In various sectors, systems that are perceived as trustworthy have the potential to drive substantial economic value by reducing operational risks, enhancing decision accuracy, and improving service quality. The global market for AI is projected to reach hundreds of billions of dollars by 2030 [1] with trustworthiness becoming a decisive factor in adoption and scale. Organizations that prioritize trustworthy AI can capture a larger share of this opportunity by building sustainable, resilient, and ethically sound systems that foster user trust and comply with emerging regulations. Thus, trustworthiness can also help position businesses as leaders in a highly competitive market.
This paper provides a comprehensive review of standards, frameworks, and state-of-the-art (SOTA) metrics for evaluating AI trustworthiness in machine learning (ML) applications. It aims to identify and review major risks that may arise at various stages of the AI lifecycle, from design and development to deployment and monitoring. By examining and comparing established frameworks—such as the NIST AI Risk Management Framework and ISO/IEC standards, among others—along with specific trustworthiness metrics, this paper emphasizes practical approaches for establishing and maintaining trust in AI. Focusing on relevant metrics and examples from real-world use case scenarios, this work bridges the gap between theoretical frameworks and practical implementations. The goal is to offer actionable insights that contribute to developing and governing trustworthy AI systems, thereby fostering a more transparent, fair, and secure AI ecosystem.
The structure of this paper is organized as follows. In Section 2, we provide an overview of trustworthy AI by defining the concept and outlining the core components of AI trustworthiness. Section 3 discusses issues that could negatively impact AI trustworthiness, such as the black-box nature of models, inherent biases, and privacy concerns. Section 4 and Section 5 explore metrics and evaluation frameworks central to assessing AI’s reliability and safety, examining their practical applications across various industries. To illustrate the practical implications, Section 6 presents real-world case studies that highlight potential failures in AI trustworthiness, emphasizing the challenges and limitations of applying these metrics and frameworks. The paper concludes with an analysis of emerging trends and research opportunities, offering insights into the future of AI trustworthiness.

2. Overview of Trustworthy AI: Core Definitions and Attributes

With the rapid evolution of AI technologies across diverse sectors, trustworthy AI has positioned itself as a central concern for regulators, developers, and researchers [2,3]. Trustworthy AI refers to the design of safe and reliable systems that embody attributes such as validity, reliability, transparency, accountability, privacy, and fairness [4]. Establishing trustworthiness involves integrating user needs, system validation, and continuous performance monitoring. These efforts must address trade-offs between trustworthiness dimensions, including fairness, explainability, privacy, accountability, and transparency, ensuring these dimensions are carefully balanced and appropriately prioritized and optimized based on context [5]. AI systems built on these principles can help mitigate societal risks and align AI applications with shared human values.
Recent research highlights the theoretical and practical dimensions of trustworthiness in AI. Explainability and causability are identified as critical components, enabling users to understand and evaluate AI decisions effectively. These aspects are supported by design frameworks such as Participatory Design and the Nested Framework, which integrate ethical considerations and stakeholder input into AI system development. Participatory Design entails user participation in the design of systems for work practice. It is a democratic process for the social and technological design of systems involving human work, based on the argument that users should be involved in the designs they will be using, and that all stakeholders, including and especially users, should have equal input into interaction design [6]. The Nested Framework refers to a theoretical or conceptual framework designed to integrate multiple layers of considerations into the development and implementation of systems, often in complex domains such as AI, sustainability, or education. However, challenges remain in combining causality with normative ethics in multidisciplinary contexts [7].
Maintaining public trust requires the alignment of technical design, institutional practices, and regulatory frameworks, as this synergy ensures the effective, ethical, and sustainable implementation of AI-based systems. Without such alignment, AI technologies risk inefficiency, unintended bias, or even harm, undermining their societal benefits. Effective governance must achieve a balance between efficiency, fairness, and accountability, creating a foundation for responsible innovation. National legislation, such as the European Union’s AI Act, adopts precautionary measures to achieve this balance by enhancing transparency and accountability in AI systems. The Act outlines specific requirements, including comprehensive documentation of system design and functionality, disclosure obligations to inform users of AI interactions, and the integration of explainability features to ensure decisions made by AI can be understood and challenged. Addressing transparency challenges through these measures is essential to fostering and enhancing public trust in algorithmic decision-making (ADM) systems and AI technologies [8].
In the following sections, we provide a detailed overview of the components of trustworthiness in AI-based systems and discuss their importance across various sectors.

2.1. Components of AI Trustworthiness

Trustworthiness in AI encompasses several fundamental components that establish the reliability and ethical alignment of AI systems. These include fairness, transparency, privacy, accountability, and security, which are outlined in the following text and consolidated in Figure 1:
  • Fairness. Fairness is a cornerstone of trustworthy AI, ensuring that AI systems avoid biases, or at least minimize their negative impact, since biases can lead to discriminatory outcomes. Ethical challenges associated with bias include discrimination and inequality, human autonomy, and public trust [9]. This is especially important in sensitive domains like finance, healthcare, and criminal justice, where AI-driven decisions can significantly affect the lives of individuals. Fairness can be assessed using various metrics, such as the Gini coefficient, which measures the concentration of explanatory variables based on Shapley values or estimated parameters [10]. Additionally, statistical tools like the Kolmogorov–Smirnov test can be employed to detect deviations from a uniform distribution [11]. These methods help ensure that different population groups are treated equitably. Other notable fairness metrics include demographic parity (DP) and equal opportunity (EO) [12]; a minimal computational sketch of these two metrics is given after this list.
  • Transparency. As an essential component for maintaining trust and accountability, transparency in AI requires openness and clarity in decision-making processes to help users understand how predictions are generated [13]. This is especially critical for “black-box” models. Techniques such as Local Interpretable Model-agnostic Explanations (LIME) and Shapley Additive Explanations (SHAP) are known to enhance transparency [14]. The growing emphasis on transparency is reflected in regulatory frameworks, with the European Union (EU) leading efforts through its 2018 AI strategy and the High-Level Expert Group on AI (AI HLEG) [13]. The purpose of the EU’s AI HLEG is to develop voluntary AI ethics guidelines, titled Ethics Guidelines for Trustworthy AI (HLEG Guidelines), as well as policy and investment recommendations for EU institutions and Member States, known as Policy and Investment Recommendations for Trustworthy AI (Policy Recommendations) [15]. Additionally, global standards organizations like IEEE and ISO are focusing on transparency in AI governance. The IEEE has been at the forefront of promoting AI transparency through its Ethically Aligned Design framework and standards like IEEE P7001, which focus on transparency in autonomous systems [16]. These efforts aim to ensure that AI systems are explainable, accountable, and aligned with human values, fostering trust and ethical deployment in society. Legal discussions, including the right to explanation under EU’s General Data Protection Regulation (GDPR) and calls for algorithmic auditing, highlight the need for robust transparency measures. Ultimately, transparency goes beyond individual algorithms to include system-level considerations, meaning that the entire lifecycle and interactions of an AI system (i.e., from its data inputs, decision-making processes, and outputs), are made understandable and interpretable to stakeholders. Achieving AI transparency on the system level ensures that users, regulators, and developers can evaluate the collective behavior of all interconnected components, not just isolated parts. System-level transparency is crucial because it helps identify potential biases, vulnerabilities, or unintended consequences that may arise from the interplay of various elements within the AI system, thus fostering trust, accountability, and ethical use.
  • Privacy. Privacy in AI is important to safeguard user data from unauthorized access while ensuring the secure handling of personal information, particularly in sensitive industries like healthcare and finance. Techniques such as federated learning (FL), which allows AI models to be trained across decentralized devices without transferring raw data [17], and differential privacy (DP), which adds statistical noise to datasets to protect individual identities [17], play a key role in achieving AI privacy by design. Beyond data protection, privacy in AI also encompasses deeper aspects like autonomy and identity. Autonomy refers to an individual’s ability to make decisions without undue influence from AI-driven profiling or recommendations, while identity is related to preserving the integrity of personal characteristics without distortion or misuse. The large-scale data collection and profiling by AI systems often erode these controls [18], highlighting the ongoing tension between advancing technology and maintaining user consent and control. This erosion emphasizes the need for robust privacy frameworks to ensure AI respects fundamental human rights.
  • Accountability. Accountability is essential for ethical AI governance, ensuring that systems operate in accordance with ethical, transparent, and regulatory standards [19]. In the context of trustworthy AI, accountability refers to the clear attribution of responsibility for decisions made by AI systems, ensuring that stakeholders can trace actions back to specific individuals or entities. Additionally, accountability at its core involves the recognition of authority and the limitation of power to prevent the arbitrary use of authority. It takes on a sociotechnical dimension, reflecting the complexity of systems that involve diverse stakeholders, from developers to end-users [19]. For all actions in an AI-based system, there must be clear accountability to understand responsibility, especially when outcomes vary due to model decisions, errors, or unforeseen behaviors. This clarity is critical for determining who is responsible in cases of harm, bias, or failure, ensuring that the system’s outcomes align with ethical standards. Challenges, such as the unpredictability of AI outcomes, highlight the need for robust documentation and audit trails throughout the AI lifecycle [20]. These measures enable transparency, facilitate compliance checks, and ensure clear attribution of responsibility in the event of system failures or algorithm faults.
  • Security. Security in AI ensures that AI-based systems are protected against vulnerabilities, safeguarding their integrity, confidentiality, and privacy, especially in the face of malicious attacks. Vulnerabilities in AI systems can be exploited through techniques such as model extraction, where an attacker attempts to recreate or steal the underlying model by querying the system [21], and evasion, where attackers design inputs to deceive the AI system into making incorrect predictions [21]. These attacks can undermine both the performance and reliability of AI systems. Effective defenses against such attacks are generally categorized into complete defenses, which prevent attacks from succeeding, and detection-only defenses, which identify and mitigate the impact of attacks after they occur [21]. Techniques like adversarial training, where models are trained with intentionally altered inputs to improve robustness, and data sanitization, which involves cleaning and securing training data to remove potentially harmful or misleading examples, can significantly enhance AI system resilience [21]. However, the field continues to face challenges, particularly as adversaries develop more sophisticated attack strategies. This highlights the need for ongoing research to protect AI systems from emerging threats. Thus, security plays a critical role in fostering trust in AI by ensuring reliable and safe performance in diverse and potentially hostile environments. These challenges are reflected in global AI regulations, with the EU’s AI Act mandating security controls for high-risk systems [22], and the US National Institute of Standards and Technology’s AI Risk Management Framework emphasizing security as a core component of trustworthy AI [23]. Similarly, nations like India are operationalizing this through their national AI strategy [24], while China’s regulations emphasize state-led control to manage the security and stability of AI applications [25].
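To make the group fairness metrics referenced above concrete, the following minimal sketch computes the demographic parity and equal opportunity differences for a binary classifier with a binary sensitive attribute. The data and variable names are hypothetical, and real evaluations would typically rely on dedicated fairness toolkits rather than hand-rolled code.

import numpy as np

def demographic_parity_difference(y_pred, sensitive):
    """Gap in positive-prediction (selection) rates between groups."""
    rates = [y_pred[sensitive == g].mean() for g in np.unique(sensitive)]
    return max(rates) - min(rates)

def equal_opportunity_difference(y_true, y_pred, sensitive):
    """Gap in true-positive rates between groups (assumes each group has positive labels)."""
    tprs = [y_pred[(sensitive == g) & (y_true == 1)].mean() for g in np.unique(sensitive)]
    return max(tprs) - min(tprs)

# Hypothetical predictions for two demographic groups (coded 0 and 1)
y_true    = np.array([1, 0, 1, 1, 0, 1, 0, 1])
y_pred    = np.array([1, 0, 1, 0, 0, 1, 1, 1])
sensitive = np.array([0, 0, 0, 0, 1, 1, 1, 1])

print(demographic_parity_difference(y_pred, sensitive))          # 0.25
print(equal_opportunity_difference(y_true, y_pred, sensitive))   # ~0.33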

2.2. Importance Across Sectors

The demand for trustworthy AI spans multiple sectors, each with its own unique requirements and challenges. We explore some of the most prominent sectors, including finance, healthcare, public administration, and autonomous systems, where the adoption of trustworthy AI is important to promote trust in AI.

2.2.1. Finance

AI is increasingly used in the finance sector, reshaping the way financial institutions operate and serve their clients. Its applications extend across areas such as fraud detection, credit scoring, and algorithmic trading [26], where security, accuracy, and fairness are crucial to mitigating significant financial risks and repercussions. Financial institutions and fintech companies have increasingly embraced AI to improve efficiency and customer experience [27]. This widespread adoption has also been a catalyst for economic growth in various countries, positioning their economies as technological leaders within their regions. In particular, several countries across different regions, including Singapore, Denmark, and the United States, have demonstrated strong preparedness for AI adoption by fostering safe and responsible AI environments and maintaining public trust [28,29]. Beyond operational advantages, AI has improved risk management by enabling more precise predictions and fostering data-driven decision-making. These innovations have been assisted by regulatory frameworks that prioritize ethical practices, transparency, and compliance [30]. Despite its transformative potential, AI in finance still faces challenges, particularly in addressing data privacy concerns and security vulnerabilities [31]. Consumer trust depends on how effectively these issues are managed. Initiatives such as those from the Frankfurt Institute for Risk Management and Regulation (FIRM) highlight the importance of collaboration in developing robust governance standards, especially with regard to the financial industry. FIRM was founded in 2009 and is supported by the Association for Risk Management and Regulation (Gesellschaft für Risikomanagement und Regulierung e.V.). It brings together financial service providers, corporations, consulting firms, and the State of Hesse, Germany, to promote teaching and research in risk management and regulation, especially concerning the financial industry. The financial sector will continue to use AI technologies to create solutions that drive sustainable development and foster an environment characterized by technology-driven financial services [32]. By tackling these challenges and prioritizing fairness, ethics, and security, AI can continue to responsibly advance the financial industry while emphasizing trust and innovation.

2.2.2. Healthcare

AI technologies drive innovation in healthcare across diverse applications, including diagnostics, treatment, and patient management [33]. Predictive models have revolutionized the field, helping to mitigate health risks and improve outcomes. Model-agnostic methods, such as LIME and SHAP, are used to provide post hoc explanations by approximating a model’s behavior locally around specific predictions [34,35]. Beyond traditional clinical settings, AI has enabled remote and preventive care through virtual health assistants and wearable devices that monitor patients in real time—an important advancement over prior approaches [36]. In all these use cases, ensuring AI transparency and fairness is essential to mitigating biases that could adversely affect specific populations.
AI’s reliance on healthcare data raises privacy concerns. Breaches in the healthcare sector, such as the Anthem medical data breach that affected 78.8 million patients’ records, highlight the challenges and vulnerabilities in protecting electronic health records [17]. Privacy-preserving machine learning (PPML) techniques, including differential privacy and federated learning, aim to secure sensitive data but often come at the cost of model performance [17]. Moreover, evaluating AI models in clinical settings requires robust performance metrics to ensure reliability. Metrics such as precision, recall, F1 score, and Matthews correlation coefficient (MCC) are commonly used, but data imbalance and class-dependence can skew results, which leads to the need for comprehensive evaluation strategies [37]. By leveraging secure, interpretable, and fair AI models, the healthcare sector can improve outcomes while safeguarding individual rights, paving the way for innovative healthcare solutions.

2.2.3. Public Administration

AI adoption in public administration is transforming the way governments deliver services and manage resources, with applications across public services, criminal justice, social welfare programs, and public resource allocation [38]. Integrating AI holds potential to enhance efficiency and decision-making. Digitalization, grounded in earlier eGovernment initiatives, forms the backbone of this transformation, providing the data and infrastructure necessary for AI development [38,39]. However, public administrations often face challenges in transitioning from development to implementation due to gaps in relevant skills, resources, strategic alignment, and organizational change [38]. Collaboration—both within public sector institutions and with external partners—is an enabler of AI adoption, facilitating effective data sharing and citizen engagement [39]. By addressing these challenges and implementing AI-driven changes in organizational structures and processes, public administrations can facilitate sustainable and ethical AI integration. This has the potential to foster greater value and trust in AI-driven governance for citizens.

2.2.4. Autonomous Driving and Robotics

While the integration of AI into autonomous systems (AI/AS) is regarded as a defining feature of the Fourth and Fifth Industrial Revolutions, it also poses significant implications for safety, reliability, and ethical governance [40,41,42]. In fields such as autonomous driving and robotics, AI enables systems to navigate dynamic and unpredictable environments by leveraging advancements in machine learning, data analysis, and sensor integration [43]. The trustworthiness of these systems is paramount, requiring robust defenses against environmental variability, cyberattacks, and sensor failures.
Key initiatives such as the Asilomar Principles, national AI strategies [43], and partnerships like the Partnership on AI (PAI) highlight the growing focus on the ethical and societal implications of AI and robotics. The Asilomar Principles are a set of 23 ethical guidelines aimed at ensuring the safe and beneficial development of artificial intelligence. They were developed at the 2017 Asilomar Conference on Beneficial AI, organized by the Future of Life Institute. The principles address issues such as research transparency, value alignment, shared benefit, and the long-term societal impact of AI. Their goal is to promote collaboration among researchers and policymakers to prevent misuse or unintended consequences of AI technologies.
The Partnership on AI (PAI) is a nonprofit organization founded in 2016 by major technology companies, including Amazon, Apple, Facebook, Google, IBM, and Microsoft, alongside other stakeholders from academia, civil society, and research institutions. PAI aims to advance public understanding of artificial intelligence, promote best practices in AI development, and ensure AI technologies are developed and used in a manner that benefits humanity. The organization focuses on topics such as fairness, transparency, accountability, safety, and the societal impact of AI. This growing focus on AI’s ethical implications includes ensuring that systems are not only safe but also capable of making ethical decisions [43]. Frameworks like IEEE’s Ethically Aligned Design and British standard BS 8611:2016 have provided valuable guidance in developing transparent, accountable, and privacy-conscious AI/AS [42]. The IEEE’s Ethically Aligned Design (EAD) is a comprehensive framework published by the Institute of Electrical and Electronics Engineers (IEEE) to guide the ethical development and deployment of autonomous and intelligent systems (AIS). First released in 2016 and updated in subsequent editions, EAD provides recommendations on embedding ethical considerations such as transparency, accountability, privacy, and human rights into the design process of AIS. It serves as the foundation for IEEE’s global standards projects, including IEEE 7000-series standards [16], which address ethical issues in AI and emerging technologies. The British standard BS 8611:2016 [42], published by the British Standards Institution (BSI), provides a framework for the ethical design and application of robots and robotic systems. Titled Robots and Robotic Devices—Guide to the Ethical Design and Application of Robots and Robotic Systems, it outlines potential ethical hazards, such as privacy, safety, and societal impacts, and offers strategies to mitigate them. The standard is particularly focused on ensuring that robotics development aligns with societal values, human rights, and public safety, serving as a key reference for developers, manufacturers, and policymakers. Machine ethics plays a crucial role in ensuring that AI/AS align with societal values, distinguishing between implicit ethical agents that avoid harm through predefined rules and explicit agents capable of reasoning about ethics. Despite the potential of AI/AS, challenges such as the opacity of “black-box” decision-making, ethical accountability, and the risks of misuse remain [40]. Therefore, further governance frameworks and interdisciplinary collaboration are essential.
By defining and understanding these components and their sector-specific relevance, AI stakeholders can build a broad foundation for analyzing the metrics and frameworks critical to assessing AI trustworthiness. The next section will explore the key challenges in achieving trustworthy AI, including technical limitations and trade-offs that must be managed throughout the development lifecycle of AI systems.

3. Challenges in Achieving Trustworthy AI

Despite the widespread adoption of AI across industries and its business impact, ensuring trustworthy AI remains a major challenge. The complexity of AI systems, particularly the “black-box” nature of ML models, often obscures their decision-making processes, undermining transparency and accountability. Furthermore, biases embedded in training data, privacy violations, and trade-offs between competing trustworthiness metrics further increase the difficulty of creating reliable and ethical AI systems. These challenges are further complicated by emerging risks such as adversarial attacks, misinformation, and data poisoning [5], which exploit vulnerabilities in AI technologies. Addressing these issues requires interdisciplinary collaboration to establish robust technical safeguards, ethical governance frameworks, and ongoing research to balance AI’s reliability, security, and societal impact. Public awareness campaigns are also essential to ensure that AI advancements align with societal expectations [44]. This section explores the key challenges to achieving trustworthy AI, emphasizing the associated risks and ongoing efforts to address them. The challenges are summarized in Figure 2.

3.1. Black-Box Nature of AI Models

The black-box nature of AI and ML models refers to the inherent opacity in their decision-making processes, particularly in deep learning (DL) models. These models often lack transparency and interpretability, making it challenging for users to understand how decisions are made [45]. Unlike traditional ML algorithms (e.g., linear regression, logistic regression, decision trees, rule-based models, Naive Bayes, Generalized Additive Models (GAMs), etc.), where model design and outcomes are explicitly defined and easily understood, modern DL models rely on highly complex and non-linear transformations. This lack of transparency poses significant challenges in fields where accountability, trust, and explainability are essential.
AI models often learn patterns from data without explicitly encoding human-understandable logic. For example, neural networks process inputs through multiple layers of interconnected nodes, with each layer extracting features and passing them to the next; while the neural network architecture enables models to uncover complex patterns and achieve high performance, it provides little insight into how specific predictions or decisions are made. As a result, stakeholders may find it challenging to trust or justify the outcomes of these systems.
The challenges posed by black-box models extend beyond technical opacity. They include ethical and regulatory concerns, such as ensuring compliance with legislation like GDPR, which mandates transparency in automated decision-making [46]. Additionally, reliance on black-box models raises risks of preserving biases and errors embedded in training data. This can potentially lead to unfair or harmful outcomes.
Efforts to address the black-box problem have catalyzed research efforts into explainable AI (XAI), an emerging field focused on enhancing the interpretability of AI systems. Various XAI approaches have been explored, and can be broadly categorized into post hoc and intrinsic methods [47]. Post hoc methods aim to explain model predictions after training by utilizing techniques such as feature importance scoring, surrogate models, and visualization tools. Intrinsic methods involve designing models that are inherently interpretable by their structure, such as decision trees, rule-based systems, or interpretable neural networks. These methods prioritize simplicity and transparency during the model-building phase, ensuring interpretability without relying on external explanations [47]. Notable examples include LIME and SHAP, which are characterized by their ability to provide explainability insights into individual predictions [48]. However, these methods are not universally effective and often struggle to deliver meaningful insights for highly complex models or specific use cases [47]. This limitation poses challenges, particularly in sectors demanding stricter accountability. Recent advancements in XAI, such as counterfactual explanations, aim to address these known challenges of LIME and SHAP, demonstrating progress in tackling interpretability issues in complex models [49].
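To illustrate the post hoc, model-agnostic idea behind tools such as LIME, the sketch below fits a weighted linear surrogate around a single prediction of a black-box classifier; it is a simplified illustration of the local-surrogate principle under assumed sampling and weighting choices, not the actual LIME algorithm or library.

import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import Ridge

X, y = load_breast_cancer(return_X_y=True)
black_box = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)
feature_std = X.std(axis=0)

def local_surrogate_explanation(instance, n_samples=2000):
    """Fit a weighted linear model around one instance so that its
    coefficients act as local feature attributions for the black box."""
    rng = np.random.default_rng(0)
    # Sample points around the instance (noise scaled per feature)
    noise = rng.normal(scale=0.5 * feature_std, size=(n_samples, instance.size))
    samples = instance + noise
    preds = black_box.predict_proba(samples)[:, 1]
    # Weight samples by their (standardized) proximity to the instance
    dists = np.linalg.norm((samples - instance) / feature_std, axis=1)
    kernel_width = np.sqrt(instance.size)
    weights = np.exp(-(dists ** 2) / kernel_width ** 2)
    surrogate = Ridge(alpha=1.0).fit(samples, preds, sample_weight=weights)
    return surrogate.coef_

attributions = local_surrogate_explanation(X[0])
top = np.argsort(np.abs(attributions))[::-1][:5]
print("Most influential features (locally):", top, attributions[top])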

3.2. Bias and Fairness

Bias is a common challenge in AI systems, and it affects their fairness, reliability, and ethical alignment. It can emerge at multiple stages of the AI lifecycle, including problem definition, data collection, model training, and deployment [50]. Bias can be generally classified into two types: data-related bias and algorithmic bias. Data-related bias, such as incomplete or unrepresentative datasets, can introduce systemic inequalities into AI models. For example, in healthcare, skewed datasets can lead to models that provide suboptimal recommendations for certain populations, inadvertently reinforcing existing disparities rather than addressing them [51]. In finance, bias can occur in payment processes and revenue sharing partnerships. Potential social harm due to bias can also occur in biased AI recruitment tools and disparities in mortgage interest rates [52].
Algorithmic bias is another type of bias that can arise from various factors, including flawed solution design choices, incorrect ML model assumptions, or inaccurate system premises [51]. These negative factors can lead to undesirable consequences for individuals, users, and society, particularly in high-stakes applications. One notable case of algorithmic bias occurred when users began interacting with Microsoft’s Tay chatbot [52]. Designed to learn from conversations on Twitter, Tay quickly began repeating and amplifying racist and offensive comments made by users. Instead of addressing user queries meaningfully, the chatbot mirrored harmful language, revealing how easily machine learning systems can be manipulated when safeguards and content moderation are lacking. In another case, during the 2020 global pandemic, pulse oximeters were found to be less accurate in measuring blood oxygen levels in individuals with darker skin compared to those with lighter skin [53]. This discrepancy arose because the devices were primarily calibrated using data from lighter-skinned individuals, leading to potential underdiagnosis or delayed treatment for patients with darker skin. The incident highlighted how biases in medical device design and testing can result in unequal healthcare outcomes.
Various real-world examples highlight the profound implications of bias. In healthcare, for instance, ML model bias has led to unequal access to care for marginalized groups [9]. In criminal justice, ML-based systems like COMPAS have faced public criticism for disproportionately labeling African-American defendants as high risk for recidivism [9]. Similarly, AI in finance has been shown to reflect socioeconomic and demographic biases, affecting ethical decisions in applications such as loan approvals and insurance pricing [26]. These biases often affect high-risk customers or customers that are prone to discrimination, leading to operational inefficiencies and even financial losses [26].
Mitigating bias requires a proactive, interdisciplinary approach, starting with the curation of diverse and representative data, followed by the development of fairness-aware algorithms, rigorous validation processes, and transparent governance frameworks, such as those outlined in the Algorithmic Accountability Act [51]. Without these measures, AI stakeholders risk undermining the transformative potential of AI to create equitable solutions.
Ensuring fairness in AI systems requires addressing both technical and societal complexities. It involves minimizing the negative impact of bias and discrimination while balancing ethical considerations with ML model performance. However, AI systems are often trained on imperfect data that inherently contains biases; while fairness metrics provide frameworks for assessing and mitigating bias, they often oversimplify complex societal dynamics and fail to account for intersectionality—that is, the complex interactions between various factors that can lead to discrimination and disparities [9]. Concerns about amplifying existing healthcare inequities remain significant, as AI models have been shown to encode sensitive patient characteristics, such as ethnicity and sex, leading to performance disparities among different patient subgroups [54]. The COMPAS system, originally shown to be biased against African Americans in predicting recidivism risk, was later revised using a “race-neutral” algorithm in an attempt to reduce bias without compromising accuracy. In recruitment, gender decoding tools have been employed to remove biased language from job postings, helping promote equal opportunities for women in traditionally male-dominated roles [9]. In finance, algorithmic credit scoring models have at times been found to disadvantage minority applicants due to biased historical data [55]. Similarly, predictive policing tools have raised concerns for disproportionately targeting communities of color, reinforcing systemic inequalities in criminal justice.
Mitigation strategies, such as pre-processing, in-processing, and post-processing techniques, have been developed to incorporate fairness into ML algorithms. Pre-processing involves modifying datasets before training to reduce bias, such as by balancing representation across demographic groups or removing sensitive attributes; in-processing incorporates fairness constraints directly into the model's training objective; post-processing, on the other hand, refers to adjusting the algorithm's outcomes after a decision has been made, aiming to ensure fairness in the final predictions without altering the model itself [56]. Additionally, tools like IBM's AI Fairness 360 toolkit (IBM, Armonk, NY, USA) represent efforts to embed fairness metrics into AI systems [56]. The toolkit provides a comprehensive suite of metrics to assess biases in datasets and models and includes mitigation algorithms to reduce such biases. It supports various stages of the machine learning pipeline, helping developers identify and address fairness concerns across diverse use cases. Similar tools include Google's What-If Tool and Microsoft's Fairlearn. The What-If Tool is a visual interface for analyzing machine learning models without writing code, allowing users to explore how changes in input data affect predictions and to evaluate fairness metrics across different subgroups. Fairlearn, developed by Microsoft, provides a suite of fairness-focused algorithms and dashboards that help developers identify, assess, and mitigate disparities in model performance across demographic groups. However, these tools face limitations, as they primarily focus on correcting data or algorithmic processes without addressing the underlying sociotechnical structures.
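As a concrete illustration of the pre-processing category, the following sketch reweights training samples so that each combination of sensitive group and label carries comparable weight, a simplified variant of the reweighing idea implemented in fairness toolkits; the data here are synthetic and the code is illustrative rather than a reference implementation.

import numpy as np
from sklearn.linear_model import LogisticRegression

def reweighing_weights(y, sensitive):
    """w(g, c) = P(group=g) * P(label=c) / P(group=g, label=c)."""
    weights = np.ones(len(y))
    for g in np.unique(sensitive):
        for c in np.unique(y):
            mask = (sensitive == g) & (y == c)
            p_joint = mask.mean()
            if p_joint > 0:
                weights[mask] = (sensitive == g).mean() * (y == c).mean() / p_joint
    return weights

# Synthetic training data: X features, y labels, s binary sensitive attribute
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 5))
s = rng.integers(0, 2, size=500)
y = (X[:, 0] + 0.8 * s + rng.normal(scale=0.5, size=500) > 0.5).astype(int)  # labels correlated with group

w = reweighing_weights(y, s)
model = LogisticRegression().fit(X, y, sample_weight=w)  # fairness-aware training via reweighted samples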
Moreover, the inherent trade-offs between fairness and performance complicate these efforts. Removing biased features can reduce discrimination but may also degrade the prediction accuracy of ML models if those features are correlated with relevant information. These trade-offs emphasize the challenge of achieving fairness without compromising performance and utility. Addressing these challenges requires not only well-designed technical solutions but also a deeper understanding of the socio-ethical implications of AI usage in specific use cases in order to ensure equitable and just outcomes.

3.3. Privacy and Security

The rise of data-intensive applications powered by AI has led to significant advancements but has also brought privacy and security concerns. These challenges arise in domains such as healthcare, finance, and personalized marketing, where sensitive data is frequently used to train and deploy AI models.
AI systems often require vast amounts of data to function effectively, posing inherent risks to privacy. Common privacy issues in the context of AI-based systems include data collection and consent, data minimization, data storage and retention, anonymization and de-identification, surveillance and tracking, third-party data sharing, and data breaches. Data collection in AI systems can be performed on a large scale, often without the consent of individuals, users, or customers. The lack of informed consent or unclear data usage policies can violate privacy rights [18]. Additionally, AI systems may collect more data than necessary, violating the principle of data minimization [57]. Furthermore, the extensive data collection typically required by AI systems increases the risk of unauthorized access or misuse when large volumes of personal data are stored for extended periods of time. Therefore, implementing proper data retention policies is essential to mitigate these risks [57].
When it comes to data anonymization, anonymization techniques are commonly used to protect privacy, but they are not always sufficiently robust and reliable [57]. In some cases, data can be re-identified (a process known as data re-identification), leading to privacy violations [57]. AI-driven services, which rely on vast amounts of data, can enable pervasive surveillance or tracking of user activities, possibly resulting in privacy breaches without individuals’ knowledge or consent [18]. Additionally, AI systems often involve third-party data sharing, which can unintentionally expose personal information and create privacy risks [18]. Similarly to traditional IT systems, AI-based systems are data-driven and vulnerable to hacking and data breaches, potentially exposing sensitive personal data to unauthorized parties [17]. Between 2005 and 2019, over 249 million healthcare records were compromised in the US [17].
Various types of attacks can occur when privacy and security are compromised. One such attack is re-identification, where anonymized data is cross-referenced with external datasets to identify individuals [18]. Techniques such as probabilistic frameworks and adversarial ML have demonstrated high success rates in breaching privacy, even with datasets considered secure. For instance, in healthcare, data from wearable devices has been used to achieve re-identification accuracy rates exceeding 60% [17]. Another significant privacy risk arises from membership inference attacks, which determine whether specific data points are part of the training set, exposing sensitive information and violating confidentiality [17]. This is particularly concerning in medical datasets containing patients’ records.
AI models are also susceptible to model inversion attacks, in which adversaries can reconstruct private data attributes by exploiting the model’s outputs. These attacks showcase the vulnerability of even de-identified datasets, highlighting the need for strong privacy-preserving mechanisms [17]. In addition, AI systems can face security challenges coming from adversarial attacks such as poisoning and test-time (evasion) attacks [58]. In poisoning attacks, an adversary injects malicious data into the training dataset, which disrupts the model’s learning process and leads to incorrect, biased, or harmful outcomes [17]. In test-time attacks, adversaries manipulate the input data provided to the AI system during operation, often by adding noise or subtle perturbations, to cause the model to produce inaccurate predictions [17]. Both types of attacks exploit vulnerabilities in the AI pipeline without requiring direct modification of the underlying ML model’s parameters.
Another challenge in achieving trustworthy AI is balancing privacy preservation with data utility. Techniques such as differential privacy, federated learning, and homomorphic encryption are designed to protect sensitive information. However, these methods often involve trade-offs, such as reduced accuracy or computational efficiency.
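The privacy–utility trade-off noted above can be illustrated with the Laplace mechanism, the textbook building block of differential privacy: noise with scale sensitivity/ε is added to a query answer, so a smaller ε yields stronger privacy but a noisier, less useful result. The records and query below are hypothetical.

import numpy as np

def laplace_count(data, predicate, epsilon, sensitivity=1.0):
    """Release a differentially private count of records matching `predicate`."""
    true_count = sum(predicate(record) for record in data)
    noise = np.random.laplace(loc=0.0, scale=sensitivity / epsilon)
    return true_count + noise

# Hypothetical patient records: (age, has_condition)
records = [(34, True), (58, False), (47, True), (29, True), (71, False)]

for eps in (0.1, 1.0, 10.0):
    noisy = laplace_count(records, lambda r: r[1], epsilon=eps)
    print(f"epsilon={eps}: private count = {noisy:.2f} (true count = 3)")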
Ensuring privacy and security in AI systems goes beyond technical solutions; it requires a strong ethical framework and strict adherence to regulatory standards. Compliance with privacy laws, such as the GDPR, demands transparency and comprehensive data governance practices. Ongoing innovation and interdisciplinary collaboration are crucial to tackling the ever-evolving challenges in this domain.

3.4. Balancing Competing Trust Components in AI Systems

Achieving trust in AI systems requires careful consideration of the trade-offs between key components such as privacy, transparency, fairness, and efficiency. These attributes often present competing demands, making it challenging to balance them simultaneously. For example, privacy and transparency can be inherently conflicting [59]. Ensuring privacy requires robust measures like encryption and anonymization to protect sensitive data. However, these measures can limit the amount of information disclosed to relevant stakeholders, which is an essential prerequisite for transparency. On the other hand, improving transparency is often based on introducing insights into decision-making processes, which may expose sensitive information and lead to privacy violations.
Another key trade-off exists between fairness and efficiency. Achieving fairness means designing algorithms that minimize biases, a process that often introduces additional complexity, resource requirements, and effort for proper ML model training and validation; while these measures enhance fairness, they can reduce the overall efficiency of the system, particularly in real-time applications. Similarly, balancing robustness and efficiency presents challenges, as methods to improve robustness typically require greater computational resources [60].
Navigating these trade-offs requires holistic solutions based on a strategic, context-specific approach. Hybrid models that integrate privacy-preserving techniques with explainability, such as federated learning combined with interpretable surrogate models, can help mitigate the privacy–transparency conflict [14]. Fairness-efficiency trade-offs can be addressed by using lightweight algorithms or fairness constraints. Lightweight algorithms are computational methods designed to be resource-efficient, requiring minimal memory and processing power, making them suitable for environments with constrained computational resources. Fairness constraints, on the other hand, are mathematical or algorithmic mechanisms integrated into AI models to ensure equitable outcomes across different demographic groups, mitigating biases while maintaining system accuracy [14]. By thoughtfully addressing these trade-offs, AI systems can achieve a more sustainable balance between AI trust components, fostering user confidence while maintaining high performance and ethical integrity.

3.5. Accountability and Ethical Implications

Achieving accountability in AI systems is a multifaceted challenge due to the intricate nature of modern ML models. Accountability is defined as the ability of stakeholders to justify the decisions and actions of AI systems and ensure they align with ethical principles and societal norms [19].
The architecture of AI systems often involves multiple layers of abstraction, complex operations, and integration of diverse datasets, all of which can impact the proper functioning of the systems, including their accountability and ethical implications. Another aspect that can influence the operation of AI systems is the shift in the distribution of input data, which directly impacts ML model performance [61]. Such changes, along with the complexity of underlying operations, can negatively affect the AI systems. Therefore, more rigorous testing, continuous monitoring, and thorough documentation are needed. Maintaining accountability is further complicated by the diverse roles and responsibilities of stakeholders throughout the AI lifecycle, which can result in fragmented accountability and ambiguous stakeholder roles, potentially overlooking critical aspects of governance.
AI systems can produce unintended outcomes that raise ethical concerns, with bias and discrimination being among the most prominent issues. Moreover, the potential misuse of AI technologies for malicious purposes presents ethical challenges that extend beyond individual systems; while individual systems may be designed with fairness, transparency, and safety in mind, broader concerns arise when AI is deployed at scale or integrated across platforms. These systemic risks highlight the need for oversight not just at the model level, but also at the societal and infrastructural levels where AI technologies interact and amplify one another. Ensuring accountability in these contexts requires governance frameworks that anticipate and mitigate risks before they materialize. Ethical AI governance must also address broader issues such as informational self-determination—the principle that individuals should have the autonomy to decide how their personal data is collected, used, and shared [62]. This concept emphasizes the importance of consent, transparency, and individual control in the digital age, where vast amounts of personal information are routinely processed by AI systems. Ensuring informational self-determination helps protect privacy, prevent misuse of data, and uphold human dignity in increasingly data-driven societies.
Additionally, the principles of autonomy and informed consent are essential, particularly in contexts where individuals are subject to AI-driven decisions without a clear understanding of how those decisions are made [61]. When algorithms influence outcomes in areas such as healthcare, hiring, or credit scoring, individuals must be able to meaningfully consent to the use of their data and have the right to understand and challenge automated decisions. Upholding these principles ensures that people retain agency and dignity in interactions with AI systems.

3.6. Interactions and Trade-Offs Between Trustworthiness Metrics

While trustworthiness in AI is often evaluated through multiple dimensions—such as accuracy, robustness, explainability, fairness, and privacy—these metrics do not always evolve in harmony. In real-world applications, improvements in one dimension may introduce trade-offs or even degrade performance in others. Recognizing and analyzing these tensions is crucial for developing a realistic understanding of what trustworthy AI entails in practice.
For example, enhancing accuracy or reliability through model complexity (e.g., deeper neural networks or more exhaustive training datasets) may result in reduced explainability, as such models become more difficult to interpret. Similarly, improving robustness by training models to withstand adversarial inputs or out-of-distribution data may require injecting noise or altering model structure in ways that marginally reduce baseline predictive performance. Privacy-preserving techniques such as differential privacy often degrade model accuracy due to added noise, while efforts to ensure fairness may require rebalancing datasets or modifying outcomes in ways that reduce optimal performance for the majority class.
These metric tensions reflect deeper ethical and operational dilemmas. For instance, a highly accurate medical diagnosis model might sacrifice patient privacy due to the need for granular health data, or a fair hiring algorithm might obscure its internal logic in order to satisfy fairness constraints. Moreover, striving for perfect transparency could conflict with proprietary concerns on intellectual property protections.
Therefore, trade-offs between trustworthiness metrics should not be viewed as design flaws, but as an integral part of responsible AI development. The goal is not to optimize each metric in isolation, but to achieve a context-dependent balance that aligns with the system’s intended use, risk profile, and societal impacts. Future research and policy frameworks should explicitly account for these trade-offs, offering guidance on how to prioritize competing objectives based on domain-specific requirements and stakeholder values.

4. Metrics for Evaluating Trustworthiness

Evaluating the trustworthiness of AI systems helps ensure their reliability and fosters user confidence. These systems must exhibit high levels of accuracy, transparency, fairness, and privacy to gain widespread acceptance. Metrics play an important role in operationalizing these aspects, offering a structured framework to assess and validate AI systems across diverse applications. This section focuses on the key metrics used to evaluate AI trustworthiness. The metrics are summarized in Figure 3.

4.1. Accuracy and Reliability

Accuracy is a fundamental metric for evaluating the predictive performance of ML models in AI systems. Accuracy measures the proportion of correct predictions made by an ML model relative to the total number of predictions. In classification tasks, accuracy is calculated as the ratio of true positives (TP) and true negatives (TN) to the total number of samples in the dataset. For regression tasks involving continuous response variables, metrics such as Root Mean Square Error (RMSE) are commonly used. RMSE quantifies the magnitude of prediction errors, with lower values indicating greater predictive accuracy.
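For reference, both metrics can be written in their standard forms, where TP, TN, FP, and FN denote true/false positives and negatives, $y_i$ an observed value, $\hat{y}_i$ its prediction, and $n$ the number of samples:

\mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN},
\qquad
\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^{2}}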
While accuracy-based metrics are crucial for evaluating the correctness of AI predictions, they represent only one facet of an AI system’s trustworthiness. For example, high accuracy does not guarantee reliability, which also includes consistency, robustness, and transparency. An AI system that performs well on one dataset may fail when exposed to unseen scenarios. Therefore, predictive accuracy alone is insufficient to assess the trustworthiness of an ML model.
Precision, recall, and the F1 score provide deeper insights into the performance of ML models by focusing on the quality of positive predictions. Precision measures the proportion of true positives (TP) among all predicted positives, aiming to minimize false positives (FP). Recall evaluates the model’s ability to identify all relevant instances, thereby reducing false negatives (FN). The F1 score offers a balanced measure of predictive performance by combining precision and recall, effectively accounting for both FP and FN. These metrics are particularly important in high-stakes applications where accurate and reliable predictions are crucial [63].
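In standard notation, these three metrics are defined as follows:

\mathrm{Precision} = \frac{TP}{TP + FP},
\qquad
\mathrm{Recall} = \frac{TP}{TP + FN},
\qquad
F_1 = 2 \cdot \frac{\mathrm{Precision} \cdot \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}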
These metrics, however, fail to fully capture the multidimensional nature of trustworthiness of AI systems. The systems must also achieve robustness, interpretability and ethical compliance. For example, a high-accuracy ML model may lack explainability. In the domain of XAI, the challenge becomes even more pronounced, as standard accuracy metrics do not account for the quality of explanations or their alignment with human understanding [64].
The complexity of real-world applications further highlights the limitations of standard metrics. AI systems must operate reliably across varying conditions. Additionally, the lack of standardized benchmarks contributes to the fragmentation of the field. To address these gaps, comprehensive evaluation frameworks need to be developed. These frameworks aim to integrate traditional accuracy metrics with domain-specific measures of reliability, robustness, and transparency.

4.2. Robustness

Robustness is another critical metric for evaluating the trustworthiness of AI systems. It measures the ability of ML models to maintain consistent performance under diverse and often challenging conditions [65]. The purpose of robustness metrics is to assess a model’s resilience to anomalies and variations in input data, thereby playing a vital role in determining the reliability and safety of AI systems.
Adversarial robustness evaluates the ability of ML models to cope with deliberate, subtle, and non-random perturbations or modifications of input data specifically designed to deceive the model [63]. Metrics in this domain focus on measuring the magnitude of perturbations required to compromise the model’s performance. Commonly used perturbation norms include the following (a brief computational sketch follows the list):
  • L0 norm: Measures the number of features altered in an input to cause misclassification.
  • L2 norm: Represents the Euclidean distance between the original and perturbed inputs.
  • L∞ norm: Evaluates the maximum perturbation, highlighting the model’s sensitivity to extreme alterations.
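The following minimal sketch computes the three norms for a hypothetical perturbed input; the feature values are invented purely for illustration.

import numpy as np

x     = np.array([0.20, 0.55, 0.80, 0.10])   # clean input features
x_adv = np.array([0.20, 0.61, 0.80, 0.05])   # perturbed (adversarial) input
delta = x_adv - x

l0   = np.count_nonzero(delta)          # number of features changed
l2   = np.linalg.norm(delta, ord=2)     # Euclidean size of the perturbation
linf = np.max(np.abs(delta))            # largest single-feature change

print(l0, l2, linf)   # 2, ~0.078, 0.06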
Robustness provides a measurable level of stability. Specifically, techniques such as randomized smoothing compute bounds within which the model remains unaffected, offering quantifiable assurances of robustness [66]. Robustness also encompasses the model’s capacity to handle natural anomalies and unexpected inputs [64]. This aspect, referred to as natural robustness, includes metrics assessing [66]:
  • Out-of-distribution (OOD) detection: Evaluates the model’s ability to identify and respond appropriately to inputs that differ significantly from the training data distribution.
  • Noise tolerance: Measures the performance degradation resulting from the addition of random noise to the inputs.
  • Anomaly robustness: Assesses the model’s behavior when encountering rare data points.
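As an illustration of noise tolerance, the sketch below measures how a classifier’s accuracy degrades when Gaussian noise of increasing magnitude is added to the test inputs; the dataset and model are illustrative stand-ins rather than a specific benchmark.
```python
# A minimal noise-tolerance check: accuracy degradation under Gaussian input noise.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = RandomForestClassifier(random_state=0).fit(X_train, y_train)
baseline = model.score(X_test, y_test)

rng = np.random.default_rng(0)
for sigma in [0.0, 0.1, 0.5, 1.0]:
    noisy = X_test + rng.normal(scale=sigma, size=X_test.shape)
    acc = model.score(noisy, y_test)
    print(f"sigma={sigma:.1f}  accuracy={acc:.3f}  degradation={baseline - acc:.3f}")
```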
Robustness evaluation employs both empirical and formal methods [66]. Certified robustness evaluation methods extend these analyses using techniques such as deterministic smoothing and geometric transformations, and they provide formal guarantees about the model’s resilience to perturbations or adversarial attacks [67]. Deterministic smoothing approximates the model’s output by averaging predictions over small perturbations of the input, providing a measure of how stable the model is to minor variations and helping to identify potential vulnerabilities through the model’s sensitivity to small, controlled changes in input data. Geometric transformations, on the other hand, apply modifications such as rotations, translations, or scaling to the input data, simulating real-world variations and testing the model’s ability to maintain performance under these changes. These approaches ensure that robustness assessments are not only comprehensive but also applicable across various contexts.
Robustness is related to other metrics such as explainability and generalization, i.e., the model’s ability to perform on unseen scenarios, encompassing adversarial and non-adversarial contexts [66]. A robust model must maintain consistent interpretability and performance across varying scenarios. This interdependence highlights the importance of incorporating robustness metrics as a central component of a comprehensive framework for evaluating trustworthiness.

4.3. Explainability and Transparency

Explainability and transparency are key pillars of trustworthy AI, addressing the fundamental need for stakeholders to understand, verify, and trust the decision-making processes of ML models [68]. Metrics for evaluating explainability and transparency aim to assess how clearly the operations and outputs of AI systems can be understood by human users, ensuring that decisions are both interpretable and traceable.
Explainability refers to the ability of an ML model to provide insights into its decision-making process. This encompasses both global explanations, which offer a comprehensive understanding of the model’s behavior, and local explanations, which focus on individual predictions. Key metrics and methods used to evaluate explainability include the following:
  • Shapley values: Inspired by cooperative game theory, Shapley values quantify the contribution of each feature to a specific prediction. They ensure feature importance is fairly distributed across inputs by satisfying properties such as consistency and local accuracy [69,70]. The approach based on Shapley values is considered a state-of-the-art (SOTA) method for ML model explainability (a brief usage sketch follows this list).
  • Model-agnostic metrics: LIME (Local Interpretable Model-agnostic Explanations) generates simple, interpretable explanations for individual predictions by perturbing inputs and analyzing their effects on model outputs [69]. Feature importance scores are aggregated measures that provide a global view of feature contributions across predictions.
  • Visual explanations: For image-based AI systems, methods like Grad-CAM and Guided Backpropagation are employed. Grad-CAM is a class-discriminative localization technique that can generate visual explanations from any CNN-based network without requiring architectural changes or re-training. It computes the gradients of the model’s output (the predicted class score) with respect to the final convolutional layer of the neural network. These gradients are then combined with the feature maps themselves to create a heatmap. This heatmap highlights the regions that were most influential in the decision-making process. Guided Backpropagation refers to pixel-space gradient visualizations. It modifies the standard backpropagation algorithm by only allowing positive gradients to flow backward, effectively filtering out irrelevant or negatively contributing features. The result is a clear and detailed map of the pixels that had a positive influence on the prediction, providing an intuitive understanding of the model’s output [71]. These methods highlight the regions of an image that have a significant influence on the predictions [72].
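As a minimal illustration of Shapley-value-based explanation in practice, the sketch below applies the shap library’s TreeExplainer to a gradient boosting regressor trained on the bundled diabetes dataset; the dataset and model are illustrative choices rather than those of the cited studies.
```python
# Local and global explanations from exact Shapley values for a tree ensemble.
import numpy as np
import shap
from sklearn.datasets import load_diabetes
from sklearn.ensemble import GradientBoostingRegressor

data = load_diabetes()
X, y = data.data, data.target

model = GradientBoostingRegressor(random_state=0).fit(X, y)

explainer = shap.TreeExplainer(model)       # exact Shapley values for tree models
shap_values = explainer.shap_values(X)      # shape: (n_samples, n_features)

# Local explanation: per-feature contributions for a single prediction.
print("Contributions for sample 0:",
      dict(zip(data.feature_names, shap_values[0].round(2))))

# Global explanation: rank features by mean absolute Shapley value.
global_importance = np.abs(shap_values).mean(axis=0)
ranking = sorted(zip(data.feature_names, global_importance), key=lambda t: -t[1])
print("Global feature ranking:", ranking)
```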
Recent advancements in explainability emphasize designing systems that foster trust and usability by bridging the gap between human reasoning and AI decision-making processes. For instance, methods like generalized additive models (GAMs) and additive frameworks have demonstrated utility in medical and bioinformatics applications by ensuring models remain transparent without sacrificing accuracy [73].
Transparency measures the extent to which the architecture, data, and decision-making processes of AI systems are open and understandable to stakeholders; while explainability focuses on the interpretability of model outputs, transparency evaluates how accessible and comprehensible the system’s processes are. Transparency in AI is an integral part of regulatory frameworks, corporate policies, and ethical guidelines. The EU has been at the forefront of this effort, adopting an AI strategy in 2018 and creating the High-Level Expert Group on AI (AI HLEG) to provide recommendations on ethical issues and investment strategies [13]. The AI HLEG provided the conceptual and ethical groundwork that informed and shaped the legislative approach of the EU AI Act [74]. Transparency is also in the focus of global standards organizations, such as the IEEE’s P7001 initiative for autonomous systems [16] and the ISO’s examination of ethical and societal concerns in AI [13]. Legal frameworks like GDPR raise transparency-related debates, including the right to explanation for decisions involving automated processing. This topic is still debatable because it is characterized by ambiguities in legal interpretations and limitations of current explainable AI (XAI) techniques [13].
Despite significant advancements, ensuring explainability and transparency remains a challenge. Black-box ML models, particularly deep learning systems, often prioritize performance over interpretability. Additionally, trade-offs between simplicity and accuracy—where overly simplistic explanations fail to capture the complexities of advanced models—emphasize the need for more balanced approaches.

4.4. Fairness

Fairness ensures equitable treatment of individuals and demographic groups, addressing potential biases that can emerge from training data or ML models. The concept of fairness in AI is focused on mitigating these biases and ensuring that AI systems do not disproportionately disadvantage certain groups of people. In this subsection, we explore key fairness metrics and outline quantitative tools to assess and enhance fairness in AI models.
Statistical parity, or demographic parity, is a fairness metric that ensures equal proportions of positive outcomes across different demographic groups [9]. The goal of statistical parity is to eliminate disparities in AI-driven decision-making by giving individuals from all groups an equal chance of receiving favorable outcomes; while it is intuitive and straightforward to compute, it has certain limitations. For instance, it does not consider the underlying distribution of true outcomes among groups [75]. Additionally, achieving statistical parity may involve adjusting the model’s decision boundary, which could potentially reduce overall accuracy [75].
The Gini coefficient measures inequality in the distribution of predicted outcomes [75]. This coefficient is defined as a real number, with values ranging from 0 to 1. A Gini coefficient of 0 indicates perfect equality, while a value of 1 represents maximum inequality. It is calculated using the Lorenz curve, which plots the cumulative proportion of predicted outcomes against the cumulative proportion of the population. Although the Gini coefficient offers a useful summary of overall inequality, it does not identify specific groups or individuals who may be disproportionately affected [75].
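Under simple synthetic assumptions, the sketch below computes the two fairness measures above: the statistical parity difference compares positive-decision rates across a sensitive attribute, and the Gini coefficient summarizes inequality in a distribution of predicted scores.
```python
# Illustrative fairness metrics: statistical parity difference and Gini coefficient.
import numpy as np

rng = np.random.default_rng(0)
group = rng.integers(0, 2, size=1000)                        # sensitive attribute (0 or 1)
y_pred = rng.binomial(1, np.where(group == 0, 0.45, 0.35))   # synthetic binary decisions

# Statistical (demographic) parity: positive rates should match across groups.
rate_0 = y_pred[group == 0].mean()
rate_1 = y_pred[group == 1].mean()
print("Positive rate, group 0:", round(rate_0, 3))
print("Positive rate, group 1:", round(rate_1, 3))
print("Statistical parity difference:", round(abs(rate_0 - rate_1), 3))

def gini(values):
    """Gini coefficient of a non-negative score distribution (0 = equal, 1 = max inequality)."""
    v = np.sort(np.asarray(values, dtype=float))
    n = v.size
    index = np.arange(1, n + 1)
    return 2.0 * np.sum(index * v) / (n * np.sum(v)) - (n + 1) / n

scores = rng.uniform(300, 850, size=1000)                    # e.g., predicted credit scores
print("Gini coefficient of predicted scores:", round(gini(scores), 3))
```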
In practice, achieving fairness often involves trade-offs with other metrics, like accuracy. For example, enforcing statistical parity may require sacrificing some degree of performance. Similarly, improving the Gini coefficient might require interventions that redistribute the model outcomes, raising ethical and practical concerns. Moreover, statistical parity and the Gini coefficient are context-dependent and may not fully capture the complexities of societal inequalities [9]. To address these challenges, fairness evaluations should be integrated with domain knowledge, ethical considerations, and input from relevant stakeholders.

4.5. Privacy

Privacy is a fundamental aspect of the trustworthiness of AI systems as it ensures that sensitive data is protected and used responsibly. Effective privacy-preserving techniques, such as differential privacy and federated learning, allow models to function while safeguarding individual privacy. These approaches ensure that AI systems comply with ethical standards and regulatory frameworks without compromising their performance or utility. By integrating robust privacy measures into the design and deployment of AI models, organizations can mitigate risks of data breaches, unauthorized access, and misuse, ultimately fostering trust among users and stakeholders.
Differential privacy (DP) is a robust mathematical framework for quantifying the risk of individual data exposure in a dataset. It is designed to protect individual data privacy while enabling meaningful analysis of datasets. DP works by adding calibrated noise to the computations, ensuring that the inclusion or exclusion of a single data point has a minimal effect on the output [21]. This guarantees that individuals cannot be re-identified or have their data exposed based on the results of any analysis. In federated learning, DP is crucial for maintaining individual privacy across decentralized networks since it allows for the aggregation of model updates from multiple devices without sharing raw data [21]. This ensures that private information remains protected while still enabling the model to learn from a wide range of data sources.
DP is particularly valuable in generative AI (GenAI), where it prevents models from unintentionally memorizing sensitive information during training, such as personal or confidential data [57]. By ensuring that outputs do not reveal private details about the training data, DP mitigates the risk of data leakage in GenAI models. As a result, DP is considered the gold standard for privacy preservation in AI systems, offering a balance between privacy protection and model utility [57]. Its implementation has become essential in regulatory environments, such as those governed by GDPR, where protecting user data is a legal requirement.
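As a minimal sketch of the DP idea, the example below applies the Laplace mechanism to a count query: noise with scale equal to the query’s sensitivity divided by the privacy budget epsilon masks the influence of any single record. The data and epsilon values are illustrative.
```python
# The Laplace mechanism: releasing a count query with differential privacy.
import numpy as np

rng = np.random.default_rng(0)
ages = rng.integers(18, 90, size=10_000)        # hypothetical sensitive records

def dp_count(data, predicate, epsilon):
    """Release a count with (epsilon)-differential privacy via Laplace noise."""
    true_count = np.sum(predicate(data))
    sensitivity = 1.0                            # one person changes a count by at most 1
    noise = rng.laplace(loc=0.0, scale=sensitivity / epsilon)
    return true_count + noise

for eps in [0.1, 1.0, 10.0]:
    released = dp_count(ages, lambda a: a >= 65, epsilon=eps)
    print(f"epsilon={eps:<5} released count = {released:.1f}")

print("True count:", int(np.sum(ages >= 65)))   # smaller epsilon => noisier, more private
```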
Relevant privacy-preserving metrics include Privacy Risk Scores, Membership Inference and Attribute Inference Resistance, and Privacy Leakage Metrics, defined as follows:
  • Privacy Risk Scores: Evaluate the likelihood of re-identifying individuals from anonymized or aggregated data.
  • Membership Inference and Attribute Inference Resistance: These metrics assess the robustness of models against inference attacks.
  • Privacy Leakage Metrics: Evaluate the extent to which sensitive information can be extracted from a model’s outputs or parameters.
Federated learning (FL) is a privacy-preserving technique that enables decentralized model training by keeping data on local devices while sharing only model updates. Owing to its built-in privacy features [17], FL is particularly useful in healthcare, where it addresses challenges such as isolated datasets and restrictions on data sharing [76]. In the healthcare domain, patient data is often distributed across multiple hospitals or institutions, making centralized data collection difficult due to privacy regulations and ethical concerns. Federated learning enables these institutions to collaboratively train machine learning models without transferring sensitive patient data, thus preserving privacy while still benefiting from diverse and rich datasets. This approach not only enhances model performance through access to broader data but also ensures compliance with legal frameworks like HIPAA and GDPR.
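A highly simplified sketch of the federated averaging idea follows: each client trains a local logistic regression on its own partition of a synthetic dataset, and only the model parameters are averaged on the server. Real FL systems add secure aggregation, client weighting by sample count, and multiple communication rounds; the data and model here are stand-ins, not taken from the cited works.
```python
# Minimal federated averaging (FedAvg) sketch: parameters, not raw data, are shared.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

def local_update(X, y):
    """Train a local model on a client's private data and return only its parameters."""
    clf = LogisticRegression(max_iter=1000).fit(X, y)
    return clf.coef_.copy(), clf.intercept_.copy()

# One synthetic problem, partitioned into three "hospitals" plus a held-out test set.
X, y = make_classification(n_samples=2000, n_features=10, random_state=0)
X_clients, y_clients = np.array_split(X[:1500], 3), np.array_split(y[:1500], 3)
X_test, y_test = X[1500:], y[1500:]

coefs, intercepts = zip(*(local_update(Xc, yc) for Xc, yc in zip(X_clients, y_clients)))

# Server-side aggregation: average the parameters (equal client weights here).
global_model = LogisticRegression()
global_model.coef_ = np.mean(coefs, axis=0)
global_model.intercept_ = np.mean(intercepts, axis=0)
global_model.classes_ = np.array([0, 1])

print("Global model accuracy on held-out data:", round(global_model.score(X_test, y_test), 3))
```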
Furthermore, FL’s privacy-preserving capabilities can be enhanced by combining FL with DP and homomorphic encryption (HE). HE allows computations to be performed on encrypted data, protecting sensitive information during processing [76]. It has been widely adopted in industries like healthcare and finance. Other notable privacy-preserving techniques include Secure Multi-Party Computation (SMPC), Privacy-Preserving Synthetic Data Generation, Data Masking and Anonymization, and Techniques to Prevent Unintended Memorization [76]. SMPC enables joint computations over private data without revealing inputs, making it valuable in finance and generative AI for secure collaborative model training [76]. Masking and anonymization techniques, such as k-anonymity, l-diversity, and t-closeness, safeguard personal data in AI training datasets. Data masking is the process of hiding data by modifying its original letters and numbers, while data anonymization is the process of protecting private or sensitive information by erasing or encrypting identifiers that connect an individual to stored data [76]. Techniques for Preventing Unintended Data Memorization include differential privacy, “goldfish loss” (which excludes random data subsets during training), and regularization techniques (e.g., dropout, weight decay). These efforts enhance privacy and reduce data leakage risks in AI outputs [76].
While privacy-preserving techniques have advanced significantly, challenges remain in balancing privacy with scalability. Techniques like DP and HE often introduce computational complexity, and combining multiple privacy mechanisms can be quite complex. Moreover, emerging threats like membership inference and data extraction attacks require continuous innovation to ensure robust privacy guarantees [21].

5. Frameworks and Standards for Trustworthy AI

The development of trustworthy AI systems relies on robust frameworks that establish relevant methodologies, guidelines, and principles for designing, developing, and maintaining such systems. In this section, we examine key frameworks created to enhance trust in AI and explain their underlying methodologies, assumptions, and practical considerations. We also provide a comparative analysis of these frameworks, focusing on their advantages and limitations, followed by a discussion on their application in different countries. Our aim is to demonstrate how these frameworks address the core aspects of trustworthy AI discussed in previous sections, such as fairness, transparency, accountability, and safety, and offer actionable insights for stakeholders deploying AI across various industries.

5.1. Overview of Major Frameworks

In this subsection, we provide a detailed overview of major frameworks for trustworthy AI. These frameworks are specifically designed to assist the development and deployment of AI systems while ensuring adherence to ethical principles, technical standards, and regulatory requirements. By addressing these key dimensions, the frameworks help promote responsible use of AI and foster trust among researchers, engineers, users, and policymakers.

5.1.1. NIST AI Risk Management Framework

The National Institute of Standards and Technology (NIST) AI Risk Management Framework (AI RMF) is a robust and comprehensive framework that offers organizations best practices for identifying, assessing, and managing AI-related risks [77]. Developed through extensive collaboration with diverse stakeholders, the AI RMF aims to ensure that AI systems are safe, trustworthy, and socially responsible. Its significance has grown substantially, particularly following the October 2023 US Executive Order on AI governance, which called for a coordinated effort to drive the implementation of AI-related consensus standards, cooperation and coordination, and information sharing [78]. This further emphasized the framework’s important role in fostering the safe and secure development of AI technologies.
The primary goal of NIST’s AI RMF is to help organizations navigate the complex challenges of AI risk management across the entire AI system lifecycle. It provides a flexible and adaptable set of guidelines that can be tailored to diverse contexts, regardless of the application domain. It is also applicable to a broader set of AI technologies, making it an adequate choice for organizations seeking a comprehensive and scalable approach to AI governance.
This framework structures AI risk management activities into four interrelated core functions outlined as follows [23,79]:
  • Map: Emphasizes the identification and framing of AI-related risks. It encourages organizations to gather diverse perspectives. By framing risks holistically, organizations can better understand the potential consequences of their AI systems.
  • Measure: Involves evaluating systems through rigorous testing and monitoring. Organizations are encouraged to assess functionality, trustworthiness, and security both before deployment and throughout the operational lifecycle.
  • Manage: Focuses on implementing risk treatment strategies, which includes allocating appropriate resources to maximize AI benefits while minimizing negative impacts.
  • Govern: Provides the structural support needed for effective risk management. It encompasses establishing policies, procedures, and practices that foster a strong risk management culture.
One of the distinguishing features of NIST’s AI RMF is its emphasis on the sociotechnical approach to AI risk management [80]. It recognizes that AI harms often come from combining technical and social factors. To address these challenges, the framework advocates for meaningful stakeholder engagement. Its focus on risk measurement and policy-making makes the framework a solid foundation for developing maturity models, which provide a structured approach to assessing AI systems’ trustworthiness. Such models help organizations systematically improve their AI practices through stakeholder engagement and predefined evaluation criteria. The AI RMF can guide the continuous enhancement of AI maturity in areas such as fairness, transparency, and accountability [23,79].
Despite its flexibility, the NIST AI RMF faces certain limitations. One key issue is that the framework lacks specific technical implementation guidance, which makes it difficult for organizations without strong AI governance expertise to implement its recommendations [80]. Additionally, the framework’s voluntary nature may hinder widespread and consistent adoption across industries. Furthermore, the framework’s broad applicability may lead to ambiguities in risk prioritization, especially in sectors with highly specialized risk profiles.

5.1.2. AI Trust Framework and Maturity Model (AI-TMM)

The AI Trust Framework and Maturity Model (AI-TMM) is another framework for evaluating and improving AI trustworthiness. It is a maturity model-based framework which addresses security, reliability, and ethical challenges in developing AI systems [57]. By focusing on vulnerabilities in the design, model training and implementation stages of AI, AI-TMM ensures that systems deliver reliable outcomes while maintaining stakeholder trust.
A distinctive feature of AI-TMM is its Maturity Indicator Levels (MILs), which measure the implementation and management of controls [57]. The framework’s process methodology is structured into five key steps, offering organizations a systematic approach to improving trustworthiness [57]:
  • Determine frameworks and controls: The AI-TMM specifies controls across seven key trust pillars, including explainability, data privacy, robustness and safety, transparency, data use and design, societal well-being, and accountability. These controls guide organizations in evaluating and improving AI trustworthiness by focusing on specific risk areas based on their priorities and available resources [57].
  • Perform assessments: Maturity levels are assigned to each control using the Maturity Indicator Level (MIL) methodology, which evaluates the extent to which an organization has implemented and institutionalized a given control. This involves assessing factors such as process consistency, documentation, integration into workflows, and continuous improvement efforts.
  • Analyze gaps: Identify and evaluate deficiencies and their impact. Gap analysis is performed by assessing identified deficiencies against organizational objectives, available resources, and potential risks. This includes determining the severity of vulnerabilities, prioritizing areas for improvement, and estimating the impact of unaddressed gaps on AI trustworthiness and operational outcomes.
  • Plan and prioritize: Address identified gaps, using cost–benefit analysis for effective resource allocation. Organizations should rank remediation activities based on factors such as risk severity, strategic alignment, and resource availability. Planning also involves setting clear timelines, defining roles and responsibilities, and ensuring alignment with broader organizational objectives and risk management strategies.
  • Implement plans: Use evaluation metrics to ensure consistent risk management and progress monitoring. Implementation involves integrating improvement activities into existing workflows, ensuring stakeholder engagement, and establishing feedback mechanisms for continuous refinement. Organizations should also document changes, track performance against defined metrics, and adjust plans dynamically based on emerging risks or shifting priorities.
AI-TMM evaluates AI systems across seven trust pillars: explainability, data privacy, robustness, safety, transparency, societal well-being, and accountability [57]. An innovative aspect of AI-TMM is its use of entropy-based metrics, such as structural entropy and entropy production, to assess and optimize AI systems. These metrics are particularly valuable for neural networks; for example, a network with high structural entropy might have randomly connected layers and nodes, leading to unpredictable behavior. By integrating such metrics, the framework facilitates the development of AI systems that are both technically robust and ethically aligned.
The modularity of AI-TMM allows organizations to adapt the framework to their unique needs, making it suitable for industries with varying objectives and constraints. By embedding ethical principles and sustainability practices into its methodology, AI-TMM fosters collaboration between humans and AI systems while promoting trust and resilience.
While the AI-TMM provides a robust methodology for measuring and improving AI trustworthiness, it has its own limitations. A major challenge lies in the framework’s reliance on self-assessment, which may lead to biased evaluations if external audits are not conducted. Moreover, the framework’s focus on organizational processes can overlook system-level harms emerging from AI interactions in broader social contexts.

5.1.3. ISO/IEC Standards

The International Organization for Standardization (ISO) and the International Electrotechnical Commission (IEC) have been at the forefront of establishing frameworks for the development and governance of trustworthy AI systems. Their efforts address the growing demand for structured, ethical, and reliable AI solutions. Among their most significant contributions are ISO/IEC 24028 [81] and ISO/IEC 42001 [82], which provide important guidelines on trustworthiness and management of AI systems. The ISO/IEC TR 24028 standard provides an overview of trust issues in AI. It analyzes the factors that influence the formation of trustworthy AI and the decisions such systems generate, and it presents well-known approaches for increasing confidence in technical systems together with their potential uses in AI systems. The ISO/IEC 42001 standard establishes a framework for organizations to implement, maintain, and continually improve an AI management system, ensuring ethical, secure, and transparent AI practices. These standards are designed to shape the landscape of responsible AI deployment across various sectors.
ISO/IEC 24028 focuses on the critical aspects of trustworthiness in AI systems, offering a detailed examination of the factors that shape and influence trust in AI technologies [83]. The standard includes approaches that foster trust through key principles such as transparency, explainability, and controllability, while also addressing engineering pitfalls, threats, and risks. It serves as a foundational framework for identifying vulnerabilities in AI systems and proposing mitigation strategies to counter these issues effectively. Key components of ISO/IEC 24028 include the following [83]:
  • High-level concerns: These concerns encompass general elements that are essential to building trustworthiness, such as safety, decision-making sustainability and system control.
  • Mitigation measures: These measures are aimed at building trustworthiness, although they do not define the underlying concepts on which trustworthiness is based.
  • Stakeholder expectations: This concept involves capturing and understanding the diverse perceptions, needs, and priorities of stakeholders, which shape the trustworthiness requirements for AI systems.
The standard emphasizes the need for human-centric principles, which are key in organizing and understanding the rapidly expanding field of AI. ISO/IEC 24028 lays the groundwork for fostering trust in AI systems, while acknowledging the evolving nature of the field and the necessity for continuous refinement of its framework.
ISO/IEC 42001 introduces a comprehensive management system framework aimed at the ethical and responsible development, deployment, and governance of AI systems [84]. Unlike ISO/IEC 24028, which focuses on trustworthiness, ISO/IEC 42001 adopts a holistic organizational perspective, providing guidance on managing AI-related risks and opportunities across the entire lifecycle of AI technologies. This standard includes the following core principles:
  • Ethical and trustworthy AI: This principle ensures that AI systems operate in a manner aligned with ethical norms and societal values.
  • Risk assessment and management: Proactive identification and mitigation of risks associated with AI systems.
  • Data governance: Establishing robust mechanisms for managing data quality, sources, and preparation.
  • Continuous improvement: Encouraging iterative refinement of AI systems and their governance frameworks to adapt to evolving needs and challenges.
ISO/IEC 42001 is structured around ten clauses that provide a comprehensive framework for AI management [84]:
  • Clause 1: Defines the boundaries and applicability of the ISO 42001 standard.
  • Clause 2: Refers to documents that are referenced in the text of the ISO 42001 standard in such a way that some or all of their content constitutes requirements of the standard.
  • Clause 3: Establishes common terminology used in the framework to facilitate consistent implementation of the standard across organizations.
  • Clause 4: Establishes the organizational context, requiring organizations to consider internal and external factors, stakeholder needs, and the scope of their AI systems.
  • Clause 5: Focuses on leadership, emphasizing top management’s role in fostering a culture of responsible AI through clear policies and commitments.
  • Clause 6: Covers planning processes, including addressing risks, setting AI objectives, and managing changes within the organization.
  • Clause 7: Details the support requirements, including resources, competence development, communication, and documentation.
  • Clause 8: Outlines operational planning and control, addressing implementation, impact assessments, and change management.
  • Clause 9: Addresses performance evaluation through monitoring, internal audits, and reviews of AI systems.
  • Clause 10: Emphasizes continuous improvement, including addressing non-conformities, corrective actions, and maintaining accountability.
Despite the detailed and structured nature of the ISO/IEC standards, they exhibit certain limitations. One concern is that these standards are slow to evolve, which can make them less responsive to emerging AI risks and technological developments. Additionally, these standards often emphasize compliance over innovation, which might suppress novel approaches to AI development.
In addition to ISO/IEC 24028 and ISO/IEC 42001, other related standards such as ISO/IEC 38507 [85], ISO/IEC 23894 [86], and ISO/IEC 25059 [87], among others, provide specialized guidance on AI governance [84]. These standards address the ethical, technical, and governance challenges associated with AI systems. ISO/IEC 25059 describes the quality model of AI systems, while ISO/IEC 38507 describes the consequences of using AI in organizations, whereas ISO/IEC 23894 provides guidance on how organizations can manage AI risk [88]. They enable organizations to align their AI initiatives with best practices, ensuring compliance with regulatory requirements and fostering stakeholder trust. As AI technologies continue to evolve, the ISO/IEC standards provide a vital foundation for responsible AI innovation. They balance the need for technological progress with societal expectations, ensuring AI systems contribute to a sustainable and equitable future.

5.1.4. KAIRI: Key Artificial Intelligence Risk Indicators

The Key Artificial Intelligence Risk Indicators (KAIRI) framework offers a structured approach to AI risk management built upon four key principles: accuracy, sustainability, explainability, and fairness [89]. Each principle introduces practical metrics and methods for monitoring AI systems in development and deployment.
Accuracy measures how well an AI system’s predictions align with observed outcomes. KAIRI distinguishes between cases with continuous response variables, where metrics such as RMSE are employed, and categorical response variables, where metrics such as the Receiver Operating Characteristic (ROC) assess classification performance. These approaches are complemented by statistical tests such as the Diebold–Mariano and DeLong tests.
Sustainability focuses on the AI system robustness against data anomalies or adversarial manipulation, ensuring the system remains reliable under various conditions. For probabilistic models, KAIRI proposes backward variable selection based on likelihood-based statistical tests where the selection starts with all available variables, and then removes the unnecessary variables step by step [90]. For non-probabilistic models, it utilizes explainability metrics such as Shapley values to select variables, ensuring that the most relevant features contributing to model decisions are identified. This selection enhances interpretability, reduces model complexity, and mitigates the risk of overfitting. Statistical stopping rules ensure that model performance remains robust while reducing complexity.
Explainability is crucial for ensuring transparency and building trust among stakeholders. KAIRI leverages Shapley values for interpreting black-box models and employs statistical tests, such as t-tests, for assessing white-box models.
Fairness ensures that AI systems are free from biases that could disproportionately impact specific population groups. KAIRI adapts the Gini coefficient to evaluate the distribution of feature importance across different demographic groups. To assess statistical fairness, KAIRI uses non-parametric tests, such as the Kolmogorov–Smirnov test, to identify significant deviations from uniformity in feature effects.
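The sketch below illustrates the kind of non-parametric check described here, applying SciPy’s two-sample Kolmogorov–Smirnov test to synthetic per-group feature contributions (e.g., Shapley values); the data, significance threshold, and group labels are placeholders, not KAIRI’s prescribed configuration.
```python
# Comparing the distribution of a feature's contributions across two groups.
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(0)
contrib_group_a = rng.normal(loc=0.00, scale=0.10, size=400)   # e.g., SHAP values, group A
contrib_group_b = rng.normal(loc=0.04, scale=0.10, size=400)   # e.g., SHAP values, group B

result = ks_2samp(contrib_group_a, contrib_group_b)
print(f"KS statistic = {result.statistic:.3f}, p-value = {result.pvalue:.4f}")

if result.pvalue < 0.05:
    print("Significant distributional difference: flag the feature for a fairness review.")
else:
    print("No significant difference detected between the groups.")
```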
KAIRI’s structured framework addresses critical AI risk dimensions through rigorous metrics and statistical validation methods. It complements existing frameworks for trustworthy AI by providing actionable indicators that can be applied to both probabilistic and non-probabilistic models. This makes KAIRI a valuable tool for organizations seeking to align their AI systems with ethical standards, regulatory compliance, and robust performance metrics.
Although KAIRI offers a rigorous, metrics-driven approach to AI risk management, its complexity can be a significant barrier to practical implementation. The framework primarily focuses on quantitative indicators, potentially underrepresenting qualitative dimensions of trust, such as user perception or organizational culture. Its focus on feature-level fairness and accuracy may also overlook systemic biases embedded in data collection processes or institutional practices, thereby limiting its comprehensiveness in real-world deployments.

5.2. Comparative Analysis

This subsection presents a comparative analysis of the key AI governance frameworks discussed earlier. The comparison highlights their strengths and limitations as well as contributions to enhancing trustworthiness in AI systems.
The main advantage of NIST’s AI RMF framework lies in its comprehensive and adaptable nature. Its adaptability stems from its risk-based, voluntary, and customizable approach, allowing organizations to tailor its implementation based on their specific AI use cases and risk environments. It is defined using a sociotechnical approach, combining both technical and societal factors into a holistic risk management strategy. Additionally, its focus on stakeholder engagement makes it well-suited for collaborative environments. The AI TMM framework is distinguished by the introduction of Maturity Indicator Levels (MILs) for structured evaluation and improvement. It also emphasizes seven trust pillars, offering a multidimensional perspective on trustworthiness. Its use of entropy-based metrics (or network optimization techniques) enhances technical robustness and ethical alignment. The ISO/IEC standards provide comprehensive guidance for trustworthiness and management systems in AI, addressing a broad spectrum of governance aspects while emphasizing human-centric principles to ensure alignment with societal values and regulatory requirements. The KAIRI framework offers actionable metrics and methods for monitoring AI systems, including metrics such as RMSE, ROC, and Shapley values. It adapts established statistical methods to assess and enhance trustworthiness, making it particularly valuable for organizations requiring rigorous, quantitative measures of AI performance.
Despite their differences, all four frameworks share key similarities: they prioritize risk management, trustworthiness, and governance in AI systems. Each framework provides structured methodologies to evaluate and mitigate AI-related risks while ensuring transparency, fairness, and accountability. Moreover, they all emphasize a multidimensional approach, integrating technical, ethical, and societal considerations to promote responsible AI adoption.
The NIST AI RMF’s flexibility, while advantageous, can lead to inconsistent implementations across organizations as it may require significant time and resources for stakeholder engagement and interdisciplinary collaboration. In contrast, AI-TMM’s complexity—particularly in applying MILs and entropy-based metrics—can pose challenges for organizations lacking advanced technical expertise. ISO/IEC standards often demand extensive compliance efforts and resources, which may present barriers for implementation across smaller organizations. Moreover, their general principles may need adaptation to effectively address specific use cases. The disadvantage of KAIRI lies in its heavy reliance on statistical and quantitative metrics, which can make it less accessible to non-technical stakeholders. Additionally, its focus on specific metrics may overlook broader governance and operational aspects.
Each of the analyzed frameworks offers unique benefits and significant contributions to fostering AI trustworthiness even though they differ in scope, methodology, and implementation focus. Table 1 summarizes the frameworks discussed and highlights key features of additional frameworks for easier comparison; while other frameworks offer insights into specific aspects of AI security or human-centered design, they focus more narrowly on particular phases of the AI lifecycle. Including them could further enrich the analysis, but the four selected frameworks provide a balanced representation of both high-level governance (e.g., NIST AI RMF, ISO/IEC) and practical, metrics-driven approaches (e.g., AI-TMM, KAIRI), thereby making them well-suited for a foundational exploration of AI trustworthiness.

5.3. Application of Frameworks in Real Scenarios

This section explores the application of major frameworks for trustworthy AI across countries and use cases. It provides specifics of how these frameworks are implemented differently depending on national legislations and technical contexts. Additionally, the section outlines key use cases for AI systems, highlighting the diverse ways these frameworks support the development of responsible and reliable AI in practice.

5.3.1. Usage of AI Regulation Frameworks in the US, Canada, and China

We provide a comparative analysis of AI regulation frameworks in the US, Canada, China, and the EU. The analysis highlights how each approach reflects national governance priorities and societal values, while also identifying areas where these frameworks converge or diverge.
The EU AI Act is one of the most comprehensive legal frameworks for AI regulation, designed to balance innovation with safety, fairness, and fundamental rights [22]. It adopts a risk-based approach, classifying AI systems into unacceptable, high-risk, limited-risk, and minimal-risk categories, ensuring proportionate oversight. The Act aligns with existing EU laws such as the GDPR, Digital Services Act (DSA), and Unfair Commercial Practices Directive (UCPD), reinforcing AI governance within the broader digital ecosystem. A key aspect of the regulation is its extraterritorial applicability, requiring compliance from any AI system deployed in the EU, regardless of origin. With the European AI Office overseeing enforcement and governance, the Act aims to establish the EU as a leader in trustworthy and ethical AI development, setting a global standard for responsible AI regulation.
The AI regulatory framework in the US prioritizes innovation by minimizing regulatory barriers, fostering technological advancement and leadership in AI development [100]. The US National Security Commission on AI and the National Artificial Intelligence Initiative Act (NAIIA) emphasize a dual focus on economic growth and national security. This approach promotes rapid technological progress while ensuring strategic oversight in critical areas such as cybersecurity and defense [100]. However, the minimal regulatory approach may result in gaps in addressing societal impacts, such as bias, and could lead to uneven adoption of best practices across industries.
Canada’s introduction of the Artificial Intelligence and Data Act (AIDA) reflects the country’s commitment to addressing risks associated with AI systems [101]. The framework emphasizes risk assessment, data governance, and record-keeping. It integrates ethical considerations, promoting accountability and transparency in AI systems. AIDA’s effectiveness, however, is undermined by its exclusionary public consultation process, narrow scope, lack of specificity, and the absence of an independent regulatory body to provide enforcement and oversight [101].
China’s centralized approach to AI governance is based on its “New Generation Artificial Intelligence Development Plan”. This plan outlines the country’s ambition to become the global leader in the AI industry by 2030 [102]. The centralized model enables coordinated efforts in AI development and regulation. The strategic focus on standards and the establishment of regulations provides a foundation for future governance. However, the absence of specific laws and comprehensive enforcement mechanisms limits the effectiveness of China’s regulatory framework. Furthermore, the lack of detailed risk management strategies hinders the development of a robust framework for defining and addressing AI risks.

5.3.2. Surveillance Technology

Surveillance technologies, particularly facial recognition systems, have been under scrutiny due to their potential for misuse and privacy violations. Companies can promote transparency and trust by adhering to established AI frameworks. Clearview AI is an American facial recognition company that provides software primarily to law enforcement and other government agencies. Clearview AI’s adoption of NIST’s AI RMF highlights how a structured approach can address systemic risks in facial recognition systems [77]. Given that facial recognition is often criticized for biases, privacy violations, and potential misuse, the company implemented fairness metrics, performed continuous algorithmic audits, and adopted robust encryption standards. In addition, usage policies aligned with regulatory frameworks such as the EU AI Act have been adopted [77]. In this way, the organization’s systems have been better aligned with societal values and the associated risks have been reduced.

5.3.3. Military Aviation

AI frameworks, such as those derived from ISO/IEC 24028, have been used to certify autonomous systems in unmanned aerial vehicles (UAVs). These systems rely on AI algorithms for navigation, threat detection, and mission execution in dynamic environments [77]. By aligning with AI trustworthiness standards, developers can ensure these systems meet critical requirements for reliability, robustness, and transparency. For example, incremental learning capabilities have been incorporated into UAV control systems, enabling real-time adaptability to unforeseen scenarios, such as sudden weather changes or unexpected adversarial behavior [77]. This adaptability is achieved while maintaining compliance with safety protocols defined by the frameworks. Additionally, standards for hardware vulnerabilities, as outlined in ISO/IEC 24028, are used to assess the compatibility of AI algorithms with existing hardware. This alignment mitigates risks associated with hardware failures, thereby enhancing operational reliability. Furthermore, AI-powered decision-support tools for military pilots rely on frameworks to validate the accuracy and reliability of systems under diverse operational conditions. These tools process vast amounts of sensor data to provide real-time insights into battlefield scenarios, aiding in threat identification and mission planning. Frameworks ensure that the AI models used in these systems undergo rigorous testing in simulated environments. Moreover, considerations related to human factors, such as reducing overreliance on automation, are incorporated to maintain a balance between AI-assisted decisions and human judgment. These examples illustrate the role of certification frameworks in establishing AI trustworthiness within military aviation, ensuring that AI systems meet the rigorous safety and operational requirements demanded in this domain.

5.3.4. Credit Scoring

Shapley values have been used in credit scoring systems as a promising approach to address interpretability challenges associated with ML models [10]. Shapley values enhance transparency in domains where regulatory compliance and stakeholder trust are essential.
In this section, we analyze the real-world application of Shapley values on two datasets: the Taiwan Credit Card Dataset and the Home Credit Dataset, both of which have been explored in prior research [10]. These datasets were selected for their relevance in assessing credit risk and model interpretability. The first dataset serves as an example of how Shapley values can improve interpretability and decision-making processes in credit scoring. By replacing traditional logistic regression parameters, Shapley values provide interpretable and actionable insights for credit score practitioners. The credit scores derived from Shapley values closely align with logistic regression-based scores, effectively identifying predictor variable bins that contribute to lower credit scores. In the second dataset, Shapley values are applied to analyze and interpret predictions from gradient boosting models. Through advanced feature engineering, the predictor variables are expanded significantly, improving both model accuracy and interpretability.
The adoption of Shapley values has several benefits, including enhanced interpretability, improved accuracy and performance of ML models, as well as increased fairness and transparency [10]. As ML models replace traditional methods, the effectiveness of Shapley values shows promise for broader use in credit scoring and other areas, with interpretability and actionable outputs being key to success.

6. Applications of Trustworthiness Metrics: Case Studies

The practical implementation of trustworthiness metrics in AI systems spans various sectors. This section explores real-world applications where transparency, fairness, accountability, and robustness play a crucial role in ensuring ethical and effective AI deployment. By examining case studies across different sectors, we illustrate how these metrics are used to address critical compliance concerns, such as bias and interpretability, among others. These examples highlight the potential of trustworthy AI and the need for context-specific solutions to foster trust and reliability across diverse domains.

6.1. Financial Sector

AI is increasingly integrated into the financial sector, including banking and investment. This integration requires adherence to ethical standards and regulatory frameworks, while also improving risk assessment, model testing, and investment strategies. The financial landscape is evolving as a result of AI’s practical impact and transformative potential [30].
Fairness in finance, especially in credit scoring, can be evaluated through criteria like independence (ensuring equal classification rates across groups), separation (equal error rates), and sufficiency (equal probability of correct classification) [103]. In risk management, AI has significantly enhanced credit scoring by using machine learning models, such as logistic regression and ensemble learning, to assess creditworthiness [104]. Advanced models are also used to predict financial distress and prevent potential losses. AI techniques help forecast bankruptcy and mitigate associated risks. In wealth management, AI-driven solutions offer personalized investment advice and portfolio management, enhancing customer experience and satisfaction [104].

6.1.1. SAFE AI in Finance

The EU AI Act aims to regulate AI applications by categorizing risks and setting requirements such as accuracy, fairness, and explainability. However, it initially lacked standardized metrics for measuring these aspects, a gap that has been addressed more recently. To close this gap and offer robust measurement tools for AI safety, the study in [105] proposes the SAFE metrics (sustainability, accuracy, fairness, and explainability), unified under the Lorenz curve framework.
The study [105] analyzes cryptocurrency time series data, focusing on Bitcoin prices from the Coinbase exchange. Explanatory variables include oil, gold, and the USD/Yuan and USD/EUR exchange rates. The goal is to apply the Lorenz Zonoid tool to assess the SAFE metrics of AI methodologies. The Lorenz Zonoid tool is a concept from the field of mathematics and statistics, often used in the context of inequality measurement and distributional analysis. It derives from the Lorenz curve, which is a graphical representation of income or wealth distribution. The Lorenz curve shows the proportion of total income or wealth earned by the bottom x% of the population. The Lorenz Zonoid extends this idea to analyze inequalities in more complex spaces, typically in multidimensional contexts like risk or uncertainty analysis in various fields, including AI and economics. An exploratory analysis of Bitcoin prices, compared with other financial variables, reveals significant volatility in Bitcoin relative to other assets. A neural network with five hidden layers was used to build and evaluate the model [105]. The SAFE metrics framework enhances the transparency and trustworthiness of AI models by assessing four key dimensions: explainability, accuracy, sustainability, and fairness. In this study, explainability analysis identified gold as the most influential factor in predicting Bitcoin prices, helping stakeholders understand the model’s decision-making process. Higher fairness and sustainability scores indicate that the model avoids biased predictions and ensures long-term reliability. The analysis found an explainability score of 0.5714, with gold being the top contributor. The model’s accuracy was 0.328 and sustainability scored 0.8314, while fairness was 0.8617. For binary responses, accuracy improved to 0.4088, although explainability decreased.
The SAFE metrics offer a robust framework for evaluating the trustworthiness of AI systems, aligning with the European AI Act’s regulatory framework. Compared to traditional metrics such as RMSE and AUC-ROC (Area Under the Receiver Operating Characteristic Curve), which primarily assess predictive performance, SAFE metrics offer a holistic view of model trustworthiness by incorporating ethical and regulatory considerations. This aligns closely with the European AI Act, which emphasizes AI transparency, fairness, and risk mitigation. Unlike RMSE, which measures the average error magnitude, and AUC-ROC, which evaluates classification performance, SAFE metrics provide interpretable, normalized assessments of AI behavior, making them particularly useful in high-risk applications such as financial forecasting and healthcare.

6.1.2. Handling Sensitive Financial Data

In financial technology (FinTech), handling sensitive financial data securely and transparently is critical to ensuring regulatory compliance, mitigating risks, and fostering trust among stakeholders. AI-driven approaches play a vital role in achieving these objectives by enhancing explainability, fraud detection, and transaction efficiency [32].
One of the key challenges in financial risk management is the transparency of AI models used in credit risk assessment. The EU Horizon 2020 FIN-TECH project implemented explainable AI (XAI) techniques to address concerns about black-box AI models [32]. A crucial component of this initiative was the use of SHAP values to enhance model interpretability. By decomposing the contribution of individual variables to AI-driven decisions, SHAP clustering allows for improved decision transparency, customer segmentation, and regulatory compliance. AI-driven credit scoring models, when combined with post hoc explainability techniques, can provide both local and global explanations for risk assessments [32]. These insights are crucial for financial institutions, enabling them to validate model behavior, identify biases, and refine risk assessment strategies.
Handling sensitive financial data also involves securing transactions against fraud while optimizing operational efficiency. AI-driven database systems enhance fraud detection and transaction management by leveraging machine learning and deep learning, anomaly detection, natural language processing (NLP), and predictive analytics with automated processing [106].
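As one illustration of the anomaly detection component, the sketch below screens synthetic transactions with scikit-learn’s Isolation Forest; the features, contamination rate, and injected fraud pattern are assumptions for demonstration only.
```python
# Anomaly-based fraud screening on synthetic transaction features.
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(0)
# Features: [amount, hour of day, transactions in the last 24 h]
normal = np.column_stack([rng.normal(60, 20, 2000), rng.normal(14, 4, 2000), rng.poisson(3, 2000)])
fraud = np.column_stack([rng.normal(900, 300, 20), rng.normal(3, 1, 20), rng.poisson(15, 20)])
X = np.vstack([normal, fraud])

detector = IsolationForest(contamination=0.01, random_state=0).fit(X)
flags = detector.predict(X)                 # -1 = anomalous, 1 = normal

print("Transactions flagged for review:", int((flags == -1).sum()))
print("Flagged among the injected fraud cases:", int((flags[-20:] == -1).sum()), "of 20")
```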
By integrating AI-driven fraud detection mechanisms with XAI-based credit risk assessments, financial institutions can develop robust, trustworthy, and efficient frameworks for managing sensitive financial data. These advancements not only improve security and transparency but also align with ethical AI principles, fostering responsible financial innovation [106].

6.1.3. Trustworthy AI in Financial Risk Management

The study in [32] demonstrates the application of XAI techniques to enhance transparency, accountability, and trustworthiness in credit risk management. The primary objective is to bridge the explainability gap in AI-driven financial models by providing insights into how decisions are made, ensuring regulatory compliance and improving risk assessment strategies. SHAP is employed to identify and quantify the contributions of input variables to model predictions, enabling financial institutions to interpret AI-driven credit risk assessments. By analyzing SHAP values, risk managers can better understand which factors drive credit approval or denial, facilitating more transparent decision-making. Additionally, dimensionality reduction techniques such as Principal Component Analysis (PCA) and t-Distributed Stochastic Neighbor Embedding (t-SNE) can be used to explore SHAP value distributions, helping visualize patterns in model explanations. These visual insights assist in detecting biases, ensuring fairness, and uncovering vulnerabilities in risk models.
In financial risk management, segmenting customers based on SHAP-derived decision patterns rather than raw input features allows for better risk stratification. This methodology enables financial institutions to identify sub-portfolios where simpler, more interpretable models can be deployed without compromising decision accuracy. Furthermore, clusters exhibiting similar SHAP distributions help detect cases where credit risk models might be inconsistent or exhibit biases, such as groups with an equal proportion of default and non-default predictions. Such inconsistencies may indicate model weaknesses that require further refinement.
By documenting variable contribution through SHAP analysis, financial institutions can ensure traceability in decision-making processes, aligning with regulatory requirements for accountability and explainability. This supports compliance with ethical guidelines and emerging regulations on trustworthy AI. Moreover, identifying how different demographic groups are treated by AI-driven models enhances fairness assessments, helping mitigate risks related to biased decisions. Therefore, leveraging SHAP for financial risk management not only increases model interpretability, but also strengthens regulatory compliance, ethical governance, and trust in AI-driven decision-making. By integrating explainability techniques into credit risk assessment, financial institutions can improve transparency, reduce systemic risks, and foster greater confidence among stakeholders.
The same study [32] focuses on SHAP clustering as a core methodology, with the primary objective of bridging the explainability gap in AI systems by developing tools that allow stakeholders to understand, validate, and improve model decisions.
SHAP is used to identify and quantify the contributions of input variables to model predictions. Dimensionality reduction and clustering techniques, including PCA, t-SNE, and graph-based methods, are then applied to the SHAP values. Data points are grouped into clusters based on similar SHAP value distributions, representing homogeneous decision-making structures. Intersections between clusters identify regions of fuzzy decision-making, highlighting vulnerabilities.
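A minimal sketch of this SHAP clustering workflow is given below: SHAP values are computed for a synthetic credit-style classifier, projected with PCA, and grouped with k-means so that clusters with ambiguous predicted default rates can be flagged. The dataset, model, and cluster count are stand-ins rather than the pipeline used in [32].
```python
# SHAP clustering sketch: group borrowers by similar explanation patterns.
import numpy as np
import shap
from sklearn.cluster import KMeans
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.ensemble import GradientBoostingClassifier

X, y = make_classification(n_samples=1500, n_features=12, n_informative=6, random_state=0)
model = GradientBoostingClassifier(random_state=0).fit(X, y)

# SHAP values (log-odds contributions) for the binary gradient boosting model.
shap_values = shap.TreeExplainer(model).shap_values(X)   # shape: (n_samples, n_features)

# Reduce the explanation space and cluster applicants by decision pattern.
embedding = PCA(n_components=2, random_state=0).fit_transform(shap_values)
clusters = KMeans(n_clusters=4, n_init=10, random_state=0).fit_predict(embedding)

# Clusters with predicted default rates near 50% may signal fuzzy decision regions.
predicted = model.predict(X)
for c in range(4):
    mask = clusters == c
    print(f"Cluster {c}: size={mask.sum():4d}, predicted default rate={predicted[mask].mean():.2f}")
```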
SHAP clustering allows for the segmentation of customers based on decision-making behaviors rather than input features, which enables the identification of sub-portfolios where simpler, more interpretable models can be deployed for enhanced explainability. Clusters with equal proportions of default and non-default predictions can indicate potential bugs or inconsistencies in the model. The whole methodology bridges AI-driven decisions with human narratives by providing numeric contributions of input variables. This fosters a better understanding of individual and group-level decisions, enabling more informed oversight and governance.
SHAP clustering supports the documentation of variable contributions, enabling traceability of decision-making processes, which aligns with regulatory requirements for accountability and explainability. Fairness is considered by revealing how different groups are treated by the model. This supports compliance with ethical guidelines and emerging EU regulations on trustworthy AI. Finally, clear explanations to stakeholders enhance understanding and trust in the system.
In conclusion, this methodology addresses critical concerns about black-box models, enhances regulatory compliance, and supports the ethical deployment of AI in finance.
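To make the mechanics concrete, the sketch below shows one way such a SHAP-clustering analysis could look in practice. It is a minimal illustration on synthetic data, not the pipeline of [32]: the model choice, feature count, and number of clusters are hypothetical assumptions.

```python
# Illustrative sketch (not the pipeline of [32]): explain a credit-default classifier with SHAP,
# then cluster applicants by their SHAP value profiles rather than by raw features.
import shap
from sklearn.cluster import KMeans
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.ensemble import GradientBoostingClassifier

# Hypothetical credit data: the features stand in for income, utilization, history length, etc.
X, y = make_classification(n_samples=1000, n_features=8, n_informative=5, random_state=0)
model = GradientBoostingClassifier(random_state=0).fit(X, y)

# SHAP values quantify each feature's contribution to every individual prediction.
explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X)          # shape: (n_samples, n_features)

# Cluster applicants in SHAP space: groups share a decision-making "rationale",
# supporting sub-portfolio analysis and the detection of inconsistent regions.
embedding = PCA(n_components=2, random_state=0).fit_transform(shap_values)
clusters = KMeans(n_clusters=4, n_init=10, random_state=0).fit_predict(embedding)

# Clusters with near-equal default/non-default predictions flag ambiguous decision regions.
preds = model.predict(X)                         # assume label 1 = predicted default
for c in range(4):
    default_rate = preds[clusters == c].mean()
    print(f"cluster {c}: predicted default rate = {default_rate:.2f}")
```

In a real deployment, the SHAP values would come from the institution's production credit model, and the resulting clusters would be reviewed by risk managers rather than read off automatically.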

6.2. Healthcare

The integration of AI in healthcare has significantly improved disease diagnosis, treatment planning, and patient outcomes by enhancing accuracy and efficiency. AI technologies, such as machine learning and predictive analytics, are now used to detect diseases earlier, personalize treatments, and optimize hospital operations. However, despite these advancements, trust in AI remains a critical concern. Regulatory bodies like the US Food and Drug Administration (FDA) and the EU are working to ensure AI systems are ethically designed, robust, and legally compliant [107]. These efforts emphasize the importance of trustworthiness in AI, particularly in high-stakes environments like healthcare, where decisions directly impact patient well-being. Addressing the challenges of ensuring AI’s reliability and transparency requires the development of comprehensive evaluation frameworks that assess not only technical performance but also ethical and legal considerations.

6.2.1. Out-of-Hospital Cardiac Arrest (OHCA) Detection

In Copenhagen’s Emergency Medical Dispatch Center, 25% of out-of-hospital cardiac arrest (OHCA) cases are unrecognized until paramedics arrive. This delay undermines the timely provision of critical interventions like cardiopulmonary resuscitation (CPR), thereby lowering survival rates. To address this gap, the Center employed an AI system that assists dispatchers in identifying cardiac arrests during emergency calls [108]. The system uses ML to analyze audio data and flag indicators of cardiac arrest, such as unconsciousness and abnormal breathing patterns.
The AI system employs a Danish language model to transcribe emergency call audio. These transcriptions are analyzed by a classifier trained to detect OHCA-related patterns. The model was trained on archived call data, allowing it to learn nuanced conversational cues of cardiac arrest. The development and deployment of this system highlighted several technical and ethical challenges, such as false positives, language barriers, and human–AI interaction. Its performance was assessed through a retrospective study and a randomized clinical trial.
The system’s ethical and technical dimensions were evaluated using sociotechnical scenarios and frameworks such as the EU’s Guidelines for Trustworthy AI. The key conclusions included the following:
  • Transparency and explainability: The “black-box” nature of the AI model hindered dispatcher understanding, reducing trust.
  • Bias mitigation: The system underperformed for non-Danish speakers and patients with heavy dialects, prompting the need for more diversified training data to prevent discrimination.
  • Accountability and oversight: Stakeholder involvement was critical to addressing ethical tensions, including the balance between human oversight and AI autonomy.
Despite these challenges, the use case described in [108] demonstrates the potential of AI to enhance decision-making in time-critical scenarios; while limitations such as transparency, bias, and accountability must be addressed, the system has already shown promise in improving early detection rates. By refining training data, incorporating feedback from dispatchers, and ensuring ethical oversight, AI can serve as a valuable support tool in emergency response. Key lessons include the need for robust evaluation methods, interdisciplinary collaboration, and ongoing refinement to address ethical, technical, and legal challenges.

6.2.2. Prediction of Parkinson’s Disease Progression

In [109], Federated XAI (Fed-XAI) approaches are used to predict Parkinson’s Disease (PD) progression, emphasizing the trade-offs between privacy, accuracy, and interpretability. The case study leverages the Parkinson Telemonitoring dataset, encompassing 5875 biomedical voice recordings from 42 early-stage PD patients, to predict the Unified Parkinson’s Disease Rating Scale (UPDRS) score.
The study simulates a cross-silo federated learning environment (a setting in which multiple independent institutions collaborate without sharing raw data) across ten hospitals to mimic diverse real-world data distribution scenarios. Four data partitioning strategies are explored: Independent and Identically Distributed (IID), Non-IID Quantity (NIID-Q), Non-IID Feature (NIID-F), and Non-IID Feature and Quantity (NIID-FQ) [109]. A common external test set comprising 588 instances is used for evaluation, ensuring that the test data reflects the overall population distribution. Three learning paradigms are examined to assess their applicability to trustworthiness and explainability:
  • Federated learning (FL): Hospitals collaboratively train a global model without sharing raw data, preserving patient privacy.
  • Local learning (LL): Each hospital trains its model independently, leading to suboptimal accuracy due to limited data.
  • Centralized learning (CL): Data from all hospitals is centralized, achieving the highest accuracy at the cost of privacy.
To improve explainability, the study employs two models: a Multi-Layer Perceptron Neural Network (MLP-NN) and a Takagi–Sugeno–Kang Fuzzy Rule-Based System (TSK-FRBS). These models are evaluated using the root mean square error (RMSE) and the Pearson correlation coefficient (PCC). The key findings reveal that FL outperformed LL, especially in non-IID scenarios, by leveraging collaborative learning. However, FL's performance was slightly lower than that of CL due to data fragmentation, though privacy was preserved. In terms of model-specific insights, the MLP-NN demonstrated superior accuracy on feature-skewed data (NIID-F), as indicated by RMSE, while the TSK-FRBS showed stronger correlation metrics and maintained better interpretability.
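As a concrete illustration of the federated setting, the sketch below runs a FedAvg-style loop on synthetic data with a plain linear model. The silo sizes, learning rate, and model are hypothetical stand-ins for illustration only, not the MLP-NN/TSK-FRBS setup or the Parkinson Telemonitoring data used in [109].

```python
# Minimal FedAvg-style sketch (assumed setup, not the study's models or data).
import numpy as np

rng = np.random.default_rng(0)

def make_silo(n, shift):
    """Synthetic 'hospital' data: features -> UPDRS-like score, with silo-specific feature shift."""
    X = rng.normal(loc=shift, scale=1.0, size=(n, 5))
    true_w = np.array([0.8, -0.5, 0.3, 0.0, 1.2])
    y = X @ true_w + rng.normal(scale=0.1, size=n)
    return X, y

silos = [make_silo(n, shift) for n, shift in [(200, 0.0), (80, 0.5), (150, -0.3)]]

def local_update(w, X, y, lr=0.01, epochs=20):
    """A few epochs of full-batch gradient descent on the local squared error."""
    for _ in range(epochs):
        grad = 2 * X.T @ (X @ w - y) / len(y)
        w = w - lr * grad
    return w

# FedAvg: silos refine the global weights locally; the server averages the results
# weighted by local sample counts, so no raw data ever leaves a silo.
w_global = np.zeros(5)
for _ in range(30):
    local_ws, sizes = [], []
    for X, y in silos:
        local_ws.append(local_update(w_global.copy(), X, y))
        sizes.append(len(y))
    w_global = np.average(np.stack(local_ws), axis=0, weights=np.array(sizes))

# Evaluate on a common held-out set, mirroring the shared external test set in the study.
X_test, y_test = make_silo(100, 0.1)
rmse = np.sqrt(np.mean((X_test @ w_global - y_test) ** 2))
pcc = np.corrcoef(X_test @ w_global, y_test)[0, 1]
print(f"RMSE: {rmse:.3f}  PCC: {pcc:.3f}")
```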
This case study highlights the potential of Fed-XAI to tackle trustworthiness challenges in AI applications. By integrating federated learning with interpretable models, Fed-XAI ensures privacy preservation (as federated learning avoids raw data sharing, adhering to data protection regulations), explainability (TSK-FRBS’s intrinsic interpretability promotes transparency, building user trust), and scalability (the approach is adaptable to diverse data distributions). These features demonstrate that Fed-XAI can offer improved explainability solutions for sensitive healthcare tasks. The insights gained are crucial for designing future AI systems in fields that demand stringent ethical standards and regulatory compliance.

6.3. Public Administration

The integration of trustworthy AI into public administration holds promise for improving governance by fostering transparency, efficiency, and citizen engagement. This section explores use cases showing how AI technologies, when designed and implemented without adequate ethical safeguards, can undermine public-sector services and the citizens who rely on them. By examining examples from public employment services, debt recovery, criminal justice, and predictive policing, we emphasize that trustworthiness is crucial when developing AI-based systems intended for use by citizens.

6.3.1. Public Employment Services in Sweden

AI systems can be used to improve national policy related to public employment services (PES), with the goal of bringing jobseekers and employers together in an effective manner and to contribute to long-term employment. The study in [110] examines the AI-based decision-support system BAR, which has been deployed to support labor market policy assessments in Sweden. Introduced in 2021, BAR is a neural network trained on historical data from over 1.1 million jobseeker profiles collected over a decade. The system evaluates jobseekers based on personal characteristics, unemployment history, and socioeconomic factors associated with their postal areas. Its recommendations assist caseworkers in assessing jobseekers, guiding them toward suitable opportunities, and helping them overcome employment barriers.
The AI system achieves a 68% accuracy rate, which is an improvement over random assignment (57%) but falls short of a naive baseline where all jobseekers are assumed to need support (82%). Sub-population analysis reveals notable disparities: while jobseekers with disabilities experience higher accuracy, they also face an increased rate of false positives. Conversely, younger individuals have lower accuracy but encounter fewer false positives. These inconsistencies highlight fairness challenges, particularly concerning age and disability groups. Beyond accuracy, the study highlights gaps between theoretical ambitions and practical outcomes, particularly regarding AI trustworthiness in decision-making. Although caseworkers consider qualitative factors such as motivation and competence, there is limited evidence that these human inputs have led to improved overall outcomes. This suggests that human expertise is not fully leveraged in combination with AI insights, raising concerns about the system's effectiveness in real-world decision-making.
Given the neural network architecture of the BAR system, the Swedish PES adopted LIME, owing to its popularity, to achieve a degree of explainability; while LIME can provide insights into model predictions, it suffers from a lack of stability, as it can generate different explanations for the same prediction. Moreover, as the authors of [110] note, LIME's explanations are approximate and do not always faithfully reflect the model's actual decision logic. This can lead to contradictions where factors highlighted as important by LIME do not align with the predicted outcomes. Such ambiguity undermines the trustworthiness of AI recommendations, particularly for caseworkers and jobseekers who rely on those explanations to understand employment support decisions. A possible solution, as demonstrated in Denmark, is integrating self-reported jobseeker assessments into the decision-making process. This approach could help align AI recommendations with individual jobseekers' needs while maintaining operational consistency.
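The instability issue is easy to reproduce. In the illustrative sketch below (a generic classifier on synthetic data, assumed for demonstration and unrelated to the BAR system), two LIME runs on the same instance can yield differently weighted, and sometimes differently ranked, features, because each explanation is fitted on a fresh random neighborhood around the instance.

```python
# Illustrative sketch of LIME's instability on a generic model (not the BAR system).
from lime.lime_tabular import LimeTabularExplainer
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=2000, n_features=10, n_informative=6, random_state=0)
model = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)

explainer = LimeTabularExplainer(
    X, feature_names=[f"f{i}" for i in range(10)], mode="classification"
)

instance = X[0]
for run in range(2):
    # Each call draws a fresh random neighborhood around the instance,
    # so the fitted local surrogate (and hence the explanation) can differ.
    exp = explainer.explain_instance(instance, model.predict_proba, num_features=5)
    print(f"run {run}: {exp.as_list()}")
```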
To address the identified trust and fairness challenges, several measures could be considered [110]:
  • Enhance model interpretability: Transition to simpler, interpretable models that maintain predictive performance could improve stakeholder trust and understanding. Providing visual explanations or feature importance rankings can further help users grasp how decisions are made.
  • Refine stakeholder engagement: Actively involving caseworkers and jobseekers could improve adoption and transparency. Regular feedback loops and Participatory Design sessions can ensure that system updates align with users’ real-world needs and concerns.
  • Strengthen human–AI collaboration: Foster balanced decision-making by clearly defining roles where AI supports, rather than replaces, human judgment. Training programs for caseworkers can further enhance their ability to interpret and responsibly act on AI recommendations.
  • Increase transparency: Communicate decision thresholds, probability estimates, and key performance metrics to stakeholders. Making this information easily accessible and understandable can help clarify system behavior and support more informed engagement with the system.
  • Strengthen ethical and legal compliance: Ensure mechanisms for jobseekers to appeal decisions, aligning with Swedish administrative and anti-discrimination regulations. Establishing clear documentation and accessible communication channels for appeals will reinforce fairness and accountability.

6.3.2. AI in Criminal Justice

The integration of AI into criminal justice systems has significantly impacted risk assessment, judicial decision-making, and prison management; while proponents highlight AI’s potential to reduce case backlogs and improve efficiency, critics warn of serious risks to fundamental liberties. Central to these concerns is the question of trustworthiness, and whether AI-driven decisions are fair, transparent, and accountable.
A major challenge in AI-driven criminal justice is the black-box effect [111]. The lack of transparency raises issues of accountability, especially when AI influences life-altering decisions like bail, parole, and sentencing. Predictive policing tools like CompStat analyze crime patterns, but they risk reinforcing biases and infringing on individual privacy. Similarly, AI-based risk assessment tools such as COMPAS, used in the US to predict recidivism, have been criticized for racial bias and lack of explainability [111].
Automation has expanded into courtrooms, where AI predicts recidivism and flight risk, directly influencing judicial decisions. However, the use of AI in sentencing has led to significant legal challenges. In the Loomis v. Wisconsin case from 2016, the court upheld the use of AI risk assessments but emphasized that judges retain ultimate discretion [111]. Similarly, the Kansas v. Walls case from 2017 further reinforced due process rights, ruling that defendants must have access to AI-generated assessments to challenge their validity [111]. These cases highlight the ongoing struggle to ensure that AI tools do not undermine fundamental legal principles such as the presumption of innocence and the right to a fair trial.
AI applications extend beyond courts to prison management, where they are used for surveillance, risk assessment, and rehabilitation. In China, AI systems monitor inmates 24/7, raising concerns about mass surveillance and human rights violations. In Finland, inmates train AI algorithms for content classification, integrating rehabilitation with digital literacy. Similarly, England and Wales have introduced AI-based coding programs to help prisoners develop job skills. However, the use of AI “companions” for solitary confinement has sparked ethical debates, as critics argue this technology normalizes harmful isolation practices [111].
In the EU, AI's role in criminal justice is scrutinized under human rights frameworks, particularly the European Convention on Human Rights (ECHR). Concerns include the right to a fair trial, privacy, and non-discrimination. Predictive policing, for example, has been criticized for disproportionately targeting marginalized communities. The GDPR offers some safeguards by granting individuals the right to contest AI-driven decisions, reinforcing the need for human oversight in AI applications [111].
The increasing reliance on AI in criminal justice underscores the need for democratic debate and regulatory oversight: while AI can enhance efficiency, removing human discretion entirely can lead to injustices, as seen in the unintended consequences of automated welfare systems and mandatory sentencing laws. A balance must be struck between using AI’s capabilities and ensuring that legal standards and human rights are upheld. Future developments in explainable AI and interdisciplinary collaboration between technologists, legal experts, and policymakers will be crucial in fostering trust in AI-driven criminal justice systems.

6.3.3. The Robodebt System in Australia

Between 2016 and 2019, the Australian government implemented an AI-based debt recovery system known as “Robodebt”, which automatically sent debt notices to citizens based on algorithmic calculations of income discrepancies. By relying on automated income averaging without human oversight, the system generated inaccurate debt notices, disproportionately impacting vulnerable welfare recipients [112]. The Robodebt scandal exemplifies the critical need for trustworthiness metrics in automated decision-making systems, particularly those deployed by government agencies. The absence of human intervention in critical decision-making, coupled with a lack of transparency and accountability, eroded public trust and led to widespread social and financial harm. Robodebt's failure highlights the dangers of deploying automated systems without robust mechanisms to mitigate algorithmic biases, such as model bias, data bias, and social bias [112]. Trustworthiness metrics, including fairness assessments, explainability standards, and mechanisms for human oversight, could have prevented the system's ethical and legal breaches. This case underscores the necessity of integrating ethical AI principles and continuous monitoring frameworks into government automation efforts to safeguard public welfare and prevent future injustices [112].

6.3.4. The COMPAS Algorithm in the US

Between 2013 and 2016, the Correctional Offender Management Profiling for Alternative Sanctions (COMPAS) algorithm was widely used in the US criminal justice system to assess the risk of offenders reoffending and to inform sentencing decisions. The COMPAS algorithm exemplifies the complexities of applying trustworthiness metrics in high-stakes decision-making. It predicts defendants' risk of recidivism, categorizing them into risk levels that influence sentencing and parole decisions [113]. However, its reliability and fairness have been widely debated, particularly in the Loomis case, where concerns over transparency, bias, and due process were raised. The Wisconsin Supreme Court upheld its use, emphasizing that judges should consider multiple factors. Criticisms of COMPAS intensified after a 2016 ProPublica study revealed racial disparities, suggesting that black defendants were more likely to be misclassified as high-risk [113,114]. The broader challenge with such predictive systems lies in classification thresholds, which shape policy outcomes by managing false positives and false negatives; while differential thresholding can address biases, it raises ethical concerns about treating individuals differently based on group membership. As alternative predictive systems emerge, comparative evaluations using confusion matrices become crucial in assessing accuracy, fairness, and overall trustworthiness in real-world applications.
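The threshold-driven disparity analysis described above can be illustrated with a short sketch that computes group-wise false positive and false negative rates from confusion matrices. The data below are synthetic and hypothetical: the groups, prevalence, and noise levels are assumptions for illustration and are not drawn from the COMPAS dataset.

```python
# Illustrative sketch: group-wise confusion-matrix error rates under a single threshold.
import numpy as np
from sklearn.metrics import confusion_matrix

rng = np.random.default_rng(42)
n = 5000
group = rng.choice(["A", "B"], size=n)              # hypothetical demographic groups
y_true = rng.binomial(1, 0.35, size=n)              # 1 = reoffended (synthetic ground truth)

# Hypothetical risk scores whose noise differs by group, skewing errors differently.
noise = np.where(group == "A", 0.25, 0.40)
score = y_true * 0.6 + rng.normal(0, noise, size=n)
y_pred = (score > 0.5).astype(int)                  # one shared classification threshold

for g in ("A", "B"):
    mask = group == g
    tn, fp, fn, tp = confusion_matrix(y_true[mask], y_pred[mask]).ravel()
    fpr = fp / (fp + tn)                            # labeled high-risk but did not reoffend
    fnr = fn / (fn + tp)                            # labeled low-risk but did reoffend
    print(f"group {g}: FPR={fpr:.2f}  FNR={fnr:.2f}")
```

With this construction, the shared threshold produces a visibly higher false positive rate for one group, which is precisely the kind of disparity that differential thresholding attempts to correct, at the ethical cost discussed above.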

7. Emerging Trends and Future Directions in Trustworthy AI

7.1. Foundation Models and Trustworthiness Challenges

Foundation models, particularly large language models (LLMs) such as OpenAI’s ChatGPT, Google’s Bard, and Meta’s LLaMA, among others, have significantly advanced the capabilities of AI. These models, characterized by their scale and versatility, serve as the backbone for a wide range of applications. However, their adoption introduces significant challenges related to trustworthiness, including reliability, transparency, ethical alignment, and accountability.
While foundation models demonstrate remarkable performance across diverse domains, their outputs can be inconsistent and prone to errors [115]. Key issues include hallucinations and the inability to provide factually accurate information, failures in novel or edge-case scenarios that result in unpredictable behavior, and intensive resource use. Significant gaps often emerge between simulated and real-world performance. Furthermore, the black-box nature of foundation models complicates efforts to understand their decision-making processes. The underlying algorithms and vast training datasets are often proprietary, hindering reproducibility and external audits. Additionally, simplified explanations provided to users often lack depth; they may suffice for general applications but fail in high-stakes contexts. As [116] argues, interpretability in AI is an urgent priority; without understanding how these models arrive at their outputs, diagnosing errors, preventing undesirable behavior, and ensuring alignment with human values become exceedingly difficult. This lack of transparency can lead to a range of failure scenarios, eroding public trust and potentially causing significant harm.
Recent incidents involving LLMs showcase the critical importance of establishing and maintaining trustworthiness in modern AI systems. These events serve as cautionary examples, revealing vulnerabilities that demand closer scrutiny. For example, during its first public demo in February 2023, Google's Bard LLM incorrectly claimed that the James Webb Space Telescope took the first photograph of an exoplanet. This error had a significant market impact, with $100 billion in market value lost for Google within hours [117]. Such a major failure emphasizes the risks associated with LLMs, including the need for rigorous safety evaluations for factual accuracy and careful oversight during deployment, especially before critical public demonstrations.
Similarly, Microsoft’s Bing Chat, known internally as “Sydney”, came under scrutiny due to incidents of prompt injection and instances of the model exhibiting aggressive behavior. These issues culminated in public backlash and negative media coverage [118,119]. The root causes included vulnerabilities to prompt injection, inadequate safeguards against generating emotionally charged or inappropriate content, and a potential lack of thorough “red-teaming” (adversarial testing) to identify these behavioral failures prior to wider release. Furthermore, CNET, under Red Ventures, published numerous articles generated by an internal AI tool. These articles, primarily on financial topics, were later found to contain significant factual errors, awkward phrasing, and instances of content closely resembling material from other websites without proper attribution, raising plagiarism concerns [120,121]. The incident damaged CNET’s journalistic reputation and led to a public apology followed by a review of their AI-generated content strategy.
Ethical and societal concerns are amplified due to the scale and deployment scope of these models. Training data often reflects societal biases, which can surface in outputs, leading to, or even amplifying, discrimination and unfair practices. Ensuring that generated content adheres to ethical standards while preserving free expression remains a complex and ongoing challenge. Furthermore, these models can be misused for harmful purposes, including cyberattacks and the generation of misinformation.
The deployment of foundation models raises accountability challenges due to their autonomous decision-making capabilities. Issues include diffuse responsibility, particularly in multiagent or collaborative AI systems, where pinpointing accountability for errors or unethical outcomes becomes challenging [122]. Furthermore, the broad applications of foundation models require alignment with global regulations, such as GDPR or the EU AI Act, yet their generic nature often complicates domain-specific compliance.
In 2023, China introduced the “Interim Measures for the Administration of Generative Artificial Intelligence Services” regulation, marking the world's first comprehensive regulatory framework specifically targeting generative AI. Based on a vertical-iterative legislative model, it emphasizes targeted regulation for distinct AI categories and allows for continuous adaptation as AI technologies evolve. These measures build on prior regulations for technologies such as deepfakes and demonstrate China's proactive strategy in AI governance [123]. They reflect a dual emphasis on promoting innovation and maintaining strict oversight, requiring generative AI providers to ensure data accuracy, transparency, and lawful sourcing, while also mandating content labeling and mechanisms for complaint resolution.
China's framework stands out for introducing clear accountability and regulatory disclosure duties throughout the AI lifecycle, from data collection to deployment. This includes legal obligations to protect user privacy, ensure ethical data labeling, and quickly address harmful content. Additionally, service providers and AI model developers are required to register their models for security inspection and assessment, maintained by authorities in an “algorithm registry” for oversight and record-keeping [124,125]. Although these measures prioritize national security and social stability, they risk creating a high-control environment in which content is tightly regulated [126]. Nevertheless, the regulation offers insight into how various economies, especially those with centralized governance structures, are shaping the global landscape of trustworthy AI.
India’s Digital Personal Data Protection Act (DPDP), passed in August 2023, marks a significant milestone in the country’s journey toward regulating data-driven technologies, including AI [127]. The DPDP reflects a shift from comprehensive, GDPR-style regulation to a more streamlined model. DPDP simplifies compliance obligations by reducing the granularity of consent procedures and narrowing the definition of sensitive data categories [128]. This regulation aims to encourage accountability through self-regulation rather than strict enforcement. For AI governance, particularly in generative models that rely on large-scale personal data, the DPDP lays an important foundation. However, its success will depend heavily on the future rules and guidelines that the Indian government will issue and its alignment with global frameworks for trustworthy AI.
TrustLLM [122] is a comprehensive benchmark suite designed to evaluate the trustworthiness of large language models (LLMs). It incorporates over 30 datasets and evaluates 16 LLMs, including both open-source and proprietary models, across a set of principles. Specifically, TrustLLM assesses LLMs on eight key principles: truthfulness, safety, fairness, robustness, privacy, machine ethics, transparency, and accountability. The findings indicate that no LLM excels across all principles; while proprietary models generally outperform open-source ones, Llama2 emerges as a promising open-source contender [122]. Additionally, the evaluation highlights that overly cautious models often compromise utility and transparency, underscoring the importance of balancing calibration and accuracy. In machine learning, overly cautious models prioritize minimizing errors, especially false positives or false negatives, over achieving strong performance; they are typically too hesitant to make strong predictions or too conservative in their outputs.
While significant progress has been made in assessing and building trustworthy foundation models, much work remains to be done. Addressing critical challenges is essential, such as leveraging real-world behaviors and attitudes of users to better understand the impacts and risks of these technologies. Additionally, regulators and governments play a crucial role in fostering trust by developing clear guidance in collaboration with industry and society [129].

7.2. Trustworthiness Challenges and Evolving Evaluation Frameworks for Generative AI Systems

The widespread use of sophisticated AI agents—particularly those leveraging large language models (LLMs)—across various industries and applications represents a paradigm shift in operational capabilities, risk management, and customer interaction. However, this growing integration of AI systems exposes critical challenges related to the adequacy of existing evaluation metrics for rigorously assessing these systems’ trustworthiness, ethical behavior, and regulatory compliance [130,131]. Traditional benchmarks, often focused on narrow, task-specific measures of accuracy or efficiency [132], fall short of capturing the multifaceted performance and potential societal impacts of AI agents operating in complex and dynamic environments. The non-deterministic nature of LLM outputs, for instance, can lead to unforeseen consequences, making continuous and comprehensive evaluation essential for responsible innovation.
LLMs such as ChatGPT, Bard, and Claude bring a distinct set of trustworthiness risks that require customized evaluation approaches. One of the most pressing issues is hallucination (i.e., the confident generation of false or misleading information), which directly undermines accuracy and reliability metrics. Furthermore, the vast datasets used in training can lead to bias amplification, negatively impacting fairness and exacerbating existing societal inequalities [133]. These phenomena also pose indirect risks to explainability and privacy. The interaction between these metrics is complex: improving robustness to adversarial prompts could decrease interpretability, while enforcing strict privacy constraints might limit model transparency.
These emerging challenges highlight a critical insight: trustworthiness is not a single, unified metric, but a collection of partially overlapping and sometimes conflicting properties. For example, increasing a model's accuracy through the inclusion of more detailed or personalized data may compromise privacy, especially in generative models that can memorize training content. In other cases, prioritizing fairness by re-weighting data samples or introducing bias mitigation layers can lead to a decline in overall predictive performance, especially in underrepresented subgroups [134]. These interactions illustrate that the pursuit of one trust dimension often comes at the cost of another, which necessitates careful trade-off management rather than optimization in isolation.
The limitations of current evaluation frameworks become particularly apparent in high-stakes applications such as finance. Issues such as data drift, where the statistical properties of input data change over time, can degrade model performance in ways that static, pre-deployment evaluations fail to predict [135]. Furthermore, the “black box” nature of many advanced AI models complicates the assessment of their decision-making processes, making it challenging to ensure fairness, transparency, and accountability—cornerstones of financial regulation and consumer trust [136]. For LLMs, challenges include ensuring factual accuracy, mitigating biases learned from vast training datasets, preventing the generation of harmful or misleading financial advice, and ensuring outputs adhere to strict regulatory and ethical guidelines [137]. The inability to adequately measure and validate these crucial aspects with current metrics poses significant risks, including financial losses, reputational damage, and regulatory penalties.
Consequently, there is a pressing need to develop and implement new, robust evaluation metrics specifically designed for the nuances of adaptive, context-sensitive AI systems. These novel metrics should go beyond simple single-valued performance scores, incorporating measures of robustness to adversarial attacks and distributional shifts, interpretability to allow for transparent decision auditing, and fairness to detect and mitigate biases across different demographic groups [135]. Moreover, for LLM-driven agents, metrics need to assess the coherence, factual grounding, and safety of generated content, as well as their capacity for nuanced reasoning in financial scenarios [138]. As highlighted by [139] in the context of adaptive machine learning, particularly in critical sectors, the ability of a system to continuously learn and adapt necessitates equally dynamic and comprehensive evaluation strategies that ensure sustained safety and efficacy post-deployment.
To address these challenges, next-generation evaluation frameworks must integrate multi-objective optimization principles and dynamic monitoring tools that reflect real-world priorities and evolving user needs. For instance, context-aware metric weighting may be essential: in healthcare applications, reliability and fairness might outweigh full explainability, whereas in legal or financial systems, transparency could take precedence. Moreover, GenAI systems used for creative or conversational tasks must be assessed not just on correctness, but also on coherence, appropriateness, and harm avoidance, which may require human-in-the-loop evaluations and hybrid quantitative–qualitative assessment pipelines. As such, building trustworthy AI is not a matter of satisfying static checklists, but of engaging in continuous negotiation between competing values, aligned with domain requirements, ethical expectations, and societal norms [134].
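A minimal sketch of such context-aware metric weighting is shown below. The dimension scores and weights are hypothetical placeholders chosen for illustration; no framework discussed here prescribes these particular values.

```python
# Minimal sketch of context-aware aggregation of trust dimensions (hypothetical scores/weights).
from typing import Dict

def weighted_trust_score(scores: Dict[str, float], weights: Dict[str, float]) -> float:
    """Aggregate per-dimension trust scores (0-1) with context-specific weights."""
    total = sum(weights.values())
    return sum(scores[d] * w for d, w in weights.items()) / total

scores = {"reliability": 0.92, "fairness": 0.81, "explainability": 0.64, "privacy": 0.88}

# Different deployment contexts prioritize different dimensions.
healthcare_weights = {"reliability": 0.4, "fairness": 0.3, "explainability": 0.1, "privacy": 0.2}
finance_weights = {"reliability": 0.25, "fairness": 0.2, "explainability": 0.35, "privacy": 0.2}

print(f"healthcare context: {weighted_trust_score(scores, healthcare_weights):.2f}")
print(f"finance context:    {weighted_trust_score(scores, finance_weights):.2f}")
```

The same underlying system receives different aggregate scores depending on the context-specific weights, which is exactly why static, one-size-fits-all checklists are insufficient.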
The development of such comprehensive evaluation frameworks is a practical imperative for the sustainable and ethical deployment of AI systems. Future metrics should facilitate continuous monitoring and real-time feedback, enabling adaptive governance mechanisms that can respond to evolving model behaviors and regulatory landscapes [140]. This requires a multi-stakeholder effort, involving AI developers, institutions, regulatory bodies, and academic researchers, to co-create standards and benchmarks that foster innovation while safeguarding against systemic risks [141]. Integrating insights from human judgment research, as suggested by [142], could also illuminate current challenges in LLM assessment and help bridge critical gaps. Ultimately, the creation of rigorous, multifaceted evaluation methodologies will be pivotal in building trust in AI and ensuring that AI systems serve as reliable and beneficial technology in society.

7.3. Regulatory and Ethical Trends

Existing fairness metrics, such as demographic parity, primarily focus on group or individual fairness. However, these metrics often fail to account for intersectional biases, i.e., biases that arise from the overlapping and interconnected nature of social categories such as race, gender, and socioeconomic status; they also struggle to reflect dynamic societal norms and nuanced ethical considerations [143]. Furthermore, AI models operate in environments where context plays a critical role, so current metrics may not adequately capture performance variations across diverse, domain-shifted, or real-world scenarios. As AI models are deployed in dynamic settings, their ability to generalize and adapt to unseen data remains a major challenge, and metrics that assess adaptability and robustness under evolving conditions remain underdeveloped [144]. The comprehensive evaluation of an AI system's ethical compliance is further limited by the disproportionate emphasis on fairness relative to other principles such as transparency and accountability [143]. Finally, metrics often exclude subjective human perspectives, which are crucial for assessing user satisfaction, trust, and perceived ethical alignment [145]. In this context, bridging the gap between quantitative and qualitative evaluations is necessary for a holistic assessment. Directions for future metric development include intersectional fairness metrics, dynamic context sensitivity, robustness and adaptability indices, ethical compliance benchmarks, and human-informed metrics [144].
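To make the intersectionality point concrete, the sketch below uses synthetic, hypothetical approval data in which each single-attribute demographic parity check looks clean, while a roughly 20-point gap appears only when the two attributes are examined jointly.

```python
# Illustrative sketch: single-attribute demographic parity can mask intersectional disparities.
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
n = 10000
df = pd.DataFrame({
    "gender": rng.choice(["F", "M"], size=n),
    "age": rng.choice(["young", "older"], size=n),
})

# Approval rates engineered so that each marginal comparison is ~0.5 vs. ~0.5,
# while intersectional subgroups differ by about 20 percentage points.
rate = np.select(
    [
        (df["gender"] == "F") & (df["age"] == "young"),
        (df["gender"] == "F") & (df["age"] == "older"),
        (df["gender"] == "M") & (df["age"] == "young"),
    ],
    [0.6, 0.4, 0.4],
    default=0.6,  # M & older
)
df["approved"] = rng.binomial(1, rate)

print(df.groupby("gender")["approved"].mean())            # marginal (single-attribute) view
print(df.groupby(["gender", "age"])["approved"].mean())   # intersectional view reveals the gap
```

An intersectional fairness metric would therefore need to evaluate parity over the cross-product of protected attributes rather than over each attribute in isolation.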

7.3.1. Il-Learn: A Novel Metric for Measuring Intelligence Evolution in Learning Systems

The Il-Learn metric presents a novel approach to evaluating the intelligence evolution of Cooperative Multiagent Systems (CMAS) [146]. It is designed to bridge critical gaps in current evaluation frameworks by focusing on system-level intelligence changes, adaptability, and problem-solving efficiency. The metric quantifies intelligence changes before and after learning, addresses issues of accuracy and robustness, and distinguishes between positive and negative learning impacts.
Il-Learn is applicable to any intelligent learning system, regardless of its architecture. It emphasizes system-level cooperation and problem-solving capabilities as primary indicators of intelligence. The metric employs statistical tests, such as the Two-Sample Unpaired t-test and Welch's test, to assess intelligence variability with high precision and reliability [146]. Its advantage over existing approaches lies in enhanced accuracy with smaller sample sizes, surpassing traditional metrics in terms of accuracy, robustness, and universality [146]. Universality means that a metric is independent of factors such as the agent, system architecture, and environment, making it widely applicable across different contexts. Il-Learn also incorporates robust evaluation methods, including outlier removal and non-Gaussian data handling, while focusing directly on learning-induced intelligence changes, which helps address an important gap in current frameworks.
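The sketch below shows the kind of statistical comparison that Il-Learn builds on, namely Welch's two-sample test applied to a system's task performance before and after learning. It is not the Il-Learn metric itself, and the score distributions, sample sizes, and significance threshold are assumptions for illustration.

```python
# Minimal sketch of a Welch's-test comparison of pre- vs. post-learning performance
# (illustrative only; not the Il-Learn metric from [146]).
import numpy as np
from scipy.stats import ttest_ind

rng = np.random.default_rng(7)
# Hypothetical problem-solving scores of a cooperative multiagent system over repeated trials.
pre_learning = rng.normal(loc=0.62, scale=0.08, size=25)
post_learning = rng.normal(loc=0.71, scale=0.11, size=25)

# equal_var=False selects Welch's test, which does not assume equal variances.
stat, p_value = ttest_ind(post_learning, pre_learning, equal_var=False)
print(f"Welch t = {stat:.2f}, p = {p_value:.4f}")

if p_value < 0.05 and post_learning.mean() > pre_learning.mean():
    print("Statistically significant improvement: positive learning impact.")
else:
    print("No significant improvement detected: learning impact may be neutral or negative.")
```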
The Il-Learn metric lays the groundwork for a deeper understanding of intelligence evolution in learning systems. It represents a significant advancement in measuring AI learning impacts. Future research could explore its adaptation to broader AI domains, refinement for specific tasks, and integration with other ethical and performance evaluation metrics.
As AI adoption expands across critical sectors, regulatory bodies worldwide are increasingly acknowledging the need to mitigate the risks posed by AI systems while ensuring their ethical deployment. Recently proposed regulations highlight an evolving focus on impact assessments, public transparency, and anti-discrimination measures as key elements that are essential to fostering trust in AI.
One of the central themes emerging in AI governance is the mandate for algorithmic accountability, particularly through the implementation of impact assessments. These serve as systematic evaluations of automated systems, analyzing their potential capabilities, limitations, and societal impacts prior to deployment [20]. They aim to identify and mitigate risks such as unintended discrimination, systemic bias, and harm to marginalized communities. As indicated in Figure 4, four key regulations highlight this trend:
  • Algorithmic Accountability Act of 2022 (AAA) [147]: Introduced in the US, the AAA emphasizes the need for comprehensive impact assessments of automated decision systems in key sectors. A distinctive feature of this regulation is its focus on stakeholder engagement, requiring consultations with impacted communities to address issues of bias and discrimination. AAA also emphasizes the importance of transparency and accountability.
  • New York City’s Int. 1894 [20]: This regulation, introduced in 2020, targets automated employment decision tools, mandating bias audits to ensure fairness in hiring processes. The law requires developers and employers to disclose the criteria, data sources, and retention policies of automated systems to job candidates, thereby promoting greater transparency and accountability in algorithmic decision-making.
  • California’s AB 13 [20]: This legislation extends algorithmic accountability to state procurement processes, requiring developers to conduct detailed impact assessments, which must address system performance, risks, limitations and potential disparate impacts on protected groups. AB 13 aligns technical assessments with ethical considerations, prioritizing fairness and equity.
  • The EU AI Act: This framework is one of the most comprehensive AI governance frameworks globally to date. It proposes a risk-based approach that categorizes AI systems based on their potential impact on human rights and safety.
These regulations collectively emphasize the role of impact assessments in establishing trustworthiness standards. By requiring pre-deployment evaluations, they encourage developers to identify and mitigate potential harms early in the AI lifecycle.

7.3.2. The EU AI Act

The EU AI Act is considered one of the most comprehensive legal frameworks for AI globally [22]. Its aim is to mitigate AI-related risks, build public trust, and position Europe as a global leader in ethical AI development. By adopting a risk-based approach, the Act ensures proportionate regulation that fosters innovation while safeguarding fundamental rights. Operating within the EU’s broader digital ecosystem, the Act aligns with existing laws such as the GDPR, the Digital Services Act (DSA), and the Unfair Commercial Practices Directive (UCPD). Its extraterritorial provisions ensure that AI systems deployed in the EU, regardless of origin, comply with its regulations.
The key features of the EU AI Act can be summarized as follows [148]:
  • Need for regulation: the EU AI Act addresses the inability of current laws to manage AI risks effectively, particularly those that impact safety, fairness, and fundamental rights.
  • Risk-based categorization: AI systems are regulated based on their level of risk, ensuring a proportionate response. The risk categories are defined as unacceptable risk, high risk, limited risk, and minimal or no risk.
  • General-Purpose AI (GPAI): General-purpose AI models face additional transparency and risk management obligations, including self-assessments, bias mitigation, and cybersecurity measures.
  • Enforcement and governance: The European AI Office, established in 2024, oversees implementation, monitors compliance, and promotes ethical AI development.
  • Implementation timeline: The EU AI Act came into force in August 2024. Within six months of entry into force, bans on AI systems posing unacceptable risks apply; the GPAI rules and governance provisions take effect one year after entry into force; and full obligations for AI integrated into regulated products follow after three years.
This European legislation aims to promote trustworthy AI through compliance assessments, transparency requirements, and expert-led risk evaluations. As the European Commission highlights, trust relies on effectively managing risks. Based on the AI HLEG’s 2019 Guidelines, trustworthiness means AI must be lawful, ethical, and robust, with safeguards like human oversight and data governance to minimize unacceptable risks. The Act follows a paternalistic model, where developers and auditors assess risk acceptability.
The AI Act faces five key challenges in achieving trustworthy AI: uncertain trust foundations, misalignment between trust and trustworthiness, behavioral bias in risk acceptance, impartiality of intermediaries, and the risk of regulatory capture [149]. Nonetheless, it is a significant step in AI governance, establishing consistent rules to balance innovation and safety. By refining the Act’s provisions and encouraging public participation, the EU has the potential to set a global standard for sustainable and trustworthy AI.

8. Conclusions

The rapid integration of artificial intelligence (AI) into various industries such as healthcare, finance, and public administration, among others, highlights the pressing need for trustworthy AI. This paper addresses this topic by examining the metrics, frameworks, and methodologies essential for evaluating and enhancing the trustworthiness of AI systems. We provide a comprehensive review of current standards, such as the NIST AI Risk Management Framework (AI RMF), the AI Trust Framework and Maturity Model (AI-TMM), and relevant ISO/IEC standards. Through a combination of theoretical insights and practical applications, this study identifies the core components and challenges associated with trustworthy AI and proposes actionable solutions to navigate these complexities.
This paper highlights several findings. First, it establishes that trustworthiness in AI encompasses multidimensional attributes, including fairness, transparency, privacy, accountability, and security. Each of these attributes is explored in detail, revealing their interdependencies and the trade-offs necessary to balance competing priorities. For instance, ensuring privacy may conflict with achieving complete transparency, while enhancing fairness can sometimes reduce system efficiency. By delineating these trade-offs, we aim to provide clarity on how to approach the design and evaluation of AI systems.
The review of metrics for AI trustworthiness reveals the importance of quantitative tools such as Shapley Additive Explanations (SHAP), Local Interpretable Model-agnostic Explanations (LIME), differential privacy, and federated learning. Each with its own benefits and drawbacks, these methodologies address key challenges in explainability, bias mitigation, and data security, making them indispensable for evaluating resilient and user-centric AI systems. Furthermore, we presented case studies across industries to demonstrate how these metrics have been applied and what their limitations are, offering real-world perspectives on their effectiveness.
We performed a comparative analysis of various frameworks for AI trustworthiness to evaluate their strengths in addressing specific aspects of AI governance. The NIST AI RMF's sociotechnical approach effectively integrates technical and societal considerations, while the AI-TMM's maturity model provides structured guidance for organizations to assess and improve trustworthiness. ISO/IEC standards contribute by offering global benchmarks for ethical AI development, ensuring alignment with regulatory requirements and fostering international collaboration.
While this study provides a comprehensive overview of current methodologies, it also identifies several avenues for future research. One key area is the development of standardized benchmarks for trustworthiness metrics, which would facilitate more consistent evaluations across industries. Additionally, there is a need for further exploration of the trade-offs between competing trustworthiness attributes, particularly in contexts where resource constraints demand prioritization. Moreover, emerging technologies such as generative AI and large language models present new challenges and opportunities for trustworthiness. Future research should focus on adapting existing frameworks to address the unique risks associated with these technologies.
This research aims to foster public trust in AI systems and advance the field of trustworthy AI. By analyzing existing metrics and frameworks, it bridges the gap between theoretical principles and practical implementations. The emphasis on actionable insights and real-world applications ensures that the findings are not only valuable in academia but also practically relevant for AI developers, policymakers, and industry stakeholders. As AI continues to influence decision-making processes in high-stakes domains, the ability to demonstrate fairness, transparency, and accountability becomes critical for user acceptance and regulatory compliance. By providing a roadmap for evaluating and improving trustworthiness, we equip stakeholders with insights into the tools needed to build systems that are not only effective but also ethically sound and socially responsible.

Author Contributions

Conceptualization, A.N., B.J., M.R. and D.T.; methodology, A.N., B.J., M.R. and D.T.; investigation, A.N., B.J. and M.R.; writing—original draft preparation, A.N. and B.J.; writing—review and editing, A.N., B.J., M.R. and D.T.; visualization, A.N., B.J. and M.R.; supervision, M.R. and D.T. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

No new data were created or analyzed in this study.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Anwer, S.; Hosen, M.S.; Khan, D.S.; Oluwabusayo, E.; Folorunso, M.; Khan, H. Revolutionizing the global market: An inclusion of AI the game changer in international dynamics. Migr. Lett. 2024, 21, 54–73. [Google Scholar]
  2. Cousineau, C.; Dara, R.; Chowdhury, A. Trustworthy AI: AI developers’ lens to implementation challenges and opportunities. Data Inf. Manag. 2024, 9, 100082. [Google Scholar] [CrossRef]
  3. Xu, Y.; Liu, X.; Cao, X.; Huang, C.; Liu, E.; Qian, S.; Liu, X.; Wu, Y.; Dong, F.; Qiu, C.W.; et al. Artificial intelligence: A powerful paradigm for scientific research. Innovation 2021, 2, 100179. [Google Scholar] [CrossRef]
  4. Gardner, C.; Robinson, K.; Smith, C.; Steiner, A. Contextualizing End-User Needs: How to Measure the Trustworthiness of an AI System; Carnegie Mellon University, Software Engineering Institute: Pittsburgh, PA, USA, 2023. [Google Scholar]
  5. Li, B.; Qi, P.; Liu, B.; Di, S.; Liu, J.; Pei, J.; Yi, J.; Zhou, B. Trustworthy AI: From principles to practices. ACM Comput. Surv. 2023, 55, 1–46. [Google Scholar] [CrossRef]
  6. Kuhn, S.; Muller, M.J. Participatory design. Commun. ACM 1993, 36, 24–29. [Google Scholar]
  7. Vianello, A.; Laine, S.; Tuomi, E. Improving trustworthiness of AI solutions: A qualitative approach to support ethically-grounded AI design. Int. J. Hum. Comput. Interact. 2023, 39, 1405–1422. [Google Scholar] [CrossRef]
  8. Lahusen, C.; Maggetti, M.; Slavkovik, M. Trust, trustworthiness and AI governance. Sci. Rep. 2024, 14, 20752. [Google Scholar] [CrossRef]
  9. Ferrara, E. Fairness and bias in artificial intelligence: A brief survey of sources, impacts, and mitigation strategies. Sci 2023, 6, 3. [Google Scholar] [CrossRef]
  10. Hlongwane, R.; Ramabao, K.; Mongwe, W. A novel framework for enhancing transparency in credit scoring: Leveraging Shapley values for interpretable credit scorecards. PLoS ONE 2024, 19, e0308718. [Google Scholar] [CrossRef]
  11. Gursoy, F.; Kakadiaris, I.A. Error parity fairness: Testing for group fairness in regression tasks. arXiv 2022, arXiv:2208.08279. [Google Scholar]
  12. Jiang, Z.; Han, X.; Fan, C.; Yang, F.; Mostafavi, A.; Hu, X. Generalized demographic parity for group fairness. In Proceedings of the International Conference on Learning Representations, Virtual, 25–29 April 2022. [Google Scholar]
  13. Larsson, S.; Heintz, F. Transparency in artificial intelligence. Internet Policy Rev. 2020, 9, 1–16. [Google Scholar] [CrossRef]
  14. Kovari, A. AI for Decision Support: Balancing Accuracy, Transparency, and Trust Across Sectors. Information 2024, 15, 725. [Google Scholar] [CrossRef]
  15. Smuha, N.A. The Work of the High-Level Expert Group on AI as the Precursor of the AI Act. 2024. Available online: https://papers.ssrn.com/sol3/papers.cfm?abstract_id=5012626 (accessed on 30 June 2025).
  16. Winfield, A.F.; Booth, S.; Dennis, L.A.; Egawa, T.; Hastie, H.; Jacobs, N.; Muttram, R.I.; Olszewska, J.I.; Rajabiyazdi, F.; Theodorou, A.; et al. IEEE P7001: A proposed standard on transparency. Front. Robot. AI 2021, 8, 665729. [Google Scholar] [CrossRef] [PubMed]
  17. Khalid, N.; Qayyum, A.; Bilal, M.; Al-Fuqaha, A.; Qadir, J. Privacy-preserving artificial intelligence in healthcare: Techniques and applications. Comput. Biol. Med. 2023, 158, 106848. [Google Scholar] [CrossRef] [PubMed]
  18. Elliott, D.; Soifer, E. AI technologies, privacy, and security. Front. Artif. Intell. 2022, 5, 826737. [Google Scholar] [CrossRef]
  19. Novelli, C.; Taddeo, M.; Floridi, L. Accountability in artificial intelligence: What it is and how it works. AI Soc. 2024, 39, 1871–1882. [Google Scholar] [CrossRef]
  20. Oduro, S.; Moss, E.; Metcalf, J. Obligations to assess: Recent trends in AI accountability regulations. Patterns 2022, 3, 100608. [Google Scholar] [CrossRef]
  21. Oseni, A.; Moustafa, N.; Janicke, H.; Liu, P.; Tari, Z.; Vasilakos, A. Security and privacy for artificial intelligence: Opportunities and challenges. arXiv 2021, arXiv:2102.04661. [Google Scholar]
  22. European Commission. AI Act. Available online: https://digital-strategy.ec.europa.eu/en/policies/regulatory-framework-ai/ (accessed on 15 May 2025).
  23. National Institute of Standards and Technology. AI Risk Management Framework. Available online: https://www.nist.gov/itl/ai-risk-management-framework (accessed on 10 May 2025).
  24. NITI Aayog. National Strategy for Artificial Intelligence #AIFORALL. Available online: https://www.niti.gov.in/sites/default/files/2023-03/National-Strategy-for-Artificial-Intelligence.pdf (accessed on 30 May 2025).
  25. Tuzov, V.; Lin, F. Two paths of balancing technology and ethics: A comparative study on AI governance in China and Germany. Telecommun. Policy 2024, 48, 102850. [Google Scholar] [CrossRef]
  26. Riyazahmed, K. AI in finance: Needs attention to bias. Annu. Res. J. SCMS Pune 2023, 11, 1. [Google Scholar]
  27. Cao, L.; Yang, Q.; Yu, P.S. Data science and AI in FinTech: An overview. Int. J. Data Sci. Anal. 2021, 12, 81–99. [Google Scholar] [CrossRef]
  28. Georgieva, K. AI Will Transform the Global Economy. Let us Make Sure It Benefits Humanity; IMF Blog (blog), International Monetary Fund: Washington, DC, USA, 2024. [Google Scholar]
  29. Al-Gasawneh, J.; Alfityani, A.; Al-Okdeh, S.; Almasri, B.; Mansur, H.; Nusairat, N.; Siam, Y. Avoiding uncertainty by measuring the impact of perceived risk on the intention to use financial artificial intelligence services. Uncertain Supply Chain. Manag. 2022, 10, 1427–1436. [Google Scholar] [CrossRef]
  30. Maurya, S.; Verma, R.; Khilnani, L.; Bhakuni, A.S.; Kumar, M.; Rakesh, N. Effect of AI on the Financial Sector: Risk Control, Investment Decision-making, and Business Outcome. In Proceedings of the 2024 11th International Conference on Reliability, Infocom Technologies and Optimization (Trends and Future Directions) (ICRITO), Noida, India, 14–15 March 2024; pp. 1–7. [Google Scholar]
  31. Buchanan, B.G. Artificial Intelligence in Finance; The Alan Turing Institute: London, UK, 2019. [Google Scholar]
  32. Fritz-Morgenthal, S.; Hein, B.; Papenbrock, J. Financial risk management and explainable, trustworthy, responsible AI. Front. Artif. Intell. 2022, 5, 779799. [Google Scholar] [CrossRef] [PubMed]
  33. Albahri, A.; Duhaim, A.; Fadhel, M.; Alnoor, A.; Baqer, N.; Alzubaidi, L.; Albahri, O.S.; Alamoodi, A.H.; Bai, J.; Salhi, A.; et al. A systematic review of trustworthy and explainable artificial intelligence in healthcare: Assessment of quality, bias risk and data fusion. Inf. Fusion 2023, 96, 156–191. [Google Scholar] [CrossRef]
  34. Kalusivalingam, A.K.; Sharma, A.; Patel, N.; Singh, V. Leveraging SHAP and LIME for Enhanced Explainability in AI-Driven Diagnostic Systems. Int. J. AI ML 2021, 2, 1–23. [Google Scholar]
  35. Ashraf, K.; Nawar, S.; Hosen, M.H.; Islam, M.T.; Uddin, M.N. Beyond the Black Box: Employing LIME and SHAP for Transparent Health Predictions with Machine Learning Models. In Proceedings of the 2024 International Conference on Advances in Computing, Communication, Electrical, and Smart Systems (iCACCESS), Dhaka, Bangladesh, 8–9 March 2024; pp. 1–6. [Google Scholar]
  36. Samala, A.D.; Rawas, S. Generative AI as Virtual Healthcare Assistant for Enhancing Patient Care Quality. Int. J. Online Biomed. Eng. 2024, 20, 174–187. [Google Scholar] [CrossRef]
  37. Hicks, S.A.; Strümke, I.; Thambawita, V.; Hammou, M.; Riegler, M.A.; Halvorsen, P.; Parasa, S. On evaluation metrics for medical applications of artificial intelligence. Sci. Rep. 2022, 12, 5979. [Google Scholar] [CrossRef] [PubMed]
  38. van Noordt, C.; Tangi, L. The dynamics of AI capability and its influence on public value creation of AI within public administration. Gov. Inf. Q. 2023, 40, 101860. [Google Scholar] [CrossRef]
  39. Kuziemski, M.; Misuraca, G. AI governance in the public sector: Three tales from the frontiers of automated decision-making in democratic settings. Telecommun. Policy 2020, 44, 101976. [Google Scholar] [CrossRef]
  40. Michael, K.; Abbas, R.; Roussos, G.; Scornavacca, E.; Fosso-Wamba, S. Ethics in AI and autonomous system applications design. IEEE Trans. Technol. Soc. 2020, 1, 114–127. [Google Scholar] [CrossRef]
  41. He, H.; Gray, J.; Cangelosi, A.; Meng, Q.; McGinnity, T.M.; Mehnen, J. The challenges and opportunities of human-centered AI for trustworthy robots and autonomous systems. IEEE Trans. Cogn. Dev. Syst. 2021, 14, 1398–1412. [Google Scholar] [CrossRef]
  42. Winfield, A.F.; Michael, K.; Pitt, J.; Evers, V. Machine ethics: The design and governance of ethical AI and autonomous systems [scanning the issue]. Proc. IEEE 2019, 107, 509–517. [Google Scholar] [CrossRef]
  43. Bryson, J.; Winfield, A. Standardizing ethical design for artificial intelligence and autonomous systems. Computer 2017, 50, 116–119. [Google Scholar] [CrossRef]
  44. Polemi, N.; Praça, I.; Kioskli, K.; Bécue, A. Challenges and efforts in managing AI trustworthiness risks: A state of knowledge. Front. Big Data 2024, 7, 1381163. [Google Scholar] [CrossRef] [PubMed]
  45. Kaur, D.; Uslu, S.; Rittichier, K.J.; Durresi, A. Trustworthy artificial intelligence: A review. ACM Comput. Surv. CSUR 2022, 55, 1–38. [Google Scholar] [CrossRef]
  46. Pedreschi, D.; Giannotti, F.; Guidotti, R.; Monreale, A.; Ruggieri, S.; Turini, F. Meaningful explanations of black box AI decision systems. In Proceedings of the AAAI Conference on Artificial Intelligence, Honolulu, HI, USA, 27 January–1 February 2019; Volume 33, pp. 9780–9784. [Google Scholar]
  47. Adadi, A.; Berrada, M. Peeking inside the black-box: A survey on explainable artificial intelligence (XAI). IEEE Access 2018, 6, 52138–52160. [Google Scholar] [CrossRef]
  48. Hassija, V.; Chamola, V.; Mahapatra, A.; Singal, A.; Goel, D.; Huang, K.; Scardapane, S.; Spinelli, I.; Mahmud, M.; Hussain, A. Interpreting black-box models: A review on explainable artificial intelligence. Cogn. Comput. 2024, 16, 45–74. [Google Scholar] [CrossRef]
  49. Dai, X.; Keane, M.T.; Shalloo, L.; Ruelle, E.; Byrne, R.M. Counterfactual explanations for prediction and diagnosis in XAI. In Proceedings of the 2022 AAAI/ACM Conference on AI, Ethics, and Society, Oxford, UK, 19–21 May 2022; pp. 215–226. [Google Scholar]
  50. Deloitte UK. Banking on the Bots: Unintended Bias in AI; Deloitte: London, UK, 2023. [Google Scholar]
  51. Nelson, G.S. Bias in artificial intelligence. North Carol. Med. J. 2019, 80, 220–222. [Google Scholar] [CrossRef]
  52. Varsha, P. How can we manage biases in artificial intelligence systems–A systematic literature review. Int. J. Inf. Manag. Data Insights 2023, 3, 100165. [Google Scholar]
  53. Chu, C.; Leslie, K.; Nyrup, R.; Khan, S. Artificial Intelligence Can Discriminate on the Basis of Race and Gender, and Furthermore, Age. The Conversation. 2022, Volume 5. Available online: http://theconversation.com/artificial-intelligence-can-discriminate-on-the-basis-ofrace-and-gender-and-also-age-173617 (accessed on 15 May 2025).
  54. Mbakwe, A.B.; Lourentzou, I.; Celi, L.A.; Wu, J.T. Fairness metrics for health AI: We have a long way to go. eBioMedicine 2023, 90, 104525. [Google Scholar] [CrossRef]
  55. Rizinski, M.; Peshov, H.; Mishev, K.; Chitkushev, L.T.; Vodenska, I.; Trajanov, D. Ethically responsible machine learning in fintech. IEEE Access 2022, 10, 97531–97554. [Google Scholar] [CrossRef]
  56. John-Mathews, J.M.; Cardon, D.; Balagué, C. From reality to world. A critical perspective on AI fairness. J. Bus. Ethics 2022, 178, 945–959. [Google Scholar] [CrossRef]
  57. Mylrea, M.; Robinson, N. Artificial Intelligence (AI) trust framework and maturity model: Applying an entropy lens to improve security, privacy, and ethical AI. Entropy 2023, 25, 1429. [Google Scholar] [CrossRef] [PubMed]
  58. Al-Khassawneh, Y.A. A review of artificial intelligence in security and privacy: Research advances, applications, opportunities, and challenges. Indones. J. Sci. Technol. 2023, 8, 79–96. [Google Scholar] [CrossRef]
  59. Reinhardt, K. Trust and trustworthiness in AI ethics. AI Ethics 2023, 3, 735–744. [Google Scholar] [CrossRef]
  60. Wang, Y. Balancing Trustworthiness and Efficiency in AI Systems: A Comprehensive Analysis of Trade-offs and Strategies. IEEE Internet Comput. 2023, 27, 8–12. [Google Scholar] [CrossRef]
  61. Cheong, B.C. Transparency and accountability in AI systems: Safeguarding wellbeing in the age of algorithmic decision-making. Front. Hum. Dyn. 2024, 6, 1421273. [Google Scholar] [CrossRef]
  62. Hildebrandt, M. Privacy as protection of the incomputable self: From agnostic to agonistic machine learning. Theor. Inq. Law 2019, 20, 83–121. [Google Scholar] [CrossRef]
  63. Mortaji, S.T.H.; Sadeghi, M.E. Assessing the Reliability of Artificial Intelligence Systems: Challenges, Metrics, and Future Directions. Int. J. Innov. Manag. Econ. Soc. Sci. 2024, 4, 1–13. [Google Scholar] [CrossRef]
  64. Pawlicki, M.; Pawlicka, A.; Uccello, F.; Szelest, S.; D’Antonio, S.; Kozik, R.; Choraś, M. Evaluating the necessity of the multiple metrics for assessing explainable AI: A critical examination. Neurocomputing 2024, 602, 128282. [Google Scholar] [CrossRef]
  65. Hamon, R.; Junklewitz, H.; Sanchez, I. Robustness and Explainability of Artificial Intelligence; Publications Office of the European Union: Brussels, Belgium, 2020; Volume 207, p. 2020. [Google Scholar]
  66. Tocchetti, A.; Corti, L.; Balayn, A.; Yurrita, M.; Lippmann, P.; Brambilla, M.; Yang, J. AI robustness: A human-centered perspective on technological challenges and opportunities. ACM Comput. Surv. 2022, 57, 141. [Google Scholar]
  67. Li, L.; Xie, T.; Li, B. Sok: Certified robustness for deep neural networks. In Proceedings of the 2023 IEEE Symposium on Security and Privacy (SP), San Francisco, CA, USA, 22–24 May 2023; pp. 1289–1310. [Google Scholar]
  68. Chander, B.; John, C.; Warrier, L.; Gopalakrishnan, K. Toward trustworthy artificial intelligence (TAI) in the context of explainability and robustness. ACM Comput. Surv. 2024, 57, 144. [Google Scholar] [CrossRef]
  69. Linardatos, P.; Papastefanopoulos, V.; Kotsiantis, S. Explainable AI: A review of machine learning interpretability methods. Entropy 2020, 23, 18. [Google Scholar] [CrossRef]
  70. Lundberg, S.M.; Lee, S.-I. A unified approach to interpreting model predictions. arXiv 2017, arXiv:1705.07874. [Google Scholar]
  71. Selvaraju, R.R.; Cogswell, M.; Das, A.; Vedantam, R.; Parikh, D.; Batra, D. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 618–626. [Google Scholar]
  72. Vij, R. Building Trustworthy AI: Interpretability in Vision and Linguistic Models. Available online: https://pub.towardsai.net/building-trustworthy-ai-interpretability-in-vision-and-linguistic-models-b78d1ea979d4/ (accessed on 20 October 2024).
  73. Holzinger, A.; Langs, G.; Denk, H.; Zatloukal, K.; Müller, H. Causability and explainability of artificial intelligence in medicine. Wiley Interdiscip. Rev. Data Min. Knowl. Discov. 2019, 9, e1312. [Google Scholar] [CrossRef]
  74. Floridi, L. The European legislation on AI: A brief analysis of its philosophical approach. Philos. Technol. 2021, 34, 215–222. [Google Scholar] [CrossRef] [PubMed]
  75. Caton, S.; Haas, C. Fairness in machine learning: A survey. ACM Comput. Surv. 2024, 56, 1–38. [Google Scholar] [CrossRef]
  76. Feretzakis, G.; Papaspyridis, K.; Gkoulalas-Divanis, A.; Verykios, V.S. Privacy-Preserving Techniques in Generative AI and Large Language Models: A Narrative Review. Information 2024, 15, 697. [Google Scholar] [CrossRef]
  77. Swaminathan, N.; Danks, D. Application of the NIST AI Risk Management Framework to Surveillance Technology. arXiv 2024, arXiv:2403.15646. [Google Scholar]
  78. National Institute of Standards and Technology (NIST). A Plan for Global Engagement on AI Standards; NIST: Gaithersburg, MD, USA, 2024. [Google Scholar]
  79. Desai, A. NIST’s AI Risk Management Framework Explained. Available online: https://www.schellman.com/blog/cybersecurity/nist-ai-risk-management-framework-explained/ (accessed on 20 August 2023).
  80. Dotan, R.; Blili-Hamelin, B.; Madhavan, R.; Matthews, J.; Scarpino, J. Evolving AI Risk Management: A Maturity Model based on the NIST AI Risk Management Framework. arXiv 2024, arXiv:2401.15229. [Google Scholar]
  81. ISO/IEC 24028:2020; Information Technology—Artificial Intelligence—Overview of Trustworthiness in Artificial Intelligence. International Organization for Standardization (ISO): Geneva, Switzerland, 2020.
  82. ISO/IEC 42001:2023; Information Technology—Artificial Intelligence—Management System. International Organization for Standardization (ISO): Geneva, Switzerland, 2023.
  83. Manziuk, E.; Barmak, O.; Krak, I.; Mazurets, O.; Skrypnyk, T. Formal Model of Trustworthy Artificial Intelligence Based on Standardization. In Proceedings of the IntelITSIS, Khmelnytskyi, Ukraine, 24–26 March 2021; pp. 190–197. [Google Scholar]
  84. Dudley, C. The Rise of AI Governance: Unpacking ISO/IEC 42001. Quality 2024, 63, 27. [Google Scholar]
  85. ISO/IEC 38507:2022; Information Technology—Governance of IT—Governance Implications of the Use of Artificial Intelligence by Organizations. International Organization for Standardization (ISO): Geneva, Switzerland, 2022.
  86. ISO/IEC 23894:2023; Information Technology—Artificial Intelligence—Guidance on Risk Management. International Organization for Standardization (ISO): Geneva, Switzerland, 2023.
  87. ISO/IEC 25059:2023; Software Engineering—Systems and Software Quality Requirements and Evaluation (SQuaRE)—Quality Model for AI Systems. International Organization for Standardization (ISO): Geneva, Switzerland, 2023.
  88. Janaćković, G.; Vasović, D.; Vasović, B. Artificial Intelligence Standardisation Efforts. In Proceedings of the Engineering Management and Competitiveness (EMC 2024), Zrenjanin, Serbia, 21–22 June 2024; p. 250. [Google Scholar]
  89. Giudici, P.; Centurelli, M.; Turchetta, S. Artificial Intelligence risk measurement. Expert Syst. Appl. 2024, 235, 121220. [Google Scholar] [CrossRef]
  90. Pierna, J.A.F.; Abbas, O.; Baeten, V.; Dardenne, P. A Backward Variable Selection method for PLS regression (BVSPLS). Anal. Chim. Acta 2009, 642, 89–93. [Google Scholar] [CrossRef] [PubMed]
  91. International Organization for Standardization (ISO). ISO/IEC 27000 Family—Information Security Management. Available online: https://www.iso.org/standard/iso-iec-27000-family (accessed on 30 June 2025).
  92. ISO 31000:2018; Risk Management—Guidelines. International Organization for Standardization (ISO): Geneva, Switzerland, 2018.
  93. ISO/IEC 24029-2:2023; Artificial Intelligence (AI)—Assessment of the Robustness of Neural Networks. International Organization for Standardization (ISO): Geneva, Switzerland, 2023.
  94. International Organization for Standardization (ISO). ISO/IEC WD TS 27115.4 Cybersecurity Evaluation of Complex Systems—Introduction and Framework Overview (Under Development). Available online: https://www.iso.org/standard/81627.html (accessed on 30 June 2025).
  95. European Telecommunications Standards Institute (ETSI). Technical Committee (TC) Securing Artificial Intelligence (SAI). 2023. Available online: https://www.etsi.org/committee/2312-sai (accessed on 30 June 2025).
  96. MITRE ATLAS. ATLAS Matrix. 2023. Available online: https://atlas.mitre.org/matrices/ATLAS (accessed on 30 June 2025).
  97. European Union Agency for Cybersecurity (ENISA). Multilayer Framework for Good Cybersecurity Practices for AI. 2023. Available online: https://www.enisa.europa.eu/publications/multilayer-framework-for-good-cybersecurity-practices-for-ai (accessed on 30 June 2025).
  98. European Commission. Ethics Guidelines for Trustworthy AI. 2019. Available online: https://digital-strategy.ec.europa.eu/en/library/ethics-guidelines-trustworthy-ai (accessed on 30 June 2025).
  99. THEMIS Consortium. Themis 5.0. 2024. Available online: https://www.themis-trust.eu/results (accessed on 30 June 2025).
  100. Joseph, J. Should the United States Adopt Federal Artificial Intelligence Regulation Similar to the European Union. Loyola Univ. Chic. Int. Law. Rev. 2023, 20, 105. [Google Scholar]
  101. Scassa, T. Regulating AI in Canada: A Critical Look at the Proposed Artificial Intelligence and Data Act. Can. Bar. Rev. 2023, 101, 1. [Google Scholar]
  102. Lucero, K. Artificial intelligence regulation and China’s future. Colum. J. Asian L. 2019, 33, 94. [Google Scholar]
  103. Cornacchia, G.; Narducci, F.; Ragone, A. Improving the user experience and the trustworthiness of financial services. In Proceedings of the IFIP Conference on Human–Computer Interaction; Springer: Cham, Switzerland, 2021; pp. 264–269. [Google Scholar]
  104. Zhou, J.; Chen, C.; Li, L.; Zhang, Z.; Zheng, X. FinBrain 2.0: When finance meets trustworthy AI. Front. Inf. Technol. Electron. Eng. 2022, 23, 1747–1764. [Google Scholar] [CrossRef]
  105. Giudici, P.; Raffinetti, E. SAFE Artificial Intelligence in finance. Financ. Res. Lett. 2023, 56, 104088. [Google Scholar] [CrossRef]
  106. Narsina, D.; Gummadi, J.C.S.; Venkata, S.; Manikyala, A.; Kothapalli, S.; Devarapu, K.; Rodriguez, M.; Talla, R. AI-Driven Database Systems in FinTech: Enhancing Fraud Detection and Transaction Efficiency. Asian Account. Audit. Adv. 2019, 10, 81–92. [Google Scholar]
  107. Alsalem, M.; Alamoodi, A.H.; Albahri, O.S.; Albahri, A.S.; Martínez, L.; Yera, R.; Duhaim, A.M.; Sharaf, I.M. Evaluation of trustworthy artificial intelligent healthcare applications using multi-criteria decision-making approach. Expert Syst. Appl. 2024, 246, 123066. [Google Scholar] [CrossRef]
  108. Blomberg, S.N.; Folke, F.; Ersbøll, A.K.; Christensen, H.C.; Torp-Pedersen, C.; Sayre, M.R.; Counts, C.R.; Lippert, F.K. Machine learning as a supportive tool to recognize cardiac arrest in emergency calls. Resuscitation 2019, 138, 322–329. [Google Scholar] [CrossRef] [PubMed]
  109. Ducange, P.; Marcelloni, F.; Renda, A.; Ruffini, F. Federated Learning of XAI Models in Healthcare: A Case Study on Parkinson’s Disease. Cogn. Comput. 2024, 16, 3051–3076. [Google Scholar] [CrossRef]
  110. Berman, A.; de Fine Licht, K.; Carlsson, V. Trustworthy AI in the public sector: An empirical analysis of a Swedish labor market decision-support system. Technol. Soc. 2024, 76, 102471. [Google Scholar] [CrossRef]
  111. Završnik, A. Criminal justice, artificial intelligence systems, and human rights. In Proceedings of the ERA Forum; Springer: Berlin/Heidelberg, Germany, 2020; Volume 20, pp. 567–583. [Google Scholar]
  112. Michael, K. In this special section: Algorithmic bias—Australia’s Robodebt and its human rights aftermath. IEEE Trans. Technol. Soc. 2024, 5, 254–263. [Google Scholar] [CrossRef]
  113. Lagioia, F.; Rovatti, R.; Sartor, G. Algorithmic fairness through group parities? The case of COMPAS-SAPMOC. AI Soc. 2023, 38, 459–478. [Google Scholar] [CrossRef]
  114. Larson, J.; Mattu, S.; Kirchner, L.; Angwin, J. How We Analyzed the COMPAS Recidivism Algorithm. 2016. Available online: https://www.propublica.org/article/how-we-analyzed-the-compas-recidivism-algorithm/ (accessed on 20 May 2025).
  115. de Cerqueira, J.A.S.; Agbese, M.; Rousi, R.; Xi, N.; Hamari, J.; Abrahamsson, P. Can We Trust AI Agents? An Experimental Study Towards Trustworthy LLM-Based Multi-Agent Systems for AI Ethics. arXiv 2024, arXiv:2411.08881. [Google Scholar]
  116. Amodei, D. The Urgency of Interpretability. Available online: https://www.darioamodei.com/post/the-urgency-of-interpretability/ (accessed on 26 April 2025).
  117. Coulter, M.; Bensinger, G. Alphabet Shares Dive After Google AI Chatbot Bard Flubs Answer in Ad. Available online: https://www.reuters.com/technology/google-ai-chatbot-bard-offers-inaccurate-information-company-ad-2023-02-08/ (accessed on 26 April 2025).
  118. Edwards, B. AI-Powered Bing Chat Spills Its Secrets via Prompt Injection Attack. Available online: https://arstechnica.com/information-technology/2023/02/ai-powered-bing-chat-spills-its-secrets-via-prompt-injection-attack/ (accessed on 18 March 2025).
  119. Winder, D. Hacker Reveals Microsoft’s New AI-Powered Bing Chat Search Secrets. Available online: https://www.forbes.com/sites/daveywinder/2023/02/13/hacker-reveals-microsofts-new-ai-powered-bing-chat-search-secrets/ (accessed on 18 March 2025).
  120. Sato, M. CNET Pauses Publishing AI-Written Stories After Disclosure Controversy. Available online: https://www.theverge.com/2023/1/20/23564311/cnet-pausing-ai-articles-bot-red-ventures (accessed on 18 March 2025).
  121. Harrington, C. CNET Published AI-Generated Stories. Then Its Staff Pushed Back. Available online: https://www.wired.com/story/cnet-published-ai-generated-stories-then-its-staff-pushed-back/ (accessed on 18 March 2025).
  122. Sajid, H. Navigating Risks Associated with Unreliable AI & Trustworthiness in LLMs. Available online: https://www.wisecube.ai/blog/navigating-risks-associated-with-unreliable-ai-trustworthiness-in-llms/ (accessed on 18 March 2025).
  123. Zhu, S.; Ma, G. The Chinese Path to Generative AI Governance. 2023. Available online: https://papers.ssrn.com/sol3/papers.cfm?abstract_id=4551316 (accessed on 30 June 2025).
  124. Ferrari, F.; Van Dijck, J.; Van den Bosch, A. Observe, inspect, modify: Three conditions for generative AI governance. New Media Soc. 2025, 27, 2788–2806. [Google Scholar] [CrossRef]
  125. Khera, R.; Oikonomou, E.K.; Nadkarni, G.N.; Morley, J.R.; Wiens, J.; Butte, A.J.; Topol, E.J. Transforming cardiovascular care with artificial intelligence: From discovery to practice: JACC state-of-the-art review. J. Am. Coll. Cardiol. 2024, 84, 97–114. [Google Scholar] [CrossRef]
  126. Zhang, A.H. The promise and perils of China’s regulation of artificial intelligence. Columbia J. Trans. Law 2025, 63, 1. [Google Scholar] [CrossRef]
  127. Ministry of Law and Justice of India. The Digital Personal Data Protection Act. Available online: https://www.meity.gov.in/static/uploads/2024/06/2bf1f0e9f04e6fb4f8fef35e82c42aa5.pdf (accessed on 18 March 2025).
  128. Sundara, K.; Narendran, N. Protecting Digital Personal Data in India in 2023: Is the lite approach, the right approach? Comput. Law Rev. Int. 2023, 24, 9–16. [Google Scholar] [CrossRef]
  129. Wells, P. Towards a Market of Trustworthy AI Foundation Models. Available online: https://medium.com/writing-by-if/towards-a-market-of-trustworthy-ai-foundation-models-b19516f6da8c/ (accessed on 5 May 2025).
  130. Shneiderman, B. Bridging the gap between ethics and practice: Guidelines for reliable, safe, and trustworthy human-centered AI systems. ACM Trans. Interact. Intell. Syst. TiiS 2020, 10, 1–31. [Google Scholar] [CrossRef]
  131. Matheus, R.; Janssen, M.; Janowski, T. Design principles for creating digital transparency in government. Gov. Inf. Q. 2021, 38, 101550. [Google Scholar] [CrossRef]
  132. Staples, E. How Do You Choose the Right Metrics for Your AI Evaluations? Available online: https://galileo.ai/blog/how-do-you-choose-the-right-metrics-for-your-ai-evaluations (accessed on 5 May 2025).
  133. Lin, Z.; Guan, S.; Zhang, W.; Zhang, H.; Li, Y.; Zhang, H. Towards trustworthy LLMs: A review on debiasing and dehallucinating in large language models. Artif. Intell. Rev. 2024, 57, 243. [Google Scholar] [CrossRef]
  134. Liu, Y.; Yao, Y.; Ton, J.F.; Zhang, X.; Guo, R.; Cheng, H.; Klochkov, Y.; Taufiq, M.F.; Li, H. Trustworthy LLMs: A survey and guideline for evaluating large language models’ alignment. arXiv 2023, arXiv:2308.05374. [Google Scholar]
  135. Ahmed, A.; Hayat, H.; Daheem, H. Core Components of an AI Evaluation System. Available online: https://www.walturn.com/insights/core-components-of-an-ai-evaluation-system (accessed on 8 February 2025).
  136. Balasubramaniam, N.; Kauppinen, M.; Rannisto, A.; Hiekkanen, K.; Kujala, S. Transparency and explainability of AI systems: From ethical guidelines to requirements. Inf. Softw. Technol. 2023, 159, 107197. [Google Scholar] [CrossRef]
  137. Sobolik, T.; Subramanian, S. Building an LLM Evaluation Framework: Best Practices. Available online: https://www.datadoghq.com/blog/llm-evaluation-framework-best-practices/ (accessed on 22 April 2025).
  138. Mostajabdaveh, M.; Yu, T.T.L.; Dash, S.C.B.; Ramamonjison, R.; Byusa, J.S.; Carenini, G.; Zhou, Z.; Zhang, Y. Evaluating LLM Reasoning in the Operations Research Domain with ORQA. In Proceedings of the AAAI Conference on Artificial Intelligence, Philadelphia, PA, USA, 27 February–2 March 2025; Volume 39, pp. 24902–24910. [Google Scholar]
  139. Kuersten, A. Prudently Evaluating Medical Adaptive Machine Learning Systems. Am. J. Bioeth. 2024, 24, 76–79. [Google Scholar] [CrossRef]
  140. Tavasoli, A.; Sharbaf, M.; Madani, S.M. Responsible innovation: A strategic framework for financial LLM integration. arXiv 2025, arXiv:2504.02165. [Google Scholar]
  141. Peace, P.; Owens, A. AI-Enhanced Financial Control Systems and Metrics for Evaluating Reporting Accuracy and Efficiency. 2024. Available online: https://www.researchgate.net/profile/Emma-Oye/publication/388528481_AI-Enhanced_Financial_Control_Systems_and_Metrics_for_Evaluating_Reporting_Accuracy_and_Efficiency/links/679c1eb196e7fb48b9aaa69f/AI-Enhanced-Financial-Control-Systems-and-Metrics-for-Evaluating-Reporting-Accuracy-and-Efficiency.pdf (accessed on 30 June 2025).
  142. Polonioli, A. Moving LLM Evaluation Forward: Lessons from Human Judgment Research. Front. Artif. Intell. 2025, 8, 1592399. [Google Scholar] [CrossRef] [PubMed]
  143. Palumbo, G.; Carneiro, D.; Alves, V. Objective metrics for ethical AI: A systematic literature review. Int. J. Data Sci. Anal. 2024; in press. [Google Scholar] [CrossRef]
  144. Bandi, A.; Adapa, P.V.S.R.; Kuchi, Y.E.V.P.K. The power of generative ai: A review of requirements, models, input–output formats, evaluation metrics, and challenges. Future Internet 2023, 15, 260. [Google Scholar] [CrossRef]
  145. Microsoft. Observability in Generative AI. Available online: https://learn.microsoft.com/en-us/azure/ai-studio/concepts/evaluation-metrics-built-in (accessed on 28 March 2025).
  146. Iantovics, L.B.; Iakovidis, D.K.; Nechita, E. II-Learn—A Novel Metric for Measuring the Intelligence Increase and Evolution of Artificial Learning Systems. Int. J. Comput. Intell. Syst. 2019, 12, 1323–1338. [Google Scholar] [CrossRef]
  147. Gursoy, F.; Kennedy, R.; Kakadiaris, I. A Critical Assessment of the Algorithmic Accountability Act of 2022. Available online: https://papers.ssrn.com/sol3/papers.cfm?abstract_id=4193199 (accessed on 30 June 2025).
  148. Edwards, L. The EU AI Act: A Summary of Its Significance and Scope. 2021. Available online: https://www.adalovelaceinstitute.org/wp-content/uploads/2022/04/Expert-explainer-The-EU-AI-Act-11-April-2022.pdf (accessed on 30 June 2025).
  149. Laux, J.; Wachter, S.; Mittelstadt, B. Trustworthy artificial intelligence and the European Union AI act: On the conflation of trustworthiness and acceptability of risk. Regul. Gov. 2024, 18, 3–32. [Google Scholar] [CrossRef] [PubMed]
Figure 1. Components of AI trustworthiness: fairness, transparency, privacy, accountability, and security.
Figure 2. Key challenges in achieving trustworthy AI include the complexity and opacity of black-box models; issues of bias and fairness arising from flawed data and incorrect model assumptions; concerns around privacy and security, such as unauthorized access, data breaches, adversarial attacks, and model misuse; trade-offs among competing priorities, such as trustworthiness, model performance, and business needs; and the need for robust ethical governance and accountability.
Figure 3. An overview of AI trustworthiness metrics, including their definitions, purposes, and associated frameworks and applications.
Figure 4. Regulatory and ethical trends, such as the EU AI Act, New York City’s Int. 1894, the 2022 Algorithmic Accountability Act, and California’s AB 13.
Table 1. Comparison of AI trustworthiness standards and frameworks in terms of addressed problems, scope, and applicability.
Standard/Framework | Problems Addressed | Scope/Applicability
ISO 27000x Series [91] | Information security risks, data confidentiality, integrity, and availability in AI systems | Information security, data breaches; broad applicability across industries
ISO 31000:2018 [92] | Principles and guidelines for managing risk across AI development and deployment contexts | General risk management framework for AI risks
ISO/IEC 24028 [81] | Reliability, safety, and resilience of AI systems | Focused on AI-specific security threats and adversarial attacks
ISO/IEC 42001 [82] | AI management system standard to ensure ethical and responsible AI use in organizations | Governance and ethics for AI management systems
ISO/IEC 23894 [86] | AI-specific risk management, bias, explainability, and unintended consequences | Comprehensive framework for AI bias and fairness
ISO/IEC 24029-2:2023 [93] | Controls for privacy and data protection in AI systems | Robustness of neural networks and adversarial threats
ISO/IEC WD 27090 [93] | Cybersecurity guidelines specifically for AI systems | Technical cybersecurity practices for AI
ISO/IEC 27115 [94] | Digital evidence integrity and chain of custody in AI-related contexts | Data privacy and protection in AI systems
ETSI GR SAI Series [95] | Identifies potential security threats and mitigations in AI systems | Emphasis on ethical AI, fairness, transparency
NIST AI RMF [79] | Framework to ensure AI systems are trustworthy, secure, and value-aligned | Holistic risk management and fairness
MITRE ATLAS [96] | Maps known AI attacks and vulnerabilities | Adversarial tactics and AI threat modeling
ENISA Multi-layer Framework [97] | Cybersecurity best practices across the AI lifecycle | Security at all lifecycle stages
AI TMM [57] | Testing maturity model to assess and improve AI quality and trust | Explainability, privacy, and robustness in testing
EU Ethics Guidelines for Trustworthy AI [98] | Promotes fairness, accountability, transparency, and human oversight | Human-centered ethical AI practices
Themis 5.0 [99] | Evaluates algorithmic fairness and ethical/legal compliance in decision-making | Trustworthy technical decision support