1. Introduction
System engineering plays a vital role in informing the design of systems that can effectively respond to unprecedented and unimagined disruptions. Risk, safety, security, trust, and resilience programs are implemented to address the scope, allocation of resources, and evaluation of these complex systems. The conventional risk definition is a hallmark of medical statistics and epidemiology, as mentioned by [1], and the concept of risk as a disruptive event is expressed in other contexts, e.g., in immunity [2] or, more generally, in technology [3]. Studies by [4,5,6] focus on addressing the challenges associated with risk management [7,8,9], safety assurance, security measures, and resilience strategies within these systems. By incorporating system modeling and engineering approaches, organizations can better understand and navigate the evolving priority orders of complex systems, enabling them to adapt and respond effectively to disruptions and ensure the robustness and effectiveness of their operations. With the continuous progress of science in the healthcare and medical sectors, there is an increasing need to enhance services provided to users. This growing demand has led to the adoption of advanced technologies, including artificial intelligence (AI) [10], to meet the surge in requirements. AI has revolutionized healthcare by advancing the state of the art for diagnoses [11,12,13], treatments, disease prevention, and surgical devices. The AI valuation in the European healthcare market exceeded USD 1.15 billion in 2020 and is expected to grow by more than 44.2% through 2027 [14]. AI in healthcare has the potential to significantly improve outcomes [15] and reduce procedure time and costs [16].
Utilization of AI in healthcare faces many challenges and risks. There is a particular concern regarding risks related to applications of AI and machine learning. AI should be valid, reliable, safe, fair [17], unbiased [18], secure, resilient, explainable, interpretable [19], accountable, and transparent [20,21,22].
The National Institute of Standards and Technology Artificial Intelligence Risk Management Framework (NIST AI RMF), published in 2023, addresses risks in designing, developing, using, and evaluating AI systems and products [22]. The framework discusses the requirements for trustworthy AI applications [23,24]. NIST proposes aspects of trustworthy AI systems and describes how these systems need to be responsive to multiple criteria in order to address AI risks [22]. The NIST AI RMF states that trustworthy AI is safe; secure and resilient; explainable and interpretable; privacy enhanced; fair (with harmful bias managed); accountable and transparent; and valid and reliable [22].
The NIST framework provides guidance for addressing risks in the design, development, use, and evaluation of AI systems [25] to ensure their trustworthiness. However, this paper identifies the need for further risk analysis to facilitate the widespread adoption of the NIST framework by organizations.
In order to complement existing systems models of purpose, structure, and function, there is a need for system modeling that focuses on evolving priority orders of complex systems. These priority orders encompass various elements such as assets, policies, investments, organizational units, locations, personnel, and more. Technological advancements, environmental factors, missions, obsolescence, regulations, behaviors, markets, human migrations, conflicts, and other influences disrupt these priority orders.
This paper develops a risk analysis of artificial intelligence in medicine with a multilayer concept of system order using a principled methodology to account for the scenarios that are most and least disruptive to these orders.
Figure 1 shows that in system modeling, the defining characteristic layers of any system are purpose (π, Pi; in some of the literature, purpose is also referred to as behavior [26,27,28,29]), structure (σ, Sig; in some of the literature, structure is also referred to as elements or components [26,27,28,29]), function (ϕ, Phi; in some of the literature, function is also referred to as process or operations [26,27,28,29]), interconnections (ɩ, Iot), environment (ε, Eps), and boundary (β, Bet) [26,27,28,29]. The Greek alphabet is employed to facilitate fluent reading and enhance annotations throughout the paper. Other studies may find additional layers for the AI risk management analysis. The scope of the paper is limited to the purpose (Pi), structure (Sig), and function (Phi) characteristic layers. The purpose (Pi) layer examines the goals and objectives of the system. The structure (Sig) layer examines the components of the system. The function (Phi) layer focuses on the specific tasks and processes that the system performs [30].
A risk assessment of AI tools is a major challenge, especially as the most recent generation of AI tools has extremely broad applicability. That is, the design and use cases for AI are constantly evolving. Three main scenario-based preference models are developed for three healthcare systems: 1. Healthcare centers or clinics as higher-level systems (purpose (Pi) layer). 2. Medical implants or devices (structure (Sig) layer). 3. Disease diagnosis, more specifically the diagnosis of cardiac sarcoidosis (function (Phi) layer). Trustworthiness in the context of AI in healthcare should be considered for various stakeholders, including AI developers, healthcare clinicians, and patients. This trustworthiness is distributed across three primary layers: insider, internal, and external, respectively. The scope of this study is focused on internal trustworthiness, addressing the relationship between AI providers and AI users. The AI users within the healthcare context are categorized across these three layers:
Purpose (Pi) layer: This layer focuses on the objectives and overall goal of the system and includes the strategic and operational objectives of the systems. This includes domain experts in healthcare, such as health center board members and clinicians responsible for the operation of a clinic section.
Structure (Sig) layer: This layer includes the physical framework of the system, which could resemble physical medical devices. These are the device developers and designers involved in the implementation of AI in healthcare.
Function (Phi) layer: This layer includes a specific operation or a task defined and performed by medical professionals, such as disease diagnosis. These are physicians specializing in radiology and cardiology, contributing to the functional aspects of AI applications in healthcare.
Interconnections (Iot) layer: This layer shows the interactions and connectivity among medical components.
Environment (Eps) layer: This layer includes any external factors or environments that could affect the medical system outside its boundary.
Boundary (Bet) layer: This layer defines the limits of the medical system and the system’s scope. This layer distinguishes the medical system from its external environment (Eps).
The innovation comprises three aspects. The contribution to “theory and philosophy” is the introduction of systems organized in layers, utilizing a multi-layer system approach to account for disruptive scenarios and the disruption of system order [31]. This innovation acknowledges and addresses risks and disruptions occurring across multiple layers. The innovation to “methods” involves offering detailed rubrics to elaborate on and execute the steps within the risk register [32]. This paper contributes to the “application” domain by applying layer disruption scenario analysis, specifically in healthcare and medicine applications. This paper develops a multi-layer scenario-based [33,34] preference risk register for deploying AI in complex engineering systems, building on top of the NIST AI RMF aspects.
Initiatives, success criteria, emergent conditions, and scenarios are introduced for each layer as the main components of the risk analysis [31]. One challenge when integrating AI-based decision-making tools into medicine is the ability to generalize effectively when applied across various sectors, diverse patient populations, and varied initiatives and disruptive scenarios. The framework contributes to systems engineering by addressing various research gaps in the Systems Engineering Body of Knowledge (SEBoK) related to AI risk management [28]. This work shows how responsible AI could benefit a variety of engineering systems and reduce the risks in those systems. The framework guides and shapes the AI R&D portfolio by highlighting the most and least disruptive scenarios to the enterprise and by monitoring and evaluating the trustworthiness of the AI implemented in the system. Practitioners will better understand how to implement AI to enhance object designs and mitigate AI application risks and uncertainties. They will also better understand what methods systems can employ to set precise ethical, legal, societal, and technological boundaries for AI activity by quantifying risk as the disruption of the order of AI initiatives in healthcare systems.
2. Materials and Methods
This section describes an elicitation of scenario-based preferences [35,36] that aids in identifying system initiatives, criteria, emergent conditions, and scenarios.
Figure 2 describes the conceptual diagram of the risk assessment methodology, and Figure 3 describes the conceptual diagram of systems modeling for enterprise risk management of AI in healthcare. The latter figure describes the following four steps:
1. System modeling and scenario generator, which could include techniques customized for each case study, such as Shapley additive value, digital twins, eXplainable AI (XAI) techniques, etc.
2. The multicriteria decision analysis (MCDA) risk register tool, which is used to analyze risks according to the system order.
3. The three system characteristics reviewed in this paper are Purpose (Pi), Structure (Sig), and Function (Phi) layers.
4. Case studies.
Each step will be explained in detail in the following sections.
The first step of the framework develops success criteria to measure the performance of investment initiatives based on the system objectives. Success criteria are mainly derived from technological analyses, literature reviews, and expert opinions describing the goals of the system. Any change in the success criteria affects expectations of success and represents the values of the stakeholders. The set of success criteria is defined as {c.01, c.02, …, c.m}.
As this framework is based on the NIST AI RMF, the success criteria for all three layers—trustworthy AI in healthcare systems or purpose (Pi), trustworthy AI in medical implants/devices or structure (Sig), and trustworthy AI in disease diagnosis or function (Phi)—are established using the seven aspects of trustworthy AI systems. By leveraging this foundation, the framework ensures comprehensive risk analysis by considering the criteria of trustworthiness across the different areas of AI application in healthcare.
Table 1 shows the seven aspects of the NIST AI RMF:
c.01—safe;
c.02—secure and resilient;
c.03—explainable and interpretable;
c.04—privacy enhanced;
c.05—fair (with harmful bias managed);
c.06—accountable and transparent;
c.07—valid and reliable.
Initiatives are the second element of the model, and they represent a set of decision-making alternatives. These can take the form of technologies, policies, assets, projects, or other investments [6]. Initiatives are represented by the set {x.01, x.02, …, x.n}. Initiatives are identified by elicitation from stakeholders and experts to determine what components, actions, assets, organizational units, policies, locations, and/or allocations of resources constitute the system [31].
The third element, emergent conditions, comprises events, trends, or other factors impacting decision-maker priorities in future planning contexts. Karvetski and Lambert [37,38] identify “emergent and future conditions” as individual trends or events that can impact decision-making and strategy in some way. These conditions are combined to create unique scenarios. Uncertainties in emergent conditions are a significant contributor to project failure and impact the ability of the system to meet success criteria. The set of emergent conditions is {e.01, e.02, …, e.k}. In the model, emergent conditions influence the relevance weights of individual success criteria.
The baseline relevance of each criterion is established by interviewing stakeholders and is scored low, medium, or high. Based on this determination, baseline weights are assigned to each of the success criteria.
Scenarios comprise one or more emergent conditions. The set of scenarios is defined as {s.01, s.02, …, s.p}. Scenarios are potential events that may disrupt priority orders. It is important to clarify that scenarios do not serve as predictions for future conditions and do not include any indication of the likelihood of occurrence. Instead, scenarios function as projections, designed to investigate the impacts of potential future states. Additionally, emergent conditions and scenarios do not aim to catalog every conceivable future state or disruption. Instead, they concentrate on addressing the specific concerns of system owners and experts, such as those in the medical field, that have been introduced earlier in the analysis [31].
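To make this construction concrete, the short Python sketch below shows scenarios assembled as combinations of emergent conditions, with s.00 reserved for the baseline. The condition descriptions are illustrative placeholders, not the conditions elicited in this study.

```python
# Minimal sketch (not the authors' implementation): scenarios composed from
# emergent conditions. Identifiers and descriptions are hypothetical.
emergent_conditions = {
    "e.01": "shift in patient population (data drift)",
    "e.02": "new regulatory requirements",
    "e.03": "cybersecurity incident",
}

scenarios = {
    "s.00": [],                  # baseline scenario: no emergent conditions
    "s.01": ["e.01"],            # a scenario with a single condition
    "s.02": ["e.02", "e.03"],    # a scenario combining several conditions
}
```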
Experts in three layers were engaged in the process of identifying success criteria, initiatives, emergent conditions, scenarios, criteria-initiative assessments, criteria-scenario relevance, and baseline relevance. The experts for the purpose (Pi) layer are the board members of Binagostar Eye Surgical Hospital, with whom three interviews were conducted.
In the criteria-initiative assessment, experts and stakeholders were asked to what degree they agree that “initiative x.i addresses criterion c.j”. Neutral entries are represented by a dash (-); somewhat agree is represented by an unfilled circle (○); agree is represented by a half-filled circle (◐); and strongly agree is represented by a filled circle (●) in the matrix with the set of numerical weights of {0, 0.334, 0.667, 1}, respectively.
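As an illustration of this elicitation step, the following Python sketch (with hypothetical initiative and criterion identifiers) encodes the qualitative survey responses as the numerical weights {0, 0.334, 0.667, 1}.

```python
# Minimal sketch: converting qualitative criteria-initiative (C-I) responses
# into numeric assessments. Identifiers x.01, c.01, etc. are placeholders.
CI_SCALE = {
    "-": 0.0,           # neutral
    "somewhat": 0.334,  # somewhat agree (unfilled circle)
    "agree": 0.667,     # agree (half-filled circle)
    "strongly": 1.0,    # strongly agree (filled circle)
}

ci_responses = {
    ("x.01", "c.01"): "strongly",
    ("x.01", "c.02"): "agree",
    ("x.02", "c.01"): "somewhat",
    ("x.02", "c.02"): "-",
}

# Numeric C-I matrix, later used as the partial value functions v_j(x.i)
ci_matrix = {pair: CI_SCALE[resp] for pair, resp in ci_responses.items()}
print(ci_matrix)
```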
The qualitative results of the project constraint matrix can be converted into numerical weights [39,40] following a rank-sum weighting method [41] based on Equation (1):

w_j = (m + 1 − rank_j) / Σ_{k=1}^{m} (m + 1 − rank_k),   (1)

where w_j is the weight of the j-th criterion, m is the total number of criteria, and rank_j is the ordinal rank of the j-th criterion [37].
The effect of disruptive emergent conditions is operationalized through a change in the criteria weights. For each scenario, the user is asked to assess to what degree the relative importance of each criterion changes given that the scenario occurs [42]. Responses include decreased (D), decreased somewhat (DS), no change, increased somewhat (IS), and increased (I). These changes are recorded in the W matrix. In Equation (2), α is a scaling constant that is equal to {8, 6, 1, 1/6, 1/8} for increased, increased somewhat, no change, decreased somewhat, and decreased, respectively:

w_jk = α_jk · w_j / Σ_{j=1}^{m} α_jk · w_j,   (2)

where w_jk is the adjusted weight of criterion c.j under scenario s.k. The scaling constant is intended to be consistent with the swing weighting rationale. The swing weight technique accommodates adjustments for the additional scenarios. The procedure for deriving weights for an additive value function using the swing weight method is thoroughly documented in the MCDA literature, as evidenced by works such as those by Keeney and Raiffa (1979) [40], Keeney (1992) [43], Belton and Stewart (2002) [44], and Clemen and Reilly (2001) [45]. The justification for swing weighting is explained by Karvetski and Lambert as follows: α serves as a value multiplier, adjusting the trade-off between exchanging a high level of performance for a low level of performance in one criterion and an exchange of a low level of performance for a high level of performance in another criterion [37,38]. The swing weight technique was adopted to derive the baseline criteria weights (w_j), as well as the adjusted weights for each scenario [38].
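The scenario adjustment can be sketched in Python as below; the renormalization step and the example relevance changes are assumptions consistent with the swing weighting rationale described above, not a verbatim transcription of the paper's computation.

```python
# Minimal sketch: adjusting criteria weights under a scenario via the scaling
# constants alpha, then renormalizing (assumed normalization).
ALPHA = {"I": 8, "IS": 6, "NC": 1, "DS": 1 / 6, "D": 1 / 8}
# I = increased, IS = increased somewhat, NC = no change,
# DS = decreased somewhat, D = decreased

def scenario_weights(baseline_weights, relevance_changes):
    """relevance_changes: dict mapping criterion id -> one of I/IS/NC/DS/D."""
    scaled = {c: w * ALPHA[relevance_changes.get(c, "NC")]
              for c, w in baseline_weights.items()}
    total = sum(scaled.values())
    return {c: v / total for c, v in scaled.items()}

# Illustrative baseline weights for three criteria and a hypothetical scenario
baseline = {"c.01": 0.5, "c.02": 0.3, "c.03": 0.2}
w_sk = scenario_weights(baseline, {"c.01": "I", "c.03": "DS"})
print(w_sk)
```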
The initiatives are prioritized with a linear additive value function, defined in Equation (3):

V_k(x.i) = Σ_{j=1}^{m} w_jk · v_j(x.i),   (3)

where v_j(x.i) is the partial value function of initiative x.i along criterion c.j, which is defined using the criteria-initiative (C-I) assessment. V is a matrix that contains the relative importance scores for each initiative across each scenario, and V_k(x.i) is the score of initiative x.i under scenario s.k.
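A short Python sketch of the linear additive scoring in Equation (3) follows, using hypothetical weights and C-I assessments.

```python
# Minimal sketch of Equation (3): the value of initiative x.i under a given set of
# (baseline or scenario-adjusted) weights is the weighted sum of its partial values.
def initiative_value(partial_values, weights):
    """partial_values: dict criterion -> v_j(x.i) from the C-I assessment.
    weights: dict criterion -> w_j (or w_jk for a scenario)."""
    return sum(weights[c] * partial_values.get(c, 0.0) for c in weights)

weights = {"c.01": 0.5, "c.02": 0.3, "c.03": 0.2}    # illustrative weights
x01 = {"c.01": 1.0, "c.02": 0.667, "c.03": 0.334}    # illustrative v_j(x.01)
x02 = {"c.01": 0.334, "c.02": 0.0, "c.03": 1.0}      # illustrative v_j(x.02)

# Scores used to rank the initiatives under this weight set
scores = {"x.01": initiative_value(x01, weights),
          "x.02": initiative_value(x02, weights)}
print(sorted(scores, key=scores.get, reverse=True))  # ranking, highest value first
```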
The disruptiveness score is defined based on the sum of the squared differences between the baseline rank and the disrupted rank of each initiative for each scenario. The disruptiveness score is used to understand the effect of emergent conditions on the prioritization of initiatives. Equation (4) shows the disruptiveness score for scenario s.k:

D_k = Σ_{i=1}^{n} (r_ik − r_i0)²,   (4)

where r_ik is the rank of initiative x.i under scenario s.k and r_i0 is the rank of initiative x.i under the baseline scenario (s.00) [46]. The scores are then normalized to the scale of 0–100.
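The disruptiveness score of Equation (4) and the 0–100 scaling can be sketched as below; the min–max normalization is an assumption, since the paper states only that scores are normalized to 0–100.

```python
# Minimal sketch of Equation (4): D_k = sum_i (r_ik - r_i0)^2, followed by an
# assumed min-max normalization to the 0-100 scale.
def disruptiveness(baseline_ranks, scenario_ranks):
    """Ranks are dicts mapping initiative id -> integer rank (1 = highest priority)."""
    return sum((scenario_ranks[i] - baseline_ranks[i]) ** 2 for i in baseline_ranks)

def normalize_0_100(scores):
    lo, hi = min(scores.values()), max(scores.values())
    span = (hi - lo) or 1
    return {s: 100 * (v - lo) / span for s, v in scores.items()}

baseline = {"x.01": 1, "x.02": 2, "x.03": 3, "x.04": 4}      # baseline ranks (s.00)
scenario_ranks = {
    "s.01": {"x.01": 1, "x.02": 3, "x.03": 2, "x.04": 4},    # mild re-ordering
    "s.06": {"x.01": 4, "x.02": 1, "x.03": 3, "x.04": 2},    # strong re-ordering
}
raw = {s: disruptiveness(baseline, r) for s, r in scenario_ranks.items()}
print(normalize_0_100(raw))    # the most disruptive scenario scores 100
```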
This paper presents the proposed theory and method of system modeling for enterprise risk management of AI in healthcare. The method comprises four steps: 1. System modeling and scenario generator. 2. Analyzing risks to the system order. 3. System characteristics. 4. Case studies.
In the next section, the method is demonstrated in three layers: purpose (Pi), structure (Sig), and function (Phi).
4. Discussion
The novelty of this paper lies in the degree of disruption of the order, focusing on AI in healthcare [53]. The relationship between AI and healthcare is a complex, multi-expertise enterprise. Moreover, this paper contextualizes the possible and actual implications of AI by introducing a method to quantify risk as the disruption of the order of AI initiatives of healthcare systems, with the aim of finding the scenarios that are most and least disruptive to system order. This novel approach studies scenarios that bring about a re-ordering of initiatives in each of the following three characteristic layers: purpose, structure, and function. The scoring tool is consistent with the recent literature [6,31,32,37].
Table 14 and Table 15 suggest that the topic of the scenarios should be used to describe the scope of the tentative project, which shapes and guides the input of the R&D portfolio. This information allows investors and R&D managers to make informed decisions regarding resource allocation. Specifically, they can focus their investments on the most critical initiatives related to the risk analysis of AI in healthcare applications, as outlined in Table 15. For instance, x.Phi.29 (demonstrate validity or generalizability beyond the training conditions) is one of the most important initiatives and trustworthy formal recommendations for controlling AI risks in the function layer. Additionally, they can consider the various scenarios presented in Table 14, ranging from the most disruptive to the least disruptive. Based on the results in Table 14, the study recommends the following methods for user education about safe AI usage: informing users about why and how the benefits of using the AI system outweigh its risks compared to other technologies on the market; convincing clinicians that specific AI system outcomes are safe; providing information to users on what data to use for training, validating, and testing AI models, including potential changes due to various input data; highlighting that AI systems may require more frequent maintenance and triggers for corrective maintenance due to data, model, or concept drift; demonstrating the validity or generalizability of AI systems beyond the training conditions; emphasizing the closeness of results of estimates, observations, and computations to the ground truth (true values); and advocating for responsible AI system design, development, and deployment practices. This analysis enables the identification of new topics that warrant additional resources and time, with the goal of improving the overall success of the system. For instance,
Table 14 highlights scenario s.06, non-interpretable AI and lack of human–AI communications, as the most disruptive scenario across all three layers of healthcare systems. Although the results from this pilot must be interpreted with caution and validated in a larger sample, this observation is consistent with the findings of [54,55], which indicate that AI transparency solutions primarily target domain experts. Given the emphasis on “high-stakes” AI systems, particularly in healthcare, this inclination is reasonable. It is vital to consider that everyday tasks that involve AI, such as movie suggestions in online streaming or item recommendations in online shopping systems, are not as important for assessing the risks of AI in this domain. Optimizing trustworthy AI properties is recommended in high-stakes environments, such as healthcare, and in scenarios involving the handling of sensitive and private data of individuals. Another observation is that the risks of AI should be assessed in context [55], considering all the participants and stakeholders in the study for more comprehensive findings. One explanation does not fit all [56]. Moreover, having a human in the loop [57] is important for AI prediction verification and to facilitate effective collaboration and partnership between humans and AI.
In healthcare, AI is typically used by experts as a decision-support system. Consequently, the development of solutions prioritizes the needs and requirements of these knowledgeable professionals. Recognizing this context, it becomes evident that addressing the issues of non-interpretable AI and a lack of human–AI communications is crucial within healthcare systems. This is essential not only to ensure patient safety but also to foster trust, consider ethical implications, promote continuous learning, and ensure compliance with legal and regulatory frameworks. The implementation of artificial intelligence in healthcare comes with more human risks than in other sectors due to its unique capacity to directly impact quality of care and healthcare outcomes.
There are some methods that are advised for confirming the efficacy of AI systems after training on the dataset, such as confusion matrix analysis, using XAI techniques, having experts in the loop to validate the outcome, continuous iteration and training monitoring, validation and testing assessments, bias and fairness assessments, and more. Fairness and bias are critical issues to understand and assess in AI applied in the healthcare sector. For example, AI requires large, robust “training” databases, but many of the databases used for healthcare and medical datasets are limited. These datasets can perpetuate biases that exist in society and cause further health disparities and inequities [58,59,60]. It is critical to have a clear understanding of possible biases that could exist in AI systems, as well as how choosing specific outcome variables and labels can impact predictions [61]. Moreover, studies have found that patients have concerns related to AI use in healthcare, including threats to patient choice, increased costs of healthcare, patient privacy and data security, and biases in the data sources used to train AI [62,63]. Successful use and implementation of AI in healthcare settings will require a thoughtful understanding of social determinants of health, health equity, and ethics. The data used in the study were collected in a manner that safeguards the privacy rights of individuals by implementing robust data collection measures, such as data quality assessments and validations by experts, standard data collection procedures, clinic data security measures, and more. Improving data management procedures, including metadata documentation, collection, cleansing, and validation, is crucial for ensuring the quality, reliability, and usefulness of data. Integrating new software into an existing system requires careful planning to ensure compatibility, compliance with regulations, and a positive user experience by training on balanced datasets, performing risk analysis and assessment to find potential abnormalities in the dataset, enhancing data protection, and more.
The necessity of AI interpretability and human–AI communications in everyday contexts for end users remains poorly understood. The existing research on this topic is limited, but the available findings suggest that this form of transparency may not be significant to users in their everyday experiences [54]. By prioritizing the most important initiatives and investing in mitigating the most disruptive scenarios in the system, the full potential of AI will be unlocked while responsibly integrating it into healthcare practices, benefiting both patients and the healthcare industry as a whole.
The methods of this paper serve as a demonstration, and they emphasize the constraints associated with each disruptive scenario in tandem with the partial consideration of system layers. This paper serves as a means to enhance transparency. By involving patients and care partners, it mitigates the risks of bias and unintended adverse consequences in AI applications within healthcare systems. The scope of initiatives and emerging conditions extends beyond the aforementioned lists and will be further elaborated upon. While this paper primarily focuses on socioeconomic status, it is important to note that future endeavors will encompass other demographic factors linked to health disparities, such as race/ethnicity, sexual orientation, geographic location, and disability status. As an extension to this paper, the study by [32] demonstrated that developing plans with diverse participants in terms of expertise, aptitude, and background changes the most and least disruptive scenarios in the system.
The upcoming interviews will encompass patients, care partners, and community-based organizations that work with populations affected by health disparities. It is crucial to recognize that individuals, including patients, caregiving partners, and community entities, are assuming increasingly important roles. These entities are acknowledged as authoritative sources due to their personal experiences, a form of knowledge gaining equitable recognition in various national contexts. Consequently, their involvement is vital across all stages, starting from the initial conceptualization of AI application goals in healthcare.
The method is well suited for use by healthcare professionals [53] who lack the background necessary to comprehend and employ more complex methodologies that capture the intricacies of artificial intelligence. This argument acknowledges some of the limitations of the method and provides a clear explanation of why these limitations render it fit for its intended purpose.
The advantage of ordinal over cardinal ratings marks an improvement in ease of elicitation. The ratings in this paper are used as a measurement scale and are not vulnerable to ordinal disadvantages. Ref. [64] points out the subjectivity, loss of granularity, and challenges in prioritization associated with these matrices. Ref. [64] suggests the need for more robust, data-driven approaches to improve the accuracy and reliability of risk assessments through methods such as probabilistic risk assessment (PRA), Bayesian networks, or other quantitative methods [64,65]. To overcome this challenge, Krisper introduces different kinds of distributions, both numerically and graphically. Some common distributions of ranks are linear, logarithmic, normally distributed (Gaussian), and arbitrary (fitted) [66]. For instance, for each scenario in this paper, linear distributions of ranks were used; that is, the scales split a value range into equally distributed ranges using the scaling constants {8, 6, 1, 1/6, 1/8}.
As detailed in the Methods and Demonstration section, the disruptiveness (D_k) of scenario s.k is calculated as the sum of the squared differences in priority for each initiative when compared to the baseline scenario. These scores are then normalized within the range of 0 to 100 for easy comparison. It is crucial to interpret these results thoughtfully before engaging in further discussions on alternatives, including nonlinear combinations of statements within multi-criteria decision analysis frameworks. The interpretation should be undertaken by principals and managers, taking into account the context of different systems.
Rozell (2015) describes the challenges of using qualitative and semi-qualitative risk ranking systems. When time and resources are limited, obtaining a simple, fully quantitative risk assessment or an informal expert managerial review and judgment is considered a better approach [67]. In this paper, expert managerial review and judgment are the core of the risk registers across all three layers.
The innovation of the paper is not in the scoring but rather in the measurement of risk via the disruptions of a system order using the scenarios. The readers are encouraged to select their own ways of ordering and re-ordering the initiatives. The identification of scenarios that most disrupt the system order helps healthcare professionals in the characterization of AI-related risks. This characterization occurs in parallel across various system layers: purpose, structure, and function. The method contributes to the reduction of errors by offering a user-friendly interface that enhances accessibility and ease of use. It promotes adaptability, providing flexibility to accommodate diverse healthcare settings and contexts. This usability fosters increased engagement from both experts and stakeholders, facilitating a more inclusive and comprehensive analysis of AI-related risks [68] within the healthcare sector.
As a scenario-based methodology, this study identified the least and most disruptive scenarios within the context of the identified scenarios, based on the available sources and data during the study. Limited access to additional data and documents, as well as restricted stakeholder engagement, are additional limitations. It is important to consider the potential for biases among stakeholders and experts during the interview process, given their diverse motivations. To mitigate any strategic or manipulative behavior that might affect the analysis results, conducting an investigation focused on identifying the most disruptive scenarios could be beneficial. The primary aim was not solely to aggregate stakeholder inputs but also to identify areas requiring further examination, preserving the unique influences of individual stakeholders.
5. Conclusions
This study focuses on research and development priorities for managing the risks associated with trustworthy AI in health applications [69,70]. The methodology serves as a demonstration, and it emphasizes the constraints associated with the chosen scenarios and the partial consideration of system layers. The methodology identifies success criteria, R&D initiatives, and emergent conditions across multiple layers of the healthcare system, including the healthcare center or purpose (Pi) layer, the implant/device or structure (Sig) layer, and the disease diagnosis or function (Phi) layer. The success criteria are consistently applied across all layers of the study.
The core concept of the paper is not to make the judgments required by the model; instead, the focus is on measuring the disruption of the system order. In other words, the emphasis is on adapting a figure of merit to score the initiatives and rank them rather than performing a decision analysis.
This paper strikes a balance between the goals of AI, human rights, and societal values by considering the seven main characteristics of the NIST AI risk management framework as the main success criteria for all layers, while also involving a variety of perspectives, stakeholders, managers, and experts in each system layer in the process. By analyzing these initiatives, emergent conditions, and scenarios within the healthcare system layers, the study identifies the most and least disruptive scenarios based on stakeholder preferences [6]. This information allows stakeholders and managers to make informed decisions regarding resource allocation and prioritize certain initiatives over others.
Figure 4, Figure 6 and Figure 8 illustrate the potential disruptions caused by non-interpretable AI and a lack of human–AI communications, which is in line with the research by [71]. Conversely, Figure 5, Figure 7 and Figure 9 emphasize the significant role of interpretable and explainable AI in the healthcare system [72,73]. As AI-based algorithms gain increasing attention and deliver results in the healthcare sector, it becomes crucial to enhance their understandability for human users, as emphasized by [74].
The initiatives outlined in this paper hold promise for improving communication and mitigating the risks associated with AI in healthcare applications, involving various stakeholders. Moving forward, it is crucial to incorporate the viewpoints of healthcare practitioners and patients who are directly impacted by these approaches.
By acknowledging the biases and perspectives of individuals and communities, the proposed scenarios can effectively capture the diverse weights assigned by different stakeholders [39]. The matter of expert bias is of concern, not only in this context but also across the broader field. Various approaches could be employed to alleviate such biases. These methods include techniques such as simple averaging, assigning importance weights to experts, employing the Analytic Hierarchy Process (AHP), the Fuzzy Analytic Hierarchy Process (FAHP), decomposing complex problems into multiple layers, and others. Stakeholders could be weighted in future efforts according to their level of expertise in the field.
Notably, the methods presented in this paper can offer patients valuable insights into the relevance of AI applications in their treatment plans, promoting transparency for both patients and caregivers. The initiatives and emergent conditions discussed in this study provide a foundation for future research, which will build upon these findings to delve deeper into the subject. Further investigations will expand the analysis to encompass additional layers, such as the boundary (Bet) that exists between patients and society. This expanded scope will explore the wider implications of AI in healthcare systems, shedding light on its impact on various aspects of society.
In summary, addressing the major challenge of risk assessments for AI tools, this paper introduces a context-specific approach to understanding the risks associated with AI, emphasizing that these risks cannot be universally applied. The proposed AI risk framework in this study recognizes this context within three layers of healthcare systems. It provides insights into quantifying risk by assessing the disturbance to the order of AI initiatives in healthcare systems. The objective is to identify scenarios, analyzing their impact on system order, and organizing them from the most to least disruptive. Additionally, this study highlights the significant role of humans in the loop in identifying the risks associated with AI in healthcare and evaluating and improving the suggestions and outcomes of AI systems.
There are additional components of an effective AI risk management framework that may help guarantee the accuracy and consistency of outputs produced by AI. These include fostering diversity among participants [32]; identifying AI effects in terms of ethics [75], law, society, and technology; seeking official guidelines from experts; considering various social values; enhancing and improving unbiased algorithms and data quality by prioritizing privacy and security; and regular maintenance of AI systems [22]. Further components include identifying and minimizing uncertainties and unexpected scenarios; adhering to ethical and legal standards; ensuring the correctness of AI outputs and predictions through various validation and assessment practices, such as employing Explainable AI (XAI) techniques [76]; ensuring human–AI teaming [32] and collaboration; and optimizing AI features and performance during design and implementation. Given different business sizes and resource availability, and based on the experience mentioned above, it is clear that there is a need and opportunity for each system principal to determine an appropriate AI risk management framework.
There are many potential methods for identifying reliable and trustworthy formal guidance for AI risk management. Seeking government guidance and guidelines from officials, R&D findings from industry and academia, verifying compliance with standard and legal protocols, and more could be some of the sources for risk management with AI. There are several safeguards and security measures that can be implemented to ensure the dependability and error-free operation of AI systems, such as validating the results by engaging the patients, medical professionals, and system designers in the loop, identifying and mitigating the risks of uncertain scenarios to the system, regular monitoring, and updating/training the system to adhere to ethical and lawful standards and protocols.
The methods outlined in this paper hold potential for cross-domain applicability beyond the healthcare sector. They can be adapted and applied to diverse fields such as transportation, finance, design, risk analysis of quantum technologies in medicine, and more [77]. By enhancing transparency and addressing the associated risks of AI, this research benefits not only healthcare systems globally but also various other applications and industries. The findings and insights gained from this study can inform and guide the development and implementation of AI systems in a wide range of domains, such as supply chains, disaster management, emergency response, and more, fostering responsible and effective use of this technology. In summary, one view of this work is that it concentrates the opinions and consensus of a few stakeholders and that the conclusions are limited to a specific topic. On the other hand, the method and its rubrics have general relevance to a variety of life science topics across medical diagnosis, epidemiology, pathology, pharmacology, toxicology, microbiology, immunology, and more.