Cross-Assessment & Verification for Evaluation (CAVe) Framework for AI Risk and Compliance Assessment Using a Cross-Compliance Index (CCI)
Abstract
1. Introduction
- Cross-framework indicator intersection and mapping. We systematically identify and map a common set of measurable indicators across the NIST AI Risk Management Framework, the EU AI Act, and ISO/IEC AI standards by analyzing their control objectives and technical requirements (Section 3).
- Policy-aware quantitative compliance scoring. We propose a unified quantitative evaluation mechanism that integrates metric normalization, framework-specific thresholds, and evidence-based penalty factors into a single CCI, enabling consistent comparison across heterogeneous regulatory frameworks (Section 3.2 and Section 3.3); a schematic form of the index is sketched after this list.
- Tunable evaluation reflecting regulatory priorities. We introduce a weighting scheme that allows the evaluation outcome to be adjusted according to the regulatory philosophy or domain context. The effects of framework-specific weights are incorporated into the final CCI computation and algorithmic evaluation process (Section 3.4).
- Empirical validation of cross-framework behavior. We validate the proposed framework through controlled experiments that demonstrate the effects of metric variation, threshold enforcement, and policy-driven weighting on the resulting CCI, confirming the interpretability and reproducibility of the evaluation model (Section 4).
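Schematically (a sketch only; the precise definitions of the subscores, thresholds, penalties, and weights are given in Section 3.2, Section 3.3 and Section 3.4, and the symbols mirror the notation used later in Algorithm 1), the index is a doubly weighted aggregation of threshold-scored, evidence-adjusted requirement subscores:

```latex
% Schematic form of the CCI (a sketch; exact definitions follow in Sections 3.2-3.4).
% s_i^f : threshold-scored, evidence-adjusted subscore of requirement i in framework f
% w_i^f : internal requirement weight;  W_f : cross-framework weight
% R_f   : the set of requirements mapped to framework f (notation assumed here)
\mathrm{CCI}
  \;=\; \sum_{f \in \{\mathrm{NIST},\,\mathrm{EU},\,\mathrm{ISO}\}} W_f
        \sum_{i \in R_f} w_i^{f}\, s_i^{f},
  \qquad \sum_{f} W_f \;=\; 1 .
```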
2. Related Work
2.1. NIST AI RMF
2.2. EU AI Act
2.3. ISO/IEC 23894 and ISO/IEC 42001
2.4. Related Work on AI Risk Management and Governance
2.5. Recent Advances in AI Security and Privacy
3. Proposed Framework
3.1. Framework Structure
- Measure common metrics (accuracy, robustness, privacy, fairness) for each framework.
- Normalize each metric to produce comparable subscores.
- Aggregate the weighted normalized scores to compute the overall CCI, applying thresholds and veto rules for safety-critical validation (a minimal code sketch of these steps follows this list).
- Privacy quantifies the system’s resistance to information leakage in response to growing data protection mandates; reflecting the diverse threat landscape that ranges from membership inference to attribute inference attacks [23,26], the framework adopts a metric specifically evaluating susceptibility to re-identification.
- Fairness serves as a critical social safeguard to prevent discriminatory outcomes; while recognizing that fairness is an inherently multifaceted concept with varying normative definitions [27], CAVe operationalizes it as a measurable, group-level performance disparity to ensure that compliance assessment remains both reproducible and objective.
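Taken together, these steps can be illustrated with a minimal sketch. The normalization ranges, directions, and raw values below are illustrative assumptions rather than the calibrated configuration used in Section 4; the point is that indicators with opposite polarity (higher accuracy is safer, but a higher leakage ratio is riskier) are mapped onto a common "higher is safer" [0, 1] scale before aggregation into the CCI.

```python
# Minimal sketch of the measure -> normalize -> aggregate flow for the four
# common indicators. Ranges and directions are illustrative assumptions.
from typing import Dict

# direction "up": a higher raw value is safer; "down": a lower raw value is safer
METRIC_CONFIG: Dict[str, Dict] = {
    "accuracy":   {"direction": "up",   "lo": 0.5, "hi": 1.0},
    "robustness": {"direction": "down", "lo": 0.0, "hi": 0.5},  # OOD performance drop
    "privacy":    {"direction": "down", "lo": 0.0, "hi": 0.2},  # leakage risk ratio
    "fairness":   {"direction": "down", "lo": 0.0, "hi": 0.2},  # max group gap
}


def normalize(name: str, raw: float) -> float:
    """Map a raw metric onto a [0, 1] 'safe' subscore where higher is better."""
    cfg = METRIC_CONFIG[name]
    x = (raw - cfg["lo"]) / (cfg["hi"] - cfg["lo"])
    x = min(1.0, max(0.0, x))                       # clip to [0, 1]
    return x if cfg["direction"] == "up" else 1.0 - x


raw = {"accuracy": 0.93, "robustness": 0.12, "privacy": 0.04, "fairness": 0.06}
subscores = {k: normalize(k, v) for k, v in raw.items()}
print(subscores)  # the subscores are now directly comparable and ready to aggregate
```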
3.2. Metric Normalization and Subscore Calculation
3.3. Evidence-Based Penalty and Framework Scoring
3.4. CAVe Algorithm and Final CCI Computation
- Raw metric values (normalized to [0, 1] inside the algorithm)
- Threshold parameters
- Internal requirement weights
- Cross-framework weights
- Evidence-based penalty factors
- Final CCI
- Assigned grade (Pass/Conditional/Fail)
| Algorithm 1 CAVe Algorithm for CCI Calculation |
| Require: Raw metrics $\{m_k\}$, normalization configs, frameworks, and weights $\{w_i^{f}\}$, $\{W_f\}$ |
| Ensure: CCI score and grade |
| 1: for each metric $k$ do |
| 2: Normalize $m_k$ to $x_k$ according to its config and clip to $[0, 1]$ |
| 3: end for |
| 4: for each requirement $i$ in framework $f$ do |
| 5: Compute partial scores $s_i^{f}$ for requirement $i$ |
| 6: Aggregate the partial scores via AND/OR/W-AVG |
| 7: Apply evidence deduction $P_f$ |
| 8: end for |
| 9: $S_f \leftarrow \sum_i w_i^{f}\, s_i^{f}$;  $\mathrm{CCI} \leftarrow \sum_f W_f\, S_f$ |
| 10: Apply grading thresholds and veto rule |
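The following Python rendering of Algorithm 1 is a sketch under assumed inputs: the requirement groupings, thresholds, evidence penalties, grading cut-offs (0.8 for Pass, 0.5 for Conditional), and veto floor are placeholders for illustration, not the calibrated values of the framework. It consumes the inputs listed above (normalized metrics, threshold parameters, internal and cross-framework weights, penalty factors) and returns the final CCI and grade, following the control flow of the pseudocode.

```python
# Illustrative rendering of Algorithm 1; all numeric settings are placeholders.
from typing import Dict, List, Optional, Tuple


def partial_score(x: float, threshold: float) -> float:
    """Threshold-based partial score: zero below the regulatory minimum,
    otherwise the normalized metric value itself."""
    return x if x >= threshold else 0.0


def aggregate(scores: List[float], mode: str,
              weights: Optional[List[float]] = None) -> float:
    """Combine partial scores into a requirement score via AND/OR/W-AVG."""
    if mode == "AND":          # weakest link: every condition must hold
        return min(scores)
    if mode == "OR":           # any satisfying evidence suffices
        return max(scores)
    weights = weights or [1.0 / len(scores)] * len(scores)
    return sum(w * s for w, s in zip(weights, scores))   # W-AVG


def cave_cci(x: Dict[str, float],
             frameworks: Dict[str, dict],
             cross_weights: Dict[str, float],
             veto_floors: Dict[str, float]) -> Tuple[float, str]:
    """x: metrics already normalized to [0, 1]; returns (CCI, grade)."""
    framework_scores: Dict[str, float] = {}
    for f, spec in frameworks.items():
        s_f = 0.0
        for req in spec["requirements"]:
            parts = [partial_score(x[m], t) for m, t in req["thresholds"].items()]
            s = aggregate(parts, req["mode"])
            s *= req.get("evidence_penalty", 1.0)   # evidence-based deduction
            s_f += req["weight"] * s                # internal requirement weight
        framework_scores[f] = s_f
    cci = sum(cross_weights[f] * framework_scores[f] for f in frameworks)

    # Veto rule: a safety-critical metric below its floor forces a Fail
    # regardless of the aggregate score.
    if any(x[m] < floor for m, floor in veto_floors.items()):
        return cci, "Fail"
    if cci >= 0.8:             # placeholder grading cut-offs
        return cci, "Pass"
    if cci >= 0.5:
        return cci, "Conditional"
    return cci, "Fail"


# Toy usage (all groupings, thresholds, and weights are made up):
frameworks = {
    "EU": {"requirements": [
        {"thresholds": {"accuracy": 0.7, "fairness": 0.6}, "mode": "AND", "weight": 0.6},
        {"thresholds": {"privacy": 0.5}, "mode": "W-AVG", "weight": 0.4,
         "evidence_penalty": 0.9},
    ]},
    "NIST": {"requirements": [
        {"thresholds": {"robustness": 0.6, "accuracy": 0.7}, "mode": "W-AVG", "weight": 1.0},
    ]},
}
x = {"accuracy": 0.92, "robustness": 0.75, "privacy": 0.80, "fairness": 0.85}
print(cave_cci(x, frameworks, {"EU": 0.6, "NIST": 0.4}, {"fairness": 0.3}))
```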
4. Validation
4.1. Materials and Methods
- Datasets: The spam-classification experiment uses the UCI SMS Spam Collection dataset, which is released under the Creative Commons Attribution 4.0 (CC BY 4.0) license. The healthcare experiment uses the MIMIC-CXR dataset, which is distributed under the PhysioNet credentialed Data Use Agreement (DUA) and may be used only for approved research purposes. Both datasets were used in accordance with their respective licensing conditions.
- Model configuration: In the spam scenario, a Multinomial Naïve Bayes classifier was trained using a Bag-of-Words representation extracted from the SMS corpus. In the healthcare scenario, a CheXNet-style convolutional neural network was applied to chest X-ray images from the MIMIC-CXR dataset.
- Evaluation procedure: Each model was evaluated using the four indicators defined in the CAVe framework: accuracy, robustness, privacy, and fairness. Robustness was measured through input perturbation in the spam scenario and through distribution-shift evaluation in the healthcare scenario. Fairness was assessed based on group-level performance comparisons. Privacy risk was estimated using generalization-gap-based re-identification susceptibility and membership-inference tendencies (a computational sketch of these two derived indicators follows this list).
- CCI computation: All raw measurements were normalized to the range [0, 1] following the procedure described in Section 3. Threshold scoring, evidence-based penalties, and framework-level weights were then applied to compute the final CCI for each experiment.
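The following sketch shows one plausible way to compute the two derived indicators mentioned above, namely the generalization-gap-based privacy proxy and the group-level fairness gap. The function names and exact formulas are illustrative assumptions, not the authors' reference implementation.

```python
# Hedged sketch of the two derived indicators used in the validation setup.
import numpy as np


def generalization_gap_privacy_risk(train_acc: float, test_acc: float) -> float:
    """Generalization-gap proxy for privacy risk: a larger train/test gap
    suggests memorization and hence higher membership-inference /
    re-identification susceptibility."""
    return max(0.0, train_acc - test_acc)


def max_group_gap(y_true, y_pred, groups) -> float:
    """Fairness indicator: maximum absolute accuracy gap across groups."""
    y_true, y_pred, groups = map(np.asarray, (y_true, y_pred, groups))
    accs = [float((y_pred[groups == g] == y_true[groups == g]).mean())
            for g in np.unique(groups)]
    return max(accs) - min(accs)


# Toy values only:
print(generalization_gap_privacy_risk(0.99, 0.95))                      # ~0.04
print(max_group_gap([1, 0, 1, 1], [1, 0, 0, 1], ["a", "a", "b", "b"]))  # 0.5
```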
4.2. Spam Mail Filtering
4.3. Healthcare AI
5. Discussion
5.1. Experiment 1: Baseline Metric Sensitivity
5.2. Experiment 2: Threshold Sensitivity Analysis
5.3. Experiment 3: Weight Sensitivity Analysis
5.4. Integrated Interpretation
- Metric changes produce predictable, linear improvements in the CCI, reflecting the technical performance of the AI model.
- Threshold changes produce discontinuous drops, acting as an absolute safety filter that enforces minimum regulatory criteria (a small numerical example follows this list).
- Weight changes tune the evaluation outcome according to the regulatory philosophy of a specific jurisdiction or industry.
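A toy calculation (assumed 0.80 threshold and made-up accuracy values, not the paper's settings) makes the contrast between the first two behaviors concrete: a plain normalized subscore responds linearly to a metric change, while a threshold-gated subscore drops discontinuously to zero the moment the metric falls below the regulatory floor.

```python
# Toy illustration of linear vs. threshold-gated subscore behavior.

def normalized(x: float) -> float:
    """Plain normalized subscore, clipped to [0, 1]: responds linearly."""
    return max(0.0, min(1.0, x))


def thresholded(x: float, threshold: float) -> float:
    """Threshold-gated subscore: zero below the regulatory minimum."""
    return normalized(x) if x >= threshold else 0.0


for acc in (0.79, 0.80, 0.81, 0.90):
    print(f"acc={acc:.2f}  linear={normalized(acc):.2f}  gated={thresholded(acc, 0.80):.2f}")
# At acc=0.79 the gated score drops to 0.00 (discontinuous), while the
# linear score changes by only 0.01.
```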
6. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
Abbreviations
| AI | Artificial Intelligence |
| AIMS | Artificial Intelligence Management System |
| AI RMF | AI Risk Management Framework |
| AIRO | AI Risk Ontology |
| AUC | Area Under the Curve |
| AUROC | Area Under the Receiver Operating Characteristic |
| C2AIRA | Concrete and Connected AI Risk Assessment |
| CAVe | Cross-Assessment & Verification for Evaluation |
| CC BY | Creative Commons Attribution |
| CCI | Cross-Compliance Index |
| CV | Cross-Validation |
| DUA | Data Use Agreement |
| EO | Equal Opportunity |
| EU | European Union |
| F1 | F1 Score |
| FGSM | Fast Gradient Sign Method |
| FPR | False Positive Rate |
| GPAI | General-Purpose Artificial Intelligence |
| IEC | International Electrotechnical Commission |
| IID | Independent and Identically Distributed |
| ISO | International Organization for Standardization |
| LLM | Large Language Model |
| MTTR | Mean Time To Recovery |
| NIST | National Institute of Standards and Technology |
| OOD | Out-of-Distribution |
| PGD | Projected Gradient Descent |
| PII | Personally Identifiable Information |
| PMM | Post-Market Monitoring |
| SAI | Structured AI Indicators |
| SMS | Short Message Service |
| SPD | Statistical Parity Difference |
| TPR | True Positive Rate |
| UCI | University of California, Irvine |
| UML | Unified Modeling Language |
| XAI | Explainable AI |
References
- European Parliament. EU AI Act: First Regulation on Artificial Intelligence; European Parliament: Strasbourg, France, 2023.
- ISO/IEC 42001:2023; Information Technology—Artificial Intelligence—Management System. ISO/IEC: Geneva, Switzerland, 2023. Available online: https://www.iso.org/standard/81230.html (accessed on 8 January 2026).
- Vassilev, A.; Oprea, A.; Fordyce, A.; Anderson, H. Adversarial Machine Learning: A Taxonomy and Terminology of Attacks and Mitigations. In NIST Trustworthy and Responsible AI NIST AI 100-2e2023; National Institute of Standards and Technology (NIST): Gaithersburg, MD, USA, 2024. [Google Scholar] [CrossRef]
- Shrestha, S.; Banda, C.; Mishra, A.K.; Djebbar, F.; Puthal, D. Investigation of Cybersecurity Bottlenecks of AI Agents in Industrial Automation. Computers 2025, 14, 456. [Google Scholar] [CrossRef]
- National Institute of Standards and Technology. Artificial Intelligence Risk Management Framework (AI RMF 1.0); National Institute of Standards and Technology: Gaithersburg, MD, USA, 2023; NIST AI 100-1. Available online: https://nvlpubs.nist.gov/nistpubs/ai/nist.ai.100-1.pdf (accessed on 8 January 2026).
- ISO/IEC 23894:2023; Information Technology—Artificial Intelligence—Guidance on Risk Management. ISO/IEC: Geneva, Switzerland, 2023. Available online: https://www.iso.org/standard/77304.html (accessed on 8 January 2026).
- Veale, M.; Zuiderveen Borgesius, F. Demystifying the Draft EU Artificial Intelligence Act. arXiv 2021, arXiv:2107.03721. [Google Scholar]
- Hacker, P.; Engel, A.; Mauer, M. Regulating ChatGPT and other Large Generative AI Models. In Proceedings of the 2023 ACM Conference on Fairness, Accountability, and Transparency (FAccT ’23), Chicago, IL, USA, 12–15 June 2023. [Google Scholar] [CrossRef]
- Brundage, M.; Avin, S.; Wang, J.; Belfield, H.; Krueger, G.; Hadfield, G.; Khlaaf, H.; Yang, J.; Toner, H.; Fong, R.; et al. Toward Trustworthy AI Development: Mechanisms for Supporting Verifiable Claims. arXiv 2020, arXiv:2004.07213. [Google Scholar] [CrossRef]
- Morley, J.; Floridi, L.; Kinsey, L.; Elhalal, A. From What to How: An Initial Review of Publicly Available AI Ethics Tools, Methods and Research to Translate Principles into Practices. Sci. Eng. Ethics 2020, 26, 2141–2168. [Google Scholar] [CrossRef] [PubMed]
- Schiff, D.; Rakova, B.; Ayesh, A.; Fanti, A.; Lennon, M. Explaining the Principles to Practices Gap in AI. IEEE Technol. Soc. Mag. 2021, 40, 81–94. [Google Scholar] [CrossRef]
- National Institute of Standards and Technology. Artificial Intelligence Risk Management Framework: Generative Artificial Intelligence Profile; NIST Trustworthy and Responsible AI: Gaithersburg, MD, USA, 2024.
- Barrett, A.M.; Hendrycks, D.; Newman, J.; Nonnecke, B. Actionable guidance for high-consequence AI risk management: Towards standards addressing AI catastrophic risks. arXiv 2022, arXiv:2206.08966. [Google Scholar]
- Smith, G.; Stanley, K.D.; Marcinek, K.; Cormarie, P.; Gunashekar, S. General-Purpose Artificial Intelligence (GPAI) Models and GPAI Models with Systemic Risk: Classification and Requirements for Providers; RAND: Arlington, VA, USA, 2024. [Google Scholar]
- Simonetta, A.; Paoletti, M.C. ISO/IEC Standards and Design of an Artificial Intelligence System; CEUR: Aachen, Germany, 2024. [Google Scholar]
- Boza, P.; Evgeniou, T. Implementing AI Principles: Frameworks, Processes, and Tools. INSEAD Work. Pap. 2021. [Google Scholar] [CrossRef]
- Golpayegani, D.; Pandit, H.J.; Lewis, D. AIRO: An Ontology for Representing AI Risks Based on the Proposed EU AI Act and ISO Risk Management Standards. Semantic Web 2022, 55, 51–65. [Google Scholar] [CrossRef]
- Xia, B.; Lu, Q.; Perera, H.; Zhu, L.; Xing, Z.; Liu, Y.; Whittle, J. Towards Concrete and Connected AI Risk Assessment (C2AIRA): A Systematic Mapping Study. arXiv 2023, arXiv:2301.11616. [Google Scholar]
- Karras, D.A. On Modelling a Reliable Framework for Responsible and Ethical AI in Digitalization and Automation: Advancements and Challenges. WSEAS Trans. Financ. Eng. 2025, 3, 333–350. [Google Scholar] [CrossRef]
- Cui, Q.; You, X.; Wei, N.; Nan, G.; Zhang, X.; Zhang, J.; Lyu, X.; Ai, M.; Tao, X.; Feng, Z.; et al. Overview of AI and communication for 6G network: Fundamentals, challenges, and future research opportunities. Sci. China Inf. Sci. 2025, 68, 171301. [Google Scholar] [CrossRef]
- Yuan, S.; Xu, G.; Li, H.; Zhang, R.; Qian, X.; Jiang, W.; Cao, H.; Zhao, Q. FIGhost: Fluorescent Ink-based Stealthy and Flexible Backdoor Attacks on Physical Traffic Sign Recognition. arXiv 2025, arXiv:2505.12045. [Google Scholar] [CrossRef]
- Zhou, Y.; Ni, T.; Lee, W.B.; Zhao, Q. A Survey on Backdoor Threats in Large Language Models (LLMs): Attacks, Defenses, and Evaluations. arXiv 2025, arXiv:2502.05224. [Google Scholar] [CrossRef]
- Choquette-Choo, C.A.; Tramer, F.; Carlini, N.; Papernot, N. Label-Only Membership Inference Attacks. arXiv 2021, arXiv:2007.14321. [Google Scholar] [CrossRef]
- Upreti, R.; Pedro, G.; Lind, A.E.; Yazidi, A. Security and privacy in large language models: A survey. Int. J. Inf. Secur. 2024, 23, 2287–2314. [Google Scholar] [CrossRef]
- Nastoska, A.; Jancheska, B.; Rizinski, M.; Trajanov, D. Evaluating Trustworthiness in AI: Risks, Metrics, and Applications Across Industries. Electronics 2025, 14, 2717. [Google Scholar] [CrossRef]
- Hu, H.; Salcic, Z.; Sun, L.; Dobbie, G.; Yu, P.S.; Zhang, X. Membership Inference Attacks on Machine Learning: A Survey. arXiv 2022, arXiv:2103.07853. [Google Scholar] [CrossRef]
- Mehrabi, N.; Morstatter, F.; Saxena, N.; Lerman, K.; Galstyan, A. A Survey on Bias and Fairness in Machine Learning. ACM Comput. Surv. 2021, 54, 115. [Google Scholar] [CrossRef]
- Raji, I.D.; Smart, A.; White, R.N.; Mitchell, M.; Gebru, T.; Hutchinson, B.; Smith-Loud, J.; Theron, D.; Barnes, P. Closing the AI Accountability Gap: Defining an End-to-End Framework for Internal Algorithmic Auditing. In Proceedings of the 2020 Conference on Fairness, Accountability, and Transparency (FAccT ’20), Barcelona, Spain, 27–30 January 2020. [Google Scholar] [CrossRef]
- Almeida, T.A.; Hidalgo, J.M.G.; Yamakami, A. Contributions to the study of SMS spam filtering: New collection and results. In Proceedings of the 11th ACM Symposium on Document Engineering, Mountain View, CA, USA, 19–22 September 2011; pp. 259–262. [Google Scholar]
- Shokri, R.; Stronati, M.; Song, C.; Shmatikov, V. Membership Inference Attacks against Machine Learning Models. arXiv 2017, arXiv:1610.05820. [Google Scholar] [CrossRef]
- Rajpurkar, P.; Irvin, J.; Zhu, K.; Yang, B.; Mehta, H.; Duan, T.; Ding, D.; Bagul, A.; Langlotz, C.; Shpanskaya, K.; et al. CheXNet: Radiologist-Level Pneumonia Detection on Chest X-Rays with Deep Learning. arXiv 2017, arXiv:1711.05225. [Google Scholar]
- Chen, H.; Alfred, M.; Brown, A.D.; Atinga, A.; Cohen, E. Intersection of Performance, Interpretability, and Fairness in Neural Prototype Tree for Chest X-Ray Pathology Detection: Algorithm Development and Validation Study. JMIR Form. Res. 2024, 8, e59045. [Google Scholar] [CrossRef] [PubMed]
- OECD. OECD Principles on Artificial Intelligence. 2019. Available online: https://oecd.ai/en/ai-principles (accessed on 8 January 2026).
- UNESCO. Recommendation on the Ethics of Artificial Intelligence. 2021. Available online: https://unesdoc.unesco.org/ark:/48223/pf0000380455 (accessed on 8 January 2026).




| Symbol | Description |
|---|---|
| $m_k$ | Raw metric value for indicator $k$ |
| $x_k$ | Normalized (safe) metric value in $[0, 1]$ |
| $s_i^{f}$ | Threshold-based partial score |
| $w_i^{f}$ | Internal weight of requirement $i$ for framework $f$ |
| $W_f$ | Cross-framework weight |
| $P_f$ | Evidence-based penalty factor |
| CCI | Cross-Compliance Index |
| Category | Indicator | Unit/Metric | Measurement Protocol and Data Source | Norm. Dir. | NIST | EU AI Act | ISO |
|---|---|---|---|---|---|---|---|
| Technical | Accuracy | Error rate, F1, AUC | IID validation set performance, k-fold CV | ↑ | ✓ | △ | ✓ |
| Technical | Robustness | OOD perf. drop (%), attack succ. rate (%) | Distribution-shift and adversarial tests (PGD/FGSM) | ↓ | ✓ | △ | ✓ |
| Technical | Security | Vulnerability count, MTTR, incidents | Security scan report, incident response log | ↓ | ✓ | △ | ✓ |
| Technical | Privacy | PII leakage risk ratio (%) | Membership inference, encryption/decryption audit | mixed | ✓ | ✓ | △ |
| Technical | Fairness | Max group gap (%), SPD/EO diff. | Groupwise performance comparison (balanced sample) | ↓ | ✓ | △ | ✓ |
| Technical | Transparency | Log coverage (%), XAI score | Mandatory event log coverage, XAI quality check | ↑ | △ | ✓ | ✓ |
| Regulatory | Human Oversight | Intervention rate, bypass fail rate | Human-in-the-loop stop/retry testing | ↑ | △ | ✓ | ✓ |
| Regulatory | Post-Market Monitoring | Alert sens./prec., report delay | Operational telemetry, PMM compliance rate | mixed | △ | ✓ | ✓ |
| Governance | Governance/Auditability | Doc. completeness, traceability (%) | Policy documentation, role trace logs, audits | ↑ | △ | △ | ✓ |
| Governance | Lifecycle Management | Re-eval. cycle, review rate | Stage-wise risk review, approval record (ISO 23894) | ↑ | △ | △ | ✓ |
| Metric k | Raw Value | Normalized |
|---|---|---|
| Accuracy | ||
| Robustness | ||
| Fairness | ||
| Privacy |
| Framework | Weight |
|---|---|
| NIST | |
| EU | |
| ISO |
| Metric k | Raw Value | Normalized |
|---|---|---|
| Accuracy | ||
| Robustness | ||
| Fairness | ||
| Privacy |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content. |
© 2026 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license.