When Models Fail: Trustworthy Anomaly Detection Under Distributional Drift via Dual-Layer Monitoring of Data and AI Behaviour

Miller, Tymoteusz; Durlik, Irmina

doi:10.3390/app16115293

This is an early access version; the complete PDF, HTML, and XML versions will be available soon.

Open Access Article

by Tymoteusz Miller ¹,² and Irmina Durlik ³,*

¹ Institute of Marine and Environmental Sciences, University of Szczecin, 70-364 Szczecin, Poland
² Faculty of Data Science and Information, INTI International University, Nilai 71800, Malaysia
³ Faculty of Navigation, Maritime University of Szczecin, 70-500 Szczecin, Poland
* Author to whom correspondence should be addressed.

Appl. Sci. 2026, 16(11), 5293; https://doi.org/10.3390/app16115293

Submission received: 16 April 2026 / Revised: 12 May 2026 / Accepted: 20 May 2026 / Published: 25 May 2026

(This article belongs to the Special Issue AI Applications in the Maritime Sector)

Abstract

Artificial intelligence (AI) plays an increasingly important role in maritime systems, enabling advanced monitoring, anomaly detection, and decision support. However, the reliability of such systems is challenged by distributional drift, which may significantly degrade model performance over time. While anomaly detection has been extensively studied in the context of data irregularities, considerably less attention has been devoted to detecting anomalies in AI model behaviour itself. In this study, we propose MARLIN-AD (Maritime AI Reliability and Learning Intelligence Network—Anomaly Detection), a dual-layer anomaly detection framework designed to jointly monitor anomalies in data streams and anomalies in model behaviour. The framework integrates data-centric detection methods with model-centric monitoring techniques, including distributional shift detection and prediction stability analysis, within a unified anomaly scoring mechanism. The evaluation is conducted using a fully controlled synthetic data generation process, enabling precise injection of anomalies and systematic simulation of distributional drift across multiple scenarios. Experimental results demonstrate a strong and consistent degradation of model performance under drift conditions. Statistical validation using non-parametric tests, permutation-based inference, and Bayesian bootstrap analysis confirms that the observed degradation is both statistically significant and practically meaningful. In particular, posterior distributions of performance differences indicate a near-zero probability that drifted configurations outperform the baseline model. The results highlight that model degradation under drift exhibits a consistent and structured pattern, reproducible across multiple independent random seeds. 
Furthermore, the study shows that model-centric monitoring provides the primary signal for detecting degradation—a finding corroborated by ablation analysis—while data-centric monitoring enhances interpretability and root-cause attribution. A pilot validation on publicly available Automatic Identification System (AIS) data from the Danish Maritime Authority confirms the applicability of the data-level component to real operational trajectories. The proposed framework contributes to the development of trustworthy AI systems by enabling comprehensive monitoring of both data integrity and model behaviour in dynamic environments.

Keywords: anomaly detection; distributional drift; concept drift; model monitoring; trustworthy AI; anomaly detection systems; machine learning reliability; drift detection; model degradation; AI system monitoring
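
The abstract describes the dual-layer scoring only at a high level, so the following Python sketch is one possible reading under stated assumptions, not the authors' MARLIN-AD implementation. It assumes a single scalar input feature (for example vessel speed), uses a two-sample Kolmogorov-Smirnov (KS) statistic as the data-layer score, takes the larger of the KS shift in the model's outputs and a perturbation-based instability measure as the model-layer score, and combines the two as a weighted sum. The function names, the weights `w_data=0.3` and `w_model=0.7`, the toy `model`, and the Gaussian data are all hypothetical; the heavier model weight only echoes the abstract's finding that model-centric monitoring carries the primary signal.

```python
import math
import random
from bisect import bisect_right
from typing import Callable, Dict, Sequence


def ks_statistic(ref: Sequence[float], cur: Sequence[float]) -> float:
    """Two-sample Kolmogorov-Smirnov statistic D = sup_x |F_ref(x) - F_cur(x)|."""
    ref_s, cur_s = sorted(ref), sorted(cur)
    d = 0.0
    # Both empirical CDFs only jump at observed points, so checking those suffices.
    for x in ref_s + cur_s:
        f_ref = bisect_right(ref_s, x) / len(ref_s)
        f_cur = bisect_right(cur_s, x) / len(cur_s)
        d = max(d, abs(f_ref - f_cur))
    return d


def prediction_instability(model: Callable[[float], float], xs: Sequence[float],
                           noise: float = 0.01, seed: int = 0) -> float:
    """Mean absolute change in the model output under small Gaussian input noise."""
    rng = random.Random(seed)
    return sum(abs(model(x + rng.gauss(0.0, noise)) - model(x)) for x in xs) / len(xs)


def dual_layer_score(model: Callable[[float], float], ref_x: Sequence[float],
                     cur_x: Sequence[float], w_data: float = 0.3,
                     w_model: float = 0.7) -> Dict[str, float]:
    """Hypothetical unified score: weighted sum of a data-layer and a model-layer score."""
    # Data layer: has the input feature distribution moved?
    data_score = ks_statistic(ref_x, cur_x)
    # Model layer: has the output distribution moved, or have outputs become unstable?
    pred_shift = ks_statistic([model(x) for x in ref_x], [model(x) for x in cur_x])
    model_score = max(pred_shift, prediction_instability(model, cur_x))
    # Hypothetical weighting rule; not taken from the article.
    return {"data": data_score, "model": model_score,
            "unified": w_data * data_score + w_model * model_score}


if __name__ == "__main__":
    rng = random.Random(42)

    def model(speed: float) -> float:
        # Toy classifier: probability that a vessel speed (knots) is anomalous.
        return 1.0 / (1.0 + math.exp(-(speed - 11.5)))

    ref = [rng.gauss(10.0, 1.0) for _ in range(500)]
    stable = [rng.gauss(10.0, 1.0) for _ in range(500)]
    drifted = [rng.gauss(13.0, 1.0) for _ in range(500)]
    print("stable: ", dual_layer_score(model, ref, stable))
    print("drifted:", dual_layer_score(model, ref, drifted))
```

One design caveat: the KS statistic is invariant under strictly increasing transformations, so for a monotone single-feature model like the toy one here the output shift equals the input shift. The two layers carry distinct information only when the model is not a monotone function of the monitored feature, for example with multivariate inputs, where a shift in an irrelevant feature moves the data score but not the model score.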
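
The abstract reports that posterior distributions of performance differences give a near-zero probability that drifted configurations outperform the baseline. A minimal sketch of how such a probability can be obtained with Rubin's Bayesian bootstrap follows, assuming one paired performance score per random seed for the baseline and for a drifted configuration. The function names and the per-seed scores in the usage block are hypothetical and are not results from the paper.

```python
import random
from typing import List, Sequence


def bayesian_bootstrap_means(diffs: Sequence[float], n_draws: int = 4000,
                             seed: int = 0) -> List[float]:
    """Posterior draws of the mean of `diffs` via Rubin's Bayesian bootstrap."""
    rng = random.Random(seed)
    draws = []
    for _ in range(n_draws):
        # Dirichlet(1, ..., 1) weights obtained by normalising Exp(1) variates.
        w = [rng.expovariate(1.0) for _ in diffs]
        total = sum(w)
        draws.append(sum(wi * di for wi, di in zip(w, diffs)) / total)
    return draws


def prob_outperforms(drifted: Sequence[float], baseline: Sequence[float],
                     n_draws: int = 4000, seed: int = 0) -> float:
    """Posterior P(mean paired difference drifted - baseline > 0)."""
    if len(drifted) != len(baseline):
        raise ValueError("expects paired scores, e.g. one per random seed")
    diffs = [d - b for d, b in zip(drifted, baseline)]
    draws = bayesian_bootstrap_means(diffs, n_draws, seed)
    return sum(m > 0 for m in draws) / len(draws)


if __name__ == "__main__":
    # Illustrative per-seed scores only; not values reported in the article.
    baseline = [0.91, 0.89, 0.92, 0.90, 0.93]
    drifted = [0.62, 0.58, 0.66, 0.60, 0.64]
    print(prob_outperforms(drifted, baseline))  # 0.0: every paired difference is negative
```

Because each posterior draw is a positively weighted average of the observed differences, the Bayesian bootstrap posterior never leaves the range of the data; when every paired difference has the same sign, the estimated probability is exactly 0 or 1. With few seeds it is therefore worth also reporting a posterior interval of the mean difference.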

Share and Cite

MDPI and ACS Style

Miller, T.; Durlik, I. When Models Fail: Trustworthy Anomaly Detection Under Distributional Drift via Dual-Layer Monitoring of Data and AI Behaviour. Appl. Sci. 2026, 16, 5293. https://doi.org/10.3390/app16115293

AMA Style

Miller T, Durlik I. When Models Fail: Trustworthy Anomaly Detection Under Distributional Drift via Dual-Layer Monitoring of Data and AI Behaviour. Applied Sciences. 2026; 16(11):5293. https://doi.org/10.3390/app16115293

Chicago/Turabian Style

Miller, Tymoteusz, and Irmina Durlik. 2026. "When Models Fail: Trustworthy Anomaly Detection Under Distributional Drift via Dual-Layer Monitoring of Data and AI Behaviour." Applied Sciences 16, no. 11: 5293. https://doi.org/10.3390/app16115293

APA Style

Miller, T., & Durlik, I. (2026). When Models Fail: Trustworthy Anomaly Detection Under Distributional Drift via Dual-Layer Monitoring of Data and AI Behaviour. Applied Sciences, 16(11), 5293. https://doi.org/10.3390/app16115293

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers.
