Safety-Calibrated Out-of-Distribution Prediction via Contrastive Embeddings for Safety-Critical Systems

Aseeri, Ahmad O.

doi:10.3390/electronics15112408

Open Access Article

by

Ahmad O. Aseeri

Department of Computer Science, College of Computer Engineering and Sciences, Prince Sattam bin Abdulaziz University, Al-Kharj 11942, Saudi Arabia

Electronics 2026, 15(11), 2408; https://doi.org/10.3390/electronics15112408

Submission received: 7 May 2026 / Revised: 28 May 2026 / Accepted: 28 May 2026 / Published: 1 June 2026

(This article belongs to the Special Issue Intelligent Optimization and Machine Learning in Power and Energy Systems)

Abstract

Trustworthy deployment of artificial intelligence in safety-critical systems requires accurate diagnosis of anticipated scenarios and reliable rejection of out-of-distribution (OOD) inputs that fall outside the modeled operational scope. Existing data-driven diagnostic models typically assume that test inputs are drawn from the training distribution or rely on heuristically tuned thresholds that lack enforceable safety guarantees. This article presents SCOPE (Safety-Calibrated Out-of-distribution Prediction via Contrastive Embeddings), a framework integrating supervised contrastive learning with split-conformal prediction to provide statistically grounded OOD rejection with finite-sample false-alarm control. SCOPE employs a causal residual convolutional encoder to map multivariate sensor streams into a hyperspherical embedding space with a compact, class-specific structure. A k-nearest-neighbor density nonconformity score, computed in the encoder embedding space, flags transients that occupy low-density regions relative to known accident manifolds; an ablation shows that this density score outperforms prototype distance, entropy, and conservative maximum fusion as well as a panel of standard OOD baselines (MSP, ODIN, energy, Mahalanobis, OpenMax, MC-dropout, and a reconstruction autoencoder). To support temporally evolving trajectories, SCOPE aggregates window-level scores under a monotone decision policy and performs trajectory-level conformal calibration, yielding distribution-free guarantees that bound the probability of falsely rejecting a known accident run. SCOPE is evaluated on the Nuclear Power Plant Accident Data (NPPAD) benchmark using high-openness splits that withhold entire accident families as unknowns, and all metrics are reported as mean ± standard deviation across multiple random seeds. 
Results demonstrate strong diagnostic accuracy on accepted trajectories, conservative false-alarm rates satisfying user-specified safety constraints across multiple operating points, and timely rejection of unseen accident mechanisms, making SCOPE suitable for deployment in safety-critical monitoring applications.

Keywords:

trustworthy AI; nuclear power safety; safety-critical systems; conformal prediction; supervised contrastive learning; causal representation learning; uncertainty quantification

1. Introduction

The increasing adoption of deep learning methods in safety-critical systems has raised concerns about reliability under distributional shifts [1]. In nuclear power plants, diagnostic models ingest high-dimensional, multivariate sensor streams to support decision-making, where overconfident errors can have severe operational consequences. Although modern time-series classifiers can achieve strong performance when test conditions match training data, their validity typically hinges on an implicit in-distribution assumption [2]. In practice, deployments may encounter unseen accident mechanisms, compound faults, rare transients, or configuration changes not represented in the training data. When a closed-set classifier is forced to assign a familiar label to an out-of-distribution (OOD) trajectory, it can mask unawareness with high-confidence misdiagnoses, thereby undermining trust in the monitoring pipeline.

Out-of-distribution recognition addresses this reliability gap by requiring a model to classify samples from known conditions while abstaining or rejecting inputs that do not conform to the training distribution. Recent studies on nuclear diagnosis have begun to incorporate OOD-style rejection mechanisms, including prototype-based approaches and post hoc methods [3,4,5,6]. While these works demonstrate that unknown states can often be detected without degrading in-distribution accuracy, their rejection rules are typically governed by heuristically chosen thresholds based on softmax confidence, reconstruction errors, or distance scores. Such thresholds lack an interpretable operational control parameter and do not provide auditable guarantees on the false-alarm risk, defined as the probability of incorrectly rejecting truly known and acceptable conditions. Furthermore, much of the literature evaluates diagnosis using static snapshots, whereas real-time monitoring is inherently sequential with decisions evolving over sliding windows as sensor evidence accumulates.

Conformal prediction offers a principled route to safety calibration by transforming nonconformity scores into statistically valid decision rules with finite-sample error control under exchangeability assumptions [7,8]. This property is attractive for safety-critical settings because it exposes an explicit parameter $\alpha$ that directly specifies an upper bound on the probability of incorrectly rejecting a truly known condition. However, applying conformal prediction to sequential sensor streams requires careful treatment of the calibration unit. Window-level calibration would require independence across overlapping windows, an assumption violated by construction in sliding-window deployment. Recent work on conformal inference under temporal dependence highlights both the practical importance and the methodological challenges of this regime [9,10].

This work introduces SCOPE, a framework that couples discriminative temporal representation learning with trajectory-level conformal calibration to obtain out-of-distribution rejection rules with operational meaning. SCOPE trains a causal residual one-dimensional convolutional encoder using a supervised contrastive objective [11] to map multivariate time windows into a hyperspherical embedding space [12] with a compact class-consistent structure for known accident families. Class prototypes on this hypersphere support known-class prediction, while a k-nearest-neighbor density score over the stored training embeddings serves as the nonconformity score, flagging transients that occupy sparsely populated regions of the known-accident manifold. A component ablation shows that this density score outperforms the cosine-based prototype distance, the normalized entropy, and their conservative-maximum fusion for open-set detection. To accommodate sequential monitoring, SCOPE aggregates window-level scores into a single trajectory-level score under a monotone rejection policy and applies split-conformal calibration at the run level, requiring only run-wise exchangeability rather than window-level independence. To the best of my knowledge, no prior data-driven approach to nuclear accident diagnosis provides trajectory-level statistical guarantees that bound the probability of falsely rejecting a truly known accident under a user-specified safety level. SCOPE is evaluated on the Nuclear Power Plant Accident Data (NPPAD) benchmark dataset [13] using a high-openness protocol that withholds entire accident families as unknowns, and both safety compliance and decision timeliness are assessed via run-level reliability and time-to-rejection analyses. The main contributions of this work can be summarized as follows:

Nuclear power plant accident diagnosis is formulated as an out-of-distribution prediction problem subject to explicit safety constraints, prioritizing statistical control of false-alarm rates over heuristic thresholding.
A causal, contrastive temporal representation framework is developed to structure multivariate sensor streams into compact, class-consistent manifolds on a hyperspherical embedding space.
A k-nearest-neighbor density nonconformity score is introduced; ablated against prototype distance, entropy, and conservative maximum fusion; and split-conformal calibration is performed at the trajectory level under a monotone rejection policy, providing finite-sample control over the probability that a known accident run is ever rejected as unknown.
SCOPE is evaluated on the NPPAD benchmark using a high-openness protocol, and reliability, per-family detection behavior, and time-to-rejection metrics are reported to characterize both safety compliance and operational responsiveness.

The remainder of this article is organized as follows. Section 2 reviews related work in nuclear accident diagnosis, out-of-distribution recognition, and conformal prediction. Section 3 presents the SCOPE methodology, including the causal temporal encoder, nonconformity scoring, and trajectory-level conformal calibration. Section 4 describes the experimental setup, dataset handling, and evaluation protocol. Section 5 reports the empirical results, including safety calibration, OOD detection performance, time-to-rejection analysis, and sensitivity studies. Finally, Section 6 concludes the paper and outlines directions for future research.

2. Related Works

2.1. Data-Driven Accident Diagnosis in Nuclear Power Plants

The increasing availability of high-fidelity simulation data has driven a shift from first-principles modeling to data-driven diagnosis in the nuclear domain. Deep learning architectures have demonstrated strong performance in classifying complex accident transients. Saeed et al. [14] applied deep neural networks to fault diagnosis with attention to overlapping signatures across severity levels. Lee et al. [15] demonstrated that deep convolutional neural networks outperform shallow classifiers for abnormality detection in nuclear power plants. She et al. [16] employed recurrent architectures to capture temporal dependencies in loss-of-coolant accident sequences. However, a common limitation in these works is the closed-world assumption, which presumes that all test samples belong to one of the training classes. When presented with an unknown accident type, such models are forced to assign a familiar label, potentially producing confident but incorrect diagnoses.

More recently, Transformer-based sequence models and time-series foundation models have advanced general-purpose temporal representation learning, and uncertainty-aware deep learning, such as Monte Carlo dropout and deep ensembles, has been used to flag unreliable predictions. In the nuclear setting, however, labeled accident realizations are scarce (on the order of $10^{2}$–$10^{3}$ runs per benchmark), a regime in which the translation-invariance inductive bias of dilated causal convolutions, i.e., temporal convolutional networks (TCNs) [17], is typically more sample-efficient than attention learned from scratch. SCOPE accordingly adopts a causal residual CNN (a TCN-style encoder); rather than contrasting whole architectures, representative OOD scoring rules and uncertainty-aware baselines are compared on this common representation in Section 5.

2.2. Out-of-Distribution Detection and Open Set Recognition

To address the limitations of closed-set classifiers, out-of-distribution (OOD) detection has emerged as a critical capability for safety-aware systems. Classical approaches in the general machine learning literature include reconstruction-based methods using autoencoders or variational autoencoders, which assume that models will fail to reconstruct inputs from unseen distributions [18]. In the discriminative regime, maximum softmax probability (MSP) [2] and ODIN [19] utilize output logits to distinguish in-distribution from OOD samples, while energy-based scoring [20] provides a theoretically grounded alternative. OpenMax [3] adapts extreme value theory to model the tail distribution of penultimate-layer activations for open set recognition. Fort et al. [21] focus on near out-of-distribution (OOD) detection, noting that semantically close outliers are harder to detect than far-OOD cases, and show that large pre-trained transformer models substantially improve near-OOD performance while also exploring few-shot outlier exposure and zero-shot class-name cues to further enhance detection.

In the nuclear power domain, recent studies have begun adopting OOD and open-set recognition techniques. Li et al. [5] proposed a convolutional prototype learning network for open set fault diagnosis, using distance-based rejection to identify unknown conditions. Kim et al. [4] applied OpenMax-style methods to detect untrained accident scenarios in nuclear plant diagnosis. Zhou et al. [6] extended prototype learning with label masking for compound fault recognition under open set conditions. While these works demonstrate that unknown states can be detected without degrading in-distribution accuracy, their rejection rules rely on heuristically chosen thresholds tuned to maximize validation performance. Such thresholds lack an interpretable operational parameter and provide no mechanism to bound the false-alarm rate on known safety-critical conditions, rendering them difficult to certify for regulatory compliance. Several of these studies use the same NPPAD benchmark as the present work; however, they adopt differing known/unknown partitions, openness levels, and (often closed-set or window-level) evaluation protocols, so their reported figures are not directly comparable to those reported here. To enable a controlled comparison that isolates the scoring rule from the representation, representative OOD detectors are instead evaluated, including the OpenMax technique of Kim et al. [4], as post hoc scores on a common encoder (Section 5).

2.3. Contrastive Representation Learning

Standard cross-entropy training often yields embedding spaces in which class manifolds exhibit substantial overlap, thereby limiting the effectiveness of distance-based OOD detection. Supervised contrastive learning [11] addresses this by explicitly optimizing the embedding geometry, pulling samples from the same class together while pushing samples from different classes apart. Wang and Isola [12] provided a theoretical analysis showing that contrastive objectives promote alignment of positive pairs and uniformity on the hypersphere, yielding compact class clusters with well-separated decision boundaries. This geometric structure is particularly relevant for OOD detection, as it enlarges the low-density regions where unknown samples are likely to reside. More broadly, related efforts in other high-dimensional sensing domains similarly emphasize discriminative embedding and clustering for reliable pattern recognition; for instance, Antony Asir Daniel et al. [22] combine an enhanced affinity propagation clustering scheme with a modified extreme learning machine for segmentation and classification of hyperspectral imagery. Such work underscores the broad utility of well-structured feature spaces for separating classes in complex, high-dimensional measurements, a principle that SCOPE leverages through its hyperspherical contrastive embedding.

2.4. Conformal Prediction for Safety Calibration

Conformal prediction provides distribution-free, finite-sample guarantees on prediction validity under exchangeability assumptions [7]. Unlike Bayesian uncertainty quantification, which depends on prior specification and model assumptions, conformal methods guarantee that the probability of rejecting a conforming sample does not exceed a user-specified level $\alpha$. Recent tutorials [8] have made split-conformal prediction computationally tractable for deep learning, requiring only a held-out calibration set rather than full leave-one-out computation. Extensions to temporal data [9,10] have addressed the challenges of applying conformal inference under serial dependence.

SCOPE departs from prior nuclear OOD methods by replacing heuristic thresholds with conformal calibration, thereby exposing an explicit safety parameter $\alpha$. Furthermore, whereas existing conformal approaches for time series typically operate at the window or sample level, SCOPE performs calibration at the trajectory level under a monotone rejection policy. This formulation requires only run-wise exchangeability among accident trajectories rather than independence across overlapping windows, making it applicable to sliding-window deployment without violating the assumptions underlying the finite-sample guarantee. To the best of my knowledge, prior open-set/OOD methods for nuclear accident diagnosis do not provide trajectory-level conformal guarantees that control the probability of mis-rejecting a truly known accident run at a user-specified level $\alpha$.

3. Methodology

SCOPE is a framework for safety-calibrated out-of-distribution (OOD) prediction in nuclear power plant accident diagnosis. It integrates three components. First, a causal temporal encoder trained with supervised contrastive learning shapes a hyperspherical embedding geometry for known accident families. Second, a k-nearest-neighbor density nonconformity score in the encoder embedding space detects transients lying in low-density regions of the known-accident manifold, with prototype distance, entropy, and conservative maximum fusion retained as ablated alternatives. Third, split-conformal calibration at the trajectory level converts nonconformity values into an interpretable rejection threshold with finite-sample control of the mis-rejection probability for known accident runs under run-wise exchangeability [7,8]. The training and prototype construction stage is summarized in Algorithm 1, while safety calibration and the deployed decision rule are described in Algorithm 2 at the end of this section.

A schematic overview of the pipeline is shown in Figure 1. The deployed inference pipeline for a trajectory $X$ containing windows $\{X^{(t)}\}_{t \in T}$ can be summarized as

$$X \to \{e^{(t)}\}_{t \in T} \to \{S(X^{(t)})\}_{t \in T} \to S_{\text{run}} = \max_{t} S(X^{(t)}) \to \delta_{\alpha'}(X),$$

where $e^{(t)}$ is the encoder-space hyperspherical embedding of window $X^{(t)}$, $S(X^{(t)})$ is the density-based nonconformity score, $S_{\text{run}}$ is the trajectory-level score under the monotone rejection policy, and $\delta_{\alpha'}(\cdot)$ is the deployed decision rule calibrated at the corrected level $\alpha' = \alpha_{\text{target}} / M$.

Algorithm 1 SCOPE training and prototype construction.

Require: Training set $D_{\text{train}} = \{(X_i, y_i)\}$ with $y_i \in Y_{\text{known}}$
Require: Encoder $g_\theta$, projection head $h$, temperature $\tau$
Ensure: Trained encoder $g_\theta$, prototypes $\{\mu_k\}_{k \in Y_{\text{known}}}$, training encoder embeddings $E_{\text{train}}$

1: Train $g_\theta$ and $h$ on $D_{\text{train}}$ using the supervised contrastive loss in Equation (1).
Compute normalized encoder embeddings:
2: $E_{\text{train}} \leftarrow \emptyset$
3: for each $(X_i, y_i) \in D_{\text{train}}$ do
4:     $e_i \leftarrow g_\theta(X_i) / \| g_\theta(X_i) \|_2$ ▹ encoder space; projection head $h$ used only for the loss above
5:     $E_{\text{train}} \leftarrow E_{\text{train}} \cup \{e_i\}$
6: end for
Compute normalized class prototypes:
7: for each class $k \in Y_{\text{known}}$ do
8:     $I_k \leftarrow \{i : y_i = k\}$
9:     $\tilde{\mu}_k \leftarrow \frac{1}{|I_k|} \sum_{i \in I_k} e_i$
10:     $\mu_k \leftarrow \tilde{\mu}_k / \| \tilde{\mu}_k \|_2$
11: end for

Algorithm 2 SCOPE safety calibration and deployed decision rule.

Require: Calibration runs $D_{\text{cal}} = \{X^{(j)}\}_{j=1}^{m}$ (known-class trajectories)
Require: Run score $S_{\text{run}}(X)$ induced by $\{\mu_k\}$ and $E_{\text{train}}$ from Algorithm 1 as in Equation (5)
Require: Target level $\alpha_{\text{target}}$, reference level $\alpha_0$, buffer $\delta$, confidence $1 - \gamma$
Ensure: Final threshold $\hat{q}_{1-\alpha'}$ and corrected level $\alpha'$

Compute run scores:
1: $s_j \leftarrow S_{\text{run}}(X^{(j)})$ for $j = 1, \dots, m$
Two-fold cross-fit margin:
2: Partition $\{1, \dots, m\}$ into folds $A, B$
3: $\hat{q}^{(A)} \leftarrow$ ConformalQuantile$(\{s_j\}_{j \in A}, \alpha_0)$
4: $\hat{q}^{(B)} \leftarrow$ ConformalQuantile$(\{s_j\}_{j \in B}, \alpha_0)$
5: $\text{FP}_{\text{pool}} \leftarrow |\{j \in B : s_j > \hat{q}^{(A)}\}| + |\{j \in A : s_j > \hat{q}^{(B)}\}|$
6: $\hat{p} \leftarrow \text{FP}_{\text{pool}} / (|A| + |B|)$ ▹ pooled cross-fit FP rate
7: $\text{FPR}_{\text{upper}} \leftarrow$ WilsonUpper$(\hat{p}, |A| + |B|, \gamma)$
8: $M \leftarrow \max\{1, (1 + \delta)\, \text{FPR}_{\text{upper}} / \alpha_0\}$, $\alpha' \leftarrow \alpha_{\text{target}} / M$
Final calibration:
9: $\hat{q}_{1-\alpha'} \leftarrow$ ConformalQuantile$(\{s_j\}_{j=1}^{m}, \alpha')$
Deployed decision rule:
10: For a new trajectory $X^{\text{new}}$, compute $s_{\text{new}} \leftarrow S_{\text{run}}(X^{\text{new}})$
11: return OOD if $s_{\text{new}} > \hat{q}_{1-\alpha'}$, else return $\hat{y}_{\text{run}}$ (by prototype vote)

3.1. Problem Setup and Decision Rule

Let $X = \{x_t\}_{t=1}^{T}$ denote a multivariate sensor trajectory (run), where $x_t \in \mathbb{R}^{C}$ and $C$ is the number of channels. At decision time $t$, a causal lookback window is formed:

$$X^{(t)} = (x_{t-L+1}, \dots, x_t) \in \mathbb{R}^{L \times C},$$

where $L$ is the window length. Training and calibration observe labels in the known set $Y_{\text{known}} = \{1, \dots, K\}$. At deployment, trajectories may arise from accident mechanisms not represented in $Y_{\text{known}}$. Rejection due to insufficient conformity with known accident classes is denoted by the label OOD.
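The causal lookback windows defined above can be illustrated with a small NumPy sketch. The function name `causal_windows` and the toy array are illustrative assumptions, not part of the original work; indices are 0-based here, whereas the text uses 1-based time steps.

```python
import numpy as np

def causal_windows(x: np.ndarray, L: int) -> np.ndarray:
    """Return every causal lookback window of length L from a run x of shape (T, C).

    The window ending at (0-based) time t holds x[t-L+1 : t+1], i.e. only
    measurements up to and including t. Output shape: (T - L + 1, L, C).
    """
    T, _ = x.shape
    if L > T:
        raise ValueError("window length exceeds run length")
    # sliding_window_view appends the window axis last: (T-L+1, C, L)
    w = np.lib.stride_tricks.sliding_window_view(x, L, axis=0)
    return np.transpose(w, (0, 2, 1))

x = np.arange(12, dtype=float).reshape(6, 2)  # toy run with T=6, C=2
W = causal_windows(x, L=3)
print(W.shape)  # (4, 3, 2)
```

Each window only looks backward in time, so a decision at time t never depends on later measurements.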

The goal is to construct a trajectory-level decision rule

$$\delta : \{\text{trajectories}\} \to Y_{\text{known}} \cup \{\text{OOD}\},$$

that classifies known accident trajectories while rejecting unknown ones, subject to a user-specified operational target $\alpha_{\text{target}} \in (0, 1)$ that controls the tolerated probability of mis-rejecting truly known accident runs.

3.2. Causal Contrastive Representation Learning

SCOPE uses a causal one-dimensional residual convolutional encoder $g_\theta(\cdot)$ based on dilated causal convolutions [17], so that representations at time $t$ depend only on $X^{(t)}$ and not on future measurements. During contrastive training only, the encoder output is passed through a projection head $h(\cdot)$ and L2-normalized to the unit hypersphere,

$$z^{(t)} = \frac{h(g_\theta(X^{(t)}))}{\| h(g_\theta(X^{(t)})) \|_2} \in \mathbb{S}^{D'-1},$$

which is used only inside the supervised contrastive objective, where $D'$ is the projection dimension. At deployment, the projection head is discarded: all class prototypes, k-nearest-neighbor density estimates, nonconformity scores, and conformal calibration are computed in the encoder embedding space using the L2-normalized encoder representation

$$e^{(t)} = \frac{g_\theta(X^{(t)})}{\| g_\theta(X^{(t)}) \|_2} \in \mathbb{S}^{D-1},$$

where $D$ is the encoder embedding dimension. Performing the downstream geometric scoring in the encoder space rather than the projection space is standard in contrastive representation learning: the projection head primarily aids optimization of the contrastive loss, whereas the encoder representation retains more task-relevant structure for prototype- and density-based scoring [12].
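To make the causality property concrete, the following is a minimal sketch of a single dilated causal convolution followed by L2 normalization. It is not the SCOPE encoder: the residual connections, nonlinearities, and layer stack are omitted, and the function names, shapes, and random weights are illustrative assumptions.

```python
import numpy as np

def causal_dilated_conv1d(x, w, dilation=1):
    """One dilated causal convolution.

    x: (T, C_in) input sequence; w: (K, C_in, C_out) filter taps.
    y[t] = sum_k x[t - k*dilation] @ w[k], with zeros where the index is negative,
    so y[t] never depends on x[t+1:].
    """
    T, c_in = x.shape
    K, _, c_out = w.shape
    pad = (K - 1) * dilation
    xp = np.concatenate([np.zeros((pad, c_in)), x], axis=0)  # left padding only
    y = np.zeros((T, c_out))
    for k in range(K):
        start = pad - k * dilation  # tap k looks k*dilation steps into the past
        y += xp[start:start + T] @ w[k]
    return y

def l2_normalize(v):
    return v / np.linalg.norm(v, axis=-1, keepdims=True)

rng = np.random.default_rng(0)
x = rng.normal(size=(8, 3))
w = rng.normal(size=(2, 3, 4))
y = causal_dilated_conv1d(x, w, dilation=2)

# Perturbing the future (t >= 5) must leave outputs at t < 5 unchanged.
x_future = x.copy()
x_future[5:] += 10.0
y_perturbed = causal_dilated_conv1d(x_future, w, dilation=2)
causal_ok = np.allclose(y[:5], y_perturbed[:5])

e = l2_normalize(y[-1])  # unit-norm embedding of the last time step
```

Left-only padding is what enforces causality here; a symmetric ("same") padding would leak future samples into the representation.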

The encoder and projection head are trained using supervised contrastive learning [11]. For a mini-batch with index set $I$, define $P(i) = \{p \in I \setminus \{i\} : y_p = y_i\}$ as the set of positives sharing the label of anchor $i$ and $A(i) = I \setminus \{i\}$ as the set of all other samples in the batch. The supervised contrastive loss is then defined as

$$\mathcal{L}_{\text{SupCon}} = \sum_{i \in I} \frac{-1}{|P(i)|} \sum_{p \in P(i)} \log \frac{\exp(z_i^{\top} z_p / \tau)}{\sum_{a \in A(i)} \exp(z_i^{\top} z_a / \tau)}, \tag{1}$$

where $\tau > 0$ is a temperature parameter. This objective promotes compact intra-class clusters and class separation on the hypersphere [12], thereby supporting prototype-based OOD detection. The complete training procedure and prototype construction process are summarized in Algorithm 1.
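Equation (1) can be evaluated for a single mini-batch with the NumPy sketch below. It is a forward computation only (no gradients), the default temperature is an assumed placeholder, and skipping anchors that have no positive in the batch is an assumption of this sketch, since the equation is undefined when $|P(i)| = 0$.

```python
import numpy as np

def supcon_loss(z, y, tau=0.1):
    """Supervised contrastive loss of Equation (1) for one mini-batch (N >= 2).

    z: (N, D') L2-normalized projections; y: (N,) integer labels.
    Anchors without any positive in the batch are skipped (assumption).
    """
    z = np.asarray(z, dtype=float)
    y = np.asarray(y)
    n = len(y)
    sim = z @ z.T / tau
    not_self = ~np.eye(n, dtype=bool)
    # Stable log of the denominator, summed over A(i) = I \ {i}.
    masked = np.where(not_self, sim, -np.inf)
    mx = masked.max(axis=1, keepdims=True)
    log_denom = mx + np.log(np.exp(masked - mx).sum(axis=1, keepdims=True))
    log_prob = sim - log_denom
    pos = (y[:, None] == y[None, :]) & not_self   # P(i) as a boolean mask
    n_pos = pos.sum(axis=1)
    has_pos = n_pos > 0
    per_anchor = -np.where(pos, log_prob, 0.0).sum(axis=1)[has_pos] / n_pos[has_pos]
    return float(per_anchor.sum())

z = np.array([[1.0, 0.0], [1.0, 0.1], [0.0, 1.0], [0.1, 1.0]])
z = z / np.linalg.norm(z, axis=1, keepdims=True)
loss_aligned = supcon_loss(z, [0, 0, 1, 1])   # labels match the geometry
loss_shuffled = supcon_loss(z, [0, 1, 0, 1])  # labels contradict the geometry
```

In this toy batch the loss is far smaller when labels agree with the cluster structure, which is the behavior the objective rewards during training.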

3.3. Prototype Construction and Density-Based Nonconformity Scoring

After training, SCOPE constructs a prototype for each known class $k \in Y_{\text{known}}$ from training embeddings in the encoder space defined above. Let $I_k = \{i : y_i = k\}$ denote the training indices for class $k$. Define

$$\tilde{\mu}_k = \frac{1}{|I_k|} \sum_{i \in I_k} e_i, \qquad \mu_k = \frac{\tilde{\mu}_k}{\| \tilde{\mu}_k \|_2},$$

so that $\mu_k \in \mathbb{S}^{D-1}$. This class-mean prototype follows the construction used in prototypical networks [23]. Additionally, SCOPE retains all training encoder embeddings $E_{\text{train}} = \{e_i\}_{i=1}^{N_{\text{train}}}$ for density-based scoring.

3.3.1. Prototype Distance Score

For a test window $X$ with normalized encoder embedding $e$, the cosine-based nonconformity score measures distance to the nearest prototype as follows:

$$S_{\text{prot}}(X) = \frac{1 - \max_{k \in Y_{\text{known}}} e^{\top} \mu_k}{2}, \tag{2}$$

where the normalization maps the score to $[0, 1]$ since $e^{\top} \mu_k \in [-1, 1]$. This score detects far-OOD samples that lie distant from all known prototypes. This is a normalized variant of the standard nearest-prototype score [3,23] on the unit hypersphere.
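A minimal sketch of the class-mean prototypes and the prototype distance score of Equation (2) is given below. The helper names and the two-dimensional toy embeddings are illustrative assumptions; inputs are assumed to be unit-norm encoder embeddings.

```python
import numpy as np

def l2_normalize(v, axis=-1):
    return v / np.linalg.norm(v, axis=axis, keepdims=True)

def class_prototypes(E, y, classes):
    """Normalized class means; E: (N, D) unit-norm embeddings, y: (N,) labels."""
    return np.stack([l2_normalize(E[y == k].mean(axis=0)) for k in classes])

def prototype_score(e, mu):
    """Equation (2): (1 - max_k e^T mu_k) / 2, which lies in [0, 1]."""
    return float((1.0 - np.max(mu @ e)) / 2.0)

E = l2_normalize(np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]))
y = np.array([0, 0, 1, 1])
mu = class_prototypes(E, y, classes=[0, 1])

s_known = prototype_score(np.array([1.0, 0.0]), mu)   # close to the class-0 prototype
s_far = prototype_score(np.array([-1.0, 0.0]), mu)    # far from both prototypes
```

The far point scores roughly 0.5 here because it is nearly orthogonal to the closer of the two prototypes, while the in-cluster point scores close to 0.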

3.3.2. Entropy Score

Prototype distance alone may fail to detect near-OOD samples residing in fuzzy regions that appear between known clusters. To capture distributional uncertainty, an entropy-based score is computed from the similarity distribution as follows:

$$S_{\text{ent}}(X) = \frac{H(\text{softmax}(s / \tau_s))}{\log K}, \tag{3}$$

where $s = [e^{\top} \mu_1, \dots, e^{\top} \mu_K]^{\top}$ is the vector of similarities to all prototypes, $\tau_s > 0$ is a temperature parameter, $H(\cdot)$ denotes Shannon entropy, and normalization by $\log K$ ensures the score lies in $[0, 1]$.
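Equation (3) can be sketched as follows. The default value of `tau_s` is an assumed placeholder (the text only requires $\tau_s > 0$), and the sketch assumes $K \geq 2$ so that $\log K$ is nonzero.

```python
import numpy as np

def entropy_score(e, mu, tau_s=0.1):
    """Equation (3): normalized Shannon entropy of softmax(s / tau_s).

    e: (D,) unit-norm embedding; mu: (K, D) unit-norm prototypes, K >= 2.
    """
    logits = (mu @ e) / tau_s
    logits = logits - logits.max()          # numerical stability
    p = np.exp(logits)
    p = p / p.sum()
    nz = p > 0                              # treat 0 * log 0 as 0
    H = -np.sum(p[nz] * np.log(p[nz]))
    return float(H / np.log(len(mu)))

mu = np.eye(2)                                        # two orthogonal prototypes
s_on_prototype = entropy_score(np.array([1.0, 0.0]), mu)
s_between = entropy_score(np.array([1.0, 1.0]) / np.sqrt(2.0), mu)
```

A point equidistant from both prototypes attains the maximum score of 1, which is the "between clusters" case that the prototype distance alone can miss.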

3.3.3. k-Nearest Neighbor Density Score

Samples in low-density regions of the embedding space, even if close to a single prototype, may represent OOD inputs. To capture local density, the cosine distance to the $k$-th nearest training embedding is computed as [24]

$$S_{\text{density}}(X) = \frac{1 - e^{\top} e_{(k)}}{2}, \tag{4}$$

where $e_{(k)} \in E_{\text{train}}$ is the $k$-th nearest neighbor of $e$ under cosine similarity. This score helps flag samples that reside in sparsely populated regions of the learned manifold. This is the deep k-NN OOD-detection construction of Sun et al. [24], adapted here to the run-level conformal pipeline.
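The density score of Equation (4) reduces to finding the $k$-th largest cosine similarity against the stored training embeddings. The sketch below uses a brute-force search; the function name, the value of `k`, and the toy data are illustrative assumptions, and a large training set would likely call for an approximate nearest-neighbor index instead.

```python
import numpy as np

def knn_density_score(e, E_train, k=5):
    """Equation (4): (1 - e^T e_(k)) / 2, with e_(k) the k-th nearest training
    embedding under cosine similarity. All inputs are assumed unit-norm."""
    if k > len(E_train):
        raise ValueError("k exceeds the number of stored training embeddings")
    sims = E_train @ e
    kth_sim = np.partition(sims, -k)[-k]    # k-th largest similarity
    return float((1.0 - kth_sim) / 2.0)

# Toy training set: 11 unit vectors clustered around the direction (1, 0).
angles = np.linspace(-0.1, 0.1, 11)
E_train = np.stack([np.cos(angles), np.sin(angles)], axis=1)

s_dense = knn_density_score(np.array([1.0, 0.0]), E_train, k=3)   # inside the cluster
s_sparse = knn_density_score(np.array([0.0, 1.0]), E_train, k=3)  # away from the cluster
```

Using the $k$-th neighbor rather than the nearest one makes the score less sensitive to a single stray training embedding.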

3.3.4. Deployed Nonconformity Score and Alternatives

As the deployed nonconformity score, SCOPE uses the k-nearest-neighbor density score of Equation (4):

$$S(X) := S_{\text{density}}(X) = \frac{1 - e^{\top} e_{(k)}}{2}, \tag{5}$$

which flags windows that lie in low-density regions relative to the known-accident manifold. The prototype distance score $S_{\text{prot}}$ (Equation (2)) and the entropy score $S_{\text{ent}}$ (Equation (3)), together with their conservative maximum fusion $\max\{S_{\text{prot}}(X), S_{\text{ent}}(X), S_{\text{density}}(X)\}$, are retained as alternative nonconformity scores. A component ablation (Section 5.4) shows that the density score alone provides the strongest and most stable open-set detection, whereas the conservative maximum is dominated by the higher-magnitude prototype and entropy terms and therefore dilutes the more discriminative density signal; this domination also renders the maximum insensitive to the k-NN fusion weight $w$, which consequently does not appear in Equation (5). The class prototypes $\{\mu_k\}$ are retained to produce the run-level class prediction $\hat{y}_{\text{run}}$ (Section 3.7). Because Equation (5) is a deterministic function of the encoder embedding, the split-conformal calibration of Section 3.5 preserves finite-sample validity under run-wise exchangeability for the deployed score.

3.4. Trajectory-Level Aggregation and Monotone Policy

Safety-critical monitoring approaches typically adopt a monotone rejection policy: once sufficient evidence of out-of-distribution behavior is observed, the trajectory (run) is rejected and remains rejected thereafter. Accordingly, a trajectory-level nonconformity score is defined as the maximum over all windows:

$$S_{\text{run}}(X) = \max_{t \in T} S(X^{(t)}), \tag{6}$$

where $T$ denotes the set of decision times. Under this policy, a trajectory is rejected if the score of any window exceeds the rejection threshold, which is equivalent to thresholding $S_{\text{run}}$.

This aggregation strategy has an important advantage: by projecting each trajectory to a single scalar score, conformal calibration is performed on trajectory-level scores rather than window-level scores. Consequently, the exchangeability assumption required for finite-sample validity applies to trajectories (runs) rather than to individual windows, allowing arbitrary temporal dependence within each run.
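The equivalence between "reject as soon as any window exceeds the threshold" and "threshold the run maximum" can be checked with a short sketch. The function names, the window scores, and the threshold value are illustrative assumptions.

```python
import numpy as np

def run_score(window_scores):
    """Equation (6): trajectory-level score as the maximum window score."""
    return float(np.max(window_scores))

def first_rejection_index(window_scores, threshold):
    """Monotone policy: index of the first window whose score exceeds the
    threshold (the run stays rejected afterwards), or None if never rejected."""
    above = np.flatnonzero(np.asarray(window_scores) > threshold)
    return int(above[0]) if above.size else None

scores = [0.10, 0.20, 0.60, 0.30]   # toy window-level nonconformity scores
threshold = 0.5                     # placeholder for a calibrated threshold

rejected_by_max = run_score(scores) > threshold
t_reject = first_rejection_index(scores, threshold)
print(rejected_by_max, t_reject)  # True 2
```

The index returned by `first_rejection_index` is the quantity a time-to-rejection analysis would track, while calibration only needs the scalar `run_score`.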

3.5. Split-Conformal Calibration at the Trajectory Level

Let the calibration set $D_{\text{cal}}$ consist of $m$ known accident trajectories, $D_{\text{cal}} = \{X^{(j)}\}_{j=1}^{m}$, each with label $y^{(j)} \in Y_{\text{known}}$. For each calibration trajectory, compute the run-level score using the same label-free scoring rule as deployment:

$$S_{\text{run}}^{(j)} = \max_{t \in T} S(X^{(j,t)}), \qquad j = 1, \dots, m. \tag{7}$$

This deployment-consistent calibration ensures the threshold applies directly to the deployed rejection rule.

For a nominal conformal level $\alpha \in (0, 1)$, the split-conformal threshold is defined as the $k$-th smallest value among $\{S_{\text{run}}^{(1)}, \dots, S_{\text{run}}^{(m)}\}$, where

$$k = \lceil (m + 1)(1 - \alpha) \rceil. \tag{8}$$

If $k > m$, set the threshold to $+\infty$ (the conservative edge case, which occurs when $\alpha < 1/(m + 1)$). Otherwise,

$$\hat{q}_{1-\alpha} = S_{(k)}, \tag{9}$$

where $S_{(k)}$ denotes the $k$-th order statistic.
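Equations (8) and (9) amount to selecting one order statistic of the calibration run scores. The sketch below is one way to write this; the function name is hypothetical, and the small tolerance subtracted before the ceiling is an implementation choice of this sketch that guards against floating-point round-up.

```python
import math
import numpy as np

def conformal_quantile(run_scores, alpha):
    """Split-conformal threshold: the k-th smallest calibration score with
    k = ceil((m + 1) * (1 - alpha)), or +inf when k > m (Equations (8)-(9))."""
    s = np.sort(np.asarray(run_scores, dtype=float))
    m = s.size
    k = math.ceil((m + 1) * (1 - alpha) - 1e-12)  # tolerance is an assumption
    if k > m:
        return math.inf
    return float(s[k - 1])                        # 1-based order statistic

scores = np.arange(1, 11)                 # toy run scores 1, 2, ..., 10 (m = 10)
q_80 = conformal_quantile(scores, 0.20)   # k = ceil(11 * 0.80) = 9
q_95 = conformal_quantile(scores, 0.05)   # k = ceil(11 * 0.95) = 11 > m
print(q_80, q_95)  # 9.0 inf
```

An infinite threshold means no run is ever rejected at that level, which is valid but uninformative; this is the edge case discussed next.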

Equation (8) implies a minimum calibration size for a non-trivial (finite) threshold: the condition $k \leq m$ holds only when $\alpha \geq 1/(m + 1)$, that is, when $m \geq \lceil 1/\alpha \rceil - 1$ calibration runs are available. This relationship explains the $+\infty$ threshold reported at the most stringent operating point in Section 5. The safety margin tightens the deployed level to $\alpha' = \alpha_{\text{target}} / M$, so for $\alpha_{\text{target}} = 0.01$ with $M \approx 2.22$ the effective level is $\alpha' \approx 0.0045$, which would require $m \gtrsim 222$ calibration trajectories, whereas only $m = 142$ are available. Notably, the raw level $\alpha = 0.01$ on its own requires only $m \geq 99$ runs and is therefore already feasible with the present calibration set; the infinite threshold is thus a consequence of the conservative margin rather than of split-conformal calibration itself, and the stringent operating point can be made non-trivial either by enlarging the calibration set to $m \gtrsim 222$ runs or by reducing the margin $M$.
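The arithmetic above can be reproduced exactly with rational numbers. In this sketch $M$ is taken to be exactly 2.22, which is an assumption since the text gives it only approximately; under that assumption the bound evaluates to 221 runs, in line with the approximate figure of about 222 quoted above.

```python
import math
from fractions import Fraction

def min_calibration_runs(alpha):
    """Smallest m for which k = ceil((m + 1)(1 - alpha)) <= m,
    i.e. m >= ceil(1 / alpha) - 1."""
    return math.ceil(1 / alpha) - 1

def order_index(m, alpha):
    """k from Equation (8)."""
    return math.ceil((m + 1) * (1 - alpha))

alpha_target = Fraction(1, 100)
M = Fraction(222, 100)             # M ~ 2.22, treated as exact here (assumption)
alpha_prime = alpha_target / M     # equals 1/222, about 0.0045

m_raw = min_calibration_runs(alpha_target)     # runs needed at alpha = 0.01
m_tight = min_calibration_runs(alpha_prime)    # runs needed at alpha'
k_tight = order_index(142, alpha_prime)        # k with the available m = 142
k_raw = order_index(142, alpha_target)
print(m_raw, m_tight, k_tight, k_raw)  # 99 221 143 142
```

With $m = 142$ the tightened level gives $k = 143 > m$, hence the infinite threshold, whereas the raw level gives $k = 142 \leq m$ and a finite one.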

Finite-Sample Guarantee for Known Trajectories

Assume that the calibration trajectories and a new test trajectory are exchangeable draws from the distribution of known accident runs (run-wise exchangeability). The standard split-conformal arguments then imply the finite-sample bound [7,8]

$$\mathbb{P}(S_{\text{run}}^{\text{new}} > \hat{q}_{1-\alpha}) \leq \alpha, \tag{10}$$

which yields the trajectory-level safety statement

$$\mathbb{P}(\delta_{\alpha}(X^{\text{new}}) = \text{OOD} \mid Y^{\text{new}} \in Y_{\text{known}}) \leq \alpha.$$

This guarantee controls the probability that a known accident trajectory is ever rejected at any decision time under the monotone policy, without requiring independence across overlapping windows.

3.6. Safety Margins for Run-Wise Exchangeability Violations

Although trajectory-level calibration avoids within-run temporal dependence, the finite-sample guarantee in Equation (10) still relies on run-wise exchangeability between calibration and future trajectories. In practice, mild violations may arise from correlated simulator configurations or similar initial conditions. To enforce conservative behavior, SCOPE estimates a safety margin $M \geq 1$ using two-fold cross-fitting on the calibration set itself without a separate development set.

The calibration trajectories are partitioned into two disjoint folds, $D_{cal}^{(A)}$ and $D_{cal}^{(B)}$. For each fold pair, calibration is performed on one fold and the empirical false-positive behavior is evaluated on the other. Let ${FP}^{A \to B}$ denote the number of known trajectories in fold B that exceed the threshold calibrated on fold A, and define ${FP}^{B \to A}$ analogously. The pooled false-positive count and sample size are

{FP}_{pool} = {FP}^{A \to B} + {FP}^{B \to A}, \qquad n_{pool} = | D_{cal}^{(A)} | + | D_{cal}^{(B)} | .

The pooled empirical rate is $\hat{p} = {FP}_{pool} / n_{pool}$. To account for finite-sample variability, the one-sided Wilson score upper confidence bound [25] is computed at level $1 - γ$ (typically $γ = 0.05$) and denoted ${FPR}_{upper} (\hat{p}, n_{pool}, γ)$. The buffered safety margin is then defined as

M = \max \{1, (1 + δ) \frac{{FPR}_{upper} (\hat{p}, n_{pool}, γ)}{α_{0}}\},

(11)

where $α_{0}$ is the reference calibration level used during cross-fitting and $δ > 0$ is a small buffer. The deployed calibration level is tightened to $α^{'} = α_{target} / M$, and the final rejection threshold is ${\hat{q}}_{1 - α^{'}}$.

It is worth noting that such a cross-fit procedure with Wilson upper bounds provides a statistically principled margin that adapts to the observed degree of run-wise exchangeability violation while avoiding overly optimistic reliance on point estimates.
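A minimal sketch of the margin computation in Equation (11) is given below. It uses the standard one-sided Wilson score upper bound; the buffer value `delta = 0.1`, the reference level `alpha0 = 0.05`, and the pooled count of 6 false positives are assumptions chosen for illustration, since the text does not fix them here.

```python
import math
from statistics import NormalDist

def wilson_upper(fp: int, n: int, gamma: float = 0.05) -> float:
    """One-sided Wilson score upper bound at confidence level 1 - gamma."""
    z = NormalDist().inv_cdf(1 - gamma)
    p_hat = fp / n
    centre = p_hat + z * z / (2 * n)
    half = z * math.sqrt(p_hat * (1 - p_hat) / n + z * z / (4 * n * n))
    return (centre + half) / (1 + z * z / n)

def safety_margin(fp_pool: int, n_pool: int, alpha0: float,
                  delta: float = 0.1, gamma: float = 0.05) -> float:
    """Equation (11): M = max{1, (1 + delta) * FPR_upper / alpha0}."""
    return max(1.0, (1 + delta) * wilson_upper(fp_pool, n_pool, gamma) / alpha0)

# Hypothetical cross-fit outcome: 6 pooled false positives over 142 runs.
M = safety_margin(fp_pool=6, n_pool=142, alpha0=0.05)
alpha_deployed = 0.05 / M  # tightened level alpha' = alpha_target / M
print(f"M = {M:.2f}, alpha' = {alpha_deployed:.4f}")
```

Because the Wilson bound exceeds the point estimate, the margin reacts to small pooled counts more cautiously than a ratio of raw rates would; when no cross-fold false positives are observed at this sample size, the `max` clamps M to 1 and the deployed level equals the target.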

Scope of the guarantees: The two layers of SCOPE’s safety argument are explicitly distinguished. The split-conformal bound in Equation (10) is a formal, finite-sample guarantee: under run-wise exchangeability, it holds exactly, for any fixed deterministic nonconformity score, with no asymptotic approximation. The safety margin M in Equation (11), by contrast, is a heuristic, empirically conservative correction that does not carry a separate finite-sample certificate of its own. Its purpose is to hedge against violations of the exchangeability assumption on which the formal bound rests, by tightening the deployed level to $α^{'} = α_{target} / M$. The adequacy of M is therefore assessed empirically through the reliability analysis of Section 5 rather than established by a theorem; when exchangeability holds exactly, the formal bound already applies at $α^{'} \leq α_{target}$ and the margin merely adds conservatism.

Run-wise exchangeability in practice: The assumption requires that the calibration runs and a future known run be exchangeable draws from a common distribution of known accident trajectories. In a real plant, this can be violated in several ways: correlated simulator seeds, shared initial or boundary conditions, and near-duplicate runs induce dependence within the calibration pool; gradual sensor drift, instrument recalibration, or plant aging introduce distribution shift between calibration time and deployment time; and operator interventions may alter trajectory dynamics relative to the simulated calibration data. Such departures would render the nominal bound optimistic. The cross-fit margin mitigates mild, approximately stationary violations by absorbing the observed cross-fold over-rejection into M, but it does not protect against arbitrary non-stationary shifts between the simulated calibration distribution and a specific deployed plant. Periodic recalibration of ${\hat{q}}_{1 - α^{'}}$ on recent, plant-representative known runs is therefore recommended, and SCOPE is positioned as a calibrated rejection layer that complements rather than replaces existing protection and monitoring systems.

3.7. Deployed Decision Rule

The calibration procedure, safety margin estimation, and deployed rejection rule are summarized in Algorithm 2. For a new trajectory $X^{new}$, the run-level score $S_{run}^{new}$ is computed using Equation (6). The trajectory-level classification is obtained by majority vote over window-level nearest-prototype predictions:

{\hat{y}}_{run} = mode \{\arg\max_{k \in Y_{known}} {e^{(t)}}^{⊤} μ_{k} : t \in T\} .

The deployed decision rule at corrected level $α^{'}$ is

δ_{α^{'}} (X^{new}) = \begin{cases} {\hat{y}}_{run}, & \text{if } S_{run}^{new} \leq {\hat{q}}_{1 - α^{'}}, \\ OOD, & \text{if } S_{run}^{new} > {\hat{q}}_{1 - α^{'}} . \end{cases}

(12)
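The decision rule of Equation (12) can be sketched as follows. The sketch assumes that embeddings and prototypes are already unit-normalized lists of floats and that the run-level score is the maximum window score; the function and argument names are hypothetical.

```python
from collections import Counter

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def decide(window_embeddings, window_scores, prototypes, q_hat):
    """Equation (12): return a known family label or "OOD" for one trajectory.

    window_embeddings: unit-norm vectors e^(t), one per decision window
    window_scores:     nonconformity score S(X^(t)) for each window
    prototypes:        dict mapping family name -> unit-norm prototype mu_k
    q_hat:             calibrated threshold at the corrected level alpha'
    """
    s_run = max(window_scores)  # run-level score (maximum over windows)
    if s_run > q_hat:
        return "OOD"
    # Nearest prototype per window, then majority vote over the trajectory.
    votes = [max(prototypes, key=lambda k: dot(e, prototypes[k]))
             for e in window_embeddings]
    return Counter(votes).most_common(1)[0][0]

prototypes = {"LOCA": [1.0, 0.0], "SGTR": [0.0, 1.0]}
embeddings = [[1.0, 0.0], [0.8, 0.6], [0.0, 1.0]]
scores = [0.10, 0.20, 0.15]
print(decide(embeddings, scores, prototypes, q_hat=0.5))   # LOCA
print(decide(embeddings, scores, prototypes, q_hat=0.15))  # OOD
```

Taking the maximum before thresholding is what makes the policy monotone: once any window exceeds the threshold, the trajectory stays rejected.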

4. Experimental Setup

This section describes the experimental design used to evaluate SCOPE. Protocol choices are made to avoid information leakage, preserve causal deployment constraints, and ensure that calibration and evaluation operate at the trajectory level as described in Section 3.

4.1. Dataset and Preprocessing

Experiments are conducted on the Nuclear Power Plant Accident Data (NPPAD) benchmark [13], which comprises multivariate time-series simulations spanning diverse accident scenarios. Each trajectory contains synchronized sensor measurements including pressures, temperatures, flow rates, and radiological indicators sampled at fixed intervals. NPPAD provides 18 accident categories in total, each with varying numbers of trajectories, enabling evaluation under realistic data-imbalance conditions.

To enforce causality, data are processed as sliding windows of fixed length L, constructed such that each window $X^{(t)} = (x_{t - L + 1}, \dots, x_{t})$ contains only past and present measurements. Sensor streams are standardized channel-wise using statistics computed exclusively on the training set. No future information is used at any stage of training, calibration, or evaluation.
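The windowing and standardization steps can be illustrated with plain Python lists. This is a sketch under stated assumptions: a stride of one time step, trajectories stored as lists of per-step channel vectors, and a fallback of 1.0 for zero-variance channels (the fallback is an addition made here, not something the text specifies).

```python
def channel_stats(train_runs):
    """Per-channel mean and std computed from training trajectories only."""
    n_channels = len(train_runs[0][0])
    rows = [x for run in train_runs for x in run]
    means = [sum(r[c] for r in rows) / len(rows) for c in range(n_channels)]
    # A constant channel has zero std; fall back to 1.0 to avoid division by zero.
    stds = [(sum((r[c] - means[c]) ** 2 for r in rows) / len(rows)) ** 0.5 or 1.0
            for c in range(n_channels)]
    return means, stds

def causal_windows(run, L, means, stds):
    """Yield (t, window) with window = standardized (x_{t-L+1}, ..., x_t)."""
    z = [[(v - m) / s for v, m, s in zip(x, means, stds)] for x in run]
    for t in range(L - 1, len(z)):
        yield t, z[t - L + 1 : t + 1]  # only past and present samples

run = [[float(i), 2.0] for i in range(5)]  # toy 2-channel trajectory
means, stds = channel_stats([run])
windows = list(causal_windows(run, 3, means, stds))
print([t for t, _ in windows])
```

The first window becomes available only at step $t = L - 1$, which is why the earliest possible decision time grows with the window length.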

Topological variants of identical fault mechanisms (e.g., Hot Leg and Cold Leg breaks for LOCA, Loop A and Loop B ruptures for SGTR) are aggregated into unified accident families to ensure the diagnostic model learns location-invariant representations. This aggregation reflects operational practice, where emergency operating procedures are structured around critical safety functions and fault mechanisms rather than precise spatial origins [26].

4.2. Known and Unknown Accident Families

OOD detection is evaluated under a rigorous, high-openness protocol, in which the dataset is partitioned by semantic accident family. Unlike random hold-out splits, this protocol ensures that test-time unknowns correspond to physically distinct accident mechanisms that are completely unobserved during training and calibration phases.

Four semantic families are selected as known for training, calibration, and safety guarantee evaluation:

Y_{known} = {LOCA, SGTR, MSLB, FLB} .

These families are nominated because they represent major design-basis accidents with sufficient trajectory counts to support statistically meaningful calibration and evaluation. All remaining accident types are then treated as unknown and withheld entirely until the test phase:

Y_{unknown} = {ATWS, LACP, LLB, LOF, LR, MD, RI, RW, SP, TT, Normal} .

This configuration reflects a safety-critical deployment setting in which the system must maintain high diagnostic accuracy on design-basis accidents while rejecting transients not represented in the training distribution. Table 1 summarizes the OOD dataset composition. Five unknown families contain sufficient trajectories ($n \geq 82$) for statistically meaningful evaluation, while six families are singletons due to NPPAD dataset construction. Per-family detection rates for singletons are reported for completeness but carry no statistical weight. Aggregate OOD metrics are computed primarily over the well-represented unknown families.

The operational role of the Normal category is further clarified: NPPAD provides it as a single steady-state run, and it is treated as a descriptive (non-statistical) singleton. Normal denotes nominal operation rather than an accident mechanism, and is therefore conceptually outside the known accident families $Y_{known}$; including it among the unknowns simply verifies that nominal dynamics fall outside the learned accident manifolds. Operationally, SCOPE is intended as an abnormal-regime diagnostic engaged after an upstream normal/abnormal trigger (e.g., a reactor trip or an alarm condition), so steady-state normal operation in a real plant would be handled by that preceding layer rather than by SCOPE’s accident-family rejection. Accordingly, Normal is excluded from all calibration-based safety claims and is reported for completeness only.

4.3. Data Splitting and Calibration Protocol

Known-class trajectories are partitioned into four disjoint subsets at the run level to ensure no trajectory appears in multiple splits:

Training set $D_{train}$ (50%): Used exclusively to learn the causal contrastive encoder and to construct class prototypes, as described in Algorithm 1.
Validation set $D_{val}$ (10%): Used for model selection based on known-class macro-F1 performance without access to unknown families or calibration procedures.
Calibration set $D_{cal}$ (20%): Used solely to compute the split-conformal threshold ${\hat{q}}_{1 - α^{'}}$ on trajectory-level nonconformity scores, as described in Algorithm 2.
Test set $D_{test}$ (20% of known trajectories + all unknown trajectories): Used for final evaluation of safety compliance and OOD detection performance.

The safety margin M is estimated via two-fold cross-fitting on $D_{cal}$ using the procedure described in Section 3.6, thus avoiding the need for a separate development set. In addition, calibration is performed at the trajectory level: each calibration run yields a single scalar score $S_{run}^{(j)} = {\max}_{t} S (X^{(j, t)})$, and the conformal threshold is computed from these run-level scores. This ensures the finite-sample guarantee applies to trajectories under run-wise exchangeability rather than requiring independence across overlapping windows.
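A run-level split with the stated proportions might look like the sketch below. It shuffles hypothetical run identifiers and cuts them at 50/10/20/20; it does not stratify by accident family, which the actual protocol may well do, so it should be read only as an illustration of splitting whole trajectories rather than windows.

```python
import random

def split_runs(run_ids, seed=0, fractions=(0.5, 0.1, 0.2, 0.2)):
    """Shuffle run IDs and cut them into train/val/cal/test at the run level."""
    ids = list(run_ids)
    random.Random(seed).shuffle(ids)
    n = len(ids)
    cut1 = int(fractions[0] * n)
    cut2 = cut1 + int(fractions[1] * n)
    cut3 = cut2 + int(fractions[2] * n)
    # The test split takes the remainder, so every run lands in exactly one subset.
    return {"train": ids[:cut1], "val": ids[cut1:cut2],
            "cal": ids[cut2:cut3], "test": ids[cut3:]}

parts = split_runs(range(1000), seed=1)
print({name: len(ids) for name, ids in parts.items()})
```

Because windows from the same run are strongly correlated, assigning whole runs to a single subset is what keeps the calibration scores free of leakage from training trajectories.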

4.4. Evaluation Metrics

Performance is assessed across four complementary axes, with all primary metrics computed at the trajectory (run) level to match the calibration granularity.

4.4.1. Safety Compliance

The empirical run-level false-alarm rate on known test trajectories is measured:

{FPR}_{emp} = \frac{# {X \in D_{test}^{known} : δ_{α^{'}} (X) = OOD}}{# {X \in D_{test}^{known}}} .

Reliability diagrams [30] are reported across target levels $α_{target} \in \{0.01, 0.05, 0.10\}$ with 95% Wilson score confidence intervals [25]. Calibration quality is summarized via the calibration ratio ${FPR}_{emp} / α_{target}$, where values at or below 1.0 indicate conservative (valid) behavior.

4.4.2. True Unknown Rate Detection

Trajectory-level separability between known and unknown runs is quantified using the AUROC metric [31] computed from run-level nonconformity scores $S_{run}$. The true unknown rate (TUR) is also reported at the calibrated threshold:

TUR = \frac{# {X \in D_{test}^{unknown} : δ_{α^{'}} (X) = OOD}}{# {X \in D_{test}^{unknown}}} .

TUR is reported separately for all unknown trajectories and for statistically valid families only ($n \geq 10$).

4.4.3. Classification Accuracy on Accepted Trajectories

For known trajectories that are not rejected, run-level classification accuracy and macro-F1 are computed by majority-voting window-level predictions. This metric isolates diagnostic performance conditional on not triggering the OOD alarm.

4.4.4. Time-to-Rejection

For unknown trajectories, detection timeliness is characterized by the time-to-rejection (TTR), defined as the elapsed time from trajectory onset to the first window whose nonconformity score exceeds the calibrated threshold:

TTR (X) = inf {t \in T : S (X^{(t)}) > {\hat{q}}_{1 - α^{'}}} .

The median, mean, and 90th percentile TTR are reported per unknown family to characterize responsiveness across transient types.
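A small sketch of the TTR computation follows. It assumes a stride of one step, a 10 s sampling interval (the value quoted in Section 5.4), and time measured from the first sample of the trajectory; under these assumptions the first window of length L = 30 closes at 290 s. The function name is hypothetical.

```python
def time_to_rejection(window_scores, q_hat, L, dt=10.0):
    """Elapsed seconds until the first window with score > q_hat, else None.

    window_scores[i] is the score of the window ending at time step L - 1 + i,
    so the earliest possible decision time is (L - 1) * dt.
    """
    for i, s in enumerate(window_scores):
        if s > q_hat:
            return (L - 1 + i) * dt
    return None  # never rejected: TTR is undefined for this trajectory

print(time_to_rejection([0.9, 0.1], q_hat=0.5, L=30))       # 290.0
print(time_to_rejection([0.1, 0.1, 0.9], q_hat=0.5, L=30))  # 310.0
```

Trajectories that are never rejected return `None` and would need to be excluded from, or censored in, the per-family TTR summaries.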

4.5. Hyperparameter Selection and Sensitivity Analysis

To assess robustness, a grid search is conducted over the following architectural and scoring hyperparameters: window length $L \in \{30, 60, 90\}$ time steps, embedding dimension $D \in \{32, 64, 128\}$, contrastive temperature $τ \in \{0.05, 0.07, 0.10\}$, k-NN neighbors $k \in \{5, 10, 20\}$, and k-NN weight $w \in \{0.3, 0.5, 0.7\}$. Since the encoder depends only on $(L, D, τ)$, each encoder configuration is trained once and all $(k, w)$ scoring combinations are evaluated on the resulting embeddings. In addition, model selection is performed strictly using known-class validation macro-F1 without any access to unknown families or test data. The optimal configuration is fixed before computing calibration thresholds and evaluating on the held-out test set.

This protocol ensures that safety compliance and OOD detection results are not influenced by configuration choices optimized for unknown-class performance, reflecting realistic deployment conditions in which the characteristics of future unknown transients cannot be anticipated.
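The structure of the search, with one encoder per $(L, D, τ)$ triple reused across all $(k, w)$ pairs, can be sketched as below. The callables `train_encoder` and `evaluate` are placeholders for whatever training and validation routines are actually used; only the grid values come from the text.

```python
from itertools import product

WINDOW_LENGTHS = (30, 60, 90)
EMBED_DIMS = (32, 64, 128)
TEMPERATURES = (0.05, 0.07, 0.10)
NEIGHBORS = (5, 10, 20)
KNN_WEIGHTS = (0.3, 0.5, 0.7)

def run_grid(train_encoder, evaluate):
    """Train each encoder once, then reuse it for every (k, w) scoring pair."""
    results = {}
    for L, D, tau in product(WINDOW_LENGTHS, EMBED_DIMS, TEMPERATURES):
        encoder = train_encoder(L, D, tau)  # expensive step, 27 configurations
        for k, w in product(NEIGHBORS, KNN_WEIGHTS):
            # Cheap step: 9 scoring combinations per trained encoder.
            results[(L, D, tau, k, w)] = evaluate(encoder, k, w)
    return results

# Dummy callables to show the bookkeeping: 27 trainings, 243 evaluations.
trained = []
results = run_grid(lambda L, D, tau: trained.append((L, D, tau)) or (L, D, tau),
                   lambda encoder, k, w: 0.0)
print(len(trained), len(results))  # 27 243
```

Selecting the best key of `results` by validation macro-F1 alone keeps unknown families out of the selection loop, as the protocol requires.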

5. Results and Discussion

This section presents the experimental evaluation of SCOPE using the NPPAD benchmark. The results are organized around four axes: (1) conformal safety compliance, verifying that empirical false-alarm rates respect user-specified targets; (2) OOD detection performance, characterizing separation between known and unknown accident trajectories; (3) time-to-rejection analysis, assessing detection timeliness; and (4) sensitivity analysis, evaluating robustness across hyperparameter configurations. Deeper insights from these results are left for discussion at the end of this section.

5.1. Conformal Safety Compliance

The primary operational requirement for SCOPE is to bound the rate at which known accident trajectories are incorrectly rejected as OOD. Calibration performance was evaluated across three target safety levels, $α_{target} \in \{0.01, 0.05, 0.10\}$, representing stringent (1%), standard (5%), and relaxed (10%) false-alarm tolerances. All metrics in this section are reported as mean ± standard deviation over five random seeds. The encoder configuration (window size $L = 30$, temperature $τ = 0.05$, embedding dimension $D = 32$, k-NN neighbors $k = 5$) was selected by validation macro-F1, and the deployed nonconformity score is the k-NN density score of Equation (5).

Table 2 summarizes the calibration of the deployed density scorer. At the standard ($α_{target} = 0.05$) and relaxed ($α_{target} = 0.10$) operating points, the margin-corrected empirical false-alarm rate ($0.038 \pm 0.008$ and $0.062 \pm 0.018$, respectively) remains below target across all seeds, confirming that trajectory-level conformal calibration with the Wilson cross-fit margin ($M = 1.52 \pm 0.08$) controls the run-level false-alarm rate conservatively. At the stringent level $α_{target} = 0.01$, the margin tightens the deployed level to $α^{'} \approx 0.007$; certifying this level requires roughly 222 calibration runs (Section 3.5), exceeding the $m = 142$ available, so the threshold is infinite and no run is rejected. Calibrating directly at $α = 0.01$ without the margin requires only $m \geq 99$ runs and is therefore feasible, yielding a non-degenerate $FPR = 0.010 \pm 0.004$ with $TUR = 0.633 \pm 0.178$. Figure 2 shows the corresponding reliability diagram, with the operating points lying within the safe zone at or below the ideal calibration diagonal.

Table 2 reveals the inherent trade-off between safety and detection power. Enforcing the stringent constraint ($α = 0.01$, with margin) yields an infinite threshold that suppresses the true unknown rate to $0\%$. Relaxing the constraint to $α = 0.10$ recovers a TUR of $0.83 \pm 0.05$ on statistically valid OOD families. This trade-off is fundamental to any calibrated rejection system and provides operators with an explicit control parameter to balance false-alarm risk against unknown detection sensitivity, in line with plant-specific safety requirements.

For known trajectories, run-level classification uses nearest-prototype majority voting and is therefore unaffected by the choice of nonconformity scorer. The unconditional known-class accuracy, computed over all 143 known test runs prior to any rejection, is $0.952 \pm 0.067$ (macro-F1 $0.95 \pm 0.07$), with four of the five seeds at or above $0.98$; on the accepted subset at $α_{target} = 0.05$ the accuracy reaches $1.0$ for those four seeds. This confirms that the contrastive embedding space separates the four known accident families (LOCA, SGTR, MSLB, FLB) and that the rejection mechanism does not degrade classification quality among accepted runs.

5.2. OOD Detection Performance

Beyond safety compliance, the ability to detect unknown accident trajectories was evaluated using the deployed density scorer. The overall run-level AUROC is $0.884 \pm 0.051$, quantifying trajectory-level separability between known and unknown runs. Figure 3 shows a t-SNE projection of run-level encoder embeddings: the known families form compact clusters and the well-represented unknown families occupy distinct regions, indicating that the open-set difficulty is primarily a scoring problem rather than a representation problem. Figure 4 reports the per-family detection rates of the density scorer.

The density scorer detects four of the five statistically valid unknown families almost perfectly: rod withdrawal (RW), load rejection (LR), rod insertion (RI), and letdown line break (LLB) are each detected at ≈100% across seeds. This is a marked change from the prototype/entropy-dominated maximum fusion, under which LLB was largely missed (detection $\approx 13\%$). As Figure 3 shows, LLB is in fact well separated in the embedding; its earlier low detection was an artifact of the conservative-maximum fusion, whose threshold was set by the higher-magnitude prototype and entropy terms, rather than a physical limitation. The single remaining hard family is moderator dilution (MD), detected at only $8.8 \pm 19.7\%$: slow reactivity dilution produces gradual changes whose window-level density score relative to the known manifold remains low (the windows stay close to known runs), so most MD windows are not flagged [27,28]. The “fundamentally hard” characterization is therefore restricted to MD-type slow transients, and multimodal and physics-informed features (Section 6) are identified as the route to detecting them.

Comparison with OOD baselines: To place SCOPE’s open-set detection in context, its density score is compared against six representative OOD scores: maximum softmax probability (MSP) [2], ODIN [19], energy scoring [20], the Mahalanobis detector [32], OpenMax [3], and MC-dropout predictive entropy [33]. All six are computed post hoc on the same frozen encoder, a controlled protocol in which only the scoring rule varies. A separately trained convolutional autoencoder scored by reconstruction error (AUROC $0.803 \pm 0.004$) is additionally included as a representative reconstruction-based anomaly detector [18]. Table 3 reports run-level AUROC (mean ± std over five seeds). SCOPE’s density score attains the highest mean AUROC ($0.884 \pm 0.051$) with a smaller seed-to-seed spread than the strongest post hoc baselines (Mahalanobis $0.820 \pm 0.064$ and ODIN $0.819 \pm 0.080$), which trail by roughly six points. It is emphasized that this comparison is restricted to threshold-free separability: unlike SCOPE, none of these baselines accepts a user-specified false-alarm budget $α$ or provides a finite-sample mis-rejection guarantee, so they cannot be compared on the safety-calibration axis that constitutes SCOPE’s central contribution. A comparison against a Transformer backbone would require retraining a different encoder; in this data-scarce regime ($10^{2}$–$10^{3}$ runs), where the convolutional inductive bias is more sample-efficient, the causal CNN is retained and scoring rules are compared on the shared representation.

5.3. Time-to-Rejection Analysis

Table 4 reports detection rate and median time-to-rejection (TTR) for the deployed density scorer on the statistically valid OOD families (mean ± std over five seeds). The four detected families (RW, LR, RI, LLB) are flagged at ≈100%, with a median TTR of ≈290 s, essentially the first decision window, indicating that, once a transient is detectable, the density score crosses the calibrated threshold almost immediately. MD remains largely undetected ($8.8\%$). The overall detection rate across valid families rises to ≈81%, up from ≈60% under the prototype/entropy-dominated maximum fusion.

The near-immediate median TTR (≈290 s, a single window) reflects the sharp separation of the density score: known runs lie very close to the training manifold (near-zero density distance), whereas detectable unknowns depart from it within the first window. The residual gap to a perfect aggregate detection rate is due almost entirely to MD, whose slow reactivity dilution keeps most windows close to the known manifold.

5.4. Sensitivity Analysis

Nonconformity-component ablation: The choice of nonconformity score is first ablated, comparing the deployed k-NN density score against prototype distance only, entropy only, and conservative maximum fusion (Table 5, Figure 5). The density score dominates: it attains the highest AUROC ($0.884 \pm 0.051$) and, at $α = 0.05$, by far the highest true-unknown rate ($0.828 \pm 0.050$ versus $0.565 \pm 0.232$ for the maximum fusion), while also being the most stable across seeds. The conservative maximum is dominated by the larger-magnitude prototype and entropy terms, which raise the calibrated threshold and suppress the more discriminative density signal; this both lowers detection and explains the fusion’s insensitivity to the weight w. These results motivate deploying the density score alone.

Hyperparameter sensitivity: To assess robustness, a grid search was conducted over window length $L \in \{30, 60, 90\}$, embedding dimension $D \in \{32, 64, 128\}$, contrastive temperature $τ \in \{0.05, 0.07, 0.10\}$, k-NN neighbors $k \in \{5, 10, 20\}$, and k-NN weight $w \in \{0.3, 0.5, 0.7\}$. Table 6 presents the marginal impact of each hyperparameter on validation macro-F1.

The results show that a window size of $L = 60$ achieves the highest stability, with a standard deviation of only $0.004$ and a tuning gap of $0.001$. This indicates that 60-step windows (corresponding to 600 s at a 10 s sampling interval) provide a robust temporal context that is insensitive to other hyperparameter choices. In contrast, window sizes of 30 and 90 steps exhibit higher variance (std > 0.10), suggesting that shorter windows may miss discriminative features, whereas longer windows may introduce noise.

Temperature $τ = 0.05$ yields the highest mean F1 (0.970) with low variance, indicating that moderate contrastive sharpness promotes well-separated class manifolds. Higher temperatures ($τ = 0.07, 0.10$) produce more variable results due to softer similarity distributions.

Embedding dimension $D = 128$ achieves the highest mean F1 (0.971) and lowest tuning gap (0.016), suggesting that additional representational capacity improves robustness. However, the selected configuration ($D = 32$) was chosen based on validation F1 under a specific encoder initialization and achieves comparable peak performance (0.987).

The k-NN parameters (k and w) exhibit identical marginal statistics across all values, indicating they do not affect validation macro-F1. This is expected because model selection uses a high threshold that accepts all validation samples, so the scoring rule does not influence the closed-set predictions. This invariance does not weaken the safety guarantee: split-conformal calibration is valid for any fixed deterministic score, so the calibrated threshold adapts to whatever scorer is used and the finite-sample bound of Equation (10) holds for each. Because the deployed scorer is the k-NN density score (Equation (5)), its only hyperparameter is the neighbor count k (the fusion weight w pertains solely to the ablated maximum variant). Table 7 reports a direct k-sweep of the deployed scorer: run-level AUROC varies by under $0.05$ across $k \in \{1, 5, 10, 20, 50\}$ (from $0.868 \pm 0.030$ at $k = 1$ to $0.826 \pm 0.028$ at $k = 50$) while the false-alarm rate stays near target, so the scorer is robust to k; small neighbor counts ($k \leq 10$) are marginally best, and $k = 5$ is used.

Robustness to the known/unknown split: To probe sensitivity to which families are designated known, a leave-one-known-family-out study was run over three seeds: each of the four known families is withheld in turn and treated as unknown, with the encoder retrained on the remaining three (Table 8). Two findings stand out. First, conservative calibration is preserved across all partitions; the margin-corrected false-alarm rate for the remaining known families remains at or below the $α = 0.05$ target in every fold. Second, a withheld design-basis accident is itself reliably detected once it becomes unknown: LOCA, SGTR, and FLB are detected at ≈100%, confirming that the conclusions are not an artifact of the chosen split. The exception is MSLB, detected at only $36 \pm 22\%$ when withheld; as a secondary-side steam event, it shares thermal hydraulic signatures with the remaining known families (notably SGTR), so it overlaps their manifolds, consistent with the physical-similarity limitation identified for MD. Overall run-level AUROC remains high across folds ($0.866$–$0.920$).

Figure 6 visualizes the performance landscape across window sizes and embedding dimensions. The heatmap reveals that window size 60 achieves consistently high F1 scores (≥0.983) across all embedding dimensions, confirming its robustness. The configuration (30, 64) exhibits degraded performance (0.860) due to a single suboptimal temperature setting, highlighting the importance of hyperparameter interactions.

5.5. Discussion

The experimental results demonstrate that SCOPE achieves its primary objective: statistically calibrated OOD rejection with controlled false-alarm rates on known accident trajectories. The conservative calibration behavior (FPR below target at all operating points) validates the trajectory-level conformal framework with Wilson-bounded safety margins, confirming that run-wise exchangeability provides a sound basis for finite-sample safety guarantees in this application. Also, with the deployed k-NN density score, SCOPE detects four of the five statistically valid unknown families (RW, LR, RI, and LLB) at ≈100%, with near-immediate median time-to-rejection (≈290 s). The component ablation (Table 5) shows that this density score, rather than the prototype/entropy terms or their conservative-maximum fusion, is responsible for the strong open-set performance, identifying transients that occupy low-density regions of the known accident manifold.

With the density score, the only statistically valid family that remains hard is moderator dilution (MD), detected at $8.8 \pm 19.7\%$. MD produces slow reactivity changes whose window-level density score relative to the known manifold stays low, so most windows are not flagged. It is emphasized that LLB, previously reported as near-undetectable, is detected at ≈100% here and is well separated in the embedding (Figure 3); its earlier non-detection was an artifact of the conservative maximum fusion rather than a physical limitation. The residual MD difficulty is consistent with genuine thermal hydraulic similarity to nominal variation and motivates the complementary (multimodal, physics-informed) information sources discussed below.

Several directions could address this limitation in future work. First, incorporating temporal trend features or recurrent architectures may capture slow-evolving dynamics that distinguish near-OOD transients over extended time horizons. Second, multi-scale scoring that combines short-window and long-window nonconformity signals may improve sensitivity to gradual transients. Third, physics-informed features derived from first-principles thermal hydraulic models could provide complementary discrimination for transients with subtle sensor signatures.

Despite these limitations, SCOPE is considered to provide operationally meaningful capabilities for diagnosing nuclear power accidents. Near-perfect classification accuracy on accepted trajectories (1.0 for four of the five seeds at $α_{target} = 0.05$) supports reliable diagnosis when the system expresses confidence. Conservative false-alarm control at all operating points provides auditable guarantees suitable for regulatory contexts. Detection of the detectable unknown transients at a median of ≈290 s (about 5 min) after onset supports timely operator intervention. The explicit safety parameter $α$ enables plant-specific tuning of the conservatism-sensitivity trade-off based on operational requirements.

Computational complexity and deployment cost: SCOPE’s inference cost is dominated by two stages. A single causal-encoder forward pass over a length-L, P-channel window costs $O (L \sum_{l} C_{in}^{(l)} C_{out}^{(l)} K_{l})$, i.e., linear in the window length for a fixed architecture. Scoring a window then requires the prototype and entropy terms at $O (K D)$ (with $K = 4$ known classes and encoder dimension D) and the k-NN density term at $O (N_{train} D)$ for a brute-force cosine search against the $N_{train}$ stored training embeddings; the latter dominates. A full trajectory with $| T |$ decision windows therefore costs $O (| T | N_{train} D)$ for scoring plus $| T |$ encoder passes, and the run-level decision adds only an $O (| T |)$ maximum. Calibration is a one-time $O (m \log m)$ sort over the m run scores (plus the cross-fit margin estimation), and the memory footprint is $O (N_{train} D)$ for the stored embeddings. At the NPPAD scale ($N_{train} \approx 1.3 \times 10^{5}$ windows, $D \leq 128$, $K = 4$), per-window inference is comfortably real-time on commodity hardware. Measured single-window inference (encoder forward plus density scoring against the full training embedding bank) takes $0.22$ ms on an Apple M1 Max GPU (MPS) and $1.2$ ms on CPU, i.e., ≈22 ms and ≈121 ms per 100-window trajectory, well within real-time monitoring budgets. Where larger training banks are required, the dominant k-NN term can be reduced from linear to sub-linear using approximate nearest-neighbor indexing (e.g., HNSW or IVF) without affecting conformal validity, since the score remains a deterministic function of the embedding.
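To make the dominant $O (N_{train} D)$ term concrete, the sketch below scores one query embedding by brute force against a bank of stored embeddings. Equation (5) is not reproduced in this section, so the mean cosine distance to the k nearest neighbors used here is an assumed form of the density score, and the function names are hypothetical.

```python
import heapq
import math

def normalize(v):
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def knn_density_score(query, bank, k=5):
    """Mean cosine distance from `query` to its k nearest bank embeddings.

    `bank` is assumed to hold unit-norm vectors. Brute force: one dot
    product per stored embedding, i.e. O(N_train * D) per query.
    """
    q = normalize(query)
    dists = (1.0 - sum(a * b for a, b in zip(q, e)) for e in bank)
    nearest = heapq.nsmallest(k, dists)
    return sum(nearest) / len(nearest)

bank = [normalize(v) for v in ([1.0, 0.0], [1.0, 0.1], [0.0, 1.0])]
near = knn_density_score([1.0, 0.05], bank, k=2)   # close to stored embeddings
far = knn_density_score([-1.0, -1.0], bank, k=2)   # far from every stored embedding
print(near < far)  # True
```

Swapping the linear scan for an approximate index changes only how the neighbors are found; the score remains a fixed function of the embedding, although approximate search may return slightly different neighbors than the exact scan.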

6. Conclusions

This work introduced SCOPE, a safety-calibrated framework for out-of-distribution prediction in nuclear power plant accident diagnosis. By integrating causal contrastive representation learning with split-conformal calibration, SCOPE provides explicit and interpretable control over false-alarm risk while enabling principled rejection of unmodeled accident trajectories. Experiments on the NPPAD benchmark demonstrate that SCOPE maintains strong diagnostic accuracy on accepted known accidents, achieves conservative and stable safety calibration under sliding-window deployment, and detects several severe unknown transients with operationally meaningful latency. Quantitatively, on the high-openness NPPAD protocol (all figures reported as mean ± standard deviation over five random seeds), the deployed k-nearest-neighbor density scorer attains a run-level AUROC of $0.884 \pm 0.051$, an unconditional known-class accuracy of $0.952 \pm 0.067$, empirical false-alarm rates that remain below target ($0.038 \pm 0.008$ at $α_{target} = 0.05$), and detection rates of ≈100% for the rod withdrawal, load rejection, rod insertion, and letdown line break families, at an inference cost of ≈0.22 ms per decision window. It outperforms a panel of standard OOD baselines (MSP, ODIN, energy, Mahalanobis, OpenMax, MC-dropout, and a reconstruction autoencoder). These results illustrate the practical value of combining modern representation learning with statistical calibration for safety-critical, real-time monitoring systems.

Several limitations merit consideration. The finite-sample safety guarantees rely on run-wise exchangeability assumptions, and while the proposed Wilson-bounded safety margin promotes conservative empirical behavior, exact validity under arbitrary temporal dependence is not claimed. Additionally, near-OOD transients whose thermal hydraulic signatures overlap with known accident manifolds, such as moderator dilution, remain challenging to detect using embedding-based methods alone. These cases reflect fundamental physical similarity rather than algorithmic deficiency.

Future work will investigate extensions of conformal calibration under stronger forms of temporal dependence and explore multimodal integration to enhance the detection of subtle or near-boundary transients. In particular, three concrete mitigation strategies are planned, aimed at the hard near-OOD families (e.g., MD and much of LLB). First, multimodal sensing: jointly embedding the operational process stream with the radiological/dose stream (and, where available, acoustic or vibration channels), so that release or leakage signatures that are weak in process variables but distinctive in nuclide concentration dynamics can contribute to the nonconformity score. Second, physics-informed features: augmenting the learned representation with residuals from first-principles thermal hydraulic balances (mass, energy, and reactivity), which can expose slow departures from nominal behavior that raw sensor patterns tend to blur. Third, multi-scale temporal scoring: fusing short- and long-window nonconformity signals so that gradual transients accumulate detectable evidence over extended horizons. Crucially, each of these augmentations leaves the trajectory-level conformal machinery unchanged, as it modifies only the deterministic score $S(\cdot)$ and therefore preserves the finite-sample guarantee. Beyond nuclear applications, SCOPE provides a general template for safety-calibrated OOD prediction in other high-risk domains, including chemical process monitoring and energy system management.
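The reason a change of score leaves the guarantee intact can be written in one line. The statement below is the standard split-conformal bound in notation introduced here for illustration ($s_i$, $s_{(j)}$, and $\hat{q}_{\alpha}$ are not the paper's symbols). It assumes that $S$ is fixed before calibration and that the $m$ calibration runs and the test run are exchangeable.

```latex
% s_{(j)} denotes the j-th smallest calibration score; \hat{q}_{\alpha} = +\infty if the index exceeds m.
\[
  s_i = S(X_i), \quad i = 1, \dots, m, \qquad
  \hat{q}_{\alpha} = s_{\left( \lceil (m+1)(1-\alpha) \rceil \right)}, \qquad
  \Pr\!\left[ S(X_{m+1}) > \hat{q}_{\alpha} \right] \le \alpha .
\]
```

The bound depends on $S$ only through the ranks of the scores, so any deterministic score, whether multimodal, physics-informed, or multi-scale, can be substituted as long as it is chosen without looking at the calibration runs.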

Funding

This research received no external funding.

Data Availability Statement

The data used in this work are publicly available at https://github.com/thu-inet/NuclearPowerPlantAccidentData (accessed on 27 May 2026).

Conflicts of Interest

The author declares no conflicts of interest.

References

1. Amodei, D.; Olah, C.; Steinhardt, J.; Christiano, P.; Schulman, J.; Mané, D. Concrete Problems in AI Safety. arXiv 2016, arXiv:1606.06565.
2. Hendrycks, D.; Gimpel, K. A Baseline for Detecting Misclassified and Out-of-Distribution Examples in Neural Networks. arXiv 2017.
3. Bendale, A.; Boult, T. Towards Open Set Deep Networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition; IEEE: New York, NY, USA, 2016; pp. 1563–1572.
4. Kim, S.G.; Chae, Y.H.; Koo, S.R. Application of an open-set recognition method for detecting untrained accident scenarios in a nuclear power plant accident diagnosis model. Nucl. Eng. Des. 2024, 427, 113421.
5. Li, J.; Lin, M.; Wang, B.; Tian, R.; Tan, S.; Li, Y.; Chen, J. Open set recognition fault diagnosis framework based on convolutional prototype learning network for nuclear power plants. Energy 2024, 290, 130101.
6. Zhou, S.; Lin, M.; Huang, S.; Xiao, K. Open set compound fault recognition method for nuclear power plant based on label mask weighted prototype learning. Appl. Energy 2024, 369, 123603.
7. Vovk, V.; Gammerman, A.; Shafer, G. Algorithmic Learning in a Random World; Springer: New York, NY, USA, 2005.
8. Angelopoulos, A.N.; Bates, S. Conformal Prediction: A Gentle Introduction. Found. Trends Mach. Learn. 2023, 16, 494–591.
9. Zaffran, M.; Féron, O.; Goude, Y.; Josse, J.; Dieuleveut, A. Adaptive Conformal Predictions for Time Series. In Proceedings of the International Conference on Machine Learning; PMLR: Cambridge, MA, USA, 2022; pp. 25834–25866.
10. Lin, Z.; Trivedi, S.; Sun, J. Conformal Prediction Intervals with Temporal Dependence. arXiv 2022.
11. Khosla, P.; Teterwak, P.; Wang, C.; Sarna, A.; Tian, Y.; Isola, P.; Maschinot, A.; Liu, C.; Krishnan, D. Supervised contrastive learning. In Proceedings of the 34th International Conference on Neural Information Processing Systems, Red Hook, NY, USA, 6–12 December 2020. NIPS ’20.
12. Wang, T.; Isola, P. Understanding contrastive representation learning through alignment and uniformity on the hypersphere. In Proceedings of the 37th International Conference on Machine Learning; JMLR.org: Norfolk, MA, USA, 2020; ICML’20.
13. Qi, B.; Xiao, X.; Liang, J.; Po, L.c.C.; Zhang, L.; Tong, J. An open time-series simulated dataset covering various accidents for nuclear power plants. Sci. Data 2022, 9, 766.
14. Saeed, H.A.; Peng, M.; Wang, H.; Zhang, B. Novel fault diagnosis scheme utilizing deep learning networks. Prog. Nucl. Energy 2020, 118, 103066.
15. Lee, G.; Lee, S.J.; Lee, C. A convolutional neural network model for abnormality diagnosis in a nuclear power plant. Appl. Soft Comput. 2021, 99, 106874.
16. She, J.; Shi, T.; Xue, S.; Zhu, Y.; Lu, S.; Sun, P.; Cao, H. Diagnosis and Prediction for Loss of Coolant Accidents in Nuclear Power Plants Using Deep Learning Methods. Front. Energy Res. 2021, 9, 665262.
17. Bai, S.; Kolter, J.Z.; Koltun, V. An Empirical Evaluation of Generic Convolutional and Recurrent Networks for Sequence Modeling. arXiv 2018, arXiv:1803.01271.
18. An, J.; Cho, S. Variational autoencoder based anomaly detection using reconstruction probability. Spec. Lect. IE 2015, 2, 1–18.
19. Liang, S.; Li, Y.; Srikant, R. Enhancing The Reliability of Out-of-distribution Image Detection in Neural Networks. arXiv 2018.
20. Liu, W.; Wang, X.; Owens, J.D.; Li, Y. Energy-based out-of-distribution detection. In Proceedings of the 34th International Conference on Neural Information Processing Systems, Red Hook, NY, USA, 6–12 December 2020. NIPS ’20.
21. Fort, S.; Ren, J.; Lakshminarayanan, B. Exploring the limits of out-of-distribution detection. In Proceedings of the 35th International Conference on Neural Information Processing Systems, Red Hook, NY, USA, 6–14 December 2021. NIPS ’21.
22. Antony Asir Daniel, V.; Vijayalakshmi, K.; Pawar, P.P.; Kumar, D.; Bhuvanesh, A.; Josephine Christilda, A. Enhanced affinity propagation clustering with a modified extreme learning machine for segmentation and classification of hyperspectral imaging. E-Prime-Adv. Electr. Eng. Electron. Energy 2024, 9, 100704.
23. Snell, J.; Swersky, K.; Zemel, R.S. Prototypical Networks for Few-shot Learning. In Proceedings of the Advances in Neural Information Processing Systems (NeurIPS); NeurIPS: Denver, CA, USA, 2017; Volume 30.
24. Sun, Y.; Ming, Y.; Zhu, X.; Li, Y. Out-of-Distribution Detection with Deep Nearest Neighbors. In Proceedings of the 39th International Conference on Machine Learning (ICML); PMLR: Cambridge, MA, USA, 2022; Volume 162, pp. 20827–20840.
25. Wilson, E.B. Probable Inference, the Law of Succession, and Statistical Inference. J. Am. Stat. Assoc. 1927, 22, 209–212.
26. International Atomic Energy Agency. Development and Review of Plant Specific Emergency Operating Procedures; Technical Report 48; International Atomic Energy Agency: Vienna, Austria, 2006.
27. Todreas, N.E.; Kazimi, M.S. Nuclear Systems Volume I: Thermal Hydraulic Fundamentals, 3rd ed.; CRC Press: Boca Raton, FL, USA, 2021.
28. International Atomic Energy Agency. Accident Analysis for Nuclear Power Plants; IAEA Safety Report Series No. 23 STI/PUB/1131; International Atomic Energy Agency: Vienna, Austria, 2002.
29. U.S. Nuclear Regulatory Commission. Severe Accident Risks: An Assessment for Five U.S. Nuclear Power Plants; Technical Report 1150; U.S. Nuclear Regulatory Commission: Washington, DC, USA, 1990.
30. Guo, C.; Pleiss, G.; Sun, Y.; Weinberger, K.Q. On Calibration of Modern Neural Networks. In Proceedings of the 34th International Conference on Machine Learning (ICML); PMLR: Cambridge, MA, USA, 2017; Volume 70, pp. 1321–1330.
31. Hanley, J.A.; McNeil, B.J. The Meaning and Use of the Area under a Receiver Operating Characteristic (ROC) Curve. Radiology 1982, 143, 29–36.
32. Lee, K.; Lee, K.; Lee, H.; Shin, J. A simple unified framework for detecting out-of-distribution samples and adversarial attacks. In Proceedings of the Advances in Neural Information Processing Systems; NeurIPS: Denver, CA, USA, 2018; Volume 31.
33. Gal, Y.; Ghahramani, Z. Dropout as a Bayesian approximation: Representing model uncertainty in deep learning. In Proceedings of the International Conference on Machine Learning; PMLR: Cambridge, MA, USA, 2016; pp. 1050–1059.

Figure 1. Overview of SCOPE: (a) Causal contrastive representation learning yields hyperspherical embeddings for known accident classes. (b) Safety calibration on held-out known trajectories maps run-level nonconformity scores to a calibrated rejection threshold, with an optional safety margin to mitigate run-wise exchangeability violations. (c) Deployed inference classifies known accidents or rejects with label OOD under a monotone trajectory-level policy.
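Panel (c) can be illustrated with a small sketch of a monotone trajectory-level policy. It assumes the trajectory score is the running maximum of the window-level scores, so that a run, once rejected, stays rejected; the aggregation actually used by SCOPE is specified in the methodology and may differ. The function name and the `step_seconds` parameter are hypothetical.

```python
import numpy as np


def time_to_rejection(window_scores, threshold: float, step_seconds: float = 1.0):
    """Elapsed time at the first rejection, or None if the run is never rejected.

    Assumed aggregation: running maximum of window scores, which makes the
    reject decision monotone in time.
    """
    running_max = np.maximum.accumulate(np.asarray(window_scores, dtype=float))
    rejected = running_max > threshold
    if not rejected.any():
        return None
    first_window = int(np.argmax(rejected))  # index of the first True entry
    return (first_window + 1) * step_seconds


ttr_rejected = time_to_rejection([0.1, 0.2, 0.9, 0.3], threshold=0.5)
ttr_accepted = time_to_rejection([0.1, 0.2, 0.3], threshold=0.5)
```

With an infinite threshold no window can exceed it, so every run is accepted and the function returns `None`.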

Figure 2. Reliability diagram of empirical false-alarm rate vs. target level for the deployed density scorer (mean ± std over five seeds). Both the margin-corrected (deployed) and margin-free operating points are shown; points lie within the safe zone at or below the ideal calibration diagonal.

Figure 3. t-SNE of run-level encoder embeddings on the test set. Known families (blue) and the unknown families MD (red), LLB (orange), and RW (green) occupy largely distinct regions; remaining unknown families are shown in grey. The separation of LLB indicates that its earlier non-detection was a scoring artifact rather than a representation limitation.

Figure 4. Per-family detection rate of the deployed density scorer at $α_{target} = 0.05$ (mean ± std over five seeds). RW, LR, RI, and LLB are detected at ≈100%; only MD (slow reactivity dilution) remains hard.

Figure 5. Nonconformity-component ablation (5-seed mean ± std). The density score yields the highest AUROC and true-unknown rate; the conservative maximum dilutes it.

Figure 6. Validation macro-F1 across window sizes and embedding dimensions, averaged over temperature and k-NN configurations. A window size of 60 achieves consistently high performance across all embedding dimensions.

Table 1. Out-of-Distribution (OOD) Dataset Composition and Physical Characteristics.

OOD Family	No. of Trajectories	Transient Severity *	Evaluation Status
Statistically Valid Families ($n \geq 10$)
LLB (Letdown Line Break)	101	Mild (small-break CVCS)	Primary
MD (Moderator Dilution)	100	Mild (slow reactivity)	Primary
RW (Rod Withdrawal)	100	Severe (rapid reactivity)	Primary
LR (Load Rejection)	99	Mild (turbine transient)	Primary
RI (Rod Insertion)	82	Severe (rapid reactivity)	Primary
Singleton Families ($n = 1$)
ATWS (Anticipated Transient Without Scram)	1	Severe	Descriptive only
LACP (Loss of AC Power)	1	Severe	Descriptive only
LOF (Loss of Flow)	1	Severe	Descriptive only
Normal (Normal Operation)	1	Baseline	Descriptive only
SP (Seizure of Primary Pump)	1	Severe	Descriptive only
TT (Turbine Trip)	1	Mild	Descriptive only
Total (Primary, 98.8% of OOD)	482
Total (All)	488

* Transient severity reflects thermal hydraulic impact [27,28,29]: severe transients exhibit rapid parameter excursions with distinctive signatures, while mild transients produce gradual changes that may overlap with known accident manifolds.

Table 2. Conformal calibration of the deployed k-NN density scorer across target safety levels (mean ± std over five seeds; Wilson cross-fit margin $M = 1.52 \pm 0.08$). With the margin, the empirical false-alarm rate stays at or below target for $α \geq 0.05$ (conservative). The threshold-free AUROC is $0.884 \pm 0.051$. At $α = 0.01$ the margin-corrected level exceeds the resolution of the $m = 142$ calibration runs, giving an infinite threshold; calibrating directly at $α = 0.01$ (margin-free) is feasible and yields $FPR = 0.010 \pm 0.004$, $TUR = 0.633 \pm 0.178$.

Target $α_{target}$	Corrected $α^{'} = α / M$	Empirical FPR (↓)	TUR (↑)
0.01	0.007	$0.000 \pm 0.000$	$0.000 \pm 0.000$
0.05	0.033	$0.038 \pm 0.008$	$0.809 \pm 0.040$
0.10	0.066	$0.062 \pm 0.018$	$0.828 \pm 0.050$

↑ indicates higher is better while ↓ indicates lower is better.
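The infinite threshold in the first row follows directly from the split-conformal order statistic. The sketch below, a minimal illustration and not the author's implementation, computes the threshold at the corrected level $α / M$ and returns $+\infty$ when the required rank exceeds the number of calibration runs. The function name and the random placeholder scores are hypothetical; $m = 142$ and $M = 1.52$ are taken from the caption.

```python
import math

import numpy as np


def conformal_threshold(cal_scores, alpha: float, margin: float = 1.0) -> float:
    """Split-conformal rejection threshold at the corrected level alpha / margin.

    Returns +inf when the required order statistic exceeds the number of
    calibration runs, i.e. the level is finer than the data can resolve.
    """
    scores = np.sort(np.asarray(cal_scores, dtype=float))
    m = scores.size
    alpha_corrected = alpha / margin
    rank = math.ceil((m + 1) * (1.0 - alpha_corrected))  # 1-indexed order statistic
    if rank > m:
        return math.inf
    return float(scores[rank - 1])


rng = np.random.default_rng(0)
cal = rng.normal(size=142)  # placeholder run-level scores for m = 142 known runs
M = 1.52

thr_001_margin = conformal_threshold(cal, 0.01, M)  # rank 143 > 142, so +inf
thr_001_free = conformal_threshold(cal, 0.01)       # rank 142, finite
thr_005_margin = conformal_threshold(cal, 0.05, M)  # rank 139, finite
```

A run is rejected when its score exceeds the threshold, so an infinite threshold rejects nothing, which matches the zero false-alarm rate and zero true unknown rate reported at $α_{target} = 0.01$ with the margin.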

Table 3. Run-level AUROC over five seeds (mean ± std). The six post hoc scores share SCOPE’s frozen encoder, so only the scoring rule varies; the autoencoder (∗) is trained separately and scored by reconstruction error. SCOPE’s k-NN density score attains the highest AUROC and the lowest variance among the scores that share the encoder and, unlike any baseline, provides calibrated $α$-control.

OOD Score	Run-Level AUROC (↑)
OpenMax [3]	$0.758 \pm 0.094$
MSP [2]	$0.784 \pm 0.079$
Energy [20]	$0.796 \pm 0.076$
Autoencoder (recon.) $^{∗}$ [18]	$0.803 \pm 0.004$
MC-dropout entropy [33]	$0.811 \pm 0.064$
ODIN [19]	$0.819 \pm 0.080$
Mahalanobis [32]	$0.820 \pm 0.064$
SCOPE ($k$-NN density)	$0.884 \pm 0.051$

↑ indicates higher is better.
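For reference, run-level AUROC can be computed directly from its rank interpretation (Hanley and McNeil [31]): the probability that a randomly chosen unknown run receives a higher score than a randomly chosen known run, with ties counted as one half. The sketch below is a generic implementation of that definition; the function name and the example scores are hypothetical.

```python
import numpy as np


def auroc(known_scores, unknown_scores) -> float:
    """P(unknown score > known score), counting ties as 1/2."""
    known = np.asarray(known_scores, dtype=float)
    unknown = np.asarray(unknown_scores, dtype=float)
    diff = unknown[:, None] - known[None, :]  # all (unknown, known) pairs
    wins = (diff > 0).sum() + 0.5 * (diff == 0).sum()
    return float(wins / (known.size * unknown.size))


perfect = auroc([0.1, 0.2, 0.3], [0.4, 0.5])
partial = auroc([0.1, 0.4], [0.2, 0.5])
```

The pairwise form needs memory proportional to the product of the two run counts, which is negligible for a few hundred runs per group.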

Table 4. Detection rate and median time-to-rejection of the deployed density scorer for statistically valid OOD families (mean ± std over five seeds). MD is rarely detected, so its TTR is not meaningful.

OOD Family	No. of Trajectories	Detection Rate (%) (↑)	Median TTR (s) (↓)
RW (Rod Withdrawal)	100	$100.0 \pm 0.0$	290
LR (Load Rejection)	99	$100.0 \pm 0.0$	302
RI (Rod Insertion)	82	$100.0 \pm 0.0$	290
LLB (Letdown Line Break)	101	$100.0 \pm 0.0$	290
MD (Moderator Dilution)	100	$8.8 \pm 19.7$	—
Overall (Valid)	482	≈81	290

↑ indicates higher is better while ↓ indicates lower is better.
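The "Overall (Valid)" detection rate can be checked from the per-family rows. The sketch assumes that the overall figure is the trajectory-weighted mean of the per-family mean detection rates, an assumption that is consistent with the reported ≈81%.

```python
# (number of trajectories, mean detection rate in %) copied from Table 4
families = {
    "RW": (100, 100.0),
    "LR": (99, 100.0),
    "RI": (82, 100.0),
    "LLB": (101, 100.0),
    "MD": (100, 8.8),
}

total_runs = sum(n for n, _ in families.values())
pooled_rate = sum(n * rate for n, rate in families.values()) / total_runs
```

The shortfall from 100% is driven entirely by the MD family, which contributes about one fifth of the valid trajectories.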

Table 5. Nonconformity-component ablation (mean ± std over five seeds, $α = 0.05$). The deployed k-NN density score gives the best AUROC and true unknown rate at a comparable false-alarm rate.

Nonconformity Score	AUROC (↑)	TUR (↑)	FPR
Prototype-only	$0.806 \pm 0.073$	$0.514 \pm 0.292$	$0.063 \pm 0.011$
Entropy-only	$0.775 \pm 0.074$	$0.520 \pm 0.184$	$0.056 \pm 0.018$
Hybrid (max fusion)	$0.830 \pm 0.048$	$0.565 \pm 0.232$	$0.059 \pm 0.014$
Density (deployed)	$0.884 \pm 0.051$	$0.828 \pm 0.050$	$0.050 \pm 0.013$

↑ indicates higher is better.

Table 6. Marginal sensitivity analysis. Mean F1 and stability (standard deviation) are computed across all configurations sharing each parameter value. Lower tuning gap indicates reduced sensitivity to other hyperparameters.

Parameter	Value	Mean F1 (↑)	Peak F1 (↑)	Std (↓)	Tuning Gap (↓)
Window Size (L)	30 steps	0.945	0.987	0.116	0.042
	60 steps	0.986	0.987	0.004	0.001
	90 steps	0.930	0.987	0.107	0.057
Temperature ($τ$)	0.05	0.970	0.987	0.043	0.017
	0.07	0.945	0.987	0.116	0.042
	0.10	0.946	0.987	0.105	0.041
Embedding Dim (D)	32	0.950	0.987	0.106	0.037
	64	0.940	0.987	0.115	0.047
	128	0.971	0.987	0.043	0.016
k-NN Neighbors (k)	5	0.954	0.987	0.095	0.034
	10	0.954	0.987	0.095	0.034
	20	0.954	0.987	0.095	0.034
k-NN Weight (w)	0.3	0.954	0.987	0.095	0.034
	0.5	0.954	0.987	0.095	0.034
	0.7	0.954	0.987	0.095	0.034

↑ indicates higher is better while ↓ indicates lower is better.
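The marginal statistics in Table 6 can be reproduced from a grid of validation results with a simple group-by. The sketch assumes that the tuning gap is Peak F1 minus Mean F1, which agrees with the tabulated rows (for example, 0.987 − 0.945 = 0.042), and that the standard deviation is the population form, which the table does not state. The function name and the four grid entries are hypothetical.

```python
import statistics
from collections import defaultdict


def marginal_sensitivity(results, param):
    """Summarize F1 over all configurations that share each value of `param`.

    `results` is a list of (config_dict, f1) pairs from a hyperparameter grid.
    """
    groups = defaultdict(list)
    for config, f1 in results:
        groups[config[param]].append(f1)

    summary = {}
    for value, f1s in groups.items():
        mean_f1 = statistics.fmean(f1s)
        peak_f1 = max(f1s)
        summary[value] = {
            "mean": mean_f1,
            "peak": peak_f1,
            "std": statistics.pstdev(f1s),  # population std (assumption)
            "gap": peak_f1 - mean_f1,       # tuning gap (assumed definition)
        }
    return summary


# Hypothetical grid results, not values from the paper.
results = [
    ({"window": 30, "dim": 32}, 0.90),
    ({"window": 30, "dim": 128}, 0.98),
    ({"window": 60, "dim": 32}, 0.98),
    ({"window": 60, "dim": 128}, 0.99),
]
summary = marginal_sensitivity(results, "window")
```

A small gap together with a small standard deviation, as for the 60-step window in Table 6, indicates that the parameter value performs well regardless of how the remaining hyperparameters are set.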

Table 7. Density-scorer k-sweep (mean ± std over three seeds): run-level AUROC and empirical false-alarm rate at $α = 0.05$. AUROC varies by <0.05 across k, confirming robustness to the neighbor count.

k	1	5	10	20	50
AUROC (↑)	$0.868 \pm 0.030$	$0.859 \pm 0.033$	$0.852 \pm 0.033$	$0.836 \pm 0.032$	$0.826 \pm 0.028$
FPR ($α = 0.05$)	$0.068$	$0.068$	$0.061$	$0.061$	$0.058$

↑ indicates higher is better.

Table 8. Leave-one-known-family-out robustness (mean ± std over three seeds). Each family is withheld and treated as unknown; the encoder is retrained on the remaining three. Calibration stays conservative ($FPR \leq α = 0.05$) in every fold.

Withheld Family	FPR (Known) (↓)	Detection of Withheld (↑)	Overall AUROC (↑)
LOCA	$0.007 \pm 0.006$	$99.8 \pm 0.3\%$	$0.897 \pm 0.027$
SGTR	$0.043 \pm 0.015$	$100.0 \pm 0.0\%$	$0.920 \pm 0.007$
MSLB	$0.029 \pm 0.017$	$36.0 \pm 21.7\%$	$0.866 \pm 0.049$
FLB	$0.046 \pm 0.012$	$100.0 \pm 0.0\%$	$0.870 \pm 0.039$

↑ indicates higher is better while ↓ indicates lower is better.

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

