Article

Risk Monitoring of Small Modular Reactors by Grey-Box Models: Feature Extraction and Global Sensitivity Analysis

1 Energy Department, Politecnico di Milano, 20133 Milan, Italy
2 Centre de Recherche sur les Risques et les Crises (CRC), Department of Economics, Management and Society, MINES Paris-PSL Université Paris, 06904 Sophia Antipolis, France
* Author to whom correspondence should be addressed.
J. Nucl. Eng. 2026, 7(2), 34; https://doi.org/10.3390/jne7020034
Submission received: 17 February 2026 / Revised: 21 April 2026 / Accepted: 30 April 2026 / Published: 7 May 2026

Abstract

Gray-Box (GB) models are being considered for risk monitoring of Small Modular Reactors (SMRs). Their effectiveness depends on the proper selection of the model parameters. This paper proposes a systematic methodology for identifying the most influential parameters of a GB model for estimating safety-critical variables of an SMR during normal operation and accident scenarios. The GB integrates a reduced-order physics-based model (White-Box, WB) with a data-driven (Black-Box, BB) model that corrects the outputs of the WB using the condition-monitoring data collected by sensors installed on the SMR. The proposed method combines signal decomposition, specifically the Hilbert–Huang Transform (HHT), and global sensitivity analysis (SA), based on first-order Kucherenko indices, to quantify the contribution of non-stationary, correlated GB input parameters to the variability of the safety-critical output parameters of interest. The proposed approach is applied to the Small Modular Dual Fluid Reactor (SMDFR), and the obtained results demonstrate its effectiveness in identifying informative and physically interpretable features, reducing complexity and computational burden to enable real-time risk monitoring.

1. Introduction

Risk monitoring relies on timely and accurate estimations of the values of safety-critical parameters [1,2,3]. The compact core design of Small Modular Reactors (SMRs) challenges in-core sensor placement for directly measuring the values of safety-critical parameters for risk monitoring [4,5]. Virtual sensing [6,7] can be adopted for enabling the estimation (and prediction) of safety-critical parameters not directly measurable [8]. For the needed estimation accuracy and interpretability, Gray-Box (GB) models can be used, strategically combining physics-based White-Box (WB) models [9] with data-driven Black-Box (BB) models [7,10,11] to achieve transparency (WB) and computational efficiency (BB) [12,13].
GB models require a proper input parameter selection to obtain the expected benefits of output explainability and generalization capabilities with end-user trust, which are required for application in the nuclear power industry [14,15,16]. The systematic identification of physically meaningful and informative inputs for GB modeling is a critical issue, especially in the context of SMRs, because (i) input parameters (e.g., fuel and coolant temperature, mass flow rates) may show strong correlations, nonlinear behavior and non-stationarity [17,18,19] that may increase model complexity and reduce the GB model’s interpretability; (ii) during transients, the characteristics of the inputs change since their evolution is non-stationary [1,2,20]. Proper feature extraction and sensitivity analysis (SA) techniques are then needed to determine the subset of informative input parameters that most contribute to the variability in the GB model output [20,21,22,23,24,25]. Various feature extraction techniques have been proposed for BB models, including embedded regularization techniques such as LASSO for sparse variable identification [26], dimensionality reduction approaches such as Principal Component Analysis (PCA) and Partial Least Squares (PLS) [27], and SHapley Additive exPlanations (SHAP) values for input relevance ranking in regression tasks [28].
In this work, we propose a systematic methodology for identifying and selecting physically relevant, minimally redundant and dynamically informative input parameters to be fed to the BB model of the GB model used within a framework of SMR risk monitoring. The methodology integrates nonlinear time-series feature extraction with a global variance-based SA. The deployment of the approach consists of two steps. First, the Hilbert–Huang Transform (HHT) [29,30], which combines Empirical Mode Decomposition with the Hilbert transform, is applied to the nonlinear and non-stationary condition-monitoring data to extract Intrinsic Mode Functions and instantaneous spectral features of the monitored parameters. These features serve as a refined representation of the reactor’s dynamic behavior under transient and accident scenarios. Second, a global SA using first-order Kucherenko indices [31] is performed to quantify the contribution of each feature of the monitored parameters to the variability of the relevant safety-critical output parameters. Unlike classical Sobol indices [32], Kucherenko indices explicitly address correlated inputs, making them particularly suitable for SMR operational environments. This methodology, as a whole, enables the identification of a physically meaningful and interpretable set of measurable variables for GB model development, thereby improving model transparency and computational efficiency.
An application is shown with reference to the Small Modular Dual Fluid Reactor (SMDFR), a next-generation SMR concept characterized by compact design and, thus, with limited sensor placement feasibility. The obtained results demonstrate the capability of the methodology in identifying a subset of sensitive and physically meaningful features, thereby reducing the GB model’s complexity and improving its computational efficiency, rendering feasible its use for risk monitoring purposes; the method is applicable to different designs of Nuclear Power Plants (NPPs), and is most useful for SMRs, whose design challenges in-core sensor placement.
The rest of the paper is structured as follows: in Section 2, the GB model for multi-step-ahead time-series forecasting and risk monitoring is presented; Section 3 introduces the proposed feature extraction and SA methods; Section 4 describes the SMR case study; Section 5 shows the numerical results; and Section 6 concludes the paper with some final remarks.

2. Problem Formulation

We consider the risk monitoring of an SMR in relation to a safety-critical parameter $y$ (e.g., Peak Cladding Temperature (PCT)) which must not exceed a certain threshold value during operation. The dynamics of the non-measurable safety-critical parameter $y$ can be modeled as:
$$y(t) = g\big(x(t), u(t), \theta\big) \tag{1}$$
where $x(t) = [x_1(t), x_2(t), \ldots, x_m(t), \ldots, x_M(t)]$ is the vector of $M$ measurable input parameters (e.g., pressure, temperature, flow rate) at time $t$, $u(t)$ are variables affecting process evolution (e.g., coolant leakage rate during a loss-of-coolant accident) at time $t$ and $\theta$ is the vector of design parameters (e.g., heat transfer area, core geometry, fuel/coolant composition properties).
If considering time as a discrete independent variable, $t = t_0, t_1, \ldots, t_n, \ldots, t_N$, where $t_0 = 0$, $t_n - t_{n-1} = \Delta t$ and $t_N$ is the end time of the transient considered, and treating each variable $x_m$ as a time-series signal for which historical observations $x_m(t_{n-r+1}:t_n)$ over a window of length $r$ are collected by sensors in $X(t_{n-r+1}:t_n) \in \mathbb{R}^{M \times r}$, a multi-step-ahead GB model (see Figure 1) [33] can be developed to predict the $s$-steps-ahead output $\hat{y}_{GB}(t_n:t_{n+s})$ of $y(t_n:t_{n+s}) \in \mathbb{R}^{s}$:
$$\hat{y}_{GB}(t_n:t_{n+s}) = \hat{y}_{WB}(t_n:t_{n+s}) + \Delta\hat{y}(t_n:t_{n+s}) \tag{2}$$
where $\hat{y}_{WB}$ is the output of a reduced-order physics-based WB model, which is fed by $\theta$, $u(t_n)$ and the projection up to time $t_{n+s}$ estimated by a BB model. The term $\Delta\hat{y}$ is the WB model discrepancy (error correction) between the WB model output and $y$. The WB model is an approximation of the SMR physics, as follows:
$$\hat{y}_{WB}(t_n:t_{n+s}) = f_{WB}\big(x(t_n), u(t_n:t_{n+s}), \theta\big) \tag{3}$$
where $f_{WB}(\cdot)$ retains the essential governing reactor physics, whereas the prediction discrepancy $\Delta y$ is defined as:
$$\Delta y(t_n:t_{n+s}) = y(t_n:t_{n+s}) - \hat{y}_{WB}(t_n:t_{n+s}) \tag{4}$$
In a GB-based framework for risk monitoring [33], $\Delta\hat{y}$ and $u$ can be estimated by a data-driven BB model $f_{BB}(\cdot)$ as follows:
$$Y_{BB}(t_n:t_{n+s}) = f_{BB}\big(X(t_{n-r+1}:t_n)\big) \tag{5}$$
where $X(t_{n-r+1}:t_n) \in \mathbb{R}^{M \times r}$ corresponds to the memory matrix and $f_{BB}: \mathbb{R}^{M \times r} \to \mathbb{R}^{2 \times s}$. The BB model is developed to simultaneously approximate the discrepancy $\Delta y$ and the variable $u$ (i.e., $Y_{BB} = [\Delta\hat{y}, \hat{u}]^T$) required for the WB model estimation (Equation (3)), based on $r$ available observations taken from the monitored data. Given the above formulation, the objective of the present work is to identify and select an optimal subset of input variables $x^* \subseteq x$ that explain most of the variability in both $\Delta y$ and $u$. Selecting such a subset is essential for the efficient implementation of $f_{BB}(\cdot)$ within the GB model, thereby ensuring reduced computational complexity and enhanced predictive performance for real-time risk monitoring of the SMR.
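The additive structure of the GB forecast (a WB prediction plus a BB correction, with the BB also supplying the projected forcing to the WB) can be sketched in a few lines. The functions `white_box` and `black_box` below are toy placeholders invented for this illustration (a cumulative linear law and a persistence forecast); they are not the reduced-order SMDFR model nor the Bi-LSTM of [33], and the numbers carry no physical meaning.

```python
from typing import List, Sequence, Tuple

Window = Sequence[Sequence[float]]  # memory matrix X: M signals x r samples


def white_box(x_now: Sequence[float], u_future: Sequence[float], theta: float) -> List[float]:
    """Toy WB model (assumption): the output grows with the cumulated forcing u."""
    y, out = x_now[0], []
    for u in u_future:
        y += theta * u
        out.append(y)
    return out


def black_box(window: Window, s: int) -> Tuple[List[float], List[float]]:
    """Toy BB model (assumption): returns (delta_y_hat, u_hat), each of length s."""
    last_u = window[1][-1]                            # persistence forecast of u
    bias = 0.01 * sum(window[0]) / len(window[0])     # constant discrepancy estimate
    return [bias] * s, [last_u] * s


def grey_box(window: Window, theta: float, s: int) -> List[float]:
    """y_GB = y_WB + delta_y_hat, with u_hat passed from the BB to the WB."""
    delta_y_hat, u_hat = black_box(window, s)
    x_now = [row[-1] for row in window]               # latest sample of each signal
    y_wb = white_box(x_now, u_hat, theta)
    return [yw + dy for yw, dy in zip(y_wb, delta_y_hat)]


# M = 2 signals, r = 3 samples; all values are made up.
window = [[900.0, 901.0, 902.0], [0.0, 1.0, 2.0]]
forecast = grey_box(window, theta=0.5, s=4)
```

The point of the sketch is only the data flow: the BB sees the memory window, the WB sees the latest state, the design parameter and the BB-projected forcing, and the two outputs are summed.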

3. The Proposed Methodology

The proposed methodology is sketched in Figure 2. First, HHT is applied to the time series of the signal measurements in $N_d$ available accident scenarios [29,30]. Without loss of generality, it is assumed that each accident scenario $d \in \{1, \ldots, N_d\}$ is initiated at a fixed time $t_f \le t_N$ and lasts until time $t_N$, with the measurements being taken at discrete times $t = t_0, t_0 + \Delta t, \ldots, t_n, \ldots, t_N$ with constant timestep $\Delta t$. For each scenario $d$, both measured variables $X(t_{n-r+1}:t_n)$ and input signals $u(t_{n+1}:t_{n+s})$ are collected, using fixed $r$ (memory window length) and $s$ (prediction window length) with time index $n = r, \ldots, N-s$ (see Figure 3). At each timestep $t_n$, the SMR model $g$ and the WB model $f_{WB}$ are evaluated to compute the output vector $Y_{BB}(t_n:t_{n+s})$ to be used as the target output of the BB model (see Section 2). Feature extraction is then performed on both the input $X(t_{n-r+1}:t_n)$ and the target output $Y_{BB}(t_n:t_{n+s})$ of the BB model using HHT [30] (see Section 3.1).
During a second step, a global SA based on Kucherenko indices $K_{p,q}$ [31] is performed on the extracted features $p \in V$ (input variables) and $q \in W$ (output variables), where $V$ and $W$ are the full sets of HHT-based features extracted from the inputs and the target outputs, respectively (see Section 3.2). The input selection is performed by aggregating the computed sensitivity indices $K_{p,q}$ from the feature space to the physical input/output parameter space. An aggregation strategy is employed in the framework of multi-output sensitivity analysis [34] to select a physically robust subset of BB inputs $x^* \subseteq x$ (see Section 3.3). The relationships among the extracted features, sensitivity indices, and the final input selection are schematically summarized in Figure 4.
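The windowing over the time index n = r, ..., N − s can be made concrete with a short sketch. The helper name `make_windows` and the toy signals are assumptions introduced here; the function only pairs each memory window X(t_{n−r+1}:t_n) with the future forcing window u(t_{n+1}:t_{n+s}).

```python
from typing import List, Sequence, Tuple


def make_windows(
    x: Sequence[Sequence[float]],  # M measured signals, each with N + 1 samples
    u: Sequence[float],            # forcing signal with N + 1 samples
    r: int,                        # memory window length
    s: int,                        # prediction window length
) -> List[Tuple[List[List[float]], List[float]]]:
    """Return (X(t_{n-r+1}:t_n), u(t_{n+1}:t_{n+s})) pairs for n = r, ..., N - s."""
    n_last = len(u) - 1            # index N of the final sample
    pairs = []
    for n in range(r, n_last - s + 1):
        memory = [list(sig[n - r + 1 : n + 1]) for sig in x]  # r samples ending at t_n
        target = list(u[n + 1 : n + s + 1])                    # s samples after t_n
        pairs.append((memory, target))
    return pairs


# Toy data: M = 2 signals with 10 samples each (N = 9).
signals = [list(range(10)), [10 * k for k in range(10)]]
forcing = [float(k) for k in range(10)]
pairs = make_windows(signals, forcing, r=3, s=2)
```

With the settings reported for the case study (r = s = 200 samples at a 0.5 s sampling time), each memory window and each prediction window would span 100 s.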

3.1. Feature Extraction Using Hilbert–Huang Transform (HHT)

HHT is employed to extract representative time-frequency features prior to the SA of the nonlinear and non-stationary model of the input and output signals [29,30]. The HHT is an adaptive and data-driven feature extraction technique that decomposes signals through an Empirical Mode Decomposition (EMD) method, producing Intrinsic Mode Functions (IMFs) that inherently reflect true local oscillatory modes without relying on predefined basis functions [30]. This makes the HHT particularly suitable for feature extraction and denoising of nonlinear and non-stationary signals [29].
Without loss of generality, the vectors of extracted features for each $m$-th input and $j$-th output are denoted as:
$$f^m = [f_1^m, \ldots, f_i^m, \ldots, f_{I_m}^m]^T \tag{6}$$
$$f^j = [f_1^j, \ldots, f_l^j, \ldots, f_{L_j}^j]^T \tag{7}$$
respectively, where $f^m$ contains the vectors of extracted features $f_i^m$ of the $i$-th IMF of the $m$-th input feature, $i = 1, \ldots, I_m$. To obtain IMFs on a certain input feature $m$, the time-series signal must exhibit oscillatory content (i.e., a set of local extrema) so that the EMD can be applied [30,35]. For this reason, the signals are preprocessed to identify the non-decomposable input features. For each non-decomposable input signal ($x_m$), the set of extracted features includes the slope coefficient of linear regression ($\hat{\beta}_m$), mean ($\hat{x}_{m,1}$) and standard deviation ($\hat{x}_{m,2}$). In this case, the vector of input features becomes:
$$f_i^m = [\hat{\beta}_m, \hat{x}_{m,1}, \hat{x}_{m,2}] \tag{8}$$
with $I_m = 1$, thus $f^m = f_1^m$. For those decomposable input and output features, the EMD is conducted and a total of $|I_m|$ and $|J_l|$ IMFs are extracted, respectively, satisfying $|I_m| > 1$, $|J_l| > 1$. For each extracted $i$-th IMF of the $m$-th input, the vector of features $f_i^m$ is expressed as:
$$f_i^m = [f_{i,1}^m, \ldots, f_{i,v}^m, \ldots, f_{i,V_i}^m] \tag{9}$$
where $f_i^m \in \mathbb{R}^{V_i}$, and $f_{i,v}^m \in \mathbb{R}$ corresponds to the $v$-th extracted feature of the $i$-th IMF, with $v = 1, \ldots, V_i$. Similarly, the extracted features of the $j$-th output can be defined as:
$$f^j = [f_1^j, \ldots, f_l^j, \ldots, f_{L_j}^j]^T \tag{10}$$
$$f_l^j = [f_{l,1}^j, \ldots, f_{l,w}^j, \ldots, f_{l,W_l}^j] \tag{11}$$
For those signals of input $x_m(t)$ and output $y_{BB,j}$ that are decomposed via EMD, the IMFs satisfy:
$$x_m(t) = \sum_{i=1}^{I_m} c_{m,i}(t) + r_i(t) \tag{12}$$
$$y_{BB,j}(t) = \sum_{l=1}^{L_j} c_{j,l}(t) + r_l(t) \tag{13}$$
where $c_{m,i}(t)$ and $c_{j,l}(t)$ are the $i$-th and $l$-th IMFs capturing oscillatory behaviors at a characteristic frequency, and $r_i(t)$ and $r_l(t)$ the residual trends. The Hilbert transform is applied to each extracted IMF $c(t)$ by introducing the operator $T$ as follows:
$$T[c(t)] = \{a(t), \omega(t)\} \tag{14}$$
where $a_{m,i}(t)$ ($a_{j,l}(t)$) corresponds to the Instantaneous Amplitude (IA) of the $i$-th ($l$-th) IMF of the $m$-th input ($j$-th output) signal at time $t$ and $\omega_{m,i}(t)$ ($\omega_{j,l}(t)$) corresponds to the Instantaneous Frequency (IF) of the same IMF at time $t$. The operator $T$ performs the Hilbert transform, the calculation of the analytic function and the extraction of the IA and IF (see [30]). Selected features, including mean IA ($\bar{a}_{m,i}$), IF ($\bar{\omega}_{m,i}$) and Energy ($\bar{e}_{m,i}$), are extracted for each IMF and stored in $f_i^m$ or $f_l^j$, correspondingly (see Section 4):
$$f_i^m = [\bar{a}_{m,i}, \bar{\omega}_{m,i}, \bar{e}_{m,i}] \tag{15}$$
$$f_l^j = [\bar{a}_{j,l}, \bar{\omega}_{j,l}, \bar{e}_{j,l}] \tag{16}$$
This procedure allows us to capture the nonlinear and transient dynamics of the reactor parameters, serving as explanatory input variables for the global SA, GSA (see Section 3.2). These features are not intended for direct physical interpretation by operators; rather, they are used internally for input selection (see Section 3.3). The final GB model operates on selected physical input variables, and interpretability is ensured through SHAP analysis in the input space (see Section 5.3).
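To illustrate the Hilbert step on a single, already extracted IMF, the following sketch computes the mean instantaneous amplitude, the mean instantaneous frequency and an energy value. Several points are assumptions made here rather than details taken from the paper: the EMD sifting itself is not implemented, the analytic signal is built with a naive discrete Fourier transform (adequate only for short windows), and "energy" is taken as the mean squared value of the IMF. The function names are hypothetical.

```python
import cmath
import math
from typing import List, Sequence, Tuple


def analytic_signal(c: Sequence[float]) -> List[complex]:
    """Analytic signal of c via a naive O(N^2) DFT (fine for short windows)."""
    n = len(c)
    spec = [sum(c[k] * cmath.exp(-2j * math.pi * f * k / n) for k in range(n))
            for f in range(n)]
    # Keep DC (and Nyquist for even n), double positive frequencies, drop negative ones.
    gain = [0.0] * n
    gain[0] = 1.0
    half = n // 2
    if n % 2 == 0:
        gain[half] = 1.0
    for f in range(1, half if n % 2 == 0 else half + 1):
        gain[f] = 2.0
    return [sum(gain[f] * spec[f] * cmath.exp(2j * math.pi * f * k / n)
                for f in range(n)) / n
            for k in range(n)]


def imf_features(c: Sequence[float], dt: float) -> Tuple[float, float, float]:
    """Return (mean IA, mean IF in Hz, mean squared value) of one IMF sampled every dt s."""
    z = analytic_signal(c)
    amplitude = [abs(v) for v in z]
    phase = [cmath.phase(v) for v in z]
    # Phase increments wrapped to [-pi, pi), i.e. an unwrapped phase derivative.
    steps = [(b - a + math.pi) % (2 * math.pi) - math.pi
             for a, b in zip(phase, phase[1:])]
    inst_freq = [d / (2 * math.pi * dt) for d in steps]
    mean_ia = sum(amplitude) / len(amplitude)
    mean_if = sum(inst_freq) / len(inst_freq)
    energy = sum(v * v for v in c) / len(c)   # assumed definition of "energy"
    return mean_ia, mean_if, energy


# Synthetic IMF: a 5 Hz unit sine sampled at 100 Hz for 1 s.
imf = [math.sin(2 * math.pi * 5.0 * k * 0.01) for k in range(100)]
mean_ia, mean_if, energy = imf_features(imf, dt=0.01)
```

For this synthetic sine the three features should come out close to 1, 5 Hz and 0.5, respectively, which is a convenient sanity check before applying the same routine to IMFs of measured signals.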

3.2. Calculation of the Kucherenko Indices

Kucherenko indices are an extension of the standard Sobol indices for SA that can handle correlated inputs [31]. The dependent variables $y_{BB}$ for the GSA contain the available residual error data $\Delta y_{BB}$ and the safety-critical inputs $u$. The aim is to quantify the contribution of each feature $f^m$ extracted from the BB input vector $x$ (and, by aggregation, of each physical/measurable input $x_m$) to the variance of the features $f^j$ extracted from $y_{BB}$ while accounting for correlations among features.
For each accident scenario $d \in \{1, \ldots, N_d\}$ and query time index $n = r, \ldots, N-s$, the extracted feature vectors from all BB input signals are concatenated:
$$v_{d,n} = [f_1^T(t_n), \ldots, f_m^T(t_n)] \tag{17}$$
where $v_{d,n} \in \mathbb{R}^{L_V}$. Stacking all samples across all $N_d$ scenarios produces the dataset $V$:
$$V = \{v_{d,n}\}_{d = 1, \ldots, N_d;\ n = r, \ldots, N-s} \tag{18}$$
where $V \in \mathbb{R}^{N_q \times V}$. The extracted features are stored in the dataset $V$, collecting the samples for the query times indexed $n = r, \ldots, N-s$, for each accident scenario $d \in \{1, \ldots, N_d\}$. Likewise, the corresponding BB model output increments $\Delta y(t_{n+1}:t_{n+s})$ and control signal $u(t_{n+1}:t_{n+s})$ (thus forming $Y_{BB}$) are obtained to extract output features $f^j$, stored in matrix $W$:
$$w_{d,n} = [f_1^T(t_n), \ldots, f_j^T(t_n)] \tag{19}$$
$$W = \{w_{d,n}\}_{d = 1, \ldots, N_d;\ n = r, \ldots, N-s} \tag{20}$$
where $w_{d,n} \in \mathbb{R}^{L_W}$, and $W \in \mathbb{R}^{N_q \times V}$. After processing all $N_d$ scenarios and $N$ query indexes for each $d$-th scenario, the first-order Kucherenko indices $K_{p,q}$ are computed in the feature space using matrices $V$ and $W$, as follows:
$$K_{p,q} = \frac{\mathrm{Var}\big(E[W_q \mid V_p]\big)}{\mathrm{Var}(W_q)} \tag{21}$$
The sensitivity indices $K_{p,q}$ are obtained for each feature $p \in V$ and its contribution to features $q \in W$, where $V$ and $W$ correspond to the set of extracted features from the BB input and output, respectively. As the analytical calculation of $K_{p,q}$ in Equation (21) is cumbersome [36], in this work we adopt the sample-based approach proposed in [36,37].
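The ratio Var(E[W_q | V_p]) / Var(W_q) can be approximated directly from samples. The sketch below uses a plain equal-count binning estimator, swapped in here for illustration; it is not the sample-based estimator of [36,37], and the function name, the number of bins and the synthetic data are assumptions. It does show one property relevant to correlated inputs: a feature that does not enter the output at all can still obtain a sizeable first-order index when it is correlated with one that does.

```python
import random
from typing import Sequence


def first_order_index(v_p: Sequence[float], w_q: Sequence[float], bins: int = 20) -> float:
    """Estimate Var(E[W_q | V_p]) / Var(W_q) by equal-count binning on V_p."""
    n = len(v_p)
    order = sorted(range(n), key=lambda i: v_p[i])      # sample indices sorted by V_p
    mean_w = sum(w_q) / n
    var_w = sum((w - mean_w) ** 2 for w in w_q) / n
    if var_w == 0.0:
        return 0.0
    between = 0.0
    for b in range(bins):
        idx = order[b * n // bins : (b + 1) * n // bins]
        if not idx:
            continue
        cond_mean = sum(w_q[i] for i in idx) / len(idx)  # E[W_q | V_p in bin b]
        between += len(idx) * (cond_mean - mean_w) ** 2
    return between / (n * var_w)


# Synthetic features (made up): w depends on v1 only; v2 is correlated with v1;
# v3 is independent of everything else.
rng = random.Random(0)
v1 = [rng.gauss(0.0, 1.0) for _ in range(2000)]
v2 = [0.8 * a + 0.6 * rng.gauss(0.0, 1.0) for a in v1]
v3 = [rng.gauss(0.0, 1.0) for _ in range(2000)]
w = [a + 0.1 * rng.gauss(0.0, 1.0) for a in v1]

k1 = first_order_index(v1, w)   # close to 1: v1 drives w
k2 = first_order_index(v2, w)   # clearly non-zero through the correlation with v1
k3 = first_order_index(v3, w)   # close to 0: independent feature
```

Binning introduces a small bias that depends on the number of bins and samples, so the values should be read as rough estimates only.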

3.3. Selection of BB Model Inputs

The selection of the subset of measurable quantities ($x^* \subseteq x$) that will serve as inputs to the BB model is performed based on $K_{p,q}$, where $p \in V$ are the features extracted from the measurable inputs $x_m$ and $q \in W$ are those extracted from the BB outputs. In practice, each measurable input $x_m$ is represented by a group of extracted features ($V_m \subseteq V$), where $V_m$ are disjoint sets satisfying $V = \bigcup_{m=1,\ldots,M} V_m$, and, similarly, each $j$-th BB output is associated with a set of extracted features ($W_j \subseteq W$), with $W = \bigcup_{j=1,\ldots,J} W_j$ and $j \in \{1,2\}$. The BB input selection problem reduces to ranking the BB inputs through a group-level sensitivity index [34,38] that accounts for all features of a given BB output: the feature-level sensitivity information $K_{p,q}$ is aggregated into a single metric $b_{m,q}^j$ for the $m$-th input and $j$-th BB output by taking the maximum $K_{p,q}$ across the input features $p$ belonging to the group $V_m$.
To do that, for each measurable input ($x_m$) and for each output feature ($q \in W_j$), we first evaluate the most influential contribution among all features extracted from $x_m$:
$$b_{m,q}^j = \max_{p \in V_m} K_{p,q} \tag{22}$$
where $b_{m,q}^j$ measures the maximum sensitivity index of input $x_m$ to the variance of the specific output feature $q$. Then, a group-level score is computed:
$$K_m^j = \frac{1}{|W^{(j)}|} \sum_{q \in W^{(j)}} b_{m,q}^j \tag{23}$$
that quantifies the average explanatory power of the measurable input ($x_m$) for the $j$-th BB output (i.e., $m^* = \arg\max_m (K_m^j)$ is the input feature that explains the $j$-th output variability better than any other input $m \neq m^*$).
To obtain a robust ranking of measurable inputs accounting for multiple BB outputs, the index $K_m^j$ is obtained for each BB output:
$$K^j = \max_m K_m^j \tag{24}$$
In this way, the proposed strategy for the selection of $x^*$ rewards those signals that offer an overall advantage in terms of $K_{p,q}$ to explain the variability of each output feature. Therefore, $x^*$ can be extracted by considering $x^* = \{m : K_m^j = \max_m K_m^j,\ j \in J\}$.
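The max-then-average aggregation described above is easy to trace on a small example. In the sketch below, the feature names, the grouping and all index values are hypothetical and chosen only to make the arithmetic visible; they are not results from the case study.

```python
from typing import Dict, List, Tuple

# Hypothetical feature-level indices K[(p, q)]; every number is made up.
K: Dict[Tuple[str, str], float] = {
    ("leak_ia", "leak_out_ia"): 0.40, ("leak_ia", "dT_ia"): 0.05,
    ("leak_if", "leak_out_ia"): 0.30, ("leak_if", "dT_ia"): 0.10,
    ("Tin_slope", "leak_out_ia"): 0.10, ("Tin_slope", "dT_ia"): 0.35,
    ("Tin_mean", "leak_out_ia"): 0.20, ("Tin_mean", "dT_ia"): 0.25,
}
# V_m: features grouped by measurable input; W_j: features grouped by BB output.
input_groups = {"m_leak": ["leak_ia", "leak_if"], "T_in": ["Tin_slope", "Tin_mean"]}
output_groups = {"leak": ["leak_out_ia"], "delta_Tp": ["dT_ia"]}


def group_score(m: str, j: str) -> float:
    """K_m^j: mean over q in W_j of b_{m,q}^j = max over p in V_m of K_{p,q}."""
    b = [max(K[(p, q)] for p in input_groups[m]) for q in output_groups[j]]
    return sum(b) / len(b)


def select_inputs() -> List[str]:
    """For each output j keep the input with the largest K_m^j (duplicates merged)."""
    selected: List[str] = []
    for j in output_groups:
        best = max(input_groups, key=lambda m: group_score(m, j))
        if best not in selected:
            selected.append(best)
    return selected


selected = select_inputs()
```

Taking the maximum within an input group means an input is credited with its single most informative feature, while averaging over the output features avoids rewarding an input that explains only one feature of a given output.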

4. Case Study

The methodology is applied with reference to the SMDFR case study [39,40]. Given the lack of experimental datasets, a High-Fidelity (HF) model is used as a proxy of the state-space model ($g(\cdot)$) for generating data. The HF model is taken from [40]: it consists of a nodalization for each circuit of $L = 12$ nodes, $h = 1, \ldots, N_h$. For each $h$-th node, the fuel temperature $T_f^h(t)$, the piping wall temperature $T_w^h(t)$ and the coolant temperature profile $T_\kappa^h(t)$ are calculated along with the coolant mass flow rate ($\dot{m}_\kappa^h(t)$). The safety-critical parameter considered for risk monitoring is the maximum temperature within the tube piping system $T_p(t)$ at time $t$ (i.e., a proxy of PCT):
$$y(t) = T_p(t) = \max_{h \in N_h} T_w^h(t) \tag{25}$$
We consider as the Initiating Event (IE) the occurrence of a loss-of-coolant accident (LOCA) in the primary coolant loop, in line with [40]. The considered variable $u(t)$ and system design parameters $\theta$ for the accident scenario modeling are listed in Table 1.
To mimic a realistic dataset, we simulate the LOCA as in [41], assuming that the leak in the primary coolant circuit starts at time $t_f$ with rate $\dot{m}_{leak}(t)$ and random white noise ($\epsilon$):
m ˙ l e a k t = B l e a k k l e a k t t f 1 A l e a k k l e a k + ϵ i f                                     t t f 0 i f                                     t < t f
where $k_{leak} \sim U(\underline{k}_{leak}, \bar{k}_{leak})$, $\epsilon \sim N(0, \sigma_\epsilon)$, $A_{leak} \sim U(\underline{A}_{leak}, \bar{A}_{leak})$, $B_{leak} \sim U(\underline{B}_{leak}, \bar{B}_{leak})$.
Excessive overheating during a LOCA is prevented by the activation of the Protection System (PS) that consists of an Auxiliary Cooling System (ACS) that supplies coolant mass flow rate $\dot{m}_{ACS}(t)$. The resulting net coolant mass flow rate $\dot{m}_{\kappa,net}(t)$ is given by:
$$\dot{m}_{\kappa,net}(t) = \dot{m}_{ACS}(t) - \dot{m}_{leak}(t) \tag{27}$$
The variables $x = [\dot{m}_{leak}, T_{f,12}^e, T_{\kappa,in}, T_{\kappa,12}^e]$ that can be measured at each $\Delta t$ by means of in-place sensors suitable for metal-cooled SMRs [42] are listed in Table 2.
The GB model that is used to estimate the non-measurable parameters of interest such as the pipe wall temperature, $T_w^h(t)$, consists of a zero-dimensional WB model, from which the model discrepancy $\Delta y$ is calculated (see Section 2), and a BB model $f_{BB}$ that corresponds to a Bidirectional Long Short-Term Memory (Bi-LSTM) network for multi-step-ahead estimation of $u$ and $\Delta y$ (see [33]). The GB model formulation for the case study is summarized in Table 3, whereas the GB model parameters are presented in Table 4. The BB and GB models use a memory window length ($r$) and prediction window length ($s$) of $r = s = 200$ with sampling time $t_{sam} = 0.5$ s. These settings allow us to capture the short-term evolution of the LOCA transient of interest and anticipate the enforcement of symptom-based operating procedures, while complying with computational time constraints [33].
BB input/output features were extracted from time-series signals, obtained by simulating $N_d$ accident scenarios via Monte Carlo (MC)-based random failure-injection [41]. Each scenario $d \in \{1, \ldots, N_d\}$ includes a randomly sampled fault-injection time $t_f \le t_N$. For each scenario $d$, the variable matrix $X(t_{n-r+1}:t_n)$ and the input signal $u(t_{n+1}:t_{n+s})$ are collected using fixed parameters ($r$, $s$) over the time index $n = r, \ldots, N-s$. At each timestep $t_n$, the HF model $g(\cdot)$ and the WB model $f_{WB}$ are evaluated to compute the BB model targets $Y_{BB}(t_n:t_{n+s})$ (see Section 3, Figure 2). The parameters needed for MC failure-injection [41,43] are listed in Table 5.

5. Results

5.1. Feature Extraction with Hilbert–Huang Transform

Signals are first preprocessed to identify the non-decomposable input features, prior to feature extraction. As shown in Figure 5, input signals such as the inlet coolant temperature ($T_{\kappa,in}$) and outlet fuel/coolant temperature ($T_{f,12}^e$, $T_{\kappa,12}^e$) do not show oscillatory content, revealing monotonically increasing/decreasing values. These input features become non-decomposable signals and therefore the HHT is not applicable, whereas output feature signals ($\dot{m}_{leak}$, $\Delta T_p$) reveal oscillatory content that is extracted via HHT (see Figure 6).
The generated dataset is processed by applying the Hilbert–Huang Transform (HHT) to extract the features ($f_i$) and ($f_j$) from input signals $x$ and output signals $y_{BB}$, respectively. The list of extracted features for non-stationary signals is described in Table 6.
For non-decomposable input signals ($x_m$), the features extracted are the slope coefficient of linear regression ($\hat{\beta}$), mean ($\hat{r}_1$) and standard deviation ($\hat{r}_2$) (see Table 7).
For the BB input signals $x_m$, the input signal $m = 1$ (coolant leakage rate) was considered for feature extraction via HHT. A total of $I_{m=1} = 4$ IMFs were extracted. The selection of IMFs is based on their relative energy contribution (i.e., highest mean values in power distribution, see Figure 7b), and its adequacy is verified through the subsequent sensitivity analysis; low-frequency information is preserved via residual-based features, which are also included in the SA. Following this criterion, IMFs 1 and 2 were selected as they carry more information in terms of signal power for both short- and long-term trends (see Figure 7).
For the rest of the BB model inputs ($T_{f,12}^e$, $T_{\kappa,12}^e$, $T_{\kappa,in}$, where $m = 2, 3, 4$, correspondingly), features such as the slope coefficient of linear regression ($\hat{\beta}_m$), mean ($\hat{x}_{m,1}$) and standard deviation ($\hat{x}_{m,2}$) of the time series were extracted (see Table 8). As stated, a total of $V = 18$ features were extracted.
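For signals without oscillatory content, the three trend features (regression slope, mean and standard deviation) together with a simple screening for local extrema can be sketched as follows. The function names and the toy temperature values are assumptions; in particular, counting strict interior extrema is only one possible way to flag a signal as non-decomposable and is not necessarily the preprocessing rule used by the authors.

```python
from typing import Sequence, Tuple


def count_local_extrema(x: Sequence[float]) -> int:
    """Number of strict interior local maxima and minima of the series."""
    return sum(
        1 for a, b, c in zip(x, x[1:], x[2:])
        if (b > a and b > c) or (b < a and b < c)
    )


def trend_features(x: Sequence[float], dt: float) -> Tuple[float, float, float]:
    """Return (least-squares slope per second, mean, population standard deviation)."""
    n = len(x)
    t = [k * dt for k in range(n)]
    t_mean, x_mean = sum(t) / n, sum(x) / n
    slope = (sum((tk - t_mean) * (xk - x_mean) for tk, xk in zip(t, x))
             / sum((tk - t_mean) ** 2 for tk in t))
    std = (sum((xk - x_mean) ** 2 for xk in x) / n) ** 0.5
    return slope, x_mean, std


# Toy monotonic "temperature" window sampled every 0.5 s (values are made up).
temperature = [900.0, 901.0, 902.0, 903.0, 904.0]
n_extrema = count_local_extrema(temperature)
slope, mean, std = trend_features(temperature, dt=0.5)
```

A monotonic window yields zero extrema, so it would be routed to the trend features rather than to the EMD.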
For the BB output signals, the outputs $j \in \{1,2\}$ were considered for feature extraction via HHT. For $j = 1$ ($\dot{m}_{leak}$), a total of $L_{j=1} = 4$ IMFs were extracted during the sifting process (see Figure 8), where IMFs 1 and 2 were selected as they contain more information in terms of signal power for both short- and long-term trends (see Figure 7). Following the same procedure for $j = 2$ ($T_p$), a total of $L_{j=2} = 3$ IMFs were extracted, where IMFs 2 and 3 were selected during the feature extraction process (see Figure 9). The extracted features are summarized in Table 9. As stated, a total of $W = 18$ features were extracted.

5.2. Sensitivity Analysis with Kucherenko Indices

The dataset of extracted features ($V$ and $W$) is used to calculate the first-order Kucherenko indices. A matrix of first-order Kucherenko indices $K_{p,q}$ is shown in Figure 10. The obtained indices are postprocessed to select a subset of measured quantities $x$ as BB model inputs ($X_{BB}$), given the computational time constraints for online risk monitoring. The SA based on the first-order Kucherenko indices highlights the importance of selecting a subset of input features for the BB model, given the following findings:
  • The features extracted from the residual of leakage rate $\dot{m}_{leak}$ achieve high sensitivity indices when predicting autoregressive long-term trends.
  • Among the extracted features from the temperature signals, features from the outlet coolant temperature $T_{\kappa,12}^e$ and the inlet coolant temperature $T_{\kappa,in}$ present higher sensitivity levels when predicting the residuals of both BB outputs (i.e., denoised long-term trends of coolant leakage rate $\dot{m}_{leak}$ and the PCT estimation error $\Delta T_p$ during the LOCA scenario).
  • The extracted features of the leakage rate do not substantially contribute to explaining the variability in the PCT estimation error $\Delta T_p$, which can be attributed to the nonlinear effects of hydraulic phenomena and the thermohydraulic equations of the HF model.

5.3. Selection of BB Model Inputs

Each measurable input $x_m$ corresponds to a group of features $V^{(m)} \subseteq V$, whereas each BB output $j \in \{1,2\}$ (leakage rate and $\Delta T_p$) corresponds to an output-feature group $W^{(j)} \subseteq W$. Following the strategy described in Section 3.3, the grouped sensitivity index for each measurable input is computed as stated in Equations (22)–(24). Table 10 and Table 11 show the maximum Kucherenko indices grouped by output $j$ ($b_{m,q}^j$), from which the average score per input is computed by means of Equation (23) (see Table 12).
As stated, input signal $m = 1$ (coolant mass flow leakage rate, $\dot{m}_{leak}$) is the best at explaining the overall variability of output $j = 1$ ($\dot{m}_{leak}$), whereas signal $m = 4$ (inlet coolant temperature, $T_{\kappa,in}$) is selected to explain the variability of $j = 2$ (PCT estimation error, $\Delta T_p$).
The grouped Kucherenko indices $K^j$ are then computed by means of Equation (24), obtaining $K^j = \{K_1^1 = 0.3425,\ K_4^2 = 0.2516\}$. Based on the grouped ranking, the selected measurable inputs are:
$$x^* = \{\dot{m}_{leak}, T_{\kappa,in}\}$$
As stated, these two signals form the selected optimal subset of BB inputs for the deployment of the GB model to be used for the monitoring of the SMDFR.
A BB model (Bidirectional Long Short-Term Memory, Bi-LSTM) [33] is trained for multi-step-ahead estimation, and embedded into the GB model. Figure 11 shows the performance (in terms of RMSE) of the GB model, when different sets of BB input variables ($x$) are considered, including all possible pairs of input variables and all input variables $x = \{\dot{m}_{leak}, T_{\kappa,12}^e, T_{f,12}^e, T_{\kappa,in}\}$. It can be seen that the set $x^* = \{\dot{m}_{leak}, T_{\kappa,in}\}$ identified by the proposed method is a minimal yet informative set of inputs, i.e., it provides similar results to the GB model that considers all the measurable variables as input features ($x = \{\dot{m}_{leak}, T_{\kappa,12}^e, T_{f,12}^e, T_{\kappa,in}\}$).
The GB model that uses $x^* = \{\dot{m}_{leak}, T_{\kappa,in}\}$ is then tested against the GB models that use $x = \{\dot{m}_{leak}, T_{\kappa,12}^e, T_{f,12}^e, T_{\kappa,in}\}$ and $x = \{\dot{m}_{leak}, T_{\kappa,12}^e\}$ in terms of computational time. The GB models are used to risk-monitor 20 random-trial LOCA scenarios (see [33] for further details): the results are summarized in Figure 12. The GB model that uses $x^* = \{\dot{m}_{leak}, T_{\kappa,in}\}$ complies with the computational constraints imposed by regulatory requirements (i.e., a total computing time of 120 s, with no exceptions [1]), whereas the GB model that uses $x = \{\dot{m}_{leak}, T_{\kappa,12}^e, T_{f,12}^e, T_{\kappa,in}\}$ sometimes exceeds the constraint due to increased complexity (see the outliers). The GB model candidate that uses $x = \{\dot{m}_{leak}, T_{\kappa,12}^e\}$ satisfies the computational constraint; however, the proposed model with $x^* = \{\dot{m}_{leak}, T_{\kappa,in}\}$ proves to be a better candidate in terms of accuracy and computational time.
The explainability of the GB models is then also analyzed with SHAP, as proposed in [33], to quantify the contribution of each model input feature to its output at each query time $t_q$ [44,45] of a simulated accident scenario (see Table 13). Figure 13 shows the evolution of the SHAP values for the BB output features ($\dot{m}_{leak}$ and $T_p$) predicted by the Bi-LSTM that uses $x^* = \{\dot{m}_{leak}, T_{\kappa,in}\}$ within the GB model at $t_q = 1100, \ldots, 1300$ s. The results are summarized as follows:
  • $\dot{m}_{leak}$ is shown in Figure 13 (top) to depend mostly on $\dot{m}_{leak}$ since $T_{\kappa,in}$ does not affect the severity of the leakage rate, which is instead primarily determined by the rupture break size: initially, the rupture causes a leakage (e.g., −10 kg/s) (negative SHAP values); as the LOCA progresses and the system depressurizes, leakage reduces (e.g., −4 kg/s) (positive SHAP values). This confirms that hydraulic phenomena dominate this output, where the model correctly prioritizes historical leakage data to forecast future leakage rates.
  • $T_p$ is shown in Figure 13 (bottom) to depend on $T_{\kappa,in}$: during the early LOCA stage, less coolant inventory means less heat removal, raising the PCT and lowering $T_{\kappa,in}$. The WB model overestimates this PCT rise, requiring a lower PCT correction ($\Delta T_p$). The BB model uses $T_{\kappa,in}$ to estimate the coolant inventory state: when $T_{\kappa,in}$ is lower (less coolant inventory, e.g., at time $t = 1100$ s), the model decreases the PCT error correction (lower SHAP values) due to the overestimation by the WB model; when $T_{\kappa,in}$ is higher (more coolant inventory, e.g., at $t = 1000$ s), the model increases the PCT correction (higher SHAP values) to compensate for the error.
The SHAP-based analysis confirms that the BB model predictions are consistent with the accident dynamics and shows that the GB model that uses $x^* = \{\dot{m}_{leak}, T_{\kappa,in}\}$ is explainable for risk monitoring applications. The GB model candidate $x = \{\dot{m}_{leak}, T_{\kappa,12}^e\}$ has also proven to be sufficiently explainable, as shown in [33].
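The sign convention used when reading SHAP values (a negative value pushes the prediction below a reference, a positive one above) follows from the Shapley definition, which for a two-input model can be evaluated exactly. The sketch below is a hand-rolled illustration under stated assumptions: absent inputs are replaced by a single baseline point, the surrogate `toy_correction` and all numbers are made up, and this is neither the SHAP library's estimator nor the procedure of [33].

```python
from typing import Callable, Tuple


def shapley_two_inputs(
    model: Callable[[float, float], float],
    x: Tuple[float, float],
    baseline: Tuple[float, float],
) -> Tuple[float, float]:
    """Exact Shapley values for a two-input model, with absent inputs set to a baseline."""
    f00 = model(baseline[0], baseline[1])   # both inputs at baseline
    f10 = model(x[0], baseline[1])          # only input 1 revealed
    f01 = model(baseline[0], x[1])          # only input 2 revealed
    f11 = model(x[0], x[1])                 # both inputs revealed
    phi_1 = 0.5 * ((f10 - f00) + (f11 - f01))
    phi_2 = 0.5 * ((f01 - f00) + (f11 - f10))
    return phi_1, phi_2


def toy_correction(m_leak: float, t_in: float) -> float:
    """Made-up surrogate of a BB output, used only to exercise the function."""
    return 0.2 * m_leak + 0.05 * t_in + 0.001 * m_leak * t_in


query = (-10.0, 950.0)       # hypothetical (leak rate, inlet temperature) at a query time
reference = (-4.0, 1000.0)   # hypothetical baseline point
phi_leak, phi_tin = shapley_two_inputs(toy_correction, query, reference)
total_shift = toy_correction(*query) - toy_correction(*reference)
```

By construction the two contributions add up to the difference between the prediction at the query point and at the baseline, which is the additivity property that makes per-input attributions comparable across query times.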
For comparison purposes, Figure 14 shows the SHAP values of ( m ˙ l e a k and T p ) where the BB of the GB model is fed with all measurable variables x = m ˙ l e a k , T κ , 12 e , T f , 12 e , T κ , i n as input features. Results are summarized as follows:
  • $\dot{m}_{leak}$ (Figure 14, top) depends mostly on its own past values, confirming that hydraulic phenomena dominate this output, as previously shown in Figure 13. However, the inlet coolant temperature ($T_{\kappa,in}$) and the outlet fuel temperature ($T_{f,12}^{e}$) show a SHAP contribution during the early LOCA stages (e.g., $t_q = 1000$ s) that cannot be explained in terms of physical phenomena, because the coolant leakage rate ($\dot{m}_{leak}$) is primarily injected by means of MC-based random sampling.
  • $T_p$ (Figure 14, bottom) depends mostly on $T_{\kappa,12}^{e}$, which behaves similarly to the inlet coolant temperature $T_{\kappa,in}$ in Figure 13. This is expected, as the outlet coolant temperature can be used as a proxy of the coolant inventory. However, the influence of the inlet coolant temperature ($T_{\kappa,in}$) and of the outlet fuel temperature ($T_{f,12}^{e}$) is not clear in terms of SHAP values during the late LOCA stages (e.g., $t_q = 1200$ s and $1250$ s).
For this reason, the explanations provided by the GB model with all measurable variables as input features are less consistent, particularly when correlated variables such as the outlet fuel temperature ($T_{f,12}^{e}$) are added. Also, as stated, including more input parameters improves the overall accuracy of the GB model, but at the expense of longer computing times.
Table 14 summarizes the insights of the comparison: the GB model that uses $x^* = \{\dot{m}_{leak}, T_{\kappa,in}\}$ offers a suitable tradeoff between accuracy, explainability and computational time, which makes it a well-suited candidate for monitoring the PCT of the SMDFR. As stated previously, selecting the proposed GB inputs can be critical to satisfying regulatory requirements (e.g., see Figure 12). The physical consistency of the selected inputs with the underlying LOCA dynamics suggests that the results are not specific to the adopted WB model, but rather reflect the dominant thermohydraulic phenomena, supporting the robustness of the selection process.
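The two quantitative criteria quoted in Table 14 can be checked mechanically. The sketch below assumes that both thresholds are upper bounds (RMSE not above 1.89 K, computing time per query not above 120 s); the function names and the sample numbers are hypothetical and are not results of this work.

```python
import math

RMSE_LIMIT_K = 1.89   # accuracy tolerance quoted in Table 14 [K]
TIME_LIMIT_S = 120.0  # two-minute response time quoted in Table 14 [s]


def rmse(predicted, reference):
    """Root-mean-square error between two equally long sequences."""
    if not predicted or len(predicted) != len(reference):
        raise ValueError("sequences must be non-empty and of equal length")
    squared = sum((p - r) ** 2 for p, r in zip(predicted, reference))
    return math.sqrt(squared / len(predicted))


def meets_requirements(predicted, reference, query_time_s):
    """True if both the accuracy and the response-time criteria hold.
    Treating both limits as inclusive upper bounds is an assumption."""
    accurate = rmse(predicted, reference) <= RMSE_LIMIT_K
    timely = query_time_s <= TIME_LIMIT_S
    return accurate and timely


# Hypothetical PCT values [K] and computing time [s], for illustration only.
pct_reference = [1200.0, 1205.0, 1210.0]
pct_gb = [1201.0, 1204.0, 1211.5]
ok = meets_requirements(pct_gb, pct_reference, query_time_s=3.2)
```

Explainability, the third criterion in Table 14, is qualitative and is not covered by this check.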

6. Conclusions

This work presents a methodology for selecting the inputs to a GB model to be embedded in a risk monitoring framework for SMRs. The methodology integrates the HHT for adaptive feature extraction from nonlinear, transient input time series with first-order Kucherenko sensitivity indices for global SA. Tested on the SMDFR, the approach supports the selection of a subset of measurable inputs from monitoring data for the estimation of the non-measurable safety-critical parameter of interest, the PCT. By feeding the selected inputs into the GB model, computationally efficient and accurate multi-step-ahead predictions are obtained under both nominal operating conditions and simulated accident scenarios. The proposed method and input selection scheme are agnostic to specific GB model architectures and accident types, which makes them useful for scalable deployment in risk monitoring applications, where physical interpretability and real-time computational demands are key constraints. For Digital Twin applications, the feature extraction and sensitivity analysis can be performed offline and updated only when significant changes in operating conditions are detected, ensuring a balance between computational cost and sustained model performance.
Future work will include the validation of the proposed methodology using real datasets from laboratory experiments, whose signals typically contain noise and systematic errors. Open challenges include the scalability and computational cost of Kucherenko index estimation and HHT feature extraction for high-dimensional, high-frequency sensor data in SMRs, and the extension to online input selection under different operating conditions. As stated, the identified input subset is scenario-dependent; however, the proposed methodology can be extended to enable adaptive input selection across different transient conditions. Finally, to apply the methodology to other SMR designs, the WB and BB models should be tailored to their specific features, for example the measurable inputs available for performing SA.

Author Contributions

L.M.: Conceptualization, Formal analysis, Investigation, Methodology, Visualization, Writing—original draft. I.A.: Conceptualization, Methodology, Supervision, Writing—review and editing. F.D.M.: Conceptualization, Methodology, Supervision, Writing—review and editing. E.Z.: Conceptualization, Methodology, Supervision, Writing—review and editing, Project administration. All authors have read and agreed to the published version of the manuscript.

Funding

Marie Skłodowska-Curie Actions. Horizon 2020-European Training Network on Grey-Box Models for Safe and Reliable Intelligent Mobility Systems: Grant agreement No. 955393.

Data Availability Statement

Data can be made available upon request.

Conflicts of Interest

The authors declare they have no known competing financial interests or personal relationships that could have influenced this work.

Nomenclature

Abbreviations
| Abbreviation | Definition |
|---|---|
| BB | Black-Box model |
| Bi-LSTM | Bidirectional Long Short-Term Memory |
| CDF | Cumulative Distribution Function |
| EMD | Empirical Mode Decomposition |
| GB | Gray-Box model |
| HF | High-fidelity model |
| HHT | Hilbert–Huang Transform |
| IA | Instantaneous Amplitude |
| IF | Instantaneous Frequency |
| IMF | Intrinsic Mode Function |
| LOCA | Loss-Of-Coolant Accident |
| LASSO | Least Absolute Shrinkage and Selection Operator |
| MC | Monte Carlo |
| MHD | Magnetohydrodynamic Pump |
| ML | Machine Learning |
| NPP | Nuclear Power Plant |
| PCT | Peak Cladding Temperature |
| PLS | Partial Least Squares |
| PK | Point Kinetic equations |
| PS | Protection System |
| SA | Sensitivity Analysis |
| SMDFR | Small Modular Dual Fluid Reactor |
| SMR | Small Modular Reactor |
| TH | Thermo-Hydraulic model |
| WB | White-Box model |
Symbols

| Symbol | Description |
|---|---|
| $g$ | SMR state-space model |
| $x_m(t)$ | $m$-th measurable input signal |
| $y(t)$ | Model output of the SMR state-space model |
| $y_{WB}(t)$ | White-box model output |
| $y_{GB}(t)$ | Gray-box model output |
| $\Delta y(t)$ | Modeling error estimation |
| $\mathbf{y}_{BB}$ | Vector of BB model outputs $[\Delta y, u]$ |
| $u(t)$ | Accident-related input signal (coolant leakage rate) |
| $\theta$ | Vector of system design parameters |
| $t_n$ | Discrete time step |
| $\Delta t$ | Sampling time |
| $N$ | Total number of time samples |
| $r$ | Memory window length |
| $s$ | Prediction window length |
| $t_N$ | Mission time |
| $N_d$ | Number of simulated accident scenarios |
| $t_f$ | Failure (fault-injection) time |
| $\mathbf{X}(t_{n-r+1} : t_n)$ | Memory matrix of input signals |
| $\mathbf{Y}_{BB}(t_{n+1} : t_{n+s})$ | BB multi-step predicted outputs |
| $c_{m,i}(t)$ | $i$-th intrinsic mode function (IMF) of input $x_m$ |
| $c_{j,l}(t)$ | $l$-th IMF of output $y_{BB,j}$ |
| $r_m(t)$, $r_j(t)$ | Residual of input/output signal after Empirical Mode Decomposition |
| $I_m$ | Number of IMFs extracted from input $m$ |
| $L_j$ | Number of IMFs extracted from output $j$ |
| $a_{m,i}(t)$ | Instantaneous Amplitude (IA) of the $i$-th IMF of the $m$-th input |
| $a_{j,l}(t)$ | Instantaneous Amplitude of the $l$-th IMF of the $j$-th output |
| $\omega_{m,i}(t)$ | Instantaneous Frequency (IF) of the $i$-th IMF of the $m$-th input |
| $\omega_{j,l}(t)$ | Instantaneous Frequency of the $l$-th IMF of the $j$-th output |
| $\bar{a}_{m,i}$, $\bar{a}_{j,l}$ | Mean Instantaneous Amplitude |
| $\bar{\omega}_{m,i}$, $\bar{\omega}_{j,l}$ | Mean Instantaneous Frequency |
| $\bar{e}_{m,i}$, $\bar{e}_{j,l}$ | Mean energy of each IMF |
| $\hat{\beta}_m$, $\hat{\beta}_j$ | Slope coefficient of linear regression of the residual |
| $\hat{r}_{m,1}$, $\hat{r}_{m,2}$ | Mean and standard deviation of input residual |
| $\hat{r}_{j,1}$, $\hat{r}_{j,2}$ | Mean and standard deviation of output residual |
| $\mathbf{f}_m$ | Feature vector of the $m$-th input |
| $\mathbf{f}_j$ | Feature vector of the $j$-th BB model output |
| $\mathbf{V}$ | Matrix of input feature samples |
| $\mathbf{W}$ | Matrix of output feature samples |
| $\mathcal{V}$ | Set of all extracted input features, $v \in \mathcal{V}$ |
| $\mathcal{W}$ | Set of all extracted output features, $w \in \mathcal{W}$ |
| $v_{d,n}$ | Input feature sample at scenario $d$, time $n$ |
| $w_{d,n}$ | Output feature sample at scenario $d$, time $n$ |
| $K_{p,q}$ | First-order Kucherenko sensitivity index (features $p$ and $q$) |
| $\mathcal{V}_m$ | Subset of features corresponding to measurable input $x_m$ |
| $\mathcal{W}_j$ | Subset of features belonging to output $j$ |
| $b_{m,q}^{(j)} = \max_{p \in \mathcal{V}_m} K_{p,q}$ | Best feature of input $m$ for output feature $q$ |
| $K_m^{(j)}$ | Grouped sensitivity index for input $m$ |
| $K_j$ | Maximum $K_m^{(j)}$ value for each output $j$ |
| $x^*$ | Selected measurable input set for BB model |
| $\dot{m}_{leak}(t)$ | Coolant leakage mass flow rate |
| $k_{leak}$ | Leakage scale parameter |
| $A_{leak}$ | Leakage shape parameter |
| $B_{leak}$ | Leakage additional scale parameter |
| $\epsilon$ | Leakage disturbance term |
| $\dot{m}_{ACS}(t)$ | Auxiliary Cooling System mass flow |
| $\dot{m}_{\kappa,net}(t)$ | Net coolant mass flow (ACS minus leakage) |
| $g(\cdot)$ | High-fidelity model |
| $f_{WB}(\cdot)$ | White-box model |
| $f_{BB}(\cdot)$ | Black-box model |
| $\Delta y(t) = y(t) - y_{WB}(t)$ | Modeling error |
| $\mathbf{Y}_{BB} = [\Delta y, u]$ | BB model output vector |
| $T_{f,h}(t)$ | Fuel temperature at node $h$ |
| $T_{w,h}(t)$ | Wall temperature at node $h$ |
| $T_{\kappa,h}(t)$ | Coolant temperature at node $h$ |
| $\dot{m}_{\kappa,h}(t)$ | Coolant mass flow rate at node $h$ |
| $T_p(t) = \max_h T_{w,h}(t)$ | Peak Cladding Temperature (PCT) proxy |
| $T_{\kappa,in}$ | Inlet coolant temperature |
| $T_{\kappa,out}$ | Outlet coolant temperature |
| $T_{f,in}$ | Fuel inlet temperature |
| $T_{f,out}$ | Fuel outlet temperature |

References

  1. Nuclear Energy Agency. Risk Monitors: The State of The Art Report in their Development and Use at Nuclear Power Plants. OECD. 2004. Available online: https://www.oecd-nea.org/jcms/pl_18136/risk-monitors-the-state-of-the-art-report-soar-in-their-development-and-use-at-nuclear-power-plants-produced-on-behalf-of-the-iaea-and-the-nea-wgrisk?details=true (accessed on 6 June 2022).
  2. Coble, J.B.; Coles, G.A.; Ramuhalli, P.; Meyer, R.M.; Berglin, E.J.; Wootan, D.W.; Mitchell, M.R. Technical Needs for Enhancing Risk Monitors with Equipment Condition Assessment for Advanced Small Modular Reactors; U.S. Department of Energy: Richland, WA, USA, 2013.
  3. Williams, Q.J.; Stewart, R.H.; Palmer, T.S.; Palmer, C.J.; Pope, C.; Shields, A.; Ritter, C. Selection of Sampling and Surrogate Modeling Methods for State-Point Evaluations of an AGN-201M Reactor. Nucl. Sci. Eng. 2025, 200, S391–S405.
  4. Nuclear Energy Agency. Small Modular Reactors: Challenges and Opportunities. 2021. Available online: https://www.oecd-nea.org/upload/docs/application/pdf/2021-03/7560_smr_report.pdf (accessed on 29 April 2026).
  5. Fossum, K.L.; Bhowmik, P.K.; Sabharwall, P. Droplet Entrainment in Steam Supply System of Water-Cooled Small Modular Reactors: Experiment and Modeling Approaches. J. Nucl. Eng. 2024, 5, 563–583.
  6. Sarran, L.; Smith, K.M.; Hviid, C.A.; Rode, C. Grey-box modelling and virtual sensors enabling continuous commissioning of hydronic floor heating. Energy 2022, 261, 125282.
  7. Ahmad, I.; Ayub, A.; Kano, M.; Cheema, I.I. Gray-box soft sensors in process industry: Current practice, and future prospects in era of big data. Processes 2020, 8, 243.
  8. Hossain, R.; Ahmed, F.; Kobayashi, K.; Koric, S.; Abueidda, D.; Alam, S.B. Virtual sensing-enabled digital twin framework for real-time monitoring of nuclear systems leveraging deep neural operators. npj Mater. Degrad. 2025, 9, 21.
  9. Gong, L.; Peng, C.; Huang, Q. Deterministic Data Assimilation in Thermal-Hydraulic Analysis: Application to Natural Circulation Loops. J. Nucl. Eng. 2025, 6, 23.
  10. Tulleken, H.J.A.F. Grey-box modelling and identification using physical knowledge and bayesian techniques. Automatica 1993, 29, 285–308.
  11. Xue, Y.; Zhang, B.; Su, K.; Li, Y.; Zhu, H.; Pan, H. A preliminary study of digital twin for nuclear reactor dynamics: A synergy of machine learning and model predictive control. Eng. Appl. Artif. Intell. 2025, 153, 110940.
  12. Pintelas, E.; Livieris, I.E.; Pintelas, P. A Grey-Box Ensemble Model Exploiting Black-Box Accuracy and White-Box Intrinsic Interpretability. Algorithms 2020, 13, 17.
  13. Sahadath, M.H.; Cheng, Q.; Pan, S.; Ji, W. Characterization of DeepONet Performance for Neutron Transport Modeling. Nucl. Sci. Eng. 2026, 1–21.
  14. IAEA. Considerations for Deploying Artificial Intelligence Applications in the Nuclear Power Industry; Technical Report No. NR-T-2.16; IAEA: Vienna, Austria, 2025.
  15. Gu, W.; He, Y.; Wang, D. Critical flow break source term near the outlet of a slit. Prog. Nucl. Energy 2025, 188, 105878.
  16. Yaseen, M.; Wu, X. Quantification of Deep Neural Network Prediction Uncertainties for VVUQ of Machine Learning Models. Nucl. Sci. Eng. 2023, 197, 947–966.
  17. Yadav, V.; Agarwal, V.; Jain, P.; Ramuhalli, P.; Zhao, X.; Ulmer, C.; Carlson, J.; Eskins, D.; Iyengar, R. Technical Challenges and Gaps in Digital-Twin-Enabling Technologies for Nuclear Reactor Applications; U.S. Nuclear Regulatory Commission: Washington, DC, USA. Available online: https://www.nrc.gov/docs/ML2136/ML21361A261.pdf (accessed on 2 March 2025).
  18. Xiong, Q.; Du, P.; Deng, J.; Huang, D.; Song, G.; Qian, L.; Wu, Z.; Luo, Y. Global sensitivity analysis for nuclear reactor LBLOCA with time-dependent outputs. Reliab. Eng. Syst. Saf. 2022, 221, 108337.
  19. Alexanderian, A.; Gremaud, P.A.; Smith, R.C. Variance-based sensitivity analysis for time-dependent processes. Reliab. Eng. Syst. Saf. 2020, 196, 106722.
  20. Yu, H.; Chang, L.; Yang, M.; Chen, S.; Li, H.; Wang, J. Time series modeling and forecasting with feature decomposition and interaction for prognostics and health management in nuclear power plant. Energy 2025, 324, 135784.
  21. Nguyen, H.-P.; Baraldi, P.; Zio, E. Ensemble empirical mode decomposition and long short-term memory neural network for multi-step predictions of time series signals in nuclear power plants. Appl. Energy 2021, 283, 116346.
  22. Di Maio, F.; Pedroni, N.; Tóth, B.; Burgazzi, L.; Zio, E. Reliability Assessment of Passive Safety Systems for Nuclear Energy Applications: State-of-the-Art and Open Issues. Energies 2021, 14, 4688.
  23. Harter, J.R.; DeHart, M.D. Uncertainty quantification and sensitivity analysis of a nuclear thermal propulsion reactor startup sequence. Front. Nucl. Eng. 2025, 4, 1628866.
  24. Zio, E. Advancing nuclear safety. Front. Nucl. Eng. 2024, 2, 1346555.
  25. Kobayashi, K.; Kumar, D.; Bonney, M.; Chakraborty, S.; Paaren, K.; Usman, S.; Alam, S. Uncertainty Quantification and Sensitivity Analysis for Digital Twin Enabling Technology: Application for BISON Fuel Performance Code. In Handbook of Smart Energy Systems; Springer: Cham, Switzerland, 2023; pp. 1–13.
  26. Shi, W.; Machida, M.; Yamada, S.; Yoshida, T.; Hasegawa, Y.; Okamoto, K. Inverse estimation scheme of radioactive source distributions inside building rooms based on monitoring air dose rates using LASSO: Theory and demonstration. Prog. Nucl. Energy 2023, 162, 104792.
  27. Roche, A. Local optimization of black-box functions with high or infinite-dimensional inputs: Application to nuclear safety. Comput. Stat. 2018, 33, 467–485.
  28. Walker, C.; Ramuhalli, P.; Agarwal, V.; Lybeck, N.J.; Taylor, M. Development of Short-Term Forecasting Models Using Plant Asset Data and Feature Selection. Int. J. Progn. Health Manag. 2022, 13.
  29. Huang, N.E. Chapter 1: Introduction to the Hilbert-Huang Transform and its related mathematical problems. In Hilbert–Huang Transform and Its Applications, 2nd ed.; Huang, N., Shen, S., Eds.; World Scientific Publishing: Singapore, 2014; Volume 16, pp. 1–26.
  30. Huang, N.E.; Wu, Z. A review on Hilbert-Huang transform: Method and its applications to geophysical studies. Rev. Geophys. 2008, 46, RG2006.
  31. Kucherenko, S.; Tarantola, S.; Annoni, P. Estimation of global sensitivity indices for models with dependent variables. Comput. Phys. Commun. 2012, 183, 937–946.
  32. Sobol′, I.M. Global sensitivity indices for nonlinear mathematical models and their Monte Carlo estimates. Math. Comput. Simul. 2001, 55, 271–280.
  33. Miqueles, L.; Ahmed, I.; Di Maio, F.; Zio, E. Virtual sensing by Grey-Box modelling within an Importance Sampling Monte Carlo Dynamic Event Tree framework for risk monitoring of Small Modular Reactors. Reliab. Eng. Syst. Saf. 2026, 272, 112629.
  34. Chen, L.; Huang, H. Global sensitivity analysis for multivariate outputs using generalized RBF-PCE metamodel enhanced by variance-based sequential sampling. Appl. Math. Model. 2024, 126, 381–404.
  35. Huang, N.E.; Shen, Z.; Long, S.R.; Wu, M.C.; Shih, H.H.; Zheng, Q.; Yen, N.-C.; Tung, C.C.; Liu, H.H. The empirical mode decomposition and the Hilbert spectrum for nonlinear and non-stationary time series analysis. Proc. R. Soc. Lond. Ser. A 1998, 454, 903–995.
  36. Marelli, S.; Lamas, C.; Sudret, B. UQLab User Manual—Sensitivity Analysis. 2015. Available online: https://www.uqlab.com/sensitivity-user-manual (accessed on 20 February 2025).
  37. Plischke, E.; Borgonovo, E. Fighting the Curse of Sparsity: Probabilistic Sensitivity Measures From Cumulative Distribution Functions. Risk Anal. 2020, 40, 2639–2660.
  38. Benoumechiara, N.; Elie-Dit-Cosaque, K. Shapley effects for sensitivity analysis with dependent inputs: Bootstrap and kriging-based algorithms. ESAIM Proc. Surv. 2019, 65, 266–293.
  39. Lewitz, J.; Huke, A.; Ruprecht, G.; Weißbach, D.; Gottlieb, S.; Hussein, A.; Czerski, K. The Dual Fluid Reactor—An Innovative Fast Nuclear-Reactor Concept with High Efficiency and Total Burnup. Int. J. Nucl. Power 2020, 65, 145–154.
  40. Liu, C.; Luo, R.; Macián-Juan, R. A New Uncertainty-Based Control Scheme of the Small Modular Dual Fluid Reactor and Its Optimization. Energies 2021, 14, 6708.
  41. Miqueles, L.; Ahmed, I.; Di Maio, F.; Zio, E. Importance Sampling for Monte Carlo Dynamic Event Tree Analysis of Accident Scenarios in New-Generation Nuclear Power Plants. Nucl. Sci. Eng. 2025, 200, 1296–1322.
  42. Park, J.H.; An, Y.J.; Yoo, K.H.; Na, M.G. Leak flow prediction during loss of coolant accidents using deep fuzzy neural networks. Nucl. Eng. Technol. 2021, 53, 2547–2555.
  43. Wang, W.; Cammi, A.; Di Maio, F.; Lorenzi, S.; Zio, E. A Monte Carlo-based exploration framework for identifying components vulnerable to cyber threats in nuclear power plants. Reliab. Eng. Syst. Saf. 2018, 175, 24–37.
  44. Neubauer, A.; Brandt, S.; Kriegel, M. Explainable multi-step heating load forecasting: Using SHAP values and temporal attention mechanisms for enhanced interpretability. Energy AI 2025, 20, 100480.
  45. Lundberg, S.M.; Lee, S.-I. A Unified Approach to Interpreting Model Predictions. In NIPS’17: Proceedings of the 31st International Conference on Neural Information Processing Systems; Curran Associates, Inc.: Red Hook, NY, USA, 2017; pp. 4768–4777. Available online: https://arxiv.org/pdf/1705.07874 (accessed on 29 April 2026).
  46. BS IEC 60737:2010; Nuclear Power Plants-Instrumentation Important to Safety-Temperature Sensors (In-Core and Primary Coolant Circuit)-Characteristics and Test Methods. International Electrotechnical Commission (IEC): Geneva, Switzerland, 2010.
  47. BS IEC 60751:2022; Industrial Platinum Resistance Thermometers and Platinum Temperature Sensors. International Electrotechnical Commission (IEC): Geneva, Switzerland; British Standards Institution: London, UK, 2022.
Figure 1. GB modeling for multi-step-ahead estimation of the safety-critical variable $y$.
Figure 2. Proposed methodology.
Figure 3. Data collection for feature extraction.
Figure 4. Schematic overview of the relationships among the variables and the symbols (Section 3).
Figure 5. Examples of input signals.
Figure 6. Example of output signals.
Figure 7. (a) Instantaneous frequency (left) and instantaneous frequency distribution weighted by IA (right), GB model input ($\dot{m}_{leak}$), for IMFs 1–4 unweighted and weighted by IA. (b) Power distribution of IMFs extracted for GB model input ($\dot{m}_{leak}$).
Figure 8. IMFs and residuals extracted for output signals, given a random sample. (Left): Coolant mass flow leakage rate ($\dot{m}_{leak}$, with $j = 1$). (Right): Error estimation of PCT ($\Delta T_p$, with $j = 2$).
Figure 9. (a) Instantaneous frequency (left) and instantaneous frequency distribution weighted by IA (right), GB model output ($\Delta T_p$), for IMFs 1–4 unweighted and weighted by IA. (b) Power distribution of IMFs extracted for GB model output ($\Delta T_p$).
Figure 10. First-order Kucherenko index matrix.
Figure 11. Comparison of instantiated GB models under different BB input features.
Figure 12. Boxplot (computing time) of proposed GB model against selected GB models.
Figure 13. SHAP values during the accident scenario evolution, proposed GB model. (Top): predicted mass flow leakage rate ($\dot{m}_{leak}$). (Bottom): predicted PCT correction ($\Delta T_p$).
Figure 14. SHAP values during the accident scenario evolution, GB model with 4 input features. (Top): predicted mass flow leakage rate ($\dot{m}_{leak}$). (Bottom): predicted PCT correction ($\Delta T_p$).
Table 1. List of input variables and system design parameters.

| Symbol | Notation | Description |
|---|---|---|
| Input variable ($u(t)$) | $\dot{m}_{leak}(t)$ | Coolant mass flow leakage rate |
| Vector of system design parameters ($\theta$) | $\dot{m}_{\kappa,nom}^{h}$ | Nominal coolant mass flow rate in the $h$-th node |
| | $M_{\kappa,nom}^{h}$ | Nominal mass contained in the $h$-th node |
| | $M_{\kappa,nom}$ | Nominal total mass of coolant |
| | $t_f$ | Failure time |
| | $k_{leak}$ | Scale parameter |
| | $A_{leak}$ | Shape parameter |
| | $B_{leak}$ | Scale parameter |
| | $\epsilon$ | Signal disturbance |
Table 2. List of measurable variables ($x$) and system output ($y$).

| Notation | Notation (HF Model) | Name |
|---|---|---|
| $u(t)$ | $\dot{m}_{leak}$ | Mass flow leakage rate |
| $x(t)$ | $T_{\kappa,12}^{e}$ | Outlet coolant temperature |
| | $T_{f,12}^{e}$ | Outlet fuel temperature |
| | $T_{\kappa,in}$ | Inlet coolant temperature |
| $y(t)$ | $\max_h T_w^h$ | Temperature of piping wall (PCT proxy) |
Table 3. GB model configuration, SMDFR case study.

| Notation | Variable | Description |
|---|---|---|
| $y_{WB}$ | $T_{p,WB}$ | Vector of WB-based estimates of $\max_h T_w^h$ (proxy of PCT) |
| $\Delta y$ | $\Delta T_p$ | Vector of WB estimated errors of $\max_h T_w^h$ (proxy of PCT) |
| $y_{GB}$ | $T_{p,GB}$ | Vector of GB-based estimates of $\max_h T_w^h$ (proxy of PCT) |
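The relation between the three quantities in Table 3 can be sketched as follows, assuming the additive correction implied by the modeling error definition in the Nomenclature ($\Delta y = y - y_{WB}$), so that the GB estimate is the WB estimate plus the error predicted by the BB model. The function name and the numbers are hypothetical and serve only as an illustration.

```python
def gb_estimate(y_wb, delta_y_hat):
    """Gray-box estimate as WB output plus BB-predicted modeling error:
    y_GB(t) = y_WB(t) + delta_y_hat(t). The additive form is an assumption
    derived from the definition delta_y = y - y_WB."""
    if len(y_wb) != len(delta_y_hat):
        raise ValueError("WB output and correction must have equal length")
    return [wb + dy for wb, dy in zip(y_wb, delta_y_hat)]


# Hypothetical PCT values [K]: the WB overestimates, so corrections are negative.
t_p_wb = [1210.0, 1225.0, 1240.0]
delta_t_p_hat = [-3.0, -6.5, -9.0]
t_p_gb = gb_estimate(t_p_wb, delta_t_p_hat)
```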
Table 4. GB and BB model parameters, SMDFR case study.

| Notation | Description | Value |
|---|---|---|
| $r$ | Memory data length | $r = 200$ |
| $s$ | Prediction window length | $s = 200$ |
| $M$ | Number of BB model inputs | $M = 4$ |
| $J$ | Number of BB model outputs | $J = 2$ |
| $\Delta y$ | BB error correction term | $\Delta y = \Delta T_p$ |
| $u$ | BB output signal | $u = \dot{m}_{leak}$ |
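The memory length $r$ and the prediction window length $s$ in Table 4 define how the time series are cut into training pairs for the BB model. A minimal sketch of this windowing is given below; the function name `make_windows` and the list-of-rows data layout are assumptions for illustration, not the implementation used in this work.

```python
def make_windows(inputs, outputs, r, s):
    """Build (memory, prediction) pairs from two aligned time series.

    inputs  : list of per-time-step input rows  (length N, M values each)
    outputs : list of per-time-step output rows (length N, J values each)
    For each admissible time index n (0-based), the memory block is
    inputs[n-r+1 .. n] and the target block is outputs[n+1 .. n+s].
    """
    if len(inputs) != len(outputs):
        raise ValueError("inputs and outputs must cover the same time steps")
    pairs = []
    for n in range(r - 1, len(inputs) - s):
        memory = inputs[n - r + 1 : n + 1]
        target = outputs[n + 1 : n + s + 1]
        pairs.append((memory, target))
    return pairs


# Tiny illustrative series: 10 time steps, one input and one output channel.
series_in = [[float(k)] for k in range(10)]
series_out = [[float(-k)] for k in range(10)]
pairs = make_windows(series_in, series_out, r=3, s=2)
```

Under this indexing a series of $N$ samples yields $N - r - s + 1$ pairs. With $\Delta t = 0.5$ s and $t_N = 1500$ s (Table 5), a scenario has $N = 3000$ samples, so $r = s = 200$ would give 2601 pairs per scenario.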
Table 5. Parameters for accident scenario simulation.

| Notation | Description | Value [Units] |
|---|---|---|
| $N_d$ | Number of simulated accident scenarios $d$ | $N_d = 150$ |
| $\Delta t$ | Fixed sampling time | $\Delta t = 0.5$ [s] |
| $t_n$ | Fixed observed time | $t_n \in \{\Delta t, 2\Delta t, \ldots, t_N\}$ [s] |
| $n$ | Fixed time index | $n \in \{1, \ldots, N\}$ [-] |
| $t_N$ | Fixed mission time, for each scenario $d$ | $t_N = 1500$ [s] |
| $t_f$ | Failure time, sampled for each scenario $d$ | $t_f \sim \mathrm{Uniform}(500, t_N)$ [s] |
| $k_{leak}$ | Scale parameter, sampled for each scenario $d$ | $k_{leak} \sim \mathrm{Uniform}(8, 10)$ [kg/s] |
| $A_{leak}$ | Shape parameter, sampled for each scenario $d$ | $A_{leak} \sim \mathrm{Uniform}(9, 15)$ [-] |
| $B_{leak}$ | Scale parameter, sampled for each scenario $d$ | $B_{leak} \sim \mathrm{Uniform}(0.2, 0.4)$ [s$^{-1}$] |
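The scenario parameters of Table 5 can be drawn with a few lines of code. The sketch below uses plain Monte Carlo sampling from the stated uniform ranges; the function name, the dictionary layout, and the fixed seed are assumptions for illustration, and the leakage signal model that consumes these parameters is not reproduced here.

```python
import random

T_N = 1500.0  # mission time [s], Table 5
N_D = 150     # number of simulated accident scenarios, Table 5


def sample_scenarios(n_scenarios=N_D, seed=0):
    """Draw one parameter set per accident scenario from the uniform
    ranges listed in Table 5 (plain Monte Carlo, fixed seed for repeatability)."""
    rng = random.Random(seed)
    scenarios = []
    for _ in range(n_scenarios):
        scenarios.append({
            "t_f": rng.uniform(500.0, T_N),    # failure time [s]
            "k_leak": rng.uniform(8.0, 10.0),  # scale parameter [kg/s]
            "A_leak": rng.uniform(9.0, 15.0),  # shape parameter [-]
            "B_leak": rng.uniform(0.2, 0.4),   # scale parameter [1/s]
        })
    return scenarios


scenarios = sample_scenarios()
```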
Table 6. Extracted features from non-stationary signals.

| Feature | Expression ($x_m \in \mathbf{x}$) | Expression ($y_{BB,j} \in \mathbf{y}_{BB}$) |
|---|---|---|
| Mean of Instantaneous Amplitude for each IMF $i$ (input) or $l$ (output) | $\bar{a}_{m,i} = \frac{1}{r} \sum_{k=n-r+1}^{n} a_{m,i}(t_k)$ | $\bar{a}_{j,l} = \frac{1}{s} \sum_{k=n+1}^{n+s} a_{j,l}(t_k)$ |
| Mean of Instantaneous Frequency for each IMF $i$ (input) or $l$ (output) | $\bar{\omega}_{m,i} = \frac{1}{r} \sum_{k=n-r+1}^{n} \omega_{m,i}(t_k)$ | $\bar{\omega}_{j,l} = \frac{1}{s} \sum_{k=n+1}^{n+s} \omega_{j,l}(t_k)$ |
| Mean Energy for each IMF $i$ (input) or $l$ (output) | $\bar{e}_{m,i} = \frac{1}{r} \sum_{k=n-r+1}^{n} a_{m,i}^{2}(t_k)$ | $\bar{e}_{j,l} = \frac{1}{s} \sum_{k=n+1}^{n+s} a_{j,l}^{2}(t_k)$ |
| Slope coefficient ($\hat{\beta}_{r,m}$, $\hat{\beta}_{r,j}$) of linear regression, mean ($\hat{r}_{m,1}$, $\hat{r}_{j,1}$) and standard deviation ($\hat{r}_{m,2}$, $\hat{r}_{j,2}$) of the residual ($r_m$, $r_j$) | $r_m(t) = \hat{\beta}_{r,m} t + \gamma$, $t \in \{t_{n-r+1}, \ldots, t_n\}$; $\hat{r}_{m,1} = \frac{1}{r} \sum_{k=n-r+1}^{n} r_m(t_k)$; $\hat{r}_{m,2} = \sqrt{\frac{1}{r} \sum_{k=n-r+1}^{n} \left( r_m(t_k) - \hat{r}_{m,1} \right)^2}$ | $r_j(t) = \hat{\beta}_{r,j} t + \gamma$, $t \in \{t_{n+1}, \ldots, t_{n+s}\}$; $\hat{r}_{j,1} = \frac{1}{s} \sum_{k=n+1}^{n+s} r_j(t_k)$; $\hat{r}_{j,2} = \sqrt{\frac{1}{s} \sum_{k=n+1}^{n+s} \left( r_j(t_k) - \hat{r}_{j,1} \right)^2}$ |
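The three IMF features of Table 6 are window averages of the instantaneous amplitude, the instantaneous frequency, and the squared amplitude. The sketch below assumes that the IA and IF samples of one IMF have already been obtained from the HHT (that step is not shown); the function name, the dictionary keys, and the sample values are hypothetical.

```python
def imf_window_features(amplitude, frequency):
    """Mean IA, mean IF and mean energy of one IMF over one window.

    amplitude, frequency : equally long sequences holding the instantaneous
    amplitude and instantaneous frequency samples inside the window.
    """
    n = len(amplitude)
    if n == 0 or n != len(frequency):
        raise ValueError("need non-empty sequences of equal length")
    return {
        "mean_amp": sum(amplitude) / n,
        "mean_freq": sum(frequency) / n,
        "mean_energy": sum(a * a for a in amplitude) / n,
    }


# Hypothetical IA and IF samples for a four-sample window.
features = imf_window_features(
    amplitude=[1.0, 2.0, 3.0, 2.0],
    frequency=[0.5, 0.5, 0.25, 0.25],
)
```

The same function applies to input windows (length $r$) and output windows (length $s$), since only the window length differs between the two columns of Table 6.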
Table 7. Extracted features from linear signals $x_m$.

| Feature | Expression |
|---|---|
| Slope coefficient ($\hat{\beta}_m$) of linear regression of the $m$-th signal | $x_m(t) = \hat{\beta}_m t + \gamma$, $t \in \{t_{n-r+1}, \ldots, t_n\}$ |
| Mean ($\hat{x}_{m,1}$) of the $m$-th input signal $x_m$ | $\hat{x}_{m,1}(t_n) = \frac{1}{r} \sum_{k=n-r+1}^{n} x_m(t_k)$ |
| Standard deviation ($\hat{x}_{m,2}$) of the $m$-th input signal $x_m$ | $\hat{x}_{m,2}(t_n) = \sqrt{\frac{1}{r} \sum_{k=n-r+1}^{n} \left( x_m(t_k) - \hat{x}_{m,1} \right)^2}$ |
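For signals treated as linear within a window, Table 7 uses three features: the least-squares slope, the mean, and the standard deviation. A minimal sketch is given below, assuming an ordinary least-squares fit and a standard deviation normalized by the window length; the function name and the sample temperatures are hypothetical.

```python
import math


def linear_window_features(t, x):
    """Least-squares slope, mean and population standard deviation of a
    signal x sampled at times t over one memory window."""
    n = len(x)
    if n < 2 or n != len(t):
        raise ValueError("need at least two samples and matching lengths")
    t_mean = sum(t) / n
    x_mean = sum(x) / n
    s_tt = sum((ti - t_mean) ** 2 for ti in t)
    if s_tt == 0.0:
        raise ValueError("time stamps must not all be identical")
    s_tx = sum((ti - t_mean) * (xi - x_mean) for ti, xi in zip(t, x))
    variance = sum((xi - x_mean) ** 2 for xi in x) / n
    return {"slope": s_tx / s_tt, "mean": x_mean, "std": math.sqrt(variance)}


# Hypothetical coolant temperature samples [K] at a 0.5 s sampling time.
feat = linear_window_features(
    t=[0.0, 0.5, 1.0, 1.5],
    x=[900.0, 899.0, 898.0, 897.0],
)
```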
Table 8. Extracted features of BB model inputs ($\mathbf{f}_m$).

| Notation | Type of Signal of BB Model | Variable | Extracted Features | Feature Code |
|---|---|---|---|---|
| $u$ | Input, $m = 1$ | Mass flow coolant leakage rate ($\dot{m}_{leak}$). Feature vector: $\mathbf{f}_1$ | IMF 1 ($i = 1$): Mean of IA | leak_mean_amp_imf1 |
| | | | IMF 1 ($i = 1$): Mean Power | leak_mean_power_imf1 |
| | | | IMF 1 ($i = 1$): Mean of IF | leak_mean_freq_imf1 |
| | | | IMF 2 ($i = 2$): Mean of IA | leak_mean_amp_imf2 |
| | | | IMF 2 ($i = 2$): Mean Power | leak_mean_power_imf2 |
| | | | IMF 2 ($i = 2$): Mean of IF | leak_mean_freq_imf2 |
| | | | Residual ($\hat{\beta}_{r,1}$): Beta coefficient | leak_slope_residual |
| | | | Residual ($\hat{r}_{1,1}$): Mean | leak_mean_residual |
| | | | Residual ($\hat{r}_{1,2}$): Standard deviation | leak_std_residual |
| $x$ | Input, $m = 2$ | Outlet coolant temperature ($T_{\kappa,12}^{e}$). Feature vector: $\mathbf{f}_2$ | Beta coefficient ($\hat{\beta}_m$) | slope_Te_f_12 |
| | | | Mean ($\hat{x}_{m,1}$) | mean_Te_f_12 |
| | | | Standard deviation ($\hat{x}_{m,2}$) | std_Te_f_12 |
| | Input, $m = 3$ | Outlet fuel temperature ($T_{f,12}^{e}$). Feature vector: $\mathbf{f}_3$ | Beta coefficient ($\hat{\beta}_m$) | slope_Te_c_12 |
| | | | Mean ($\hat{x}_{m,1}$) | mean_Te_c_12 |
| | | | Standard deviation ($\hat{x}_{m,2}$) | std_Te_c_12 |
| | Input, $m = 4$ | Inlet coolant temperature ($T_{\kappa,in}$). Feature vector: $\mathbf{f}_4$ | Beta coefficient ($\hat{\beta}_m$) | slope_Tin_c_12 |
| | | | Mean ($\hat{x}_{m,1}$) | mean_Tin_c_12 |
| | | | Standard deviation ($\hat{x}_{m,2}$) | std_Tin_c_12 |
Table 9. Extracted features of BB model output ($\mathbf{f}_j$).

| Type of Signal of BB Model | Variable | Extracted Features | Feature Code |
|---|---|---|---|
| Output, $j = 1$ | Mass flow coolant leakage rate ($\dot{m}_{leak}$). Feature vector: $\mathbf{f}_1$ | IMF 1 ($l = 1$): Mean of IA | leak_mean_amp_imf1 |
| | | IMF 1 ($l = 1$): Mean Power | leak_mean_power_imf1 |
| | | IMF 1 ($l = 1$): Mean of IF | leak_mean_freq_imf1 |
| | | IMF 2 ($l = 2$): Mean of IA | leak_mean_amp_imf2 |
| | | IMF 2 ($l = 2$): Mean Power | leak_mean_power_imf2 |
| | | IMF 2 ($l = 2$): Mean of IF | leak_mean_freq_imf2 |
| | | Residual ($\hat{\beta}_{r,1}$): Beta coefficient | leak_slope_residual |
| | | Residual ($\hat{r}_{1,1}$): Mean | leak_mean_residual |
| | | Residual ($\hat{r}_{1,2}$): Standard deviation | leak_std_residual |
| Output, $j = 2$ | Diff. of Peak Cladding Temperature ($\Delta T_p$). Feature vector: $\mathbf{f}_2$ | IMF 1 ($l = 2$): Mean of IA | deltaTp_mean_amp_imf2 |
| | | IMF 1 ($l = 2$): Mean Power | deltaTp_mean_power_imf2 |
| | | IMF 1 ($l = 2$): Mean of IF | deltaTp_mean_freq_imf2 |
| | | IMF 2 ($l = 3$): Mean of IA | deltaTp_mean_amp_imf3 |
| | | IMF 2 ($l = 3$): Mean Power | deltaTp_mean_power_imf3 |
| | | IMF 2 ($l = 3$): Mean of IF | deltaTp_mean_freq_imf3 |
| | | Residual ($\hat{\beta}_{2}$): Beta coefficient | deltaTp_slope_residual |
| | | Residual ($\hat{r}_{2,1}$): Mean | deltaTp_mean_residual |
| | | Residual ($\hat{r}_{2,2}$): Standard deviation | deltaTp_std_residual |
Table 10. Best scored Kucherenko indices $b_{m,q}^{1}$ for features grouped by output $j = 1$.

| | $q = 1$ | $q = 2$ | $q = 3$ | $q = 4$ | $q = 5$ | $q = 6$ | $q = 7$ | $q = 8$ | $q = 9$ |
|---|---|---|---|---|---|---|---|---|---|
| $b_{1,q}^{1}$ | 0.157 | 0.298 | 0.195 | 0.287 | 0.154 | 0.298 | 0.983 | 0.407 | 0.304 |
| $b_{2,q}^{1}$ | 0.197 | 0.272 | 0.193 | 0.224 | 0.205 | 0.271 | 0.943 | 0.259 | 0.163 |
| $b_{3,q}^{1}$ | 0.158 | 0.198 | 0.146 | 0.177 | 0.157 | 0.219 | 0.885 | 0.255 | 0.156 |
| $b_{4,q}^{1}$ | 0.207 | 0.235 | 0.140 | 0.144 | 0.195 | 0.224 | 0.844 | 0.259 | 0.163 |
Table 11. Best scored Kucherenko indices $b_{m,q}^{2}$ for features grouped by output $j = 2$.

| | $q = 1$ | $q = 2$ | $q = 3$ | $q = 4$ | $q = 5$ | $q = 6$ | $q = 7$ | $q = 8$ | $q = 9$ |
|---|---|---|---|---|---|---|---|---|---|
| $b_{1,q}^{2}$ | 0.085 | 0.034 | 0.072 | 0.046 | 0.067 | 0.013 | 0.428 | 0.171 | 0.176 |
| $b_{2,q}^{2}$ | 0.069 | 0.059 | 0.098 | 0.117 | 0.056 | 0.020 | 0.965 | 0.397 | 0.410 |
| $b_{3,q}^{2}$ | 0.069 | 0.061 | 0.082 | 0.118 | 0.052 | 0.020 | 0.962 | 0.439 | 0.446 |
| $b_{4,q}^{2}$ | 0.068 | 0.062 | 0.106 | 0.129 | 0.055 | 0.020 | 0.965 | 0.428 | 0.432 |
Table 12. Average scores $K_m^{j}$ per input group.

| Output ($j = 1$) | | Output ($j = 2$) | |
|---|---|---|---|
| $K_1^{1}$ | 0.3425 | $K_1^{2}$ | 0.1215 |
| $K_2^{1}$ | 0.3028 | $K_2^{2}$ | 0.2435 |
| $K_3^{1}$ | 0.2611 | $K_3^{2}$ | 0.2498 |
| $K_4^{1}$ | 0.2679 | $K_4^{2}$ | 0.2516 |
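The values in Table 12 agree, to within rounding, with the row averages of Tables 10 and 11 over the nine output features $q$. The sketch below reproduces that arithmetic and then picks, for each output, the input with the largest grouped index. The selection rule is an assumption suggested by the Nomenclature entry for the maximum grouped index, and the short input labels are illustrative names for the four inputs listed in Table 8.

```python
# Best scored indices b_{m,q}^{(j)} copied from Table 10 (j=1) and Table 11 (j=2).
B = {
    1: [
        [0.157, 0.298, 0.195, 0.287, 0.154, 0.298, 0.983, 0.407, 0.304],
        [0.197, 0.272, 0.193, 0.224, 0.205, 0.271, 0.943, 0.259, 0.163],
        [0.158, 0.198, 0.146, 0.177, 0.157, 0.219, 0.885, 0.255, 0.156],
        [0.207, 0.235, 0.140, 0.144, 0.195, 0.224, 0.844, 0.259, 0.163],
    ],
    2: [
        [0.085, 0.034, 0.072, 0.046, 0.067, 0.013, 0.428, 0.171, 0.176],
        [0.069, 0.059, 0.098, 0.117, 0.056, 0.020, 0.965, 0.397, 0.410],
        [0.069, 0.061, 0.082, 0.118, 0.052, 0.020, 0.962, 0.439, 0.446],
        [0.068, 0.062, 0.106, 0.129, 0.055, 0.020, 0.965, 0.428, 0.432],
    ],
}

# Illustrative labels for inputs m = 1..4 (order taken from Table 8).
INPUT_NAMES = ["m_leak", "T_k_out", "T_f_out", "T_k_in"]


def grouped_indices(rows):
    """Average each input's best scores over the output features q."""
    return [sum(row) / len(row) for row in rows]


def select_inputs(best_scores):
    """For each output j, keep the input with the largest grouped index."""
    selected = []
    for j in sorted(best_scores):
        k = grouped_indices(best_scores[j])
        best = max(range(len(k)), key=k.__getitem__)
        if INPUT_NAMES[best] not in selected:
            selected.append(INPUT_NAMES[best])
    return selected


selected = select_inputs(B)  # ['m_leak', 'T_k_in'] for the values above
```

Under these assumptions the result coincides with the selected set $x^*$ reported in the text: the leakage rate for output $j = 1$ and the inlet coolant temperature for output $j = 2$.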
Table 13. Accident scenario parameters.

| Variable | Description | Value [Units] |
|---|---|---|
| $r$ | Memory data length | $r = 200$ [-] |
| $s$ | Prediction window length | $s = 200$ [-] |
| $t_N$ | Mission time | $t_N = 1500$ [s] |
| $t_q$ | Query time | $t_q \in \{1000, \ldots, 1300\}$ [s]; $t_1 = 1000$ [s]: early stage of the LOCA scenario; $t_2 = 1100$ [s]: activation of Protection System (Auxiliary Cooling System); $t_3 = 1300$ [s]: late stage of the LOCA scenario, with enacted Protection System |
| $t_f$ | Failure time | $t_f = 500$ [s] |
Table 14. Overall performance of GB model configurations.

| | Proposed Method | Full GB Model with 4 Inputs | Regulatory Requirements |
|---|---|---|---|
| Accuracy (RMSE) [K] | | | Accuracy below the tolerance error of current instrumentation (sensor: resistance temperature detector, Class A) [46,47]: $RMSE \le 1.89$ K |
| Mean computing time per query $t_q$ [s] | | | Response time of two minutes for risk monitors [1]: $t_q \le 120$ [s] |
| Explainability (SHAP values) | | | Physical consistency of the model output, based on explainability techniques, to provide end-user trust [14] |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Miqueles, L.; Ahmed, I.; Di Maio, F.; Zio, E. Risk Monitoring of Small Modular Reactors by Grey-Box Models: Feature Extraction and Global Sensitivity Analysis. J. Nucl. Eng. 2026, 7, 34. https://doi.org/10.3390/jne7020034

