Regularizing Temporal Explanations in Dynamic Neural Networks

Navakauskas, Dalius; Dumpis, Martynas

doi:10.3390/electronics15102200


by Dalius Navakauskas † and Martynas Dumpis *,†

Department of Electronic Systems, Vilnius Gediminas Technical University, Plytines g. 25-234, LT-10105 Vilnius, Lithuania

* Author to whom correspondence should be addressed.

† These authors contributed equally to this work.

Electronics 2026, 15(10), 2200; https://doi.org/10.3390/electronics15102200

Submission received: 20 April 2026 / Revised: 14 May 2026 / Accepted: 15 May 2026 / Published: 20 May 2026

(This article belongs to the Special Issue Digital Signal and Image Processing for Multimedia Technology, 2nd Edition)


Abstract

Attribution-based priors offer a computationally efficient way to improve the temporal interpretability and robustness of dynamic neural networks without altering the model structure during inference. We explore explanation-guided training for time-series classification by introducing attribution-sensitive loss terms that regularize the evolution of input relevance over time. The main contributions are the Temporal Relevance Smoothness Index (TRSI) and a ratio-based loss that reduces irregular step-to-step changes in channel-aggregated absolute relevance. TRSI is compared against temporal total-variation penalties computed from Layer-wise Relevance Propagation (LRP-TV) and Integrated Gradients (IG-TV) attributions. Experiments on a controlled three-class subset of the Korean University Human Activity Recognition (KU-HAR) dataset using a finite impulse response neural network (FIRNN) show that TRSI yields the strongest smoothness improvement, reducing the total variation of the aggregated relevance signal from 0.768 to 0.447 (41.8%), compared with 0.667 (LRP-TV) and 0.677 (IG-TV). Robustness tests indicate a clear advantage for TRSI under impulsive and white Gaussian test-time noise.

Keywords:

explanation-guided training; attribution priors; temporal smoothness; layer-wise relevance propagation; integrated gradients; time-series classification; Human Activity Recognition

1. Introduction

Time-series classification models are increasingly deployed in settings in which decisions must be both accurate and inspectable, such as wearable sensing, biomedical monitoring, safety-critical automation, and behavior monitoring systems that rely on temporal cues for activity recognition [1,2,3]. Dynamic artificial neural network models are particularly well-suited for such tasks since they can implicitly learn to exploit raw temporal relationships [4]. However, the internal decision-making process of such models is not interpretable.

Explainable artificial intelligence (XAI) methods overcome this limitation by providing relevance or attribution scores for input features and time instances [5]. The layer-wise relevance propagation (LRP) technique and the integrated gradients (IG) method are widely used for visualizing what parts of the input drive a prediction [6,7]. Attribution-based diagnostics in a similar vein have also been used for turning sequence-model decisions into trait-resolved evidence via attention and IG-style checks [8]. However, post hoc explanations alone for a machine learning model do not guarantee the faithfulness of the underlying learned model or the robustness of explanations. That is, a model can still have high predictive performance while relying on brittle or false evidence and producing attribution maps that are noisy, change dramatically under minimal input alterations, or are inconsistent across similar samples [9,10]. This is particularly the case for sequential data, as the relevant information often spans multiple widely spaced time steps rather than consecutive time segments, rendering explanations hard to interpret and untrustworthy for downstream tasks.

A recent research direction treats explanations as trainable objects and adds losses that regularize attributions and explanation behavior. Erion et al. introduced axiomatic attribution priors and, using expected gradients, demonstrated that explanation regularization could enhance performance and interpretability across various fields [11]. Other approaches constrain explanations to match prior knowledge or desired reasoning patterns [12], reduce saliency noise through training-time consistency objectives [13], or preserve attribution structure under compression [14]. More recently, LRP itself has been used as an optimization target to reduce background bias by discouraging irrelevant relevance allocation [15]. For time series, contrastive frameworks have been proposed to improve the identifiability of attribution maps under specific assumptions [16]. Despite this progress, most attribution priors are formulated for spatial data (e.g., smoothness over image neighborhoods) or for feature-level stability, and comparatively few methods explicitly regularize the temporal smoothness of relevance trajectories produced by dynamic models.

This work focuses on attribution-based regularization for dynamic neural networks operating on temporal windows and targets interpretability in the time dimension. We introduce a Temporal Relevance Smoothness Index (TRSI) and a corresponding training prior that penalizes dispersion of the first-order temporal relevance differences relative to the mean relevance magnitude. The prior is defined on channel-aggregated absolute relevance, which makes it applicable when explanations are computed per channel and time step and then summarized for temporal analysis. In addition, we implement and evaluate two comparison objectives: (i) a temporal total-variation prior computed from IG attributions (IG-TV), based on the expected-gradients formulation [11], and (ii) an analogous temporal total-variation prior computed from LRP attributions (LRP-TV). The methods are assessed under a controlled experimental protocol on a time-series classification task, with performance metrics reported alongside explanation smoothness measures.

This study is designed as a controlled methodological evaluation rather than as a state-of-the-art Human Activity Recognition (HAR) benchmarking study. It aims to examine whether temporal attribution regularization can modify the time evolution of relevance maps while preserving predictive performance. Therefore, the experimental design intentionally uses a controlled three-class KU-HAR subset and a fixed FIRNN architecture. This setting allows the effect of the proposed TRSI prior to be isolated from confounding factors such as architecture selection, class-space complexity, and dataset heterogeneity. Broader validation on larger class sets, additional datasets, and recurrent or attention-based architectures is left as future work.

The paper continues as follows: Section 2 presents related work on attribution priors and explanation-guided training. We detail the dataset, model configuration, attribution computation, and the proposed TRSI regularizer with baseline smoothness objectives in Section 3. In Section 4, we report the experimental results, including predictive performance, and quantitative and qualitative analysis of explanation smoothness. Section 5 discusses practical implications and limitations. Section 6 summarizes the main findings.

2. Related Work

This section reviews previous work on XAI for time-series classification and on training-time objectives that directly control the explanations produced. We concentrate on attribution-based methods and on losses that affect where relevance is located and how it behaves over time.

2.1. Explainability for Time-Series Classification

For sensor-based time-series tasks (including HAR), post hoc attribution methods are most commonly employed to determine which input channels and time intervals contribute to a prediction. Recent reviews have indicated that gradient-based and relevance propagation methods are still the most prevalent families in time-series explainability, and have also highlighted that the quality of explanations might drop due to instability and sensitivity to perturbations [5]. In sequential data, this issue becomes particularly apparent, as relevance is often fragmented into isolated time instances rather than forming coherent and connected temporal segments. This diminishes interpretability and makes debugging harder.

A related practical issue is that single-time samples may be poorly interpretable even when the attribution method is faithful. One recent approach, therefore, inserts a virtual inspection layer to propagate relevance into an interpretable representation (e.g., frequency or time–frequency), enabling LRP- and IG-style explanations that better expose model strategies and spurious cues, and evaluating them via feature-flipping protocols adapted to time-series representations [17].

In time-series classification, including HAR settings, attribution analysis is also increasingly used comparatively across temporal model families [5]: LRP applied at both input and hidden layers has been shown to reveal not only which sensor axes dominate decisions, but also how relevance is distributed internally across recurrent units versus FIR-like structures, highlighting architecture-dependent explanation patterns that are not apparent from accuracy alone.

2.2. Explanation-Guided Training and Attribution Priors

A growing research direction treats explanations not only as diagnostic outputs but also as trainable objects that can be shaped during training. This direction is partly motivated by the instability of post hoc attributions, which has led to losses that explicitly constrain how relevance is distributed. One approach is to add attribution priors to the task loss so that the model is encouraged to rely on more desirable evidence patterns. Expected-gradients priors and related objectives have shown that attribution regularization can improve model behavior in multiple domains [11]. Other work constrains explanations to better match prior knowledge about relevant inputs (right-for-the-right-reasons) [12], or reduces noisy saliency patterns through training-time consistency objectives [13]. In addition, relevance-propagation signals themselves can be optimized as additional training targets. For example, LRP-based optimization has been used to discourage undesirable relevance allocation and improve robustness under specific biases [15]. Related work has also proposed explanation-constrained losses that enforce general desired attribution properties such as locality, fidelity, and symmetry during training, improving saliency-map reliability under evaluation protocols such as MoRF and ROAR [18].

More recent studies have sharpened this view by targeting explanation reliability as a first-class training goal. Chen et al. proposed the Residual RETargeting network, which trains models to produce stable explanations under perturbations by regularizing the ranking of top-k salient features, noting that $\ell_{p}$ distances can miss meaningful changes in which features are considered important [19]. From a complementary angle, Ferreira et al. revisited explanation regularization using human rationales and showed that reported out-of-domain gains do not necessarily imply a stronger reliance on the annotated plausible tokens when measured using attribution methods different from the one used during guidance [20]. Finally, although demonstrated in vision transformers, optimizing relevance maps during training has also been linked to robustness gains, reinforcing the broader principle that attribution-aligned objectives can shape both model behavior and its explanations [21]. Overall, these studies support the idea that explanation-guided losses can influence not only how explanations look but also the evidence the model learns to rely on.

2.3. Temporal Regularization as a Remaining Gap

Despite progress on attribution priors, many regularizers are designed for spatial smoothness (e.g., images) or for feature-level constraints, whereas fewer methods explicitly target the temporal behavior of relevance in time-series models. This gap matters in dynamic neural networks, where interpretability often depends on whether relevance evolves smoothly and consistently over time, rather than appearing as rapidly fluctuating step-to-step patterns. The present work addresses this gap by introducing TRSI as a temporal smoothness index and a corresponding training loss, and by comparing it against temporal total-variation penalties computed from LRP and IG attributions under a controlled HAR protocol.

Table 1 summarizes how the present work is positioned relative to recent explanation-guided and attribution-prior approaches. A direct numerical performance comparison across these studies would be misleading because they use different datasets, architectures, attribution methods, and evaluation protocols. Therefore, we compare the methodological focus instead and evaluate IG-TV and LRP-TV directly within the same FIRNN–KU-HAR setting used for TRSI.

3. Materials and Methods

This section introduces the dataset, preprocessing, FIRNN architecture, training process, and explainability framework. Raw inertial windows from KU-HAR were classified using an FIRNN, and the results were interpreted with LRP and IG to obtain time-resolved input relevance values. These attributions were used during training via temporal priors (LRP-TV, IG-TV), including the proposed TRSI objective. Results are reported together with quantitative measures of explanation smoothness and robustness under test-time sensor noise.

3.1. Dataset and Preprocessing

We used the Korean University Human Activity Recognition (KU-HAR) dataset [22] to conduct the experiments. This dataset involves inertial data collected via a smartphone sensor with a sampling frequency of 100 Hz. Each reading is composed of tri-axial accelerometer signals (m/s²) and tri-axial gyroscope signals (rad/s). In total, KU-HAR encompasses 18 activities carried out by 90 users (75 men and 15 women). The original recordings (1945 raw trials) were windowed into 3 s segments, generating 20,750 sub-sequences. At 100 Hz, each 3 s window contains 300 timestamps per axis, i.e., 1800 values per segment, considering all six dimensions.

We simplified our setting by selecting the sit-up, walk, and stairs-up classes since these activities have distinct and repetitive motions. This makes the classification problem easier and allows for clearer conditions to facilitate the subsequent interpretation of model predictions. We used the raw sensor signals without manual feature extraction to preserve temporal structure.

Before training the network, we performed data preprocessing. We first removed segments that included at least one zero-valued entry in any of the six channels to eliminate potentially corrupted or disconnected sensor intervals. We then applied class balancing by truncating the number of samples in each activity to the number of samples in the activity with the fewest samples to achieve equal representation in the three classes. The resulting data were randomly partitioned into training (70%), validation (15%), and test (15%) subsets.
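The preprocessing steps above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: the function name `preprocess`, the array layout `(windows, channels, time)`, and the fixed random seed are hypothetical, and the split sizes are rounded down with integer arithmetic.

```python
import numpy as np

def preprocess(windows, labels, seed=0):
    """windows: (n, channels, time) array; labels: (n,) integer class ids.

    Returns [(train_x, train_y), (val_x, val_y), (test_x, test_y)].
    """
    rng = np.random.default_rng(seed)  # assumed seed, for reproducibility only
    # Drop windows that contain at least one zero-valued entry in any channel.
    keep = ~np.any(windows == 0.0, axis=(1, 2))
    windows, labels = windows[keep], labels[keep]
    # Balance classes by truncating each class to the size of the smallest one.
    classes = np.unique(labels)
    n_min = min(int(np.sum(labels == c)) for c in classes)
    idx = np.concatenate([np.flatnonzero(labels == c)[:n_min] for c in classes])
    # Shuffle, then split into 70% / 15% / remaining (about 15%).
    idx = rng.permutation(idx)
    n_train = (70 * len(idx)) // 100
    n_val = (15 * len(idx)) // 100
    parts = np.split(idx, [n_train, n_train + n_val])
    return [(windows[p], labels[p]) for p in parts]
```

Truncation keeps the first `n_min` windows of each class in their stored order; whether the original study truncated in stored order or after shuffling is not stated in the text.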

Class balancing by truncation was used to keep equal class priors during this controlled comparison, and to prevent the temporal smoothness metrics from being dominated by the most frequent activity. This choice supports a direct method-to-method comparison under matched class distributions. In larger-scale evaluations, the same regularizer can be combined with class-weighted losses, stratified sampling, or data augmentation to preserve all available samples while controlling class imbalance.

Z-score normalization was utilized in the training phase to normalize the scale of the input data. This step was implemented independently for each sensor channel and activity class. For channel $i$ at time $t$, the normalized input value $s_{i}^{(0)}(t)$ was calculated as

$$s_{i}^{(0)}(t) = \frac{s_{i}^{(0),\mathrm{raw}}(t) - \mu_{i}}{\sigma_{i}}, \tag{1}$$

where $s_{i}^{(0),\mathrm{raw}}(t)$ denotes the raw measurement, and $\mu_{i}$ and $\sigma_{i}$ are the mean and standard deviation estimated from all data points of the corresponding class and channel [23]. This normalization centers each channel around zero and rescales it to unit variance, which supports stable optimization during neural network training [24].
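As an illustration of Equation (1), the sketch below normalizes each channel separately within each class. The function name `zscore_per_class_channel`, the `(windows, channels, time)` layout, and the small constant `eps` guarding against a zero standard deviation are assumptions made for this example and are not taken from the paper.

```python
import numpy as np

def zscore_per_class_channel(windows, labels, eps=1e-12):
    """windows: (n, channels, time); labels: (n,) integer class ids.

    Returns a normalized copy in which, for every class, each channel has
    approximately zero mean and unit standard deviation.
    """
    out = np.empty_like(windows, dtype=float)
    for c in np.unique(labels):
        sel = labels == c
        # Statistics over all windows and time steps of this class, per channel.
        mu = windows[sel].mean(axis=(0, 2), keepdims=True)
        sigma = windows[sel].std(axis=(0, 2), keepdims=True)
        out[sel] = (windows[sel] - mu) / (sigma + eps)  # eps is an assumed safeguard
    return out
```

How the class-wise statistics are applied to data whose class is not known in advance is not described in this section, so that step is left out of the sketch.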

3.2. Finite Impulse Response Neural Network

An FIRNN was used in this study because it implements temporal memory in the form of a fixed history of past samples using FIR synapses, with no recurrent state dynamics [25]. This somewhat unusual model structure makes its internal computation of predictions particularly transparent: each hidden unit is combined with specific input channels and a fixed set of past samples through direct, linear coupling. Unlike recurrent architectures, where hidden-state feedback makes the attribution path depend recursively on previous states, the FIRNN represents temporal memory through explicit finite-order synaptic filters. This makes it possible to trace how each input channel and delayed sample contributes to the output and to define temporal relevance priors in a controlled setting. Therefore, the FIRNN is used here as an interpretable dynamic testbed for the proposed regularizer, rather than as a claim of architectural superiority over LSTM, GRU, or Transformer-based models.

Let $s^{(0)}(t) \in \mathbb{R}^{N^{(0)}}$ denote the multichannel input at time $t$, where $N^{(0)}$ is the number of input channels and $t = 1, \dots, N_{T}$ indexes positions inside a window of length $N_{T}$. The FIR synapse from input channel $i$ to hidden neuron $j$ is defined by a filter of order $N_{M}$:

$$\tilde{s}_{j,i}^{(1)}(t) = \sum_{m=0}^{N_{M}-1} w_{j,i,m}^{(1)}\, s_{i}^{(0)}(t-m), \tag{2}$$

where $s_{i}^{(0)}(t) = 0$ for $t \leq 0$. The quantity $N_{M}$ denotes the FIR synapse order, thus specifying the temporal memory length of the model, while $m = 0, \dots, N_{M}-1$ indexes the FIR taps. Furthermore, $w_{j,i,m}^{(1)}$ is the $m$th FIR synaptic weight from input channel $i$ to hidden neuron $j$, and $s_{i}^{(0)}(t)$ denotes the input signal of channel $i$ at discrete time step $t$. The hidden-layer pre-activation of neuron $j$ is then

$$\mathring{s}_{j}^{(1)}(t) = \sum_{i=1}^{N^{(0)}} \tilde{s}_{j,i}^{(1)}(t) + b_{j}^{(1)} = \sum_{i=1}^{N^{(0)}} \sum_{m=0}^{N_{M}-1} w_{j,i,m}^{(1)}\, s_{i}^{(0)}(t-m) + b_{j}^{(1)}, \tag{3}$$

for $j = 1, \dots, N^{(1)}$, where $N^{(1)}$ is the number of hidden neurons. Hidden activations were computed using the logistic sigmoid $\Phi(\cdot)$:

$$s_{j}^{(1)}(t) = \Phi\left(\mathring{s}_{j}^{(1)}(t)\right). \tag{4}$$

The output-layer pre-activation for class $k$ is

$$\mathring{s}_{k}^{(2)}(t) = \sum_{j=1}^{N^{(1)}} w_{k,j}^{(2)}\, s_{j}^{(1)}(t) + b_{k}^{(2)}, \tag{5}$$

for $k = 1, \dots, N^{(2)}$, where $N^{(2)}$ is the number of classes. The corresponding posterior probability is obtained by the softmax function:

$$\hat{s}_{k}^{(2)}(t) = \frac{\exp\left(\mathring{s}_{k}^{(2)}(t)\right)}{\sum_{l=1}^{N^{(2)}} \exp\left(\mathring{s}_{l}^{(2)}(t)\right)}. \tag{6}$$

The predicted class is

$$s^{(2)}(t) = \arg\max_{k}\, \hat{s}_{k}^{(2)}(t) \in \{1, \dots, N^{(2)}\}, \tag{7}$$

where $w_{k,j}^{(2)}$ denotes the weight connecting hidden neuron $j$ to output neuron $k$, $b_{k}^{(2)} \in \mathbb{R}$ is the corresponding output bias, and $\hat{s}_{k}^{(2)}(t)$ is the posterior probability assigned to class $k$ at time index $t$. Accordingly, $s^{(2)}(t)$ denotes the predicted class label, while $N^{(2)}$ is the number of output classes. In the experiments, the FIRNN was applied to overlapping input windows of length $N_{T} = 100$ extracted from the normalized six-channel sensor signal, and classification was performed at the window level.
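The forward pass in Equations (2)–(6) can be sketched in NumPy as below. This is an illustrative implementation under assumptions: the function name `firnn_forward`, the array shapes, and the max-subtraction used to stabilize the softmax are choices made for the example. How per-time-step posteriors are combined into a single window-level decision is not specified in this section, so the sketch returns the per-time-step quantities only.

```python
import numpy as np

def firnn_forward(x, w1, b1, w2, b2):
    """x: (N0, NT) input window; w1: (N1, N0, NM) FIR weights; b1: (N1,);
    w2: (N2, N1) output weights; b2: (N2,).

    Returns hidden activations (N1, NT) and class posteriors (N2, NT).
    """
    n0, nt = x.shape
    n1, _, nm = w1.shape
    pre1 = np.tile(b1[:, None], (1, nt)).astype(float)
    for m in range(min(nm, nt)):
        # delayed[:, t] = x[:, t - m], with zeros before the window start.
        delayed = np.zeros((n0, nt))
        delayed[:, m:] = x[:, : nt - m]
        pre1 += w1[:, :, m] @ delayed          # Eqs. (2)-(3)
    hidden = 1.0 / (1.0 + np.exp(-pre1))       # Eq. (4), logistic sigmoid
    pre2 = w2 @ hidden + b2[:, None]           # Eq. (5)
    pre2 = pre2 - pre2.max(axis=0, keepdims=True)  # numerical stabilization only
    e = np.exp(pre2)
    posterior = e / e.sum(axis=0, keepdims=True)   # Eq. (6), softmax over classes
    return hidden, posterior
```

Taking `posterior.argmax(axis=0)` gives a 0-based version of the predicted class in Equation (7) at each time step.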

3.3. Attribution Computation and Explanation-Guided Training

This subsection describes how input relevance was computed for FIRNN windows and how the resulting attributions were incorporated into training. We used two input-level attribution methods (LRP and IG) to produce time-resolved relevance maps, and then defined temporal penalties on these relevance sequences (LRP-TV, IG-TV) together with the proposed TRSI prior. All priors were computed for the ground-truth class on each training window and added to the cross-entropy objective.

3.3.1. LRP for FIRNN Input Relevance

LRP was used to redistribute the selected class prediction backward through the network and obtain a relevance value for each input sample. In this work, the $\varepsilon$-rule [6] was applied. Relevance was first assigned to the selected output class and then propagated backward through the output and hidden layers. Since these steps follow the standard LRP redistribution principle, they are not expanded here. The main difficulty in FIRNN arises in the final propagation to the input layer, because each hidden neuron receives contributions from multiple delayed samples through FIR taps.

To preserve the FIR structure during relevance propagation, the hidden FIR sum without bias was written as

$$S_{j}^{(1)}(t) = \sum_{i=1}^{N^{(0)}} \sum_{m=0}^{N_{M}-1} w_{j,i,m}^{(1)}\, s_{i}^{(0)}(t-m), \tag{8}$$

where $s_{i}^{(0)}(t) = 0$ for $t \leq 0$. Here, $N_{M}$ denotes the FIR synapse order, thus specifying the temporal memory length of the model, while $m = 0, \dots, N_{M}-1$ indexes the FIR taps, $w_{j,i,m}^{(1)}$ is the $m$th FIR synaptic weight from input channel $i$ to hidden neuron $j$, and $s_{i}^{(0)}(t)$ denotes the input signal of channel $i$ at discrete time step $t$.

The resulting input relevance for channel $i$ at time step $t$ was then computed by accumulating all delayed contributions from hidden neurons and FIR taps:

$$r_{\mathrm{LRP},i}^{(0)}(t) = \sum_{j=1}^{N^{(1)}} \sum_{m=0}^{N_{M}-1} \mathbb{1}\{1 \leq t+m \leq N_{T}\}\, \rho_{j,i,m}(t+m, t), \tag{9}$$

where $\mathbb{1}\{\cdot\}$ is the indicator function and $\rho_{j,i,m}(t+m, t)$ denotes the relevance contribution assigned from hidden neuron $j$ at time step $t+m$ to input channel $i$ at time step $t$ through the $m$th FIR tap. Using the $\varepsilon$-stabilized LRP rule, this contribution was defined as

$$\rho_{j,i,m}(t+m, t) = \begin{cases} \dfrac{r_{\mathrm{LRP},j}^{(1)}(t+m)}{S_{j}^{(1)}(t+m)}\, w_{j,i,m}^{(1)}\, s_{i}^{(0)}(t), & \text{if } \left|S_{j}^{(1)}(t+m)\right| > \varepsilon_{\mathrm{LRP}}, \\[6pt] \dfrac{r_{\mathrm{LRP},j}^{(1)}(t+m)}{N_{M}}, & \text{otherwise}, \end{cases} \tag{10}$$

where $r_{\mathrm{LRP},j}^{(1)}(t+m)$ is the hidden-layer relevance of neuron $j$ at time step $t+m$, and $\varepsilon_{\mathrm{LRP}} > 0$ is a small stabilizer used to avoid numerical instability when the hidden FIR sum is close to zero.

In the experiments, the LRP stabilizer was fixed to $\varepsilon_{\mathrm{LRP}} = 10^{-9}$. This value was used as a numerical safeguard for small denominators in the relevance redistribution rule. A separate sensitivity analysis over $\varepsilon_{\mathrm{LRP}}$ was not performed, and the effect of this parameter on temporal smoothness is therefore left for future work.

This formulation preserves the FIR memory structure during relevance propagation, because one input sample may contribute to several hidden-layer activations at later time steps through different FIR taps. The resulting input-level relevance tensor $r_{\mathrm{LRP},i}^{(0)}(t)$ therefore provides a time-resolved attribution over all input channels and window positions, and was used in the subsequent smoothness analysis and in the definition of attribution-based temporal priors.
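The final propagation step in Equations (8)–(10) can be sketched as follows, taking the hidden-layer relevance as given. The function name `lrp_fir_input` and the array shapes are assumptions for this example; the earlier propagation through the output and hidden layers is not reproduced. Time indices are 0-based in the code, so the indicator in Equation (9) becomes a slice that stays inside the window.

```python
import numpy as np

def lrp_fir_input(x, w1, r_hidden, eps=1e-9):
    """x: (N0, NT) input; w1: (N1, N0, NM) FIR weights;
    r_hidden: (N1, NT) hidden-layer relevance.

    Returns input relevance of shape (N0, NT).
    """
    n0, nt = x.shape
    n1, _, nm = w1.shape
    taps = min(nm, nt)
    # Eq. (8): bias-free FIR sum S_j(t), with zero input before the window start.
    s = np.zeros((n1, nt))
    for m in range(taps):
        s[:, m:] += w1[:, :, m] @ x[:, : nt - m]
    stable = np.abs(s) > eps
    # First branch of Eq. (10): relevance divided by the FIR sum where stable.
    ratio = np.where(stable, r_hidden / np.where(stable, s, 1.0), 0.0)
    # Second branch of Eq. (10): r / N_M where the FIR sum is near zero.
    fallback = np.where(stable, 0.0, r_hidden / nm)
    r_in = np.zeros((n0, nt))
    for m in range(taps):
        # Hidden time index is t + m; input time index is t.
        contrib = (w1[:, :, m].T @ ratio[:, m:]) * x[:, : nt - m]
        contrib += fallback[:, m:].sum(axis=0, keepdims=True)
        r_in[:, : nt - m] += contrib           # Eq. (9)
    return r_in
```

When every FIR sum exceeds the stabilizer in magnitude, the total input relevance equals the total hidden relevance, which is a convenient sanity check for an implementation of this step.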

3.3.2. Integrated Gradients for FIRNN Input Relevance

IG was used as a gradient-based attribution method to obtain input relevance maps for FIRNN windows. For an input window $s^{(0)} \in \mathbb{R}^{N^{(0)} \times N_{T}}$, a baseline $s_{\mathrm{ref}}^{(0)}$, and the selected class $s_{\mathrm{true}}^{(2)}$, the attribution assigned to input channel $i$ at time step $t$ is defined as

$$r_{\mathrm{IG},i}^{(0)}(t) = \left(s_{i}^{(0)}(t) - s_{\mathrm{ref},i}^{(0)}(t)\right) \int_{0}^{1} \left.\frac{\partial \hat{s}_{\mathrm{true}}^{(2)}}{\partial s_{i}^{(0)}(t)}\right|_{s_{\alpha}^{(0)}} d\alpha, \tag{11}$$

where $s_{\alpha}^{(0)}$ denotes the interpolated input between the baseline and the actual window. Using the adopted notation, the interpolated input sample for channel $i$ at time step $t$ is

$$s_{\alpha,i}^{(0)}(t) = s_{\mathrm{ref},i}^{(0)}(t) + \alpha \left(s_{i}^{(0)}(t) - s_{\mathrm{ref},i}^{(0)}(t)\right). \tag{12}$$

In practice, the integral in (11) was approximated using $K$ interpolation points:

$$r_{\mathrm{IG},i}^{(0)}(t) \approx \left(s_{i}^{(0)}(t) - s_{\mathrm{ref},i}^{(0)}(t)\right) \frac{1}{K} \sum_{k=1}^{K} \left.\frac{\partial \hat{s}_{\mathrm{true}}^{(2)}}{\partial s_{i}^{(0)}(t)}\right|_{s_{\alpha_{k}}^{(0)}}, \tag{13}$$

where $\alpha_{k} \in [0, 1]$ denotes the $k$th interpolation point. Here, $s_{i}^{(0)}(t)$ and $s_{\mathrm{ref},i}^{(0)}(t)$ denote the actual and baseline values, respectively, for input channel $i$ at time step $t$.

This formulation yields an input-level relevance tensor $r_{\mathrm{IG},i}^{(0)}(t)$ with the same temporal and channel dimensions as the FIRNN input window. In the experiments, IG was computed per training window and per selected class using automatic differentiation. The baseline was the zero signal, and the number of interpolation points was fixed to $K = 20$.
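The approximation in Equation (13) can be sketched as below. To keep the example self-contained, the gradient is supplied by a caller-provided function `grad_fn` rather than by automatic differentiation; that function, the name `integrated_gradients`, and the right-endpoint grid $\alpha_k = k/K$ are assumptions, since the text only states that $\alpha_k \in [0, 1]$.

```python
import numpy as np

def integrated_gradients(x, grad_fn, baseline=None, k=20):
    """x: input window (any shape); grad_fn(z) returns d f / d z with the shape of z.

    Returns the IG attribution with the same shape as x.
    """
    if baseline is None:
        baseline = np.zeros_like(x, dtype=float)   # zero-signal baseline
    alphas = np.arange(1, k + 1) / k               # assumed grid: alpha_k = k / K
    avg_grad = np.zeros_like(x, dtype=float)
    for a in alphas:
        # Eq. (12): interpolate between the baseline and the actual window.
        avg_grad += grad_fn(baseline + a * (x - baseline))
    avg_grad /= k
    return (x - baseline) * avg_grad               # Eq. (13)
```

For a linear score the gradient is constant, so the attribution reduces to the elementwise product of weights and inputs, and its sum equals the difference between the score at the input and at the baseline.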

3.3.3. Temporal Relevance Smoothness Index

A new Temporal Relevance Smoothness Index (TRSI) is proposed to quantify temporal regularity of attribution signals while preserving their overall magnitude. In this study, TRSI was defined from LRP input relevance. This choice fixes the present implementation of TRSI to the LRP relevance signal used throughout the study. However, the ratio-based structure of the index only requires a time-resolved relevance envelope; therefore, the same formulation can be adapted to other attribution methods that provide input-level temporal relevance maps. We first aggregated the absolute input relevance over all channels:

$$r_{\mathrm{agg}}(t) = \sum_{i=1}^{N^{(0)}} \left|r_{\mathrm{LRP},i}^{(0)}(t)\right|, \quad t = 1, \dots, N_{T}. \tag{14}$$

Next, the first-order temporal difference was defined as

$$d_{\mathrm{agg}}(t) = r_{\mathrm{agg}}(t) - r_{\mathrm{agg}}(t-1), \quad t = 2, \dots, N_{T}, \tag{15}$$

and its absolute value as

$$a_{\mathrm{agg}}(t) = \left|d_{\mathrm{agg}}(t)\right|, \quad t = 2, \dots, N_{T}. \tag{16}$$

The mean aggregated relevance was

$$m_{\mathrm{agg}} = \frac{1}{N_{T}} \sum_{t=1}^{N_{T}} r_{\mathrm{agg}}(t) + \varepsilon_{m}, \tag{17}$$

where $\varepsilon_{m} > 0$ is a small stabilizer. The mean absolute temporal difference was

$$\mu_{a} = \frac{1}{N_{T}-1} \sum_{t=2}^{N_{T}} a_{\mathrm{agg}}(t), \tag{18}$$

and the corresponding standard deviation term was computed as

$$s_{\mathrm{agg}} = \sqrt{\frac{1}{N_{T}-1} \sum_{t=2}^{N_{T}} \left(a_{\mathrm{agg}}(t) - \mu_{a}\right)^{2} + \varepsilon_{\sigma}}, \tag{19}$$

where $\varepsilon_{\sigma} > 0$ is a numerical stabilizer.

The TRSI was then defined as

$$\mathrm{TRSI} = \frac{m_{\mathrm{agg}}}{s_{\mathrm{agg}}}. \tag{20}$$
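Equations (14)–(20) translate almost line by line into NumPy. The sketch below is illustrative: the function name `trsi` and the default stabilizer values are assumptions, as the text does not report the values of $\varepsilon_{m}$ and $\varepsilon_{\sigma}$.

```python
import numpy as np

def trsi(relevance, eps_m=1e-12, eps_sigma=1e-12):
    """relevance: (N0, NT) input relevance map. Returns the scalar TRSI."""
    r_agg = np.abs(relevance).sum(axis=0)      # Eq. (14): aggregate over channels
    a_agg = np.abs(np.diff(r_agg))             # Eqs. (15)-(16): absolute differences
    m_agg = r_agg.mean() + eps_m               # Eq. (17)
    mu_a = a_agg.mean()                        # Eq. (18)
    s_agg = np.sqrt(np.mean((a_agg - mu_a) ** 2) + eps_sigma)  # Eq. (19)
    return m_agg / s_agg                       # Eq. (20)
```

As a small worked case, a two-channel map whose aggregated relevance is (1, 2, 4, 4) has absolute differences (1, 2, 0), so that the mean relevance is 2.75, the dispersion term is $\sqrt{2/3}$, and the index is approximately 3.37.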

3.3.4. Temporal Explanation Priors for Training Regularization

To regularize temporal explanations during training, we augmented the standard cross-entropy objective with an attribution-dependent penalty computed from input relevance signals. For each input window, an attribution map was computed for the selected class and used to define a temporal prior term $E_{\mathrm{TP}}$. The final training objective was

$$E = E_{\mathrm{CE}} + \lambda E_{\mathrm{TP}}, \tag{21}$$

where $E_{\mathrm{CE}}$ is the cross-entropy loss and $\lambda$ controls the strength of the temporal prior.

LRP-TV prior. The first prior was based on a total-variation penalty [26] applied to LRP input relevance. Given the input-level relevance scores $r_{\mathrm{LRP},i}^{(0)}(t)$ obtained by LRP for the selected class, we penalized first-order temporal variation using the normalized $\ell_{1}$ form

$$E_{\mathrm{TP}}^{\mathrm{LRP\text{-}TV}} = \frac{1}{N^{(0)}(N_{T}-1)} \sum_{i=1}^{N^{(0)}} \sum_{t=2}^{N_{T}} \left|r_{\mathrm{LRP},i}^{(0)}(t) - r_{\mathrm{LRP},i}^{(0)}(t-1)\right|. \tag{22}$$

This loss was evaluated on the relevance sequence associated with the selected class for each training window.

IG-TV prior. The second prior used the same functional form, but with attributions computed by integrated gradients:

$$E_{\mathrm{TP}}^{\mathrm{IG\text{-}TV}} = \frac{1}{N^{(0)}(N_{T}-1)} \sum_{i=1}^{N^{(0)}} \sum_{t=2}^{N_{T}} \left|r_{\mathrm{IG},i}^{(0)}(t) - r_{\mathrm{IG},i}^{(0)}(t-1)\right|. \tag{23}$$

Here, $r_{\mathrm{IG},i}^{(0)}(t)$ denotes the IG attribution of input channel $i$ at time step $t$ for the selected class.

TRSI prior. The third prior was derived from the TRSI, which explicitly balances attribution magnitude against temporal roughness. During training, we minimized the reciprocal form

$$E_{\mathrm{TP}}^{\mathrm{TRSI}} = \frac{1}{\mathrm{TRSI}} = \frac{s_{\mathrm{agg}}}{m_{\mathrm{agg}}}. \tag{24}$$

This formulation penalizes dispersion of step-to-step relevance changes while discouraging trivial solutions that uniformly suppress relevance magnitudes. Small numerical stabilizers were added in implementation to avoid division by zero.
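The difference between the total-variation form in (22)–(23) and the ratio form in (24) can be illustrated numerically. The sketch below only evaluates the penalties for a given relevance map; in actual training the gradients with respect to the network parameters would have to come from an automatic-differentiation framework, which is not shown. The function names and stabilizer defaults are assumptions for this example.

```python
import numpy as np

def tv_prior(relevance):
    """Eqs. (22)/(23): mean absolute first-order temporal difference, (N0, NT) input."""
    return np.abs(np.diff(relevance, axis=1)).mean()

def trsi_prior(relevance, eps_m=1e-12, eps_sigma=1e-12):
    """Eq. (24): reciprocal TRSI, s_agg / m_agg."""
    r_agg = np.abs(relevance).sum(axis=0)
    a_agg = np.abs(np.diff(r_agg))
    m_agg = r_agg.mean() + eps_m
    s_agg = np.sqrt(a_agg.var() + eps_sigma)   # population variance over NT - 1 terms
    return s_agg / m_agg

def total_loss(p_true, relevance, lam, prior=trsi_prior):
    """Eq. (21): cross-entropy of the true-class posterior plus the weighted prior."""
    return -np.log(p_true) + lam * prior(relevance)
```

Scaling a relevance map by 0.1 scales `tv_prior` by the same factor, whereas `trsi_prior` stays essentially unchanged. This is the sense in which the ratio form does not reward uniformly shrinking the relevance magnitudes.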

3.4. Network Training

We trained FIRNN models either without attribution regularization ($\lambda = 0$) or with attribution-based temporal priors ($\lambda > 0$), while keeping all other settings fixed to enable fair comparison.

3.4.1. Objective Function and Regularization Weighting

For each training window, optimization was performed using the total objective defined in (21). In the regularized setting, the temporal prior term $E_{\mathrm{TP}}$ was chosen as one of the three alternatives introduced above, namely LRP-TV, IG-TV, or TRSI. The weighting coefficient $\lambda$ was fixed within each experiment and controlled the relative contribution of the temporal prior to the classification objective.

The cross-entropy term was

$$E_{\mathrm{CE}} = -\log\left(\hat{s}_{\mathrm{true}}^{(2)}\right), \tag{25}$$

where $\hat{s}_{\mathrm{true}}^{(2)}$ denotes the predicted posterior probability of the selected class $s_{\mathrm{true}}^{(2)}$.

This formulation enabled direct comparison between the baseline and explanation-guided training under otherwise identical conditions.

3.4.2. Optimization Details

Training was performed using sliding windows extracted from the input stream. For each window, a forward pass computed the class probabilities via the softmax output, and the attribution map was computed for the same window and selected class.

We used stochastic gradient descent (SGD) with an adaptive learning-rate heuristic based on changes in the mean training cross-entropy across epochs. Let $\bar{E}_{\mathrm{CE}}(e)$ denote the mean training cross-entropy at epoch $e$. The learning rate $\eta$ was updated as

$$\eta \leftarrow \begin{cases} \min(\eta \gamma_{1}, \eta_{\max}), & \text{if } \bar{E}_{\mathrm{CE}}(e) - \bar{E}_{\mathrm{CE}}(e-1) < -\tau, \\ \max(\eta \gamma_{0}, \eta_{\min}), & \text{if } \bar{E}_{\mathrm{CE}}(e) - \bar{E}_{\mathrm{CE}}(e-1) > \tau, \\ \eta, & \text{otherwise}, \end{cases} \tag{26}$$

where $\tau$ is a tolerance threshold, $\gamma_{1} > 1$ is the learning-rate increase factor, and $\gamma_{0} \in (0, 1)$ is the decrease factor. If the training cross-entropy increased beyond $\tau$, parameters were reverted to their pre-epoch values before continuing with the reduced learning rate.

In the experiments, the learning-rate parameters were fixed to $\eta_{\max} = 0.0011$, $\eta_{\min} = 10^{-5}$, $\tau = 5 \times 10^{-4}$, $\gamma_{1} = 1.03$, and $\gamma_{0} = 0.5$. The patience parameter for early stopping was set to 5 epochs. This schedule was used as a practical stabilization heuristic for repeated attribution-regularized training and is not claimed to be theoretically optimal.
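The update rule in (26), with the parameter values reported above as defaults, can be written as a small helper. The function name `adapt_learning_rate` and the returned revert flag are assumptions made for this sketch; the flag simply signals the case in which the text says the pre-epoch parameters are restored.

```python
def adapt_learning_rate(eta, ce_now, ce_prev, eta_max=0.0011, eta_min=1e-5,
                        tau=5e-4, gamma_up=1.03, gamma_down=0.5):
    """Return (new_eta, revert) following Eq. (26).

    ce_now and ce_prev are the mean training cross-entropies of the current
    and previous epoch; revert is True when the loss rose by more than tau.
    """
    delta = ce_now - ce_prev
    if delta < -tau:
        # Loss decreased clearly: grow the step size, capped at eta_max.
        return min(eta * gamma_up, eta_max), False
    if delta > tau:
        # Loss increased clearly: halve the step size, floored at eta_min.
        return max(eta * gamma_down, eta_min), True
    return eta, False
```

For example, starting from a learning rate of 0.001, a clear improvement raises it to 0.00103, while a clear deterioration lowers it to 0.0005 and requests a parameter revert.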

Early stopping was applied using the total validation loss [27]. Training was terminated if the validation loss did not improve for a fixed number of consecutive epochs (patience), and the best-performing parameters were restored at the end.
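A patience-based early-stopping rule of this kind can be sketched as a small bookkeeping class. The class name `EarlyStopping`, its interface, and the use of a deep copy for the parameter snapshot are assumptions for illustration.

```python
import copy

class EarlyStopping:
    """Track the best validation loss and stop after `patience` bad epochs."""

    def __init__(self, patience=5):
        self.patience = patience
        self.best_loss = float("inf")
        self.best_params = None
        self.bad_epochs = 0

    def step(self, val_loss, params):
        """Record one epoch; return True when training should stop."""
        if val_loss < self.best_loss:
            # Improvement: remember the loss and snapshot the parameters.
            self.best_loss = val_loss
            self.best_params = copy.deepcopy(params)
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience
```

After `step` returns `True`, the snapshot in `best_params` corresponds to the parameters that would be restored before test evaluation.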

The complete training procedure is summarized in Algorithm 1. The same general procedure was used for all temporal priors; only the definition of $E_{\mathrm{TP}}$ was changed between LRP-TV, IG-TV, and TRSI. For TRSI, the temporal prior corresponds to the reciprocal form in (24).

Algorithm 1. Training with temporal explanation prior

1: Input: Training windows $s^{(0)}$, ground-truth labels $s_{\mathrm{true}}^{(2)}$, FIRNN parameters $\theta$, temporal-prior weight $\lambda$
2: for each independent run do
3:     Initialize FIRNN parameters $\theta$
4:     Set initial learning rate $\eta$
5:     for each epoch do
6:         Store the pre-epoch parameter values $\theta_{\mathrm{prev}}$
7:         for each training window do
8:             Compute class posterior probabilities $\hat{s}_{k}^{(2)}$ by the FIRNN forward pass
9:             Compute the cross-entropy loss $E_{\mathrm{CE}}$ using (25)
10:            Compute the input relevance map $r_{\mathrm{LRP},i}^{(0)}(t)$ for the ground-truth class using (9) (or $r_{\mathrm{IG},i}^{(0)}(t)$ using (13) for the IG-TV prior)
11:            Compute the temporal prior $E_{\mathrm{TP}}$ using LRP-TV, IG-TV, or TRSI
12:            Compute the total loss $E = E_{\mathrm{CE}} + \lambda E_{\mathrm{TP}}$
13:            Update FIRNN parameters $\theta$ by SGD using the gradient of $E$
14:        end for
15:        Compute the mean training cross-entropy $\bar{E}_{\mathrm{CE}}(e)$
16:        Adapt the learning rate according to (26); if the training cross-entropy increases beyond $\tau$, restore $\theta_{\mathrm{prev}}$
17:        Compute the total validation loss $E_{\mathrm{CE}} + \lambda E_{\mathrm{TP}}$
18:        Update the best parameter snapshot if the total validation loss improves
19:        Stop training if the validation patience limit is reached
20:    end for
21:    Restore the best parameter snapshot and evaluate the model on the test set
22: end for

In the TRSI configuration,

E_{TP}

is computed from channel-aggregated absolute LRP relevance according to (14)–(24). Thus, the optimization minimizes the dispersion of temporal relevance changes relative to the mean relevance magnitude, while the model architecture itself remains unchanged during inference.

3.5. Evaluation Metrics

Model performance was evaluated at the window level using the parameter set that achieved the lowest validation objective. Let

N_{W}

denote the number of test windows,

y_{j} \in \{1, \dots, N^{(2)}\}

the ground-truth class of window j, and

{\hat{y}}_{j}

the predicted class. Overall test accuracy was computed as

A = \frac{1}{N_{W}} \sum_{j = 1}^{N_{W}} \mathbb{1} \{{\hat{y}}_{j} = y_{j}\} .

(27)

Per-class accuracy for class k was computed as

A_{k} = \frac{1}{N_{k}} \sum_{j = 1}^{N_{W}} \mathbb{1} \{y_{j} = k\} \, \mathbb{1} \{{\hat{y}}_{j} = k\}, \quad N_{k} = \sum_{j = 1}^{N_{W}} \mathbb{1} \{y_{j} = k\} .

(28)
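As a concrete reading of (27) and (28), the two accuracies can be computed from plain label lists as sketched below. The function names are illustrative, and the sketch assumes that every evaluated class occurs at least once among the test windows (otherwise the per-class count is zero and the division fails).

```python
def overall_accuracy(y_true, y_pred):
    """Equation (27): fraction of windows whose predicted class matches the label."""
    correct = sum(1 for y, p in zip(y_true, y_pred) if y == p)
    return correct / len(y_true)


def per_class_accuracy(y_true, y_pred, k):
    """Equation (28): correct predictions of class k over the N_k windows of class k."""
    n_k = sum(1 for y in y_true if y == k)
    hits = sum(1 for y, p in zip(y_true, y_pred) if y == k and p == k)
    return hits / n_k
```

Note that (28) is normalized by the number of true windows of class k, so it corresponds to the diagonal of a row-normalized confusion matrix.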

Across

N_{runs}

independent runs, a scalar metric

m_{r}

from run r was summarized by its sample mean and standard deviation:

μ_{m} = \frac{1}{N_{runs}} \sum_{r = 1}^{N_{runs}} m_{r}, σ_{m} = \sqrt{\frac{1}{N_{runs} - 1} \sum_{r = 1}^{N_{runs}} {(m_{r} - μ_{m})}^{2}} .

(29)

Because some training settings occasionally produced failed runs, the median accuracy was also reported. In addition to descriptive statistics, statistical comparisons across independent runs were performed for the all-class temporal relevance roughness values. Since the distributions may be non-Gaussian due to occasional unstable runs, the Kruskal–Wallis test was used to assess global differences between training objectives [28]. Pairwise comparisons between TRSI and the other objectives were then performed using the Mann–Whitney test, also known as the Wilcoxon rank-sum test, with Holm correction for multiple comparisons [29].
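The Holm step-down correction applied to the pairwise comparisons can be sketched as follows. The function name `holm_adjust` is illustrative, and the raw p-values in any usage are assumed to come from separately computed Mann–Whitney tests; the sketch only shows the adjustment step.

```python
def holm_adjust(p_values):
    """Return Holm step-down adjusted p-values in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices by ascending p
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        # The smallest p-value is multiplied by m, the next by m - 1, and so on.
        candidate = min(1.0, (m - rank) * p_values[idx])
        running_max = max(running_max, candidate)  # keep adjusted values monotone
        adjusted[idx] = running_max
    return adjusted
```

With three pairwise comparisons, as in the TRSI versus baseline, IG-TV, and LRP-TV tests, the smallest raw p-value is multiplied by three before it is compared with the significance level.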

Explanation smoothness. To quantify the temporal roughness of explanations, LRP was first applied to each test window, yielding the input-level relevance tensor

r_{LRP, i}^{(0)} (t)

as defined in Section 3.3.1. For evaluation, the resulting relevance values were normalized within each window by their total absolute magnitude:

{\tilde{r}}_{LRP, i}^{(0)} (t) = \frac{r_{LRP, i}^{(0)} (t)}{\sum_{i^{'} = 1}^{N^{(0)}} \sum_{t^{'} = 1}^{N_{T}} |r_{LRP, i^{'}}^{(0)} (t^{'})| + ε_{norm}},

(30)

where

ε_{norm} > 0

is a small numerical stabilizer. The smoothness score was then computed as the mean first-order absolute temporal difference of

{\tilde{r}}_{LRP, i}^{(0)} (t)

across all input channels and time positions within the window. The resulting value was evaluated per window and then averaged across test windows and classes.
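A minimal sketch of this per-window roughness score is given below. It assumes the relevance map is stored as a list of channels, each a list over time, and that the average is taken over all channel-wise step differences; the stabilizer value and the function names are illustrative choices, not taken from the experiments.

```python
def normalized_relevance(relevance, eps_norm=1e-12):
    """Equation (30): divide the map by the window's total absolute relevance."""
    total = sum(abs(v) for channel in relevance for v in channel) + eps_norm
    return [[v / total for v in channel] for channel in relevance]


def temporal_roughness(relevance, eps_norm=1e-12):
    """Mean absolute first-order temporal difference of the normalized map."""
    r_tilde = normalized_relevance(relevance, eps_norm)
    diffs = [abs(ch[t + 1] - ch[t]) for ch in r_tilde for t in range(len(ch) - 1)]
    return sum(diffs) / len(diffs)
```

Because of the normalization in (30), multiplying the whole relevance map by a constant leaves the score essentially unchanged, so the metric compares temporal shape rather than attribution scale.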

TRSI-based explanation metric. In addition to total variation, we also evaluated explanation smoothness using the proposed Temporal Relevance Smoothness Index, computed for each test window according to (20) and then averaged across the test set. Higher TRSI values indicate smoother relevance trajectories while preserving non-trivial attribution magnitude. Because TRSI is computed from the channel-aggregated absolute relevance envelope, it measures the temporal smoothness of the overall attribution magnitude rather than channel-specific signed attribution dynamics. This choice was made to obtain a compact temporal smoothness measure that is stable across samples and directly comparable with total-variation penalties. However, it also means that opposite-signed relevance contributions and axis-specific attribution patterns are not distinguished by the present TRSI formulation. A channel-wise or signed extension of TRSI would therefore be required when the goal is to analyze fine-grained sensor-axis contributions.

Perturbation-based relevance faithfulness. To complement the smoothness-based evaluation, we added a perturbation-based relevance faithfulness check. This evaluation follows perturbation-based attribution validation, where the importance of attributed input regions is assessed by measuring how model output changes after those regions are removed or corrupted [30]. For each clean test window, an attribution map was first computed for the true class. The temporal importance score was then obtained by aggregating the absolute attribution values over all input channels:

q (t) = \sum_{i = 1}^{N^{(0)}} |r_{i} (t)|, t = 1, \dots, N_{T},

(31)

where

r_{i} (t)

denotes the attribution value of channel i at time step t. The temporal positions were ranked according to

q (t)

, and three masked versions of each test window were created by setting all channels at selected time positions to zero. This masking was applied at the temporal-position level, meaning that all six sensor channels were removed at the selected time steps rather than masking individual channel values independently. The selected positions corresponded to the highest-relevance, lowest-relevance, or randomly selected time steps. The trained model was then re-evaluated on the masked windows. If the attribution map identifies behaviorally important temporal regions, masking the highest-relevance positions should reduce accuracy and true-class confidence more than random or low-relevance masking. For cross-entropy (CE), LRP-TV, and TRSI models, masking was guided by LRP relevance. For IG-TV, masking was guided by IG attribution to match the attribution signal used by its temporal prior.
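The masking procedure can be sketched as follows, assuming a window and its attribution map are both stored as lists of channels over time. The function names, the rounding of the number of masked positions, and the fixed random seed are illustrative assumptions.

```python
import random


def temporal_importance(attribution):
    """Equation (31): q(t) is the sum over channels of |r_i(t)|."""
    n_t = len(attribution[0])
    return [sum(abs(ch[t]) for ch in attribution) for t in range(n_t)]


def select_positions(q, fraction, mode, rng=None):
    """Pick the time steps to mask: 'top', 'bottom', or 'random'."""
    k = max(1, round(fraction * len(q)))
    ranked = sorted(range(len(q)), key=lambda t: q[t], reverse=True)
    if mode == "top":
        return sorted(ranked[:k])
    if mode == "bottom":
        return sorted(ranked[-k:])
    if mode == "random":
        rng = rng or random.Random(0)
        return sorted(rng.sample(range(len(q)), k))
    raise ValueError(f"unknown mode: {mode}")


def mask_window(window, positions):
    """Zero every channel at the selected time steps; the input is left unmodified."""
    drop = set(positions)
    return [[0.0 if t in drop else v for t, v in enumerate(ch)] for ch in window]
```

Masking whole time steps rather than single channel values matches the temporal-position-level masking described above.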

Robustness under test-time sensor noise. Robustness was assessed by injecting additive noise only at inference time. Let

x \in \mathbb{R}^{N^{(0)} \times N_{T}}

be a clean test window and

σ_{i}

the empirical standard deviation of channel i in the test set. For a relative noise level

ρ \geq 0

, the corrupted input was defined as

x_{ρ, i} (t) = x_{i} (t) + ρ σ_{i} ν_{i} (t),

(32)

where

ν_{i} (t)

is a zero-mean, unit-variance noise process. Three noise types were considered: white noise, band-limited noise, and high-pass impulsive noise.
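For the white-noise case, the corruption in (32) can be sketched as below. The sketch draws the noise process independently from a standard normal distribution and uses the population standard deviation for each channel; both choices, as well as the function names, are assumptions for illustration. The band-limited and impulsive cases additionally require filtering and are not covered here.

```python
import random
import statistics


def channel_stds(windows):
    """Empirical standard deviation of each channel over all test windows."""
    n_channels = len(windows[0])
    return [
        statistics.pstdev([v for w in windows for v in w[i]])
        for i in range(n_channels)
    ]


def add_white_noise(window, sigmas, rho, rng):
    """Equation (32) with nu_i(t) drawn i.i.d. from N(0, 1)."""
    return [
        [x + rho * sigma * rng.gauss(0.0, 1.0) for x in channel]
        for channel, sigma in zip(window, sigmas)
    ]
```

Scaling by the per-channel standard deviation means that a given relative noise level perturbs each sensor axis in proportion to its own variability.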

For each value of

ρ

, the full test evaluation was repeated

N_{MC}

times with independent noise realizations, producing accuracies

\{A^{(r)} (ρ)\}_{r = 1}^{N_{MC}}

. The Monte Carlo mean and standard deviation were

μ_{acc} (ρ) = \frac{1}{N_{MC}} \sum_{r = 1}^{N_{MC}} A^{(r)} (ρ), σ_{acc} (ρ) = \sqrt{\frac{1}{N_{MC} - 1} \sum_{r = 1}^{N_{MC}} {(A^{(r)} (ρ) - μ_{acc} (ρ))}^{2}} .

(33)

For interval-based robustness comparison, two methods were treated as tied at a given noise level if their intervals

μ_{acc} (ρ) \pm σ_{acc} (ρ)

overlapped.
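The Monte Carlo summary in (33) and the interval-overlap tie rule can be sketched as follows. The function names are illustrative, and the sketch assumes that intervals that merely touch at an endpoint count as overlapping, which the text does not specify.

```python
import statistics


def mc_summary(accuracies):
    """Equation (33): Monte Carlo mean and sample standard deviation."""
    return statistics.mean(accuracies), statistics.stdev(accuracies)


def is_tied(mean_a, sd_a, mean_b, sd_b):
    """True when the intervals mean +/- sd of the two methods overlap."""
    return (mean_a - sd_a) <= (mean_b + sd_b) and (mean_b - sd_b) <= (mean_a + sd_a)
```

For example, two methods with accuracies of 57.67 ± 0.88 and 57.17 ± 0.55 would be treated as tied under this rule, whereas 87.98 ± 0.45 and 84.67 ± 0.56 would not.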

4. Results

4.1. FIRNN Performance

This subsection reports predictive performance of the FIRNN under the four training objectives: baseline CE (

λ = 0

), LRP-TV, TRSI, and IG-TV. Each configuration was trained 30 times with the same data split and hyperparameters. The test accuracy reported for each run was obtained from the model snapshot selected by early stopping on the validation loss.

Thirty independent runs were used to estimate variability due to random initialization while keeping the computational cost of attribution-based training feasible. Since each regularized run requires repeated attribution computation, substantially increasing the number of runs would considerably increase the computational burden.

Table 2 provides an overview of test accuracies across runs. We present mean ± standard deviation (SD) as well as the median, as some attribution-regularized objectives lead to a small number of low-accuracy runs, which substantially influence the mean performance.

Across all experiments, both the baseline and IG-TV performed consistently, with narrow accuracy ranges and no failed runs (0/30 below 85%); their mean and median accuracies were close to each other. By contrast, the LRP-TV results were strongly skewed: the median remained relatively high (92.93%), but the mean dropped to 86.16% because of a substantial number of poor runs (minimum 53.87%; 8/30 below 85%). TRSI mitigated this problem (only 2/30 below 85%): its median was 93.04%, and its mean was slightly lower (91.08%) because of the two low-accuracy runs.

Figure 1 shows the row-normalized confusion matrices for the best run of each method and indicates where the remaining errors lie. In all cases, the dominant confusion is between the two locomotion classes: walk is most frequently misclassified as stairs-up (8.54% baseline; 8.07% IG-TV; 8.07% LRP-TV; 8.19% TRSI), while the reverse error (stairs-up → walk) is the least prevalent (1.73% baseline; 0.91% IG-TV; 0.73% LRP-TV; 0.91% TRSI). Misclassification rates between sit-up and either locomotion class are low (usually ≤2–3%), which implies that separating walk from stairs-up, rather than the three-way classification as a whole, is the main difficulty.

Table 3 lists the best overall test accuracy achieved for each method and its corresponding per-class breakdown. Overall test accuracies above the diagonal correspond to the best achievable operating points. In these models, walk is consistently the lowest-accuracy class (89.01–89.94%), whereas sit-up and stairs-up are the higher-accuracy classes (94.59–95.90% and 96.54–97.73%, respectively). This shared class-wise structure of the best models provides a fair basis for comparing the training objectives in terms of global trends in explanation magnitudes, sensitivity to parameter choices, and the distribution of explanation statistics.

4.2. Effect on Relevance Smoothness

To confirm whether the attribution priors affected temporal explanations, we measured explanation roughness in terms of the mean first-order total variation (TV) of the channel-aggregated absolute relevance envelope

r (n)

(Equation (14)), where lower values indicate smoother temporal relevance. We present the single best run (with respect to test accuracy) for each configuration: FIRNN (Run 15), FIRNN+LRP-TV (Run 18), FIRNN+TRSI (Run 7), and FIRNN+IG-TV (Run 1). Table 4 provides the per-class and aggregated total variation (TV) values, scaled as TV ×

10^{3}

for better readability.

When compared to the baseline FIRNN, all three priors decrease the aggregated roughness score (“All classes”). FIRNN + TRSI achieves the highest reduction:

0.447

vs.

0.768

, i.e., a

41.8 %

decrease in TV. FIRNN + LRP-TV decreases the aggregated TV from

0.768

to

0.667

(

13.2 %

decrease), and FIRNN + IG-TV reduces it to

0.677

(

11.9 %

decrease). In this best-run comparison, TRSI therefore produces the smoothest relevance trajectories among all methods when TV is evaluated in the same way for every model.
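The percentage decreases quoted for the aggregated score follow from a simple relative-reduction formula, sketched here with the rounded values from the text. The function name is illustrative; percentages recomputed from rounded table entries may differ in the last digit from values derived from unrounded scores.

```python
def relative_reduction(baseline, regularized):
    """Fractional decrease of a roughness score relative to the baseline."""
    return (baseline - regularized) / baseline


trsi_pct = 100 * relative_reduction(0.768, 0.447)     # baseline vs. TRSI
lrp_tv_pct = 100 * relative_reduction(0.768, 0.667)   # baseline vs. LRP-TV
```

Rounded to one decimal place, these two values reproduce the 41.8% and 13.2% decreases reported above.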

The smoothing effect varies across classes. TRSI has the highest reduction for Class 0 (sit-up;

0.291

vs.

0.640

,

54.5 %

decrease), followed by Class 2 (stairs-up;

0.483

vs.

0.822

,

41.2 %

decrease), and Class 1 (walk;

0.594

vs.

0.857

,

30.7 %

decrease). LRP-TV shows the highest reduction for Class 2 (

0.689

vs.

0.822

,

16.2 %

decrease) and lower reductions for Class 1 (

12.5 %

) and Class 0 (

9.4 %

). IG-TV has reductions similar to LRP-TV for Classes 0 and 2 (

13.1 %

and

13.3 %

) but with the lowest reduction for Class 1 (

8.5 %

). Because the effect differs by class and by method, relying only on an aggregated score is not ideal, and we recommend also reporting the per-class smoothness. The corresponding 30-run temporal relevance roughness statistics are reported in Table 5.

We performed a Kruskal–Wallis test on the all-class TV values from the 30 runs. The global test indicated a statistically significant difference between training objectives (p < 0.001). Pairwise Mann–Whitney tests with Holm correction showed that TRSI produced significantly lower all-class TV than the baseline (p < 0.001), IG-TV (p < 0.001), and LRP-TV (p < 0.001).

The 30-run statistics confirm the trend observed in the best-run comparison. For the aggregated all-class score, TRSI reduces the mean TV from

0.728

to

0.477

, corresponding to a

34.5 %

decrease relative to the baseline. The reductions obtained by IG-TV and LRP-TV are smaller: IG-TV decreases the all-class TV from

0.728

to

0.661

(

9.2 %

decrease), while LRP-TV decreases it to

0.672

(

7.7 %

decrease). Thus, TRSI remains the strongest smoothing objective when the evaluation is averaged across repeated independent runs, not only when the best run is selected.

The class-wise results show the same pattern. TRSI gives the largest reduction for stairs-up (

0.476

vs.

0.789

,

39.7 %

decrease), followed by sit-up (

0.386

vs.

0.588

,

34.4 %

decrease) and walk (

0.591

vs.

0.826

,

28.5 %

decrease). IG-TV and LRP-TV provide more modest improvements, mostly below

13 %

at the class level. Notably, LRP-TV slightly increases the mean roughness for sit-up compared with the baseline (

0.596

vs.

0.588

), despite improving the aggregated all-class score. This supports the need to report both aggregated and class-wise smoothness statistics.

4.3. Perturbation-Based Relevance Faithfulness

A perturbation-based relevance masking check was performed to determine whether the attribution maps identify behaviorally important temporal regions. For each trained model, attribution scores were computed on clean test windows, and selected temporal positions were masked before re-evaluation. For CE, LRP-TV, and TRSI, masking was guided by LRP relevance. For IG-TV, masking was guided by IG attribution, matching the attribution signal used by its temporal prior. The 20% masking results are reported in Table 6.

Masking the most relevant temporal positions caused a substantially larger accuracy decrease than random masking for all evaluated training objectives. This supports the behavioral relevance of the attribution maps: temporal regions assigned high relevance had a stronger effect on the model decision than randomly selected regions. For TRSI, top-relevance masking reduced accuracy to 79.18%, whereas random masking preserved 89.23% accuracy, giving an additional degradation of 10.05 percentage points.

The effect was also consistent across attribution-regularized models: both IG-TV and TRSI showed clear separation between top-relevance and random masking. This indicates that the relevance signals shaped or used during temporal explanation regularization still correspond to decision-relevant temporal regions. Therefore, the smoothness improvements reported above are supported by a behavioral perturbation check, rather than by smoothness metrics alone.

4.4. Sensitivity to Sensor Noise

To probe test-time robustness, we evaluated the trained FIRNN variants under additive perturbations applied only at inference time (no retraining). For a test window

x \in \mathbb{R}^{6 \times N_{T}}

, noise amplitude was scaled per channel using the test-set standard deviation

σ_{i}

and a relative noise level

ρ

, i.e., the injected noise had channel-wise scale

ρ σ_{i}

. We swept

ρ \in [0, 3]

in steps of

0.05

and computed window-level accuracy using the same sliding-window evaluation protocol as in training.

Because the injected noise is random, accuracy at a fixed

ρ

depends on the specific noise realization. Therefore, for each noise level, we used a Monte Carlo evaluation: the full test procedure was repeated

N_{MC}

times with independent noise draws, and we report the mean accuracy across repetitions as

μ_{acc} (ρ)

. The standard deviation across repetitions,

σ_{acc} (ρ)

, provides an estimate of variability due to noise sampling.

We considered three noise processes motivated by wearable sensing conditions: (i) train-like vibration modeled as band-limited Gaussian noise (0.5–20 Hz) obtained via Butterworth band-pass filtering [31]; (ii) loose pocket or watch jitter modeled as impulsive burst noise with enhanced high-frequency content (high-pass filtered); and (iii) plain sensor noise modeled as additive white Gaussian noise. Figure 2 shows mean accuracy as a function of

ρ

for all four networks.

For all noise types, accuracy is observed to decrease approximately monotonically with

ρ

. The curves corresponding to band-limited vibration noise lie close together over most of the sweep, particularly at larger noise levels, which indicates that this perturbation regime does not clearly separate the models. Conversely, for impulsive pocket noise and white Gaussian noise, the TRSI-regularized model retains higher accuracy for most moderate-to-high noise levels, with the LRP-TV and IG-TV curves showing smaller but largely positive offsets. Overall, the results indicate that enforcing temporal structure in explanations during training can translate into improved robustness under realistic test-time perturbations, with the strongest effect observed for the TRSI objective in this setting.

At representative noise levels

ρ \in \{0.3, 0.5, 1.0\}

, the TRSI-regularized model yields the most consistent robustness gains across perturbation types, particularly at stronger corruption. Under train-like band-limited vibration, TRSI achieves the highest mean accuracy at moderate noise (

87.94 % \pm 0.38

at

ρ = 0.3

and

77.36 % \pm 0.50

at

ρ = 0.5

), improving over the baseline by

+ 2.29

and

+ 3.15

percentage points, respectively. At

ρ = 1.0

, both LRP-based temporal-prior objectives remain clearly above the baseline (LRP-TV:

57.67 % \pm 0.88

, TRSI:

57.17 % \pm 0.55

vs. baseline

54.58 % \pm 0.93

), indicating that explanation-based temporal constraints improve resilience even when vibration noise becomes severe; the difference between the LRP-TV and TRSI objectives at

ρ = 1.0

is within the Monte Carlo variability, suggesting near-tied performance in this regime.

For impulsive pocket or watch jitter and additive white Gaussian noise, TRSI provides the clearest advantage at high noise. At

ρ = 1.0

, TRSI reaches

87.98 % \pm 0.45

(pocket) and

87.72 % \pm 0.31

(Gaussian), compared with

84.67 % \pm 0.56

and

83.51 % \pm 0.50

for the baseline, corresponding to improvements of

+ 3.31

and

+ 4.21

percentage points. By contrast, IG-TV regularization is consistently weaker, and its performance often closely matches the unregularized baseline.

Table 7 provides a summary ranking of the four objectives within each noise type over increasing noise ranges. A common pattern emerges for white Gaussian and impulsive pocket noise: TRSI is consistently ranked first across all intervals, whereas the baseline degrades the most with increasing corruption and ranks third in the highest-noise interval. For these two perturbations, IG-TV tends to be the weakest, often ranking last at low-to-moderate noise levels and improving only under strong corruption, where the performance of the competing methods converges.

For train-like band-limited vibration, the ranking differences between the methods are less pronounced, and the separation between accuracy curves collapses at high noise, as in Figure 2. For moderate levels of vibration (

ρ = 0.21

–

0.51

), all three methods share rank 1. This reinforces the interpretation that TRSI yields the most consistently strong robustness profile, but the differences are attenuated compared with those in the Gaussian and pocket scenarios in the present setting.

The robustness evaluation was intentionally based on controlled perturbation models because this allows the same trained networks to be compared under identical and repeatable corruption levels. The three selected noise processes represent different degradation regimes: band-limited vibration, impulsive jitter, and broadband sensor noise. Moreover, the perturbation amplitude was scaled by the empirical standard deviation of each channel, so the corruption was not imposed with the same absolute magnitude on all sensor axes. Real-device degradation, missing samples, and sensor dropout represent separate deployment scenarios and are therefore natural extensions of the present robustness protocol.

5. Discussion

Attribution maps for time-series models can strongly vary over time and change significantly with minimal input distortions despite the classifier’s high accuracy [32]. This motivates training-time regularization of explanations, rather than treating them as post hoc visualizations only. In the KU-HAR three-class setting, TRSI achieved the largest reduction of the temporal variation of the aggregated relevance signal (TV from

0.768

to

0.447

, i.e., ≈42% in the best-run comparison) while maintaining peak accuracy in the same range as the baseline. LRP-TV could also reach high accuracy, but it frequently ended in low-accuracy solutions (8/30 runs below 85%). This is consistent with the more general observation that explanation-guided losses can improve explanation behavior, but their practical benefit depends on how stable the attribution signal remains during optimization [15,18].

The observed LRP-TV instability is likely related to the direct penalization of step-to-step LRP variation during training. Since LRP relevance is recalculated from the current model parameters, the auxiliary loss can change sharply for some initializations, particularly when relevance redistribution involves small stabilized denominators. The TRSI objective reduced the number of low-accuracy runs from 8/30 to 2/30, suggesting that normalizing temporal roughness by the mean relevance magnitude provides a more stable optimization signal in this setting. Further stabilization, for example, through adaptive prior weighting or delayed activation of the attribution prior, remains a useful direction for broader validation.

The proposed regularizer should therefore not be interpreted as a method for increasing clean-test classification accuracy. In the present setting, the main practical effect is that TRSI substantially changes the temporal structure of the explanations while keeping peak predictive performance close to the unregularized baseline. This is relevant because explanation-guided training is useful only if interpretability improvements are not obtained at the cost of a large accuracy degradation.

TV is a direct penalty on the absolute size of step-to-step relevance changes. This can reduce fluctuations, but it can also over-smooth solutions or create optimization issues when the relevance signal changes sharply as the model parameters evolve. TRSI, by contrast, penalizes the irregularity of the step-to-step changes and normalizes by the mean relevance magnitude. This ratio form removes the preference for a degenerate solution in which all relevances are uniformly reduced and promotes smoother relevance patterns without making relevances small everywhere. In our experiments, this formulation showed more pronounced smoothing and fewer unstable training runs than LRP-TV. The present work provides an empirical methodological evaluation of this objective rather than a formal convergence proof. Theoretical analysis of the optimization dynamics of attribution-dependent ratio losses remains an important direction for future work.
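The scale argument can be illustrated numerically. In the sketch below, `roughness_ratio` is only a stand-in for the idea of a magnitude-normalized roughness measure (spread of step changes divided by mean absolute relevance); it is an assumption made for illustration and is not the TRSI definition in (14)–(24). The envelope values are arbitrary.

```python
import statistics


def total_variation(envelope):
    """Mean absolute step-to-step change of a relevance envelope."""
    steps = [abs(b - a) for a, b in zip(envelope, envelope[1:])]
    return sum(steps) / len(steps)


def roughness_ratio(envelope, eps=1e-12):
    """Illustrative ratio: spread of step changes over mean relevance magnitude."""
    steps = [b - a for a, b in zip(envelope, envelope[1:])]
    mean_magnitude = sum(abs(v) for v in envelope) / len(envelope)
    return statistics.pstdev(steps) / (mean_magnitude + eps)


envelope = [0.2, 0.9, 0.1, 0.8, 0.3]     # arbitrary example envelope
shrunk = [0.1 * v for v in envelope]     # uniformly suppressed relevance
```

Shrinking the envelope by a factor of ten lowers `total_variation` by the same factor, so a plain TV penalty can be reduced by suppressing relevance everywhere, whereas `roughness_ratio` stays essentially unchanged and can only be lowered by changing the temporal shape.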

The robustness experiments indicate that test-time behavior can also be influenced by temporal explanation regularization. When impulsive pocket or watch jitter noise and additive white Gaussian noise were added, the TRSI-regularized model yielded higher accuracies over a large noise amplitude range. This is in line with the interpretation that the TRSI-regularized model relies more on information available across a wider part of the time window and less on isolated high-sensitivity time points.

The evaluation in this work focuses mainly on temporal regularity and perturbation stability of explanations, rather than on a full taxonomy of explanation quality. This choice follows from the purpose of TRSI: the proposed prior is designed to reduce irregular temporal relevance fluctuations during training. The added perturbation-based relevance masking check provides an initial faithfulness-oriented validation by testing whether highly attributed temporal regions have stronger behavioral influence than randomly selected regions. However, broader faithfulness analyses, including alternative perturbation operators, causal relevance tests, and channel-wise attribution validation, remain outside the scope of the present study and are left for future work.

There are two main limitations to this study. First, we worked with a reduced three-class subset and a single acquisition protocol; therefore, further datasets and larger label spaces are required to test generalization. Second, TRSI was formulated on channel-aggregated absolute relevance, which enhances readability but obfuscates channel-level effects and eliminates the sign of relevance. Recent work in the time-series XAI domain has underscored that explanations should not be evaluated solely based on visual measures (e.g., smoothness) but should also be evaluated for stability and robustness against perturbations [32,33]. In future work, we aim to extend TRSI to channel-wise or multivariate forms, to evaluate it on other dynamic architectures such as LSTM and GRU, and to integrate additional faithfulness checks based on controlled perturbations.

6. Conclusions

This study aimed to evaluate the potential of temporal explanation priors as effective training-time regularizers in dynamic neural networks trained on raw multichannel time windows. We introduced TRSI, a temporal prior that reduces irregular step-to-step changes in relevance over time while avoiding trivial relevance suppression. Two baseline priors were evaluated for comparison: a total-variation penalty computed on LRP relevance (LRP-TV) and an analogous penalty computed on IG attributions (IG-TV).

The key conclusions are as follows:

Peak performance was essentially unchanged by explanation priors. Across 30 runs, the best test accuracies were tightly clustered: 94.32% (baseline), 94.35% (IG-TV), 94.18% (LRP-TV), and 94.25% (TRSI). Overall, IG-TV reached the highest single-run accuracy, but the differences were negligible.
Training reliability differed strongly across priors. Baseline and IG-TV were the most stable in this setting, with 0/30 runs below 85% test accuracy. TRSI had 2/30 low-accuracy runs, whereas LRP-TV was the least reliable, with 8/30, indicating a higher risk of unfavorable convergence.
In terms of the main objective, TRSI delivered the strongest temporal smoothing both in the best-performing runs and across repeated independent runs. In the best-run comparison, the mean total variation of aggregated absolute relevance decreased from 0.768 (baseline) to 0.447 (TRSI), corresponding to a 41.8% reduction. Across 30 runs, the all-class TV decreased from $0.728 \pm 0.027$ to $0.477 \pm 0.033$ , a 34.5% reduction, and the difference was statistically significant compared with the baseline, IG-TV, and LRP-TV (p < 0.001).
The smoothing effect was class-dependent, but the trend was consistent both in the best-run comparison and across repeated runs. In the best-run case, TRSI reduced total variation from 0.640 to 0.291 for sit-up (54.5%), from 0.857 to 0.594 for walk (30.7%), and from 0.822 to 0.483 for stairs-up (41.2%). Across 30 runs, the corresponding mean reductions were from 0.588 to 0.386 for sit-up (34.4%), from 0.826 to 0.591 for walk (28.5%), and from 0.789 to 0.476 for stairs-up (39.7%). This confirms that the aggregate smoothing improvement is not caused by only one activity class.
Robustness tests showed the clearest advantage for TRSI under impulsive pocket or watch jitter noise and additive white Gaussian noise. Across the full noise sweeps, TRSI remained the top-ranked method in all evaluated low-to-high noise intervals, while the baseline degraded most strongly at higher corruption. Under band-limited train-like vibration noise, differences were smaller, and the methods tended to converge at stronger perturbations.

Overall, TRSI provided the largest improvement in the temporal smoothness of relevance while preserving peak accuracy, and it also produced the most consistent robustness gains under unstructured test-time noise in this setting. Future work should validate these findings in larger multi-class HAR tasks, extend TRSI to other dynamic architectures, and include complementary reliability checks, such as perturbation-based faithfulness tests and channel-wise analysis.

Author Contributions

Conceptualization, D.N. and M.D.; methodology, D.N.; software, M.D.; validation, M.D. and D.N.; formal analysis, M.D.; investigation, M.D.; resources, D.N.; data curation, M.D.; writing—original draft preparation, M.D.; writing—review and editing, D.N. and M.D.; visualization, M.D.; supervision, D.N.; project administration, D.N. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The data used in this study can be made available from the corresponding author upon request.

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:

CE	Cross-Entropy
FIRNN	Finite Impulse Response Neural Network
HAR	Human Activity Recognition
IG	Integrated Gradients
IG-TV	Integrated Gradients Total Variation
KU-HAR	Korean University Human Activity Recognition
LRP	Layer-wise Relevance Propagation
LRP-TV	Layer-wise Relevance Propagation Total Variation
SD	Standard Deviation
SGD	Stochastic Gradient Descent
TRSI	Temporal Relevance Smoothness Index
TV	Total Variation
XAI	Explainable Artificial Intelligence

References

1. Plonis, D.; Kalinauskas, E.; Katkevičius, A.; Krukonis, A. Non-Invasive Analysis of the Bioelectrical Impedance of a Human Forearm. Acta Mech. Autom. 2024, 18, 496–504.
2. Abdelaal, Y.; Aupetit, M.; Baggag, A.; Al-Thani, D. Exploring the Applications of Explainability in Wearable Data Analytics: Systematic Literature Review. J. Med. Internet Res. 2024, 26, e53863.
3. Vdoviak, G.; Sledevič, T. Temporal Encoding Strategies for YOLO-Based Detection of Honeybee Trophallaxis Behavior in Precision Livestock Systems. Agriculture 2025, 15, 2338.
4. Jankauskas, M.; Katkevičius, A.; Serackis, A. Investigation of Exponent-Free LSTM Cells for Virtual Sensing Applications. Electronics 2026, 15, 576.
5. Theissler, A.; Spinnato, F.; Schlegel, U.; Guidotti, R. Explainable AI for Time Series Classification: A Review, Taxonomy and Research Directions. IEEE Access 2022, 10, 100700–100724.
6. Bach, S.; Binder, A.; Montavon, G.; Klauschen, F.; Müller, K.R.; Samek, W. On Pixel-Wise Explanations for Non-Linear Classifier Decisions by Layer-Wise Relevance Propagation. PLoS ONE 2015, 10, e0130140.
7. Sundararajan, M.; Taly, A.; Yan, Q. Axiomatic Attribution for Deep Networks. In Proceedings of the 34th International Conference on Machine Learning; PMLR: Cambridge, MA, USA, 2017; Volume 70, pp. 3319–3328.
8. Tang, X. Beyond the Black Box: Interpretable Multi-Trait Essay Scoring with Trait-Aware Transformer. Electronics 2026, 15, 1066.
9. Meng, H.; Wagner, C.; Triguero, I. Explaining time series classifiers through meaningful perturbation and optimisation. Inf. Sci. 2023, 645, 119334.
10. Xu, Y.; Xu, Z.; Dai, P. FLAMA: Frame-Level Alignment Margin Attack for Scene Text and Automatic Speech Recognition. Electronics 2026, 15, 1064.
11. Erion, G.; Janizek, J.D.; Sturmfels, P.; Lundberg, S.M.; Lee, S.I. Improving performance of deep learning models with axiomatic attribution priors and expected gradients. Nat. Mach. Intell. 2021, 3, 620–631.
12. Rieger, L.; Singh, C.; Murdoch, W.J.; Yu, B. Interpretations are Useful: Penalizing Explanations to Align Neural Networks with Prior Knowledge. In Proceedings of the 37th International Conference on Machine Learning (ICML); PMLR: Cambridge, MA, USA, 2020; Volume 119, pp. 8116–8126.
13. Ismail, A.A.; Corrada Bravo, H.; Feizi, S. Improving Deep Learning Interpretability by Saliency Guided Training. In Advances in Neural Information Processing Systems; Curran Associates, Inc.: Red Hook, NY, USA, 2021; Volume 34, pp. 26726–26739.
14. Park, G.; Yang, J.Y.; Hwang, S.J.; Yang, E. Attribution Preservation in Network Compression for Reliable Network Interpretation. In Advances in Neural Information Processing Systems; Curran Associates, Inc.: Red Hook, NY, USA, 2020; Volume 33, pp. 5093–5104.
15. Bassi, P.R.A.S.; Dertkigil, S.S.J.; Cavalli, A. Improving deep neural network generalization and robustness to background bias via layer-wise relevance propagation optimization. Nat. Commun. 2024, 15, 291.
16. Schneider, S.; González Laiz, R.; Filippova, A.; Frey, M.; Weygandt Mathis, M. Time-series attribution maps with regularized contrastive learning. arXiv 2025, arXiv:2502.12977.
17. Vielhaben, J.; Lapuschkin, S.; Montavon, G.; Samek, W. Explainable AI for Time Series via Virtual Inspection Layers. Pattern Recognit. 2024, 150, 110309.
18. Ronco, M.; Camps-Valls, G. Role of locality, fidelity and symmetry regularization in learning explainable representations. Neurocomputing 2023, 562, 126884.
19. Chen, C.; Guo, C.; Chen, R.; Ma, G.; Zeng, M.; Liao, X.; Zhang, X.; Xie, S. Training for Stable Explanation for Free. In Advances in Neural Information Processing Systems (NeurIPS 2024); Curran Associates, Inc.: Red Hook, NY, USA, 2024; p. 37.
20. Ferreira, P.; Titov, I.; Aziz, W. Explanation Regularisation through the Lens of Attributions. In Proceedings of the 31st International Conference on Computational Linguistics, Abu Dhabi, United Arab Emirates, 19–24 January 2025; pp. 6530–6551.
21. Chefer, H.; Schwartz, I.; Wolf, L. Optimizing Relevance Maps of Vision Transformers Improves Robustness. In Advances in Neural Information Processing Systems; Curran Associates, Inc.: Red Hook, NY, USA, 2022; Volume 35, pp. 33618–33632.
22. Korean University. KU-HAR: A Human Activity Recognition Dataset. Mendeley Data, V5, 2021. Available online: https://data.mendeley.com/datasets/45f952y38r/5 (accessed on 14 May 2026).
23. Han, J.; Kamber, M.; Pei, J. Data Mining: Concepts and Techniques, 3rd ed.; Morgan Kaufmann: Burlington, MA, USA, 2011.
24. Goodfellow, I.; Bengio, Y.; Courville, A. Deep Learning; MIT Press: Cambridge, MA, USA, 2016.
25. Wan, E.A. Temporal Backpropagation for FIR Neural Networks. In Proceedings of the International Joint Conference on Neural Networks (IJCNN), San Diego, CA, USA, 17–21 June 1990; pp. A575–A580.
26. Rudin, L.I.; Osher, S.; Fatemi, E. Nonlinear Total Variation Based Noise Removal Algorithms. Phys. D Nonlinear Phenom. 1992, 60, 259–268.
27. Prechelt, L. Automatic Early Stopping Using Cross Validation: Quantifying the Criteria. Neural Netw. 1998, 11, 761–767.
28. Kruskal, W.H.; Wallis, W.A. Use of Ranks in One-Criterion Variance Analysis. J. Am. Stat. Assoc. 1952, 47, 583–621.
29. Holm, S. A Simple Sequentially Rejective Multiple Test Procedure. Scand. J. Stat. 1979, 6, 65–70.
30. Šimić, I.; Veas, E.; Sabol, V. A Comprehensive Analysis of Perturbation Methods in Explainable AI Feature Attribution Validation for Neural Time Series Classifiers. Sci. Rep. 2025, 15, 30.
31. Butterworth, S. On the Theory of Filter Amplifiers. Exp. Wirel. Wirel. Eng. 1930, 7, 536–541.
32. Balestra, C.; Li, B.; Müller, E. On the Consistency and Robustness of Saliency Explanations for Time Series Classification. arXiv 2023, arXiv:2309.01457.
33. Nguyen, T.T.; Nguyen, T.L.; Ifrim, G. Robust explainer recommendation for time series classification. Data Min. Knowl. Discov. 2024, 38, 3372–3413.

Figure 1. Confusion matrices normalized by rows (%) for the best run of each training objective: (top-left) baseline (CE), (top-right) IG-TV, (bottom-left) LRP-TV, and (bottom-right) TRSI. The true classes are given by the rows and the predicted classes by the columns; the diagonal entries correspond to correct classifications.

Figure 2. Test accuracy versus relative noise level $ρ$ for four FIRNN variants under three test-time perturbation models: (top) band-limited train-like vibration noise (0.5–20 Hz), (middle) impulsive pocket or watch jitter noise, (bottom) additive white Gaussian sensor noise. Curves show the mean accuracy over $N_{MC}$ independent noise realizations at each $ρ$; dotted vertical lines indicate the noise-level intervals used in Table 7.
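The Monte Carlo evaluation described in the Figure 2 caption could be sketched roughly as follows for the additive white Gaussian case. This is only an illustrative sketch: the definition of the relative noise level (noise standard deviation equal to $ρ$ times the per-sequence signal standard deviation), the array layout (batch, time, channels), and the names `add_relative_gaussian_noise`, `mean_accuracy_under_noise`, and `predict` are assumptions made here, not details taken from the article.

```python
import numpy as np


def add_relative_gaussian_noise(x, rho, rng):
    """Add white Gaussian noise to a batch of sequences.

    x   : array of shape (batch, time, channels)
    rho : relative noise level; ASSUMED here to mean
          noise std = rho * per-sequence signal std
    rng : numpy.random.Generator
    """
    sigma = rho * x.std(axis=(1, 2), keepdims=True)
    return x + rng.standard_normal(x.shape) * sigma


def mean_accuracy_under_noise(predict, x, y, rho, n_mc=10, seed=0):
    """Average accuracy over n_mc independent noise realizations.

    predict : hypothetical callable mapping a batch to predicted labels
    """
    rng = np.random.default_rng(seed)
    accuracies = []
    for _ in range(n_mc):
        x_noisy = add_relative_gaussian_noise(x, rho, rng)
        accuracies.append(np.mean(predict(x_noisy) == y))
    return float(np.mean(accuracies))
```

Sweeping `rho` over a grid and plotting the returned values for each trained model would give one curve per method, in the spirit of the bottom panel of Figure 2.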

Table 1. Methodological comparison with related explanation-guided training and attribution-prior approaches.

Study/Method	Attribution Basis	Training-Time Prior	Temporal Focus	Role in This Study
Erion et al. [11]	Expected gradients	Yes	No (not temporal)	Basis for IG-TV baseline
Rieger et al. [12]	Input relevance/rationale	Yes	No (feature-level rationale)	Prior-based reference
Ismail et al. [13]	Saliency consistency	Yes	No (stability, not smoothness)	Temporal XAI motivation
Bassi et al. [15]	LRP	Yes	No (not temporal trajectory)	LRP-prior reference
Chen et al. [19]	Top-k saliency ranking	Yes	No (feature-ranking stability)	Stability reference
Present TRSI	LRP relevance envelope	Yes	Yes (temporal smoothness)	Proposed method

Table 2. Test accuracy over 30 runs for each training objective. Median is reported alongside mean ± SD to reflect skewed distributions caused by occasional low-accuracy runs.

Method	Mean ± SD (%)	Median (%)	Min–Max (%)	Runs < 85%
Baseline (CE)	93.49 ± 0.39	93.44	92.66–94.32	0/30
IG-TV	93.26 ± 0.40	93.34	92.53–94.35	0/30
LRP-TV	86.16 ± 12.16	92.93	53.87–94.18	8/30
TRSI	91.08 ± 5.04	93.04	71.28–94.25	2/30

Table 3. Best-run performance per method (i.e., run with maximum overall test accuracy over 30 runs), with per-class test accuracies.

Method	Overall (%)	Sit-Up (%)	Walk (%)	Stairs-Up (%)
Baseline (CE)	94.32	95.52	89.94	96.54
IG-TV	94.35	95.15	89.59	97.27
LRP-TV	94.18	95.90	89.01	96.54
TRSI	94.25	94.59	89.36	97.73

Table 4. Temporal relevance roughness measured as the mean total variation (TV) of the relevance envelope $r(n)$ (lower is smoother). TV was calculated per test sequence with the best-accuracy runs: baseline Run 15, LRP-TV Run 18, TRSI Run 7, and IG-TV Run 1.

Method	Sit-Up	Walk	Stairs-Up	All Classes
Baseline (CE)	0.640	0.857	0.822	0.768
IG-TV	0.556	0.784	0.713	0.677
LRP-TV	0.580	0.750	0.689	0.667
TRSI	0.291	0.594	0.483	0.447
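As a rough illustration of the quantity reported in Table 4, the sketch below computes the total variation of a channel-aggregated absolute relevance envelope and the relative reduction with respect to the baseline. The function names and the (time, channels) array layout are assumptions made here; any normalization of the envelope applied in the article is not reproduced, so absolute values from this sketch are not expected to match the table.

```python
import numpy as np


def relevance_envelope(relevance):
    """Channel-aggregated absolute relevance r(n).

    relevance : array of shape (time, channels), ASSUMED layout
    """
    return np.abs(relevance).sum(axis=1)


def total_variation(r):
    """Sum of absolute step-to-step changes of a 1-D signal."""
    return float(np.abs(np.diff(r)).sum())


def relative_reduction(baseline, regularized):
    """Percentage reduction of `regularized` relative to `baseline`."""
    return 100.0 * (baseline - regularized) / baseline


# "All Classes" column of Table 4
tv_all = {"Baseline (CE)": 0.768, "IG-TV": 0.677, "LRP-TV": 0.667, "TRSI": 0.447}
for method, value in tv_all.items():
    if method != "Baseline (CE)":
        reduction = relative_reduction(tv_all["Baseline (CE)"], value)
        print(f"{method}: {reduction:.1f}% lower TV than baseline")
```

For TRSI this gives (0.768 − 0.447)/0.768 ≈ 41.8%, which agrees with the reduction quoted in the abstract.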

Table 5. Temporal relevance roughness over 30 independent runs. Values are reported as mean ± SD across all runs and scaled as TV × $10^{3}$. Lower values indicate smoother relevance trajectories.

Method	Sit-Up	Walk	Stairs-Up	All Classes
Baseline (CE)	0.588 ± 0.028	0.826 ± 0.027	0.789 ± 0.031	0.728 ± 0.027
IG-TV	0.553 ± 0.019	0.751 ± 0.016	0.697 ± 0.014	0.661 ± 0.013
LRP-TV	0.596 ± 0.040	0.746 ± 0.031	0.689 ± 0.041	0.672 ± 0.027
TRSI	0.386 ± 0.072	0.591 ± 0.028	0.476 ± 0.015	0.477 ± 0.033

Table 6. Accuracy under 20% temporal relevance masking, reported as mean ± SD over 30 independent runs.

Method	Attribution	Clean Acc.	Top Masked Acc.	Random Masked Acc.	Top–Random Effect
CE	LRP	93.49 ± 0.39	80.53 ± 1.08	92.43 ± 0.34	11.90 pp
IG-TV	IG	93.26 ± 0.40	77.73 ± 1.03	91.84 ± 0.37	14.11 pp
LRP-TV	LRP	86.16 ± 12.16	76.42 ± 9.82	83.80 ± 13.48	7.39 pp
TRSI	LRP	91.08 ± 5.04	79.18 ± 5.07	89.23 ± 6.10	10.05 pp
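The last column of Table 6 appears to be the gap, in percentage points, between the random-masked and top-masked mean accuracies. The short sketch below recomputes it from the rounded means in the table; the dictionary and function names are illustrative only.

```python
# (top-masked mean accuracy, random-masked mean accuracy) in %, from Table 6
masked_accuracy = {
    "CE": (80.53, 92.43),
    "IG-TV": (77.73, 91.84),
    "LRP-TV": (76.42, 83.80),
    "TRSI": (79.18, 89.23),
}


def top_random_effect(top_masked, random_masked):
    """Accuracy gap in percentage points: random-masked minus top-masked."""
    return round(random_masked - top_masked, 2)


effects = {
    method: top_random_effect(top, rand)
    for method, (top, rand) in masked_accuracy.items()
}
for method, effect in effects.items():
    print(f"{method}: {effect:.2f} pp")
```

Recomputing from the rounded means reproduces the tabulated values for CE, IG-TV, and TRSI. For LRP-TV it gives 7.38 pp rather than the reported 7.39 pp, a difference that would be consistent with the published value having been computed before rounding.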

Table 7. The relative robustness ranking (1 = best mean accuracy) of the four FIRNN variants across noise-level intervals. Ranks are computed within each noise type and interval; ties indicate indistinguishable performance under the applied criterion.

Noise Interval $ρ$	Baseline (CE)	IG-TV	LRP-TV	TRSI
Additive white Gaussian noise
$0.00$ – $0.24$	1	2	1	1
$0.24$ – $0.50$	2	3	2	1
$0.50$ – $1.04$	2	2	2	1
$1.04$ – $3.00$	3	2	2	1
Impulsive pocket or watch jitter noise
$0.00$ – $0.30$	1	2	1	1
$0.30$ – $0.53$	2	3	2	1
$0.53$ – $1.90$	2	2	2	1
$1.90$ – $3.00$	3	2	2	1
Band-limited train-like vibration noise
$0.00$ – $0.21$	1	2	1	1
$0.21$ – $0.51$	2	1	1	1
$2.00$ – $3.00$	1	1	1	1

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Navakauskas, D.; Dumpis, M. Regularizing Temporal Explanations in Dynamic Neural Networks. Electronics 2026, 15, 2200. https://doi.org/10.3390/electronics15102200
