In this section, we present a case study that applies the proposed AAG-DSVDD framework to real multi-sensor fire monitoring data. The goal is to detect the onset of hazardous fire conditions as early as possible while maintaining a low false-alarm rate. We first describe the fire-monitoring dataset and its preprocessing. We then detail the semi-supervised training protocol, baselines, and evaluation metrics used to assess detection performance in this realistic setting.
5.1. Fire-Monitoring Dataset Description
We evaluate the proposed method on a multi-sensor residential fire detection dataset derived from the National Institute of Standards and Technology (NIST) “Performance of Home Smoke Alarms” experiments [41]. In these experiments, a full-scale residential structure was instrumented and subjected to a variety of controlled fire and non-fire conditions, and sensor responses were recorded over time in systematically designed scenarios.
In our setting, each scenario is represented as a five-channel multivariate time series sampled at 1 Hz. At each time step $t$ in scenario $s$, the observation vector $\mathbf{x}_t^{(s)} \in \mathbb{R}^5$ contains measurements from five sensors in the following order: the first channel is a temperature sensor; the second channel measures CO concentration; the third channel records the output of an ionization smoke detector; the fourth channel records the output of a photoelectric smoke detector; and the fifth channel measures smoke obscuration. Accordingly, each scenario $s$ is stored as a matrix
$$X^{(s)} = \big[\mathbf{x}_1^{(s)}, \ldots, \mathbf{x}_{T_s}^{(s)}\big]^{\top} \in \mathbb{R}^{T_s \times 5},$$
where $T_s$ denotes the duration of that scenario in seconds.
The dataset comprises multiple classes of fire scenarios, including smoldering fires (e.g., smoldering upholstery or mattress experiments), cooking-related heating and cooking fires, and flaming fires. For each fire scenario, a true fire starting time (TFST) is annotated, indicating the earliest time at which the fire is deemed to have started based on the original NIST records and the observed sensor responses.
Table 8 describes the fire scenarios.
To build intuition about the multi-sensor fire dynamics, Figure 6 shows an excerpt from one fire scenario (Scenario 4 in Table 8). The plots show the five sensor channels sampled at 1 Hz: temperature (top), CO concentration, ionization smoke, photoelectric smoke, and smoke obscuration (bottom). The vertical dotted line marks the annotated TFST.
Before TFST, all channels remain close to their baseline levels, with only small fluctuations attributable to noise. Shortly after TFST, the smoke obscuration begins to rise, followed by increases in the photoelectric and ionization detector outputs, and then a delayed but pronounced growth in CO concentration and temperature. This sequence reflects the physical development of the fire and highlights the temporal coupling between channels. It also motivates the use of windowed multi-channel inputs and a boundary-focused model that leverages the joint evolution of these signals rather than relying on any single sensor in isolation.
5.2. Window Construction and Experiment Setup
5.2.1. Sliding-Window Observations
We adopt a sliding-window observation structure similar to that used in previous real-time fire detection studies on this dataset. For each fire scenario $s$, let $\mathbf{x}_t^{(s)} \in \mathbb{R}^5$ denote the five-dimensional sensor vector at second $t$, ordered as temperature, CO concentration, ionization detector output, photoelectric detector output, and smoke obscuration. We form an observation window of length $w$ at time $t$ as
$$W_t^{(s)} = \big(\mathbf{x}_{t-w+1}^{(s)}, \ldots, \mathbf{x}_t^{(s)}\big) \in \mathbb{R}^{w \times 5},$$
where $w$ is the window length in seconds. Each scenario $s$ thus produces a sequence of windows $\{W_t^{(s)}\}_{t=w}^{T_s}$.
For each fire scenario $s$, an annotated $\mathrm{TFST}_s$ is given in seconds. We assign a binary label to each window based on the time of its last sample relative to this TFST. If the last time index $t$ of the window satisfies $t \ge \mathrm{TFST}_s$, the window is labeled as fire, and we set $y_t^{(s)} = 1$. If the window ends strictly before the fire starts, that is, $t < \mathrm{TFST}_s$, the window is labeled as normal, and we set $y_t^{(s)} = 0$. These end-of-window labels form the base labels on which we build the semi-supervised setting in the next subsection.
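The window construction and end-of-window labeling described above can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' code; the array contents, the example TFST, and the helper name `make_windows` are all hypothetical, and the flattened view in the usage note corresponds to the vectorized inputs used later for the classical baselines.

```python
import numpy as np

def make_windows(X, tfst, w):
    """Slide a length-w window over a (T, 5) scenario matrix X.

    A window ending at second t is labeled 1 (fire) if t >= tfst,
    and 0 (normal) otherwise, matching the end-of-window rule above.
    """
    T = X.shape[0]
    windows, labels = [], []
    for t in range(w - 1, T):                    # last sample of the window is at index t
        windows.append(X[t - w + 1 : t + 1])     # shape (w, 5)
        labels.append(1 if t >= tfst else 0)
    return np.stack(windows), np.array(labels)

# Illustrative example: a 100 s scenario with an annotated TFST at t = 60.
X = np.random.default_rng(0).normal(size=(100, 5))
W, y = make_windows(X, tfst=60, w=32)
print(W.shape)   # (69, 32, 5): one window per end time t = 31 .. 99
print(y.sum())   # 40 fire windows: end times t = 60 .. 99
```

For baselines that need fixed-length vectors, `W.reshape(len(W), -1)` stacks each window into a single $5w$-dimensional row, as in the flattened representation defined below.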
For notational consistency with Section 2.1, we subsequently denote each window by $\mathbf{x}_i$ and its binary label by $y_i$. When we use an LSTM encoder, $\mathbf{x}_i$ is treated as an ordered multivariate time series of length $w$ in $\mathbb{R}^5$. For methods that require fixed-length vector inputs, such as OC-SVM and kernel SVDD, we additionally define a flattened version $\bar{\mathbf{x}}_i \in \mathbb{R}^{5w}$ by stacking the entries of $\mathbf{x}_i$ along the time and channel dimensions into a single vector.
5.2.2. Experiment Setup
We construct an LSTM-ready dataset of five-channel windows with separate training, validation, and test splits. The training set is derived from four fire scenarios (1, 2, 3, and 4). From each of these scenarios, we sample fixed-length windows of size $w = 32$ s using the sliding-window construction described in Section 5.2.1. In each training scenario, we draw approximately 250 windows, with about 40% taken from the pre-TFST region and labeled as normal and about 60% taken from the post-TFST region and labeled as fire. This yields roughly 100 normal and 150 fire windows per training scenario, for a total of 1000 training windows across all four scenarios. In all cases, the binary label of a window is determined by the label at its last time step. For the semi-supervised setting, we randomly retain 10% of the normal and fire windows as labeled samples, while the rest are treated as unlabeled.
For evaluation, we generate dense sliding windows of length $w = 32$ s with unit stride over complete scenarios. The validation windows are drawn from the same four scenarios (1, 2, 3, and 4), using all overlapping windows rather than the subsampled training set. The test windows are drawn from six additional fire scenarios (5, 6, 7, 8, 9, and 10) that do not appear in the training set. In all splits, a window is labeled as normal if its last time step occurs before the annotated TFST for that scenario and as fire otherwise.
During evaluation, each window is passed through the encoder and the AAG-DSVDD model to obtain an anomaly score. A window is classified as fire when its score exceeds a global threshold $\tau$, which is selected using held-out validation windows from the training scenarios. A scenario-level fire alarm is raised once $q$ consecutive windows are classified as fire. The estimated fire starting time (EFST) is defined as the time at which the model has predicted fire for $q$ consecutive windows.
The same window-level train/validation/test partitions and semi-supervised label budgets are used for all methods to ensure a fair comparison. Among the deep methods, AAG-DSVDD and DeepSAD are trained on the full training window set, using the labeled normal and labeled fire windows together with the remaining unlabeled windows. In contrast, DeepSVDD is trained only on the labeled normal windows, representing a supervised deep one-class baseline. The classical OC-SVM and SVDD-RBF baselines are likewise trained on the labeled normal training windows only, ignoring unlabeled and fire windows during training. Validation windows from the training scenarios are used exclusively for hyperparameter selection and for choosing the global decision threshold, while test windows from held-out scenarios are used only for final evaluation.
The fire monitoring experiments were carried out on a Windows workstation equipped with a 13th Gen Intel Core i9-13900K (3.00 GHz), 64 GB of RAM, and an NVIDIA RTX 4090 GPU.
5.2.3. Evaluation Metrics for Fire Monitoring Experiments
For each fire scenario $s$, we obtain a sequence of ground-truth window labels $y_1^{(s)}, \ldots, y_{N_s}^{(s)}$ and the corresponding predicted labels $\hat{y}_1^{(s)}, \ldots, \hat{y}_{N_s}^{(s)}$, where $N_s$ is the number of windows in that scenario, and $\hat{y}_t^{(s)} = 1$ denotes a predicted fire window.
For each scenario $s$, the $\mathrm{TFST}_s$ is provided by the dataset annotation and expressed in window indices. Equivalently, it can be written as the first time point at which the ground-truth label becomes fire:
$$\mathrm{TFST}_s = \min\{\, t : y_t^{(s)} = 1 \,\}.$$
Given the predicted labels $\hat{y}_t^{(s)}$, we declare that a scenario-level fire alarm has been raised once $q$ consecutive windows are classified as fire. The $\mathrm{EFST}_s$ is defined as the last index of the first run of $q$ consecutive fire predictions, that is,
$$\mathrm{EFST}_s = \min\{\, t \ge q : \hat{y}_{t-q+1}^{(s)} = \cdots = \hat{y}_t^{(s)} = 1 \,\}.$$
This matches the implementation, in which the alarm time is the index of the last window in the first run of $q$ consecutive predicted fires.
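The run-based alarm rule above can be sketched as follows. The function `efst` is a hypothetical helper, not the authors' implementation; it also encodes the convention used in the result tables that a scenario with no stable alarm reports its own duration as the EFST.

```python
def efst(pred, q, duration):
    """Return the index of the last window in the first run of q
    consecutive predicted-fire windows, or `duration` if no such run
    exists (i.e., no stable alarm is raised before the scenario ends)."""
    run = 0
    for t, p in enumerate(pred):
        run = run + 1 if p == 1 else 0
        if run >= q:
            return t          # last index of the first q-run
    return duration           # table convention: EFST equals the scenario duration

# Illustrative: with q = 3, an isolated spike does not trigger an alarm,
# but three consecutive fire predictions do.
pred = [0, 1, 0, 0, 1, 1, 1, 1, 0]
print(efst(pred, q=3, duration=len(pred)))   # 6
```

The single-pass loop mirrors the streaming setting: at 1 Hz, the alarm decision at time $t$ depends only on the last $q$ window predictions.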
The Fire Starting Time Accuracy (FSTA) for scenario $s$ measures the temporal deviation of the estimated start time from the true start time. It is defined as the absolute difference between the estimated and true fire starting times:
$$\mathrm{FSTA}_s = \big|\mathrm{EFST}_s - \mathrm{TFST}_s\big|.$$
By this definition, $\mathrm{FSTA}_s$ quantifies the magnitude of the error in seconds, treating early false alarms ($\mathrm{EFST}_s < \mathrm{TFST}_s$) and late detections ($\mathrm{EFST}_s > \mathrm{TFST}_s$) symmetrically. Lower values indicate better performance.
To quantify false alarms in the pre-fire region, we compute the false alarm rate (FAR) for each scenario with an annotated $\mathrm{TFST}_s$. Let $\mathcal{P}_s = \{\, t : t < \mathrm{TFST}_s \,\}$ denote the set of pre-fire window indices. The per-scenario false alarm rate is
$$\mathrm{FAR}_s = \frac{1}{|\mathcal{P}_s|} \sum_{t \in \mathcal{P}_s} \mathbb{1}\big[\hat{y}_t^{(s)} = 1\big],$$
that is, the fraction of pre-TFST windows that are incorrectly classified as fire. In the reported tables, we express $\mathrm{FAR}_s$ as a percentage.
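Both per-scenario metrics reduce to a few lines of NumPy. This is a minimal sketch under the definitions above; the function names and the toy prediction sequence are illustrative, and TFST is taken in window indices.

```python
import numpy as np

def fsta(efst_s, tfst_s):
    """Absolute timing error in seconds; symmetric in early and late alarms."""
    return abs(efst_s - tfst_s)

def far(pred, tfst_s):
    """Fraction of pre-TFST windows predicted as fire."""
    pre = np.asarray(pred[:tfst_s])
    return float(pre.mean()) if pre.size else 0.0

pred = np.array([0, 0, 1, 0, 0, 1, 1, 1])
print(fsta(efst_s=7, tfst_s=5))      # 2: alarm 2 s after the true onset
print(100 * far(pred, tfst_s=5))     # 20.0: one of five pre-fire windows flagged
```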
5.2.4. Model Architectures and Hyperparameters
For the fire-monitoring case study, the hyperparameter settings for AAG-DSVDD, DeepSVDD, DeepSAD, OC-SVM, and SVDD-RBF were obtained through validation-based tuning on the dense validation windows constructed from the training scenarios (1–4) and then kept fixed for all reported runs. We first fixed the LSTM encoder architecture and window length based on preliminary experiments that balanced detection performance and computational cost. The scalar coefficients in the AAG-DSVDD objective were then tuned by a small grid search on the validation windows, using logarithmic or short linear grids. The neighborhood size $k$ of the latent k-NN graph and the warm-up length for center initialization were chosen from discrete sets of candidates. The final configuration corresponds to the setting that achieves the best average EFST across the validation scenarios, subject to a reasonable trade-off between early detection (low EFST) and a low pre-TFST FAR. The deep baselines were tuned in a similar manner: DeepSVDD and DeepSAD use the same selected encoder architecture, and the labeled-term weight in DeepSAD is chosen from a logarithmic grid on the same validation windows. For OC-SVM and SVDD-RBF, we performed a small grid search over the RBF kernel width around the scale heuristic on flattened windows and selected the configuration according to the same EFST/FAR selection criterion. The one-class parameter $\nu$ was fixed at the same value for all methods.
All deep methods share the same LSTM-based encoder for fairness. Each input window is a sequence in $\mathbb{R}^{32 \times 5}$, with $w = 32$ time steps and five sensor channels. We use an LSTM encoder with input size 5, hidden size 128, two layers, no bi-directionality, and dropout set to zero. Its last hidden state is projected to a 64-dimensional representation. Deep models are trained with Adam (DeepSVDD with AdamW) for 50 epochs with batch size 256 and a fixed learning rate; weight decay is applied, with one value for AAG-DSVDD and another for DeepSVDD and DeepSAD. Fire windows are treated as the positive class in all evaluations. The alarm logic uses $q$ consecutive fire windows to declare a fire at the scenario level.
For AAG-DSVDD, we use the soft-boundary formulation of Section 3 with squared hinges, together with the soft-boundary parameter $\nu$, the anomaly push-out weight, and the margin selected on the validation windows. The unlabeled center-pull weight and the graph smoothness weight are tuned in the same way, with the selected smoothness weight imposing strong graph regularization. The latent k-NN graph uses $k$ nearest neighbors with the locally scaled Gaussian affinities of Section 3.2. The radius update and graph-refresh schedules follow Section 3.4, with quantile-based updates of the radius $R$ and reconstruction of the latent k-NN graph every two epochs. The warm-up phase for center initialization runs for two epochs.
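The quantile-based radius update in this schedule can be sketched as follows. This is an illustrative stand-in following the usual soft-boundary DSVDD convention, in which $R^2$ is set to the $(1-\nu)$-quantile of the squared latent distances so that roughly a fraction $\nu$ of training points falls outside the hypersphere; the variable names and the toy distances are hypothetical.

```python
import numpy as np

def update_radius(dist_sq, nu):
    """Quantile-based radius update: set R^2 to the (1 - nu)-quantile of
    the squared distances to the center, so roughly a nu-fraction of
    points lies outside the sphere (soft-boundary convention)."""
    return float(np.sqrt(np.quantile(dist_sq, 1.0 - nu)))

rng = np.random.default_rng(0)
d2 = rng.chisquare(df=64, size=1000)   # stand-in for squared latent distances
R = update_radius(d2, nu=0.05)
print(R)  # roughly the 95th-percentile distance
```

Because the update is a single quantile over the current mini-batch or epoch distances, refreshing $R$ every few epochs adds negligible cost on top of the encoder forward passes.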
DeepSVDD uses the same encoder and the same soft-boundary parameter $\nu$. It is trained only on labeled normal windows and does not use anomaly margins, unlabeled pull-in, or graph regularization. DeepSAD also uses the same architecture and optimizes the standard semi-supervised DeepSAD objective on the same labeled and unlabeled splits, with its labeled-term weight chosen on validation data. In both AAG-DSVDD and DeepSAD, 10% of the normal windows are labeled normals and 10% of the fire windows are labeled anomalies. DeepSVDD uses only the labeled normal subset.
OC-SVM and SVDD-RBF are trained on flattened windows. Each window is reshaped into a 160-dimensional vector ($5 \times 32$), and only labeled-normal windows are used for training. Both methods use an RBF kernel, with the kernel width for each method selected by the validation-based grid search described above. These classical one-class baselines operate directly in the input space and provide a non-deep comparison to the LSTM-based models.
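To make the flattened-window setup concrete, the sketch below scores test windows by their negative mean RBF similarity to the normal training set. This is a deliberately simplified stand-in for the OC-SVM and SVDD-RBF decision functions (which weight support vectors rather than averaging over all training points); all data and the kernel width are synthetic.

```python
import numpy as np

def rbf_scores(train_flat, test_flat, gamma):
    """Anomaly score as negative mean RBF similarity to normal training
    windows: a simplified stand-in for OC-SVM / SVDD-RBF scoring."""
    d2 = ((test_flat[:, None, :] - train_flat[None, :, :]) ** 2).sum(-1)
    return -np.exp(-gamma * d2).mean(axis=1)

rng = np.random.default_rng(1)
train = rng.normal(size=(50, 160))                      # flattened 32 x 5 normal windows
test = np.vstack([rng.normal(size=(5, 160)),            # normal-like windows
                  rng.normal(loc=3.0, size=(5, 160))])  # shifted, anomalous windows
s = rbf_scores(train, test, gamma=1.0 / 160)
print(s[:5].mean() < s[5:].mean())   # shifted windows should score higher
```

Setting `gamma` to the inverse input dimension mirrors the common scale heuristic around which the paper's grid search is centered.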
5.3. Fire-Monitoring Results and Analysis
The per-scenario fire detection results for all methods are summarized in Table 9. For each test scenario we report $\mathrm{TFST}_s$, $\mathrm{EFST}_s$, $\mathrm{FSTA}_s$ (in seconds), and $\mathrm{FAR}_s$. By construction of the EFST metric, if $\mathrm{EFST}_s < \mathrm{TFST}_s$, the method has raised a false alarm. In contrast, if $\mathrm{EFST}_s$ is equal to the duration of the scenario $T_s$, the method never produces $q$ consecutive fire predictions before the scenario ends and therefore fails to raise any stable fire alarm. In this single-fire, per-scenario setting, the reported precision and recall values often degenerate to 0 or 1 and are therefore not very informative; accordingly, our discussion below focuses on the timing metrics ($\mathrm{TFST}_s$, $\mathrm{EFST}_s$, $\mathrm{FSTA}_s$) and $\mathrm{FAR}_s$, which better capture the practical trade-off between early detection and false alarms.
SVDD-RBF is extremely conservative. In all six test scenarios, its $\mathrm{EFST}_s$ is exactly equal to the duration of the scenario, meaning that it never detects a fire event at the level of a stable alarm. The high FSTA values (from 209 s to 3487 s) therefore reflect missed detections rather than late but successful alarms, consistent with its very low pre-fire alarm activity. OC-SVM moves in the opposite direction. On average, it reduces FSTA to about 215 s, but this comes at the cost of non-negligible pre-fire activity: its FAR is highest in the longest scenarios (5 and 6), and in these two scenarios $\mathrm{EFST}_s$ is smaller than $\mathrm{TFST}_s$, indicating that OC-SVM raises false alarms well before the true fire onset. DeepSVDD achieves the second-smallest average FSTA (about 400 s) and detects the shorter flaming scenarios relatively quickly. However, it does so by triggering almost continuously in the pre-fire region: its average FAR is very high, and in scenarios 5 and 6 the stable alarm fires essentially at the beginning of the scenario. Thus, the one-class baselines either miss fires altogether (SVDD-RBF) or raise persistently early alarms that would be unacceptable in a deployed monitoring system (OC-SVM and especially DeepSVDD).
The semi-supervised deep methods, DeepSAD and AAG-DSVDD, achieve a more favorable balance between timeliness and reliability. DeepSAD maintains a very low average FAR in every scenario, but its timing behavior is mixed. In several scenarios (7, 8, 9, and 10), it produces moderate delays, with FSTA between 100 and 170 s, while in scenario 5 it reacts extremely late. Moreover, in scenario 6, DeepSAD exhibits $\mathrm{EFST}_s < \mathrm{TFST}_s$, which indicates that its first stable alarm is actually a false alarm in the pre-fire region, despite the low FAR reported there. Overall, the mean FSTA for DeepSAD across all scenarios is about 689 s.
AAG-DSVDD yields the best overall trade-off. Across the six test scenarios, it reduces the mean FSTA to approximately 473 s, representing a substantial improvement over SVDD-RBF and a reduction of roughly 31% relative to DeepSAD, while keeping the average FAR very low. Importantly, for AAG-DSVDD, we always have $\mathrm{EFST}_s \ge \mathrm{TFST}_s$, so the first stable alarm in every scenario occurs at or after the true onset of the fire. The gains are most pronounced in scenarios 5 and 9, where AAG-DSVDD triggers earlier than all baselines with negligible FAR, and in scenario 10, where it achieves the smallest FSTA of all methods (37 s) with zero pre-fire false alarms.
A critical observation from these results is the robustness of AAG-DSVDD against early false alarms, particularly when compared to the noisy behavior of DeepSVDD. DeepSVDD triggers almost continuously in the pre-fire regions of scenarios 5 and 6, suggesting that its decision boundary is overly sensitive to local fluctuations in the sensor noise floor. In contrast, the graph regularization in AAG-DSVDD explicitly enforces smoothness on the anomaly scores of neighboring latent representations. By coupling the scores of unlabeled pre-fire windows (which lie on the normal manifold) with those of the labeled normal set, the graph term effectively dampens high-frequency noise in the anomaly score trajectory. This smoothing effect prevents transient sensor spikes from crossing the decision threshold, resulting in a stable monitoring signal that only rises when the collective evidence from the multi-sensor array indicates a genuine departure from the normal manifold.
These results are consistent with the design of AAG-DSVDD. The baseline one-class models construct their decision regions using only sparsely labeled normals (and, for DeepSAD, a small number of labeled fires) and do not explicitly exploit the geometry of the unlabeled windows. Consequently, they tend either to tighten the normal region so aggressively that many pre-TFST windows are classified as anomalous, leading to $\mathrm{EFST}_s < \mathrm{TFST}_s$, or to expand it so conservatively that no stable alarm is raised before the scenario ends, yielding $\mathrm{EFST}_s = T_s$.
In contrast, AAG-DSVDD regularizes the squared-distance field over a label-aware latent k-NN graph and couples labeled and unlabeled windows through this graph and the unlabeled center-pull term. This allows the model to propagate limited label information along the data manifold and to position the hypersphere boundary close to the true transition from normal to fire, resulting in earlier yet stable fire alarms with very few false activations on the NIST fire-sensing dataset.
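The graph-regularization idea can be sketched numerically: build a k-NN graph over latent embeddings with locally scaled Gaussian affinities, then penalize differences in the squared distances to the center between graph neighbors. This is a simplified illustration of the mechanism, assuming a center at the origin; the local-scaling rule, loop-based construction, and function names are illustrative rather than the exact objective of Section 3.

```python
import numpy as np

def knn_affinities(Z, k):
    """Locally scaled Gaussian affinities on a k-NN graph:
    W_ij = exp(-||z_i - z_j||^2 / (s_i * s_j)), where s_i is the distance
    to the k-th neighbor, restricted to each point's k nearest neighbors."""
    D2 = ((Z[:, None, :] - Z[None, :, :]) ** 2).sum(-1)
    order = np.argsort(D2, axis=1)
    sigma = np.sqrt(D2[np.arange(len(Z)), order[:, k]])   # k-th neighbor distance
    W = np.zeros_like(D2)
    for i in range(len(Z)):
        for j in order[i, 1 : k + 1]:                     # skip self at index 0
            W[i, j] = np.exp(-D2[i, j] / (sigma[i] * sigma[j] + 1e-12))
    return np.maximum(W, W.T)                             # symmetrize

def smoothness(W, d2):
    """Graph smoothness of the squared-distance field:
    sum_ij W_ij (d2_i - d2_j)^2, small when neighbors score alike."""
    return float((W * (d2[:, None] - d2[None, :]) ** 2).sum())

rng = np.random.default_rng(0)
Z = rng.normal(size=(30, 8))       # toy latent embeddings
d2 = (Z ** 2).sum(-1)              # squared distances to a center at the origin
W = knn_affinities(Z, k=5)
print(smoothness(W, d2) >= 0.0)    # True: the penalty is non-negative
```

Minimizing this penalty during training damps isolated score spikes, which is the smoothing effect credited above for suppressing transient pre-fire false alarms.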
In particular, in scenario 5, which represents a smoldering scenario with a slow combustion process, both DeepSAD and AAG-DSVDD eventually detect the fire cleanly, yet their detection times differ substantially. DeepSAD triggers the alarm very late, whereas AAG-DSVDD raises the alarm earlier, at an EFST of 2067 s. DeepSAD relies on a point-wise distance loss with an unlabeled pull-in term. As a result, many weak or slowly developing fire windows are drawn toward the center and remain in the normal region under the tight FAR constraint, so the alarm is delayed. AAG-DSVDD adds a label-aware graph regularizer on squared distances together with the anomaly push-out hinge. In this smoldering scenario, the scarce labeled fire windows act as anchors in the latent k-NN graph, drawing structurally similar unlabeled windows away from the center. As a result, their anomaly scores rise earlier, without increasing the FAR. This improvement is therefore mainly attributable to the label-aware graph regularization module, which propagates the limited anomaly labels to structurally similar unlabeled windows.
Moreover, AAG-DSVDD supports sample-level interpretation through neighbor queries in its graph-regularized latent space. For any window, we compute its embedding and perform a k-NN search among the labeled training embeddings. The labels and scenarios of these neighbors provide a local, example-based view of the window’s anomaly score. To illustrate, we consider a late-stage smoldering-test window from scenario 5, taken around the EFST predicted by AAG-DSVDD (at 2067 s), which is correctly detected as fire. Table 10 lists its nine nearest labeled training windows in latent space. Eight of the nine neighbors come from a smoldering training scenario (scenario 1), and the remaining neighbor comes from a flaming training scenario (scenario 4) at an early fire stage. All neighbors lie at very small latent distances from the query. This neighborhood structure shows that the embedding of the queried smoldering window is tightly clustered with labeled smoldering fire windows from the training set and close to one early-stage flaming window, providing a concrete, example-based justification for its high anomaly score. In an operational setting, such neighbor-based views also help clarify the likely fire type, which is important for selecting appropriate evacuation and response procedures.
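The neighbor query behind this kind of explanation reduces to a plain k-NN search among labeled training embeddings. The sketch below is illustrative only: the embeddings, labels, and the helper name `explain_window` are synthetic stand-ins for the model's latent space.

```python
import numpy as np

def explain_window(z_query, Z_labeled, labels, scenarios, k=9):
    """Return the (label, scenario, latent distance) of the k labeled
    training windows nearest to the query embedding."""
    d = np.linalg.norm(Z_labeled - z_query, axis=1)
    idx = np.argsort(d)[:k]
    return [(labels[i], scenarios[i], float(d[i])) for i in idx]

rng = np.random.default_rng(0)
Z = rng.normal(size=(40, 64))             # toy labeled training embeddings
labels = ["fire"] * 20 + ["normal"] * 20
scenarios = [1] * 20 + [2] * 20
q = Z[3] + 0.01 * rng.normal(size=64)     # query close to a labeled fire embedding
neighbors = explain_window(q, Z, labels, scenarios)
print(neighbors[0][0])   # expected "fire": nearest neighbor is the perturbed source
```

Because the search runs only over the labeled training set, such queries are cheap enough to attach an example-based justification to every raised alarm.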
In our experiments on the NIST fire-monitoring dataset, AAG-DSVDD had training times of the same order as the DeepSVDD and DeepSAD baselines. All three deep methods completed training within a few tens of seconds, whereas the classical OC-SVM and SVDD-RBF baselines completed almost instantaneously. At inference, the per-window processing time of AAG-DSVDD was of the same order as that of DeepSVDD and DeepSAD, and the OC-SVM and SVDD-RBF baselines were comparably fast. In all cases, the per-window cost is negligible relative to the 1 Hz sampling rate; the graph regularization in AAG-DSVDD therefore does not compromise real-time deployability.