A Novel Dynamic Edge-Adjusted Graph Attention Network for Fire Alarm Data Mining and Prediction

Ding, Yongkun; Xie, Zhenping; Jiang, Senlin

doi:10.3390/math13193111


by Yongkun Ding ¹, Zhenping Xie ¹,* and Senlin Jiang ²

¹ School of Artificial Intelligence and Computer Science, Jiangnan University, 1800 Lihu Avenue, Wuxi 214112, China

² School of Internet of Things Engineering, Wuxi Institute of Technology, Wuxi 214121, China

* Author to whom correspondence should be addressed.

Mathematics 2025, 13(19), 3111; https://doi.org/10.3390/math13193111

Submission received: 23 August 2025 / Revised: 15 September 2025 / Accepted: 22 September 2025 / Published: 29 September 2025


Abstract

Modern fire alarm systems are essential for public safety, yet they often fail to exploit the wealth of historical alarm data and the complex spatiotemporal dependencies inherent in urban environments. Graph Neural Networks (GNNs) are currently among the most popular methods for handling complex spatiotemporal dependencies. While a range of dynamic GNN approaches have been proposed, many existing GNN-based predictors still rely on a static topology, which limits their ability to fully capture the evolving nature of risk propagation. Furthermore, even among dynamic graph methods, most focus on temporal link prediction or social interaction modeling, with limited exploration in safety-critical applications such as fire alarm prediction. To address these gaps, we propose the Dynamic Edge-Adjusted Graph Attention Network (DeaGAT). DeaGAT dynamically updates inter-building edge weights through an attention mechanism, enabling the graph structure to evolve in response to shifting risk patterns. A margin-based contrastive learning objective further enhances the quality of node embeddings by distinguishing subtle differences in risk states. In addition, DeaGAT jointly models static building attributes and dynamic alarm sequences, effectively integrating long-term semantic context with short-term temporal dynamics. Extensive experiments on real-world datasets, including comparisons with state-of-the-art baselines and comprehensive ablation studies, demonstrate that DeaGAT achieves superior accuracy and F1-score, validating the effectiveness of dynamic graph updating and contrastive learning in enhancing proactive fire early-warning capabilities.

Keywords:

fire alarm systems prediction; graph attention networks; spatio-temporal data mining; contrastive learning

MSC:

68T07; 68T09; 62H30

1. Introduction

Fire alarm systems are a critical component of public safety infrastructure, designed to detect signs of fire and promptly alert occupants and emergency responders [1]. Timely and accurate warnings are vital for safeguarding lives and protecting property, as delayed or missed alarms can lead to devastating consequences. However, most current systems remain reactive in nature; they trigger alerts only after sensor thresholds are breached, without the ability to anticipate risks before incidents occur. Crucially, these systems often overlook the rich historical alarm records and the complex spatiotemporal dependencies inherent in urban environments, rendering fire prediction a largely unsolved spatiotemporal sequence forecasting problem. To enable proactive fire preparedness, there is a pressing need for predictive frameworks that can leverage historical patterns and evolving urban risk dynamics to forecast where and when fire alarms are likely to occur.

GNNs have emerged as one of the most effective methods for spatio-temporal prediction tasks, owing to their ability to naturally model complex spatial interactions and temporal dependencies via message passing on graph-structured data [2]. By modeling entities (e.g., buildings or sensors) as nodes and their relationships as edges, GNNs can learn contextualized representations that incorporate both local and global information. Attention-based GNN variants, such as Graph Attention Networks (GATs), further assign data-dependent weights to edges, improving interpretability and robustness under varying conditions. These capabilities have led to successful applications of GNNs across various domains. However, most existing GNN models either assume a static topology (e.g., STSGCN [3], MTGNN [4]) or modify model parameters over time without explicitly updating the underlying graph structure (e.g., TGAT [5], TGN [6]). This limits their capacity to model dynamic changes in risk propagation across urban environments. Moreover, the black-box nature of their learned weights may obscure subtle signals tied to specific locations, reducing interpretability and responsiveness. These challenges call for a new method that can dynamically adapt the graph structure in response to evolving risk conditions while maintaining strong predictive and interpretive capabilities.

2. Related Work

Our proposed method is closely related to advances in fire alarm data mining and in spatio-temporal data analysis with GNNs. This section reviews related work in both areas, with an emphasis on their limitations in modeling dynamic graph evolution. These limitations motivate our proposed method.

2.1. Fire Alarm Data Mining and Prediction

Recent years have witnessed the increasing use of big data analytics to enhance the intelligence and responsiveness of fire warning systems. By analyzing historical fire incidents, climatic conditions, and environmental factors, traditional methods attempt to identify fire risk patterns. Conventional approaches, such as historical data mining [7], satellite remote sensing [8], and multi-sensor fusion [9], each have notable limitations. Big data analytics [7] leverages historical alarm records and environmental variables but often fails to jointly model spatial adjacency and temporal evolution, resulting in coarse-grained forecasts that cannot capture real-time propagation dynamics. Remote sensing provides wide coverage but suffers from low spatial resolution and latency, compromising timeliness [8]. Multi-sensor fusion improves detection accuracy via techniques such as weighted averaging, Kalman filtering, and Bayesian inference [9], but these methods incur significant system complexity, maintenance burden, and error accumulation.

Machine learning models, such as deep neural networks, random forests (RFs) [10], and support vector machines (SVMs) [11], show strong predictive power but lack interpretability and overlook the inherent graph structure in sensor deployments. Meanwhile, IoT- and AI-integrated systems [12] enhance response coordination and evacuation planning yet struggle with cross-agency resource allocation and real-time optimization.

Most existing emergency systems focus on post-event resource management, rather than proactive risk forecasting. This highlights the need for methods capable of simultaneously modeling spatial dependencies and temporal dynamics. GNNs offer a promising solution by modeling sensor networks as graphs, where spatial relationships are encoded via message passing and temporal changes are captured using recurrent or attention-based architectures. Unlike grid-based or sequence-based methods, GNNs preserve the irregular distribution of sensor nodes and support both local and global pattern learning. Attention-based variants further improve interpretability and robustness by dynamically adjusting edge weights.

Although GNNs are still underutilized in fire warning systems, they have demonstrated strong performance in spatio-temporal prediction tasks across various domains, offering valuable insights into fire propagation patterns and potential impacts.

2.2. Spatio-Temporal Data Analysis with GNN

Graph neural networks have shown considerable success in modeling spatio-temporal data, but existing models often fall short when dealing with the dynamic evolution of graph structures. Early models operate on static graphs, where message passing and feature aggregation are performed on a fixed topology to derive node or graph-level representations. Xu et al. proposed TGAT, which incorporates functional time encoding into an attention mechanism [5]. While effective for short sequences, it requires deep stacking for long-range dependencies, suffers from time aliasing, and incurs high latency. Rossi et al. introduced TGN, which maintains memory states for each node and updates them in an event-driven manner [6]. Although this enables streaming inference, it suffers from high memory usage, limited parallelism, and vulnerability to concept drift.

Subsequent methods have attempted to extend static GNNs to dynamic graph scenarios, where nodes and edges may emerge, disappear, or change over time. Zhang et al. introduced STSGCN with spatio-temporal graph convolutions and Huber loss, but its sparse representations and limited temporal depth reduce performance [3]. Pareja et al. developed EvolveGCN, which evolves GCN parameters using recurrent neural networks (RNNs). While adaptive, the method struggles with scalability due to recurrent computation overhead [13]. Gao et al. combined temporal convolutions with graph structure learning in MTGNN, but the model performs poorly with long-range dependencies and rapidly evolving graphs [4]. Zhou et al. developed STGormer, a transformer-based model [14]. Despite its flexibility and ability to capture long-range spatio-temporal dependencies, it suffers from quadratic complexity in sequence length and fixed spatial encodings, which limit scalability and adaptability to large dynamic graphs.

Recent advances have introduced more sophisticated spatio-temporal GNN variants, including Transformer-based methods, diffusion-based models, and event-driven architectures. Spatio-temporal Transformers, such as LVSTformer, leverage self-attention to capture long-range temporal and spatial dependencies, thereby enhancing the representation of complex spatio-temporal patterns [15]. However, they suffer from high computational and memory costs, especially for long sequences or large graphs. Diffusion-based models, such as DiffSTG, integrate denoising diffusion probabilistic models with spatio-temporal GNNs to capture uncertainty and intricate dynamics [16]; yet, the iterative denoising process imposes substantial computational overhead, limiting real-time applicability. Similarly, Wang et al. incorporated Hawkes processes into GNNs (HP-DGNN) to model event cascades, but this approach also entails high computational costs ($O(|E|)$ per step) and relies on restrictive assumptions regarding event excitation [17].

Although these methods attempt to extend GNNs for spatio-temporal prediction, they still inadequately capture the evolving nature of graph structures. Key limitations include inefficient dynamic graph updates, challenges with long-term temporal dependencies, poor scalability on large-scale data, and high computational costs for Transformer or diffusion-based approaches. These gaps motivate the development of our Dynamic Edge-Adjusted Graph Attention Network (DeaGAT), which explicitly addresses dynamic edge evolution while maintaining robust performance in fire alarm prediction.

3. Dynamic Edge-Adjusted Graph Attention Network

In this section, we propose DeaGAT, a novel Dynamic Edge-Adjusted Graph Attention Network. Figure 1 illustrates the overall architecture of the proposed model. The figure highlights the flow of information through the network, showing how static building features and dynamic temporal signals are jointly processed. It also depicts the iterative update of edge weights and node embeddings over time, emphasizing the adaptive nature of the graph structure and the interaction between spatial and temporal information throughout the prediction horizon.

Compared with existing spatio-temporal prediction approaches [3,14], DeaGAT significantly enhances both prediction accuracy and interpretability by integrating spatial and temporal dependencies through adaptive graph learning. This design addresses a key limitation of these methods, namely their neglect of the dynamic evolution of underlying graph structures in spatio-temporal forecasting tasks.

3.1. Task Formulation

We let $G_{t} = (V, E_{t})$ denote the building–sensor graph at time step t, where $V = \{1, \dots, N\}$ is the set of N building nodes and $E_{t} \subseteq V \times V$ represents the undirected edge set at time t, which may vary over time to reflect dynamic changes in spatial relationships or risk propagation.

Each node $i \in V$ is associated with static spatial features $x_{i}$, and at each time step t, $X_{t}$ collects the dynamic temporal features for all nodes. The multi-class alarm prediction task aims to predict, at the next time step $T + 1$, the probability distribution over C possible alarm types for each node i:

$$ \hat{p}_{i, T + 1} = F\left(\{G_{t}\}_{t = 1}^{T}, \{x_{i}\}_{i = 1}^{N}, \{X_{t}\}_{t = 1}^{T}\right), \tag{1} $$

where $\hat{p}_{i, T + 1} = (\hat{p}_{i, T + 1}^{(1)}, \dots, \hat{p}_{i, T + 1}^{(C)}) \in [0, 1]^{C}$ with $\sum_{c = 1}^{C} \hat{p}_{i, T + 1}^{(c)} = 1$. The predicted class label is

$$ \hat{y}_{i, T + 1} = \arg\max_{c} \hat{p}_{i, T + 1}^{(c)}, $$

and $y_{i, T + 1} \in \{1, \dots, C\}$ denotes the ground-truth alarm type for node i at time $T + 1$.
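As a minimal sketch of this decision rule, the Python snippet below converts per-class scores for one node into a probability vector and a predicted label. The softmax output layer and the example scores are assumptions made for illustration; the formulation above only requires that the C probabilities sum to one.

```python
import math

def softmax(logits):
    """Normalize raw class scores into probabilities that sum to one."""
    top = max(logits)                        # subtract the max for numerical stability
    exps = [math.exp(z - top) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical scores for one node over C = 3 alarm types
logits = [2.0, 0.5, -1.0]
p_hat = softmax(logits)

# arg max over classes; +1 because classes are indexed 1..C in the text
y_hat = max(range(len(p_hat)), key=p_hat.__getitem__) + 1
```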

3.2. Spatial Feature Extraction Module

This component extracts static node embeddings that summarize the most informative neighbors and regularizes those embeddings so that they respect the topological structure of the graph.

We adopt the Graph Attention Network introduced in [18], as its attention mechanism assigns data-dependent weights to arbitrary-sized neighborhoods, offering improved adaptability over fixed-weight message passing in GCN [19], particularly for irregular sensor layouts. First, each building node is embedded into a latent space that captures static spatial correlations.

A shared linear layer produces the initial hidden vector, followed by multi-head attention to aggregate neighbor information. Given a static attribute vector $x_{i} \in \mathbb{R}^{F_{s}}$, where $F_{s}$ denotes the dimensionality of the original static node features, a shared linear layer parameterized by the weight matrix $W \in \mathbb{R}^{F_{h} \times F_{s}}$ projects the original features into a lower-dimensional embedding space of size $F_{h}$:

$$ h_{i}^{space} = W x_{i} \tag{2} $$

To further encourage topology-aware representations, we adopt a margin-based contrastive loss inspired by [20]. This loss enlarges the distance between unconnected nodes while contracting the distance between connected ones.

This geometric constraint prevents embedding collapse, enhances class separability under limited supervision, and improves downstream alarm prediction:

$$ L_{contrast} = \frac{1}{|P|} \sum_{(i, j) \in P} \left[ y_{ij} \, \| h_{i}^{space} - h_{j}^{space} \|^{2} + (1 - y_{ij}) \, \max\{0,\; m - \| h_{i}^{space} - h_{j}^{space} \|\}^{2} \right] \tag{3} $$

where $y_{ij} = 1$ if $(i, j) \in E$ and 0 otherwise, $P$ denotes the sampled set of node pairs, and m is a margin hyper-parameter. This spatial regularization term is later integrated with the supervised prediction loss (Equation (8)) to form the final training objective.
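The loss in Equation (3) can be illustrated with a small pure-Python sketch. The toy embeddings, the pair sample, and the margin values below are hypothetical and serve only to show how connected and unconnected pairs contribute; this is not the authors' implementation.

```python
import math

def contrastive_loss(embeddings, pairs, edges, margin=1.0):
    """Margin-based contrastive loss over sampled node pairs, as in Equation (3).

    embeddings: dict mapping node id -> list of floats (h_i^space)
    pairs:      list of sampled (i, j) pairs (the set P)
    edges:      set of frozenset({i, j}) for undirected edges (the set E)
    """
    total = 0.0
    for i, j in pairs:
        dist = math.dist(embeddings[i], embeddings[j])   # Euclidean distance
        y = 1.0 if frozenset((i, j)) in edges else 0.0   # 1 if connected, else 0
        # Connected pairs are pulled together; unconnected pairs are pushed
        # apart until they are at least `margin` away from each other.
        total += y * dist ** 2 + (1.0 - y) * max(0.0, margin - dist) ** 2
    return total / len(pairs)

# Hypothetical toy data: nodes 0 and 1 are connected, node 2 is far away.
h = {0: [0.0, 0.0], 1: [0.3, 0.4], 2: [3.0, 4.0]}
E = {frozenset((0, 1))}
P = [(0, 1), (0, 2), (1, 2)]

loss_small_margin = contrastive_loss(h, P, E, margin=1.0)
loss_large_margin = contrastive_loss(h, P, E, margin=6.0)
```

With a margin of 1.0 only the connected pair contributes, because both unconnected pairs are already farther apart than the margin. Raising the margin to 6.0 makes the unconnected pairs contribute as well.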

3.3. Temporal-Aware Graph Evolution Module

Beyond static spatial relationships, fire alarm events exhibit significant temporal dynamics. This module integrates time-series information and enables the graph structure to evolve over time in response to changing node features. We partition the historical timeline into T discrete intervals (e.g., monthly aggregations). For each node i at time t, we derive a temporal feature vector $h_{i, t}^{time}$ from the alarm records, such as the number of alarms in interval t, the most recent alarm type, or other time-dependent indicators. This temporal feature is concatenated with the node’s spatial embedding $h_{i}^{space}$ to form a joint representation.

Specifically, for each node i at time step t, we concatenate its spatial feature (obtained from the spatial module, which is generally time-invariant or computed from historical data up to t) with its current temporal feature $h_{i, t}^{time}$:

$$ h_{i, t}^{combined} = h_{i}^{space} \oplus h_{i, t}^{time} \tag{4} $$

where ⊕ denotes vector concatenation.

We then use a graph attention mechanism (similar to that in the spatial module) [18] to re-evaluate the relationships between nodes based on these combined features. In particular, we compute new attention scores $\tilde{e}_{ij, t}$ for node pairs $(i, j)$ by applying the attention formula [18] on $h_{i, t}^{combined}$ and $h_{j, t}^{combined}$, and obtain updated attention coefficients $\tilde{α}_{ij, t}$ via a softmax [21]. These attention values $\tilde{α}_{ij, t}$ now reflect the similarity or influence between nodes i and j at time t, taking into account the latest temporal information.
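The following sketch shows how such coefficients could be computed on the combined features, assuming the standard single-head GAT scoring rule from [18], in which a score is LeakyReLU applied to a learned vector dotted with the concatenated projections of nodes i and j. The projection matrix `W`, the attention vector `a`, and all toy values are hypothetical placeholders rather than learned parameters.

```python
import math

def leaky_relu(x, slope=0.2):
    return x if x > 0 else slope * x

def matvec(W, v):
    """Multiply matrix W (list of rows) by vector v."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]

def attention_coefficients(h_combined, neighbors, W, a):
    """Return {(i, j): alpha_ij}, softmax-normalized over each node's neighbors."""
    z = {i: matvec(W, v) for i, v in h_combined.items()}
    alpha = {}
    for i, nbrs in neighbors.items():
        if not nbrs:
            continue
        # Raw score: e_ij = LeakyReLU(a^T [W h_i || W h_j])
        scores = {j: leaky_relu(sum(ak * x for ak, x in zip(a, z[i] + z[j])))
                  for j in nbrs}
        top = max(scores.values())               # stabilize the softmax
        exps = {j: math.exp(s - top) for j, s in scores.items()}
        denom = sum(exps.values())
        for j, e in exps.items():
            alpha[(i, j)] = e / denom
    return alpha

# Hypothetical toy inputs: 2-d spatial embeddings and 1-d temporal features.
h_space = {0: [1.0, 0.0], 1: [0.8, 0.2], 2: [0.0, 1.0]}
h_time = {0: [3.0], 1: [2.0], 2: [0.0]}
h_combined = {i: h_space[i] + h_time[i] for i in h_space}   # concatenation, Eq. (4)

W = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]     # identity projection
a = [0.1, 0.1, 0.1, 0.5, 0.5, 0.5]                          # attention vector
neighbors = {0: [1, 2], 1: [0], 2: [0]}

alpha = attention_coefficients(h_combined, neighbors, W, a)
```

Because the softmax runs over each node's own neighborhood, the coefficients are generally asymmetric: the weight node i gives to j need not equal the weight j gives to i.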

To allow the graph structure to adapt dynamically, we introduce a threshold-based mechanism: an edge between nodes i and j at time t is retained if and only if the attention coefficient $\tilde{α}_{ij, t}$ exceeds a pre-defined threshold k.

Formally, we let $E_{t}$ be the set of edges active at time t; then after computing $\tilde{α}_{ij, t}$, we update the edge set as

$$ E_{t}^{'} = \{(i, j) \mid \tilde{α}_{ij, t} > k\} \tag{5} $$

The threshold k serves as a hyperparameter controlling the sparsity of the dynamic graph. A higher k results in a sparser graph, preserving only the strongest connections, while a lower k retains more edges. In practice, k can be tuned based on validation performance or set heuristically.

Subsequently, we construct an updated graph $G_{t}^{'}$ for the next time step:

$$ G_{t}^{'} = \mathrm{UPDATE}(G_{t}, E_{t}^{'}) \tag{6} $$

where $\mathrm{UPDATE}(G_{t}, E_{t}^{'})$ produces a new graph $G_{t}^{'} = (V, E_{t}^{'})$ with the same node set V but a modified edge set.
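The thresholding in Equation (5) amounts to a simple filter over the attention coefficients, as the sketch below illustrates. Because attention coefficients are directional while the edge set is described as undirected, the sketch also includes one possible symmetrization rule; that rule, the function names, and the toy coefficients are assumptions not specified in the text.

```python
def update_edges(alpha, k):
    """Equation (5): keep (i, j) only if its attention coefficient exceeds k."""
    return {pair for pair, value in alpha.items() if value > k}

def as_undirected(directed_edges):
    """Assumption: an undirected edge survives if either direction was kept."""
    return {(min(i, j), max(i, j)) for i, j in directed_edges}

# Hypothetical attention coefficients keyed by (source, target)
alpha = {(0, 1): 0.73, (0, 2): 0.27, (1, 0): 1.0, (2, 0): 0.30}

kept = update_edges(alpha, k=0.3)      # strict inequality: 0.30 is not kept
edges_next = as_undirected(kept)       # edge set of the updated graph (V, E')
```

Note that the comparison is strict, so a coefficient exactly equal to k is pruned.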

By iteratively updating the graph structure at each time step according to Equation (5), the model can introduce or remove connections between nodes as their feature similarity changes over time. The temporal module processes data sequentially from $t = 1$ to T; it combines features (Equation (4)), updates node representations using attention, and adjusts the graph structure (Equations (5) and (6)). After the final time step T, each node possesses an updated feature $h_{i, T}^{'}$ that encodes both spatial and recent temporal information, and the graph evolves to $G_{T}^{'}$. These final node features can then be utilized to predict the occurrence of a fire alarm at the next time step $T + 1$ (e.g., by feeding $h_{i, T}^{'}$ into a classifier or regression layer to output $\hat{y}_{i, T + 1}$).

3.4. Training Procedure and Loss Functions

As a result of the joint optimisation detailed below, the trained DeaGAT is able to issue precise and robust early-warning forecasts of future fire-alarm events.

During each training step, the input mini-batch is propagated through the spatial encoder, the contrastive branch, and the temporal module (Algorithm 1).

We let $\hat{p}_{i} = (\hat{p}_{i}^{(1)}, \hat{p}_{i}^{(2)}, \dots, \hat{p}_{i}^{(C)})$ denote the predicted probability distribution over C alarm classes for node i and let $y_{i}$ represent the corresponding one-hot encoded ground-truth label. The supervised prediction loss is computed using categorical cross-entropy [22], which effectively penalises predictions that diverge from the correct distribution, thus improving model calibration and discriminative power:

$$ L_{pred} = - \frac{1}{N} \sum_{i = 1}^{N} \sum_{c = 1}^{C} y_{i}^{(c)} \log(\hat{p}_{i}^{(c)}) \tag{7} $$

where $y_{i}^{(c)} = 1$ if node i belongs to class c and 0 otherwise.

To regularize the node representations and encourage topology-aware embeddings, we incorporate a contrastive loss term $L_{contrast}$ (Equation (3)). The final loss function is as follows:

$$ L_{total} = L_{pred} + λ L_{contrast} \tag{8} $$

Here, $λ$ is a hyperparameter that balances predictive accuracy and representation structure.

Gradients of $L_{total}$ are backpropagated to update all trainable parameters of the proposed model via the Adam optimizer.

Only these two loss components are required; no additional auxiliary terms or scheduling tricks are introduced, keeping the optimization pipeline straightforward and reproducible.
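To make the combination of the two terms concrete, the sketch below evaluates Equations (7) and (8) on toy values. The predicted probabilities, the contrastive loss value, and the weight `lam` are hypothetical; the text does not report the value of λ used in the experiments.

```python
import math

def cross_entropy(probs, labels):
    """Mean categorical cross-entropy, Equation (7).

    probs:  list of per-node probability vectors
    labels: list of true class indices (0-based here; the text uses 1..C)
    """
    eps = 1e-12   # guards against log(0)
    return -sum(math.log(p[y] + eps) for p, y in zip(probs, labels)) / len(labels)

def total_loss(l_pred, l_contrast, lam):
    """Equation (8): prediction loss plus weighted contrastive regularizer."""
    return l_pred + lam * l_contrast

# Hypothetical predictions for two nodes over C = 3 classes
probs = [[0.7, 0.2, 0.1], [0.1, 0.8, 0.1]]
labels = [0, 1]

l_pred = cross_entropy(probs, labels)
l_contrast = 0.2                       # placeholder value of Equation (3)
l_total = total_loss(l_pred, l_contrast, lam=0.5)
```

With a one-hot label, the inner sum over classes in Equation (7) reduces to the log-probability of the true class, which is why the sketch indexes `p[y]` directly.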

Algorithm 1 DeaGAT Model for Fire Alarm Prediction

Require: Static graph $G = (V, E)$ with initial static node features $\{x_{i} : i \in V\}$; temporal input data $\{X_{t} : t = 1, \dots, T\}$, where each $X_{t} = \{h_{i, t}^{time} : i \in V\}$; attention threshold k.

Ensure: Updated graph $G^{'}$ and node feature set $H^{'} = \{h_{i}^{'} : i \in V\}$ after processing the current time step t.

1: Spatial Feature Extraction: For each node $i \in V$, apply GAT on G using static features $x_{i}$ to compute spatial embeddings $h_{i}^{space}$.
2: Temporal Feature Extraction: For each node $i \in V$, obtain the current temporal embedding $h_{i, t}^{time}$.
3: Feature Combination: For each node i, concatenate its temporal embedding $h_{i, t}^{time}$ with spatial embedding $h_{i}^{space}$ to form the combined feature vector (Equation (4)).
4: for each node $i \in V$ do
5:     Update node i’s feature to the combined vector: $X_{i} \leftarrow h_{i, t}^{combined}$.
6: end for
7: Attention-based Feature Aggregation: For each node $i \in V$, compute an updated feature $h_{i}^{'}$ by aggregating features of i’s neighbors through the attention mechanism.
8: Graph Update: Determine important connections by thresholding attention coefficients with threshold k, updating edge set $E^{'}$ (Equation (5)), where $α_{ij}$ denotes the attention coefficient between node i and node j. This yields the updated graph $G^{'} = (V, E^{'})$ for the next time step.
9: Return the updated graph $G^{'}$ and node features $H^{'} = \{h_{i}^{'} : i \in V\}$.

Algorithm 1 outlines the prediction process of the proposed DeaGAT model. At each time step, the model first applies spatial feature extraction using a GAT to encode the static spatial context into node embeddings $h_{i}^{space}$. These spatial embeddings are then concatenated with the corresponding temporal input features $h_{i, t}^{time}$ to form the combined representations $h_{i, t}^{combined}$ (Equation (4)). Subsequently, another attention-based aggregation is performed to produce the new node embeddings $h_{i}^{'}$. The attention coefficients obtained during this aggregation are further used to update the graph structure by pruning less significant edges (Equation (5)), thereby generating an updated graph $G^{'}$ for the next time step (Equation (6)). Through this recurrent mechanism of feature updating and graph refinement, DeaGAT effectively captures both spatial and temporal features of the fire alarm data.
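The recurrence in steps 2 to 8 of Algorithm 1 can be sketched on toy data as follows. This is a deliberately simplified illustration, not the authors' implementation: attention scores are plain dot products instead of a learned GAT scoring function, spatial embeddings are given rather than computed, and only currently connected pairs are scored, so this version can prune edges but never add them. All names and values are hypothetical.

```python
import math

def softmax(scores):
    top = max(scores.values())
    exps = {j: math.exp(s - top) for j, s in scores.items()}
    total = sum(exps.values())
    return {j: e / total for j, e in exps.items()}

def deagat_step(h_space, h_time, edges, k):
    """One simplified pass over steps 2-8 of Algorithm 1."""
    nodes = sorted(h_space)
    combined = {i: h_space[i] + h_time[i] for i in nodes}        # Eq. (4)
    new_h, new_edges = {}, set()
    for i in nodes:
        # Current neighbors plus a self-loop
        nbrs = [j for j in nodes if (i, j) in edges or (j, i) in edges] + [i]
        scores = {j: sum(a * b for a, b in zip(combined[i], combined[j]))
                  for j in nbrs}
        alpha = softmax(scores)
        dim = len(combined[i])
        # Attention-weighted aggregation of neighbor features (step 7)
        new_h[i] = [sum(alpha[j] * combined[j][d] for j in nbrs)
                    for d in range(dim)]
        # Threshold the coefficients (step 8, Eq. (5)); assumption: an
        # undirected edge is kept if either endpoint's coefficient exceeds k.
        new_edges |= {(min(i, j), max(i, j))
                      for j in nbrs if j != i and alpha[j] > k}
    return new_h, new_edges

# Hypothetical toy inputs: three buildings, two time steps
h_space = {0: [1.0, 0.0], 1: [0.9, 0.1], 2: [0.0, 1.0]}
X = [{0: [1.0], 1: [1.0], 2: [0.0]},
     {0: [0.0], 1: [2.0], 2: [2.0]}]
initial_edges = {(0, 1), (1, 2)}

edges = set(initial_edges)
for h_time in X:                     # process t = 1, ..., T sequentially
    h_new, edges = deagat_step(h_space, h_time, edges, k=0.3)
```

Scoring all node pairs instead of only current neighbors would also let new edges appear, at the cost of quadratic work per step.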

4. Experiments

To comprehensively evaluate the practical effectiveness and robustness of the proposed framework, we conduct a series of experiments designed to validate its capability for spatio-temporal fire alarm prediction. We first compare our proposed DeaGAT model with state-of-the-art spatio-temporal baseline methods to verify its advantage in accurately modeling complex spatial interactions and temporal dependencies identified in related work. Then, we perform detailed ablation studies to quantitatively assess the individual contributions of each major component—the static spatial encoder, temporal graph updater, and the joint training objective—demonstrating how these components collaboratively enhance prediction performance. The details of the experimental dataset, selected baselines, evaluation metrics, and implementation settings are presented in the following subsections.

4.1. Dataset and Experimental Setup

Dataset: We evaluate our method on a private, large-scale fire alarm dataset collected from an IoT-based fire alarm platform in a major city. The dataset comprises 371,632 fire alarm records spanning from November 2019 to May 2023.

Each record includes the alarm occurrence time and alarm type as dynamic features, along with static attributes of the associated buildings, such as building area, height, and the number of floors.

An initial building-sensor graph is constructed, where each node represents a building. Edges between buildings are established based on geographic proximity; specifically, an edge is created if the differences in latitude and longitude between two buildings are less than 0.001 degrees, reflecting potential spatial correlations in fire incidents. Each node’s initial features consist of static attributes like fire safety rating and hazard levels, while dynamic alarm events constitute the temporal, time-varying features.
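A minimal sketch of this proximity rule is given below. It assumes that both the latitude difference and the longitude difference must fall below the 0.001-degree threshold, which is one reading of the description above. The building identifiers and coordinates are invented for illustration.

```python
from itertools import combinations

def build_proximity_edges(coords, max_delta=0.001):
    """Connect two buildings when both coordinate differences are below max_delta.

    coords: dict mapping building id -> (latitude, longitude) in degrees
    """
    edges = set()
    for (a, (lat_a, lon_a)), (b, (lat_b, lon_b)) in combinations(sorted(coords.items()), 2):
        if abs(lat_a - lat_b) < max_delta and abs(lon_a - lon_b) < max_delta:
            edges.add((a, b))
    return edges

# Hypothetical coordinates: A and B are close together, C is farther north.
coords = {
    "A": (31.4800, 120.2700),
    "B": (31.4805, 120.2704),
    "C": (31.4900, 120.2700),
}
edges = build_proximity_edges(coords)
```

The pairwise loop is quadratic in the number of buildings; for a city-scale graph a spatial index or grid bucketing would likely be preferable.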

Experimental Setup: All models are trained and evaluated on the same split of the fire alarm dataset to ensure strict comparability. Specifically, we allocate 80% of the 371,632 labeled records to the training set (297,305 samples) and the remaining 20% (74,327 samples) to the test set. We use six metrics: accuracy, precision, recall, F1-score, area under the ROC curve (AUC) and average precision (AP) [23], and run every model 10 independent times with different random seeds, presenting the mean ± standard deviation.
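The split sizes and the mean ± standard deviation reporting can be reproduced with a few lines of Python. The record count and the 80/20 ratio come from the text above; the per-seed accuracies are made-up placeholders rather than results from the paper, and the use of the sample standard deviation is an assumption.

```python
import statistics

total_records = 371_632
n_train = total_records * 8 // 10      # 80% of the records, rounded down
n_test = total_records - n_train       # remaining 20%

# Hypothetical per-seed accuracies, NOT results from the paper
seed_accuracies = [0.884, 0.889, 0.886, 0.887, 0.885]
mean_acc = statistics.mean(seed_accuracies)
std_acc = statistics.stdev(seed_accuracies)   # sample standard deviation
summary = f"{100 * mean_acc:.2f} ± {100 * std_acc:.2f}"
```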

In our DeaGAT model, the spatial module utilizes a GAT with 8 hidden units and 8 attention heads [18]. A dropout rate of 0.6 is applied to the attention coefficients to mitigate overfitting, and the attention weights are normalized using a softmax function. The temporal module begins by linearly projecting the dynamic event features of each time step into a 64-dimensional embedding. This projected vector is then element-wise summed with both the static node embeddings and the embeddings from the previous time step to generate a fused representation. The resulting fused vectors are passed through a graph attention convolution layer configured to map 64 input dimensions to 64 output dimensions using four attention heads. This layer includes automatic self-loop addition and averages the outputs of all heads, followed by a ReLU activation. The process is repeated at each time step to capture the evolving temporal dynamics of the nodes. The initial attention threshold for dynamic graph updates is set to $k = 0.3$ [24]. Training is conducted for 200 epochs with a learning rate of 0.005 (with stepwise decay) and a weight decay of $5 \times 10^{-4}$ [25]. Each baseline model is either run with the hyper-parameters recommended in its original publication or tuned on a held-out validation subset drawn from the training data.

4.2. Baseline Methods

To ensure a comprehensive evaluation, we select baseline models that are widely recognized in the spatio-temporal prediction literature.

(1) Traditional non-graph models. Random forest (RF) [10] and support vector machine (SVM) [11] have been extensively employed as reference points in recent spatio-temporal studies [26]. Both models operate directly on tabular features without leveraging explicit graph structures.

(2) Static-graph GNNs. GAT [18] applies attention weighting on a fixed adjacency matrix. TGAT [5] extends the attention mechanism to continuous-time edges through functional time encoding. TGN [6] enhances message passing with node memories, facilitating learning from streaming event graphs.

(3) Dynamic-graph GNNs. STSGCN [3] performs localized spatial–temporal convolutions on synchronous graph snapshots. EvolveGCN [13] treats GCN parameters as a recurrent state that evolves over discrete time steps. MTGNN [4] learns a data-driven adjacency matrix coupled with temporal convolutions for multivariate forecasting. STGormer [14] integrates graph structures with temporal position encodings in a Transformer-style architecture. HP-DGNN [17] embeds a Hawkes point process into a dynamic GNN to model mutually exciting event sequences.

(4) Proposed Model. The DeaGAT model dynamically prunes and grows edges via attention threshold k and further sharpens node embeddings with a contrastive objective, jointly leveraging static building attributes and streaming alarm signals.

4.3. Performance Analysis of Fire Alarm Data Prediction

Table 1 reports the classification performance of our DeaGAT and baseline models. As shown, DeaGAT consistently achieves the best results across all evaluation metrics, with an accuracy of 88.64% and an F1-score of 87.31%. In contrast, STGormer [14] achieves 87.7% accuracy and 86.4% F1-score, while HP-DGNN [17] reaches 87% accuracy and 85.5% F1-score. DeaGAT also outperforms all other models in terms of AUC (91.47%) and AP (90.25%), surpassing the closest competitors by a clear margin.

Traditional machine learning models perform significantly worse. RF [10] achieves 74.15% accuracy and 70.2% F1-score, while SVM [11] yields 75.68% accuracy and 72.26% F1-score. These results highlight the limitations of flat classifiers in modeling the complex spatiotemporal dependencies inherent in fire-alarm data.

Graph-based methods show notable improvements over traditional models. For instance, a vanilla GAT [18], which captures static neighborhood relationships, attains 84.6% accuracy and 81.2% F1, yet still falls short of approaches that incorporate temporal dynamics. STSGCN [3], a spatiotemporal convolutional model, reaches 85.53% accuracy and 81.13% F1 but is limited by its fixed adjacency structure. Temporal GNNs such as TGAT [5] and TGN [6] further improve performance through time-aware attention yet still lag behind the top-performing models.

More advanced approaches demonstrate the benefits of adaptive graph structures and dynamic learning. MTGNN [4] achieves 86.82% accuracy and 84.5% F1, while EvolveGCN [13] achieves 86.03% accuracy and 82.13% F1, leveraging evolutionary weight updates. The transformer-based STGormer further boosts performance by modeling long-range dependencies, but its metrics still fall short of DeaGAT. Likewise, HP-DGNN, which incorporates Hawkes process modeling, improves AUC and AP but does not surpass DeaGAT’s performance.

Overall, these results demonstrate that DeaGAT’s integration of dynamic edge attention and contrastive learning effectively captures evolving spatial-temporal patterns, enabling the most accurate and robust predictions on the fire-alarm dataset.

4.4. Ablation Studies

We conduct ablation studies to evaluate the individual contributions of key components in the DeaGAT model. Specifically, we investigate the following elements: (i) the dynamic graph structure updating mechanism in the spatial module, (ii) the contrastive learning component for node representation enhancement, and (iii) the integration of static input features. To this end, we construct variant models in which each of these components is removed or disabled, and we assess the resulting impact on model performance.

First, we evaluate two simplified versions of the spatial module: one that retains only the GAT message-passing mechanism without contrastive learning and another that applies contrastive learning on node features without incorporating neighbor message passing.

Table 2 presents the classification metrics for each ablated variant and DeaGAT. As shown, removing either component leads to a noticeable drop in performance.

Specifically, the model without message passing achieves an accuracy of 68.74%, a recall of 61.78%, and an F1-score of 64.01%, indicating that without neighborhood aggregation, the model fails to capture crucial relational information. On the other hand, the variant without contrastive learning yields an accuracy of 75.23% and an F1-score of 73.29%, suggesting that while structural aggregation enables local pattern learning, it lacks the discriminative strength provided by contrastive guidance. By comparison, the full DeaGAT model achieves an accuracy of 88.64%, an F1-score of 87.31%, and a recall of 86.80%, confirming that the synergy between message passing and contrastive learning is essential for achieving better performance in fire alarm prediction. Notably, removing contrastive learning alone lowers accuracy by more than 13 percentage points, underscoring its critical role in producing robust and discriminative node embeddings.

Table 2 also reports the contribution of static features to predictive performance, assessing whether the model can still predict alarms solely from temporal feature variations in the absence of building-specific attributes.

To this end, we construct a “dynamic-only” model that omits static building attributes (using only time-series inputs). The dynamic graph updater and contrastive learning remain active, but the input feature set is restricted. The dynamic-only variant achieves an accuracy of 84.21%, a recall of 82.50%, and an F1-score of 83.00%, indicating that omitting static contextual information such as building size or hazard level significantly hampers overall performance. These findings confirm that static attributes provide essential risk priors while dynamic features convey real-time hazard evolution, and that DeaGAT’s strength derives from their joint exploitation within an adaptive graph structure and contrastive learning framework.
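The size of each ablation effect can be read directly off Table 2. The short sketch below recomputes the drop of every variant relative to the full model, in percentage points. It uses only the mean values reported in Table 2; the dictionary layout and the function name `drops_vs_full` are illustrative choices, not part of the original work.

```python
# Mean scores copied from Table 2 (standard deviations omitted).
TABLE_2 = {
    "Without GAT Message Passing": {"accuracy": 0.6874, "precision": 0.6640, "recall": 0.6178, "f1": 0.6401},
    "Without Contrastive Learning": {"accuracy": 0.7523, "precision": 0.7475, "recall": 0.7190, "f1": 0.7329},
    "Without Static Features": {"accuracy": 0.8421, "precision": 0.8350, "recall": 0.8250, "f1": 0.8300},
    "DeaGAT": {"accuracy": 0.8864, "precision": 0.8783, "recall": 0.8680, "f1": 0.8731},
}


def drops_vs_full(table, full="DeaGAT"):
    """Return the drop of each ablated variant relative to the full model, in percentage points."""
    ref = table[full]
    return {
        name: {metric: round((ref[metric] - value) * 100, 2) for metric, value in scores.items()}
        for name, scores in table.items()
        if name != full
    }


drops = drops_vs_full(TABLE_2)
for name, d in drops.items():
    print(f"{name}: accuracy -{d['accuracy']:.2f} pp, F1 -{d['f1']:.2f} pp")
```

Read this way, the variants rank by accuracy loss as message passing (19.90 points), contrastive learning (13.41 points), and static features (4.43 points).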

4.5. Hyperparameter Sensitivity

We further investigate the sensitivity of the DeaGAT model to key hyperparameters (Figure 2), including the learning rate $η$, the attention threshold k for graph updating, and the contrastive loss margin m. For each hyperparameter, we vary its value while keeping all other settings fixed and report the resulting model performance on a held-out validation set. This procedure allows us to systematically assess how each hyperparameter influences the predictive accuracy of DeaGAT.
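The one-at-a-time procedure described above can be sketched as follows. This is a minimal illustration, not the authors' tuning script: `toy_validation_score` is a hypothetical stand-in for "train DeaGAT with this configuration and return validation accuracy", and the grids for the learning rate and the margin are assumed values chosen for the example.

```python
def one_at_a_time_sweep(defaults, grids, evaluate):
    """Vary each hyperparameter over its grid while holding the others at their defaults."""
    results = {}
    for name, values in grids.items():
        results[name] = [(v, evaluate({**defaults, name: v})) for v in values]
    return results


def best_values(results):
    """Pick, for each hyperparameter, the grid value with the highest score."""
    return {name: max(pairs, key=lambda p: p[1])[0] for name, pairs in results.items()}


def toy_validation_score(cfg):
    # Hypothetical stand-in for a real training run; it simply peaks at the
    # settings reported in the text so that the sweep has something to find.
    return 1.0 - abs(cfg["eta"] - 0.05) - abs(cfg["k"] - 0.3) - abs(cfg["m"] - 0.9)


defaults = {"eta": 0.05, "k": 0.3, "m": 0.9}
grids = {
    "eta": [0.001, 0.01, 0.05, 0.1],      # assumed grid
    "k": [0.1, 0.2, 0.3, 0.4, 0.5, 0.6],  # range 0.1 to 0.6, as tested in the text
    "m": [0.3, 0.5, 0.7, 0.9, 1.1],       # assumed grid
}
results = one_at_a_time_sweep(defaults, grids, toy_validation_score)
print(best_values(results))
```

In practice `evaluate` would wrap a full training run, and each grid point would be averaged over several seeds, as is done for the curves in Figure 2.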

Learning rate ( $η$ ): We train the model using different learning rates to examine their impact on convergence and accuracy. As shown in Figure 2a, very small learning rates (below 0.01) yield only marginal accuracy improvements but significantly prolong the training time, indicating slow convergence. In contrast, excessively large values (above 0.05) cause the model to diverge or converge to suboptimal solutions due to overshooting.

An intermediate learning rate around $η = 0.05$ provides the best trade-off, achieving the highest accuracy while ensuring stable convergence. Thus, we set $η \approx 0.05$ as the optimal value for our proposed DeaGAT.

Attention threshold ( $k$ ): This threshold in the spatial module determines how dynamically updated edges are selected based on attention weights. A higher k results in a sparser graph (fewer retained edges), while a lower k retains more connections. We test k in the range of 0.1 to 0.6. As shown in Figure 2b, the model achieves peak performance around $k = 0.3$, where the attention mechanism effectively filters out weak or noisy connections while preserving the most informative relationships. Higher values (e.g., $k > 0.3$) lead to excessive sparsity and the loss of crucial inter-node interactions, while lower values (e.g., $k = 0.1$) risk overfitting to irrelevant or weak relationships. Thus, a moderate threshold ($k = 0.3$) offers the best balance between informative structure and noise suppression.
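To make the effect of k on sparsity concrete, the sketch below keeps an edge only when its attention weight is at least k. It assumes plain thresholding of a dense attention matrix; the exact edge-selection rule of the spatial module is not restated in this section, and the 4-node attention matrix is a made-up toy example.

```python
import numpy as np


def sparsify_by_attention(attn, k):
    """Keep edge (i, j) only if its attention weight is at least k.

    attn: (N, N) array of attention weights; k: threshold.
    Returns the masked weight matrix and the boolean adjacency mask.
    """
    attn = np.asarray(attn, dtype=float)
    mask = attn >= k
    return np.where(mask, attn, 0.0), mask


# Toy attention matrix (hypothetical values; each row sums to 1, no self-loops).
attn = np.array([
    [0.00, 0.55, 0.35, 0.10],
    [0.40, 0.00, 0.32, 0.28],
    [0.05, 0.65, 0.00, 0.30],
    [0.15, 0.20, 0.65, 0.00],
])

edge_counts = {}
for k in (0.1, 0.3, 0.6):
    _, mask = sparsify_by_attention(attn, k)
    edge_counts[k] = int(mask.sum())
    print(f"k={k}: {edge_counts[k]} edges retained")
```

On this toy matrix the number of retained edges falls from 11 at k = 0.1 to 7 at k = 0.3 and 2 at k = 0.6, which mirrors the trade-off described above: a low threshold keeps weak links, a high one removes most of the graph.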

Contrastive margin ( $m$ ): The margin m in the contrastive loss determines how far apart representations of unconnected nodes are pushed in the embedding space. We experiment with various values of m. As illustrated in Figure 2c, performance improves as m increases, peaking at $m = 0.9$. This value provides a clear separation between unrelated node pairs without destabilizing training. Smaller margins weaken the contrastive effect, leaving the embeddings of dissimilar nodes insufficiently separated. Larger margins overemphasize negative pairs, potentially degrading the learning of subtle similarities among nodes. The setting $m = 0.9$ achieves the highest F1-score, demonstrating a well-balanced separation that supports both generalization and discrimination.

Overall, the DeaGAT model is not overly sensitive to small deviations in hyperparameters around their optimal values. Appropriate tuning of the learning rate, attention threshold, and contrastive margin can enhance performance, but the model remains robust within a reasonable range. Our empirical results confirm that the chosen settings ($η = 0.05$, $k = 0.3$, $m = 0.9$) represent a near-optimal configuration for the task, offering strong and stable performance.
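The role of the margin m can be illustrated with a small sketch of a margin-based contrastive loss. The form used here is the classic pairwise loss of Chopra et al. [20]: connected pairs are pulled together, and unconnected pairs are penalized only while they are closer than m. Treating this as the loss used in DeaGAT is an assumption; the function name and the toy embeddings are hypothetical.

```python
import numpy as np


def margin_contrastive_loss(z, pairs, connected, m=0.9):
    """Pairwise margin loss: d^2 for connected pairs, max(0, m - d)^2 otherwise.

    z: (N, d) node embeddings; pairs: (P, 2) integer node indices;
    connected: (P,) booleans marking which pairs share an edge.
    """
    z = np.asarray(z, dtype=float)
    pairs = np.asarray(pairs)
    connected = np.asarray(connected, dtype=bool)
    d = np.linalg.norm(z[pairs[:, 0]] - z[pairs[:, 1]], axis=1)
    pull = d ** 2                       # connected nodes: shrink the distance
    push = np.maximum(0.0, m - d) ** 2  # unconnected nodes: enforce distance >= m
    return float(np.mean(np.where(connected, pull, push)))


# Toy embeddings: nodes 0 and 1 are 0.5 apart, nodes 0 and 2 are 5.0 apart.
z = np.array([[0.0, 0.0], [0.3, 0.4], [3.0, 4.0]])

close_unconnected = margin_contrastive_loss(z, [[0, 1]], [False], m=0.9)
far_unconnected = margin_contrastive_loss(z, [[0, 2]], [False], m=0.9)
print(close_unconnected, far_unconnected)
```

An unconnected pair that already lies beyond the margin contributes zero loss, so a small m leaves most negative pairs unpenalized, while a large m keeps pushing pairs that are already well separated.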

4.6. Case Study on Interpretability

To investigate the interpretability of DeaGAT, we conducted a case study focusing on Node 1350. Specifically, we analyzed a fire alarm event that occurred on 29 July 2022 and tracked the 15 nodes with the highest attention weights in the graph both before and after the alarm. Statistical analysis of these key attention values was performed to examine how the model dynamically shifts focus across nodes in response to evolving conditions. The results, which illustrate the temporal redistribution of attention and highlight nodes that became critical leading up to the alarm, are presented in Figure 3.

From Figure 3, several insights can be drawn:

Early detection of critical nodes: Node 1350 exhibits a prominently high attention weight of 0.62 prior to the alarm event, which is substantially higher than that of other nodes. This observation demonstrates that DeaGAT can successfully identify pivotal risk nodes in advance, underscoring the model’s effective early-warning capability in spatio-temporal prediction tasks.

Dynamic adaptation of attention: Following the alarm, the attention weight of Node 1350 decreases slightly to 0.60, while nodes closely associated with it, such as 1346 and 1351, experience an increase. Concurrently, Nodes 922, 701, and 373 show substantial attention growth, indicating the model’s ability to dynamically redistribute focus to emergent risk nodes as the event unfolds.

Context-sensitive modulation of node importance: Nodes that initially had elevated attention, such as 1347, demonstrate a reduction post-alarm, suggesting that their relative influence diminishes once primary risk nodes are activated. Conversely, nodes with previously low attention, including 54 and 105, maintain minimal weights, reflecting the model’s selective focus on nodes most relevant to the evolving risk scenario.

Collectively, this case study demonstrates the strong interpretability of DeaGAT, as the evolution of attention weights provides clear evidence of how the model prioritizes nodes under different risk conditions. By highlighting critical nodes before the alarm and adaptively redistributing attention to correlated nodes after the event, DeaGAT offers transparent insights into the mechanisms driving its predictions, thereby enhancing trust and practical applicability in safety-critical scenarios such as fire hazard forecasting.
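The before/after comparison used in this case study amounts to ranking nodes by attention weight in each snapshot and differencing the two. A minimal sketch is given below. Only the two weights of Node 1350 (0.62 and 0.60) come from the text; node IDs 1 to 4 and their weights are hypothetical placeholders, and the helper names are illustrative.

```python
def top_k_nodes(attention, k=15):
    """Return the k (node_id, weight) pairs with the highest attention weights."""
    return sorted(attention.items(), key=lambda kv: kv[1], reverse=True)[:k]


def attention_shift(before, after):
    """Per-node change in attention (after minus before); missing nodes count as 0."""
    nodes = set(before) | set(after)
    return {n: round(after.get(n, 0.0) - before.get(n, 0.0), 4) for n in nodes}


# Node 1350 uses the values reported in the text; nodes 1-4 are placeholders.
before = {1350: 0.62, 1: 0.20, 2: 0.10, 3: 0.05}
after = {1350: 0.60, 1: 0.15, 2: 0.18, 4: 0.07}

print(top_k_nodes(before, k=2))
shift = attention_shift(before, after)
for node, delta in sorted(shift.items(), key=lambda kv: kv[1], reverse=True):
    print(f"node {node}: {delta:+.2f}")
```

Sorting the shifts separates nodes that gain attention after the event from those that lose it, which is the pattern Figure 3 visualizes for the top 15 nodes.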

5. Conclusions

In this study, we proposed a novel predictive model, DeaGAT, designed to enhance the early warning capabilities of fire alarm systems. Built upon the GAT architecture, the method effectively captures complex inter-node relationships and dynamically adjusts edge weights to reflect evolving spatial dependencies. By integrating contrastive learning, DeaGAT improves the discrimination of fire risk states and highlights key environmental factors contributing to alarm events. Extensive experimental results demonstrated that our method outperforms existing spatio-temporal data mining and machine learning approaches, particularly in dynamic edge adaptation, risk factor analysis, and generalization across diverse scenarios.

Looking forward, DeaGAT has the potential to be extended to a broader range of domains, including traffic flow management, public health monitoring, and environmental protection. To apply the model in these domains, certain adaptations may be required: for example, incorporating domain-specific node and edge features, handling heterogeneous or missing data, and ensuring scalability to larger and more complex networks. Moreover, practical deployment may face challenges such as real-time data streaming, system integration, and robustness to noisy or incomplete information. In particular, the reliability of DeaGAT under perturbations—such as noise in node features or temporal signals—requires careful consideration, as such uncertainties could affect attention-based edge refinement or contrastive loss components. Future work could explore strategies like adversarial training, robust attention mechanisms, or uncertainty-aware contrastive losses to enhance robustness. By addressing these challenges, DeaGAT can provide interpretable and reliable predictions, offering valuable insights for decision-making and policy support in diverse urban and societal contexts.

Author Contributions

Validation, S.J.; Data curation, S.J.; Writing—original draft, Y.D.; Writing—review and editing, Z.X. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

The dataset used in this study was obtained from internal company records and contains sensitive information. Due to confidentiality and privacy restrictions, the data cannot be made publicly available.

Conflicts of Interest

The authors declare no conflicts of interest.

References

1. Gupta, S.; Kanwar, S.; Kashyap, M. Performance characteristics and assessment of fire alarm system. Mater. Today Proc. 2022, 57, 2036–2040.
2. Seo, Y.; Defferrard, M.; Vandergheynst, P.; Bresson, X. Structured sequence modeling with graph convolutional recurrent networks. In Proceedings of the Neural Information Processing: 25th International Conference, ICONIP 2018, Siem Reap, Cambodia, 13–16 December 2018; Proceedings, Part I 25. Springer: Berlin/Heidelberg, Germany, 2018; pp. 362–373.
3. Song, C.; Lin, Y.; Guo, S.; Wan, H. Spatial-temporal synchronous graph convolutional networks: A new framework for spatial-temporal network data forecasting. In Proceedings of the AAAI Conference on Artificial Intelligence, New York, NY, USA, 7–12 February 2020; Volume 34, pp. 914–921.
4. Gao, J.; Zhang, X.; Tian, L.; Liu, Y.; Wang, J.; Li, Z.; Hu, X. MTGNN: Multi-Task Graph Neural Network based few-shot learning for disease similarity measurement. Methods 2022, 198, 88–95.
5. Xu, D.; Ruan, C.; Korpeoglu, E.; Kumar, S.; Achan, K. Inductive representation learning on temporal graphs. arXiv 2020, arXiv:2002.07962.
6. Rossi, E.; Chamberlain, B.; Frasca, F.; Eynard, D.; Monti, F.; Bronstein, M. Temporal graph networks for deep learning on dynamic graphs. arXiv 2020, arXiv:2006.10637.
7. Zhang, Y.; Geng, P.; Sivaparthipan, C.; Muthu, B.A. Big data and artificial intelligence based early risk warning system of fire hazard for smart cities. Sustain. Energy Technol. Assessments 2021, 45, 100986.
8. Liu, H.H.; Chang, R.Y.; Chen, Y.Y.; Fu, I.K.; Poor, H.V. Sensor deployment and link analysis in satellite IoT systems for wildfire detection. In Proceedings of the GLOBECOM 2022—2022 IEEE Global Communications Conference, Rio de Janeiro, RJ, Brazil, 4–8 December 2022; IEEE: Piscataway, NJ, USA, 2022; pp. 5631–5636.
9. Wang, R.; Li, Y.; Sun, H.; Yang, K. Multisensor-Weighted Fusion Algorithm Based on Improved AHP for Aircraft Fire Detection. Complexity 2021, 2021, 8704924.
10. Hengl, T.; Nussbaum, M.; Wright, M.N.; Heuvelink, G.B.; Gräler, B. Random forest as a generic framework for predictive modeling of spatial and spatio-temporal variables. PeerJ 2018, 6, e5518.
11. Mourao-Miranda, J.; Friston, K.J.; Brammer, M. Dynamic discrimination analysis: A spatial–temporal SVM. Neuroimage 2007, 36, 88–99.
12. Chang, D.; Cui, L.; Huang, Z. A cellular-automaton agent-hybrid model for emergency evacuation of people in public places. IEEE Access 2020, 8, 79541–79551.
13. Pareja, A.; Domeniconi, G.; Chen, J.; Ma, T.; Suzumura, T.; Kanezashi, H.; Kaler, T.; Schardl, T.; Leiserson, C. Evolvegcn: Evolving graph convolutional networks for dynamic graphs. In Proceedings of the AAAI Conference on Artificial Intelligence, New York, NY, USA, 7–12 February 2020; Volume 34, pp. 5363–5370.
14. Zhou, J.; Liu, E.; Chen, W.; Zhong, S.; Liang, Y. Navigating Spatio-Temporal Heterogeneity: A Graph Transformer Approach for Traffic Forecasting. arXiv 2024, arXiv:2408.10822.
15. Lin, J.; Ren, Q. Rethinking Spatio-Temporal Transformer for Traffic Prediction: Multi-level Multi-view Augmented Learning Framework. arXiv 2024, arXiv:2406.11921.
16. Wen, H.; Lin, Y.; Xia, Y.; Wan, H.; Wen, Q.; Zimmermann, R.; Liang, Y. Diffstg: Probabilistic spatio-temporal graph forecasting with denoising diffusion models. In Proceedings of the 31st ACM International Conference on Advances in Geographic Information Systems, Hamburg, Germany, 13–16 November 2023; pp. 1–12.
17. Wang, Z.; Hu, B.; Yao, K.; Liang, J. Hawkes Point Process-enhanced Dynamic Graph Neural Network. In Proceedings of the Eighteenth ACM International Conference on Web Search and Data Mining, Hannover, Germany, 10–14 March 2025; pp. 401–409.
18. Veličković, P.; Cucurull, G.; Casanova, A.; Romero, A.; Lio, P.; Bengio, Y. Graph attention networks. arXiv 2017, arXiv:1710.10903.
19. Kipf, T.N.; Welling, M. Semi-supervised classification with graph convolutional networks. arXiv 2016, arXiv:1609.02907.
20. Chopra, S.; Hadsell, R.; LeCun, Y. Learning a similarity metric discriminatively, with application to face verification. In Proceedings of the 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR’05), San Diego, CA, USA, 21–23 September 2005; IEEE: Piscataway, NJ, USA, 2005; Volume 1, pp. 539–546.
21. Bridle, J. Training stochastic model recognition algorithms as networks can lead to maximum mutual information estimation of parameters. Adv. Neural Inf. Process. Syst. 1989, 2, 211–217.
22. Hinton, G.E.; Dayan, P.; Frey, B.J.; Neal, R.M. The “wake-sleep” algorithm for unsupervised neural networks. Science 1995, 268, 1158–1161.
23. Wang, Y.; Li, J.; Zhao, W.; Han, Z.; Zhao, H.; Wang, L.; He, X. N-STGAT: Spatio-temporal graph neural network based network intrusion detection for near-earth remote sensing. Remote Sens. 2023, 15, 3611.
24. Wang, L.; Huang, W.; Zhang, M.; Pan, S.; Chang, X.; Su, S.W. Pruning graph neural networks by evaluating edge properties. Knowl.-Based Syst. 2022, 256, 109847.
25. Zhao, L.; Song, Y.; Zhang, C.; Liu, Y.; Wang, P.; Lin, T.; Deng, M.; Li, H. T-GCN: A temporal graph convolutional network for traffic prediction. IEEE Trans. Intell. Transp. Syst. 2019, 21, 3848–3858.
26. Sahili, Z.A.; Awad, M. Spatio-temporal graph neural networks: A survey. arXiv 2023, arXiv:2301.10569.

Figure 1. The model consists of two key components: a spatial feature extraction module and a temporal-aware graph update module. The spatial module captures spatial correlations by embedding static node attributes using a graph attention network. The temporal module predicts future alarms by modeling temporal dependencies with dynamic graph updates, where the graph structure is adaptively refined at each time step based on the evolving node embeddings.

Figure 2. Hyperparameter sensitivity analysis. Each curve represents the average of five independent runs per hyperparameter setting to ensure robustness and generalizability of the conclusions. (a) Impact of different learning rates on model accuracy. (b) Impact of the attention threshold k (graph sparsity) on model performance. (c) Impact of the contrastive margin m on model performance.

Figure 3. Visualization of attention weight distribution among key nodes before and after the alarm event. (a) Top 15 nodes with highest attention weights before the alarm event; (b) Top 15 nodes with highest attention weights after the alarm event.

Table 1. Comparison of prediction performance (mean ± standard deviation) with baseline methods. Boldface indicates the best result in each column. We added divider lines in the table to separate traditional machine learning methods, static graph network methods, dynamic graph network methods, and our proposed method.

Method	Accuracy	Precision	Recall	F1-Score	AUC	AP
Random Forest [10]	0.7415 ± 0.004	0.7202 ± 0.005	0.6850 ± 0.006	0.7020 ± 0.005	0.7814 ± 0.007	0.7453 ± 0.008
SVM [11]	0.7568 ± 0.006	0.7437 ± 0.007	0.7026 ± 0.009	0.7226 ± 0.008	0.8012 ± 0.010	0.7645 ± 0.011
GAT [18]	0.8460 ± 0.005	0.8250 ± 0.006	0.8000 ± 0.008	0.8120 ± 0.007	0.8655 ± 0.007	0.8321 ± 0.008
TGAT [5]	0.8580 ± 0.006	0.8400 ± 0.005	0.8205 ± 0.007	0.8300 ± 0.006	0.8850 ± 0.008	0.8173 ± 0.007
TGN [6]	0.8620 ± 0.005	0.8450 ± 0.005	0.8302 ± 0.006	0.8370 ± 0.006	0.8891 ± 0.006	0.8265 ± 0.007
STSGCN [3]	0.8553 ± 0.003	0.8307 ± 0.005	0.7928 ± 0.006	0.8113 ± 0.005	0.8840 ± 0.005	0.8025 ± 0.006
EvolveGCN [13]	0.8603 ± 0.005	0.8381 ± 0.006	0.8051 ± 0.007	0.8213 ± 0.006	0.8875 ± 0.007	0.8204 ± 0.008
MTGNN [4]	0.8682 ± 0.004	0.8528 ± 0.004	0.8374 ± 0.005	0.8450 ± 0.005	0.8922 ± 0.006	0.8541 ± 0.005
STGormer [14]	0.8770 ± 0.004	0.8680 ± 0.003	0.8600 ± 0.005	0.8640 ± 0.004	0.9025 ± 0.005	0.8742 ± 0.004
HP-DGNN [17]	0.8700 ± 0.005	0.8600 ± 0.005	0.8505 ± 0.006	0.8550 ± 0.005	0.8984 ± 0.006	0.8690 ± 0.006
DeaGAT	0.8864 ± 0.002	0.8783 ± 0.003	0.8680 ± 0.004	0.8731 ± 0.003	0.9147 ± 0.004	0.9025 ± 0.003

Table 2. Ablation study results with mean ± standard deviation.

Model Variant	Accuracy	Precision	Recall	F1-Score
Without GAT Message Passing	0.6874 ± 0.006	0.6640 ± 0.007	0.6178 ± 0.008	0.6401 ± 0.007
Without Contrastive Learning	0.7523 ± 0.005	0.7475 ± 0.006	0.7190 ± 0.006	0.7329 ± 0.005
Without Static Features	0.8421 ± 0.004	0.8350 ± 0.005	0.8250 ± 0.005	0.8300 ± 0.004
DeaGAT	0.8864 ± 0.002	0.8783 ± 0.003	0.8680 ± 0.004	0.8731 ± 0.003

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
