Article

Data-Centric Generative and Adaptive Detection Framework for Abnormal Transaction Prediction

1 Department of Cross-Border E-Commerce, School of Management, Guangdong University of Science and Technology, Dongguan 510006, China
2 National School of Development, Peking University, Beijing 100871, China
3 School of Economics and Management, Beijing University of Technology, Beijing 100124, China
4 College of Information and Electrical Engineering, China Agricultural University, Beijing 100083, China
5 School of Economics and Management, Tsinghua University, Beijing 100084, China
* Authors to whom correspondence should be addressed.
Electronics 2026, 15(3), 633; https://doi.org/10.3390/electronics15030633
Submission received: 4 December 2025 / Revised: 31 December 2025 / Accepted: 6 January 2026 / Published: 2 February 2026
(This article belongs to the Special Issue Machine Learning in Data Analytics and Prediction)

Abstract

Anomalous transaction behaviors in cryptocurrency markets exhibit high concealment, substantial diversity, and strong cross-modal coupling, making traditional rule-based or single-feature analytical methods insufficient for reliable detection in real-world environments. To address these challenges, a data-centric multimodal anomaly detection framework integrating generative augmentation, latent distribution modeling, and dual-branch real-time detection is proposed. The method employs a generative adversarial network with feature-consistency constraints to mitigate the scarcity of fraudulent samples, and adopts a multi-domain variational modeling strategy to learn the latent distribution of normal behaviors, enabling stable anomaly scoring. By combining the long-range temporal modeling capability of Transformer architectures with the sensitivity of online clustering to local structural deviations, the system dynamically integrates global and local information through an adaptive risk fusion mechanism, thereby enhancing robustness and real-time detection capability. Experimental results demonstrate that the generative augmentation module yields substantial improvements, increasing the recall from 0.421 to 0.671 and the F1-score to 0.692. In anomaly distribution modeling, the multi-domain VAE achieves an area under the curve (AUC) of 0.854 and an F1-score of 0.660, significantly outperforming traditional One-Class SVM and autoencoder baselines. Multimodal fusion experiments further verify the complementarity of the dual-branch detection structure, with the adaptive fusion model achieving an AUC of 0.884, an F1-score of 0.713, and a false positive rate reduced to 0.087. Ablation studies show that the complete model surpasses any individual module in terms of precision, recall, and F1-score, confirming the synergistic benefits of its integrated components.
Overall, the proposed framework achieves high accuracy and high recall in data-scarce, structurally complex, and latency-sensitive cryptocurrency scenarios, providing a scalable and efficient solution for deploying data-centric artificial intelligence in financial security applications.

1. Introduction

With the rapid development of blockchain technologies and the expanding crypto-finance ecosystem, the cryptocurrency market has become an indispensable component of the global financial system [1,2]. The number of global crypto-asset participants has reached hundreds of millions, and trading activities exhibit increasingly high-frequency, complex, and cross-chain characteristics [3]. Unlike traditional financial markets, crypto-asset transactions are characterized by decentralization, anonymity, and programmability. While these attributes enhance transactional efficiency and financial innovation, they also provide covert channels for a wide spectrum of anomalous behaviors. To systematically address these risks, crypto-asset anomalies can be formally categorized into three distinct dimensions: Illicit Transaction Flows (e.g., money-laundering layering, dark web payments, and terrorist financing) [4], Structural Behavioral Frauds (e.g., Ponzi schemes, phishing, and DeFi rug pulls), and Market Manipulation Patterns (e.g., wash trading and pump-and-dump schemes). Furthermore, the rise of DeFi and NFT (non-fungible token) ecosystems has introduced cross-domain complexity, blurring the boundaries between these categories and imposing substantial challenges on global financial regulation [5]. Under these circumstances, establishing an efficient, accurate, and real-time anomaly detection system [6,7] capable of identifying these multi-dimensional risks is of significant importance for maintaining market integrity and supporting anti-money laundering (AML), counter-terrorist financing (CFT), and financial risk control [8].
Early financial anomaly detection methods primarily relied on rule matching and statistical analysis [9]. Typical approaches involve expert-crafted rules such as abnormal trading frequency, sudden amount fluctuations, or frequent transfers between specific accounts [10]. These methods are intuitive and interpretable but rely heavily on expert experience and historical patterns, making them ineffective against emerging fraud strategies and complex relational behaviors [11]. Furthermore, rule updates usually lag behind market evolution, resulting in high false-negative rates and low recall [12]. Statistical modeling methods identify anomalies through probabilistic distributions or distance metrics, such as Gaussian mixture models (GMMs), the Mahalanobis distance, or isolation forests [13]. These approaches assume that normal transactions follow a stable distribution and that anomalies deviate from it in probabilistic terms [14]. However, crypto-asset markets exhibit strong non-stationarity and heavy-tailed distributions, with trading behaviors heavily influenced by external shocks such as crashes, regulatory shifts, and cyberattacks. Consequently, conventional statistical methods often fail in dynamic environments [15].
With the broader adoption of machine learning in financial risk control [16], supervised and semi-supervised learning models have been introduced for anomaly detection [17]. Common models include support vector machines (SVM), random forests (RF), XGBoost, and ensemble-based methods [18,19]. Although these models can achieve reasonable performance when supported by sufficient feature engineering, they face two major limitations: (1) fraudulent transaction samples are extremely scarce and severely imbalanced, causing overfitting and strong bias toward the majority class [20]; and (2) these models rely on manually designed low-dimensional features and thus cannot capture complex nonlinear relationships between transaction graphs and temporal dynamics [21]. Consequently, their generalization capability and real-time performance are limited in highly dynamic and evolving blockchain environments [22]. To overcome these limitations in high-dimensional complex data modeling, deep learning techniques have increasingly become the mainstream solution for financial anomaly detection [23]. Yu et al. [24] proposed a GAN-based real-time transactional anomaly detection framework achieving 94.7% accuracy with latency below 3 ms. James Uche et al. [25] integrated explainable AI with generative models for real-time fraud monitoring, improving robustness under adversarial perturbations. Dixit et al. [26] combined advanced generative models with temporal attention, integrating WGAN-GP, feature preservation, and adaptive thresholding to enhance detection performance and maintain millisecond-level latency. Qu et al. [27] introduced MFGAN, a multimodal anomaly detection framework combining attention-enhanced autoencoders (AEs) and GANs, yielding an approximately 5.6% improvement in F1-score on real industrial sensor data. Chen et al. [28] proposed a multimodal anomaly detection method fusing time-domain and frequency-domain features, achieving a precision of 97.6% and an F1-score of 0.951 in regional power grid monitoring. Moreover, hybrid architectures integrating Transformers with other deep learning branches have shown great potential in complex classification tasks; for instance, recent work has proposed a feature cross-layer interaction method based on Res2Net and Transformers to effectively extract and fuse complementary feature information [29]. However, several fundamental challenges remain: scarcity and imbalance of fraudulent transaction samples, which limit deep model training; difficulty in multimodal data fusion due to the heterogeneity and asynchrony between on-chain structural features and off-chain price dynamics; and interpretability and scalability constraints, as black-box deep models pose challenges for compliance auditing and regulatory adoption.
To address these issues, a multimodal real-time anomaly detection framework is proposed, referred to as the Real-time Multi-modal Anomaly Detection Framework for Crypto-assets (RMAD-Crypto). Specifically, the main innovations of this work include:
  • An integrated generation–detection mechanism: A GAN-based fraudulent sample generator is introduced to synthesize high-fidelity and diverse fraudulent transactions, mitigating data imbalance and overfitting; the discriminator further assists in anomaly confidence estimation during detection;
  • Multi-domain latent distribution modeling: A VAE-based feature encoding network is designed to map on-chain structures, behavioral patterns, and price dynamics into a unified latent space, where anomalies are quantified through reconstruction error and latent density estimation;
  • Cross-modal temporal detection: A dual-branch detection module combining Transformer prediction and online clustering is developed; the Transformer branch captures long-range, cross-modal dependencies, while the clustering branch performs real-time deviation detection, and their outputs are fused for robust anomaly assessment;
  • Real-time and scalable architecture: The framework supports streaming input and online updates, and its modular design enables deployment across multiple blockchains such as BTC, ETH, and BSC;
  • Empirical performance improvement: Experiments on real-world crypto-asset datasets demonstrate that recall improves by approximately 15–25% compared with traditional models, while maintaining millisecond-level latency.

2. Related Work

2.1. Financial Abnormal Transaction Detection Methods

The primary objective of financial anomaly detection is to identify potential abnormal samples or fraudulent behaviors from large volumes of normal transactions [30]. Traditional approaches can be broadly categorized into rule-based methods, statistical modeling methods, and machine learning–based methods [31]. Rule-based methods often rely on expert knowledge and regulatory experience, defining thresholds or logical constraints to determine suspicious transactions [10]. Although straightforward and interpretable, threshold-based detection is highly sensitive to market dynamics. Statistical modeling methods effectively capture feature correlations; however, they rely on the assumption of stable and estimable data distributions [32]. Due to the high volatility and structural drift observed in cryptocurrency markets, this assumption often fails, limiting the generalization capability of statistical models in real-world trading environments [33]. With the development of machine learning techniques, supervised and semi-supervised approaches have been introduced into anomaly detection [34]. Nevertheless, in cryptocurrency environments, anomalies typically account for less than 1% of all transactions, causing severe class imbalance that hinders effective boundary learning [35]. Furthermore, due to the complex structural relations and strong temporal dependencies in blockchain transactions, conventional machine learning methods struggle to capture high-dimensional nonlinear behavioral patterns, leading to reduced detection performance [36].
To overcome these limitations, recent research has increasingly focused on hybrid deep learning frameworks that synergize Generative Adversarial Networks (GANs) with advanced sequential models, such as Long Short-Term Memory (LSTM) networks, Gated Recurrent Units (GRU), and Transformers. These combinations aim to simultaneously resolve the data scarcity of fraudulent samples and the modeling of intricate temporal dependencies. For instance, Sanjalawe et al. proposed a GAN-based real-time framework that synthesizes high-fidelity transaction patterns [20]. Moving beyond basic recurrence, Dixit et al. integrated generative models with temporal attention mechanisms, employing WGAN-GP to stabilize training and adaptive thresholding to capture long-range dependencies that traditional RNNs often miss [26]. Similarly, Qu et al. introduced MFGAN, a multimodal architecture that fuses attention-enhanced autoencoders with GANs, demonstrating superior performance in identifying complex anomalies within high-dimensional sensor data [27]. These state-of-the-art approaches leverage the generative capability to mitigate class imbalance and the self-attention mechanisms of Transformers to extract deep semantic contexts, offering a more robust solution for dynamic cryptocurrency environments compared to early single-modal baselines.

2.2. Applications of Generative Adversarial Networks and Anomaly Distribution Modeling in Financial Security

Generative modeling techniques have attracted increasing attention in the domain of financial security [37]. In particular, generative adversarial networks (GAN) [38] and variational autoencoders (VAE) have demonstrated substantial potential in fraud sample synthesis and anomaly distribution learning, offering solutions to data scarcity and distribution imbalance challenges [39]. In financial applications, GANs have been employed to synthesize high-fidelity fraudulent transactions, improving class balance and enhancing detection robustness [40]. However, GAN training is notoriously unstable, and evaluating the authenticity of generated samples remains challenging, especially under high-dimensional multimodal financial data where mode collapse may occur [41]. In contrast, VAEs model latent data distributions through probabilistic graph structures. In cryptocurrency markets, the combination of GAN and VAE enables a synergistic “generation–modeling” mechanism: GANs address fraudulent sample scarcity through synthetic augmentation, while VAEs capture multimodal latent distribution patterns for precise anomaly characterization [42]. Despite their theoretical complementarity, several challenges remain, such as instability in high-dimensional training, multimodal synchronization issues, and limited interpretability of generated samples [43].

2.3. Multimodal Behavior Modeling and Detection for Crypto-Assets

Data structures in cryptocurrency markets exhibit strong multimodal characteristics [44]. On-chain transaction data can be represented as graph structures, where nodes correspond to wallet addresses and edges represent fund transfers [45]. Meanwhile, off-chain market variables—such as prices, trading volume, and volatility—form continuous time series [46]. Modeling the relationship between these two modalities is essential for precise anomaly detection.
Through multi-layer propagation, graph convolutional networks (GCNs) capture structural dependencies and community-level behaviors, supporting the detection of money-laundering networks. Market time series are represented as $\{y_t\}_{t=1}^{T}$, and their dynamics can be modeled using autoregressive models, long short-term memory (LSTM) networks, or Transformer architectures. Although capable of modeling long-term dependencies, LSTMs suffer from vanishing gradients under high-frequency data [47]. Transformers overcome this issue by employing self-attention mechanisms, which allow global dependency modeling and are particularly sensitive to abrupt anomalous patterns. Recent trends have shifted toward multimodal fusion. By aligning on-chain graph embeddings and time-series representations in a shared latent space, a unified multimodal representation can be constructed to facilitate cross-domain information interaction [48]. However, most existing studies focus primarily on offline detection, lacking mechanisms for real-time processing and generative augmentation. In high-frequency trading environments, models must simultaneously support data augmentation, dynamic modeling, and real-time alerting, imposing significant architectural challenges [49].

3. Materials and Method

3.1. Data Collection

The dataset used in this study is composed of on-chain transaction records, market price time-series data, and labels derived from publicly documented abnormal events, aiming to capture multidimensional information ranging from structural behaviors to market dynamics within cryptocurrency trading ecosystems, as shown in Table 1. The on-chain data originate from two major public blockchains, Bitcoin and Ethereum, and are continuously collected through full-node infrastructures (Bitcoin Core and Geth) using block-level data parsing tools. Raw blocks are gathered from 2017 to the first quarter of 2024, containing address creation events, transaction initiations, input and output amounts, transaction fees, block heights, timestamps, and structural relationships among transactions. Block data are indexed by block height and scanned at intervals of 10 min for Bitcoin and 12 s for Ethereum. Each transaction is parsed into a structured format, and node activity, connectivity, fund flow paths, and interaction patterns within the transaction graph are recorded to support the construction of high–temporal-resolution behavioral networks.
Market price time-series data are retrieved from public APIs provided by major exchanges including Binance, Coinbase Pro, and OKX. These data cover minute-level K-line records since 2018, including open, close, high, and low prices, trading volume, order-book depth, and volatility indices. To maintain temporal consistency, all price data are aligned to UTC time (coordinated universal time) and matched against on-chain timestamps for cross-modal synchronization. As certain exchanges may exhibit data gaps under extreme market conditions, historical sequences are validated on a daily basis and cross-referenced with third-party sources such as CryptoCompare to ensure continuity and authenticity of volatility patterns.
Event labels are compiled from multiple authoritative sources, including AML case libraries released by Elliptic and Chainalysis, reported fraud incidents from news outlets, high-risk address lists designated by FATF (financial action task force), and community-maintained blacklist repositories such as BitcoinAbuse. The labeling scope spans representative fraudulent activities from 2017 to 2024, including money-laundering chains, Ponzi schemes, post-exchange-attack fund flows, and cross-address hopping behaviors. Each label is associated at the address or transaction level and is validated via multi-source cross-verification to ensure annotation reliability and confidence.

3.2. Data Preprocessing and Augmentation

In cryptocurrency anomaly detection, data quality plays a decisive role in determining the performance and generalization capability of the detection model. Given the large volume, complex structure, and high noise level of blockchain data, systematic preprocessing and augmentation are required prior to model training to ensure statistical consistency and structural fidelity of the inputs. The preprocessing pipeline includes data cleaning, feature construction, and data augmentation, with the objective of producing a high-quality and structured multimodal dataset that supports both the generative and detection modules. Data cleaning primarily addresses redundancy and noise in blockchain transaction records. Due to the distributed nature of blockchain systems, duplicated transactions, invalid addresses, and missing fields frequently appear. Let the raw dataset be denoted as $\mathcal{D}_{\text{raw}} = \{x_i, y_i, t_i\}_{i=1}^{N}$, where $x_i$ denotes the transaction feature vector, $y_i$ the label, and $t_i$ the timestamp. Samples with a missing ratio exceeding a threshold $\alpha$ are removed. For each sample, the missing ratio is defined as
$$r_i = \frac{\mathrm{count}(\mathrm{missing}(x_i))}{\dim(x_i)},$$
and samples with $r_i > \alpha$ are eliminated, producing the cleaned dataset $\mathcal{D}_{\text{clean}}$. Transactions sharing identical hashes or block heights are treated as duplicates and removed through $\mathcal{D}_{\text{clean}} \leftarrow \mathrm{unique}(\mathcal{D}_{\text{clean}})$.
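The two cleaning rules above can be sketched in a few lines of Python. The record layout (`features`, `tx_hash`, and so on) is our own illustrative choice, not the paper's actual schema:

```python
# Illustrative sketch of the cleaning rules in Section 3.2: drop samples
# whose missing-value ratio r_i exceeds alpha, then deduplicate by hash.

def missing_ratio(x):
    """Fraction of missing (None) entries in a feature vector."""
    return sum(v is None for v in x) / len(x)

def clean(raw, alpha=0.3):
    """raw: list of dicts with 'features', 'label', 'ts', 'tx_hash' keys."""
    # Rule 1: remove samples with r_i > alpha.
    kept = [s for s in raw if missing_ratio(s["features"]) <= alpha]
    # Rule 2: treat identical hashes as duplicates; keep first occurrence.
    seen, deduped = set(), []
    for s in kept:
        if s["tx_hash"] not in seen:
            seen.add(s["tx_hash"])
            deduped.append(s)
    return deduped

raw = [
    {"features": [1.0, None, 2.0, None], "label": 0, "ts": 1, "tx_hash": "a"},  # r_i = 0.5, dropped
    {"features": [1.0, 2.0, 3.0, 4.0],   "label": 0, "ts": 2, "tx_hash": "b"},
    {"features": [1.0, 2.0, 3.0, None],  "label": 1, "ts": 3, "tx_hash": "b"},  # duplicate hash, dropped
]
print([s["tx_hash"] for s in clean(raw, alpha=0.3)])  # ['b']
```

In a production pipeline these checks would run over streamed block data, but the filtering logic is unchanged.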
Feature construction transforms heterogeneous on-chain structures and off-chain price sequences into vectorized representations. The on-chain transaction network is modeled as a directed graph $G = (V, E, X)$, where $V$ is the set of wallet nodes, $E$ the transaction edges, and $X \in \mathbb{R}^{|V| \times d}$ the node feature matrix. For any node $v_i$, the degree feature is expressed as
$$d_i = \sum_{j=1}^{|V|} A_{ij},$$
where $A_{ij} = 1$ if a transaction exists between $v_i$ and $v_j$. Local centrality or weighted degree $C_i$ is computed as
$$C_i = \sum_{j \in N(i)} w_{ij},$$
where $w_{ij}$ denotes transaction amount or interaction frequency. Price and volume data form a temporal sequence $\{p_t, v_t\}_{t=1}^{T}$. Short- and long-term dynamics are captured through the moving average
$$MA_t = \frac{1}{L} \sum_{k=0}^{L-1} p_{t-k},$$
and volatility
$$\sigma_t = \sqrt{\frac{1}{L-1} \sum_{k=0}^{L-1} (p_{t-k} - MA_t)^2}.$$
The smoothed return is given by
$$r_t = \beta_1 r_{t-1} + \beta_2 r_{t-2} + \epsilon_t.$$
Graph features $x_i^{(g)}$ and temporal features $x_i^{(t)}$ form the multimodal representation $x_i = [x_i^{(g)}, x_i^{(t)}]$.
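As a toy illustration, the hand-built features above can be computed directly from a dense adjacency matrix and a price list. All variable and function names here are ours, chosen for readability:

```python
import math

# Toy computation of the features in Section 3.2: node degree d_i, weighted
# centrality C_i, moving average MA_t, and rolling volatility sigma_t.

def degree(A, i):
    return sum(A[i])                          # d_i = sum_j A_ij

def weighted_centrality(W, i):
    return sum(W[i])                          # C_i = sum_{j in N(i)} w_ij

def moving_average(prices, t, L):
    window = prices[t - L + 1: t + 1]         # p_{t-L+1}, ..., p_t
    return sum(window) / L

def volatility(prices, t, L):
    ma = moving_average(prices, t, L)
    var = sum((p - ma) ** 2 for p in prices[t - L + 1: t + 1]) / (L - 1)
    return math.sqrt(var)

A = [[0, 1, 1], [1, 0, 0], [1, 0, 0]]                     # toy adjacency
W = [[0.0, 2.5, 1.5], [2.5, 0.0, 0.0], [1.5, 0.0, 0.0]]   # toy edge weights
prices = [100.0, 102.0, 101.0, 105.0]

print(degree(A, 0))               # 2
print(weighted_centrality(W, 0))  # 4.0
print(moving_average(prices, 3, 3))  # mean of 102, 101, 105
```

The graph-side and price-side outputs would then be concatenated into the multimodal vector described in the text.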
To mitigate the extreme imbalance between normal and fraudulent transactions, a generative adversarial network (GAN) is employed to synthesize high-fidelity fraudulent samples. The generator $G$ learns the fraudulent distribution $p_{\text{fraud}}(x)$ and maps noise $z \sim p_z(z)$ to synthetic samples $\hat{x} = G(z)$, while the discriminator outputs the probability $D(x) \in [0, 1]$ that the input is real. The adversarial objective is
$$\min_G \max_D \; \mathcal{L}_{\text{GAN}}(D, G) = \mathbb{E}_{x \sim p_{\text{fraud}}(x)}[\log D(x)] + \mathbb{E}_{z \sim p_z(z)}[\log(1 - D(G(z)))].$$
A feature-preservation loss is added:
$$\mathcal{L}_{\text{feat}} = \mathbb{E}_{x, \hat{x}}\left[\| f(x) - f(\hat{x}) \|_2^2\right],$$
and the final generator objective is
$$\min_G \; \mathcal{L}_G = \mathcal{L}_{\text{GAN}} + \lambda \mathcal{L}_{\text{feat}}.$$
To ensure separability between normal and fraudulent latent distributions, the mean vectors $\mu_{\text{norm}}$ and $\mu_{\text{fraud}}$ are constrained by
$$\| \mu_{\text{norm}} - \mu_{\text{fraud}} \|_2 \geq \gamma.$$
Through these steps, the augmented dataset $\mathcal{D}_{\text{aug}} = \mathcal{D}_{\text{clean}} \cup \hat{\mathcal{D}}_{\text{fraud}}$ is constructed, where $\hat{\mathcal{D}}_{\text{fraud}}$ contains GAN-generated fraudulent transactions. Empirical results demonstrate that this preprocessing and augmentation pipeline substantially improves recall while maintaining low false-positive rates, forming a robust foundation for subsequent anomaly modeling and real-time detection.
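A minimal numpy sketch of the two auxiliary constraints above, the feature-preservation loss and the latent-separation check, is shown below. A fixed random projection stands in for the feature extractor f; in the actual framework f is a learned network, so everything named here is illustrative:

```python
import numpy as np

# Sketch of L_feat = E||f(x) - f(x_hat)||^2 and the latent-mean separation
# check ||mu_norm - mu_fraud|| >= gamma. The projection below is a random
# placeholder for the learned feature extractor f(.).

rng = np.random.default_rng(0)
proj = rng.standard_normal((8, 4))        # placeholder extractor weights

def f(x):
    return x @ proj

def feat_loss(x_real, x_fake):
    """Batch estimate of the feature-preservation loss L_feat."""
    d = f(x_real) - f(x_fake)
    return float(np.mean(np.sum(d * d, axis=1)))

def means_separated(z_norm, z_fraud, gamma):
    """True if the class means are at least gamma apart in latent space."""
    gap = np.linalg.norm(z_norm.mean(axis=0) - z_fraud.mean(axis=0))
    return gap >= gamma

x_real = rng.standard_normal((16, 8))
x_fake = x_real + 0.1 * rng.standard_normal((16, 8))
print(feat_loss(x_real, x_real) == 0.0)   # True: identical inputs give zero loss
```

During training, `feat_loss` would be added to the adversarial loss with weight lambda, while the separation check can serve as a diagnostic on latent batches.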

3.3. Proposed Method

3.3.1. Overall

The proposed method functions as an integrated end-to-end detection pipeline comprising four key stages: multimodal feature encoding, generative augmentation, latent distribution modeling, and dual-branch decision fusion. Initially, on-chain transaction graph features and market price time-series are mapped into a unified vector space. To address data imbalance, these vectors feed into a generative adversarial module that synthesizes high-fidelity pseudo-fraudulent samples, balancing the training process. Simultaneously, a shared encoder-decoder architecture compresses samples into a low-dimensional latent space, where anomalies are quantified based on reconstruction residuals and density distribution. Subsequently, the latent representations are processed by a dual-branch detection mechanism. A Transformer branch utilizes self-attention to model long-range temporal dependencies and cross-modal interactions, identifying deviations from expected evolutionary patterns. In parallel, an online clustering branch dynamically partitions transactions to detect structural deviations from normal behavior modes. Finally, an adaptive risk fusion layer integrates the outputs from the generative, variational, temporal, and clustering modules. By applying learnable weights to these distinct anomaly signals, the system computes a unified risk score, enabling precise, real-time surveillance of cryptocurrency anomalies. Algorithm 1 details the adaptive multi-branch risk fusion process, which dynamically integrates diverse anomaly indicators to enhance detection robustness and accuracy.
Algorithm 1 Adaptive Multi-branch Risk Fusion Process
Require: Input transaction sequence $X_t$; trained modules (multi-domain $M_{MD}$, Transformer $M_{Trans}$, clustering $M_{Clust}$)
Require: Learnable fusion weights network $W_\phi(\cdot)$
Ensure: Final anomaly score $S_{final}$
1: {— Step 1: Branch Inference —}
2: Compute multi-domain anomaly score (time + frequency):
3: $z_t^{time}, z_t^{freq} \leftarrow \mathrm{Encode}(X_t)$
4: $S_{MD} \leftarrow M_{MD}(z_t^{time}, z_t^{freq})$ {via Equation (17)}
5: Compute temporal deviation score via Transformer:
6: $S_{Trans} \leftarrow M_{Trans}(X_t)$ {prediction error}
7: Compute structural deviation score via clustering:
8: $k^* \leftarrow \arg\max_k P(k \mid X_t)$
9: $S_{Clust} \leftarrow \| e_t - \mu_{k^*} \|_2^2$ {via Equation (20)}
10: {— Step 2: Adaptive Weighting —}
11: Construct context vector: $C_t = [S_{MD}, S_{Trans}, S_{Clust}]$
12: Calculate dynamic weights via gating network:
13: $w = \mathrm{Softmax}(W_\phi(C_t)) = [w_{md}, w_{trans}, w_{clust}]$
14: {— Step 3: Final Fusion —}
15: Calculate weighted risk score:
16: $S_{final} = w_{md} \cdot S_{MD} + w_{trans} \cdot S_{Trans} + w_{clust} \cdot S_{Clust}$
17: if $S_{final} >$ threshold $\tau$ then
18:     return Anomaly
19: else
20:     return Normal
21: end if
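The fusion step of Algorithm 1 can be sketched in plain Python. For illustration only, the gating network W_phi is replaced here by a fixed linear map; in the framework it is a learned network:

```python
import math

# Minimal sketch of Algorithm 1's adaptive weighting and fusion: the three
# branch scores form a context vector, a gating map produces logits, and a
# softmax yields the convex fusion weights.

def softmax(v):
    m = max(v)
    e = [math.exp(x - m) for x in v]
    s = sum(e)
    return [x / s for x in e]

def fuse(s_md, s_trans, s_clust, W_phi, tau):
    ctx = [s_md, s_trans, s_clust]                 # context vector C_t
    logits = [sum(w * c for w, c in zip(row, ctx)) for row in W_phi]
    w_md, w_tr, w_cl = softmax(logits)             # adaptive weights
    s_final = w_md * s_md + w_tr * s_trans + w_cl * s_clust
    return ("Anomaly" if s_final > tau else "Normal", s_final)

W_phi = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]  # toy gating map
label, score = fuse(0.9, 0.8, 0.7, W_phi, tau=0.5)
print(label)  # Anomaly
```

Because the weights are a softmax, the fused score is always a convex combination of the branch scores, which keeps it on the same scale as its inputs.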

3.3.2. GAN-Based Fraudulent Transaction Generator

The proposed GAN-based fraudulent transaction generator is designed to produce fraudulent transaction samples with structural consistency, controllable statistical properties, and semantically coherent behavior, thereby alleviating the severe scarcity of real anomalous transactions.
As shown in Figure 1, the generator takes as input multimodally encoded transaction vectors coupled with class-conditional labels, where the transaction vector incorporates both structural properties of the on-chain transaction graph and temporal features reflecting price dynamics. To ensure that the generated samples adequately cover realistic fraudulent patterns, a conditional generative adversarial network is constructed. The generator $G$ receives a noise vector $z \sim \mathcal{N}(0, I)$ and a class label $y$, producing a synthesized sample $\hat{x} = G(z, y)$ via a nonlinear mapping. The discriminator $D$ takes real samples $x$ and generated samples $\hat{x}$ as inputs and outputs the probabilities $D(x)$ and $D(\hat{x})$, respectively. The adversarial objective follows
$$\min_G \max_D \; \mathbb{E}_{x \sim P_{\text{real}}}[\log D(x)] + \mathbb{E}_{z \sim \mathcal{N}}[\log(1 - D(G(z, y)))].$$
However, fraudulent behaviors in cryptocurrency markets exhibit explicit structural and statistical signatures—such as node degrees within transaction graphs, multi-hop fund transfer paths, and inter-transaction time-interval distributions—making traditional GAN losses insufficient for enforcing behavioral consistency. To address this, a feature-consistency constraint is introduced. A mapping function $F(\cdot)$ extracts topological, statistical, and cross-modal behavioral features. By minimizing
$$\mathcal{L}_{\text{cons}} = \| F(G(z, y)) - F(x_{\text{fraud}}) \|_2^2,$$
the generator is encouraged to preserve semantic structure within the high-dimensional behavior space. Consequently, the generator learns not only surface-level distributions but also deeper structural patterns associated with anomalous behaviors.
In terms of architecture, the generator comprises four fully connected layers with widths 256, 512, 512, and $d$, where $d$ denotes the dimensionality of the multimodal transaction vector. Each layer employs LeakyReLU activation and layer normalization to enhance training stability. The conditional vector $y$ is concatenated with $z$ at the input layer, enabling the generator to capture class-specific variations. A noise-injection mechanism is added to maintain local stochasticity without disrupting overall behavioral patterns. The discriminator consists of three fully connected layers of widths 512, 256, and 1, with a sigmoid output and spectral normalization to improve convergence stability. Mathematically, the generator approximates the conditional distribution $p(x \mid y = \text{fraud})$, thereby expanding the effective sample support for the minority class. This enables the downstream detector to observe a broader spectrum of anomalous behaviors during training, ultimately improving recall. With the feature-consistency constraint, the generator’s objective becomes
$$\min_G \; \mathcal{L}_{\text{adv}} + \lambda \mathcal{L}_{\text{cons}},$$
where $\lambda$ is a balancing coefficient. This formulation merges adversarial distribution matching with feature-space structural preservation, enabling the generator to perform interpolation along the latent behavioral manifold and generate semantically realistic fraudulent samples. Algorithm 2 outlines the training procedure of the proposed feature-consistent GAN.
Algorithm 2 Training Procedure of Feature-Consistent GAN
Require: Dataset $D_{real} = \{x_i, y_i\}$, fraudulent subset $D_{fraud} \subset D_{real}$, noise prior $p_z$
Require: Generator $G$, discriminator $D$, feature extractor $F(\cdot)$
Require: Hyperparameters: learning rate $\eta$, batch size $m$, balance coefficient $\lambda$, iterations $K$
Ensure: Trained generator $G^*$
1: Initialize parameters $\theta_G$ and $\theta_D$
2: for number of training iterations do
3:     {— Train Discriminator —}
4:     for $k$ steps do
5:         Sample minibatch of $m$ noise samples $\{z^{(1)}, \ldots, z^{(m)}\}$ from $p_z$
6:         Sample minibatch of $m$ real samples $\{x^{(1)}, \ldots, x^{(m)}\}$ from $D_{real}$
7:         Generate fake samples: $\hat{x}^{(i)} = G(z^{(i)}, y_{fraud})$
8:         Update discriminator by maximizing:
9:         $\mathcal{L}_D = \frac{1}{m} \sum_{i=1}^{m} [\log D(x^{(i)}) + \log(1 - D(\hat{x}^{(i)}))]$
10:        $\theta_D \leftarrow \theta_D + \eta \nabla_{\theta_D} \mathcal{L}_D$
11:    end for
12:    {— Train Generator —}
13:    Sample minibatch of $m$ noise samples $\{z^{(1)}, \ldots, z^{(m)}\}$ from $p_z$
14:    Sample minibatch of $m$ reference fraud samples $\{x_f^{(1)}, \ldots, x_f^{(m)}\}$ from $D_{fraud}$
15:    Generate fake samples: $\hat{x}^{(i)} = G(z^{(i)}, y_{fraud})$
16:    Compute adversarial loss: $\mathcal{L}_{adv} = \frac{1}{m} \sum_{i=1}^{m} \log(1 - D(\hat{x}^{(i)}))$
17:    Compute feature-consistency loss:
18:    $\mathcal{L}_{cons} = \frac{1}{m} \sum_{i=1}^{m} \| F(\hat{x}^{(i)}) - F(x_f^{(i)}) \|_2^2$
19:    Update generator by minimizing total loss:
20:    $\mathcal{L}_G = \mathcal{L}_{adv} + \lambda \mathcal{L}_{cons}$
21:    $\theta_G \leftarrow \theta_G - \eta \nabla_{\theta_G} \mathcal{L}_G$
22: end for
23: return $G^*$
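The generator-side loss computation of Algorithm 2 can be rendered as a toy numpy sketch. The networks G, D, and F below are random placeholders, not the paper's architectures; the point is only the shape of the loss L_G = L_adv + lambda * L_cons:

```python
import numpy as np

# Toy rendering of Algorithm 2's generator losses: adversarial term
# L_adv = mean log(1 - D(G(z))) plus lambda times the feature-consistency
# term L_cons (squared feature distance to reference fraud samples).

rng = np.random.default_rng(1)
Wg = rng.standard_normal((4, 8)) * 0.1     # stand-in generator weights
Wf = rng.standard_normal((8, 3)) * 0.1     # stand-in feature extractor F

def G(z):
    return np.tanh(z @ Wg)

def D(x):
    # Placeholder discriminator: sigmoid of a feature sum, in (0, 1).
    return 1.0 / (1.0 + np.exp(-x.sum(axis=1)))

def F(x):
    return x @ Wf

def generator_loss(z, x_fraud, lam=0.5):
    x_hat = G(z)
    l_adv = np.mean(np.log(1.0 - D(x_hat) + 1e-8))
    diff = F(x_hat) - F(x_fraud)
    l_cons = np.mean(np.sum(diff * diff, axis=1))
    return float(l_adv + lam * l_cons)

z = rng.standard_normal((8, 4))
x_fraud = rng.standard_normal((8, 8))
loss = generator_loss(z, x_fraud)
print(isinstance(loss, float))  # True
```

In the actual procedure this scalar would be backpropagated through the generator parameters (line 21 of Algorithm 2); a framework with autodiff would handle the gradient step.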
The module simultaneously supports adversarial sample generation and adversarial detector training. By continuously updating the network parameters, the generator enhances the behavioral richness and adversarial hardness of synthesized samples while preserving structural consistency, enabling the detector to learn a more comprehensive boundary for anomalous behaviors. Ultimately, generated and real fraudulent samples are merged into an augmented training set, alleviating class imbalance and improving robustness under adversarial or distribution-shifting scenarios. This design is particularly suited for cryptocurrency environments, where fraudulent behaviors exhibit structural sparsity and high variability, thereby enabling accurate detection of latent anomalies in high-dimensional multimodal distributions and improving the stability and generalization of the overall detection system.

3.3.3. Multi-Domain Time Series Detection

The multi-domain time series detection module is constructed upon a dual-channel architecture operating in both the time and frequency domains. Through parallel modeling of long-range temporal dependencies and frequency-pattern variations in on-chain trading behaviors, the module enables joint characterization of anomalous transaction signals across multiple scales and semantic spaces. As shown in Figure 2, the module consists of a long-horizon time-domain branch and a frequency-domain branch, both processing the multimodally fused transaction sequence $\{x_t\}_{t=1}^{T}$, where $x_t \in \mathbb{R}^d$ contains structural embeddings of the transaction graph and market-price embeddings. The time-domain branch adopts a four-layer 1D convolution–pooling architecture, where the convolution kernel sizes are $k = \{5, 3, 3, 3\}$ and the channel dimensions are $\{32, 64, 64, 128\}$. The input of shape $(T, d)$ is transformed through successive convolutions into $(T, 32)$, $(T/2, 64)$, $(T/4, 64)$, and $(T/4, 128)$, each followed by ReLU activation and max-pooling. This configuration allows the network to capture local temporal variations in transaction behavior. The final representation $h_t$ is passed to a two-layer fully connected network to produce the time-domain anomaly estimate $z_t^{(\mathrm{time})}$.
The frequency-domain branch first applies a fast Fourier transform to the input sequence, yielding a spectral representation $X(f)$ with $f \in [1, T]$. The FFT output retains the same dimensionality as the input. The spectrum is then processed by a three-layer multilayer perceptron (MLP) with hidden dimensions $\{128, 64, 32\}$ and GELU activation, enabling the model to identify periodic irregularities and high-frequency perturbations induced by fraudulent behaviors. The resulting frequency-domain representation is denoted as $z_t^{(\mathrm{freq})}$. To achieve robust cross-domain fusion, we propose a learnable multi-domain mapping function that operates as an adaptive gating mechanism. Unlike static fusion strategies (e.g., summation or concatenation) that treat diverse modalities equally, our approach dynamically recalibrates the importance of time-domain and frequency-domain features at each timestamp. The fused latent representation $z_t$ is computed as a convex combination of the domain-specific embeddings:
$z_t = \alpha\, z_t^{(\mathrm{time})} + (1 - \alpha)\, z_t^{(\mathrm{freq})},$
where the gating coefficient $\alpha \in (0, 1)$ serves as a soft attention weight. This coefficient is derived from the current multimodal context via a learnable projection vector $w$:
$\alpha = \sigma\left( w^{\top} \left[ z_t^{(\mathrm{time})} \,\|\, z_t^{(\mathrm{freq})} \right] \right),$
Here, $\|$ denotes the concatenation operation and $\sigma(\cdot)$ is the sigmoid activation function. This mechanism allows the model to autonomously determine whether to prioritize explicit temporal deviations (captured by $z_t^{(\mathrm{time})}$) or spectral irregularities (captured by $z_t^{(\mathrm{freq})}$). For instance, during high-frequency bot attacks, the model may assign a lower $\alpha$ to emphasize frequency patterns, whereas during sudden market crashes, a higher $\alpha$ allows the model to focus on temporal trend shifts. From an optimization perspective, the gating coefficient $\alpha$ plays a pivotal role in regulating gradient flow during backpropagation: the gradients propagated to the time-domain and frequency-domain encoders are scaled by $\alpha$ and $(1 - \alpha)$, respectively. Since the sigmoid function keeps $\alpha$ strictly within the open interval $(0, 1)$, neither modality's gradient vanishes entirely, which prevents the "modal collapse" phenomenon in which the model over-relies on a single dominant feature set while ignoring the complementary view. Furthermore, formulating the fusion as a convex combination implies that the resulting latent vector $z_t$ is geometrically constrained to lie within the convex hull of the two domain-specific embeddings. In financial anomaly detection, where data non-stationarity often leads to scale shifts and feature divergence, this convexity constraint acts as a vital regularization term: it prevents the fused representation from drifting into undefined regions of the feature space (extrapolation risk) and keeps the fused manifold compact and numerically stable, which is essential for the subsequent distance-based clustering and density estimation tasks. This adaptive fusion guarantees that the fused representation satisfies
$z_t \in \mathrm{Conv}\left\{ z_t^{(\mathrm{time})},\, z_t^{(\mathrm{freq})} \right\},$
meaning that $z_t$ remains within the convex hull of the two domain-specific representations, thereby ensuring stable anomaly estimation without departing from their learned behavioral manifolds. Anomaly scoring is jointly derived from reconstruction in the time domain and density estimation in the latent space. A reconstruction function $R(\cdot)$ is defined, with reconstruction loss
$\mathcal{L}_{\mathrm{rec}} = \left\| x_t - R(z_t) \right\|_1,$
which measures the reproducibility of the sample in the multi-domain space. Simultaneously, a kernel density estimator (KDE) is used to compute the latent density $p(z_t)$, and the anomaly score is given by
$S_t = \beta\, \mathcal{L}_{\mathrm{rec}} + (1 - \beta)\left( -\log p(z_t) \right),$
where $-\log p(z_t)$ evaluates the rarity of the sample in the latent distribution.
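The gating and scoring steps can be sketched numerically. The NumPy fragment below is an illustration only, assuming the projection vector $w$, the reconstruction $R(z_t)$, and the KDE log-density $\log p(z_t)$ are already available; none of the names are taken from the authors' code.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def fuse_and_score(z_time, z_freq, w, x, recon, log_density, beta=0.6):
    """Illustrative adaptive gate and dual-metric anomaly score.

    z_time, z_freq : (d,) domain-specific embeddings
    w              : (2d,) learnable projection vector (assumed given)
    x, recon       : input sample and its reconstruction R(z_t)
    log_density    : log p(z_t) from a KDE fitted on normal behaviors
    """
    # alpha = sigma(w^T [z_time || z_freq]) -- soft attention weight in (0, 1)
    alpha = sigmoid(w @ np.concatenate([z_time, z_freq]))
    # Convex combination keeps z_t inside the hull of the two embeddings.
    z = alpha * z_time + (1.0 - alpha) * z_freq
    # S_t = beta * ||x - R(z_t)||_1 + (1 - beta) * (-log p(z_t))
    l_rec = np.sum(np.abs(x - recon))
    score = beta * l_rec + (1.0 - beta) * (-log_density)
    return alpha, z, score

# Toy check with w = 0: the gate is exactly 0.5 and the fused vector
# lies elementwise between the two embeddings (the convex-hull property).
z_t = np.array([1.0, 0.0])
z_f = np.array([0.0, 1.0])
alpha, z, score = fuse_and_score(z_t, z_f, np.zeros(4),
                                 np.zeros(2), np.zeros(2), log_density=-1.0)
```

With a perfect reconstruction, the score is driven entirely by the rarity term $(1-\beta)(-\log p(z_t))$, which is the behavior the dual-metric design intends.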
The module is naturally coupled with the GAN-based generation component. The diversity of generated fraudulent samples expands the training support for multi-domain modeling, improving KDE accuracy. The time-domain branch captures abrupt behavioral transitions in generated samples, whereas the frequency-domain branch identifies periodic perturbations embedded in fraudulent patterns. Theoretically, if the generator reduces the Jensen–Shannon divergence between the real fraudulent distribution $q(x)$ and the generated distribution $\hat{q}(x)$, then the variance of the estimated density satisfies
$\mathrm{Var}\left[\hat{p}(z)\right] \downarrow \;\Longrightarrow\; \text{more stable anomaly estimates},$
which empirically results in higher recall and lower false-positive rates. By combining temporal and frequency cues, this multi-domain detection structure accurately identifies anomalous behaviors arising from dynamic shifts and spectral disturbances, yielding strong robustness and real-time responsiveness in cryptocurrency environments.

3.3.4. Transformer-Clustering Detection Module

The Transformer-clustering detection module is designed to model the dynamic evolution of cryptocurrency transactions within a multimodal feature space while identifying deviations from normal cluster structures in high dimensions. The module takes as input the transaction sequence $\{x_1, x_2, \ldots, x_N\}$ from the multi-domain representation, where each $x_i \in \mathbb{R}^d$ contains fused structural and market-state embeddings. As shown in Figure 3, the input is first projected through a three-layer MLP$_{\mathrm{data}}$ with hidden widths $\{256, 256, 128\}$ and GELU activation, yielding a 128-dimensional sequential representation. Simultaneously, a 32-dimensional global gating vector is produced by MLP$_{\rho}$ and concatenated with the 128-dimensional representation, forming a Transformer input matrix of shape $(N, 160)$.
The Transformer block employs a two-layer standard encoder architecture, each layer containing eight-head self-attention (hidden size = 160, head dimension = 20) and a feed-forward network (FFN) expanded to 512 dimensions with GELU activation. This structure enables global interaction across transaction samples. The Transformer outputs a sequence of contextual embeddings $e_1, \ldots, e_N$, augmented with an additional global token $e_{N+1}$ that represents overall sequence-level information for subsequent clustering inference. The clustering branch takes the Transformer outputs and processes them through a three-layer MLP (layer widths $\{160, 128, 64\}$), producing soft cluster assignment probabilities $P(z_i \mid X, k)$ for each sample, with $k$ indexing the clusters. To achieve adaptive cluster selection, another MLP$_{k}$ takes $e_{N+1}$ as input and outputs $P(k \mid X)$. The core probabilistic model is expressed as
$P(z_i \mid X) = \sum_{k=1}^{K} P(z_i \mid X, k)\, P(k \mid X),$
thereby supporting dynamically adjustable cluster structures under varying market conditions.
To quantify the deviation of a transaction in the cluster space, a Transformer-contextual cluster expectation $\mu_k = \mathbb{E}\left[\, e_i \mid z_i = k \,\right]$ is defined, and the anomaly score is computed as
$S_i = \left\| e_i - \sum_{k=1}^{K} P(z_i \mid X, k)\, \mu_k \right\|_2^2 .$
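The soft-assignment score can be sketched in a few lines of NumPy. The toy inputs below (two clusters, near-hard assignments) are illustrative only and are not drawn from the paper's experiments:

```python
import numpy as np

def cluster_anomaly_scores(E, P_zk, P_k, mu):
    """Illustrative S_i = ||e_i - sum_k P(z_i|X,k) mu_k||_2^2.

    E    : (N, d) contextual embeddings from the Transformer
    P_zk : (N, K) per-sample soft assignments P(z_i | X, k)
    P_k  : (K,)  global cluster prior P(k | X) from the global token
    mu   : (K, d) cluster expectations mu_k = E[e_i | z_i = k]
    """
    # Marginal assignment P(z_i | X) = sum_k P(z_i | X, k) P(k | X)
    P_marg = P_zk @ P_k
    # Expected center for each sample under its soft assignment
    expected_center = P_zk @ mu           # (N, d)
    residual = E - expected_center
    return np.sum(residual ** 2, axis=1), P_marg

# Toy check: a sample sitting exactly on its cluster center scores zero,
# while one displaced by (1, 1) from its center scores ||(1,1)||^2 = 2.
mu = np.array([[0.0, 0.0], [4.0, 4.0]])
E = np.array([[0.0, 0.0], [5.0, 5.0]])
P_zk = np.array([[1.0, 0.0], [0.0, 1.0]])
P_k = np.array([0.5, 0.5])
scores, P_marg = cluster_anomaly_scores(E, P_zk, P_k, mu)
```

In practice $E$, $P_{zk}$, $P_k$, and $\mu$ would all be produced by the learned networks described above; the sketch only exercises the scoring arithmetic.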
Since the Transformer captures long-range dependencies and the clustering module models local structure of the behavioral manifold, their combination yields a unified characterization of global patterns and local density variations. Theoretically, if the cluster centers $\{\mu_k\}$ form the least-squares approximation of the sample distribution in the Transformer space, then
$\sum_{i=1}^{N} S_i = \min_{\{\mu_k\}} \sum_{i=1}^{N} \left\| e_i - \mu_{z_i} \right\|_2^2 ,$
indicating that the clustering-based distance metric provides the optimal local representation fit under this formulation. The Transformer-clustering architecture is complementary to the multi-domain time series detection module. While the multi-domain model captures temporal and spectral characteristics, the Transformer models high-dimensional contextual semantics, and the clustering module captures deviations from local manifold structures. If the multi-domain representation is $h_t$ and the Transformer input is $g_t = W h_t$, an invertible $W$ ensures the distance-preservation property
$\left\| W h_i - W h_j \right\|_2 \asymp \left\| h_i - h_j \right\|_2 ,$
maintaining the integrity of time–frequency features during contextual modeling and strengthening anomaly detection stability.
This module is particularly suitable for cryptocurrency environments, where fraudulent transactions often exhibit (1) implicit coordinated behaviors across multiple transactions—captured by the Transformer—and (2) distinct local cluster structures—captured by the clustering component. Their combination provides a unified global–local representation of anomalous behavior, enhancing robustness and generalization in real-world anomaly detection scenarios.

4. Results and Discussion

4.1. Experiments Details

4.1.1. Evaluation Metrics

A comprehensive evaluation of the model performance is conducted using a diverse set of metrics to assess both the anomaly detection capabilities and the quality of the generative data augmentation. To measure the effectiveness of the detection framework, we employ Precision, Recall, the balanced F1-score, the area under the curve (AUC), the false positive rate (FPR), and detection latency. These metrics collectively reflect the accuracy, robustness, and real-time responsiveness of the model in the cryptocurrency anomaly detection task. The mathematical definitions for the detection metrics are provided as follows:
$\mathrm{Precision} = \dfrac{TP}{TP + FP},$
$\mathrm{Recall} = \dfrac{TP}{TP + FN},$
$F_1 = \dfrac{2 \times \mathrm{Precision} \times \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}},$
$\mathrm{FPR} = \dfrac{FP}{FP + TN},$
$\mathrm{AUC} = \displaystyle\int_{0}^{1} \mathrm{TPR}(\mathrm{FPR})\, d(\mathrm{FPR}),$
$\text{Detection Latency} = \dfrac{1}{N} \displaystyle\sum_{i=1}^{N} \left( t_i^{\mathrm{detect}} - t_i^{\mathrm{true}} \right).$
In these definitions, TP denotes the number of anomalous samples correctly identified as anomalous, while FP represents normal samples incorrectly classified as anomalous. The term FN indicates anomalous samples that are not detected, and TN represents normal samples correctly recognized. The quantity TPR denotes the true positive rate, $t_i^{\mathrm{detect}}$ is the detection time of the $i$-th anomalous event, and $t_i^{\mathrm{true}}$ is its actual occurrence time, with $N$ representing the total number of anomalous events.
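For reference, the confusion-matrix metrics above follow directly from the four counts. The sketch below uses made-up counts purely for illustration:

```python
def detection_metrics(tp, fp, fn, tn):
    """Confusion-matrix metrics as defined in the text."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)          # also the true positive rate (TPR)
    f1 = 2 * precision * recall / (precision + recall)
    fpr = fp / (fp + tn)
    return precision, recall, f1, fpr

# Hypothetical counts: 60 anomalies caught, 40 missed,
# 20 false alarms among 900 normal samples.
p, r, f1, fpr = detection_metrics(tp=60, fp=20, fn=40, tn=880)
```

AUC and detection latency, by contrast, require the full score ranking and per-event timestamps rather than a single confusion matrix, so they are not reproduced here.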
To further evaluate the quality and diversity of the fraudulent samples synthesized by the GAN module, we utilize the Inception Score (IS) and Fréchet Inception Distance (FID). The Inception Score is defined as:
$\mathrm{IS} = \exp\left( \mathbb{E}_{x \sim p_{\mathrm{gen}}} \left[ D_{KL}\left( p(y \mid x) \,\|\, p(y) \right) \right] \right),$
where $p_{\mathrm{gen}}$ represents the distribution of generated samples, $D_{KL}$ is the Kullback–Leibler divergence, and $p(y \mid x)$ is the conditional class distribution predicted by a pre-trained classifier. The Fréchet Inception Distance is calculated by:
$\mathrm{FID} = \left\| \mu_r - \mu_g \right\|_2^2 + \mathrm{Tr}\left( \Sigma_r + \Sigma_g - 2 \left( \Sigma_r \Sigma_g \right)^{1/2} \right),$
where $(\mu_r, \Sigma_r)$ and $(\mu_g, \Sigma_g)$ denote the mean vectors and covariance matrices of the feature representations for real and generated fraudulent samples, respectively.
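As a simplified illustration, the sketch below evaluates the FID formula under a diagonal-covariance assumption, in which the matrix square root $(\Sigma_r \Sigma_g)^{1/2}$ reduces to the elementwise $\sqrt{\sigma_{r}^{2}\, \sigma_{g}^{2}}$. A full implementation would use a proper matrix square root (e.g., `scipy.linalg.sqrtm`) on the complete covariance matrices.

```python
import numpy as np

def fid_diagonal(mu_r, var_r, mu_g, var_g):
    """FID under a simplifying diagonal-covariance assumption.

    mu_r, var_r : mean and per-dimension variance of real features
    mu_g, var_g : mean and per-dimension variance of generated features
    """
    # ||mu_r - mu_g||_2^2
    mean_term = np.sum((mu_r - mu_g) ** 2)
    # Tr(S_r + S_g - 2 (S_r S_g)^{1/2}) for diagonal S_r, S_g
    trace_term = np.sum(var_r + var_g - 2.0 * np.sqrt(var_r * var_g))
    return mean_term + trace_term

# Sanity checks: identical distributions give FID = 0, and a unit
# shift of the mean in 2 dimensions contributes 1 + 1 = 2.
mu = np.array([1.0, 2.0])
var = np.array([0.5, 0.5])
fid_same = fid_diagonal(mu, var, mu, var)
fid_shift = fid_diagonal(mu, var, mu + 1.0, var)
```

Lower values indicate that the generated feature distribution sits closer to the real one, which is why FID decreases monotonically as the augmentation module improves in Table 3.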
The selection of IS and FID as evaluation metrics is critical for validating the proposed generative augmentation strategy. The Inception Score is employed to measure both the clarity and diversity of the generated transactions; a higher IS indicates that the generator produces samples that can be confidently classified as fraudulent while covering a diverse range of fraud patterns, thereby preventing mode collapse. Complementarily, the Fréchet Inception Distance quantifies the distributional discrepancy between real and synthetic fraudulent transactions in the high-dimensional feature space. A lower FID signifies that the generated samples possess statistical and structural properties, such as transaction graph topology and temporal volatility, that closely match real-world anomalies. This ensures that the downstream detector is trained on data that accurately reflects the manifold of true cryptocurrency fraud rather than unrealistic noise.

4.1.2. Experimental Settings

The experimental environment consisted of a server equipped with dual Intel Xeon Gold 6348 CPUs and four NVIDIA A100 GPUs, each providing 80 GB of HBM2e memory to accelerate deep neural network training and inference. Regarding the software environment, Ubuntu 22.04 LTS served as the operating system, and the deep learning framework PyTorch 2.1 was used in combination with CUDA 12.2 and cuDNN 8.9 to fully exploit GPU acceleration. Regarding hyperparameter settings, the dataset was partitioned into training, validation, and testing subsets following a 70%/15%/15% split to maintain balanced and representative evaluation. In the GAN module, the learning rates of the generator and discriminator were set to $\alpha_G = 1 \times 10^{-4}$ and $\alpha_D = 4 \times 10^{-4}$, respectively, with a batch size of 64. The Adam optimizer was applied with momentum parameters $\beta_1 = 0.9$ and $\beta_2 = 0.999$.

4.1.3. Baseline Methods

In the experimental design of this study, three categories of representative models were selected as baseline methods, including classical machine learning models SVM [50] and Random Forest [51], single-modal deep models LSTM [52], GCN [53], and Ensemble-GNN [54], as well as the existing multimodal cryptocurrency anomaly detection model MDST-GNN [55].

4.2. Comparison with Baseline Methods

This experiment aims to systematically evaluate the effectiveness of generative data augmentation in cryptocurrency anomaly detection, particularly under conditions where fraudulent transactions are extremely scarce. The objective of this experiment is to determine whether adversarial generation can alleviate the issue of class imbalance, expand the support of rare abnormal patterns, and enhance the detector’s ability to recognize complex fraudulent behaviors. To ensure a fair and comprehensive comparison, the proposed framework is evaluated against multiple baselines. All methods are trained under identical data splits, random seeds, and optimization settings, and performance is reported using Precision, Recall, F1-score, AUC, and FPR.
The results are presented in Table 2 and the ROC curves in Figure 4. Classical learning models exhibit limited detection capabilities; SVM suffers from low recall due to its inability to capture the nonlinear structure of high-dimensional multimodal features, while Random Forest achieves moderate precision but lacks the capacity to model temporal or structural dependencies, yielding an AUC of only 0.742. However, these lightweight models demonstrate the lowest inference latencies (5.2 ms and 6.1 ms) due to their low computational complexity. Deep models show improved detection performance but at the cost of increased computation time: LSTM benefits from sequence modeling to outperform traditional models in Recall, and GCN effectively captures structural properties, with respective latencies rising to 12.4 ms and 10.7 ms. The multimodal MDST-GNN model achieves an AUC of 0.812 by jointly representing graph structure and market dynamics, which pushes the latency to 15.9 ms. Notably, the state-of-the-art Ensemble-GNN demonstrates strong competitiveness, achieving an F1-score of 0.661 and an AUC of 0.842. This validates the effectiveness of integrating diverse graph architectures (GCN, GAT, GIN) to capture complex topological patterns. However, this performance gain comes with a significant computational penalty; the ensemble voting mechanism across multiple subnetworks results in the highest latency of 22.8 ms, potentially hindering its deployment in high-frequency trading environments. In contrast, our proposed method achieves the highest detection performance with a Recall of 0.703 and an AUC of 0.889. While the dual-branch architecture and adaptive fusion mechanism incur a latency of 19.2 ms, this is significantly more efficient than the heavy Ensemble-GNN baseline. Crucially, this latency remains well within the millisecond-level requirement for real-time financial risk control. 
The results confirm that our framework strikes the optimal balance between security coverage and response speed, delivering superior accuracy without the prohibitive computational cost of ensemble approaches.

4.3. Sample Generation and Data Augmentation Analysis

The objective of this experiment is to evaluate the effectiveness of generative data augmentation in cryptocurrency anomaly detection, with particular focus on whether the issue of class imbalance—caused by the scarcity of fraudulent samples—can be alleviated through the introduction of high-fidelity synthetic data. The experiment begins with a baseline GAN and progressively incorporates feature-consistency constraints, multi-domain joint training, and the final optimized design proposed in this study. The influence of these enhancements is assessed using both generative quality metrics (FID, IS) and downstream detection metrics (recall, F1-score).
As shown in Table 3, the progressive enhancement of the generative model results in a monotonic decrease in FID and a continuous increase in IS, indicating that the fidelity and diversity of synthetic samples are notably improved. In terms of detection performance, both Recall and F1-score are significantly lower when no GAN is applied, whereas introducing a basic GAN yields considerable gains, demonstrating that synthetic samples effectively supplement the limited fraudulent data. After incorporating feature-consistency constraints, additional improvements are observed due to better alignment of contextual and structural properties between synthetic and real fraudulent behaviors. When multi-domain joint training is introduced, the GAN becomes capable of learning cross-modal behavioral patterns, producing synthetic samples that more naturally reflect transaction structures, price dynamics, and deviation patterns. The final optimized GAN-enhanced model achieves the best overall performance, suggesting that the generated samples closely approximate real fraudulent patterns and substantially strengthen detector learning.
From a theoretical standpoint, a basic GAN expands the support of the minority fraudulent class by approximating its marginal distribution through adversarial learning, which naturally improves Recall. However, purely vector-level or surface-level generation fails to preserve deeper structural properties of fraudulent transactions, leading to pronounced discrepancies in high-dimensional manifolds, reflected in higher FID and lower IS values. Introducing feature-consistency constraints requires the generator to match not only statistical appearance but also graph-level structures, interaction patterns, and temporal volatility signatures, thereby aligning synthetic samples with real fraudulent semantics and enabling clearer decision boundaries for the detector. Multi-domain joint training further enhances this effect by explicitly injecting complementary correlations across temporal, structural, and frequency domains, enabling the generator to cover a wider range of fraudulent modes. The final GAN design stabilizes training dynamics, feature mapping, and cross-modal representations, allowing the synthetic data distribution to closely approximate the true anomaly manifold and thus achieving the best Recall and F1-score.

4.4. Anomaly Distribution Modeling Analysis

This experiment aims to assess the impact of different anomaly distribution modeling techniques on detecting anomalous cryptocurrency transactions, with emphasis on their ability to characterize abnormal behaviors in high-dimensional complex data. The evaluation includes traditional one-class classification methods, reconstruction-based deep generative models, and enhanced variational methods that incorporate synthetic data. The core objective is to determine whether the proposed multi-domain VAE can more accurately learn the latent distribution of normal behaviors and produce more stable anomaly scores.
As shown in Table 4, One-Class SVM produces the weakest results across all metrics, particularly Recall, demonstrating its inability to effectively capture the true manifold of normal transactions in high-dimensional nonlinear spaces. The AE improves upon SVM due to reconstruction-based learning, yet its latent representation lacks distributional constraints, leading to unstable anomaly boundaries. The VAE achieves notable improvements in recall, AUC, and FPR by modeling latent distributions through learnable mean and variance parameters, increasing sensitivity to rare anomalous deviations. When enhanced with GAN-generated fraudulent samples, the VAE exhibits further performance gains due to expanded coverage of minority-class regions. The proposed multi-domain VAE achieves the best performance across all metrics, significantly outperforming all alternatives.

4.5. Parameter Sensitivity Analysis

To systematically evaluate the robustness of the proposed framework and determine the optimal hyperparameter configuration, we conducted a sensitivity analysis targeting three critical components: the feature-consistency weight $\lambda$ in the GAN objective (Equation (13)), the bandwidth $h$ of the kernel density estimator (KDE), and the fusion balancing coefficient $\beta$ (Equation (16)). The experiments were performed by varying one parameter within a specified range while keeping the others fixed at their default settings, using the F1-score as the primary evaluation metric. For the GAN regularizer $\lambda$, we explored values in the range $\{0.1, 1, 5, 10, 20, 50\}$ to assess the trade-off between adversarial deception and feature matching. For the KDE bandwidth $h$, which controls the smoothness of the latent density estimation, the range was set from 0.1 to 2.0. Finally, the weighting parameter $\beta$, which balances reconstruction error against probabilistic rarity, was tested from 0.0 to 1.0 with a step size of 0.2.
The quantitative results presented in Table 5 reveal distinct performance patterns driven by the underlying theoretical properties of each parameter. First, regarding $\lambda$, performance peaks at $\lambda = 10$; lower values ($\lambda < 1$) fail to enforce sufficient structural constraints, leading to invalid graph topologies, while excessive regularization ($\lambda > 20$) causes the generator to over-fit statistical moments, reducing sample diversity. Second, the KDE bandwidth exhibits a classic bias–variance trade-off: a narrow bandwidth ($h = 0.1$) overfits to noise (a peaked distribution), whereas a wide bandwidth ($h = 2.0$) over-smooths the density, masking true anomalies. The optimal $h = 1.0$ effectively captures the manifold geometry. Lastly, the fusion weight $\beta$ achieves its maximum at 0.6, indicating that while structural reconstruction error is the dominant indicator of fraud, incorporating the latent probability density (the $(1 - \beta)$ term) provides critical complementary information about sample rarity. Relying solely on either reconstruction ($\beta = 1.0$) or density ($\beta = 0.0$) results in suboptimal detection, confirming the necessity of the dual-metric scoring mechanism.

4.6. Multimodal Fusion and Real-Time Detection Performance

The objective of this experiment is to evaluate the effectiveness of multimodal fusion and the dual-branch detection architecture for identifying anomalous behaviors in cryptocurrency transactions, with emphasis on the complementary strengths of Transformer-based temporal modeling and clustering-based structural modeling in terms of both performance and real-time responsiveness. The experiment begins with single-branch models and progressively incorporates dual-branch structures and different fusion strategies to observe trends in detection performance, latency, and false positive rate (FPR).
As shown in Table 6, the Transformer-only model yields higher AUC and F1-score, but its long-range attention operations introduce substantial latency. The clustering-only branch achieves the lowest latency due to its lightweight distance-based computation, but its inability to capture long-term temporal dependencies results in weaker F1-score. When combining the two branches without fusion, slight performance gains are observed, but the absence of an integration mechanism limits the utilization of complementary information. Fixed-weight fusion further improves performance but lacks adaptability, making it unstable under varying market conditions. The proposed adaptive risk fusion mechanism achieves the best performance across all metrics, reducing FPR while increasing detection accuracy, indicating that dynamically adjusting the importance of each branch produces more reliable anomaly judgments.
As shown in Figure 5, the Transformer branch models long-range temporal dependencies, and its global attention mechanism captures cumulative temporal deviations characteristic of manipulation-related or long-horizon anomalous behaviors. However, its computational complexity grows quadratically with sequence length, leading to higher inference latency. The clustering branch models density-based deviations in latent space, detecting isolated structural outliers efficiently through distance-to-center measurements. It is more sensitive to short-term jumps or abrupt behavioral changes and excels in real-time responsiveness, yet cannot fully capture cross-domain or cross-temporal composite anomalies. The proposed adaptive fusion method is mathematically equivalent to learning a nonlinear risk mapping in semantic space, enabling the model to autonomously emphasize the more reliable modality—assigning greater weight to the Transformer during volatile intervals and enhancing clustering constraints in stable regions. As the fused representation aligns more closely with the joint distribution of high-dimensional anomaly patterns, improvements in AUC, F1-score, and FPR reflect stronger theoretical separability of anomalous behaviors, ensuring robust and real-time detection in practical trading environments.

4.7. Ablation Studies

This experiment aims to systematically validate the contributions of individual components within the overall framework, clarifying each module’s role, sources of performance improvement, and collaborative interactions. Key modules are removed one by one from the full model, and the resulting changes in precision, recall, F1-score, and AUC are analyzed to reveal differences in data distribution modeling, feature-space representation, and anomaly characterization.
As shown in Table 7, the full model achieves the highest performance across all metrics, demonstrating the complementary benefits of generative augmentation, distribution modeling, multimodal temporal reasoning, and risk fusion. Removing the GAN module significantly reduces recall, indicating its critical role in expanding abnormal pattern coverage. Excluding the multi-domain + VAE module weakens anomaly distribution modeling, resulting in notable degradation across all metrics. Removing the Transformer branch reduces the ability to identify long-term or manipulation-related anomalies, while removing the clustering branch weakens sensitivity to short-term deviations. Without the fusion mechanism, the model can no longer leverage complementary strengths of the two branches, resulting in inferior performance compared to the complete design.
The theoretical distinctions among these modules stem from their different modeling assumptions in high-dimensional behavior space, as shown in Figure 6. The GAN module expands the minority-class support region by approximating fraudulent marginal distributions, and its removal directly reduces the recall rate due to diminished boundary coverage. The multi-domain + VAE module constructs a smooth, continuous, high-density latent manifold for normal samples, while anomalies occupy low-density regions; removing this module disrupts the density-based discrimination mechanism, making it more difficult to distinguish anomalies. The Transformer branch provides long-horizon temporal modeling, and its removal eliminates sensitivity to gradual deviations or multi-step manipulations. The clustering branch specializes in detecting localized structural outliers, and its removal impairs the detection of abrupt behavioral shifts. The fusion mechanism mathematically enables a nonlinear combination of temporal and structural modalities, allowing dynamic reweighting based on reliability. Its removal breaks this adaptive balance, leading to systematic degradation of detection performance. These results confirm that the superior performance of the complete model arises from the coordinated interaction of multiple components across high-dimensional, multi-domain, and multi-scale representation spaces, and the absence of any single module disrupts this synergy, leading to consistent declines in detection accuracy.

4.8. Discussion

4.8.1. Convergence Diagnostics and Generative Stability Analysis

To ensure the reliability of the generative augmentation module, we conducted rigorous diagnostics on the training convergence and stability of the GAN architecture. The training dynamics were monitored by tracking the adversarial loss trajectories of both the generator and the discriminator. While initial epochs exhibited characteristic oscillations inherent to the min-max adversarial game, the losses eventually settled into a stable Nash equilibrium, indicating that the generator effectively learned to approximate the target distribution without divergence. Crucially, the feature-consistency loss demonstrated a monotonic decrease throughout the training process, confirming that the generator successfully internalized the structural constraints of the transaction graph and the statistical properties of the price sequences, rather than merely memorizing surface-level noise. Furthermore, we explicitly addressed the risk of mode collapse, a prevalent challenge in financial data synthesis where models may default to generating a single repetitive fraud pattern. Quantitative analysis using the Inception Score (IS) yielded consistently high values, indicating that the synthesized samples maintain significant diversity. Visual inspection of the latent space distribution via t-SNE projections further verified that the generated pseudo-fraudulent samples formed multiple distinct clusters. These clusters effectively covered the heterogeneous behavioral modes of real-world money laundering and market manipulation, such as varying subgraph topologies and temporal volatility signatures, rather than collapsing into a single trivial mode. Finally, potential discriminator overfitting was scrutinized to prevent the "over-optimization" trap, in which the discriminator dominates the game by memorizing training examples, causing the generator's gradients to vanish.
We continuously monitored the discriminator’s accuracy on a held-out validation set and observed that the performance gap between training and validation remained within a narrow bound. This generalization capability is attributed to the spectral normalization and noise injection mechanisms implemented in our architecture, which effectively regularized the network. Consequently, the discriminator provided meaningful and non-vanishing gradients throughout the training lifecycle, sustaining a healthy and robust adversarial learning signal.

4.8.2. Applicability in Real-World Cryptocurrency Scenarios

The real-time anomalous transaction detection framework proposed in this study demonstrates strong applicability and practical value across multiple representative scenarios in cryptocurrency markets. In centralized exchange risk control systems, platforms are required to identify suspicious behaviors within milliseconds, such as rapid inflows of numerous small transfers into a single target address, high-frequency arbitrage activities executed by coordinated bot clusters within narrow time windows, or price manipulation attempts conducted by repeatedly placing and canceling orders to influence market depth. Traditional threshold-based or offline analytical methods struggle to capture these patterns in time. By contrast, the proposed model, benefiting from multimodal feature fusion and temporal deviation modeling, is capable of detecting departures from normal behavioral patterns as soon as the transaction occurs, enabling exchanges to take immediate mitigation measures such as freezing accounts, suspending trading pairs, or initiating KYC (know-your-customer) verification.
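As a point of contrast, the kind of static rule criticized above can be sketched in a few lines (the window size, count threshold, and function name are arbitrary illustrations, not part of the evaluated system). Such a rule catches only the single pattern it hard-codes, which is precisely the limitation that motivates the learned multimodal detector.

```python
from collections import defaultdict, deque

def rapid_inflow_alerts(events, window_s=60.0, min_count=20):
    # Static threshold rule: flag an address once it receives `min_count`
    # transfers within `window_s` seconds. Events must be sorted by timestamp.
    recent = defaultdict(deque)
    alerts = []
    for ts, dst, amount in events:
        q = recent[dst]
        q.append(ts)
        while q and ts - q[0] > window_s:
            q.popleft()
        if len(q) >= min_count:
            alerts.append((ts, dst))
    return alerts

# 25 small transfers hit address "A" within 2.5 seconds -> repeated alerts.
events = [(i * 0.1, "A", 1.0) for i in range(25)]
print(rapid_inflow_alerts(events)[0][1])  # "A"
```

Any adversary who spaces transfers just beyond the threshold evades this rule entirely, whereas a deviation-based detector scores the behavior against the learned distribution of normal activity.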
In decentralized finance environments, smart contract platforms lack manual auditing mechanisms, and adversaries frequently launch complex attacks via flash loans. Such attacks typically involve multi-contract chained operations that manipulate liquidity pool prices, followed by rapid arbitrage or asset theft. These behaviors span multiple nodes in the transaction graph and leave only subtle anomalies in market price sequences. The proposed framework simultaneously analyzes both on-chain graph structures and temporal price dynamics, allowing the system to detect irregular fund flows that deviate from conventional paths or extreme short-term price shifts in the flash-loan attack chain, thereby enabling earlier activation of risk control responses.
In anti-money laundering monitoring tasks, illicit actors often rely on multi-hop transfers, structuring, and mixing techniques to obscure source identities and construct pseudo-normal behavioral patterns through cross-address transactions. The proposed generative adversarial augmentation module enables the simulation of diverse money-laundering trajectories during training, improving the model’s sensitivity to complex fund-flow patterns. Meanwhile, the clustering branch identifies behavioral clusters in latent space whose distance patterns deviate significantly from those of normal users; even when individual transactions appear benign, deviations emerge clearly at the level of behavioral sequences and transaction-network structures. Furthermore, in cross-chain surveillance tasks conducted by regulatory authorities, structural discrepancies across blockchains make certain anomalous patterns undetectable within a single chain. However, when cross-chain price trends and multi-chain transaction graphs are jointly modeled, abnormalities manifest in the form of cross-domain signatures. The proposed multi-domain joint modeling mechanism is designed precisely for such scenarios, extracting stable representations from temporal, structural, and frequency domains, causing cross-chain anomalies to exhibit stronger consistency in the latent space and providing regulators with more precise risk indicators.
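The clustering branch's distance-based scoring can be illustrated with a minimal sketch (the cluster centres, embeddings, and function name are toy values of ours; the actual latent space is learned by the encoder):

```python
import numpy as np

def latent_distance_scores(latents, centers):
    # Score each behavioural embedding by its Euclidean distance to the
    # nearest normal-behaviour cluster centre; large scores flag outliers.
    latents = np.asarray(latents, dtype=float)
    centers = np.asarray(centers, dtype=float)
    dists = np.linalg.norm(latents[:, None, :] - centers[None, :, :], axis=-1)
    return dists.min(axis=1)

centers = np.array([[0.0, 0.0], [5.0, 5.0]])              # normal behaviour modes
users = np.array([[0.1, -0.2], [5.2, 4.9], [9.0, -8.0]])  # last is anomalous
scores = latent_distance_scores(users, centers)
print(scores.argmax())  # 2: the deviating trajectory stands out
```

Because the score aggregates an entire behavioral embedding, a trajectory can be flagged even when each constituent transaction looks benign in isolation, matching the observation above.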
Crucially, to bridge the gap between algorithmic detection and regulatory enforcement, the framework provides an interpretable decision-making process aligned with global Anti-Money Laundering (AML) and Combating the Financing of Terrorism (CFT) standards. By analyzing the attention weights assigned to specific transaction subgraphs and temporal frequency bands, the model generates granular evidence explaining why a transaction is flagged. This "white-box" transparency allows compliance officers to trace the specific risk factors—such as sudden structural divergence or cyclical laundering patterns—facilitating the efficient filing of Suspicious Activity Reports (SARs). Consequently, the system supports the Risk-Based Approach (RBA) recommended by the Financial Action Task Force (FATF), ensuring that automated alerts are not only accurate but also auditable and legally actionable.

4.8.3. Module Synergy and Computational Efficiency Analysis

To further elucidate the internal logic of the framework, it is necessary to examine the holistic coordination among its generative, representational, and detection components. The architecture operates as a tightly coupled four-stage pipeline rather than a loose collection of models. At the foundational level, the GAN-based module functions primarily during the training phase, utilizing feature-consistency constraints to synthesize high-fidelity fraudulent samples; this effectively corrects the class imbalance in the feature space before any detection occurs, ensuring that subsequent modules are not biased toward the majority class. Following this, the Multi-domain VAE acts as the universal feature encoder, projecting heterogeneous on-chain graph structures and off-chain price dynamics into a unified latent manifold, thereby providing a standardized input representation and an initial anomaly score based on reconstruction probability.
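The idea of scoring anomalies by reconstruction quality can be illustrated with a linear stand-in for the VAE (a principal subspace fitted on normal data; purely illustrative, not the paper's architecture, and all names and toy data are ours):

```python
import numpy as np

def reconstruction_scores(X_train, X_test, k=1):
    # Fit a k-dimensional linear subspace on normal data and score test
    # points by the norm of their off-subspace residual (a linear proxy
    # for a VAE's reconstruction probability).
    mu = X_train.mean(axis=0)
    _, _, Vt = np.linalg.svd(X_train - mu, full_matrices=False)
    proj = Vt[:k].T @ Vt[:k]                 # projector onto the latent subspace
    resid = (X_test - mu) - (X_test - mu) @ proj
    return np.linalg.norm(resid, axis=1)

rng = np.random.default_rng(1)
t = rng.standard_normal(200)
X_train = np.c_[t, 2 * t] + 0.01 * rng.standard_normal((200, 2))  # "normal" manifold
X_test = np.array([[1.0, 2.0],    # on-manifold: low score
                   [2.0, -4.0]])  # off-manifold: high score
scores = reconstruction_scores(X_train, X_test)
print(scores.argmax())  # 1
```

A VAE generalizes this linear picture to a learned nonlinear manifold, but the scoring principle is the same: samples far from the normal-behavior manifold reconstruct poorly and receive high anomaly scores.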
The detection phase employs a multi-view strategy to avoid functional redundancy. The Multi-domain Time Series module utilizes Fast Fourier Transform and convolutional operations to extract explicit signal-level characteristics, such as periodic perturbations typical of automated bot activities. In parallel, the Transformer branch leverages self-attention mechanisms to model implicit semantic dependencies and long-term evolutionary trends, identifying logical inconsistencies in complex transaction chains. Complementing these temporal analyzers, the Online Clustering branch focuses on spatial density within the latent manifold, rapidly identifying structural outliers that deviate from local normality centers. These diverse signals—distributional, signal-based, semantic, and spatial—are finally integrated via the Adaptive Risk Fusion mechanism, which dynamically assigns weights based on the confidence of each branch, ensuring robust decision-making across varying market conditions.
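A minimal sketch of the frequency-domain idea follows (the real module feeds the full spectrum into an MLP; here a single peak-to-mean ratio of our own devising stands in for that learned mapping):

```python
import numpy as np

def spectral_peak_ratio(x):
    # Ratio of the dominant non-DC spectral magnitude to the mean magnitude;
    # high values indicate strong periodicity (e.g. cyclic bot activity).
    mags = np.abs(np.fft.rfft(x - np.mean(x)))[1:]   # drop the DC bin
    return float(mags.max() / (mags.mean() + 1e-12))

rng = np.random.default_rng(0)
t = np.arange(512)
periodic = np.sin(2 * np.pi * t / 16) + 0.1 * rng.standard_normal(512)  # bot-like
irregular = rng.standard_normal(512)                                    # human-like
print(spectral_peak_ratio(periodic) > spectral_peak_ratio(irregular))  # True
```

The periodic perturbation that is nearly invisible amid time-domain noise concentrates into a single dominant spectral bin, which is why the frequency branch complements rather than duplicates the temporal analyzers.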
Regarding computational efficiency, the system design optimizes the trade-off between high-dimensional modeling and real-time responsiveness. It is important to note that the computationally intensive GAN module is restricted to the offline training phase and imposes zero overhead during online inference. The VAE encoder and the Multi-domain module (operating at log-linear complexity via FFT) are lightweight and suitable for high-throughput stream processing. The detection latency is primarily dominated by the Transformer's self-attention mechanism, which scales quadratically with sequence length; however, by employing a sliding window strategy with a fixed localized horizon, the effective input length remains bounded, ensuring deterministic processing times. The Online Clustering branch maintains near-linear complexity using efficient distance metrics. Empirical testing shows that the total inference latency per transaction averages approximately 19 milliseconds. This is far below the latency of traditional offline batch-processing pipelines, which typically ranges from seconds to minutes, and remains competitive with lightweight single-modal detectors, making the framework well-suited for pre-confirmation risk checks and real-time AML monitoring.
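The bounded-window argument is easy to demonstrate: with a fixed-length buffer, the per-transaction attention input never grows, so the quadratic term stays constant regardless of stream length (the window size and function below are illustrative, not the deployed configuration):

```python
from collections import deque

WINDOW = 128  # fixed localized horizon; illustrative value

def stream_window(transactions, window=WINDOW):
    # Bounded sliding window over a transaction stream: the Transformer's
    # input length (hence its quadratic attention cost) never exceeds `window`.
    buf = deque(maxlen=window)
    lengths = []
    for tx in transactions:
        buf.append(tx)
        lengths.append(len(buf))  # effective sequence length at this step
    return lengths

lengths = stream_window(range(1000))
print(max(lengths))  # 128: per-step cost is capped and deterministic
```

Capping the horizon trades some long-range context for a hard latency guarantee, which is the appropriate trade-off for pre-confirmation risk checks.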

4.9. Limitations and Future Work

Although the proposed multimodal real-time anomaly detection framework demonstrates strong detection capability and robustness across both experimental evaluations and practical deployment scenarios, several limitations remain. The model relies on on-chain transaction structures, price sequences, and multiple external data sources. While these modalities enhance detection accuracy, the overall performance may degrade when some modalities are missing or delayed under extreme market conditions—particularly during severe network congestion or temporary outages in exchange data feeds. Future research will focus on improving scalability and cross-ecosystem adaptability. Moreover, incorporating self-supervised learning and causal modeling techniques may allow the system to autonomously identify anomaly-driving factors in partially or fully unlabeled settings, enhancing the ability to detect unknown risks and improving generalization and interpretability against emerging attack vectors. Furthermore, given the critical sensitivity of financial data, future iterations will specifically explore privacy-preserving computation methods, such as homomorphic encryption and zero-trust architectures. These mechanisms aim to enable anomaly detection on encrypted data without exposing raw sensitive information, thereby achieving a necessary balance between rigorous risk control and user data confidentiality.

5. Conclusions

This study addresses the core challenges of anomalous transaction behaviors in cryptocurrency markets, including their high frequency, concealed attack patterns, and complex, nonstationary data distributions. A multimodal anomaly detection framework integrating generative augmentation, latent distribution learning, and dual-branch real-time detection is developed to jointly model on-chain transaction structures, price dynamics, and cross-modal behavioral deviations. In terms of architectural design, a generative adversarial module with feature-consistency constraints is introduced to mitigate class imbalance arising from the scarcity of fraudulent samples; a multi-domain variational distribution modeling approach is proposed to characterize the latent manifold of normal behaviors and to produce high-confidence anomaly scores; and a dual-branch detection structure combining a Transformer-based temporal model with an online clustering mechanism is employed to capture both long-range temporal dependencies and structural deviations. Experimental results demonstrate that the proposed framework achieves significant improvements across multiple key evaluation metrics. In multimodal fusion experiments, the adaptive fusion mechanism further improves the AUC to 0.884 and the F1-score to 0.713, while reducing the false positive rate to 0.087, highlighting the complementarity between temporal and structural features.

Author Contributions

Conceptualization, Y.G., P.H., Z.Z., J.Y. and M.L.; Data curation, P.L.; Formal analysis, Z.L.; Funding acquisition, M.L.; Investigation, Z.L.; Methodology, Y.G., P.H. and Z.Z.; Project administration, J.Y. and M.L.; Resources, P.L. and R.Z.; Software, Y.G., P.H. and Z.Z.; Supervision, J.Y. and M.L.; Validation, Z.L. and R.Z.; Visualization, P.L. and R.Z.; Writing—original draft, Y.G., P.H., Z.Z., P.L., Z.L., R.Z., J.Y. and M.L. Y.G., P.H. and Z.Z. contributed equally to this work. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the National Natural Science Foundation of China, grant number 61202479.

Data Availability Statement

The data presented in this study are available upon request from the corresponding author.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Luchkin, A.; Lukasheva, O.; Novikova, N.; Melnikov, V.; Zyatkova, A.; Yarotskaya, E. Cryptocurrencies in the global financial system: Problems and ways to overcome them. In Proceedings of the Russian Conference on Digital Economy and Knowledge Management (RuDEcK 2020), Voronezh, Russia, 27–29 February 2020; Atlantis Press: Dordrecht, The Netherlands, 2020; pp. 423–430. [Google Scholar]
  2. Kasula, V.K.; Yenugula, M.; Yadulla, A.R.; Konda, B.; Ayyamgari, S. An improved machine learning technique for credit card fraud detection. Edelweiss Appl. Sci. Technol. 2025, 9, 3093–3109. [Google Scholar] [CrossRef]
  3. Qureshi, S.; Aftab, M.; Bouri, E.; Saeed, T. Dynamic interdependence of cryptocurrency markets: An analysis across time and frequency. Phys. A Stat. Mech. Its Appl. 2020, 559, 125077. [Google Scholar] [CrossRef]
  4. Dupuis, D.; Gleason, K. Money laundering with cryptocurrency: Open doors and the regulatory dialectic. J. Financ. Crime 2021, 28, 60–74. [Google Scholar] [CrossRef]
  5. Prendi, L.; Borakaj, D.; Prendi, K. The new money laundering machine through cryptocurrency: Current and future public governance challenges. Corp. Law Gov. Rev. 2023, 5, 84–91. [Google Scholar] [CrossRef]
  6. Aidoo, S.; AML, I.D.; AML, M.; Expert, F.C.C. Transaction Monitoring and Suspicious Activity Reporting (SAR). J. Financ. Crime Prev. 2023, 15, 112–130. [Google Scholar]
  7. Peng, Z.; Yin, X.; Wang, G.; Ying, C.; Chen, W.; Jiang, X.; Xu, Y.; Luo, Y. MulChain: Enabling Advanced Cross-Modal Queries in Hybrid-Storage Blockchains. arXiv 2025, arXiv:2502.18258. [Google Scholar]
  8. Ozer, F.; Sakar, C.O. An automated cryptocurrency trading system based on the detection of unusual price movements with a Time-Series Clustering-Based approach. Expert Syst. Appl. 2022, 200, 117017. [Google Scholar] [CrossRef]
  9. Zhang, C. Big Data-Driven Financial Risk Management: Application of Association Rule Mining Technology in Anomaly Detection. In Proceedings of the 2024 International Conference on Industrial IoT, Big Data and Supply Chain (IIoTBDSC), Wuhan, China, 20–22 September 2024; IEEE: Piscataway, NJ, USA, 2024; pp. 127–131. [Google Scholar]
  10. Popoola, N.T. Big data-driven financial fraud detection and anomaly detection systems for regulatory compliance and market stability. Int. J. Comput. Appl. Technol. Res 2023, 12, 32–46. [Google Scholar]
  11. Hassan, M.U.; Rehmani, M.H.; Chen, J. Anomaly detection in blockchain networks: A comprehensive survey. IEEE Commun. Surv. Tutor. 2022, 25, 289–318. [Google Scholar] [CrossRef]
  12. Mazumder, M.T.R.; Shourov, M.S.H.; Rasul, I.; Akter, S.; Miah, M.K. Anomaly Detection in Financial Transactions Using Convolutional Neural Networks. J. Econ. Financ. Account. Stud. 2025, 7, 195–207. [Google Scholar] [CrossRef]
  13. Setiadi, D.R.I.M.; Muslikh, A.R.; Iriananda, S.W.; Warto, W.; Gondohanindijo, J.; Ojugo, A.A. Outlier detection using Gaussian mixture model clustering to optimize XGBoost for credit approval prediction. J. Comput. Theor. Appl. 2024, 2, 244–255. [Google Scholar] [CrossRef]
  14. Liu, X.; Zhu, S.; Yang, F.; Liang, S. Research on unsupervised anomaly data detection method based on improved automatic encoder and Gaussian mixture model. J. Cloud Comput. 2022, 11, 58. [Google Scholar] [CrossRef]
  15. Shrotryia, V.K.; Kalra, H. Herding in the crypto market: A diagnosis of heavy distribution tails. Rev. Behav. Financ. 2022, 14, 566–587. [Google Scholar] [CrossRef]
  16. Zhang, L.; Zhang, Y.; Ma, X. A new strategy for tuning ReLUs: Self-adaptive linear units (SALUs). In Proceedings of the ICMLCA 2021; 2nd International Conference on Machine Learning and Computer Application, Shenyang, China, 17–19 December 2021; VDE: Berlin, Germany, 2021; pp. 1–8. [Google Scholar]
  17. Halstead, J.; Vikraman, R.; Prasad, R. Business Anomaly Detector: A Novel Approach to Identify Anomalies in Time Series Semi-Supervised Transactional Data. IEEE Trans. Knowl. Data Eng. 2022, 34, 3721–3735. [Google Scholar]
  18. Gorle, V.L.N.; Panigrahi, S. A semi-supervised Anti-Fraud model based on integrated XGBoost and BiGRU with self-attention network: An application to internet loan fraud detection. Multimed. Tools Appl. 2024, 83, 56939–56964. [Google Scholar] [CrossRef]
  19. Panigrahi, A.; Pati, A.; Addula, S.R.; Pati, A.K.; Sahoo, G.; Dash, M. An Ensemble Machine Learning-Based Model for Blockchain Transactional Data Classification. In Proceedings of the International Conference on Biologically Inspired Techniques in Many-Criteria Decision-Making Technologies; Springer: Cham, Switzerland, 2024; pp. 430–438. [Google Scholar]
  20. Sanjalawe, Y.K.; Al-E’mari, S.R. Abnormal transactions detection in the ethereum network using semi-supervised generative adversarial networks. IEEE Access 2023, 11, 98516–98531. [Google Scholar] [CrossRef]
  21. Khosravi, S.; Kargari, M.; Teimourpour, B.; Eshghi, A.; Aliabdi, A. Using supervised machine learning approaches to detect fraud in the banking transaction network. In Proceedings of the 2023 9th International Conference on Web Research (ICWR), Tehran, Iran, 3–4 May 2023; IEEE: Piscataway, NJ, USA, 2023; pp. 115–119. [Google Scholar]
  22. Yan, P.; Abdulkadir, A.; Luley, P.P.; Rosenthal, M.; Schatte, G.A.; Grewe, B.F.; Stadelmann, T. A comprehensive survey of deep transfer learning for anomaly detection in industrial time series: Methods, applications, and directions. IEEE Access 2024, 12, 3768–3789. [Google Scholar] [CrossRef]
  23. Zamanzadeh Darban, Z.; Webb, G.I.; Pan, S.; Aggarwal, C.; Salehi, M. Deep learning for time series anomaly detection: A survey. ACM Comput. Surv. 2024, 57, 1–42. [Google Scholar] [CrossRef]
  24. Yu, K.; Chen, Y.; Trinh, T.K.; Bi, W. Real-time detection of anomalous trading patterns in financial markets using generative adversarial networks. Expert Syst. Appl. 2025, 213, 124567. [Google Scholar] [CrossRef]
  25. James, U.U.; Idika, C.N.; Enyejo, L.A.; Abiodun, K.; Enyejo, J.O. Adversarial Attack Detection Using Explainable AI and Generative Models in Real-Time Financial Fraud Monitoring Systems. Int. J. Sci. Res. Mod. Technol. 2024, 3, 142–157. [Google Scholar] [CrossRef]
  26. Dixit, S. Advanced Generative AI Models for Fraud Detection and Prevention in FinTech: Leveraging Deep Learning and Adversarial Networks for Real-Time Anomaly Detection in Financial Transactions. Authorea 2024. preprints. [Google Scholar]
  27. Qu, X.; Liu, Z.; Wu, C.Q.; Hou, A.; Yin, X.; Chen, Z. Mfgan: Multimodal fusion for industrial anomaly detection using attention-based autoencoder and generative adversarial network. Sensors 2024, 24, 637. [Google Scholar] [CrossRef]
  28. Chen, L.; Zhou, X.; Zhou, P.; Sun, X.; Zheng, S. Anomaly detection method for power system information based on multimodal data. PeerJ Comput. Sci. 2025, 11, e2976. [Google Scholar] [CrossRef]
  29. Huo, Y.; Gang, S.; Guan, C. FCIHMRT: Feature cross-layer interaction hybrid method based on Res2Net and transformer for remote sensing scene classification. Electronics 2023, 12, 4362. [Google Scholar] [CrossRef]
  30. Wang, Z. Abnormal financial transaction detection via ai technology. Int. J. Distrib. Syst. Technol. (IJDST) 2021, 12, 24–34. [Google Scholar] [CrossRef]
  31. Liu, Z.; Gao, H.; Lei, H.; Liu, Z.; Liu, C. Blockchain anomaly transaction detection: An overview, challenges, and open issues. In Proceedings of the International Conference on Information Science, Communication and Computing; Springer: Singapore, 2023; pp. 126–140. [Google Scholar]
  32. Rasheed, M. Analyzing applications and properties of the exponential continuous distribution in reliability and survival analysis. J. Posit. Sci. 2023, 4, 71–79. [Google Scholar]
  33. Zhou, F.; Guo, W. Identification of high-frequency volatility and risk prevention in cryptocurrencies. China Financ. Rev. Int. 2025, 12, 234–251. [Google Scholar] [CrossRef]
  34. Alapati, N.K. Graph-based Semi-Supervised Learning for Fraud Detection in Finance. Int. Res. J. Eng. Sci. Technol. Innov. 2024, 11, 211–220. [Google Scholar]
  35. De Gaspari, F.; Hitaj, D.; Pagnotta, G.; De Carli, L.; Mancini, L.V. Reliable detection of compressed and encrypted data. Neural Comput. Appl. 2022, 34, 20379–20393. [Google Scholar] [CrossRef]
  36. Wiedmer, R.; Griffis, S.E. Structural characteristics of complex supply chain networks. J. Bus. Logist. 2021, 42, 264–290. [Google Scholar] [CrossRef]
  37. Kannan, N. A review of Deep Generative Models for Synthetic Financial Data Generation. Int. J. Financ. Data Sci. (IJFDS) 2024, 2, 1–10. [Google Scholar]
  38. Zhang, Y.; He, S.; Wa, S.; Zong, Z.; Lin, J.; Fan, D.; Fu, J.; Lv, C. Symmetry GAN detection network: An automatic one-stage high-accuracy detection network for various types of lesions on CT images. Symmetry 2022, 14, 234. [Google Scholar] [CrossRef]
  39. Malempati, M. Machine Learning and Generative Neural Networks in Adaptive Risk Management: Pioneering Secure Financial Frameworks. Kurd. Stud. 2022, 10, 691–701. [Google Scholar] [CrossRef]
  40. Malempati, M. Generative AI-Driven Innovation in Digital Identity Verification: Leveraging Neural Networks for Next-Generation Financial Security. 2024. Available online: https://ssrn.com/abstract=5204991 (accessed on 15 December 2024).
  41. Sai, S.; Arunakar, K.; Chamola, V.; Hussain, A.; Bisht, P.; Kumar, S. Generative AI for finance: Applications, case studies and challenges. Expert Syst. 2025, 42, e70018. [Google Scholar] [CrossRef]
  42. Andronie, M.; Blažek, R.; Iatagan, M.; Skypalova, R.; Uță, C.; Dijmărescu, A.; Kovacova, M.; Grecu, G.; Parvu, I.; Strakova, J.; et al. Generative artificial intelligence algorithms in Internet of Things blockchain-based fintech management. Oeconomia Copernic. 2024, 15, 1349–1381. [Google Scholar] [CrossRef]
  43. Wilson, D.; Azmani, A. Generative Adversarial Networks: A Systematic Review of Characteristics, Applications, and Challenges in Financial Data Generation and Market Modeling: 2019–2024. Int. J. Eng. Trans. B Appl. 2026, 39, 395–406. [Google Scholar] [CrossRef]
  44. Li, Q.; Zhang, Y. Confidential Federated Learning for Heterogeneous Platforms against Client-Side Privacy Leakages. In Proceedings of the ACM Turing Award Celebration Conference-China 2024, Changsha, China, 5–7 July 2024; pp. 239–241. [Google Scholar]
  45. Luo, B.; Zhang, Z.; Wang, Q.; He, B. Multi-Chain Graphs of Graphs: A New Approach to Analyzing Blockchain Datasets. Adv. Neural Inf. Process. Syst. 2024, 37, 28490–28514. [Google Scholar]
  46. Teng, Y.; Lv, J.; Wang, Z.; Gao, Y.; Dong, W. TimeChain: A Secure and Decentralized Off-chain Storage System for IoT Time Series Data. In Proceedings of the ACM on Web Conference 2025, Sydney, NSW, Australia, 28 April–2 May 2025; pp. 3651–3659. [Google Scholar]
  47. Li, Q.; Ren, J.; Zhang, Y.; Song, C.; Liao, Y.; Zhang, Y. Privacy-Preserving DNN Training with Prefetched Meta-Keys on Heterogeneous Neural Network Accelerators. In Proceedings of the 2023 60th ACM/IEEE Design Automation Conference (DAC), San Francisco, CA, USA, 9–13 July 2023; IEEE: Piscataway, NJ, USA, 2023; pp. 1–6. [Google Scholar]
  48. Wang, T.; Zhao, X.; Zhang, J. TMF-Net: Multimodal smart contract vulnerability detection based on multiscale transformer fusion. Inf. Fusion 2025, 122, 103189. [Google Scholar] [CrossRef]
  49. Dashtaki, S.M.; Chagahi, M.H.; Bahadori, A.; Moshiri, B.; Piran, M.J.; Montazeri, A. HSIF: A Transformer-Based Cross-Attention Framework for Cryptocurrency Trend Forecasting via Multimodal Sentiment–Market Fusion. IEEE Access 2025, 13, 156600–156612. [Google Scholar] [CrossRef]
  50. Cortes, C.; Vapnik, V. Support-vector networks. Mach. Learn. 1995, 20, 273–297. [Google Scholar] [CrossRef]
  51. Breiman, L. Random forests. Mach. Learn. 2001, 45, 5–32. [Google Scholar] [CrossRef]
  52. Hochreiter, S.; Schmidhuber, J. Long short-term memory. Neural Comput. 1997, 9, 1735–1780. [Google Scholar] [CrossRef]
  53. Kipf, T.N.; Welling, M. Semi-supervised classification with graph convolutional networks. arXiv 2016, arXiv:1609.02907. [Google Scholar]
  54. Haider, M.Z.; Noreen, T.; Salman, M. Blockchain Fraud Detection Using Ensemble Graph Neural Networks. J. Artif. Intell. Res. 2025, 2, 24–41. [Google Scholar] [CrossRef]
  55. Chen, S.; Liu, Y.; Zhang, Q.; Shao, Z.; Wang, Z. Multi-Distance Spatial-Temporal Graph Neural Network for Anomaly Detection in Blockchain Transactions. Adv. Intell. Syst. 2025, 7, 2400898. [Google Scholar] [CrossRef]
Figure 1. The overall architecture of the proposed GAN-based fraudulent transaction generator designed to alleviate the severe scarcity of anomalous samples in cryptocurrency datasets. The module functions as a conditional generative adversarial network where the Generator (G) transforms a random noise vector (z) sampled from a normal distribution and a specific class condition (y) into high-fidelity synthetic transaction samples ( P f a k e ). Simultaneously, a Discriminator (D) is trained to distinguish between real data and synthesized samples, providing feedback to refine the generation process. Uniquely, the framework incorporates an Attack Detector loop to generate adversarial examples ( P a d v ) during training. This adversarial process iteratively updates the model parameters to ensure that the synthesized fraudulent transactions maintain both structural consistency and semantic coherence with real-world anomalies, thereby enhancing the robustness of the downstream detection system.
Figure 2. The schematic architecture of the multi-domain time series detection module, which employs a dual-branch parallel processing strategy to jointly characterize anomalous signals in both time and frequency domains. The upper panel illustrates the high-level workflow where the input multimodal time series is bifurcated into a Long-term Time Domain Branch and a Frequency Domain Branch, with their respective feature embeddings ( E T and E F ) subsequently integrated via a Fusion layer. As detailed in the bottom panels, the Time Domain Branch utilizes a hierarchical 1D Convolutional Neural Network followed by ReLU activation and Max Pooling to capture local temporal variations and long-term dependencies. Conversely, the Frequency Domain Branch applies a Fast Fourier Transform (FFT) to convert the transaction sequence into a spectral representation, which is then processed by a Multilayer Perceptron (MLP) to identify periodic irregularities and high-frequency volatility shifts that are typical of automated bot attacks but difficult to detect in the time domain.
Figure 3. The detailed architecture of the Transformer-clustering detection module, designed to model the dynamic evolution of cryptocurrency transactions within a multimodal feature space while identifying deviations from normal manifold structures. The input transaction sequence ( x 1 , , x N ) is first projected into a high-dimensional feature space via data-specific MLPs ( M L P d a t a ) and concatenated with global gating vectors generated by M L P ρ . A Transformer Block, utilizing multi-head self-attention mechanisms, processes these embeddings to capture long-range dependencies and contextual semantics ( e 1 , , e N + 1 ). The output is then fed into two parallel inference heads: one estimating the Cluster Responsibilities ( P ( z | X , k ) ) to probabilistically assign transactions to latent normal behavior modes, and another dynamically predicting the optimal Number of Clusters ( P ( k | X ) ). This architecture allows the system to robustly detect anomalies that manifest as significant deviations from dynamically updated cluster centers in the high-dimensional latent space.
Figure 4. ROC curves of different models.
Figure 5. Box plot comparison of AUC value distribution for different analytical methods.
Figure 6. Performance comparison of various ablation variants under different metrics.
Table 1. Multimodal cryptocurrency dataset statistics.
Data Type | Source | Scale
On-chain transaction data | Bitcoin, Ethereum full nodes | 58,420,000 transactions
Transaction graph nodes | Address interaction parsing | 6,300,000 addresses
Market price time-series | Binance, Coinbase Pro, OKX | 3,820,000 min-level records
Order-book depth and volatility | Exchange APIs | 1,120,000 records
Enterprise IPO and firm fundamentals | Stock exchanges, financial disclosures | 2439 companies
Abnormal event labels | Elliptic, Chainalysis, blacklists | 42,600 labeled instances
Table 2. Overall Comparison with Baseline Methods. Bold represents the best result.
Method | Precision | Recall | F1-Score | AUC | FPR ↓ | Latency (ms)
SVM [50] | 0.611 | 0.382 | 0.468 | 0.701 | 0.147 | 5.2
Random Forest [51] | 0.658 | 0.417 | 0.509 | 0.742 | 0.139 | 6.1
LSTM [52] | 0.671 | 0.452 | 0.542 | 0.763 | 0.128 | 12.4
GCN [53] | 0.689 | 0.481 | 0.563 | 0.781 | 0.123 | 10.7
MDST-GNN [55] | 0.713 | 0.533 | 0.611 | 0.812 | 0.112 | 15.9
Ensemble-GNN [54] | 0.729 | 0.605 | 0.661 | 0.842 | 0.101 | 22.8
Proposed Method (Ours) | 0.752 ** | 0.703 ** | 0.727 ** | 0.889 ** | 0.084 ** | 19.2
** Indicates statistical significance at p < 0.01 level compared to the best baseline (Ensemble-GNN).
Table 3. Impact of GAN generation quality and data augmentation on detection performance. Bold represents the best result.
| Model/Metric | FID ↓ | IS ↑ | Recall ↑ | F1-Score ↑ |
|---|---|---|---|---|
| Without GAN (raw data) | – | – | 0.421 | 0.503 |
| Basic GAN | 46.2 | 3.11 | 0.517 | 0.561 |
| GAN + Feature Consistency | 32.5 | 3.84 | 0.582 | 0.609 |
| GAN + Multi-domain Joint Training | 28.9 | 4.02 | 0.613 | 0.644 |
| Proposed GAN-enhanced Model | 25.7 | 4.15 | 0.671 ** | 0.692 ** |
** Indicates statistical significance at p < 0.01 level compared to the second-best model.
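The FID column of Table 3 measures the Fréchet distance between Gaussian fits of real and generated feature distributions. As an illustration only, the sketch below computes it under a diagonal-covariance simplification (the full FID uses complete covariance matrices and a matrix square root):

```python
import numpy as np

def fid_diagonal(real, fake):
    """Fréchet distance between two feature sets under a
    diagonal-Gaussian assumption: ||mu_r - mu_f||^2 plus the sum of
    squared per-dimension standard-deviation gaps. Lower values mean
    the generated features track the real distribution more closely."""
    mu_r, mu_f = real.mean(0), fake.mean(0)
    sd_r, sd_f = real.std(0), fake.std(0)
    return float(((mu_r - mu_f) ** 2).sum() + ((sd_r - sd_f) ** 2).sum())
```

Identical sample sets give a distance of exactly zero, and any mean shift or variance mismatch increases it, which is why a falling FID in Table 3 indicates higher-fidelity synthetic fraud samples.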
Table 4. Performance comparison of anomaly distribution modeling methods. Bold represents the best result.
| Method | Precision | Recall | F1-Score | AUC | FPR ↓ |
|---|---|---|---|---|---|
| One-Class SVM | 0.611 | 0.382 | 0.468 | 0.701 | 0.147 |
| Autoencoder (AE) | 0.644 | 0.431 | 0.515 | 0.734 | 0.133 |
| VAE | 0.673 | 0.491 | 0.567 | 0.768 | 0.119 |
| VAE + GAN Augmented Data | 0.702 | 0.553 | 0.619 | 0.812 | 0.104 |
| Multi-domain + VAE (proposed) | 0.731 ** | 0.602 ** | 0.660 ** | 0.854 ** | 0.091 ** |
** Indicates statistical significance at p < 0.01 level compared to the second-best method.
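The methods in Table 4 all score anomalies by how far a sample falls from a learned model of normal behavior. A minimal sketch of that principle, assuming latent codes from an encoder (here the Gaussian fit and Mahalanobis distance are an illustrative stand-in for the paper's multi-domain VAE likelihood):

```python
import numpy as np

def fit_normal_profile(latents):
    """Fit a Gaussian to latent codes of *normal* transactions;
    in the paper these codes would come from the multi-domain VAE
    encoder (hypothetical stand-in here)."""
    mu = latents.mean(0)
    cov = np.cov(latents, rowvar=False) + 1e-6 * np.eye(latents.shape[1])
    return mu, np.linalg.inv(cov)

def anomaly_score(z, mu, cov_inv):
    """Mahalanobis distance of a latent code from the normal profile;
    larger values indicate more anomalous behavior."""
    d = z - mu
    return float(np.sqrt(d @ cov_inv @ d))
```

Thresholding this score trades recall against the false-positive rate, which is the trade-off the FPR column in Table 4 summarizes.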
Table 5. Sensitivity Analysis of Key Hyperparameters ( λ , h, β ) on Detection Performance (F1-score). Bold represents the best result.
| GAN Weight (λ) | F1-Score | KDE Bandwidth (h) | F1-Score | Fusion Weight (β) | F1-Score |
|---|---|---|---|---|---|
| 0.1 | 0.654 | 0.1 | 0.682 | 0.0 | 0.641 |
| 1.0 | 0.689 | 0.5 | 0.715 | 0.2 | 0.678 |
| 5.0 | 0.712 | 1.0 | 0.727 | 0.4 | 0.705 |
| 10.0 | 0.727 | 1.5 | 0.704 | 0.6 | 0.727 |
| 20.0 | 0.708 | 2.0 | 0.668 | 0.8 | 0.712 |
| 50.0 | 0.663 | – | – | 1.0 | 0.693 |
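The KDE bandwidth h swept in Table 5 controls the smoothness of the estimated density of normal behavior. A minimal one-dimensional Gaussian-kernel sketch (illustrative, not the paper's multivariate estimator):

```python
import numpy as np

def gaussian_kde_logdensity(x, samples, h):
    """1-D Gaussian kernel density estimate with bandwidth h.
    Too small an h overfits individual points; too large an h
    over-smooths the density, which is consistent with Table 5's
    F1-score peaking at an intermediate bandwidth."""
    z = (x - samples) / h
    kernels = np.exp(-0.5 * z ** 2) / np.sqrt(2 * np.pi)
    return float(np.log(kernels.mean() / h + 1e-300))
```

Points near the bulk of the normal samples receive a higher log-density than distant points, so the negative log-density can serve directly as an anomaly score.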
Table 6. Performance and real-time evaluation of multimodal fusion and dual-branch detection. Bold represents the best result.
| Model Configuration | AUC | F1-Score | Latency (ms) | FPR ↓ |
|---|---|---|---|---|
| Transformer Only | 0.812 | 0.641 | 14.7 | 0.121 |
| Clustering Only | 0.781 | 0.603 | 8.3 | 0.139 |
| Transformer + Clustering (no fusion) | 0.826 | 0.659 | 17.9 | 0.113 |
| Fixed-Weight Fusion | 0.851 | 0.681 | 18.4 | 0.108 |
| Adaptive Risk Fusion (proposed) | 0.884 ** | 0.713 ** | 19.2 | 0.087 ** |
** Indicates statistical significance at p < 0.01 level compared to Fixed-Weight Fusion.
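The gap between the last two rows of Table 6 comes from letting the fusion weight vary per transaction instead of staying fixed. A minimal sketch of that idea; the tiny confidence-based gate below is an illustrative stand-in, not the paper's learned risk-fusion network:

```python
import numpy as np

def adaptive_risk_fusion(s_transformer, s_cluster):
    """Fuse the two branch scores with a per-transaction weight beta.
    A constant beta reproduces the 'Fixed-Weight Fusion' row of
    Table 6; here beta leans toward whichever branch is more
    confident, i.e., further from the 0.5 decision boundary."""
    s_t = np.asarray(s_transformer, dtype=float)
    s_c = np.asarray(s_cluster, dtype=float)
    conf_t = np.abs(s_t - 0.5)
    conf_c = np.abs(s_c - 0.5)
    beta = conf_t / (conf_t + conf_c + 1e-12)
    return beta * s_t + (1 - beta) * s_c
```

When the Transformer branch is decisive (e.g., a score of 0.9) and the clustering branch is uncertain (0.5), the fused score follows the decisive branch, which is the behavior the adaptive mechanism is designed to exploit.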
Table 7. Ablation study: contribution of each module to final performance. Bold represents the best result.
| Model Variant | Precision | Recall | F1-Score | AUC |
|---|---|---|---|---|
| Full Model (proposed) | 0.742 ** | 0.691 ** | 0.715 ** | 0.889 ** |
| Without GAN Module | 0.688 | 0.573 | 0.625 | 0.812 |
| Without multi-domain + VAE | 0.661 | 0.512 | 0.577 | 0.794 |
| Without Transformer Branch | 0.653 | 0.498 | 0.564 | 0.781 |
| Without Clustering Branch | 0.644 | 0.472 | 0.544 | 0.769 |
| Without Fusion Mechanism | 0.692 | 0.558 | 0.617 | 0.802 |
** Indicates statistical significance at p < 0.01 level compared to the strongest ablated variant.
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Gong, Y.; Hu, P.; Zhang, Z.; Liu, P.; Li, Z.; Zhang, R.; Yin, J.; Li, M. Data-Centric Generative and Adaptive Detection Framework for Abnormal Transaction Prediction. Electronics 2026, 15, 633. https://doi.org/10.3390/electronics15030633

