A Data-Driven Multimodal Method for Early Detection of Coordinated Abnormal Behaviors in Live-Streaming Platforms
Abstract
1. Introduction
- A multimodal abnormal marketing dataset encompassing video, text, audio, and user behavior sequences is constructed, together with data synchronization and quality calibration strategies to enhance cross-modal consistency and reliability.
- A cross-modal temporal alignment module is proposed, which integrates dynamic time alignment and semantic attention mechanisms to achieve effective fusion of heterogeneous modalities within a unified temporal semantic space.
- A transformer-based temporal anomaly modeling module is designed to capture early weak signals and abrupt abnormal patterns of fraudulent behaviors, thereby enhancing the detection capability for covert marketing fraud.
- A cooperative behavior detection module is introduced, in which graph neural networks are employed to model user interaction structures, enabling the identification of organized groups such as paid posters and bot clusters, along with interpretable abnormal subgraph analysis.
- Diffusion models and self-supervised learning strategies are incorporated to improve model generalization performance and cross-scenario transferability under weakly labeled and small-sample conditions.
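
The alignment step named in the second contribution is not specified in this outline. As a rough illustration only, the sketch below assumes that "dynamic time alignment" works in the spirit of classic dynamic time warping (DTW), which matches two streams sampled at different rates before any fusion takes place. The function name `dtw_align` and the toy signals are hypothetical, and the semantic attention component is not modeled here.

```python
from math import inf


def dtw_align(a, b):
    """Align two 1-D sequences with classic DTW; return (total_cost, path)."""
    n, m = len(a), len(b)
    # cost[i][j] = minimal cumulative cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])  # local distance between samples
            cost[i][j] = d + min(
                cost[i - 1][j],      # a advances, b stays
                cost[i][j - 1],      # b advances, a stays
                cost[i - 1][j - 1],  # both advance
            )
    # Backtrack from the end to recover which samples were matched.
    path = []
    i, j = n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        candidates = [
            (cost[i - 1][j - 1], i - 1, j - 1),
            (cost[i - 1][j], i - 1, j),
            (cost[i][j - 1], i, j - 1),
        ]
        _, i, j = min(candidates)
    path.reverse()
    return cost[n][m], path


# Hypothetical toy streams: the "audio" signal is sampled twice as often
# as the "comment" signal, but both follow the same underlying pattern.
audio_energy = [0, 0, 1, 1, 2, 2, 1, 1]
comment_rate = [0, 1, 2, 1]
total_cost, path = dtw_align(audio_energy, comment_rate)
```

Each pair `(i, j)` in `path` says that sample `i` of the first stream is matched to sample `j` of the second, so two audio samples map to every comment sample in this toy case. A real pipeline would operate on feature vectors rather than scalars and would replace `abs` with a vector distance.
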
2. Related Work
2.1. Multimodal Temporal Representation Learning
2.2. Graph-Based Relational Structure Modeling
2.3. Generative and Weakly Supervised Learning Paradigms
3. Materials and Method
3.1. Data Collection
3.2. Data Preprocessing and Augmentation
3.3. Proposed Method
3.3.1. Overall
3.3.2. Cross-Modal Temporal Alignment
3.3.3. Temporal Fraud Pattern Modeling
3.3.4. Cooperative Manipulation Detection
4. Results and Discussion
4.1. Experimental Configuration
4.1.1. Hardware and Software Platform
4.1.2. Baseline Models and Evaluation Metrics
4.2. Overall Performance Comparison with Baseline Methods
4.3. Ablation Study of Different Components in MM-FGDNet
4.4. Cross-Domain Generalization Performance Under Different Target Scenarios
4.5. Discussion
4.6. Limitation and Future Work
5. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest

| Data Modality | Source | Quantity | Granularity |
|---|---|---|---|
| Live-stream video frames | Live e-commerce platforms | 3.2 M frames | Frame-level |
| Audio segments | Live-stream audio tracks | 18,400 clips | Segment-level |
| Comments & barrage text | Live comment streams | 12.6 M entries | Message-level |
| Transaction records | Tipping and order logs | 1.8 M events | Event-level |
| User interaction sequences | Likes, follows, shares | 9.4 M actions | Action-level |

| Method | AUC ↑ | Precision ↑ | Recall ↑ | F1 ↑ | FDR ↑ | FAR ↓ | EDS ↑ | CDGS ↑ |
|---|---|---|---|---|---|---|---|---|
| Isolation Forest | 0.781 | 0.702 | 0.648 | 0.674 | 0.648 | 0.142 | 0.412 | 0.681 |
| One-Class SVM | 0.793 | 0.718 | 0.662 | 0.689 | 0.662 | 0.136 | 0.438 | 0.695 |
| LSTM | 0.834 | 0.756 | 0.721 | 0.738 | 0.721 | 0.121 | 0.502 | 0.734 |
| GRU | 0.842 | 0.764 | 0.728 | 0.746 | 0.728 | 0.118 | 0.517 | 0.742 |
| Transformer (temporal) | 0.867 | 0.791 | 0.756 | 0.773 | 0.756 | 0.106 | 0.561 | 0.771 |
| GNN-based Fraud Detection | 0.872 | 0.798 | 0.761 | 0.779 | 0.761 | 0.103 | 0.548 | 0.786 |
| MMBT | 0.883 | 0.812 | 0.774 | 0.793 | 0.774 | 0.097 | 0.584 | 0.801 |
| CLIP-style Fusion | 0.889 | 0.819 | 0.781 | 0.800 | 0.781 | 0.094 | 0.601 | 0.812 |
| MM-FGDNet (Ours) | 0.927 | 0.861 | 0.834 | 0.847 | 0.834 | 0.071 | 0.689 | 0.872 |
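
As a quick sanity check on the comparison table above, the sketch below recomputes F1 as the harmonic mean of the reported precision and recall and compares it with the reported F1 column. The values are copied from the table; the helper names `f1_from` and `max_f1_gap` are hypothetical. The FDR column equals the Recall column in every row, which would be consistent with FDR being a recall-style fraud detection rate, but that reading is an assumption because the metric definitions are not given in this excerpt.

```python
# (precision, recall, reported F1) per method, copied from the comparison table.
REPORTED = {
    "Isolation Forest": (0.702, 0.648, 0.674),
    "One-Class SVM": (0.718, 0.662, 0.689),
    "LSTM": (0.756, 0.721, 0.738),
    "GRU": (0.764, 0.728, 0.746),
    "Transformer (temporal)": (0.791, 0.756, 0.773),
    "GNN-based Fraud Detection": (0.798, 0.761, 0.779),
    "MMBT": (0.812, 0.774, 0.793),
    "CLIP-style Fusion": (0.819, 0.781, 0.800),
    "MM-FGDNet (Ours)": (0.861, 0.834, 0.847),
}


def f1_from(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


def max_f1_gap(rows) -> float:
    """Largest absolute gap between recomputed and reported F1."""
    return max(abs(f1_from(p, r) - f1) for p, r, f1 in rows.values())


if __name__ == "__main__":
    for name, (p, r, f1) in REPORTED.items():
        print(f"{name:28s} recomputed={f1_from(p, r):.3f} reported={f1:.3f}")
```

Every recomputed value agrees with the reported F1 to within rounding at three decimals, so the Precision, Recall, and F1 columns are mutually consistent.
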

| Variant | AUC ↑ | F1 ↑ | FDR ↑ | FAR ↓ | EDS ↑ | CDGS ↑ |
|---|---|---|---|---|---|---|
| Full MM-FGDNet | 0.927 | 0.847 | 0.834 | 0.071 | 0.689 | 0.872 |
| w/o Cross-Modal Alignment | 0.898 | 0.812 | 0.798 | 0.089 | 0.602 | 0.821 |
| w/o Temporal Fraud Modeling | 0.883 | 0.799 | 0.782 | 0.093 | 0.531 | 0.814 |
| w/o Cooperative Detection | 0.892 | 0.807 | 0.791 | 0.086 | 0.578 | 0.793 |
| w/o Diffusion Augmentation | 0.901 | 0.818 | 0.804 | 0.082 | 0.614 | 0.839 |
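
One way to read the ablation table above is to ask, for each metric, which removed component hurts the full model most. The sketch below does that arithmetic on the reported values. It treats FAR as lower-is-better, following the ↓ marker in the header, and the names `degradation` and `most_harmful_removal` are hypothetical.

```python
METRICS = ("AUC", "F1", "FDR", "FAR", "EDS", "CDGS")
LOWER_IS_BETTER = {"FAR"}  # per the ↓ marker in the table header

# Values copied from the ablation table, in METRICS order.
FULL = (0.927, 0.847, 0.834, 0.071, 0.689, 0.872)
ABLATIONS = {
    "w/o Cross-Modal Alignment": (0.898, 0.812, 0.798, 0.089, 0.602, 0.821),
    "w/o Temporal Fraud Modeling": (0.883, 0.799, 0.782, 0.093, 0.531, 0.814),
    "w/o Cooperative Detection": (0.892, 0.807, 0.791, 0.086, 0.578, 0.793),
    "w/o Diffusion Augmentation": (0.901, 0.818, 0.804, 0.082, 0.614, 0.839),
}


def degradation(variant_scores):
    """Per-metric loss relative to the full model (positive means worse)."""
    losses = {}
    for metric, full, ablated in zip(METRICS, FULL, variant_scores):
        delta = ablated - full if metric in LOWER_IS_BETTER else full - ablated
        losses[metric] = round(delta, 3)
    return losses


def most_harmful_removal():
    """For each metric, name the ablation with the largest degradation."""
    return {
        metric: max(ABLATIONS, key=lambda v: degradation(ABLATIONS[v])[metric])
        for metric in METRICS
    }


if __name__ == "__main__":
    for metric, variant in most_harmful_removal().items():
        loss = degradation(ABLATIONS[variant])[metric]
        print(f"{metric:5s} largest loss {loss:.3f} from {variant}")
```

On these numbers, removing temporal fraud modeling causes the largest loss on every metric except CDGS, where removing cooperative detection costs the most.
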

| Target Scenario | Method | F1 ↑ | AUC ↑ | EDS ↑ |
|---|---|---|---|---|
| New Streamer | Transformer (temporal) | 0.742 | 0.851 | 0.512 |
| New Streamer | Multimodal Fusion (MMBT) | 0.769 | 0.873 | 0.548 |
| New Streamer | GNN-based Fraud Detection | 0.779 | 0.881 | 0.536 |
| New Streamer | MM-FGDNet | 0.816 | 0.914 | 0.662 |
| New Product Category | Transformer (temporal) | 0.736 | 0.846 | 0.498 |
| New Product Category | Multimodal Fusion (MMBT) | 0.761 | 0.869 | 0.531 |
| New Product Category | GNN-based Fraud Detection | 0.771 | 0.878 | 0.519 |
| New Product Category | MM-FGDNet | 0.808 | 0.909 | 0.643 |
| New Platform | Transformer (temporal) | 0.721 | 0.832 | 0.471 |
| New Platform | Multimodal Fusion (MMBT) | 0.747 | 0.861 | 0.503 |
| New Platform | GNN-based Fraud Detection | 0.756 | 0.871 | 0.491 |
| New Platform | MM-FGDNet | 0.801 | 0.902 | 0.628 |

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2026 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license.
Share and Cite
Luo, J.; Zhu, P.; Wang, Y.; Xiao, Z.; Li, J.; Kong, X.; Zhan, Y. A Data-Driven Multimodal Method for Early Detection of Coordinated Abnormal Behaviors in Live-Streaming Platforms. Electronics 2026, 15, 769. https://doi.org/10.3390/electronics15040769
