A Hybrid Multi-Agent System for Early Scam Detection in Crypto-Assets
Abstract
1. Introduction
- RQ1 (LLM reliability): To what extent can constrained LLM prompting combined with deterministic rule-based logic produce reproducible, auditable outputs for regulatory document analysis?
- RQ2 (cross-source generalization): How robust is a fraud-detection classifier trained on one community-curated data source when evaluated against an independently labeled expert corpus?
- RQ3 (multi-modal complementarity): Can a multi-agent architecture integrating off-chain documentary analysis with on-chain behavioural signals provide complementary risk evidence that neither modality achieves alone?
- Architectural Innovation—Hybrid Neuro-Symbolic Architecture for Regulatory NLP: We present a Multi-Agent System integrating an LLM-powered Heuristic Agent with a hybrid-AI Compliance Agent, intended to support automated off-chain documentary due diligence and MiCAR-aligned regulatory assessment for crypto-assets. This hybrid paradigm demonstrates that LLM semantic extraction combined with deterministic compliance logic can yield auditable, reproducible regulatory outputs (RQ1).
- Technical Specification: We provide detailed architectural documentation covering agent orchestration protocols, prompt engineering strategies, and structured output schemas that support both interpretability and machine-processable analytical outputs.
- Empirical Validation: We report a multi-tier evaluation: (i) large-scale benchmarking of the On-Chain Agent on 227,000 labeled tokens with 98.23% balanced accuracy in-source and 93.45% cross-source generalization (94.67% combined); (ii) off-chain pipeline assessment on 150 projects, measuring both reproducibility (95% identical alert tier across five independent reruns) and latency (210 s mean end-to-end processing time); and (iii) an exploratory usability study assessing perceived clarity and actionability.
2. State of the Art
2.1. Linking Blockchain Addresses to Real-World Identities
2.2. Detection of Fraudulent Wallets
2.3. Scam Token Detection Approaches
2.4. LLMs for Legal and Regulatory Document Analysis
2.5. Gaps in Current Approaches
3. Platform Design
3.1. System Overview and Objectives
3.2. Functional Requirements and Regulatory Constraints
3.3. High-Level Architecture and Data Flow
3.4. Agent Roles and Interactions
- $S_H$ is the heuristic alert score (range 0.0–1.0) provided by the Heuristic Agent.
- $S_C$ is the compliance score (range 0.0–1.0) from the Compliance Agent. The term $(1 - S_C)$ is used to convert this into a “non-compliance alert” score.
- $S_O$ is the on-chain alert score (range 0.0–1.0) produced by the On-Chain Agent (Section 4.4).
- $w_H$, $w_C$, and $w_O$ are configurable weights representing the relative importance of each analysis in the final aggregated score, such that $w_H + w_C + w_O = 1$. In this study, the default weights are design-time priors chosen to slightly emphasize off-chain heuristic risk (as the earliest documentary signal) while keeping compliance and on-chain signals comparable; these values are not empirically optimized. We do not report a sensitivity analysis here; future work will quantify how weight perturbations affect alert tiers and thresholds, or tune weights per jurisdictional policy. Concretely, $w_H$ assigns greater influence to the heuristic signal because it is available earliest in the assessment lifecycle, before on-chain data accumulate, and directly targets the off-chain documentary gaps that motivate this work; $w_C$ and $w_O$ are set equal, treating regulatory disclosure deficiencies and on-chain behavioural fraud signals as complementary evidence streams of comparable supervisory importance. These values were informed by iterative consultation with domain experts during system design but remain configurable per deployment and jurisdictional policy. In deployments where the on-chain module is disabled, $w_O$ can be set to zero.
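The weighted aggregation described in this list can be sketched in a few lines of Python. This is a minimal illustration, not the platform's implementation: the function name `aggregate_alert`, the example weights (0.4, 0.3, 0.3), and the renormalization applied when the on-chain module is disabled are all assumptions. The text states only that the heuristic weight is slightly larger, that the other two weights are equal, and that the weights sum to one.

```python
def aggregate_alert(heuristic, compliance, on_chain=None, weights=(0.4, 0.3, 0.3)):
    """Weighted aggregate of the three component scores, each in [0.0, 1.0].

    ASSUMPTION: the default weights are illustrative placeholders,
    not the values used in the study.
    """
    w_h, w_c, w_o = weights
    if on_chain is None:
        # On-chain module disabled: its weight is set to zero.
        w_o, on_chain = 0.0, 0.0
    total = w_h + w_c + w_o
    if total <= 0:
        raise ValueError("at least one weight must be positive")
    # ASSUMPTION: renormalize so the active weights still sum to 1.
    w_h, w_c, w_o = w_h / total, w_c / total, w_o / total
    for name, value in (("heuristic", heuristic),
                        ("compliance", compliance),
                        ("on_chain", on_chain)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} score must lie in [0.0, 1.0]")
    # (1 - compliance) converts a compliance score into a non-compliance alert.
    return w_h * heuristic + w_c * (1.0 - compliance) + w_o * on_chain
```

Renormalizing keeps the aggregate on the same 0.0–1.0 scale whether or not the on-chain signal is available, so alert thresholds would not need to change between deployments.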
4. Materials and Methods
4.1. Data Acquisition
4.2. LLM-Driven Heuristic Alert Scoring
4.3. MiCAR-Aligned Compliance Verification
- Stage 1: Taxonomic Classification. Each project is assigned to one of six MiCAR taxonomic categories: SECURITY, EMT (E-Money Token), ART (Asset-Referenced Token), OTHER, NON_MICAR, or NON_CLASSIFIABLE. This determination is critical, as it prescribes the specific regulatory framework and associated disclosure obligations applicable to each asset class. The classification workflow begins with LLM-mediated semantic extraction: the model analyzes project documentation to decide, for each characteristic flag, whether the corresponding Boolean legal or economic property holds. Flag definitions are presented to the LLM as contextual guidance (Appendix A, Table A2). Flag extraction yields a Boolean feature vector (e.g., redeemable_in_fiat: True, backed_by_assets: False), which is then submitted to a deterministic rule-based classification engine implementing explicit logical mappings from flag combinations to MiCAR asset classes (Appendix A, Table A1).
- Stage 2: Disclosure Verification. Upon taxonomic classification (e.g., assignment to EMT), the Compliance Agent proceeds to compliance verification against class-specific regulatory obligations. The system constructs an asset-class-specific compliance checklist by retrieving applicable disclosure requirements (Appendix A, Table A3), which synthesize universal obligations applicable to all MiCAR-regulated crypto-assets and specialized requirements contingent upon the determined classification. A second LLM invocation performs documentary verification. The model receives project documentation alongside the compliance checklist and systematically verifies disclosure presence, that is, whether each mandated element appears within available documentation. To ensure interpretive consistency, the LLM receives operationalized definitions for each requirement (Appendix A, Table A4). Importantly, this stage assesses disclosure existence rather than substantive adequacy or legal sufficiency, which remains within human supervisory purview.
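The two stages above can be illustrated with a short Python sketch. The requirement names are taken from Appendix A, Table A3. The classification rules and their precedence order are a simplified assumption, since Table A1 lists the flags per class but not how they combine. Treating the compliance score as the fraction of checklist items found is likewise an assumption. The function names `classify` and `verify_disclosures` are hypothetical.

```python
# Universal obligations shared by all MiCAR-regulated classes (Table A3).
UNIVERSAL = [
    "whitepaper_present", "risk_factors_disclosed", "issuer_identified",
    "disclaimers_present", "kyc_aml_controls", "marketing_consistent",
]
# Class-specific additions (Table A3).
CLASS_SPECIFIC = {
    "EMT": ["redeemable_in_fiat", "daily_redeemability", "reserve_assets_held",
            "reserves_audited", "safeguarding_mechanism", "redemption_policy_clear"],
    "ART": ["asset_backing_disclosed", "valuation_method_disclosed",
            "reserve_policy_clear", "redemption_mechanism_disclosed",
            "governance_arrangements_disclosed"],
    "SECURITY": ["prospectus_present", "registered_with_authority",
                 "investor_protection_mechanisms"],
    "OTHER": [],
}


def classify(flags):
    """Stage 1: map LLM-extracted Boolean flags to an asset class.

    ASSUMPTION: simplified rules and precedence, for illustration only.
    """
    def on(name):
        return bool(flags.get(name, False))

    if on("regulated_as_security") or on("represents_equity") or on("represents_debt"):
        return "SECURITY"
    if on("redeemable_in_fiat") and not on("backed_by_assets"):
        return "EMT"
    if on("backed_by_assets"):
        return "ART"
    if on("nft_unique"):
        return "NON_MICAR"
    if on("utility_function") or on("governance_function"):
        return "OTHER"
    return "NON_CLASSIFIABLE"


def verify_disclosures(asset_class, found):
    """Stage 2: build the class checklist and score disclosure presence.

    `found` maps requirement name -> bool, as a second LLM call might return.
    ASSUMPTION: the score is the fraction of checklist items present.
    """
    if asset_class not in CLASS_SPECIFIC:
        return None, []  # no checklist for NON_MICAR / NON_CLASSIFIABLE
    checklist = UNIVERSAL + CLASS_SPECIFIC[asset_class]
    missing = [req for req in checklist if not found.get(req, False)]
    return 1.0 - len(missing) / len(checklist), missing
```

Keeping the rule engine and the checklist as plain data means the only non-deterministic steps are the two LLM extractions, which is what makes the resulting classification auditable.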
4.4. On-Chain Agent
4.4.1. Data Sources
- TokenScout Dataset: A curated academic corpus of 214,084 ERC-20 tokens encompassing over 9.7 million token transfer events. The dataset was manually labeled by four experienced auditors over 800 man-hours, classifying tokens based on observed behaviors, including abnormal liquidity fluctuations, suspicious transaction patterns, and cross-token interactions. The corpus comprises 212,278 scam tokens (179,995 Rug Pulls, 22,800 Honeypots, 9483 Ponzi schemes) and 1806 verified legitimate tokens.
- ChainAbuse Dataset: Community-driven reports yielding 74,889 fraudulent Ethereum addresses flagged for Rug Pulls and Honeypots. Since reports typically identify scammer Externally Owned Accounts (EOAs) rather than contract addresses, automated contract discovery via Transfer event parsing identified 27,773 candidate tokens, refined through liquidity and activity filters to 11,634 high-confidence scam tokens.
- CoinMarketCap Baseline: The top 2000 tokens by market capitalization serve as a robust negative class of legitimate assets with sustained market scrutiny, providing reliable baseline data for evaluating token behaviors within the Ethereum ecosystem.
4.4.2. Architectural Design and Operational Pipeline
- Stage 1: Multi-Level Transaction Reconstruction. For each analyzed contract, the agent performs exhaustive interaction history extraction via Etherscan API integration:
1. ERC-20 Token Transfers: Indexing Transfer event logs captures all token movements, including minting, burning, and peer-to-peer transfers, even those triggered by complex internal calls or DEX interactions.
2. Native ETH Transactions: Direct Ether flows to/from the contract reveal funding sources, initial liquidity provision, and fee accumulation patterns.
3. Internal Transaction Traces: Deep inspection of internal messages exposes routing via decentralized exchanges (Uniswap, SushiSwap), mixing protocols, and automated market maker (AMM) cascading calls invisible in superficial transaction analysis.
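As a rough sketch of how the three extraction levels map onto Etherscan queries, the snippet below only builds the request URLs and performs no network calls. The base URL, the `chainid` parameter, and the action names (`tokentx`, `txlist`, `txlistinternal`) reflect Etherscan's account module as commonly documented and should be checked against the current API reference. The helper name `build_queries` is hypothetical, and pagination and rate limiting are omitted.

```python
from urllib.parse import urlencode

# ASSUMPTION: base URL of the Etherscan API; verify against current docs.
ETHERSCAN_API = "https://api.etherscan.io/v2/api"


def build_queries(contract, api_key, chain_id=1):
    """Return one query URL per reconstruction level for a token contract."""
    common = {
        "chainid": chain_id,   # 1 = Ethereum mainnet
        "module": "account",
        "sort": "asc",         # oldest first, so history is replayed in order
        "apikey": api_key,
    }
    per_level = {
        # ERC-20 Transfer events emitted by the token contract
        "token_transfers": {"action": "tokentx", "contractaddress": contract},
        # Native ETH transactions sent to or from the contract
        "native_eth": {"action": "txlist", "address": contract},
        # Internal message calls touching the contract
        "internal_traces": {"action": "txlistinternal", "address": contract},
    }
    return {
        level: f"{ETHERSCAN_API}?{urlencode({**common, **params})}"
        for level, params in per_level.items()
    }
```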
- Stage 2: Heuristic-Based Noise Filtration (ChainAbuse only). To refine the noisy 27,773 candidate corpus derived from ChainAbuse reports into high-confidence fraudulent tokens, a two-step protocol applies:
- Liquidity filter: Exclusion of tokens with non-zero trading volume within the 30-day period prior to the data collection date (verified via DexScreener) removes actively traded assets that are likely legitimate.
- Activity filter: Retention of contracts recording transfers filters out tokens with minimal on-chain footprint, yielding a final ChainAbuse subset of 11,634 high-probability fraudulent tokens.
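The two-step filter can be sketched as follows. The record field names (`volume_30d`, `transfer_count`) and the `min_transfers` default are hypothetical, since the activity threshold is not stated in the text above. The liquidity rule follows the description: any non-zero 30-day trading volume excludes the token.

```python
def filter_candidates(candidates, min_transfers=10):
    """Keep candidate tokens that pass both the liquidity and activity filters.

    candidates: iterable of dicts with keys 'address', 'volume_30d',
    'transfer_count' (hypothetical field names).
    ASSUMPTION: min_transfers=10 is an illustrative threshold only.
    """
    kept = []
    for token in candidates:
        # Liquidity filter: still-traded tokens are treated as likely legitimate.
        if token["volume_30d"] > 0:
            continue
        # Activity filter: drop tokens with a minimal on-chain footprint.
        if token["transfer_count"] < min_transfers:
            continue
        kept.append(token["address"])
    return kept
```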
- Stage 3: Graph-Temporal Feature Engineering. The agent constructs a directed transaction graph where vertices represent addresses and edges denote token transfers, then extracts a 47-dimensional feature vector organized into three semantic categories:
- Structural Features (Network Topology). These features capture how transaction participants are interconnected, revealing network architecture patterns that distinguish legitimate from fraudulent schemes. Centralized control structures and anomalous connectivity patterns often indicate coordinated manipulation.
- Graph Topology Primitives: Node count, edge count, and transaction count quantify network scale.
- Centrality Measures: Degree, Betweenness, Closeness, Eigenvector, and Katz centrality, aggregated via mean/std/min/max. Rug Pulls display high Katz centrality (few wallets controlling value flow) and low closeness (isolated clusters).
- Clustering Coefficient: Local clustering measures network cohesion; Honeypots often show tight-knit collusion networks.
- Entity Distribution Metrics: Unique sender/receiver counts and ratios expose concentration patterns. Scam tokens typically exhibit low entity diversity.
- Monetary Features (Value Flow Patterns). These features analyze the economic dynamics of token transfers, detecting anomalies in transfer amounts, wealth concentration, and fund movement patterns. Fraudulent tokens routinely exhibit extreme value concentration among insiders.
- Value Statistics: Mean, standard deviation, median, and quantiles (Q25, Q75, Q95) of transfer values. Pump-and-dump schemes show extreme value variance.
- Multi-Scale Transformations: Normalized, logarithmic, and harmonic transformations reveal both high-value insider transactions and dusting attacks.
- Accumulated Flow Tracking: Cumulative incoming/outgoing value per edge exposes layered laundering sequences.
- Concentration Indices: Gini coefficient quantifies wealth inequality. Fraudulent tokens routinely exhibit Gini .
- Temporal Features (Activity Dynamics). These features examine the timing and rhythm of transaction activity, distinguishing ephemeral schemes with compressed activity windows from sustained legitimate projects with stable long-term engagement.
- Lifetime Statistics: Activity span and inter-arrival times expose burstiness. Rug Pulls compress activity into 24–72 h windows before abandonment.
- Long-term vs. Short-term Dynamics: Comparison of activity in the early vs. late stages of the token lifespan distinguishes ephemeral schemes from sustained projects.
- Edge Frequency & Recency: Transaction count per edge and days since last transaction capture activity decay post-scam.
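To make the three feature categories concrete, the sketch below computes a handful of representative features from a flat list of transfers using only the standard library. It is an illustration under stated assumptions, not the 47-dimensional extractor: the tuple layout, the function names, and the choice to compute the Gini coefficient over value received per address are all assumptions.

```python
from statistics import mean


def gini(values):
    """Gini coefficient of non-negative values (0 = equal, near 1 = concentrated)."""
    xs = sorted(values)
    n, total = len(xs), sum(xs)
    if n == 0 or total == 0:
        return 0.0
    weighted = sum((i + 1) * x for i, x in enumerate(xs))
    return 2.0 * weighted / (n * total) - (n + 1.0) / n


def extract_features(transfers):
    """transfers: list of (sender, receiver, value, unix_timestamp) tuples."""
    if not transfers:
        raise ValueError("no transfers to analyze")
    senders = {t[0] for t in transfers}
    receivers = {t[1] for t in transfers}
    times = sorted(t[3] for t in transfers)
    gaps = [b - a for a, b in zip(times, times[1:])]
    received = {}
    for _, receiver, value, _ in transfers:
        received[receiver] = received.get(receiver, 0.0) + value
    return {
        # Structural: scale and entity distribution of the transfer graph
        "node_count": len(senders | receivers),
        "edge_count": len({(t[0], t[1]) for t in transfers}),
        "tx_count": len(transfers),
        "sender_receiver_ratio": len(senders) / len(receivers),
        # Monetary: concentration of received value
        "value_gini": gini(list(received.values())),
        # Temporal: activity span and burstiness
        "lifetime_hours": (times[-1] - times[0]) / 3600,
        "mean_inter_arrival_s": mean(gaps) if gaps else 0.0,
    }
```

Centrality measures such as Katz or betweenness would normally come from a graph library and are left out here to keep the example dependency-free.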
4.4.3. Performance Benchmarking
- Comparison with Related Work. The Balanced Random Forest approach involves design trade-offs relative to alternative methodologies:
- vs. TokenScout Temporal GNN: Our approach sacrifices approximately 5 percentage points in accuracy (93.45% vs. 98.41%) but delivers full interpretability via CIU attribution and symbolic rule extraction, addressing EU AI Act transparency requirements.
- vs. Opaque Deep Learning: The Random Forest ensemble requires roughly 1/50th of the training time (12 CPU-hours vs. 600 GPU-hours for the GNN), enables real-time inference (<2 s latency vs. 15–30 s), and provides auditable decision paths suitable for regulatory proceedings.
- vs. Heuristic-Only Baselines: The feature-engineered approach captures 23% more sophisticated scams (e.g., time-locked Honeypots, gradual Rug Pulls) that rule-based systems fail to detect.
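Because the training corpora are heavily skewed toward scam tokens, the benchmark figures are reported as balanced accuracy in the contributions list. A minimal sketch of that metric, assuming the standard definition (mean of per-class recall), shows why it is preferred over plain accuracy here; the function name is illustrative.

```python
def balanced_accuracy(y_true, y_pred):
    """Mean per-class recall; insensitive to class imbalance."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    recalls = []
    for cls in sorted(set(y_true)):
        idx = [i for i, y in enumerate(y_true) if y == cls]
        hits = sum(1 for i in idx if y_pred[i] == cls)
        recalls.append(hits / len(idx))
    return sum(recalls) / len(recalls)


# A classifier that labels everything "scam" (1) on a 98:2 corpus scores
# 98% plain accuracy but only 50% balanced accuracy.
labels = [1] * 98 + [0] * 2
always_scam = [1] * 100
plain = sum(t == p for t, p in zip(labels, always_scam)) / len(labels)
balanced = balanced_accuracy(labels, always_scam)
```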
4.4.4. Explainability and Investigative Tooling
- Contextual Importance and Utility (CIU) Methodology. Model behavior is analyzed using the Contextual Importance and Utility methodology, which provides a structured approach to interpreting individual feature contributions. CIU assigns two metrics to each feature: Contextual Importance (CI), which quantifies the feature’s potential influence on the prediction, and Contextual Utility (CU), which measures how the current feature value supports the target class. For fraud classification with prediction probability , all top features exhibit , indicating strong support for the scam classification (Table 4).
- Symbolic Rule Extraction. Automated rule induction generates auditable decision logic, enabling non-technical compliance officers to validate flagged contracts. Table 5 presents the extracted fraud detection rules with their corresponding confidence thresholds.
- Feature Interpretation. The extracted rules provide actionable insights:
- Transaction burstiness (eth_max_daily_tx ): High daily transaction volumes correlate with pump-and-dump schemes, where fraudsters rapidly inflate trading activity before exit.
- Network centralization (katz_centrality, closeness_centrality): Fraudulent tokens exhibit highly centralized networks where few actors control value flow, indicative of Sybil attacks or coordinated manipulation.
- Counterparty inequality (counterparty_gini ): Extreme concentration of transactions among a few counterparties suggests insider coordination rather than organic market participation.
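The CI and CU quantities introduced above can be sketched for a single feature as follows. The sketch assumes the usual CIU formulation: the feature is varied over its range with all other features held fixed, CI is the resulting output range relative to the full output range, and CU is the position of the current output within that range. The toy `predict` function stands in for the trained Random Forest and is purely hypothetical, as is the 0.5 returned for CU when the feature has no influence.

```python
def ciu(predict, instance, feature, lo, hi, steps=50, out_min=0.0, out_max=1.0):
    """Estimate Contextual Importance and Utility of one feature by sampling."""
    current = predict(instance)
    outputs = [current]
    for k in range(steps + 1):
        probe = dict(instance)
        probe[feature] = lo + (hi - lo) * k / steps  # sweep the feature's range
        outputs.append(predict(probe))
    c_min, c_max = min(outputs), max(outputs)
    ci = (c_max - c_min) / (out_max - out_min)
    # ASSUMPTION: CU is reported as 0.5 (neutral) when the feature has no effect.
    cu = (current - c_min) / (c_max - c_min) if c_max > c_min else 0.5
    return ci, cu


def predict(x):
    """Hypothetical scam-probability model, linear in one feature."""
    return 0.1 + 0.8 * x["counterparty_gini"]


token = {"counterparty_gini": 0.9, "katz_centrality": 0.4}
ci_gini, cu_gini = ciu(predict, token, "counterparty_gini", 0.0, 1.0)
ci_katz, cu_katz = ciu(predict, token, "katz_centrality", 0.0, 1.0)
```

In this toy setting, the concentration feature has high importance and its current value strongly supports the scam class, whereas the unused centrality feature has zero importance.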
4.5. Workflow Orchestration and Evaluation Methodology
- Technical Evaluation. A structured, multi-step protocol was designed to test the system under realistic operational conditions. A dataset of 150 cryptocurrency projects was sourced from CoinMarketCap, selected to ensure diverse representation across token typologies (utility tokens, stablecoins, governance tokens) and project maturities. For this sample, LLM-extracted elements and flags were manually checked against public sources to verify extraction fidelity, but no authoritative, expert-labeled ground-truth benchmark exists to certify which projects are definitively fraudulent versus legitimate. In the absence of confirmed fraud cases, we do not report end-to-end detection metrics (detection rate, false positives, false negatives) for the integrated MAS; instead, we restrict our evaluation to verifying that each individual element extracted by the LLM is factually consistent with publicly accessible information sources.
1. Unit Testing: Each agent (Crawler Agent, Heuristic Agent, Compliance Agent, and On-Chain Agent) was independently tested to verify successful task completion, including XMPP message delivery validation, intermediate output consistency, and correct inter-agent communication handoffs.
2. Integration Testing: The complete end-to-end pipeline was evaluated to assess task orchestration and inter-agent cooperation, monitoring message latency, data throughput, and total execution time per token.
3. Scalability Testing: Performance under increasing workloads was evaluated by subjecting the system to 1–10 parallel token analyses, measuring scalability and resource utilization under high-throughput processing.
4. Reproducibility Testing: Multiple runs on identical datasets verified that the platform produced stable, reproducible risk scores and compliance assessments across executions.
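The reproducibility check in the last step can be expressed as the share of projects whose alert tier is identical across all reruns. The sketch below assumes a three-tier scheme with illustrative cut-offs (0.7 and 0.4); these thresholds and the function names are hypothetical.

```python
def to_tier(score, cutoffs=((0.7, "HIGH"), (0.4, "MEDIUM"))):
    """Map an aggregated alert score to a tier (ASSUMPTION: illustrative cut-offs)."""
    for cutoff, label in cutoffs:
        if score >= cutoff:
            return label
    return "LOW"


def tier_reproducibility(runs):
    """runs: {project_id: [score_run_1, ..., score_run_n]}.

    Returns the fraction of projects assigned the same tier in every run.
    """
    if not runs:
        raise ValueError("no projects supplied")
    stable = sum(
        1 for scores in runs.values()
        if len({to_tier(s) for s in scores}) == 1
    )
    return stable / len(runs)
```

Comparing tiers rather than raw scores means small numeric jitter counts as instability only when it crosses a tier boundary.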
- User Evaluation. We conducted an evaluation study with eight participants, including two experts from regulatory authorities with compliance/risk-analysis exposure, who used the platform to examine various cryptocurrency projects. The study did not include a controlled task-performance experiment or statistical hypothesis testing; it relied on questionnaire-based usability feedback only. Accordingly, we do not claim statistical evidence about user trust or system accuracy in real-world regulatory settings; such claims require larger, regulator-led studies with ground-truth outcomes and formal hypothesis testing. Feedback was collected using structured questionnaires administered in two stages:
- Pre-Test Questionnaire: Recorded participants’ baseline views, their prior familiarity with crypto-assets, and their expectations.
- Post-Test Questionnaire: Evaluated participants’ direct interaction with the platform, with emphasis on usability, clarity of the user interface, perceived reliability of results, the practical value of the risk score, and overall user satisfaction.
5. Results
5.1. Scalability and Performance
5.2. User Evaluation
- Usability. The platform was rated positively for usability: six of eight participants rated “ease of use” at 4 or 5 (out of 5), and five of eight rated “clarity and intuitiveness of the user interface” at 4 or 5 (remaining ratings were 3). These counts are reported to avoid overstating precision from a small sample.
- Analytical Outputs. Core analytical outputs were perceived as effective and valuable. The “usefulness of the risk score” (from the Heuristic Agent) was rated 4 or 5 by six of eight users. More notably, the “relevance and actionability of the insights”, that is, the specific warning indicators and compliance gaps identified, was rated 4 or 5 by seven of eight participants (three rated it a perfect 5). Both regulatory-authority participants provided domain-grounded qualitative feedback, requesting deeper exportable audit trails and a clearer separation between heuristic narrative cues and formal compliance rationales in borderline cases.
- Overall Performance. System reliability was rated 4 or 5 by six of eight participants, and performance speed was rated 5 by three of eight (six of eight rated speed 4 or 5). Consequently, overall satisfaction was high (six of eight rated 4 or 5), and six of eight participants indicated they would recommend the platform (rating 4 or 5). We stress that no inferential statistical test was performed, and no claim of statistical significance is made; the ratings are reported descriptively to document early user reactions, not to establish evidence of effectiveness. These findings suggest that the platform can provide comprehensible, actionable risk assessments valued by users across varying levels of technical expertise. However, we emphasize that the sample of eight participants represents a significant limitation: the small cohort precludes statistical inference, and the participant composition (researchers, students, and only two experts from regulatory authorities) remains only partially aligned with the operational perspectives of regulatory practitioners and compliance professionals. These results should therefore be interpreted as preliminary, exploratory evidence of usability rather than definitive validation of effectiveness in operational regulatory settings.
- Controlled Validation Experiment. To complement subjective user feedback with an objective detection capability assessment, we conducted a small-scale sanity check. Within a corpus of 50 real crypto-asset projects, we artificially introduced four projects designed to exhibit overt fraudulent patterns (e.g., anonymous teams, unrealistic return promises, missing disclosures, impersonation tactics). All four artificially fraudulent projects were correctly identified by the platform, receiving high overall alert scores (), while the legitimate projects in the corpus received appropriately lower risk assessments. Given the intentionally clear nature of these injected cases and the small scale, this test is presented as an initial methodological check rather than a full real-world benchmark. Broader validation with annotated ground-truth datasets, including explicit false-positive/false-negative analyses on borderline projects, is planned as future work.
5.3. Case Study: Illustrative Token Analysis
Listing 1. Reconciliator alert report for anonymized project.
5.4. Summary of Results
6. Discussion
6.1. Interpretation of Findings
6.2. Comparative Advantages
6.3. Computational Resource Requirements
6.4. Methodological and Theoretical Implications
- LLM Reliability under Constrained Prompting (RQ1). The 95% reproducibility rate demonstrates that structured output schemas combined with role-constrained system prompts can substantially reduce the stochastic variability inherent in LLM-based analysis. This finding contributes empirical evidence to the emerging literature on deterministic LLM deployment in high-stakes domains. Prior work has documented significant output instability in unconstrained LLM applications [41]; our result quantifies the variance reduction achievable through prompt engineering and schema enforcement in a regulatory context, and identifies the residual 5% instability as concentrated at tier boundaries rather than producing dramatic misclassifications.
- Distribution Shift in Fraud Detection (RQ2). The 4.78 percentage-point accuracy drop from in-source (98.23%) to cross-source (93.45%) evaluation constitutes a substantive empirical finding about label-collection bias in crypto-fraud datasets. Community-reported labels (ChainAbuse) and expert-curated labels (TokenScout) encode systematically different fraud typology distributions, labeling granularity, and class-balance profiles. This gap provides quantitative evidence that fraud-detection models cannot be assumed to generalize across data sources without recalibration—a finding with direct implications for any deployed RegTech system that relies on heterogeneous training corpora. We emphasize this result as a caution against reporting only in-source accuracy, which can significantly overestimate real-world performance.
- Off-Chain/On-Chain Complementarity (RQ3). The reconciliation architecture demonstrates that documentary risk signals (heuristic and compliance scores) and on-chain behavioural signals provide non-redundant evidence: the Reconciliator Agent surfaces cases where these modalities diverge, enabling analysts to focus investigative effort on projects exhibiting conflicting signals. The case study illustrates a scenario where strong on-chain fraud indicators () coexist with partial protective documentary signals (verified source code, market listing), producing a nuanced alert that neither modality would generate alone. To provide preliminary quantitative support, we computed pairwise agreement statistics across the 112 projects (of 150) for which all three component scores were available. The Spearman rank correlation between the aggregated off-chain signal and the on-chain score was (), indicating a moderate positive but far from redundant association. In 31 projects (27.7%), the two signals diverged by more than 0.3 on the normalized scale. Of these, 19 exhibited elevated off-chain risk coupled with low on-chain scores (, ), consistent with documentary red flags preceding on-chain manifestation—precisely the proactive detection scenario motivating this work. The remaining 12 displayed the opposite pattern (, ), indicating projects whose professionally crafted disclosures masked anomalous transactional behaviour detectable only through on-chain analysis. These divergence cases illustrate concretely that the two modalities provide non-redundant evidence, and that their integration surfaces investigative leads that neither channel would generate independently. While a formal information-theoretic quantification of complementarity remains future work, these descriptive statistics establish an empirical basis for multi-modal regulatory risk assessment.
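The divergence analysis reported for RQ3 can be sketched as a simple partition of projects by the gap between their off-chain and on-chain scores. The 0.3 gap comes from the text. Splitting the divergent cases by the sign of the difference is an assumption, since the exact per-signal thresholds are not reproduced here, and the function name is hypothetical.

```python
def split_divergent(scores, gap=0.3):
    """Partition projects whose two signals differ by more than `gap`.

    scores: {project_id: (off_chain, on_chain)}, both on the 0.0-1.0 scale.
    Returns (off_chain_leads, on_chain_leads) as lists of project ids.
    """
    off_leads, on_leads = [], []
    for pid, (off, on) in scores.items():
        if abs(off - on) <= gap:
            continue  # signals broadly agree
        # ASSUMPTION: direction is decided by the sign of the difference.
        (off_leads if off > on else on_leads).append(pid)
    return off_leads, on_leads
```

Projects in the first list correspond to documentary red flags without on-chain manifestation, and those in the second to clean disclosures with anomalous transactional behaviour.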
6.5. Limitations
6.6. Future Research Directions
7. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
Appendix A. Extended MiCAR Tables
Table A1. Mapping from flag combinations to MiCAR asset classes.

| Asset Types | Flags |
|---|---|
| SECURITY | regulated_as_security = true; represents_equity = true; represents_debt = true; has_capital_rights = true; investment_promise = true; dividend_like = true; security_language = true; rights_transferable = true |
| EMT (E-Money Token) | redeemable_in_fiat = true; backed_by_assets = false; audited_reserves = true; utility_function = false |
| ART (Asset-Referenced Token) | backed_by_assets = true |
| OTHER | utility_function = true; governance_function = true |
| NON_MICAR | nft_unique = true; investment_promise = true; whitepaper_present = false |
| NON_CLASSIFIABLE | No crypto indicators detected |

Table A2. Flag definitions presented to the LLM as contextual guidance.

| Flag | Prompt (Significance) |
|---|---|
| regulated_as_security | The asset is classified under regulatory frameworks governing securities, meaning it must comply with legal requirements about investor protection, disclosure, and reporting obligations. |
| represents_equity | The asset confers ownership rights or a share in the profits of an underlying entity, resembling traditional equity instruments such as shares or stock. |
| represents_debt | The asset represents a financial obligation of an issuer to repay a debt, akin to traditional debt securities such as bonds or promissory notes. |
| has_capital_rights | The asset grants the holder certain rights over the capital structure of an entity, including claims to dividends, profits, or liquidation proceeds, similar to ownership stakes. |
| investment_promise | The asset is marketed with the promise of financial returns, suggesting an investment opportunity. This implies potential regulatory scrutiny for compliance with securities laws. |
| dividend_like | The asset offers returns or benefits similar to dividends, which are typically distributed to equity holders of a company, signaling investment-like characteristics. |
| security_language | The marketing or contractual terms of the asset use terminology commonly associated with securities, such as “shares,” “equity,” or “interest,” which may trigger regulatory requirements for registration and oversight. |
| rights_transferable | The asset can be freely transferred, often implying liquidity and tradability, similar to financial instruments that are exchanged in secondary markets. |
| redeemable_in_fiat | The asset can be converted or redeemed for a fiat currency, providing a clear value exchange mechanism, which is typical for stablecoins or other forms of currency-backed assets. |
| daily_redeemability | The asset can be redeemed or exchanged daily, providing liquidity and flexibility to investors, which is a critical characteristic for money market instruments. |
| reserve_assets_held | The issuer holds a reserve of assets backing the issued tokens or units, providing security to investors by ensuring that tangible assets, similar to collateralization in financial markets, back the asset. |
| reserve_audited | The reserve assets held by the issuer are subject to independent audits, enhancing transparency and trust by confirming that the issuer maintains sufficient reserves to back the value of the asset. |
| redemption_policy_clear | The asset has a clearly defined process for redemption, ensuring that investors can easily exchange or liquidate their holdings, similar to the redemption terms for traditional securities. |
| whitepaper_present | The asset has a formal whitepaper or investment prospectus, which is critical for providing transparency and detailed information about the asset’s structure, risks, and potential returns. |

Table A3. Disclosure requirements by asset class.

| Asset Types | Requirements |
|---|---|
| EMT (E-Money Token) | whitepaper_present, risk_factors_disclosed, issuer_identified, disclaimers_present, kyc_aml_controls, marketing_consistent, redeemable_in_fiat, daily_redeemability, reserve_assets_held, reserves_audited, safeguarding_mechanism, redemption_policy_clear |
| ART (Asset-Referenced Token) | whitepaper_present, risk_factors_disclosed, issuer_identified, disclaimers_present, kyc_aml_controls, marketing_consistent, asset_backing_disclosed, valuation_method_disclosed, reserve_policy_clear, redemption_mechanism_disclosed, governance_arrangements_disclosed |
| SECURITY | whitepaper_present, risk_factors_disclosed, issuer_identified, disclaimers_present, kyc_aml_controls, marketing_consistent, prospectus_present, registered_with_authority, investor_protection_mechanisms |
| OTHER | whitepaper_present, risk_factors_disclosed, issuer_identified, disclaimers_present, kyc_aml_controls, marketing_consistent |

Table A4. Operationalized definitions of the disclosure requirements.

| Requirements | Definition |
|---|---|
| whitepaper_present | The whitepaper represents a foundational document providing a comprehensive disclosure of the asset’s structure, purpose, and operational details. It must be officially registered with the competent regulatory authority, ensuring transparency and accountability in accordance with regulatory standards. |
| risk_factors_disclosed | The whitepaper must include an explicit disclosure of the potential risks associated with the asset, ensuring that investors are fully informed of the possible financial, operational, and regulatory risks inherent in the investment. |
| issuer_identified | The identity of the issuer must be clearly stated, with sufficient details to establish the legitimacy and accountability of the party responsible for the asset. This is essential for investor protection and regulatory compliance. |
| disclaimers_present | A legal disclaimer outlining the limitations of liability, investor responsibilities, and risk factors must be present in the whitepaper or related documentation. This ensures that investors are aware of the legal framework and risks associated with the asset. |
| kyc_aml_controls | The project must implement a robust Know Your Customer (KYC) and Anti-Money Laundering (AML) process to verify participants’ identities and prevent illegal activities such as money laundering and fraud. This is a critical compliance measure to adhere to international financial regulations. |
| marketing_consistent | All marketing and promotional activities must be consistent with the information disclosed in the whitepaper, ensuring that the asset is marketed truthfully and transparently to potential investors. Misleading claims or discrepancies in marketing communications can lead to regulatory sanctions. |
| redeemable_in_fiat | The asset must be convertible into fiat currency, ensuring liquidity and marketability. This requirement establishes the asset’s potential for real-world value realization and its compliance with financial market regulations. |
| daily_redeemability | The asset should allow daily redemption, providing liquidity to users and facilitating real-time market transactions. This enhances the asset’s usability and ensures compliance with liquidity requirements. |
| reserve_assets_held | The project must hold reserve assets that substantiate the value of the issued tokens. These reserves act as a financial safeguard, ensuring the asset is backed by tangible financial resources and mitigating the risk of a “Rug Pull” or insolvency. |
| reserves_audited | The reserve assets held by the issuer are subject to independent audits, enhancing transparency and trust by confirming that the issuer maintains sufficient reserves to back the value of the asset. |
| safeguarding_mechanism | The project must have mechanisms in place to safeguard participants’ assets, including protections against fraud, hacking, and the misappropriation of funds. This is a key element in protecting investor interests and ensuring the project’s financial stability. |
| redemption_policy_clear | The asset’s redemption policy must be clearly articulated, outlining the specific procedures and conditions under which the asset can be exchanged for fiat or other assets. This ensures that investors have a clear understanding of the redemption process, thus minimizing potential disputes. |
| asset_backing_disclosed | The underlying assets or collateral backing the issued token must be disclosed, providing investors with transparency into the asset’s value and financial stability. This disclosure is critical to understanding the asset’s intrinsic value. |
| valuation_method_disclosed | The method of valuing the asset must be disclosed, including any models, algorithms, or financial metrics used to determine its worth. This ensures that investors can assess the asset’s value with confidence and in accordance with industry standards. |
| reserve_policy_clear | A clear and comprehensive reserve policy must be in place, detailing how reserves are managed, accessed, and used to support the asset’s value. This ensures that reserves are properly allocated and utilized, preventing mismanagement. |
| redemption_mechanism_disclosed | The mechanism by which investors can redeem their assets must be disclosed, ensuring that there are clear and efficient procedures for converting the asset into fiat or other tokens. This ensures that the redemption process aligns with investor expectations and legal requirements. |
| governance_arrangements_disclosed | The governance structure of the project must be disclosed, providing details about how decisions are made, how power is distributed, and the roles of key stakeholders. This transparency helps establish accountability and ensures that the project operates in the best interest of its investors. |
| prospectus_present | A formal prospectus must be provided, containing detailed financial information about the asset, its risks, and its market potential. This document serves as an essential disclosure for investors, providing them with all the necessary information to make informed decisions. |
| registered_with_authority | The asset must be officially registered with the relevant regulatory authority, ensuring that it meets the required legal and financial standards. This provides investors with assurance that the asset is legitimate and subject to regulatory oversight. |
| investor_protection_mechanisms | The project must implement mechanisms designed to protect investors, including safeguards against fraud, risk mitigation measures, and avenues for dispute resolution. These mechanisms are essential for fostering investor confidence and maintaining market integrity. |
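The flags tabulated above lend themselves to a structured record that deterministic logic can check. The following is a minimal sketch only: the paper's schemas are Pydantic models (Listing A8), whereas this example uses standard-library dataclasses as a stand-in. The class name `ComplianceFlagsSketch`, the helper `unmet_requirements`, the choice of a six-flag subset, and the tri-state convention (`True`, `False`, `None` for "not stated") are all assumptions made for illustration, not the authors' implementation.

```python
from dataclasses import dataclass, fields
from typing import List, Optional


@dataclass
class ComplianceFlagsSketch:
    """Hypothetical subset of the flags from the table above.

    Assumed convention: True = evidence found, False = evidence of absence,
    None = the analysed documents do not address the requirement.
    """
    kyc_aml_controls: Optional[bool] = None
    marketing_consistent: Optional[bool] = None
    reserve_assets_held: Optional[bool] = None
    reserves_audited: Optional[bool] = None
    redemption_policy_clear: Optional[bool] = None
    registered_with_authority: Optional[bool] = None


def unmet_requirements(flags: ComplianceFlagsSketch) -> List[str]:
    """Return the names of flags that are not affirmatively satisfied.

    Treating None like False is an assumption of this sketch.
    """
    return [f.name for f in fields(flags) if getattr(flags, f.name) is not True]


example = ComplianceFlagsSketch(
    kyc_aml_controls=True,
    reserve_assets_held=True,
    reserves_audited=False,
)
print(unmet_requirements(example))
```

Because the check is a pure function of the extracted flags, the same flag values always yield the same list, which is the property a rule-based layer needs for auditability.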
Appendix B. Heuristic Agent Prompts and Schemas
Listing A1. System prompt for the Heuristic Agent (full text).

Listing A2. Pydantic models defining the structured output for the Heuristic Agent’s analysis.
Appendix C. Compliance Agent Prompts and Schemas
Appendix C.1. Classifier Phase I: Asset-Flag Extraction
Listing A3. System prompt for Asset-Flag extraction.

Listing A4. Human prompt for Asset-Flag extraction.

Listing A5. Pydantic schema for AssetFlags used in Phase I.
Appendix C.2. Classifier Phase II: MiCAR Compliance Assessment
Listing A6. System prompt for MiCAR compliance assessment.

Listing A7. Human prompt for MiCAR compliance assessment.

Listing A8. Pydantic schemas for ComplianceFlags and ComplianceResult used in Phase II.
Appendix D. User Feedback Questionnaires
| Question | Possible Answers |
|---|---|
| 1. Which best describes your role? (Select one) | A. Regulator/Compliance professional; B. Investor; C. Researcher/Academic; D. Developer/Engineer; E. Student/Learner; F. Other |
| 2. How familiar are you with cryptocurrencies? (Select one) | A. Not familiar; B. Beginner; C. Intermediate; D. Advanced |
| 3. What do you aim to achieve by using the platform? (Multiple selection) | A. Identify risky or fraudulent crypto-asset projects; B. Verify regulatory compliance of projects; C. Understand smart contracts and transactions; D. Learn about crypto-assets more safely; E. Other |
| 4. What aspect is most important to you when assessing a crypto-asset project? (Select one) | A. Clear and reliable information; B. Easy-to-understand results; C. Quick overview; D. Detailed explanations; E. Not sure yet |
| 5. How confident are you in your understanding of cryptocurrency risks and regulations? (Select one) | A. Very confident; B. Somewhat confident; C. Not confident |
| 6. What do you hope to achieve by using the platform? | Open-ended |
| 7. How do you choose which crypto-asset projects to invest in? | Open-ended |

| Question | Possible Answers |
|---|---|
| 1. Which best describes your role? (Select one) | A. Regulator/Compliance professional; B. Investor; C. Researcher/Academic; D. Developer/Engineer; E. Student/Learner; F. Other |
| 2. How familiar are you with cryptocurrencies? (Select one) | A. Not familiar; B. Beginner; C. Intermediate; D. Advanced |
| 3. How easy was it to start using the platform? (Select one) | 0. Very difficult; 1; 2; 3; 4; 5. Very easy |
| 4. How clear and intuitive did you find the user interface? (Select one) | 0. Not at all intuitive; 1; 2; 3; 4; 5. Very intuitive |
| 5. How accurately did the system detect fraudulent or risky projects? | 0. Not accurate at all; 1; 2; 3; 4; 5. Very accurate |
| 6. How useful was the risk score in evaluating a project’s reliability or compliance? | 0. Not useful; 1; 2; 3; 4; 5. Very useful |
| 7. How relevant and actionable were the insights provided by the system? | 0. Not relevant/actionable; 1; 2; 3; 4; 5. Very relevant/actionable |
| 8. To what extent did the analysis influence your decision-making? | 0. No influence; 1; 2; 3; 4; 5; 6; 7; 8; 9; 10. Strong influence |
| 9. How reliable was the system during your analysis? | 0. Not reliable at all; 1; 2; 3; 4; 5. Very reliable |
| 10. How would you rate the system’s performance speed? | 0. Very slow/frequent delays; 1; 2; 3; 4; 5. Very fast/no delays |
| 11. How satisfied are you with the software overall? (Select one) | 0. Very unsatisfied; 1; 2; 3; 4; 5. Very satisfied |
| 12. How likely are you to recommend the platform to a colleague? | 0. Not likely; 1; 2; 3; 4; 5. Extremely likely |
| 13. What aspects of the platform did you find most valuable or effective? | Open-ended |
| 14. What improvements or additional features would you recommend for future versions? | Open-ended |
References
1. Kasula, V.K.; Alshboul, A. Leveraging Advanced Technologies to Enhance Public Awareness and Mitigate Risks of Cryptocurrency Scams: A Qualitative Analysis. In Demystifying AI and ML for Cyber–Threat Intelligence; Springer: Cham, Switzerland, 2025; pp. 359–369.
2. Meiklejohn, S.; Pomarole, M.; Jordan, G.; Levchenko, K.; McCoy, D.; Voelker, G.M.; Savage, S. A fistful of bitcoins: Characterizing payments among men with no names. In Proceedings of the 2013 Conference on Internet Measurement Conference, Barcelona, Spain, 23–25 October 2013; pp. 127–140.
3. Bai, Z.; Wang, P. From medium to risk factor: How crypto investment elevates fraud victimization. Financ. Res. Lett. 2025, 86, 108592.
4. Federal Bureau of Investigation. 2024 Internet Crime Report. 2024. Available online: https://www.ic3.gov/AnnualReport/Reports/2024_IC3Report.pdf (accessed on 20 May 2025).
5. Arner, D.W.; Barberis, J.; Buckley, R.P. FinTech, RegTech, and the reconceptualization of financial regulation. Nw. J. Int’l L. Bus. 2016, 37, 371.
6. Gomber, P.; Kauffman, R.J.; Parker, C.; Weber, B.W. On the fintech revolution: Interpreting the forces of innovation, disruption, and transformation in financial services. J. Manag. Inf. Syst. 2018, 35, 220–265.
7. Balaji, P.G.; Srinivasan, D. An introduction to multi-agent systems. In Innovations in Multi-Agent Systems and Applications-1; Springer: Berlin/Heidelberg, Germany, 2010; pp. 1–27.
8. Trerotola, M.; Calvaresi, D. AI-Driven Multi-Agent Systems for Automated Regulatory Analysis of Crypto Projects. In Proceedings of the 2025 IEEE International Conference on Metrology for eXtended Reality, Artificial Intelligence and Neural Engineering (MetroXRAINE); IEEE: Piscataway, NJ, USA, 2025; pp. 735–740.
9. Brown, T.; Mann, B.; Ryder, N.; Subbiah, M.; Kaplan, J.D.; Dhariwal, P.; Neelakantan, A.; Shyam, P.; Sastry, G.; Askell, A.; et al. Language models are few-shot learners. Adv. Neural Inf. Process. Syst. 2020, 33, 1877–1901.
10. Katz, D.M.; Bommarito, M.J.; Gao, S.; Arredondo, P. GPT-4 passes the bar exam. Philos. Trans. R. Soc. A 2024, 382, 20230254.
11. Wei, J.; Wang, X.; Schuurmans, D.; Bosma, M.; Xia, F.; Chi, E.; Le, Q.V.; Zhou, D. Chain-of-thought prompting elicits reasoning in large language models. Adv. Neural Inf. Process. Syst. 2022, 35, 24824–24837.
12. White, J.; Fu, Q.; Hays, S.; Sandborn, M.; Olea, C.; Gilbert, H.; Elnashar, A.; Spencer-Smith, J.; Schmidt, D.C. A prompt pattern catalog to enhance prompt engineering with ChatGPT. arXiv 2023, arXiv:2302.11382.
13. Van der Hoek, W.; Wooldridge, M. Multi-agent systems. Found. Artif. Intell. 2008, 3, 887–928.
14. Toma, A.M.; Cerchiello, P. Initial coin offerings: Risk or opportunity? Front. Artif. Intell. 2020, 3, 18.
15. Karimov, B.; Wójcik, P. Identification of scams in initial coin offerings with machine learning. Front. Artif. Intell. 2021, 4, 718450.
16. Mazorra, B.; Adan, V.; Daza, V. Do not rug on me: Leveraging machine learning techniques for automated scam detection. Mathematics 2022, 10, 949.
17. Liang, R.; Chen, J.; Wu, C.; He, K.; Wu, Y.; Sun, W.; Du, R.; Zhao, Q.; Liu, Y. Towards effective detection of Ponzi schemes on Ethereum with contract runtime behavior graph. ACM Trans. Softw. Eng. Methodol. 2025, 34, 1–32.
18. Pocher, N.; Zichichi, M.; Merizzi, F.; Shafiq, M.Z.; Ferretti, S. Detecting anomalous cryptocurrency transactions: An AML/CFT application of machine learning-based forensics. Electron. Mark. 2023, 33, 37.
19. Luo, B.; Zhang, Z.; Wang, Q.; Ke, A.; Lu, S.; He, B. AI-powered fraud detection in decentralized finance: A project life cycle perspective. ACM Comput. Surv. 2024, 57, 1–38.
20. Chalkidis, I.; Fergadiotis, M.; Malakasiotis, P.; Aletras, N.; Androutsopoulos, I. LEGAL-BERT: The Muppets Straight Out of Law School. arXiv 2020, arXiv:2010.02559.
21. Guha, N.; Nyarko, J.; Ho, D.E.; Ré, C.; Chilton, A.; Narayana, A.; Chohlas-Wood, A.; Peters, A.; Waldon, B.; Rockmore, D.N.; et al. LegalBench: A Collaboratively Built Benchmark for Measuring Legal Reasoning in Large Language Models. arXiv 2023, arXiv:2308.11462.
22. Trerotola, M.; Calvaresi, D. Enhancing Blockchain Transaction Tracking: A Systematic Review of DLT-Based Financial Systems. In Proceedings of the 2025 IEEE International Conference on Metrology for eXtended Reality, Artificial Intelligence and Neural Engineering (MetroXRAINE); IEEE: Piscataway, NJ, USA, 2025; pp. 747–752.
23. Monamo, P.; Marivate, V.; Twala, B. Unsupervised learning for robust Bitcoin fraud detection. In Proceedings of the 2016 Information Security for South Africa (ISSA); IEEE: Piscataway, NJ, USA, 2016; pp. 129–134.
24. Chang, T.H.; Svetinovic, D. Improving Bitcoin ownership identification using transaction patterns analysis. IEEE Trans. Syst. Man Cybern. Syst. 2018, 50, 9–20.
25. Chen, B.; Wei, F.; Gu, C. Bitcoin theft detection based on supervised machine learning algorithms. Secur. Commun. Netw. 2021, 2021, 6643763.
26. Iscan, C.; Kumas, O.; Akbulut, F.P.; Akbulut, A. Wallet-based transaction fraud prevention through LightGBM with the focus on minimizing false alarms. IEEE Access 2023, 11, 131465–131474.
27. Weber, M.; Domeniconi, G.; Chen, J.; Weidele, D.K.I.; Bellei, C.; Robinson, T.; Leiserson, C.E. Anti-money laundering in Bitcoin: Experimenting with graph convolutional networks for financial forensics. arXiv 2019, arXiv:1908.02591.
28. Nerurkar, P.; Bhirud, S.; Patel, D.; Ludinard, R.; Busnel, Y.; Kumari, S. Supervised learning model for identifying illegal activities in Bitcoin. Appl. Intell. 2021, 51, 3824–3843.
29. Northcutt, C.; Jiang, L.; Chuang, I. Confident learning: Estimating uncertainty in dataset labels. J. Artif. Intell. Res. 2021, 70, 1373–1411.
30. Wu, C.; Chen, J.; Zhao, Z.; He, K.; Xu, G.; Wu, Y.; Wang, H.; Li, H.; Liu, Y.; Xiang, Y. TokenScout: Early detection of Ethereum scam tokens via temporal graph learning. In Proceedings of the 2024 on ACM SIGSAC Conference on Computer and Communications Security, Salt Lake City, UT, USA, 14–18 October 2024; pp. 956–970.
31. Chalkidis, I.; Jana, A.; Hartung, D.; Bommarito, M.; Androutsopoulos, I.; Katz, D.; Aletras, N. LexGLUE: A Benchmark Dataset for Legal Language Understanding in English. In Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers); Association for Computational Linguistics: Stroudsburg, PA, USA, 2022; pp. 4310–4330.
32. Cui, J.; Ning, M.; Li, Z.; Chen, B.; Yan, Y.; Li, H.; Ling, B.; Tian, Y.; Yuan, L. ChatLaw: A multi-agent collaborative legal assistant with knowledge graph enhanced mixture-of-experts large language model. arXiv 2023, arXiv:2306.16092.
33. Zetzsche, D.A.; Annunziata, F.; Arner, D.W.; Buckley, R.P. The Markets in Crypto-Assets regulation (MiCA) and the EU digital finance strategy. Cap. Mark. Law J. 2021, 16, 203–225.
34. Arrieta, A.B.; Díaz-Rodríguez, N.; Del Ser, J.; Bennetot, A.; Tabik, S.; Barbado, A.; García, S.; Gil-López, S.; Molina, D.; Benjamins, R.; et al. Explainable Artificial Intelligence (XAI): Concepts, taxonomies, opportunities and challenges toward responsible AI. Inf. Fusion 2020, 58, 82–115.
35. Adadi, A.; Berrada, M. Peeking inside the black-box: A survey on explainable artificial intelligence (XAI). IEEE Access 2018, 6, 52138–52160.
36. Devarajulu, V.S.; Kanipakam, S.; Addula, S.R.; Venkata Krishna, P. Embedding Accountability in the AI Lifecycle for Critical Finance Applications. In Proceedings of the 2025 Cyber Awareness and Research Symposium (CARS); IEEE: Piscataway, NJ, USA, 2025; pp. 1–8.
37. Ramírez, S. FastAPI: Modern Web Framework for Building APIs with Python 3.7+. 2018. Available online: https://fastapi.tiangolo.com/ (accessed on 20 May 2025).
38. Vercel. Next.js: The React Framework for Production. 2016. Available online: https://nextjs.org/ (accessed on 20 May 2025).
39. Microsoft. Playwright: Fast and Reliable End-to-End Testing for Modern Web Apps. 2020. Available online: https://playwright.dev/ (accessed on 20 May 2025).
40. Palanca, J.; Terrasa, A.; Julian, V.; Carrascosa, C. SPADE 3: Supporting the new generation of multi-agent systems. IEEE Access 2020, 8, 182537–182549.
41. Shen, Y.; Heacock, L.; Elias, J.; Hentel, K.D.; Reig, B.; Shih, G.; Moy, L. ChatGPT and other large language models are double-edged swords. Radiology 2023, 307, e230163.
42. Liu, P.; Yuan, W.; Fu, J.; Jiang, Z.; Hayashi, H.; Neubig, G. Pre-train, prompt, and predict: A systematic survey of prompting methods in natural language processing. ACM Comput. Surv. 2023, 55, 1–35.
43. Sheth, A.; Roy, K.; Gaur, M. Neurosymbolic artificial intelligence (why, what, and how). IEEE Intell. Syst. 2023, 38, 56–62.


| Functionality/Platform | Our Platform | Chainalysis | Elliptic | Token Sniffer |
|---|---|---|---|---|
| Monitor New Tokens | ✓ | × | × | ✓ |
| Regulatory Compliance (MiCAR) | ✓ | × | × | × |
| Focus on Explainability (LLM) | ✓ | × | × | ✓ |

| Dimension | Our System | Toma & Cerchiello [14] | Karimov & Wójcik [15] | Mazorra et al. [16] | Pocher et al. [18] | Liang et al. [17] | Luo et al. [19] |
|---|---|---|---|---|---|---|---|
| Data sources | Both | Off-chain | Off-chain | On-chain | On-chain | On-chain | Survey |
| Reg. framework | MiCAR | × | × | × | AML/CFT * | × | × |
| LLM integration | ✓ | × | × | × | × | × | × |
| Multi-agent arch. | 7 agents | × | × | × | × | × | × |
| Explainability | SHAP + CIU + rules | × | SHAP+PDP | × | × | CRBG † | N/A |
| Output type | Struct. alert | Stat. profile | Classification | Rug-pull flag | Anomaly score | Ponzi/benign | Taxonomy |
| Scale | 227k+ tokens | 196 ICOs | ∼300 ICOs | 20k+ tokens | BTC tx graph | ETH contracts | Survey |

| Validation Scenario | Balanced Accuracy | Scam Recall | False Positive Rate |
|---|---|---|---|
| Intra-source (ChainAbuse) | 98.23% | 96.91% | 0.45% |
| Cross-source (TokenScout) | 93.45% | 89.12% | 2.31% |
| Combined datasets | 94.67% | 91.38% | 1.89% |

| Feature | CI | CU | Interpretation |
|---|---|---|---|
| eth_max_daily_tx | 0.993 | 1.0 | Transaction burstiness |
| eth_tx_per_active_week | 0.940 | 1.0 | High-frequency trading |
| eth_katz_centrality | 0.893 | 1.0 | Network centralization |
| closeness_centrality | 0.841 | 1.0 | Isolated clusters |
| eth_eigenvector_centrality | 0.729 | 1.0 | Influential node control |
| katz_centrality | 0.701 | 1.0 | Value flow concentration |
| eth_tx_per_active_day | 0.493 | 1.0 | Daily activity spikes |
| counterparty_gini | 0.481 | 1.0 | Counterparty inequality |
| eth_daily_tx_std | 0.452 | 1.0 | Transaction variance |
| gas_used_cv | 0.437 | 1.0 | Gas usage variability |

| Rule Condition | CI | Confidence |
|---|---|---|
| eth_max_daily_tx | 0.993 | 0.96 |
| eth_tx_per_active_week | 0.940 | 0.94 |
| closeness_centrality | 0.841 | 0.93 |
| counterparty_gini | 0.481 | 0.91 |
| katz_centrality | 0.701 | 0.93 |
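
As a reading aid for the validation-scenario table above, the sketch below shows how balanced accuracy relates to the other two reported columns under the standard binary definition, i.e., the mean of the true positive rate (scam recall) and the true negative rate (one minus the false positive rate). The function name `balanced_accuracy` is hypothetical, and it is an assumption that the reported figures follow this definition exactly; the source does not state how they were aggregated.

```python
def balanced_accuracy(scam_recall: float, false_positive_rate: float) -> float:
    """Mean of the true positive rate and the true negative rate (binary case)."""
    true_negative_rate = 1.0 - false_positive_rate
    return (scam_recall + true_negative_rate) / 2.0


# Recall and false positive rate as reported in the table, expressed as fractions.
scenarios = {
    "Intra-source (ChainAbuse)": (0.9691, 0.0045),
    "Cross-source (TokenScout)": (0.8912, 0.0231),
    "Combined datasets": (0.9138, 0.0189),
}

for name, (recall, fpr) in scenarios.items():
    print(f"{name}: {balanced_accuracy(recall, fpr):.4%}")
```

Under this definition, the intra-source row reproduces the reported 98.23%. The cross-source and combined rows come out within about 0.1 percentage points of the reported values, a gap that could stem from rounding of the inputs or from averaging over evaluation splits (an assumption, not something the table specifies).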
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2026 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license.
Trerotola, M.; Parente, M.; Calvaresi, D. A Hybrid Multi-Agent System for Early Scam Detection in Crypto-Assets. Appl. Sci. 2026, 16, 3122. https://doi.org/10.3390/app16073122













