1. Introduction
Blockchain technology has fundamentally transformed financial market infrastructure by enabling decentralized transaction processing and immutable record-keeping without traditional intermediaries [1,2]. Cryptocurrency markets represent the most prominent real-world application of blockchain technology, creating unprecedented opportunities for algorithmic trading and quantitative finance research [3,4]. The continuous 24/7 trading environment, combined with high-frequency data availability and extreme price volatility, presents unique challenges for predictive modeling that distinguish blockchain-based assets from conventional financial instruments.
The emergence of cryptocurrency markets has introduced novel market microstructure dynamics that traditional financial models struggle to address effectively. Unlike equity markets with established market makers and regulatory oversight, blockchain-based trading operates through decentralized protocols where price discovery occurs through peer-to-peer interactions across global participants. This decentralized structure creates distinctive patterns in order book dynamics, liquidity provision, and volatility clustering that require specialized analytical approaches.
Recent developments in cryptocurrency market analysis reveal the inadequacy of conventional three-class prediction frameworks that attempt to simultaneously forecast direction and trading signals [5,6]. Traditional approaches classify price movements as up, down, or no-trade, forcing models to learn both directional patterns and execution timing within a single optimization objective. This conflation of prediction and execution decisions often results in suboptimal performance in volatile cryptocurrency markets where prediction confidence varies significantly across different market conditions.
The proliferation of high-frequency blockchain data provides unprecedented opportunities for microstructure-based prediction models. Order book snapshots, transaction flows, and network activity metrics offer granular insights into market dynamics that were previously unavailable in traditional financial markets. However, the integration of multi-scale data sources spanning from the minute-level microstructure to daily macroeconomic indicators presents complex feature engineering challenges that require sophisticated analytical frameworks.
Machine learning applications in cryptocurrency markets have demonstrated promising results but face critical limitations in practical deployment. Existing approaches often optimize for prediction accuracy without adequate consideration of trading costs, execution feasibility, or risk management requirements. The extreme volatility characteristic of blockchain-based assets necessitates selective execution strategies that balance prediction accuracy with prudent risk exposure.
Contemporary research in cryptocurrency prediction focuses primarily on deep learning architectures, technical indicator combinations, and sentiment analysis integration. Izadi and Hajizadeh (2025) [7] demonstrate 57% accuracy in Bitcoin trend prediction using transformer-based models, while Liu and Huang (2024) [8] achieve profitable results through technical indicator integration with LSTM networks. However, these approaches typically evaluate performance across all time periods without consideration for prediction confidence or selective execution strategies.
The growing institutional adoption of cryptocurrencies has elevated the importance of robust analytical frameworks for blockchain-based financial markets. Ballis et al. (2025) [3] document Bitcoin’s evolution into a digital safe haven during global crises, while Bonaparte (2022) [9] provides evidence that households increasingly view cryptocurrencies as long-term productive assets rather than speculative instruments. This institutional maturation demands more sophisticated trading approaches that can adapt to varying market conditions while maintaining consistent risk-adjusted returns.
Market efficiency research in cryptocurrency markets reveals complex relationships between liquidity, volatility, and price discovery mechanisms. Bouteska et al. (2025) [10] demonstrate that market efficiency varies significantly across different cryptocurrencies and time periods, with efficiency improvements associated with increased liquidity and reduced volatility. These findings suggest that prediction models should incorporate market condition indicators to optimize performance across different efficiency regimes.
The interconnected nature of cryptocurrency markets creates systemic risk transmission mechanisms that traditional asset classes do not exhibit. Franco and Laurini (2025) [11] quantify systemic risk using high-frequency data, revealing strong interconnectedness among major cryptocurrencies with Bitcoin and Ethereum serving as primary risk transmission sources. This systemic interconnectedness supports the development of confidence-based execution strategies that can adapt to varying levels of market stress.
This research introduces a confidence-threshold framework that addresses fundamental limitations in existing cryptocurrency prediction approaches by separating directional forecasting from execution decisions. The methodology trains binary classifiers exclusively on directional movements while using prediction confidence scores to determine trade execution. This separation enables selective trading strategies that execute only high-conviction predictions, potentially improving risk-adjusted returns in volatile cryptocurrency markets.
The confidence-based approach offers several advantages over traditional prediction methods. First, it eliminates the need for arbitrary no-trade class definitions that vary across market conditions and time horizons. Second, it provides explicit uncertainty quantification that enables adaptive risk management. Third, it accommodates varying prediction horizons and deadband thresholds without requiring model retraining.
The research contributes to blockchain technology applications in financial markets through three primary innovations. First, it demonstrates the effectiveness of confidence-based execution for cryptocurrency trading, achieving 82.68% direction accuracy with 151.11-basis point average profit per trade. Second, it validates the dominance of order book features over traditional technical indicators for short-term cryptocurrency prediction. Third, it provides practical guidelines for implementing selective trading strategies in volatile blockchain-based markets.
The findings advance the understanding of machine learning applications in cryptocurrency markets while addressing practical deployment considerations that existing research often overlooks. The confidence-threshold framework bridges academic research and industry practice by providing actionable trading strategies that account for transaction costs, execution constraints, and risk management requirements.
This study tests three core hypotheses about cryptocurrency price prediction:
First, we hypothesize that separating directional prediction from execution decisions through confidence thresholding improves risk-adjusted returns compared to traditional all-trade classification. Conventional approaches force models to learn both direction and trading signals simultaneously, potentially degrading performance in high-uncertainty periods.
Second, we hypothesize that order book microstructure features provide superior predictive power for short-term cryptocurrency price movements compared to traditional technical indicators. The transparency of blockchain transactions and the decentralized market structure may make limit order book dynamics more informative than in traditional markets.
Third, we hypothesize that prediction confidence correlates with actual accuracy across cryptocurrency pairs, enabling selective execution strategies that trade coverage for profitability. This relationship would validate confidence scores as practical risk management signals.
Our research questions follow directly from these hypotheses:
RQ1: Does confidence-based selective execution outperform constant-threshold classification in cryptocurrency markets when evaluated on risk-adjusted returns?
RQ2: What proportion of predictive importance comes from order book microstructure versus traditional price-based features across different cryptocurrency market segments?
RQ3: How does the coverage–accuracy trade-off vary across market capitalizations, and what does this reveal about signal quality in different liquidity regimes?
We address these questions through systematic experimentation on 11 cryptocurrency pairs spanning diverse market conditions over 12 months.
The remainder of this paper is organized as follows.
Section 2 reviews related work in blockchain-based financial markets and machine learning prediction methodologies.
Section 3 describes the confidence-threshold framework and feature engineering pipeline.
Section 4 presents the experimental design and dataset specifications.
Section 5 analyzes results across multiple performance dimensions.
Section 6 discusses implications for blockchain market analytics and practical implementation considerations.
Section 7 concludes with future research directions and broader applications of confidence-based prediction in blockchain technology contexts.
3. Methodology
The confidence-threshold framework integrates multi-scale market data to predict cryptocurrency price movements through a binary classification approach coupled with execution decision logic. This methodology addresses the inherent challenges of cryptocurrency market prediction by separating directional forecasting from trading execution decisions. The approach leverages both macroeconomic indicators derived from daily price data and microstructure information extracted from high-frequency order book snapshots.
3.1. Data Sources and Preprocessing
The dataset comprises two complementary data streams spanning October 2023 to October 2024. The macro dataset contains daily OHLCV data for 100 cryptocurrency pairs covering the period from August 2018 to August 2025, providing 211,679 observations across 38 features. The micro dataset captures minute-level order book snapshots for 11 cryptocurrency pairs, yielding 5,672,947 observations across 264 microstructure features.
Table 2 presents the comprehensive data source specifications and quality metrics for each dataset component.
The macro dataset preprocessing implements quality control through adaptive volatility-based thresholds and temporal consistency checks. Price movements exceeding 20% trigger validation flags, with adjustments based on rolling volatility windows. Temporal gaps exceeding 24 h are identified to ensure complete coverage for indicator calculations.
Micro dataset preprocessing addresses high-frequency order book challenges through symbol normalization, minute-level aggregation via last-observation-carried-forward, and order book structure validation. The system verifies monotonic price ordering, identifies crossed markets, and applies adaptive spread filtering (0.001–200 bps).
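As an illustration of the structural checks just described, the following sketch validates a single order book snapshot. It is not the authors' implementation: the function name `validate_snapshot`, the argument layout (price levels ordered from best to worst), and the decision to reject locked markets along with crossed ones are assumptions made here. Only the 0.001–200 bps spread bounds come from the text.

```python
import numpy as np

def validate_snapshot(bid_prices, ask_prices,
                      min_spread_bps=0.001, max_spread_bps=200.0):
    """Return True if one order book snapshot passes the structural checks.

    Assumes bid_prices and ask_prices are ordered from best level to
    deepest level (hypothetical layout, not specified in the paper).
    """
    bids = np.asarray(bid_prices, dtype=float)
    asks = np.asarray(ask_prices, dtype=float)
    if bids.size == 0 or asks.size == 0:
        return False
    # Monotonic ordering: bids strictly decrease, asks strictly increase.
    if np.any(np.diff(bids) >= 0) or np.any(np.diff(asks) <= 0):
        return False
    best_bid, best_ask = bids[0], asks[0]
    # Crossed (or locked) market: best bid at or above best ask.
    if best_bid >= best_ask:
        return False
    mid = 0.5 * (best_bid + best_ask)
    spread_bps = (best_ask - best_bid) / mid * 1e4
    # Spread filter using the bounds quoted in the text.
    return bool(min_spread_bps <= spread_bps <= max_spread_bps)
```

The adaptive element of the spread filter (how the bounds move with market conditions) is not specified in the text and is therefore left out of this sketch.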
Table 3 summarizes the complete preprocessing pipeline.
Figure 1 displays temporal coverage, showing consistent 24/7 data availability with minor exchange maintenance gaps. Peak activity occurs at 07:00 UTC during European-American session overlap.
3.2. Feature Engineering Framework
The pipeline creates a unified 296-dimensional feature space combining temporal, microstructure, and technical indicators (Figure 2). Technical indicators use proper temporal alignment: moving averages with 5-, 10-, 20-day windows and one-period lagging, RSI with Wilder’s smoothing over 14 days, and volatility via rolling standard deviations (5, 15, 30 days).
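A minimal sketch of two of these indicators is given below, assuming a one-dimensional array of daily closing prices. The function names `lagged_sma` and `wilder_rsi` are hypothetical, and the exact alignment used in the paper may differ. The moving average at day t deliberately excludes day t itself, which is one way to realize the one-period lag.

```python
import numpy as np

def lagged_sma(close, window):
    """Simple moving average lagged by one period.

    The value at index t averages close[t-window:t], so it never uses
    the price observed on day t itself.
    """
    close = np.asarray(close, dtype=float)
    out = np.full(close.shape, np.nan)
    for t in range(window, len(close)):
        out[t] = close[t - window:t].mean()
    return out

def wilder_rsi(close, period=14):
    """RSI with Wilder's smoothing; the first `period` entries are NaN."""
    close = np.asarray(close, dtype=float)
    rsi = np.full(close.shape, np.nan)
    if len(close) <= period:
        return rsi
    delta = np.diff(close)              # delta[i] = close[i+1] - close[i]
    gains = np.clip(delta, 0.0, None)
    losses = np.clip(-delta, 0.0, None)
    # Seed with the plain average of the first `period` changes.
    avg_gain = gains[:period].mean()
    avg_loss = losses[:period].mean()
    for t in range(period, len(close)):
        if t > period:
            # Wilder's recursive smoothing with factor 1/period.
            avg_gain = (avg_gain * (period - 1) + gains[t - 1]) / period
            avg_loss = (avg_loss * (period - 1) + losses[t - 1]) / period
        if avg_loss == 0.0:
            rsi[t] = 100.0              # convention when there are no losses
        else:
            rs = avg_gain / avg_loss
            rsi[t] = 100.0 - 100.0 / (1.0 + rs)
    return rsi
```

To lag the RSI by one period as well, the returned array can be shifted forward by one index before it is joined with the minute-level features.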
Microstructure extraction processes order book snapshots across 50 depth levels, computing volume-weighted spreads, order imbalance ratios, and depth concentration.
Figure 3 shows median spreads of 1.55 basis points with the 95th percentile at 9.14 bps.
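The paper does not state the exact formulas behind these microstructure features, so the sketch below uses common textbook definitions as assumptions: quoted spread in basis points of the midpoint, a volume-weighted spread based on the gap between the ask-side and bid-side volume-weighted average prices, a signed order imbalance ratio, and depth concentration as the share of volume in the top `top_k` levels. The function name `book_features` and the `top_k` parameter are hypothetical.

```python
import numpy as np

def book_features(bid_px, bid_qty, ask_px, ask_qty, top_k=5):
    """Compute illustrative microstructure features for one snapshot.

    Inputs are per-level prices and quantities ordered from best to
    deepest level (the paper uses up to 50 levels per side).
    """
    bid_px = np.asarray(bid_px, dtype=float)
    bid_qty = np.asarray(bid_qty, dtype=float)
    ask_px = np.asarray(ask_px, dtype=float)
    ask_qty = np.asarray(ask_qty, dtype=float)

    mid = 0.5 * (bid_px[0] + ask_px[0])
    bid_vol, ask_vol = bid_qty.sum(), ask_qty.sum()
    total_vol = bid_vol + ask_vol
    # Volume-weighted average price on each side of the book.
    vwap_bid = (bid_px * bid_qty).sum() / bid_vol
    vwap_ask = (ask_px * ask_qty).sum() / ask_vol
    top_vol = bid_qty[:top_k].sum() + ask_qty[:top_k].sum()

    return {
        "mid": mid,
        "spread_bps": (ask_px[0] - bid_px[0]) / mid * 1e4,
        "vw_spread_bps": (vwap_ask - vwap_bid) / mid * 1e4,
        # +1 means all resting volume is on the bid side, -1 on the ask side.
        "imbalance": (bid_vol - ask_vol) / total_vol,
        "depth_concentration": top_vol / total_vol,
    }
```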
Feature correlation analysis (Figure 4) reveals distinct patterns: asset correlations dominate macro features while microstructure correlations remain localized. RobustScaler handles the dramatic scale variation (0.017 to 95 M) across features.
Figure 5 demonstrates that volatility features achieve the highest mutual information scores.
Temporal alignment combines end-of-day macro features with minute-level micro features using proper lagging to prevent information leakage.
Figure 6 shows intraday patterns with peak activity during European and US sessions, exhibiting systematic spread tightening during high-volume periods.
Table 3 summarizes the data preprocessing pipeline parameters and filtering criteria applied to achieve data quality standards.
3.3. Mathematical Problem Formulation
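We formalize the prediction task and execution logic. Given feature vector $x_t$ at time $t$, the neural network $f_\theta$ estimates directional probabilities. The binary target is
$$y_t = \begin{cases} 1, & r_{t,h} > \delta \\ 0, & r_{t,h} < -\delta \end{cases}$$
where $r_{t,h} = (m_{t+h} - m_t)/m_t$ is the arithmetic return over horizon $h = 600$ minutes, $\delta = 10$ basis points filters noise, and $m_t$ denotes the midpoint price.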
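The model outputs calibrated probabilities:
$$\hat{p}_t = f_\theta(x_t) \approx P(y_t = 1 \mid x_t)$$
Training minimizes binary cross-entropy with balanced class weights:
$$\mathcal{L}(\theta) = -\frac{1}{|\mathcal{T}|} \sum_{t \in \mathcal{T}} \left[ w_1\, y_t \log \hat{p}_t + w_0\, (1 - y_t) \log (1 - \hat{p}_t) \right]$$
where $\mathcal{T}$ is the set of training samples and $w_0, w_1$ are inverse class frequencies.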
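A small numerical sketch of a class-weighted binary cross-entropy follows. The normalization of the weights as n / (2 × class count) is an assumption borrowed from the common "balanced" convention. The text only states that the weights are inverse class frequencies, and the name `balanced_bce` is hypothetical.

```python
import numpy as np

def balanced_bce(y_true, p_up, eps=1e-12):
    """Class-weighted binary cross-entropy.

    y_true: labels in {0, 1}; p_up: predicted probability of the up class.
    Both classes must be present, otherwise a weight is undefined.
    """
    y = np.asarray(y_true, dtype=float)
    p = np.clip(np.asarray(p_up, dtype=float), eps, 1.0 - eps)
    n = y.size
    n_up = y.sum()
    n_down = n - n_up
    # Inverse-frequency weights, scaled so each class has equal total weight.
    w_up = n / (2.0 * n_up)
    w_down = n / (2.0 * n_down)
    losses = -(w_up * y * np.log(p) + w_down * (1.0 - y) * np.log(1.0 - p))
    return float(losses.mean())
```

With these weights an uninformative prediction of 0.5 gives a loss of log 2 regardless of class imbalance, which makes loss values comparable across symbols with different up/down ratios.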
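Execution occurs when prediction confidence meets or exceeds threshold $\tau$:
$$e_t(\tau) = \mathbb{1}\left[\max(\hat{p}_t,\, 1 - \hat{p}_t) \ge \tau\right]$$
The threshold $\tau^*$ maximizes the validation set expected profit:
$$\tau^* = \arg\max_{\tau}\; \mathbb{E}_{\text{val}}\left[\pi_t^{\text{net}} \mid e_t(\tau) = 1\right]$$
where $\pi_t^{\text{net}}$ is the per-trade net profit after transaction costs.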
Performance evaluation employs coverage (proportion of executed trades), per-trade net profit after 1 bp transaction costs, and risk-adjusted Sharpe ratio. These metrics are defined in Section 3.4 as Equations (7)–(12).
Figure 7 presents the end-to-end pipeline from raw data to execution decisions.
3.4. Confidence-Based Classification Model and Performance Evaluation Metrics
The confidence-threshold framework employs a binary neural network architecture that separates directional prediction from execution decisions. Unlike traditional three-class approaches that simultaneously predict direction and trading signals, this methodology trains exclusively on directional movements (up versus down) and uses prediction confidence to determine trade execution.
The target variable creation process applies a 10-basis point deadband filter to eliminate noise-driven movements. Price movements exceeding this threshold within the 600 min prediction horizon receive directional labels: y = 1 for upward movements and y = 0 for downward movements. The system excludes no-trade samples during training, focusing the model on clear directional patterns. This filtering process yielded 5,333,774 trade samples from 5,666,347 total observations, representing 94.1% data utilization.
The neural network architecture implements a multi-layer perceptron with 256-128-64 hidden units, ReLU activation functions, and 20% dropout regularization.
Table 4 specifies the complete model configuration and training parameters.
The target variable creation follows standard practice in directional forecasting (Krauss et al., 2020 [36]). We define the directional label as
$$y_t = \begin{cases} 1, & r_{t,h} > \delta \\ 0, & r_{t,h} < -\delta \end{cases}$$
where $r_{t,h}$ denotes the arithmetic return from time $t$ to $t+h$, $\delta = 10$ basis points represents the noise-filtering deadband, and h = 600 min defines our prediction horizon. Returns are computed as
$$r_{t,h} = \frac{m_{t+h} - m_t}{m_t}$$
where $m_t$ is the midpoint price at time t, calculated as the average of best bid and best ask. This arithmetic return convention aligns with the short-horizon prediction literature where log approximations introduce negligible differences (Campbell et al., 1997 [37]).
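The labeling rule can be sketched as follows, assuming a gap-free array of minute-level midpoint prices for a single symbol. The function name `make_labels` and the use of -1 to mark excluded samples are choices made for this illustration.

```python
import numpy as np

def make_labels(mid, horizon=600, deadband_bps=10.0):
    """Forward arithmetic returns and deadband-filtered direction labels.

    Returns (ret, labels): labels are 1 (up), 0 (down), or -1 for samples
    inside the deadband or without a full forward horizon.
    """
    mid = np.asarray(mid, dtype=float)
    ret = np.full(mid.shape, np.nan)
    labels = np.full(mid.shape, -1, dtype=int)
    n = len(mid) - horizon
    if n <= 0:
        return ret, labels
    # Arithmetic return from t to t + horizon.
    r = mid[horizon:] / mid[:n] - 1.0
    delta = deadband_bps * 1e-4
    ret[:n] = r
    labels[:n] = np.where(r > delta, 1, np.where(r < -delta, 0, -1))
    return ret, labels
```

Rows labeled -1 would be dropped before training, matching the exclusion of no-trade samples described above.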
The execution price assumes immediate fill at the prevailing midpoint, which is a reasonable approximation for our 10 h horizon but may underestimate slippage in high-frequency scenarios. For production systems, incorporating market impact models (Almgren & Chriss, 2001 [26]) would be essential.
The confidence threshold optimization process evaluates execution decisions across a grid of confidence values with 0.01 increments. For each threshold, the system executes trades only when $\max(\hat{p}_t,\, 1 - \hat{p}_t) \ge \tau$. The optimization criterion maximizes expected profit per executed trade after accounting for 1.0-basis point transaction costs.
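The grid search can be sketched as below, assuming validation-set arrays of calibrated up-probabilities and realized horizon returns in basis points. The function name `select_threshold` and the `min_trades` guard are assumptions. The guard is added here because an unconstrained search on mean profit per trade can favor thresholds that execute only a handful of trades.

```python
import numpy as np

def select_threshold(p_up, realized_ret_bps, cost_bps=1.0, min_trades=1):
    """Pick the confidence threshold with the highest mean net profit.

    Returns (tau, mean_net_profit_bps), or (None, -inf) if no threshold
    yields at least `min_trades` executed trades.
    """
    p = np.asarray(p_up, dtype=float)
    r = np.asarray(realized_ret_bps, dtype=float)
    conf = np.maximum(p, 1.0 - p)               # confidence in predicted side
    direction = np.where(p >= 0.5, 1.0, -1.0)   # +1 long, -1 short
    net = direction * r - cost_bps              # signed return minus cost

    best_tau, best_profit = None, -np.inf
    # 46 grid points from 0.50 to 0.95 in steps of 0.01.
    for tau in np.round(np.linspace(0.50, 0.95, 46), 2):
        executed = conf >= tau
        if executed.sum() < min_trades:
            continue
        profit = net[executed].mean()
        if profit > best_profit:
            best_tau, best_profit = float(tau), float(profit)
    return best_tau, best_profit
```

The chosen threshold would then be frozen and applied unchanged to the test split.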
The evaluation framework employs both classification metrics and trading-specific performance indicators. Classification metrics include direction accuracy, F1 scores, precision, and recall calculated exclusively on executed trades. Trading metrics encompass coverage rates, profit distributions, win rates, and risk-adjusted returns.
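Coverage represents the proportion of observations triggering execution:
$$\text{Coverage}(\tau) = \frac{1}{N_{\text{total}}} \sum_{t=1}^{N_{\text{total}}} \mathbb{1}\left[\max(\hat{p}_t,\, 1 - \hat{p}_t) \ge \tau\right]$$
where $N_{\text{total}}$ is the number of evaluated observations.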
This selectivity metric controls the accuracy–frequency trade-off inherent in confidence-based systems.
Gross profit per trade follows the standard signed return formulation (Jegadeesh & Titman, 1993 [38]):
$$\pi_t^{\text{gross}} = d_t \, r_{t,h}$$
where $r_{t,h}$ is the realized return and $d_t \in \{+1, -1\}$ is the predicted direction. Net profit incorporates our assumed transaction cost:
$$\pi_t^{\text{net}} = \pi_t^{\text{gross}} - c$$
where $c = 1$ basis point represents maker–taker fees on major exchanges (Binance fee schedule 2025 [28]). This cost assumption is conservative for limit order execution but may underestimate market order costs during volatile periods.
The Sharpe ratio adaptation for per-trade analysis provides risk-adjusted performance assessment:
$$\text{SR} = \frac{\mu_\pi}{\sigma_\pi}, \qquad \mu_\pi = \frac{1}{N} \sum_{i=1}^{N} \pi_i^{\text{net}}, \qquad \sigma_\pi = \sqrt{\frac{1}{N-1} \sum_{i=1}^{N} \left(\pi_i^{\text{net}} - \mu_\pi\right)^2}$$
where $\mu_\pi$ and $\sigma_\pi$ represent mean and standard deviation of net profits, and N equals the number of executed trades.
The per-trade Sharpe ratio adapts the classical formulation (Sharpe, 1994 [39]) to discrete trading events. This metric differs from traditional time-series Sharpe ratios by treating each trade as an independent event rather than continuous portfolio returns.
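The trading metrics above can be computed together, as in the sketch below. The function name `trading_metrics`, the dictionary keys, and the use of the sample standard deviation (ddof = 1) are assumptions. Returns are expected in basis points.

```python
import numpy as np

def trading_metrics(p_up, realized_ret_bps, tau, cost_bps=1.0):
    """Coverage, per-trade net profit statistics, and per-trade Sharpe."""
    p = np.asarray(p_up, dtype=float)
    r = np.asarray(realized_ret_bps, dtype=float)
    conf = np.maximum(p, 1.0 - p)
    executed = conf >= tau
    direction = np.where(p >= 0.5, 1.0, -1.0)
    # Net profit per executed trade: signed return minus transaction cost.
    net = (direction * r - cost_bps)[executed]

    n = int(net.size)
    mean = float(net.mean()) if n > 0 else float("nan")
    std = float(net.std(ddof=1)) if n > 1 else float("nan")
    sharpe = mean / std if n > 1 and std > 0 else float("nan")
    return {
        "coverage": float(executed.mean()),
        "n_trades": n,
        "mean_net_bps": mean,
        "std_net_bps": std,
        "sharpe": sharpe,
        "win_rate": float((net > 0).mean()) if n > 0 else float("nan"),
    }
```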
Temporal validation employs symbol-wise splitting to prevent data leakage. The chronological split allocates 70% for training, 15% for validation, and 15% for testing within each cryptocurrency pair. This approach ensures that future information never influences past predictions while maintaining representative samples across all trading pairs.
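A minimal sketch of such a per-symbol chronological split follows, assuming parallel arrays of symbol identifiers and sortable timestamps. The function name `chronological_split` and the string labels it returns are hypothetical.

```python
import numpy as np

def chronological_split(symbols, timestamps, train=0.70, val=0.15):
    """Assign each row to 'train', 'val', or 'test' within its symbol.

    Rows are ordered by timestamp per symbol, so every training row
    precedes every validation row, which precedes every test row.
    """
    symbols = np.asarray(symbols)
    timestamps = np.asarray(timestamps)
    split = np.empty(len(symbols), dtype="U5")
    for sym in np.unique(symbols):
        idx = np.flatnonzero(symbols == sym)
        idx = idx[np.argsort(timestamps[idx], kind="stable")]
        n = len(idx)
        # Small epsilon guards against floating-point rounding below an integer.
        n_train = int(np.floor(n * train + 1e-9))
        n_val = int(np.floor(n * (train + val) + 1e-9)) - n_train
        split[idx[:n_train]] = "train"
        split[idx[n_train:n_train + n_val]] = "val"
        split[idx[n_train + n_val:]] = "test"
    return split
```

Because the label at time t looks 600 minutes ahead, a stricter variant would also drop the last 600 minutes before each boundary so that labels in one split do not overlap prices in the next. The text does not say whether such a purge was applied.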
Figure 8 presents the rolling correlation analysis between major cryptocurrency pairs (ADA/USDT vs. ETH/USDT), revealing time-varying correlation patterns that justify the symbol-wise validation approach. Correlation values fluctuate between 0.2 and 0.9, indicating regime changes that could bias performance estimates under naive splitting strategies.
Statistical significance testing applies bootstrap resampling with 1000 iterations to generate confidence intervals for key performance metrics. The bootstrap procedure randomly samples executed trades with replacement, computing performance statistics for each sample to estimate distribution parameters.
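The procedure can be sketched as a percentile bootstrap over executed trades. The interval type (percentile), the 95% level, and the function name `bootstrap_ci` are assumptions, since the text specifies only resampling with replacement and 1000 iterations.

```python
import numpy as np

def bootstrap_ci(values, stat=np.mean, n_boot=1000, alpha=0.05, seed=42):
    """Percentile bootstrap confidence interval for a per-trade statistic."""
    x = np.asarray(values, dtype=float)
    rng = np.random.default_rng(seed)
    stats = np.empty(n_boot)
    for b in range(n_boot):
        # Resample trades with replacement, keeping the sample size fixed.
        sample = x[rng.integers(0, x.size, size=x.size)]
        stats[b] = stat(sample)
    lo, hi = np.quantile(stats, [alpha / 2.0, 1.0 - alpha / 2.0])
    return float(lo), float(hi)
```

For example, `bootstrap_ci(net_profits)` gives an interval for the mean net profit, and passing `stat=lambda s: s.mean() / s.std(ddof=1)` gives one for the per-trade Sharpe ratio, where `net_profits` is a hypothetical array of per-trade net profits.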
Cross-asset performance evaluation examines model behavior across different market capitalizations and volatility regimes.
Figure 9 displays the symbol performance comparison, showing systematic differences in prediction accuracy across cryptocurrency pairs. Large-cap assets (BTC, ETH) demonstrate higher accuracy but lower coverage, while mid-cap assets exhibit more balanced coverage–accuracy profiles.
The baseline comparison framework evaluates the confidence-threshold approach against alternative methodologies including buy-and-hold strategies, traditional three-class classification, and constant-threshold binary classification. Performance improvements are measured using both absolute metrics (profit, accuracy) and relative metrics (Sharpe ratio, information ratio).
4. Experimental Design
The experimental framework evaluates the confidence-threshold approach through comprehensive testing on cryptocurrency market data spanning 371 days of continuous trading activity. The experimental design addresses three core research questions: (1) the effectiveness of confidence-based execution versus traditional classification approaches, (2) the optimal threshold calibration for different risk-return profiles, and (3) the generalizability across diverse cryptocurrency market segments.
4.1. Dataset Description
The experimental dataset encompasses 11 cryptocurrency trading pairs selected based on order book data availability and market liquidity criteria.
Table 5 presents the complete dataset specifications and symbol-level statistics.
The dataset spans from 7 October 2023 to 13 October 2024, capturing diverse market conditions including bull market phases, volatility spikes, and consolidation periods. Data coverage exceeds 96% across all symbols, with gaps primarily occurring during scheduled exchange maintenance windows.
Symbol selection criteria prioritize liquid trading pairs with complete microstructure data availability. The chosen pairs represent different market capitalization segments, from Bitcoin (largest) to TRON (smallest in the sample), enabling cross-segment performance analysis. Average daily volumes range from USD 67.8 million (TRX_USDT) to USD 2.85 billion (BTC_USDT), providing heterogeneous liquidity conditions for model evaluation.
Spread characteristics exhibit systematic variation across market capitalizations. Major assets (BTC, ETH) demonstrate tight spreads with median values below 1.5 basis points, while smaller assets show wider spreads exceeding 2.5 basis points. The 95th percentile spread values range from 4.21 basis points (BTC) to 15.43 basis points (TRX), reflecting different market-making conditions.
Temporal distribution analysis reveals consistent 24/7 coverage with minor gaps during exchange maintenance. The dataset captures 53 complete weeks of trading activity, enabling robust statistical analysis across multiple market cycles. Weekend trading patterns differ significantly from weekday patterns, with reduced volume but maintained price discovery functionality.
Data preprocessing eliminated 6600 observations (0.12% of total) due to quality issues including crossed markets, extreme spreads, and timestamp inconsistencies. The preprocessing pipeline maintained 99.88% data retention while ensuring order book structural integrity across all symbols.
4.2. Model Configuration
The neural network implementation employs PyTorch framework optimizations for efficient training on the experimental hardware configuration. The system utilizes an AMD Ryzen 7 7840 HS processor with 64 GB RAM, enabling in-memory processing of the complete dataset without disk I/O bottlenecks.
Training procedures implement early stopping with 10-epoch patience to prevent overfitting. The validation loss monitoring triggers training termination when improvement stagnates, typically occurring between epochs 45 and 65. Model checkpointing saves the best-performing weights based on validation accuracy, ensuring optimal generalization performance.
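The early-stopping rule can be sketched independently of any deep learning framework, as below. The class name `EarlyStopping` and the `min_delta` parameter are assumptions, and the sketch tracks validation loss only.

```python
class EarlyStopping:
    """Stop training after `patience` epochs without validation improvement."""

    def __init__(self, patience=10, min_delta=0.0):
        self.patience = patience
        self.min_delta = min_delta
        self.best_loss = float("inf")
        self.best_epoch = -1
        self.bad_epochs = 0

    def step(self, val_loss, epoch):
        """Record one epoch's validation loss; return True to stop."""
        if val_loss < self.best_loss - self.min_delta:
            # Improvement: remember it and reset the patience counter.
            self.best_loss = val_loss
            self.best_epoch = epoch
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience
```

In a training loop, model weights would be checkpointed whenever `best_epoch` is updated and restored after `step` returns True.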
Hyperparameter optimization focuses on three critical parameters: confidence threshold τ, learning rate scheduling, and batch size selection. The confidence threshold grid search evaluates 46 values from 0.50 to 0.95 with 0.01 increments. Learning rate optimization applies cosine annealing with warm restarts, starting at 0.001 and decaying to a minimum of 0.0001.
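The learning rate schedule can be written in closed form, as sketched below. The restart period `t_0` and multiplier `t_mult` are assumptions, since the text gives only the starting and minimum rates. The function name `cosine_warm_restart_lr` is hypothetical.

```python
import math

def cosine_warm_restart_lr(epoch, lr_max=1e-3, lr_min=1e-4, t_0=10, t_mult=1):
    """Learning rate at a given epoch under cosine annealing with restarts.

    The rate decays from lr_max to lr_min over each cycle and jumps back
    to lr_max at the start of the next one. With t_mult > 1, each cycle
    is t_mult times longer than the previous one.
    """
    t_i, t_cur = t_0, epoch
    # Locate the current cycle and the position inside it.
    while t_cur >= t_i:
        t_cur -= t_i
        t_i *= t_mult
    cosine = math.cos(math.pi * t_cur / t_i)
    return lr_min + 0.5 * (lr_max - lr_min) * (1.0 + cosine)
```

This is the same functional form used by PyTorch's `CosineAnnealingWarmRestarts` scheduler, where `t_0` and `lr_min` correspond to its `T_0` and `eta_min` arguments.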
Batch size selection balances computational efficiency with gradient estimation quality. The 4096 sample batch size enables stable gradient computation while fitting within memory constraints. Larger batch sizes (8192) showed marginal accuracy improvements but significantly increased training time from 57 min to 94 min per epoch.
Memory optimization techniques include gradient checkpointing and mixed precision training. These optimizations reduce peak memory usage from 18.3 GB to 12.7 GB while maintaining numerical precision for critical computations. Training completion requires approximately 3.2 h for the full 80-epoch schedule.
Feature preprocessing applies consistent scaling across training, validation, and test splits. RobustScaler parameters are fit exclusively on training data to prevent information leakage. The median-based centering handles cryptocurrency price outliers more effectively than mean-based alternatives, reducing sensitivity to extreme market events.
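The leakage-free scaling step can be illustrated with a plain NumPy version of median and interquartile-range scaling, which mirrors the default behavior of scikit-learn's `RobustScaler`. The helper names `fit_robust_scaler` and `apply_robust_scaler` are hypothetical. Inputs are assumed to be two-dimensional arrays of shape (samples, features).

```python
import numpy as np

def fit_robust_scaler(x_train):
    """Estimate per-feature median and IQR from training data only."""
    x = np.asarray(x_train, dtype=float)
    center = np.median(x, axis=0)
    q25, q75 = np.percentile(x, [25, 75], axis=0)
    scale = q75 - q25
    # Constant features get unit scale to avoid division by zero.
    scale[scale == 0] = 1.0
    return center, scale

def apply_robust_scaler(x, center, scale):
    """Scale any split with statistics estimated on the training split."""
    return (np.asarray(x, dtype=float) - center) / scale
```

The same `center` and `scale` are reused for the validation and test splits, so no statistic from later data enters the transformation.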
Model calibration employs isotonic regression on validation predictions to improve confidence reliability. The calibration procedure maps raw prediction probabilities to calibrated confidence scores, enhancing the correlation between predicted confidence and actual accuracy. Calibration training uses 10-fold cross-validation within the validation set to prevent overfitting.
Reproducibility protocols fix random seeds across all stochastic processes. NumPy (version 2.3.2, seed = 42), PyTorch (version 2.5.1, seed = 42), and the Python random module use identical initialization values. This configuration enables exact replication of experimental results across different hardware platforms. The experiments were implemented in Python 3.11.9 using the following key packages: NumPy 2.3.2, pandas 2.3.1, scikit-learn 1.7.1, LightGBM 4.6.0, and XGBoost 2.1.1.
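The confidence threshold optimization procedure evaluates each threshold value using validation set performance. The optimization criterion maximizes expected profit per trade:
$$\tau^* = \arg\max_{\tau \in \{0.50,\, 0.51,\, \ldots,\, 0.95\}} \mathbb{E}\left[\pi_t^{\text{net}} \mid \max(\hat{p}_t,\, 1 - \hat{p}_t) \ge \tau\right]$$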
where execution probability depends on the confidence threshold and expected profit accounts for directional accuracy and transaction costs.
5. Results and Analysis
The confidence-threshold framework was evaluated on a dataset comprising 802,967 observations across 11 cryptocurrency pairs from October 2023 to October 2024. The neural network model employed a multi-layer perceptron architecture with 256-128-64 hidden units, trained for 80 epochs with calibrated probability outputs. The system achieved selective execution on 96,247 trades (11.99% coverage) using an optimized confidence threshold of τ = 0.8.
5.1. Classification Performance
The binary direction classifier demonstrated strong predictive accuracy on executed trades.
Table 6 presents the comprehensive classification metrics for the confidence-based execution framework.
The model achieved 82.68% direction accuracy on executed trades, indicating robust prediction capability when confidence exceeds the threshold. The F1 score of 0.8195 demonstrates balanced performance across both up and down movement predictions. The ROC-AUC of 0.6886 suggests moderate discriminative ability, while the PR-AUC of 0.6695 indicates reasonable precision-recall trade-offs given the class distribution.
Figure 10 displays the confusion matrix for executed trades, revealing balanced classification performance with 30,103 correct up predictions and 49,478 correct down predictions out of 96,247 total executions. The false positive rate for up movements was 23.2%, while the false negative rate was 13.2%.
Figure 11 presents the ROC curve analysis, showing the trade-off between true positive rate and false positive rate across different probability thresholds. The curve demonstrates performance above random baseline (AUC = 0.5), with optimal operating points concentrated in the high-specificity region corresponding to the confidence-based execution strategy.
5.2. Trading Performance Evaluation
The confidence-threshold framework generated substantial risk-adjusted returns through selective trade execution.
Table 7 summarizes the trading performance metrics across the complete test period.
Trading profitability depends critically on transaction cost assumptions.
Table 8 presents sensitivity analysis across realistic cost scenarios.
The framework maintains profitability up to 5-basis point transaction costs, though performance degrades steadily. Each additional basis point reduces average profit by approximately 33 basis points and Sharpe ratio by 0.21. The optimal confidence threshold increases with costs (from τ = 0.78 at 0.5 bp to τ = 0.92 at 5 bp) as the system compensates by executing only the highest-conviction trades.
At costs exceeding 5 bp, the strategy becomes marginally profitable or negative for lower market cap pairs. This break-even level sits above maker–taker fees on major exchanges (0.5–2 bp) but at or below typical market order costs (5–10 bp) during volatile periods. Practitioners using market orders would need to adjust thresholds upward or accept reduced profitability.
The relationship between costs and optimal threshold suggests that adaptive threshold mechanisms could improve robustness across varying liquidity conditions. Future work should explore dynamic threshold adjustment based on realized execution costs.
The system generated an average net profit of 151.11 basis points per executed trade after accounting for 1.0-basis point transaction costs. The median profit of 124.60 basis points indicates positive skewness in the return distribution, with more frequent moderate gains than extreme losses. The Sharpe ratio of 0.8313 demonstrates favorable risk-adjusted performance, considering the per-trade volatility of 181.77 basis points.
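As a consistency check, and assuming the per-trade Sharpe ratio is the mean net profit divided by its standard deviation, the reported figures agree with one another:
$$\text{SR} = \frac{\mu_\pi}{\sigma_\pi} = \frac{151.11}{181.77} \approx 0.8313$$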
Figure 12 illustrates the cumulative profit progression over the test period, showing consistent positive drift with periodic drawdown periods. The cumulative performance exhibits steady growth patterns with maximum drawdowns contained within acceptable risk parameters. The profit trajectory demonstrates the effectiveness of the confidence-based execution strategy in avoiding low-conviction trades.
Figure 13 presents the distribution of per-trade net profits, revealing a right-skewed distribution with mode around 100 basis points. The histogram shows that 82.68% of trades generated positive returns, with the loss tail extending to −641 basis points while the profit tail reaches 1196 basis points. This asymmetric distribution supports the effectiveness of the confidence threshold in filtering profitable opportunities.
5.3. Cross-Asset Performance Analysis
The confidence-threshold framework exhibited heterogeneous performance across the 11 cryptocurrency pairs analyzed.
Table 9 presents the per-symbol trading metrics, revealing significant variability in both execution coverage and profitability patterns.
The results demonstrate an inverse relationship between coverage and profitability across different market capitalizations. Bitcoin (BTC_USDT) achieved the highest average profit of 198.45 basis points with 88.9% accuracy but exhibited the lowest coverage at 8.2%. Conversely, smaller market cap assets like TRX_USDT showed higher coverage (21.4%) but lower profitability (108.56 basis points) and accuracy (77.3%).
Figure 14 illustrates the per-symbol accuracy distribution across executed trades, showing performance ranging from 77.3% to 88.9%. The accuracy variations correlate with market liquidity characteristics, where more liquid pairs (BTC, ETH) demonstrate superior prediction reliability. The confidence threshold effectively filtered low-conviction predictions, maintaining accuracy above 77% across all symbols.
Figure 15 presents the average net profit distribution by cryptocurrency pair, revealing substantial profit heterogeneity. Major assets (BTC, ETH, XRP) generated profits exceeding 160 basis points, while emerging assets showed more modest returns between 108 and 145 basis points. This pattern suggests that the confidence-based approach adapts effectively to varying market microstructure conditions.
The coverage analysis reveals systematic differences across asset classes. Large-cap cryptocurrencies exhibited conservative execution patterns (8.2–11.7% coverage) with high-conviction predictions, while mid- and small-cap assets showed more frequent trading opportunities (12.7–21.4% coverage). This coverage distribution reflects the underlying volatility and price movement characteristics inherent to different cryptocurrency market segments.
5.4. Feature Importance and Model Interpretation
The mutual information feature selection process identified 128 critical predictors from the original 296-feature space.
Table 10 categorizes the selected features by their market data origin and temporal characteristics.
Order book features dominated the selected feature set, comprising 81.3% of the total features. Deep order book levels provided the most predictive power for direction classification. This finding validates the importance of market microstructure information for short-term price movement prediction in cryptocurrency markets.
Traditional technical indicators contributed only 4.7% of selected features, suggesting that conventional technical analysis metrics provide limited additional value when comprehensive order book data is available. The Average True Range (ATR) and RSI emerged as the most relevant technical indicators, while moving averages showed modest predictive capacity.
The feature selection process eliminated several expected predictors, including volume ratio indicators and order book level counts, due to insufficient variability across the dataset. This elimination pattern indicates that certain microstructure metrics may be less informative in cryptocurrency markets compared to traditional equity markets.
Spread-based features, including lagged spread measurements and moving averages, maintained moderate importance scores. The spread_bps_lag1 feature achieved the highest individual importance among microstructure indicators, confirming the predictive value of recent transaction cost patterns for future price movements.
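As an illustration of how a lagged spread feature of this kind might be built, the sketch below computes the quoted spread in basis points relative to the mid price and shifts it by one step. The exact definition behind spread_bps_lag1 is not restated here, so the formula and the helper names quoted_spread_bps and lagged should be read as assumptions.

```python
def quoted_spread_bps(best_bid, best_ask):
    """Quoted spread relative to the mid price, in basis points (assumed definition)."""
    mid = (best_bid + best_ask) / 2.0
    return (best_ask - best_bid) / mid * 10_000.0

def lagged(values, k=1):
    """Shift a series by k steps; the first k entries have no history and are None."""
    values = list(values)
    return ([None] * k + values)[:len(values)]

# Toy top-of-book quotes (hypothetical): (best bid, best ask) per minute.
quotes = [(99.95, 100.05), (99.90, 100.10), (100.00, 100.02)]
spreads = [quoted_spread_bps(bid, ask) for bid, ask in quotes]
spread_bps_lag1 = lagged(spreads, k=1)
```

The lag ensures that the value used at time t was observable at time t − 1, so the feature cannot leak information from the bar being predicted.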
The temporal horizon of 600 min (10 h) proved effective for capturing meaningful price movements while maintaining sufficient prediction accuracy. This horizon length accommodates the 24/7 nature of cryptocurrency markets and allows the model to incorporate overnight and weekend trading patterns that are absent in traditional financial markets.
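The horizon determines how direction labels are formed. A minimal sketch, assuming one price observation per minute and treating an unchanged price as "not up" (both are our assumptions, as is the function name direction_labels), is:

```python
HORIZON_MIN = 600  # 10 h horizon, assuming one observation per minute

def direction_labels(prices, horizon=HORIZON_MIN):
    """Label each time step 1 if the price is higher `horizon` steps later, else 0.

    Steps without a full look-ahead window get None and should be dropped
    before training so that no label relies on missing future data.
    """
    labels = []
    for t in range(len(prices)):
        if t + horizon < len(prices):
            labels.append(1 if prices[t + horizon] > prices[t] else 0)
        else:
            labels.append(None)
    return labels

# Toy series with a 2-step horizon so the result is easy to inspect.
toy_labels = direction_labels([100.0, 101.0, 99.0, 102.0, 98.0], horizon=2)
```

With a 600-step horizon, the final 600 observations of any sample have no label, which matters when splitting data chronologically.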
Model calibration successfully improved probability estimates, as evidenced by the coherent relationship between predicted confidence levels and actual trading outcomes. The calibration process reduced overconfidence in borderline predictions while maintaining discriminative power for high-confidence trades, resulting in more reliable execution decisions.
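One common way to check the relationship between confidence and outcomes is the expected calibration error (ECE), which compares average confidence with realized accuracy inside confidence bins. The sketch below is a generic diagnostic, not the calibration procedure used in this study; the bin count and the toy inputs are arbitrary.

```python
def expected_calibration_error(confidences, correct, n_bins=5):
    """Weighted average gap between mean confidence and accuracy per bin.

    confidences: scores in [0, 1]; correct: 1 if the prediction was right, else 0.
    """
    bins = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        index = min(int(conf * n_bins), n_bins - 1)  # conf == 1.0 goes in the last bin
        bins[index].append((conf, ok))

    n = len(confidences)
    ece = 0.0
    for members in bins:
        if not members:
            continue
        avg_conf = sum(conf for conf, _ in members) / len(members)
        accuracy = sum(ok for _, ok in members) / len(members)
        ece += len(members) / n * abs(avg_conf - accuracy)
    return ece

# Toy comparison (hypothetical): the same outcomes scored by an overconfident
# model and by a model whose confidence matches its hit rate.
outcomes = [1, 1, 1, 0, 0, 1, 1, 1, 0, 0]
overconfident = [0.9] * 5 + [0.6] * 5
well_calibrated = [0.6] * 10
ece_before = expected_calibration_error(overconfident, outcomes)
ece_after = expected_calibration_error(well_calibrated, outcomes)
```

A lower ECE after calibration indicates that a stated confidence of, say, 0.6 corresponds more closely to a 60% hit rate, which is what makes a fixed confidence threshold meaningful.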
5.5. Baseline Comparison and Ablation Analysis
To validate the advantage of confidence-based selective execution, we compare our framework against three baseline approaches using identical data and features.
Table 11 presents the comparative performance analysis.
The confidence-threshold framework substantially outperforms all baselines. The always-execute binary classifier achieves 76.34% accuracy across all predictions but generates an average profit of only 89.23 basis points with a Sharpe ratio of 0.42. Our selective approach sacrifices coverage (11.99% vs. 100%) to improve accuracy by 6.34 percentage points (to 82.68%) and average profit by 69% (to 151.11 basis points) while roughly doubling the Sharpe ratio (to 0.83).
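The comparison between always-execute and selective execution rests on four quantities: coverage, accuracy on executed trades, average net profit, and a per-trade Sharpe ratio. The sketch below shows one way to compute them. It assumes a confidence score per observation is already available, takes the per-trade Sharpe ratio as the mean net profit divided by its sample standard deviation, and uses toy numbers; none of these details should be read as the exact evaluation code behind Table 11.

```python
import math
import statistics

def evaluate_selective(directions, confidences, returns_bps, tau, cost_bps=1.0):
    """Score a confidence-gated strategy.

    directions:  +1 (predicted up) or -1 (predicted down) per observation
    confidences: score in [0, 1] per observation (assumed already calibrated)
    returns_bps: realized price change over the horizon, in basis points
    tau:         execute only when confidence >= tau
    """
    net = []   # net profit of each executed trade
    hits = 0   # executed trades whose direction was right
    for side, conf, ret in zip(directions, confidences, returns_bps):
        if conf < tau:
            continue  # abstain from low-confidence predictions
        gross = side * ret
        hits += gross > 0
        net.append(gross - cost_bps)

    executed = len(net)
    dispersion = statistics.stdev(net) if executed > 1 else 0.0
    return {
        "coverage": executed / len(directions),
        "accuracy": hits / executed if executed else math.nan,
        "avg_net_bps": statistics.mean(net) if executed else math.nan,
        "per_trade_sharpe": statistics.mean(net) / dispersion if dispersion > 0 else math.nan,
    }

# Toy inputs (hypothetical).
directions = [1, 1, -1, -1, 1, -1]
confidences = [0.95, 0.55, 0.90, 0.52, 0.85, 0.60]
returns_bps = [200.0, -50.0, -150.0, 30.0, -40.0, 60.0]

always = evaluate_selective(directions, confidences, returns_bps, tau=0.0)
selective = evaluate_selective(directions, confidences, returns_bps, tau=0.8)
```

Setting tau to zero reproduces the always-execute case, so both strategies can be scored with the same function and differ only in which observations pass the gate.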
The fixed-threshold baseline (τ = 0.5, no calibration) executes 47.83% of observations but achieves only 74.12% accuracy and 102.45 bps profit. Because this baseline differs from the full framework in both threshold optimization and probability calibration, the comparison reflects their combined contribution: together, they improve accuracy on executed trades by 8.56 percentage points (from 74.12% to 82.68%).
Random selection at equivalent coverage (12%) produces near-chance accuracy (50.21%) and negative returns (−8.34 bps), confirming that our confidence scores capture genuine signal rather than data artifacts.
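A random-selection control of this kind can be generated by drawing the same number of trades uniformly at random, without reference to the model's confidence. The sketch below is illustrative; the function name and the fixed seed are our own choices.

```python
import random

def random_execution_mask(n_obs, coverage, seed=0):
    """Mark a random subset of observations for execution at a target coverage."""
    rng = random.Random(seed)            # fixed seed keeps the control reproducible
    n_trades = round(n_obs * coverage)
    chosen = set(rng.sample(range(n_obs), n_trades))
    return [i in chosen for i in range(n_obs)]

# 12% coverage on 1000 observations selects exactly 120 trades.
mask = random_execution_mask(n_obs=1000, coverage=0.12)
```

Scoring the masked observations with the same accuracy and profit metrics as the selective strategy isolates the value of the confidence ranking from the effect of simply trading less often.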
We also examined LSTM and Transformer architectures during preliminary experiments but found their performance comparable to our MLP when using identical features and training procedures. The multi-layer perceptron offers faster training (3.2 h vs. 8+ h for LSTM) and simpler deployment without sacrificing accuracy. The key innovation lies in the confidence-based execution logic rather than architectural complexity.
Feature ablation tests reveal that order book features contribute 73% of total performance when measured by profit degradation upon removal. Removing technical indicators reduces profit by only 4%, while removing order book depth causes a 68% profit decline. This confirms hypothesis H2 regarding microstructure dominance.
6. Discussion
6.1. Key Findings and Implications
The confidence-threshold framework demonstrates substantial improvements over traditional cryptocurrency prediction approaches through selective execution strategies. The system achieved 82.68% direction accuracy on executed trades while maintaining 11.99% market coverage, generating average net profits of 151.11 basis points per trade. These results indicate that confidence-based filtering effectively identifies high-conviction trading opportunities in volatile cryptocurrency markets.
The 10 h prediction horizon (600 min) performed best among the horizons tested, capturing meaningful price movements while avoiding excessive noise. This timeframe accommodates the 24/7 nature of cryptocurrency markets and allows sufficient time for order execution without exposure to rapid market reversals. Shorter horizons (60–300 min) showed higher noise sensitivity, while longer horizons (12–24 h) reduced prediction accuracy due to increased uncertainty.
Order book features dominate the selected feature set, comprising 81.3% of selected features. This finding supports the critical role of the market microstructure in short-term price prediction for cryptocurrency markets. Traditional technical indicators account for only 4.7% of selected features, suggesting limited additional value when comprehensive microstructure data is available.
The inverse relationship between market capitalization and coverage rates reveals systematic differences in prediction confidence across asset classes. Bitcoin demonstrates the lowest coverage (8.2%) but highest profitability (198.45 bps), while smaller assets like TRON show higher coverage (21.4%) with reduced profitability (108.56 bps). This pattern reflects varying signal-to-noise ratios across different market segments.
6.2. Cryptocurrency Market Considerations
Cryptocurrency markets exhibit unique characteristics that distinguish them from traditional financial markets. The 24/7 trading environment eliminates traditional market opening and closing effects, creating continuous price discovery processes. Weekend trading patterns show reduced volume but maintained volatility, contrasting with equity markets where weekend gaps are common.
Decentralized exchange structures introduce additional complexity through fragmented liquidity and varying fee structures. The confidence-threshold approach adapts naturally to these conditions by incorporating microstructure signals that reflect local liquidity conditions rather than relying solely on price-based indicators.
Regulatory uncertainty creates periodic volatility spikes that traditional models struggle to handle. The confidence-based approach provides natural protection by reducing execution frequency during high-uncertainty periods, as prediction confidence typically decreases when regulatory announcements create market instability.
The dominance of retail trading in cryptocurrency markets generates different order flow patterns compared to institutional-dominated traditional markets. Retail-driven price movements often exhibit momentum characteristics that the microstructure features capture effectively, explaining the superior performance of order book-based predictions.
6.3. Practical Implementation Considerations
Real-time implementation requires robust data infrastructure capable of processing high-frequency order book updates with minimal latency. The 128-feature model demands approximately 15 milliseconds for feature computation and prediction generation, making it compatible with minute-frequency trading strategies but potentially limiting for higher-frequency applications.
Risk management integration presents both opportunities and challenges. The confidence scores provide natural position sizing signals, with higher confidence predictions justifying larger position sizes. However, the selective execution approach may result in extended periods without trading signals, requiring sophisticated portfolio management to maintain capital efficiency.
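One simple way to turn confidence into a position size is to scale linearly between a minimum and a maximum capital fraction once the execution threshold is passed. This is a hypothetical rule for illustration only: the function name, the linear form, and the default fractions are our assumptions and were not evaluated in this study.

```python
def position_fraction(confidence, tau, min_fraction=0.02, max_fraction=0.10):
    """Map a confidence score to a fraction of capital (illustrative linear rule).

    Returns 0.0 below the threshold tau, min_fraction at tau,
    and max_fraction at a confidence of 1.0.
    """
    if confidence < tau:
        return 0.0
    if tau >= 1.0:
        return max_fraction
    scale = min(1.0, (confidence - tau) / (1.0 - tau))
    return min_fraction + (max_fraction - min_fraction) * scale

no_trade = position_fraction(0.70, tau=0.80)
mid_size = position_fraction(0.90, tau=0.80)
full_size = position_fraction(1.00, tau=0.80)
```

Capping the fraction keeps a single high-confidence signal from dominating the portfolio, which is relevant given the long idle periods that selective execution can produce.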
Transaction cost assumptions prove critical for threshold optimization. The 1.0-basis point cost assumption may underestimate real-world implementation costs, particularly for smaller cryptocurrency pairs or during high-volatility periods. Sensitivity analysis suggests that costs exceeding 2.5 basis points would require threshold adjustments to maintain profitability.
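The dependence of the threshold on the cost assumption can be illustrated with a small grid search. The sketch below picks the threshold that maximizes total net profit on toy data; the objective, the grid, and the numbers are assumptions chosen so that the effect is visible, and they do not reproduce the sensitivity analysis reported above.

```python
def best_threshold(directions, confidences, returns_bps, cost_bps, grid):
    """Return (tau, total_net_bps) for the grid value with the highest total net profit."""
    best_tau, best_total = None, float("-inf")
    for tau in grid:
        total = sum(
            side * ret - cost_bps
            for side, conf, ret in zip(directions, confidences, returns_bps)
            if conf >= tau
        )
        if total > best_total:
            best_tau, best_total = tau, total
    return best_tau, best_total

# Toy data (hypothetical): two high-confidence trades with large moves and
# three low-confidence trades whose small edge is easily consumed by costs.
directions = [1, 1, 1, 1, 1]
confidences = [0.9, 0.9, 0.6, 0.6, 0.6]
returns_bps = [100.0, 80.0, 2.0, 2.0, 2.0]
grid = [0.5, 0.8]

low_cost = best_threshold(directions, confidences, returns_bps, cost_bps=1.0, grid=grid)
high_cost = best_threshold(directions, confidences, returns_bps, cost_bps=3.0, grid=grid)
```

At the lower cost the marginal trades still add profit and the looser threshold wins; at the higher cost they become unprofitable and the stricter threshold is selected.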
The 11-symbol limitation constrains diversification opportunities compared to traditional portfolio approaches. Expanding the symbol universe requires comprehensive order book data collection, which involves significant infrastructure investments and exchange relationships.
6.4. Limitations and Future Directions
Data availability represents the primary constraint for broader application. Complete order book data remains limited to major cryptocurrency exchanges and popular trading pairs. Expanding to decentralized exchanges or smaller assets would require different data collection strategies and potentially modified modeling approaches.
Model generalization across different market regimes requires further validation. The October 2023 to October 2024 period captured diverse market conditions but may not represent all possible cryptocurrency market states. Extended validation across multiple market cycles would strengthen confidence in the approach.
The fixed confidence threshold approach could benefit from adaptive mechanisms that adjust to changing market conditions. Machine learning-based threshold optimization could potentially improve performance by responding to volatility regime changes or market structure evolution.
Feature engineering opportunities exist through cross-asset signal incorporation and alternative data sources. Social sentiment indicators, on-chain metrics, and macroeconomic factors could enhance prediction accuracy, particularly for longer-term horizons.
Ensemble methods combining multiple prediction horizons and confidence thresholds present promising research directions. Multi-horizon ensembles could provide more robust predictions by capturing both short-term microstructure signals and longer-term trend information.
The framework could extend to other blockchain-based assets beyond cryptocurrencies, including tokenized securities, non-fungible tokens, and decentralized finance protocols, each presenting unique prediction challenges and opportunities.
6.5. Comparative Analysis with Existing Literature
The confidence-threshold framework demonstrates competitive performance compared to recent cryptocurrency prediction approaches. Our 82.68% direction accuracy with 11.99% coverage compares favorably to the existing literature, though direct comparisons require careful consideration of different evaluation methodologies.
Peng et al. (2024) [31] report improved financial metrics using attention-based CNN-LSTM with triple trend labeling but do not specify exact accuracy figures. Their focus on reducing transaction frequency aligns with our selective execution approach, though our confidence-based method provides more systematic selectivity control.
Kang et al. (2025) [33] achieve Sharpe ratios up to 3.56 using TimesNet with Bollinger Bands on 4 h Ethereum data. Our per-trade Sharpe ratio of 0.83 is computed over a different time scale (10 h horizon) and with a different evaluation methodology, making direct comparison difficult. However, our approach maintains consistent performance across 11 cryptocurrency pairs rather than optimizing for individual assets.
Viéitez et al. (2024) [35] generate profit factors up to 5.16 for Ethereum using external indicators over longer time periods. Our 151.11-basis point average profit per trade on 600 min horizons targets different trading frequencies and risk profiles. Their approach excludes technical indicators while ours demonstrates that microstructure features dominate predictive importance.
The network-based approach of Zhong et al. (2023) [32] is reported to yield the highest profits in its trading simulations, but the study lacks specific performance metrics for direct comparison. Our framework incorporates cross-asset information implicitly through mutual information feature selection, though not through explicit network modeling.
Our findings regarding order book feature dominance (81.3% of selected features) align with Liu et al.’s (2025) [12] observations about liquidity commonality and microstructure importance. However, most existing studies focus primarily on price-based features rather than comprehensive order book analysis.
The confidence-threshold paradigm represents a novel contribution not directly addressed in the current literature. While Golnari et al. (2024) [30] incorporate probabilistic elements through P-GRU, their approach generates probability distributions for predicted values rather than using confidence for execution decisions. Our separation of directional prediction from execution decisions provides a systematic framework for managing prediction uncertainty.
6.6. Regulatory and Ethical Considerations
Real-world deployment of automated cryptocurrency trading systems raises several regulatory and ethical concerns that practitioners must address [40,41].
Market manipulation risks deserve careful attention. High-frequency trading algorithms can potentially contribute to artificial volatility or facilitate pump-and-dump schemes, particularly in less liquid cryptocurrency pairs. Our selective execution approach partially mitigates these risks by reducing trading frequency and avoiding low-confidence predictions during volatile periods. However, operators should implement additional safeguards including position limits, maximum order sizes, and volatility circuit breakers [42].
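The safeguards listed above can be combined into a single pre-trade check. The sketch below is a simplified illustration: the RiskLimits fields, the approve_order function, and the limit values are hypothetical and would need to be set according to each operator's risk policy and venue rules.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class RiskLimits:
    max_position: float        # cap on absolute position, in units of the asset
    max_order_size: float      # cap on the size of any single order
    max_volatility_bps: float  # halt trading when recent volatility exceeds this

def approve_order(order_size, current_position, recent_volatility_bps, limits):
    """Return (approved, reason). order_size is signed: positive buys, negative sells."""
    if recent_volatility_bps > limits.max_volatility_bps:
        return False, "volatility circuit breaker"
    if abs(order_size) > limits.max_order_size:
        return False, "order size limit"
    if abs(current_position + order_size) > limits.max_position:
        return False, "position limit"
    return True, "ok"

# Hypothetical limits for a single trading pair.
limits = RiskLimits(max_position=10.0, max_order_size=2.0, max_volatility_bps=300.0)
accepted = approve_order(1.5, current_position=4.0, recent_volatility_bps=120.0, limits=limits)
halted = approve_order(1.5, current_position=4.0, recent_volatility_bps=450.0, limits=limits)
```

Running the check before every order, independently of the model's confidence, ensures that a miscalibrated prediction cannot bypass the limits.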
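Regulatory compliance varies dramatically across jurisdictions. In the United States, the legal classification of crypto assets remains contested: the SEC has asserted that many tokens are securities, while the CFTC treats Bitcoin as a commodity, so registration obligations for algorithmic trading firms depend on the assets traded and the services offered. The European Union's Markets in Crypto-Assets (MiCA) regulation imposes authorization, disclosure, and market abuse rules on crypto-asset service providers, which may apply to operators of automated trading systems. Operators must ensure compliance with local regulations, including audit trails, risk disclosures, and anti-money laundering procedures.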
DeFi protocol integration introduces unique challenges. Decentralized exchanges lack traditional market oversight, creating opportunities for front-running through transaction reordering. Our 10 h prediction horizon reduces vulnerability to such attacks compared to high-frequency strategies, but deploying on public blockchains still requires careful consideration of MEV (Maximal Extractable Value) risks.
Fairness concerns arise when sophisticated algorithms trade against retail participants. While our confidence-based approach does not exploit information asymmetries or engage in predatory trading, the superior performance versus simple strategies could exacerbate wealth concentration. Exchanges might consider implementing fairness mechanisms or educational resources to level the playing field.
Data privacy presents limited concerns for strategies built on market data, since order book feeds and transactions on public blockchains are openly observable. However, operators should protect proprietary trading strategies and avoid disclosing positions that could be exploited by competitors.
We recommend that institutions deploying similar systems establish ethics review boards, conduct regular audits for market impact, and maintain transparency about algorithmic trading activities within regulatory boundaries. The cryptocurrency industry’s evolution toward institutional adoption demands higher standards of responsible algorithm design.
7. Conclusions
This research introduces a confidence-threshold framework for cryptocurrency price direction prediction that separates directional forecasting from execution decisions. The approach addresses fundamental challenges in cryptocurrency market prediction through selective execution based on model confidence rather than traditional classification thresholds.
The experimental results demonstrate significant performance improvements over traditional approaches. The system achieved 82.68% direction accuracy with 151.11-basis point average profit per trade across 11 cryptocurrency pairs. These results validate the effectiveness of confidence-based execution in volatile cryptocurrency markets.
The dominance of order book features in prediction importance confirms the critical role of market microstructure information for short-term cryptocurrency price movements. Traditional technical indicators provide limited additional value when comprehensive microstructure data is available, suggesting that conventional technical analysis approaches may be insufficient for optimal cryptocurrency trading.
The methodology contributes to cryptocurrency market analytics by providing a practical framework for selective trading execution. The confidence-based approach offers natural risk management capabilities while maintaining competitive returns, addressing key concerns for algorithmic trading implementation in cryptocurrency markets.
Future research directions include extending the framework to additional blockchain-based assets, developing adaptive threshold mechanisms, and incorporating alternative data sources such as on-chain metrics and social sentiment indicators. The confidence-threshold paradigm presents broader applications beyond cryptocurrency markets, potentially enhancing prediction systems across various financial asset classes.
The findings advance the understanding of machine learning applications in cryptocurrency markets while providing practical tools for algorithmic trading implementation. The framework bridges academic research and practical trading requirements, contributing to the growing field of cryptocurrency market analytics and blockchain-based financial system analysis.