Article

Cryptocurrency Futures Portfolio Trading System Using Reinforcement Learning

Business School, Kwangwoon University, 26 Kwangwoon-gil, Nowon-Gu, Seoul 139-701, Republic of Korea
*
Author to whom correspondence should be addressed.
Appl. Sci. 2025, 15(17), 9400; https://doi.org/10.3390/app15179400
Submission received: 29 July 2025 / Revised: 23 August 2025 / Accepted: 26 August 2025 / Published: 27 August 2025
(This article belongs to the Section Computing and Artificial Intelligence)

Abstract

This paper proposes a cryptocurrency portfolio trading system (CPTS) that optimizes trading performance in the cryptocurrency futures market by leveraging reinforcement learning and timeframe analysis. Portfolios are constructed over multiple timeframes by employing the advantage actor–critic (A2C) algorithm and analysis of variance (ANOVA). Trading data for 18 major cryptocurrencies on Binance Futures, covering January 2022 to December 2023, are used to show that trading strategies can be effectively categorized into high-frequency (10, 30, and 60 min) and low-frequency (daily) timeframes. Empirical results demonstrate statistically significant differences in returns between these timeframe groups, with major cryptocurrencies (e.g., Bitcoin and Ethereum) exhibiting higher returns in high-frequency trading (16–17%) than in daily trading (6–7%) during training. Performance evaluation during the test period revealed that the low-frequency group achieved a 43.06% average return, significantly outperforming the high-frequency group (5.68%). The ANOVA results confirm that both the frequency type and portfolio selection significantly influence trading performance at the 5% significance level. This study offers a novel approach to cryptocurrency trading that considers the distinct characteristics of different timeframes and demonstrates the effectiveness of combining reinforcement learning with statistical analysis for portfolio optimization in highly volatile cryptocurrency markets.

1. Introduction

In recent years, the cryptocurrency market has expanded rapidly, establishing itself as a critical asset class in the global financial market. In particular, prominent cryptocurrencies, such as Bitcoin and Ethereum, offer investors new opportunities and challenges driven by their significant volatility and liquidity [1,2]. However, owing to the unique characteristics of the cryptocurrency market, traditional financial analysis methods struggle to adequately explain or predict its price volatility [3,4]. Consequently, novel analytical approaches that leverage machine learning have been applied.
Stablecoins play a pivotal role in the cryptocurrency market. They are designed to minimize price volatility by pegging their value to real-world assets, such as fiat currencies (e.g., the US dollar and euro) or commodities (e.g., gold) [5,6]. For instance, Tether USDt (USDT) is a representative stablecoin pegged 1:1 to the US dollar, which is widely utilized as a fundamental medium of exchange in cryptocurrency trading [5]. Moreover, stablecoins constitute a critical infrastructure in the cryptocurrency market because of their advantages in facilitating transactions, providing liquidity, and ensuring price stability [7]. They hedge against the volatility and unpredictability inherent in cryptocurrency markets, enabling investors to manage more effectively the sudden market fluctuations that would otherwise pose significant risks [8]. Further, the circulating supply of stablecoins can serve as an indicator of market liquidity and investor sentiment [9].
Reinforcement learning (RL) has been demonstrated to be an effective machine-learning approach for optimizing decision-making in uncertain and complex environments [10]. It is primarily composed of four elements: state, agent, action, and reward. Through interactions with the environment, the agent learns a policy that maximizes cumulative rewards. This approach has been extensively studied in various domains, including finance, robotics, and natural language processing [11,12]. A prominent example showcasing the potential of RL is Google DeepMind’s AlphaGo, which exhibits superhuman performance in the complex game of Go [13]. Recently, RL has been increasingly applied in dynamic systems, such as financial markets [14].
In this study, we propose a conceptual framework for a cryptocurrency portfolio trading system (CPTS) designed to generate profitability based on multiple timeframes by implementing RL on the Binance Futures market.
The proposed trading system employs analysis of variance (ANOVA) to create a portfolio over different timeframes and RL to learn the trading model. We employ the advantage actor–critic (A2C) algorithm, which is known for its effectiveness in single-asset trading, as the RL framework. Based on the actor–critic architecture, A2C learns policy and value functions simultaneously, enabling efficient policy exploration [15]. Thus, it combines the advantages of policy- and value-based learning and exhibits stable learning and fast convergence. These characteristics make it particularly useful for single-asset trading that exhibits high volatility.
The A2C algorithm is trained on input data comprising technical indicators derived from open, high, low, and close (OHLC) prices and trading volumes obtained from cryptocurrency futures markets, as well as funding rates, on-chain data, and basis indicators. Four timeframes (10, 30, and 60 min, and daily) are considered for portfolio composition. Portfolio construction is based on profitable cryptocurrencies and an ANOVA of the returns generated over different timeframes during the training period. The performance of the CPTS is compared with that of the Binance Futures cryptocurrencies not selected for the portfolio, which serve as a benchmark. An ANOVA is also performed to evaluate the performance of the CPTS with respect to varying timeframes.
The remainder of this paper is organized as follows. The fundamental concepts of blockchain, cryptocurrency, RL, and system trading and portfolios are briefly discussed in Section 2, along with a brief review of the existing literature on these topics. The proposed CPTS is introduced in Section 3, and Section 4 presents an empirical study performed to evaluate its performance. Finally, the conclusions are presented in Section 5.

2. Background and Related Works

2.1. Blockchain and Cryptocurrency

In a blockchain, data are stored in blocks, and each block contains the hash value of the previous block, creating a chain structure that ensures data integrity and immutability [1]. In addition, by sharing and verifying transaction records across all nodes, the system enhances transparency and trustworthiness [16].
Bitcoin, the first cryptocurrency to leverage blockchain technology, implements a secure and reliable electronic payment system without the need for a central authority [1]. However, Bitcoin exhibits certain limitations, including restricted transaction processing speed and the lack of smart contract functionality [17]. To address these issues, Ethereum was introduced in 2015, offering a platform for smart contract development and decentralized applications, thereby expanding the application scope of blockchain technology [2].
Cryptocurrencies play a pivotal role in blockchain networks and extend beyond mere value exchange. They provide economic incentives to participants, encouraging network activity, while ensuring the maintenance and security of blockchain systems [18]. For instance, in Bitcoin, the proof-of-work mechanism rewards miners with Bitcoin to generate new blocks [1]. This incentivizes miners to allocate computational resources, thereby enhancing network stability [19].
In addition, cryptocurrencies serve as the fuel required to execute smart contracts and operate decentralized applications, thereby contributing to the expansion of the blockchain ecosystem [20]. In Ethereum, the Ether (ETH) fulfills this role by covering the execution costs of smart contracts [2] and enhancing resource allocation and security within the blockchain platform.
The cryptocurrency market is characterized by high volatility and liquidity, offering both opportunities and risks to investors [8]. The price dynamics of cryptocurrencies are influenced by a variety of factors, including supply and demand, regulatory developments, and technological advancements [21]. Therefore, accurate price predictions play a crucial role in developing investment strategies and managing risks in cryptocurrency markets [22]. Existing studies on cryptocurrency price prediction have employed various approaches, including statistical methods, machine-learning-, and deep-learning-based techniques [23,24].
In conclusion, blockchains and cryptocurrencies are transforming the paradigm of financial systems, and their interplay plays a pivotal role in the advancement of blockchain ecosystems. Cryptocurrencies are essential for maintaining the activity and security of blockchain networks while offering new opportunities to investors. In this context, the development of cryptocurrency-trading strategies using RL has emerged as a significant and meaningful research topic, and is described in greater detail in the following subsection.

2.2. Reinforcement Learning

RL has been applied in various domains including gaming, robotics, natural language processing, and recommendation systems [14]. In the financial sector, RL is used for portfolio optimization, algorithmic trading, and risk management [25,26]. In particular, given the uniquely uncertain and dynamic characteristics of the cryptocurrency market, RL has emerged as an effective alternative to traditional financial models for ensuring optimal decision-making in the trade of related assets [27]. RL-based cryptocurrency-trading strategies aim to maximize returns and minimize risks in highly volatile markets [28].
RL is distinguished from supervised and unsupervised learning in that its agents undergo independent training without explicit guidance or labeled data [10]. RL agents learn the optimal timing and positioning of trades using market data, and the authors of [29] demonstrated that RL outperforms traditional approaches in this task. However, RL also faces several challenges, such as instability during the learning process, difficulties in hyperparameter tuning, and the complexity of real-world data [30]. To address these issues, algorithmic improvements that enable stable learning and model designs tailored to real-world data have been investigated.
RL algorithms can be categorized into value-based, policy-based, and actor–critic methods [10]. Value-based methods, such as Q-learning [31] and deep Q-networks (DQN) [11], focus on learning the state-action value function (Q-function) to derive the optimal policy. Policy-based methods directly learn policy functions for action selection, making them particularly effective in continuous action spaces [32]. Actor–critic methods combine the advantages of both approaches by learning a policy (actor) and value functions (critic) simultaneously [33].
In particular, the A2C algorithm exhibits a stable learning performance and fast convergence in problems with relatively simple states and action spaces, such as single-asset trading [15]. In exploring the applicability of RL in financial markets, Fischer [27] highlighted that actor–critic algorithms, such as A2C, could be effective for single-asset trading. Additionally, Pricope [34] developed algorithmic trading strategies using A2C to achieve excellent performance in single-asset trading scenarios.
While other algorithms such as DDPG and PPO have shown promise in continuous control problems, A2C provides a balanced approach that is well-suited for our initial exploration of timeframe-dependent portfolio optimization.
In this study, the A2C algorithm was employed to trade a single cryptocurrency asset based on efficient policy exploration. The actor and critic networks in our A2C model are each composed of a two-layer LSTM network followed by a fully connected layer. The hyperparameters used for training are as follows: learning rate = 0.001, discount factor (gamma) = 0.99, and entropy coefficient = 0.01. In this framework, the actor learns the policy π_θ(a|s) to select an action a in a given state s, while the critic estimates the state value function V_φ(s). The fundamental component of A2C is the advantage function, A(s, a), which is defined as follows:
A(s, a) = Q(s, a) − V(s).
Here, Q(s, a) represents the expected return when executing action a in state s, and V(s) denotes the expected value of state s. The actor and critic optimize the two objective functions defined by (2) and (3), respectively.
J(θ) = E[log π_θ(a_t | s_t) · A(s_t, a_t)],
J(φ) = E[(R_t − V_φ(s_t))²].
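As an illustration of the advantage estimate and the two objectives above, the per-transition quantities can be sketched in Python. This is a minimal sketch, not the paper's implementation; the temporal-difference (TD) approximation of Q(s, a) and all function names are assumptions for exposition.

```python
import math

def a2c_losses(log_prob, reward, value_s, value_s_next, gamma=0.99):
    """Per-step A2C quantities for one transition (illustrative sketch).

    advantage:   A(s, a) ~= r + gamma * V(s') - V(s)  (TD estimate of Q - V)
    actor loss:  -log pi(a|s) * A(s, a)   (minimized, i.e., ascent on J(theta))
    critic loss: (r + gamma * V(s') - V(s))^2         (squared TD error)
    """
    advantage = reward + gamma * value_s_next - value_s
    actor_loss = -log_prob * advantage   # advantage treated as a constant
    critic_loss = advantage ** 2
    return advantage, actor_loss, critic_loss

# Hypothetical transition: action taken with probability 0.5, reward 1.0
adv, a_loss, c_loss = a2c_losses(log_prob=math.log(0.5), reward=1.0,
                                 value_s=0.2, value_s_next=0.5, gamma=0.99)
```

In practice, the advantage is detached from the gradient in the actor term, which the sketch mimics by treating it as a constant multiplier.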

2.3. System Trading and Portfolio Management

System trading is a trading method that automatically executes buy-and-sell orders using predefined algorithms and rules. Automated trading systems utilize computer programs to analyze market data, generate trading signals, and execute orders [35]. System trading eliminates subjective human judgment, minimizes irrational decision-making driven by emotions, and enhances the efficiency and consistency of trading activities and strategies [36,37,38,39].
Early forms of system trading involved executing trades based on rules derived from technical analysis indicators, such as moving averages and momentum indicators [40,41]. These technical indicators played a crucial role in analyzing price fluctuations and determining trading timings by systematizing market patterns. However, as the complexity of financial markets has increased, quantitative trading, which employs statistical methodologies and mathematical modeling, has been introduced as an alternative [42,43]. Subsequently, advancements in machine learning and artificial intelligence have revealed new possibilities for system trading [44,45].
The system trading approach has also been applied in the field of portfolio management. Portfolio optimization aims to determine the allocation of investments corresponding to various assets to maximize expected returns while minimizing risk [46]. Traditional portfolio theory calculates optimal asset allocation based on asset returns and covariances. However, it struggles to capture the nonlinearity and complex interactions inherent in financial markets [47].
Consequently, machine-learning techniques, such as RL, have been studied to address portfolio optimization problems [4,28]. In particular, deep RL, which combines RL with deep learning, is advantageous for portfolio management because of its ability to capture complex patterns in high-dimensional data [27,48].
Applying system trading and RL to the analysis of cryptocurrency markets, which are characterized by high volatility and complex market structures, presents both challenges and opportunities [27,34]. Recent studies have focused on optimizing cryptocurrency portfolios and developing strategies to adapt to market volatility using RL [28,49].
Jiang and Liang [28] applied the Deep Deterministic Policy Gradient (DDPG) algorithm, which is an RL-based portfolio management framework, to enhance the returns of cryptocurrency portfolios.
While these studies demonstrate the potential of RL in cryptocurrency portfolio management, they do not explicitly consider the impact of different trading timeframes on portfolio performance. Our study differs from these works by employing ANOVA to statistically analyze the performance of cryptocurrencies across different timeframes and construct portfolios based on these findings.
However, [4] utilized ensemble RL to achieve stable portfolio performance across various market conditions. These studies suggest that RL is a promising tool for system trading in cryptocurrency markets. However, cryptocurrency markets present several challenges, including data noise, potential market manipulation, and regulatory uncertainty [50,51]. Therefore, the development of RL-based system trading strategies must consider these factors to ensure the stability and practicality of the model.
In conclusion, system trading is a critical methodology that enables automated trading and portfolio management in financial markets, with further advancements driven by RL integration. In particular, its application in the cryptocurrency market has garnered attention owing to its ability to leverage RL to handle high volatility and complex market dynamics.

3. Materials and Methods

This section provides a detailed description of the proposed CPTS framework, which includes three stages (Figure 1). In the first stage, an input dataset is constructed. The second stage involves training the model for portfolio construction. The third stage evaluates the performance of the CPTS using trading simulation.

3.1. Stage 1: Construction of Input Dataset

At this stage, the OHLC prices, volumes, and funding rates of the cryptocurrencies {C_1, C_2, …, C_n}, together with the circulating supply of USDT, are collected. Technical indicators {T_1, T_2, …, T_n} are generated from the trading data (i.e., OHLC prices and volumes). In addition, the basis indicators are calculated based on futures and spot trading data and are defined as follows:
Basis_open = (Futures_open − Spot_open) / Spot_open
Basis_high = (Futures_high − Spot_high) / Spot_high
Basis_low = (Futures_low − Spot_low) / Spot_low
Basis_close = (Futures_close − Spot_close) / Spot_close
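The four basis indicators above can be computed directly from paired futures and spot bars. The sketch below assumes a simple dict representation of OHLC bars; the field names and sample prices are illustrative, not from the paper's dataset.

```python
def basis_indicators(futures_ohlc, spot_ohlc):
    """Relative futures-spot basis for each OHLC field: (F - S) / S."""
    return {field: (futures_ohlc[field] - spot_ohlc[field]) / spot_ohlc[field]
            for field in ("open", "high", "low", "close")}

# Hypothetical matching futures and spot bars for one timeframe
futures_bar = {"open": 101.0, "high": 103.0, "low": 99.5, "close": 102.0}
spot_bar = {"open": 100.0, "high": 102.0, "low": 99.0, "close": 101.0}
basis = basis_indicators(futures_bar, spot_bar)
# basis["open"] == 0.01, i.e., futures trade 1% above spot at the open
```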

3.2. Stage 2: Model Training and Portfolio Construction

At this stage, a model for constructing cryptocurrency futures portfolios is trained using the RL algorithm A2C. The proposed RL model comprises four modules: environment, data processing, learning, and agent. The environment module defines the market environment and exchange rules, linking the input dataset to the agent and learning modules; the interactions of the RL agent are based on these data. The data-processing module transforms the state variables (i.e., technical indicators, funding rates, basis indicators, and circulating supply data) received from the environment module into a form suitable for learning with the RL algorithm. Because the variables exhibit different value ranges and scales, inputting them directly may introduce bias during the learning process. To prevent this, the distribution of the variables is adjusted by applying standardization or normalization techniques to the input data. The learning module trains the agent's policy network based on the preprocessed data received from the data-processing module and the market environment information obtained from the environment module. A long short-term memory (LSTM) neural network, with the architecture and hyperparameters described in Section 2.2, is employed to capture the characteristics of time-series data [52]. LSTM facilitates the learning of long-term dependencies, enabling the model to identify patterns in financial time-series data. The policy network takes the state as input and outputs the probabilities of each action, whereas the value network estimates the value function of each state. The agent module selects actions based on the learned policy and receives rewards by interacting with the environment.
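The standardization step in the data-processing module can be sketched as a per-feature z-score. This is an assumption about the exact preprocessing (the paper mentions standardization or normalization without details); the feature names and magnitudes below are illustrative.

```python
import statistics

def zscore_columns(rows):
    """Standardize each feature column to zero mean and unit variance.

    rows: list of dicts mapping feature name -> raw value. Features such as
    a funding rate (~1e-4) and a circulating supply (~1e10) live on very
    different scales; z-scoring keeps one feature from dominating learning.
    """
    names = list(rows[0])
    cols = {n: [row[n] for row in rows] for n in names}
    stats = {n: (statistics.mean(cols[n]), statistics.pstdev(cols[n]))
             for n in names}
    return [{n: (row[n] - stats[n][0]) / stats[n][1] for n in names}
            for row in rows]

# Hypothetical state variables on heterogeneous scales
rows = [
    {"rsi": 70.0, "funding": 0.0001, "usdt_supply": 8.3e10},
    {"rsi": 30.0, "funding": -0.0002, "usdt_supply": 8.1e10},
    {"rsi": 50.0, "funding": 0.0004, "usdt_supply": 8.2e10},
]
scaled = zscore_columns(rows)
```

In a deployed pipeline, the mean and standard deviation would be fitted on the training period only and reused on the test period to avoid look-ahead bias.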
At each time step, the agent chooses one of four actions: buying (entering a long position), selling (entering a short position), holding, or closing (exiting a position). The reward for each action is calculated by considering the realized profits and losses from trading, transaction costs, and funding rate expenses. The agent updates the parameters of the policy and value networks based on the reward signals, thereby minimizing the loss function of the A2C algorithm. The following training procedure is implemented.
  • The agent observes the current state s t and selects an action a t using the policy network.
  • Based on the selected action, the agent receives the next state s t + 1 and reward r t from the environment.
  • The reward comprises immediate and delayed components; the immediate reward reflects the instantaneous profit or loss resulting from the action, whereas the delayed reward considers the long-term performance.
  • The agent calculates the loss function using its experienced states, actions, and rewards, and updates the parameters of the policy and value networks. The loss function is defined as follows:
Loss = Σ_{t=0}^{T} [(r_t + γ·V(s_{t+1}) − V(s_t))² − log π(a_t | s_t) · (r_t + γ·V(s_{t+1}) − V(s_t))].
Here, V(s_t) represents the output of the value network, and π(a_t | s_t) denotes the action probability output by the policy network.
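The combined loss above sums a squared TD error and a policy term over a trajectory. A minimal sketch, assuming a list of terminal-inclusive value estimates (one extra entry for V(s_T)) and precomputed log-probabilities; names are illustrative.

```python
def a2c_trajectory_loss(rewards, values, log_probs, gamma=0.99):
    """Loss from the combined objective:
        sum_t [ delta_t^2 - log pi(a_t|s_t) * delta_t ],
    where delta_t = r_t + gamma * V(s_{t+1}) - V(s_t) is the TD error.

    values has len(rewards) + 1 entries: one per visited state plus V(s_T).
    """
    loss = 0.0
    for t, (r, lp) in enumerate(zip(rewards, log_probs)):
        delta = r + gamma * values[t + 1] - values[t]
        loss += delta ** 2 - lp * delta
    return loss
```

A practical implementation would also add the entropy bonus (coefficient 0.01 in this study) to encourage exploration; it is omitted here for brevity.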
The reward function is based on realized profit and loss (PnL) arising from trade after accounting for transaction costs and funding rate expenses. During the training period, the agent incorporates factors beyond the realized profit and loss into the reward function, including volatility (applying penalties during sharp fluctuations in returns) and maximum drawdown prevention (applying penalties when the maximum drawdown is large). Using this approach, the agent learns strategies to maximize returns while minimizing risks.
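The reward shaping described above (net PnL minus penalties for return volatility and maximum drawdown) can be sketched as follows. The penalty coefficients and helper names are assumptions for illustration; the paper does not specify them.

```python
import statistics

def step_reward(pnl, fee, funding_cost, returns_window, equity_curve,
                vol_penalty=0.1, mdd_penalty=0.1):
    """Realized PnL net of costs, penalized for volatility and drawdown.

    returns_window: recent per-step returns (volatility penalty input).
    equity_curve: account equity history (maximum-drawdown penalty input).
    vol_penalty / mdd_penalty: illustrative coefficients, not from the paper.
    """
    reward = pnl - fee - funding_cost
    if len(returns_window) >= 2:
        reward -= vol_penalty * statistics.pstdev(returns_window)
    peak, mdd = equity_curve[0], 0.0
    for equity in equity_curve:        # running maximum drawdown
        peak = max(peak, equity)
        mdd = max(mdd, (peak - equity) / peak)
    return reward - mdd_penalty * mdd

# Hypothetical step: 1.0 PnL, 0.1 fee, 0.05 funding, mild equity drawdown
r = step_reward(1.0, 0.1, 0.05, [0.01, 0.01], [100.0, 110.0, 99.0])
```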
To mitigate the risk of overfitting, we employed a standard training/testing split of the data, using a 70% to 30% ratio. The model is trained on the training set, and its performance is evaluated on the unseen test set. The inclusion of penalties for volatility and maximum drawdown in the reward function also helps to prevent the agent from learning overly aggressive and potentially overfitted strategies.
In addition, to evaluate the performance of the CPTS over different time intervals, four timeframes (10, 30, and 60 min, and daily) were established, and datasets specific to each timeframe were constructed accordingly.
Cryptocurrencies with higher per-agent trading returns than the average benchmark interest rate during the training period were selected for testing. Subsequently, portfolios were constructed based on the results of the ANOVA and post hoc tests on returns over different timeframes. In particular, we aimed to compare the returns during the test period between timeframe groups with significant differences in returns and those without such differences for CPTS.
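The first selection criterion above is a simple filter on training-period returns against the benchmark rate (2.389%, per Section 4.2). A minimal sketch; the return figures in the demo dict are illustrative, not the paper's results.

```python
def select_portfolio(train_returns, benchmark_rate=0.02389):
    """Keep assets whose training-period return beats the benchmark rate.

    train_returns: dict mapping asset symbol -> per-agent training return.
    benchmark_rate: 2.389% average benchmark interest rate (from the study).
    """
    return sorted(a for a, r in train_returns.items() if r > benchmark_rate)

# Hypothetical training returns (illustrative values only)
demo = {"BTCUSDT": 0.16572, "ETHUSDT": 0.12522, "XYZUSDT": 0.010}
selected = select_portfolio(demo)
# selected == ["BTCUSDT", "ETHUSDT"]
```

The subsequent ANOVA and post hoc tests then decide how the surviving assets are grouped by timeframe.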

3.3. Stage 3: Evaluation of CPTS

Trading simulations were conducted on the portfolio cryptocurrencies selected for each timeframe using the model trained in Stage 2. The simulations were executed individually for each cryptocurrency, and the CPTS performance was evaluated in terms of the agent's rate of return (ROR). Specifically, the effectiveness of portfolio selection was verified by analyzing the differences in returns between the groups selected and not selected for the portfolio.

4. Experiments

4.1. Data Description

The data are obtained from the application programming interface of Binance Futures and Glassnode. We selected the top 20 cryptocurrencies by market capitalization on Binance Futures. After excluding two cryptocurrencies with insufficient historical data for the full study period, we were left with a final set of 18 cryptocurrencies for our experiment (see Table 1). The selected cryptocurrencies are all actively traded on Binance Futures and have sufficient liquidity for our analysis. Additionally, stablecoins are methodologically excluded from price prediction studies as they are specifically designed to maintain price stability rather than generate returns through price appreciation, making them unsuitable for our research objectives.
Among the Binance Futures products, USDT perpetual contracts, which exhibit the highest trading volumes, were selected. Perpetual contracts are cryptocurrency-trading instruments linked to the value of an underlying asset and function as futures contracts without expiration dates.
For each cryptocurrency, the time-series data were structured with respect to the selected timeframes (10, 30, and 60 min, and daily). The tick data were converted into these respective timeframes, allowing the model to learn the patterns and trends of the time-series data.
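The tick-to-timeframe conversion can be sketched as bucketing ticks into fixed-width bars. This is a pure-Python illustration of the general technique, not the paper's pipeline; the tick tuples (timestamp in seconds, price, quantity) are an assumed format.

```python
def ticks_to_ohlc(ticks, bar_seconds):
    """Aggregate (timestamp, price, qty) ticks into fixed-interval OHLCV bars.

    bar_seconds: bar width, e.g., 600 for the 10 min timeframe.
    """
    bars = {}
    for ts, price, qty in ticks:
        key = ts - ts % bar_seconds            # bar start time
        if key not in bars:
            bars[key] = {"open": price, "high": price, "low": price,
                         "close": price, "volume": qty}
        else:
            bar = bars[key]
            bar["high"] = max(bar["high"], price)
            bar["low"] = min(bar["low"], price)
            bar["close"] = price               # last tick in the bar
            bar["volume"] += qty
    return [{"time": k, **bars[k]} for k in sorted(bars)]

# Three hypothetical ticks bucketed into 10 min (600 s) bars
bars = ticks_to_ohlc([(0, 100.0, 1.0), (300, 105.0, 2.0), (700, 103.0, 1.0)],
                     bar_seconds=600)
```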
The collected data were divided into training and testing datasets. The training period is from 1 January 2022, to 30 April 2023, and data collected over this period are used by the agent to learn the optimal policy. The testing period was defined as 1 May 2023, to 31 December 2023, and was used to evaluate the performance of the trained model.
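The chronological split above (training through 30 April 2023, testing from 1 May 2023) can be expressed as a simple cutoff-date partition; the sample structure below is an assumed (date, features) pairing.

```python
from datetime import date

def chronological_split(samples, cutoff=date(2023, 5, 1)):
    """Split (date, features) samples into train/test by the study's cutoff:
    training through 30 April 2023, testing from 1 May 2023 onward."""
    train = [s for s in samples if s[0] < cutoff]
    test = [s for s in samples if s[0] >= cutoff]
    return train, test

# Hypothetical dated samples
samples = [(date(2022, 1, 1), "a"), (date(2023, 4, 30), "b"),
           (date(2023, 5, 1), "c")]
train, test = chronological_split(samples)
```

A chronological split, rather than a random one, prevents future information from leaking into training, which matters for time-series data.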
For on-chain data, we collect the circulating supply of USDT from Glassnode, which we use as a macroscopic proxy for market liquidity that uniformly affects all cryptocurrencies. Since each cryptocurrency protocol has different characteristics and on-chain metrics that are not directly comparable (e.g., Bitcoin’s hash rate has no equivalent in Ethereum or other networks), we adopted this universal approach to ensure methodological consistency across our diverse asset portfolio. However, the utilization of on-chain data is limited. This study considers only the circulating supply of USDT and does not consider various other on-chain indicators, such as active addresses, transaction volumes, and protocol-specific metrics.

4.2. CPTS Experimental Results

The average ROR values of the CPTS corresponding to the 10, 30, and 60 min and daily timeframes during the training period are listed in Table 2.
The 10, 30, and 60 min timeframes yielded higher performance metrics than the daily timeframe. Notably, BTCUSDT, a major cryptocurrency, recorded returns of 16.572%, 16.832%, and 19.048% in the 10, 30, and 60 min timeframes, respectively, in contrast to a relatively low return of 6.594% in the daily timeframe. ETHUSDT exhibited a similarly clear trend, with returns of 12.522%, 13.552%, and 13.618% in the 10, 30, and 60 min timeframes, respectively, compared with 6.107% in the daily timeframe. A similar pattern was observed for platform tokens such as BNBUSDT. Notably, TRXUSDT exhibited the most significant difference, achieving a return of 27.092% in the 10 min timeframe and 10.526% in the daily timeframe.
To select the portfolio assets to be employed in the CPTS, an ANOVA was performed on the RORs corresponding to different timeframes. Before conducting the ANOVA, we verified the assumptions of normality and homogeneity of variances using the Shapiro–Wilk and Levene's tests, respectively. As some of the data did not meet these assumptions, we also employed the non-parametric Friedman test and Dunn's post hoc test. Table 3 lists the ANOVA results, which confirm that the differences depend on the timeframe, while Table 4 presents the results of the post hoc tests.
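For transparency, the one-way ANOVA statistic underlying this comparison can be computed by hand as the ratio of between-group to within-group mean squares. A minimal stdlib sketch of the general technique; the input lists below are illustrative, not the study's ROR data.

```python
from statistics import mean

def one_way_anova_F(groups):
    """F statistic for a one-way ANOVA across timeframe groups:
    between-group mean square divided by within-group mean square."""
    all_vals = [v for g in groups for v in g]
    grand = mean(all_vals)
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)
    ss_within = sum((v - mean(g)) ** 2 for g in groups for v in g)
    df_between = len(groups) - 1
    df_within = len(all_vals) - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)

# Two hypothetical return groups (e.g., one timeframe each)
F = one_way_anova_F([[1.0, 2.0, 3.0], [2.0, 3.0, 4.0]])
```

In practice, `scipy.stats.f_oneway`, `shapiro`, `levene`, and `friedmanchisquare` cover the full battery of tests used here, including the p-values this sketch omits.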
No significant differences in ROR were observed among the 10, 30, and 60 min timeframes, but the daily timeframe exhibited statistically significant differences compared with the others (p < 0.05). Accordingly, the 10, 30, and 60 min timeframes comprised the high-frequency group, whereas the daily timeframe comprised the low-frequency group, and portfolios were selected for each group. In addition, a return exceeding the average benchmark interest rate of 2.389% during the training period was required for inclusion in a portfolio. The average interest rate was used as a benchmark because it represents the risk-free rate of return in our market context. We acknowledge that comparisons with buy-and-hold strategies and cryptocurrency market indices would provide a more comprehensive performance evaluation, but we were unable to conduct these comparisons owing to data limitations in our current study design.
Table 5 and Table 6 present the cryptocurrencies in the high- and low-frequency groups selected for the portfolios. The 95% confidence intervals for the mean ROR are also reported.
More cryptocurrencies in the high-frequency group outperform the benchmark interest rate, indicating that trading strategies utilizing high-frequency data could be more effective given the high volatility of the cryptocurrency market.
Trading simulations for the CPTS were performed during the test period based on the portfolios constructed during the training period. Table 7 and Table 8 present the ROR values for the high- and low-frequency groups, respectively. In the high-frequency group, TRXUSDT exhibited the most stable performance, with returns of 9.51%, 8.00%, and 8.01% at 10, 30, and 60 min, respectively. BNBUSDT also performed stably, with returns of 7.49%, 8.00%, and 7.99% at 10, 30, and 60 min, respectively. Additionally, BCHUSDT and ATOMUSDT recorded high returns of 7.90% and 7.40%, respectively, in the 10 min timeframe.
In the low-frequency group, all selected assets achieved high returns exceeding 28%. BCHUSDT exhibited the highest return at 69.76%, followed by BTCUSDT (49.62%) and ADAUSDT (47.49%). Top-market-cap coins, such as ETHUSDT and ATOMUSDT, also exhibited stable returns of 35.05% and 32.11%, respectively.
Table 9 presents the results of the t-test performed to identify the differences between the RORs of the two groups (high- and low-frequency) during the test period. The t-test results indicate that the difference in RORs between the groups is statistically significant.
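The two-group comparison above reduces to a t statistic; a Welch-style (unequal-variance) variant is sketched below as a general illustration. The study does not specify which t-test variant was used, and the sample data are illustrative.

```python
from statistics import mean, variance

def welch_t(sample_a, sample_b):
    """Welch's t statistic for two samples with possibly unequal variances,
    as might be used to compare high- vs. low-frequency group returns."""
    ma, mb = mean(sample_a), mean(sample_b)
    va, vb = variance(sample_a), variance(sample_b)   # sample variances
    se = (va / len(sample_a) + vb / len(sample_b)) ** 0.5
    return (ma - mb) / se

# Hypothetical per-asset returns for two groups
t_stat = welch_t([1.0, 2.0, 3.0], [2.0, 3.0, 4.0])
```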
Additionally, to verify the effectiveness of portfolio selection, a full-factorial mixed-model ANOVA was performed on the returns of cryptocurrencies selected for the high- and low-frequency groups, and those that were not selected. The analysis reveals that type of frequency (A) and portfolio selection (B) exert significant effects on performance at the 5% level of significance (See Table 10).

5. Conclusions

This study proposes a CPTS framework that analyzes the data characteristics of cryptocurrency futures markets using portfolios and RL. The CPTS utilizes technical indicators, basis indicators, and on-chain data corresponding to different timeframes to construct a portfolio of cryptocurrencies whose returns exceed the benchmark interest rate during the training period. The performance of the CPTS was evaluated in terms of profitability across different timeframes. The application of the CPTS to the cryptocurrency futures market in our experiments reveals the effectiveness of dividing portfolio composition into high-frequency (10, 30, and 60 min) and low-frequency (daily) timeframes. Additionally, the ANOVA results indicate that the strength of returns depends on the timeframe-based classification. For example, during the training period, BTCUSDT recorded returns of 16.624%, 17.036%, and 17.210% in the high-frequency timeframes, in contrast to a relatively low return of 6.594% in the low-frequency timeframe. ETHUSDT also exhibited high returns in the high-frequency timeframes and reduced returns in the daily timeframe.
This study not only proposes a novel trading system for the cryptocurrency futures market based on high-frequency and low-frequency data analysis but also contributes to the existing literature by performing ANOVA and applying portfolio construction methods tailored to various timeframes. In the past, trading strategies did not sufficiently consider the impact of different timeframes on performance, leading to many cases where they could not be effectively applied to markets with diverse data characteristics. The proposed trading system overcomes these limitations using various analysis methods, such as RL, ANOVA, and post hoc tests, to distinguish between high- and low-frequency trading strategies, thereby reflecting the unique volatility and data patterns of the cryptocurrency market. The overall results indicate that CPTS exhibits immense potential as a complementary trading system in the cryptocurrency market, by accounting for varying timeframes during portfolio construction. Although this approach does not always guarantee optimal trading outcomes, it can assist traders and investors in exploring alternative perspectives while developing trading systems or decision-support tools.
However, several limitations should be acknowledged. First, the experimental and evaluation period was relatively short; consequently, long-term market trends and volatility may not have been sufficiently captured. Second, the use of on-chain data is limited, as we consider only the circulating supply of USDT and do not incorporate other critical on-chain indicators. Future research would benefit from a broader range of on-chain metrics, such as active addresses (indicating network usage and adoption), transaction volume and velocity (reflecting network activity), hash rate (indicating network security for proof-of-work cryptocurrencies), the network value to transactions (NVT) ratio (a valuation metric), realized market capitalization, and social sentiment indicators from various platforms. These additional indicators could provide richer insights into cryptocurrency market dynamics, network health, adoption trends, and investor behavior, potentially enhancing the predictive power of our model. Third, because the cryptocurrencies were not categorized during experimentation, developing strategies that reflect the characteristics of each category is difficult. Fourth, owing to computational constraints and data limitations, we could not conduct comprehensive comparisons with buy-and-hold strategies or cryptocurrency market indices, which would provide more meaningful benchmarks for evaluating the practical effectiveness of the CPTS framework. Fifth, a more extensive hyperparameter search for the A2C algorithm could further optimize performance. Sixth, the evaluation based solely on returns does not capture risk-adjusted performance, which is crucial in volatile cryptocurrency markets.
Future research should enhance the trading system by training and testing over longer periods, incorporating the broader range of on-chain indicators mentioned above, and experimenting with a larger and more diverse set of cryptocurrencies. In addition, comprehensive risk-adjusted performance metrics, such as the Sharpe ratio, Sortino ratio, maximum drawdown, and Value at Risk (VaR), should be incorporated to evaluate trading strategies more thoroughly in highly volatile markets. Future studies should likewise include meaningful benchmarks, such as buy-and-hold strategies, equal-weight portfolios, and cryptocurrency market indices, to provide a more comprehensive performance evaluation and establish the practical value of algorithmic trading approaches.
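The risk-adjusted metrics named above can be computed directly from a series of periodic returns. The following is a minimal sketch; the function name, the synthetic return series, and the parameter choices are illustrative assumptions, not part of the CPTS implementation:

```python
import numpy as np

def risk_metrics(returns, rf=0.0, alpha=0.05):
    """Risk-adjusted metrics for a series of periodic returns (fractions)."""
    r = np.asarray(returns, dtype=float)
    excess = r - rf
    # Sharpe ratio: mean excess return over total volatility
    sharpe = excess.mean() / r.std(ddof=1)
    # Sortino ratio: penalizes only downside deviation below the target rf
    downside = r[r < rf] - rf
    sortino = excess.mean() / np.sqrt(np.mean(downside ** 2))
    # Maximum drawdown from the cumulative equity curve
    equity = np.cumprod(1.0 + r)
    peak = np.maximum.accumulate(equity)
    max_dd = np.max(1.0 - equity / peak)
    # Historical VaR: loss exceeded with probability alpha
    var = -np.quantile(r, alpha)
    return {"sharpe": sharpe, "sortino": sortino,
            "max_drawdown": max_dd, "VaR": var}

rng = np.random.default_rng(1)
metrics = risk_metrics(rng.normal(0.001, 0.04, 500))  # synthetic daily returns
print(metrics)
```

Per-period metrics such as the Sharpe ratio are usually annualized (e.g., multiplied by the square root of the number of periods per year) before comparison across trading frequencies.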
Subsequent research efforts should also focus on conducting comprehensive comparisons with other reinforcement learning algorithms such as DDPG, PPO, and ensemble methods, along with extended training and testing periods. Additionally, exploring the concept of ‘market time’ could provide valuable insights into our timeframe-dependent results. The relationship between sampling frequency, market microstructure characteristics such as the Hurst exponent, and performance outcomes warrants investigation using market time scaling approaches.

Author Contributions

Conceptualization, S.J.L.; methodology, J.H.C. and S.J.L.; software, J.H.C.; validation, S.J.L.; formal analysis, J.H.C. and S.J.L.; investigation, J.H.C. and S.J.L.; data curation, J.H.C. and S.J.L.; writing—original draft preparation, S.J.L.; writing—review and editing, S.J.L.; visualization, J.H.C.; supervision, S.J.L.; project administration, S.J.L.; funding acquisition, S.J.L. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the Research Grant of Kwangwoon University in 2024.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The data supporting the findings of this study are available via Google Drive at https://drive.google.com/file/d/1ulvv04UHdL6QaOnWdINFYVgVCqrAu5ne/view?usp=drive_link (accessed on 26 August 2025). Access is open for research purposes.

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:
CPTS: Cryptocurrency portfolio trading system
A2C: Advantage actor–critic
ANOVA: Analysis of variance
DQN: Deep Q-network
PnL: Profit and loss

Figure 1. Construction procedure for the proposed CPTS framework.
Table 1. Ticker symbols of cryptocurrencies used in this study.
| Ticker Symbol | Cryptocurrency |
|---|---|
| BTCUSDT | Bitcoin |
| ETHUSDT | Ethereum |
| ADAUSDT | Cardano |
| SOLUSDT | Solana |
| THETAUSDT | Theta Network |
| NEARUSDT | NEAR Protocol |
| BNBUSDT | Binance Coin |
| AVAXUSDT | Avalanche |
| DOTUSDT | Polkadot |
| BCHUSDT | Bitcoin Cash |
| TRXUSDT | TRON |
| FTMUSDT | Fantom |
| ALGOUSDT | Algorand |
| EGLDUSDT | Elrond |
| ATOMUSDT | Cosmos |
| XTZUSDT | Tezos |
| KAVAUSDT | Kava |
| ENJUSDT | Enjin Coin |
Table 2. Average ROR corresponding to the 10 min, 30 min, and 60 min timeframes during the training period.
| Ticker Symbol | 10 min (%) | 30 min (%) | 60 min (%) | Daily (%) |
|---|---|---|---|---|
| ADAUSDT | 6.700 | 5.939 | 6.515 | 2.829 |
| ALGOUSDT | 0.237 | 0.198 | 0.186 | 0.931 |
| ATOMUSDT | 7.458 | 6.901 | 7.043 | 3.191 |
| AVAXUSDT | 1.789 | 1.595 | 1.543 | 1.499 |
| BCHUSDT | 5.315 | 5.332 | 4.976 | 3.201 |
| BNBUSDT | 16.572 | 16.832 | 19.048 | 7.590 |
| BTCUSDT | 16.624 | 17.036 | 17.210 | 6.594 |
| DOTUSDT | 3.747 | 3.735 | 3.852 | 2.119 |
| EGLDUSDT | 2.322 | 2.362 | 2.353 | 1.881 |
| ENJUSDT | 1.614 | 1.496 | 1.407 | 1.103 |
| ETHUSDT | 12.522 | 13.552 | 13.618 | 6.107 |
| FTMUSDT | 2.572 | 2.128 | 2.241 | 1.105 |
| KAVAUSDT | 2.562 | 2.708 | 2.486 | 1.439 |
| NEARUSDT | 1.081 | 1.059 | 1.040 | 0.744 |
| SOLUSDT | 1.041 | 0.988 | 1.038 | 1.322 |
| THETAUSDT | 3.322 | 3.676 | 3.767 | 2.485 |
| TRXUSDT | 27.092 | 26.416 | 26.015 | 10.526 |
| XTZUSDT | 3.885 | 3.570 | 3.936 | 1.950 |
Table 3. ANOVA results corresponding to portfolio construction during the training period.
| Source of Variation | Sum of Squares | Degrees of Freedom | Mean Square | F | p-Value |
|---|---|---|---|---|---|
| Timeframe | 216,834.9 | 3 | 72,278.29 | 15.563 | 0.000 |
| Residual | 6,669,306 | 1436 | 4644.36 | | |
Table 4. Post-analysis results for the training period (Dunn’s test).
| Comparison Group | Mean Difference (%) | p-Value |
|---|---|---|
| 10 min–30 min | 2.987 | 0.319 |
| 10 min–60 min | −0.545 | 0.989 |
| 30 min–60 min | −3.532 | 0.181 |
| 10 min–1 day | −48.991 | 0.000 |
| 30 min–1 day | 46.005 | 0.000 |
| 60 min–1 day | 49.537 | 0.000 |
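Pairwise post hoc comparisons like those in Table 4 can be approximated with a SciPy-only sketch using pairwise Mann–Whitney U tests with a Bonferroni correction (Dunn's test itself, as used in the study, is provided by the scikit-posthocs package). The group data below are synthetic, illustrative assumptions:

```python
from itertools import combinations

import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
# Synthetic per-trade returns (%) per timeframe -- illustrative values only
groups = {
    "10min": rng.normal(16.6, 5.0, 60),
    "30min": rng.normal(17.0, 5.0, 60),
    "60min": rng.normal(17.2, 5.0, 60),
    "daily": rng.normal(6.6, 5.0, 60),
}

pairs = list(combinations(groups, 2))
results = {}
for a, b in pairs:
    _, p = stats.mannwhitneyu(groups[a], groups[b], alternative="two-sided")
    results[(a, b)] = min(1.0, p * len(pairs))  # Bonferroni adjustment
    print(f"{a} vs {b}: adjusted p = {results[(a, b)]:.4f}")
```

With synthetic data of this shape, the three high-frequency groups are mutually indistinguishable while each differs sharply from the daily group, mirroring the pattern in Table 4.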
Table 5. Cryptocurrencies in the high-frequency group selected for the portfolio.
| Ticker Symbol | Average ROR (%) |
|---|---|
| TRXUSDT | 26.51 |
| BNBUSDT | 17.48 |
| BTCUSDT | 16.96 |
| ETHUSDT | 13.23 |
| ATOMUSDT | 7.13 |
| ADAUSDT | 6.51 |
| BCHUSDT | 5.31 |
| XTZUSDT | 3.94 |
| THETAUSDT | 3.77 |
| DOTUSDT | 3.75 |
Table 6. Cryptocurrencies in the low-frequency group selected for the portfolio.
| Ticker Symbol | Average ROR (%) |
|---|---|
| TRXUSDT | 10.53 |
| BNBUSDT | 7.59 |
| BTCUSDT | 6.59 |
| ETHUSDT | 6.11 |
| BCHUSDT | 3.20 |
| ATOMUSDT | 3.19 |
| ADAUSDT | 2.83 |
Table 7. Average RORs of cryptocurrencies in the high-frequency group selected for the portfolio during the testing period.
| Ticker Symbol | 10 min (%) | 30 min (%) | 60 min (%) |
|---|---|---|---|
| ADAUSDT | 3.256 | 3.329 | 3.356 |
| ATOMUSDT | 7.396 | 7.996 | 3.458 |
| BCHUSDT | 7.897 | 7.997 | 3.458 |
| BNBUSDT | 7.489 | 8.000 | 7.989 |
| BTCUSDT | 3.358 | 7.990 | 3.458 |
| DOTUSDT | 3.123 | 7.996 | 3.423 |
| ETHUSDT | 3.423 | 7.997 | 3.423 |
| THETAUSDT | 3.223 | 7.998 | 3.423 |
| TRXUSDT | 9.512 | 8.003 | 8.012 |
| XTZUSDT | 3.058 | 7.992 | 3.458 |
| Average | 5.174 | 7.530 | 4.346 |
Table 8. Average RORs of the cryptocurrencies in the low-frequency group selected for the portfolio during the testing period.
| Ticker Symbol | Daily (%) |
|---|---|
| ADAUSDT | 47.491 |
| ATOMUSDT | 32.114 |
| BCHUSDT | 69.761 |
| BNBUSDT | 28.571 |
| BTCUSDT | 49.616 |
| ETHUSDT | 35.049 |
| TRXUSDT | 38.843 |
| Average | 43.064 |
Table 9. T-test results on the differences in ROR between the two groups.
| Frequency | N | Mean ROR (%) | Standard Deviation | t-Statistic | p-Value |
|---|---|---|---|---|---|
| High | 300 | 5.68 | 2.36 | −19.377 | 0.000 |
| Low | 70 | 43.06 | 16.10 | | |
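Because the two groups in Table 9 have unequal sizes and very different standard deviations, Welch's t-test (which does not assume equal variances) is the natural choice. The following sketch uses synthetic samples drawn to mimic the table's summary statistics; the data themselves are assumptions:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)
# Synthetic RORs (%) matching Table 9's group sizes -- illustrative only
high = rng.normal(5.68, 2.36, 300)   # high-frequency group
low = rng.normal(43.06, 16.10, 70)   # low-frequency group

# Welch's t-test: unequal variances and unequal sample sizes
t_stat, p_value = stats.ttest_ind(high, low, equal_var=False)
print(f"t = {t_stat:.3f}, p = {p_value:.4f}")
```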
Table 10. Multifactor ANOVA results for the test problem.
| Source of Variation | Sum of Squares | Degrees of Freedom | Mean Square | F | p-Value |
|---|---|---|---|---|---|
| Type of Frequency (A) | 299,444.23 | 1 | 299,444.23 | 1157.75 | 0.001 |
| Portfolio Selection (B) | 2147.51 | 1 | 2147.51 | 8.30 | 0.004 |
| A×B Interaction | 10,378.82 | 1 | 10,378.82 | 40.13 | 0.001 |
| Residual | 185,188.6 | 716 | 258.64 | | |
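The structure of Table 10 (main effects for frequency type and portfolio selection plus their interaction) corresponds to a two-way ANOVA. A minimal NumPy/SciPy implementation for a balanced design is sketched below; the 2×2 synthetic data, cell sizes, and effect sizes are assumptions, and the actual study design need not be balanced:

```python
import numpy as np
from scipy import stats

def two_way_anova(data):
    """Two-way ANOVA with interaction for a balanced design.
    data[i][j] is a 1-D array of replicates for level i of factor A
    and level j of factor B (equal replicates per cell)."""
    a, b = len(data), len(data[0])
    n = len(data[0][0])                      # replicates per cell
    cells = np.array([[np.asarray(data[i][j], float) for j in range(b)]
                      for i in range(a)])    # shape (a, b, n)
    grand = cells.mean()
    mean_a = cells.mean(axis=(1, 2))         # factor A level means
    mean_b = cells.mean(axis=(0, 2))         # factor B level means
    mean_ab = cells.mean(axis=2)             # cell means
    ss_a = b * n * np.sum((mean_a - grand) ** 2)
    ss_b = a * n * np.sum((mean_b - grand) ** 2)
    ss_ab = n * np.sum((mean_ab - mean_a[:, None]
                        - mean_b[None, :] + grand) ** 2)
    ss_res = np.sum((cells - mean_ab[:, :, None]) ** 2)
    df_a, df_b = a - 1, b - 1
    df_ab, df_res = df_a * df_b, a * b * (n - 1)
    out = {}
    for name, ss, df in [("A", ss_a, df_a), ("B", ss_b, df_b), ("AB", ss_ab, df_ab)]:
        f = (ss / df) / (ss_res / df_res)
        out[name] = (f, stats.f.sf(f, df, df_res))  # F statistic and p-value
    return out

rng = np.random.default_rng(5)
# 2x2 design: frequency type (A) x portfolio selection (B), synthetic returns
data = [[rng.normal(5 + 10 * i + 2 * j, 3, 30) for j in range(2)] for i in range(2)]
for name, (f, p) in two_way_anova(data).items():
    print(f"{name}: F = {f:.2f}, p = {p:.4f}")
```

For unbalanced designs, packages such as statsmodels (e.g., an OLS fit followed by an ANOVA table) are the more practical route.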

Share and Cite

Chun, J.H.; Lee, S.J. Cryptocurrency Futures Portfolio Trading System Using Reinforcement Learning. Appl. Sci. 2025, 15, 9400. https://doi.org/10.3390/app15179400
