3.1. Data Preparation and Feature Engineering
This research selects a representative sample of ten leading technology stocks listed on the NASDAQ market: AAPL, AMD, AMZN, AVGO, GOOG, META, MSFT, NVDA, ORCL, and TSLA. These firms hold dominant positions in the global technology sector and exhibit high liquidity, substantial market capitalization, and pronounced sensitivity to market fluctuations, making them well-suited for evaluating predictive models. Daily market data spanning January 2017 to December 2024 are obtained from official NASDAQ sources [41]. This period encompasses multiple market regimes, including periods of bullish momentum, high volatility, and sector-wide corrections, thereby providing a robust basis for model validation.
In contrast to conventional approaches that focus primarily on univariate targets such as directional price movements or Closing Price Differences, this research introduces an innovative flexible target prediction framework. It is designed to simultaneously capture complementary aspects of price dynamics through the following predictive objectives: the next-day Closing price Difference ($\Delta C_{t+1}$), the next-day MA Difference ($\Delta MA_{t+1}$), and the next-day EMA Difference ($\Delta EMA_{t+1}$). Both the MA and the EMA are computed as moving averages over the past five days. This flexible target design enables us to evaluate the relative performance of the model in predicting absolute price movements ($\Delta C_{t+1}$) and smoothed trend momentum ($\Delta MA_{t+1}$, $\Delta EMA_{t+1}$). The dependent variables are calculated as follows:

$\Delta C_{t+1} = C_{t+1} - C_t$, $\quad \Delta MA_{t+1} = MA_{t+1} - MA_t$, $\quad \Delta EMA_{t+1} = EMA_{t+1} - EMA_t$,

where $MA_t = \frac{1}{5}\sum_{k=0}^{4} C_{t-k}$ and $EMA_t = \alpha C_t + (1 - \alpha)\,EMA_{t-1}$ with $\alpha = 2/(5+1)$.
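As a concrete sketch of this target construction (function and column names are our own; the five-day window follows the text), the three differences can be derived from a closing-price series with pandas:

```python
import pandas as pd

def make_targets(close: pd.Series) -> pd.DataFrame:
    """Build the three next-day difference targets from a closing-price series."""
    ma5 = close.rolling(5).mean()                   # 5-day simple moving average
    ema5 = close.ewm(span=5, adjust=False).mean()   # 5-day exponential moving average
    return pd.DataFrame({
        # next-day differences, aligned so that row t holds the t+1 target
        "d_close": close.diff().shift(-1),
        "d_ma":    ma5.diff().shift(-1),
        "d_ema":   ema5.diff().shift(-1),
    })

# Illustrative prices only; the last row has no next-day target and is NaN.
prices = pd.Series([10.0, 11.0, 12.0, 11.5, 12.5, 13.0, 12.0])
targets = make_targets(prices)
```

The `shift(-1)` alignment keeps each feature row paired with the difference it must predict, which avoids look-ahead leakage during training.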
To quantify the uncertainty associated with the next-day $\Delta C_{t+1}$, $\Delta MA_{t+1}$, and $\Delta EMA_{t+1}$, we employ information entropy as the theoretical foundation for our calculations. Information entropy is a fundamental concept in information theory that measures the uncertainty inherent in a random variable [42]. For a discrete random variable $X$ taking possible values $x_1, x_2, \ldots, x_n$ with a corresponding probability distribution $p(x_1), p(x_2), \ldots, p(x_n)$, the information entropy $H(X)$ is defined as:

$H(X) = -\sum_{i=1}^{n} p(x_i) \log_2 p(x_i)$
The information entropy is measured in bits. A higher entropy value indicates greater uncertainty associated with the variable, which corresponds to a larger amount of embedded information and increased difficulty in predicting its fluctuation pattern. Conversely, a lower entropy reflects a more stable time series with reduced uncertainty. Among the three difference variables examined in this research, we assign a value of −1 when the difference is negative and 1 when it is positive. Following this binary discretization, the probabilities $p(1)$ and $p(-1)$ are derived, with their sum equaling 1. Based on the formal definition of information entropy, the entropy for each of the three price differences is computed, as summarized in Table 1. The results indicate that the entropy of $\Delta C_{t+1}$ is higher than that of $\Delta MA_{t+1}$ and $\Delta EMA_{t+1}$, implying greater predictive uncertainty. In contrast, the smoothed EMA and MA differences exhibit lower entropy and reduced uncertainty, providing an empirical rationale for selecting appropriate forecasting schemes based on different MA types.
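The binary-sign entropy described above can be sketched as follows (the input array is illustrative; the +1/−1 discretization and base-2 logarithm follow the text):

```python
import numpy as np

def sign_entropy(diffs) -> float:
    """Shannon entropy (bits) of the sign (+1/-1) of a difference series."""
    signs = np.sign(diffs)
    signs = signs[signs != 0]        # drop exact-zero differences
    p_up = np.mean(signs > 0)
    probs = np.array([p_up, 1.0 - p_up])
    probs = probs[probs > 0]         # avoid log2(0) for a one-sided series
    return float(-(probs * np.log2(probs)).sum())

# A 50/50 up/down series carries the maximum 1 bit of uncertainty.
h = sign_entropy(np.array([1.2, -0.5, 0.3, -0.7]))
```

Series that trend mostly one way produce entropy below 1 bit, matching the paper's interpretation that smoother targets are easier to predict.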
The feature engineering framework employed in this research is designed to comprehensively capture information pertaining to market price trends, momentum, volatility, and trading volume. For each of the three dependent variables under consideration, we construct a set of trend and momentum attributes. These include the Close, MA, and EMA computed over multiple historical windows, which are widely adopted in financial forecasting to represent near-term, medium-term, and long-term price behaviors. While some studies use longer horizons such as the 50-day and 200-day MA as input variables [43,44], another recent paper incorporates averaged trading volumes as predictive features [45]. In our setup, the 5-day, 10-day, and 20-day MA or EMA are selected as inputs, corresponding to the short-term, medium-term, and long-term trend signals commonly used in actual trading environments. Prior to feature construction, initial data cleaning is performed to remove missing values. In addition, to mitigate the influence of scale variation across variables, all numerical features are standardized using Z-score normalization, which enhances model stability and convergence during training.
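A minimal sketch of this trend-feature construction, assuming a pandas closing-price series (the 5/10/20-day windows and the Z-score step follow the text; function and column names are illustrative):

```python
import pandas as pd

def trend_features(close: pd.Series) -> pd.DataFrame:
    """Short-, medium-, and long-term trend inputs (5/10/20-day MA and EMA)."""
    feats = pd.DataFrame(index=close.index)
    for w in (5, 10, 20):
        feats[f"ma_{w}"] = close.rolling(w).mean()
        feats[f"ema_{w}"] = close.ewm(span=w, adjust=False).mean()
    feats = feats.dropna()                       # initial cleaning: drop warm-up rows
    # Z-score normalization so all features share a common scale
    return (feats - feats.mean()) / feats.std(ddof=0)

close = pd.Series(range(1, 41), dtype=float)     # toy price path
X = trend_features(close)
```

Dropping the warm-up rows before normalization keeps the Z-scores free of NaN contamination from the longest (20-day) window.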
For the dependent variable, we construct the Closing prices for the past five days (, , …, ) and its daily price difference (, …, ).
For the dependent variable , we construct the MA prices for the past five days (, , …, ) including the consecutive 5 days MA, 10 days MA, and 20 days MA, and its daily degree difference (, …, ).
For the dependent variable , we construct the EMA prices for the past five days (, , …, ) including the consecutive 5 days EMA, 10 days EMA, and 20 days EMA, and its daily degree difference (, …, ).
To build a predictive model capable of comprehensively capturing market dynamics, we construct a set of financial technical indicators spanning multiple dimensions, including price trends, momentum, volatility, trading volume, and market sentiment, to provide a unified representation of market conditions for all models. The selected features include Bollinger Bands, the Relative Strength Index (RSI), Average True Range (ATR), On-Balance Volume (OBV), trading volume over the past four days, the trading volume difference, the Commodity Channel Index (CCI), the Stochastic Oscillator (SlowK and SlowD), and Moving Average Convergence Divergence (MACD).
Bollinger Bands, a widely recognized volatility indicator developed by John Bollinger, consist of a central MA line flanked by an upper band and a lower band. These bands are positioned at a distance determined by the standard deviation of the price, allowing the indicator to adapt dynamically to changing market volatility conditions. In practice, when the price touches the upper band, it is often interpreted as an overbought signal; conversely, touching the lower band is generally viewed as an oversold signal [46]. We incorporate both the upper and lower Bollinger Bands as input features. The calculations are performed as follows:

$\text{Upper Band} = MA_n + k\sigma_n$, $\quad \text{Lower Band} = MA_n - k\sigma_n$,

where $MA_n$ and $\sigma_n$ are the $n$-day moving average and standard deviation of the Closing price, and $k$ is conventionally set to 2.
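A sketch of the band computation (the 20-day window and k = 2 are the conventional defaults, assumed here rather than stated in the text):

```python
import pandas as pd

def bollinger_bands(close: pd.Series, window: int = 20, k: float = 2.0):
    """Upper/lower Bollinger Bands: moving average +/- k standard deviations."""
    ma = close.rolling(window).mean()
    sd = close.rolling(window).std(ddof=0)
    return ma + k * sd, ma - k * sd

close = pd.Series([10.0] * 25)       # a flat price series has zero-width bands
upper, lower = bollinger_bands(close)
```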
The RSI measures the speed of price changes. It assesses whether an asset is in an overbought or oversold state by comparing the average gain and average loss of the Closing price within a certain period [47]. The RSI ranges from 0 to 100; a value above 70 is generally taken to indicate overbought conditions, and a value below 30 oversold conditions. It is calculated using the standard 14-day period.
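One common formulation of the 14-day RSI uses Wilder-style exponential smoothing of gains and losses; the exact smoothing variant used in the paper is not specified, so this is an assumption:

```python
import pandas as pd

def rsi(close: pd.Series, period: int = 14) -> pd.Series:
    """Relative Strength Index with Wilder smoothing (alpha = 1/period)."""
    delta = close.diff()
    gain = delta.clip(lower=0).ewm(alpha=1 / period, adjust=False).mean()
    loss = (-delta.clip(upper=0)).ewm(alpha=1 / period, adjust=False).mean()
    rs = gain / loss
    return 100 - 100 / (1 + rs)

rising = pd.Series(range(1, 31), dtype=float)   # monotonically rising prices
r = rsi(rising)
```

A strictly rising series has zero average loss, so the indicator saturates at the overbought extreme of 100.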
The ATR indicator measures the volatility of the market. It is calculated as the average of the daily true range, which accounts for the difference between the highest and lowest prices as well as gaps from the previous Close, and can effectively capture the intensity of price fluctuations [48]. A rising ATR value indicates increasing volatility; a falling ATR value suggests that the market is calming down. It is calculated using a 14-day period.
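The 14-day ATR can be sketched as the rolling mean of the true range (the simple-average variant is an assumption; some implementations use Wilder smoothing instead):

```python
import pandas as pd

def atr(high: pd.Series, low: pd.Series, close: pd.Series,
        period: int = 14) -> pd.Series:
    """Average True Range: rolling mean of the true range over `period` days."""
    prev_close = close.shift(1)
    tr = pd.concat([high - low,
                    (high - prev_close).abs(),
                    (low - prev_close).abs()], axis=1).max(axis=1)
    return tr.rolling(period).mean()

# Toy series with a constant 2-point daily range.
high = pd.Series([11.0] * 20)
low = pd.Series([9.0] * 20)
close = pd.Series([10.0] * 20)
a = atr(high, low, close)
```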
OBV is a momentum indicator that combines trading volume with price changes. Its movement can be used to confirm the price trend or detect divergence. When the Closing price rises, the day's trading volume is added to the OBV; when the Closing price falls, the day's trading volume is subtracted from the OBV [49].
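The add-on-up, subtract-on-down rule above amounts to a signed cumulative sum of volume (a minimal sketch with illustrative inputs):

```python
import numpy as np
import pandas as pd

def obv(close: pd.Series, volume: pd.Series) -> pd.Series:
    """On-Balance Volume: cumulative volume signed by the day's price change."""
    direction = np.sign(close.diff()).fillna(0)   # +1 up day, -1 down day, 0 flat
    return (direction * volume).cumsum()

close = pd.Series([10.0, 11.0, 10.5, 12.0])
volume = pd.Series([100.0, 200.0, 150.0, 300.0])
v = obv(close, volume)
```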
CCI is used to determine whether asset prices have deviated from their typical level. It compares the current price with the average price over a period and divides the difference by the mean deviation during that period [50]. The CCI usually varies between −100 and 100; readings beyond this range may indicate a change in the strength of the trend.
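A sketch of the CCI computation (the 0.015 scaling constant and the typical-price definition are the conventional ones, assumed here):

```python
import pandas as pd

def cci(high: pd.Series, low: pd.Series, close: pd.Series,
        period: int = 20) -> pd.Series:
    """Commodity Channel Index: deviation of typical price from its rolling mean."""
    tp = (high + low + close) / 3                 # typical price
    ma = tp.rolling(period).mean()
    # rolling mean absolute deviation from the window mean
    mad = tp.rolling(period).apply(lambda x: (x - x.mean()).abs().mean(), raw=False)
    return (tp - ma) / (0.015 * mad)

close = pd.Series(range(1, 21), dtype=float)      # toy rising path
c = cci(close, close, close)
```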
SlowK and SlowD are momentum indicators that determine the strength of a trend and potential turning points by comparing the Closing price with the price range over a specific period of time. The oscillator consists of two lines: the fast line %K and the slow line %D, which is the MA of %K [51].
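A sketch of the slow stochastic oscillator (the 14-day lookback and 3-day smoothing windows are common defaults, assumed here):

```python
import pandas as pd

def stochastic(high: pd.Series, low: pd.Series, close: pd.Series,
               k_period: int = 14, smooth: int = 3):
    """Slow stochastic: smoothed %K, and %D as the moving average of SlowK."""
    lowest = low.rolling(k_period).min()
    highest = high.rolling(k_period).max()
    fast_k = 100 * (close - lowest) / (highest - lowest)
    slow_k = fast_k.rolling(smooth).mean()
    slow_d = slow_k.rolling(smooth).mean()
    return slow_k, slow_d

close = pd.Series(range(1, 31), dtype=float)      # steadily rising toy prices
slow_k, slow_d = stochastic(close, close, close)
```

On a monotonically rising series the Close always sits at the top of its range, so both lines pin at 100.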
MACD shows the direction, momentum, duration, and intensity of a trend by calculating the difference between two EMA lines of different periods [52].
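A sketch using the conventional 12/26/9 parameterization (these periods are an assumption; the text does not specify them):

```python
import pandas as pd

def macd(close: pd.Series, fast: int = 12, slow: int = 26, signal: int = 9):
    """MACD line, signal line, and histogram from two EMAs of different periods."""
    ema_fast = close.ewm(span=fast, adjust=False).mean()
    ema_slow = close.ewm(span=slow, adjust=False).mean()
    macd_line = ema_fast - ema_slow
    signal_line = macd_line.ewm(span=signal, adjust=False).mean()
    return macd_line, signal_line, macd_line - signal_line

close = pd.Series([10.0] * 40)        # flat prices give a zero MACD
line, sig, hist = macd(close)
```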
3.2. Model Construction and Training
This section outlines the predictive modeling framework developed in this research, which is designed as a flexible target integrated learning system. The framework consists of three core components: Ensemble Models, Fusion Models, and Transfer Learning-enhanced methods. To ensure robust and comparable performance, five well-established machine learning algorithms (AdaBoost, Decision Tree, LightGBM, Random Forest, and XGBoost) are selected as base models. These models span a range of structures, from individual tree-based methods to ensemble techniques founded on distinct principles such as bagging and boosting, enabling the capture of a broad spectrum of potential linear and nonlinear patterns inherent in financial data [53]. To fully exploit the predictive capability of each model, hyperparameter optimization is systematically performed using a Grid Search strategy, with the principal objective of enhancing generalization performance. The detailed search spaces for the hyperparameters of each model are provided below.
AdaBoost: n_estimators (50, 100, 200, 300), learning_rate (0.001, 0.01, 0.1, 1).
Decision Tree: criterion (‘squared_error’, ‘friedman_mse’, ‘absolute_error’), max_depth (None, 10, 15, 20), max_features (None, ‘sqrt’, ‘log2’), min_samples_split (2, 5, 10), min_samples_leaf (2, 4, 8).
LightGBM: learning_rate (0.01, 0.05, 0.1), n_estimators (100, 200, 300), max_depth (3, 5, 7), reg_alpha (0, 0.1, 0.5), reg_lambda (0, 0.1, 1), feature_fraction (0.8, 0.9, 1.0).
Random Forest: n_estimators (50, 100, 200, 300), max_depth (None, 10, 20), max_features (‘auto’, ‘sqrt’, ‘log2’), min_samples_split (2, 5, 10).
XGBoost: n_estimators (25, 50, 100, 200), max_depth (3, 5, 8), subsample (0.6, 0.8, 1.0), reg_alpha (0, 0.1, 0.5, 1), reg_lambda (0, 0.1, 0.5, 1).
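The grid-search procedure above can be sketched with scikit-learn; the toy data and the reduced grid shown here (a subset of the Random Forest search space) are illustrative assumptions, not the full experiment:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV

# Toy data stands in for the engineered feature matrix.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = X[:, 0] * 2 + rng.normal(scale=0.1, size=200)

param_grid = {
    "n_estimators": [50, 100],
    "max_depth": [None, 10],
    "min_samples_split": [2, 5],
}
search = GridSearchCV(RandomForestRegressor(random_state=0),
                      param_grid, cv=3, scoring="r2")
search.fit(X, y)
best = search.best_estimator_        # refit on all data with the best combination
```

The same pattern applies to each of the five base models, swapping in the search space listed for that model.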
To integrate the strengths of diverse base models, this research implements three established Fusion Model strategies: Voting, Stacking, and Blending. For consistency and comparability, the same set of optimized Ensemble Models serves as the first-layer base models across all fusion approaches. In the Voting method, the regression predictions from all base models for future price differences are aggregated by computing their arithmetic mean, which is then adopted as the final fused output. The Stacking strategy employs a five-fold cross-validation procedure to generate out-of-fold predictions from the base models, which are subsequently used as meta-features; a Linear Regression model, chosen to limit overfitting, serves as the meta model trained to optimally combine these predictions. For the Blending method, the training set is partitioned into a base-model training subset and a held-out validation set. Predictions from the base models on this validation set form the meta-features used to train the meta model, reducing the risk of information leakage compared to Stacking.
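The three fusion strategies can be sketched as follows; the two base models and the synthetic data are placeholders for the five optimized Ensemble Models described above:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor, StackingRegressor, VotingRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(1)
X = rng.normal(size=(300, 4))
y = X @ np.array([1.0, -2.0, 0.5, 0.0]) + rng.normal(scale=0.1, size=300)

base = [("rf", RandomForestRegressor(n_estimators=50, random_state=0)),
        ("dt", DecisionTreeRegressor(max_depth=5, random_state=0))]

# Voting: arithmetic mean of the base-model predictions.
voting = VotingRegressor(base).fit(X, y)

# Stacking: 5-fold out-of-fold predictions feed a Linear Regression meta model.
stacking = StackingRegressor(base, final_estimator=LinearRegression(), cv=5).fit(X, y)

# Blending: base models train on one split; the meta model trains on
# their predictions over a held-out validation set.
X_tr, X_hold, y_tr, y_hold = train_test_split(X, y, test_size=0.3, random_state=0)
fitted = [m.fit(X_tr, y_tr) for _, m in base]
meta_X = np.column_stack([m.predict(X_hold) for m in fitted])
blender = LinearRegression().fit(meta_X, y_hold)
```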
The methodological innovation of this research lies in integrating Dynamic Time Warping (DTW) distance with Fusion Models, thereby introducing a Transfer Learning-enhanced ensemble approach for stock price prediction. In the context of time series prediction, DTW serves as a robust similarity measure that quantifies the alignment between sequences, even in the presence of nonlinear temporal distortions, thus enabling more informed knowledge transfer from source to target domains. By computing the optimal warping path between two sequences, DTW effectively captures similarities in shape, trend, and fluctuation patterns, which is critical for identifying suitable source domains from candidate datasets. DTW has been widely adopted in non-financial domains such as wind power prediction and energy forecasting [54,55,56]. In stock prediction, DTW has also been used to measure the similarity between different stocks for Transfer Learning [57]. In our research, DTW is employed to measure pairwise similarity among the ten American technology stocks, all of which operate in related market segments and exhibit strong real-world business linkages and correlated price movements. For each target stock, the DTW distances to the other nine stocks are computed based on their Closing price sequences from 2017 to 2021. The resulting distance $d_i$ is used to assign a transfer weight $w_i$, defined as the reciprocal of the distance:

$w_i = \frac{1}{d_i}$
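A minimal dynamic-programming sketch of the DTW distance and the reciprocal weight (the short example sequences are illustrative; in the paper the inputs are the 2017-2021 Closing price sequences):

```python
import numpy as np

def dtw_distance(a, b) -> float:
    """Classic O(n*m) dynamic-programming DTW distance between two sequences."""
    n, m = len(a), len(b)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])       # local matching cost
            # extend the cheapest of the three admissible warping steps
            cost[i, j] = d + min(cost[i - 1, j], cost[i, j - 1], cost[i - 1, j - 1])
    return float(cost[n, m])

src = [1.0, 2.0, 3.0, 2.0]
tgt = [1.0, 1.0, 2.0, 4.0, 2.0]
d = dtw_distance(src, tgt)
w = 1.0 / d if d > 0 else float("inf")   # transfer weight = reciprocal of distance
```

Note that DTW tolerates the differing sequence lengths here, which a pointwise (Euclidean) distance could not.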
We introduce three integrated variants of Fusion Models combined with Transfer Learning: Transfer Voting, Transfer Stacking, and Transfer Blending. For each source stock $i$, a distinct Fusion Model is independently trained to predict a target variable specific to that source domain. This process yields a set of source-specific models $M_1, M_2, \ldots, M_9$, each capturing domain-specific predictive characteristics. The final prediction for a target stock is obtained by computing a weighted aggregation of the predictions from all nine source-domain models. The weight $w_i$ assigned to each source model $M_i$ is determined by the inverse of the DTW distance between source stock $i$ and the target stock, thereby assigning greater influence to models originating from more similar market dynamics. The aggregated prediction is formally expressed as:

$\hat{y} = \frac{\sum_{i=1}^{9} w_i \, \hat{y}_i}{\sum_{i=1}^{9} w_i}$, where $\hat{y}_i$ denotes the prediction of source model $M_i$ and $w_i = 1/d_i$.
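The DTW-weighted aggregation can be sketched as follows (two source models and three test days are used purely for illustration; the paper uses nine source models per target stock):

```python
import numpy as np

def transfer_aggregate(preds, dtw_dists):
    """DTW-weighted aggregation of source-model predictions for one target stock.

    preds: list of per-source prediction vectors (one row per source model).
    dtw_dists: DTW distance from each source stock to the target stock.
    """
    w = 1.0 / np.asarray(dtw_dists, dtype=float)   # weight = reciprocal DTW distance
    w = w / w.sum()                                # normalize weights to sum to 1
    return np.asarray(preds, dtype=float).T @ w    # weighted mean per test day

# Two source models, three test days (values are illustrative).
preds = [[1.0, 2.0, 3.0], [3.0, 2.0, 1.0]]
agg = transfer_aggregate(preds, dtw_dists=[1.0, 1.0])
```

With equal distances the aggregation reduces to a plain average; a closer source stock pulls the forecast toward its own model's prediction.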
This approach ensures that predictions from models trained on source stocks exhibiting higher dynamic pattern similarity, i.e., a lower DTW distance, are assigned greater influence in the final aggregated forecast. To evaluate the effectiveness of the proposed modeling frameworks, we compare the Transfer Learning-enhanced variants against two baseline categories: Fusion Models trained solely on the target stock’s data, and Ensemble Models. Predictive performance is quantified using four widely adopted regression metrics: the coefficient of determination (R Squared), Mean Absolute Error (MAE), Mean Squared Error (MSE), and Root Mean Squared Error (RMSE). These metrics collectively assess the deviation between predicted and actual values from complementary perspectives. The R Squared metric measures the proportion of variance in the dependent variable that is predictable from the independent variables; a value closer to 1 indicates a better model fit, meaning the model more effectively explains the variability of the target output. The MAE, MSE, and RMSE reflect different aspects of prediction error, with values closer to zero signifying higher accuracy. MAE provides a linear score of the average error magnitude, offering an intuitive interpretation. In contrast, MSE and RMSE, by squaring the errors, assign a disproportionately higher penalty to large prediction errors, thereby highlighting the model’s sensitivity to outliers. The formulas for calculating these metrics are provided below:

$R^2 = 1 - \frac{\sum_{i=1}^{n}(y_i - \hat{y}_i)^2}{\sum_{i=1}^{n}(y_i - \bar{y})^2}$, $\quad MAE = \frac{1}{n}\sum_{i=1}^{n}|y_i - \hat{y}_i|$, $\quad MSE = \frac{1}{n}\sum_{i=1}^{n}(y_i - \hat{y}_i)^2$, $\quad RMSE = \sqrt{MSE}$
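All four metrics are available in scikit-learn; a minimal sketch with illustrative values:

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

y_true = np.array([1.0, 2.0, 3.0, 4.0])
y_pred = np.array([1.1, 1.9, 3.2, 3.8])

r2 = r2_score(y_true, y_pred)               # proportion of explained variance
mae = mean_absolute_error(y_true, y_pred)   # average absolute error
mse = mean_squared_error(y_true, y_pred)    # average squared error
rmse = np.sqrt(mse)                         # same units as the target
```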
3.3. Quantitative Trading Strategy
Quantitative trading represents a systematic investment approach that relies on computational systems to execute automated trading decisions. By analyzing historical market data and identifying statistical patterns, this methodology aims to minimize the impact of investors’ subjective emotions, thereby improving investment efficiency and stability through disciplined strategy implementation. The innovation of the flexible target machine learning quantitative trading strategy proposed in this research lies in its comprehensive integration of multiple technical indicators to forecast short-term price movements, followed by rule-based trading operations derived from these predictions.
Innovatively, we employ three distinct price-based indicators, $\Delta C_{t+1}$, $\Delta MA_{t+1}$, and $\Delta EMA_{t+1}$, as prediction targets to capture market momentum characteristics across multiple time horizons. Five Ensemble Models are utilized for prediction, and their outputs are integrated through Voting, Stacking, and Blending. To further enhance model robustness, Transfer Learning is incorporated to develop upgraded variants of each fusion approach. In total, eleven prediction models are constructed. Each model generates forecasts for the three target variables, yielding 33 distinct prediction results that collectively form a comprehensive quantitative trading signal. The trading strategy is designed as follows: a predicted positive difference for the next day is interpreted as a bullish signal, triggering a buy order when no position is held. Conversely, a predicted negative difference is treated as a bearish signal, prompting a sell order when a position exists. This logic is grounded in the assumption that a positive difference indicates strengthening momentum or an emerging uptrend, while a negative value suggests potential momentum decay or a trend reversal [58].
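The rule-based decision logic described above amounts to a small state check (function and return labels are our own naming):

```python
def trade_signal(pred_diff: float, holding: bool) -> str:
    """Rule-based decision: buy on a predicted positive difference with no
    position held; sell on a predicted negative difference while holding;
    otherwise take no action."""
    if pred_diff > 0 and not holding:
        return "buy"
    if pred_diff < 0 and holding:
        return "sell"
    return "hold"

a = trade_signal(0.8, holding=False)    # bullish signal, flat -> buy
b = trade_signal(-0.3, holding=True)    # bearish signal, long -> sell
```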
To evaluate the effectiveness of the trading strategy, we employ a backtesting framework to simulate real-market trading conditions. The simulation is implemented on the Backtrader platform with an initial capital of $100,000, a trading commission of 0.025%, and a slippage setting. Annualized Return and Maximum Drawdown are selected as the core performance metrics. The Annualized Return reflects the expected return of the strategy over a one-year horizon; a higher value indicates stronger profitability. It is calculated as follows:

$\text{Annualized Return} = \left(\frac{V_T}{V_0}\right)^{252/T} - 1$,

where $V_0$ and $V_T$ denote the initial and final portfolio values and $T$ is the number of trading days in the backtest.
Maximum Drawdown measures the largest peak-to-trough decline in portfolio value during the investment period, serving as a key indicator of strategy risk. A smaller Maximum Drawdown implies better capital preservation and lower downside risk.
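Both backtest metrics can be sketched directly from a daily portfolio-value series (the 252 trading-day annualization convention and the toy equity curve are assumptions):

```python
import numpy as np

def annualized_return(values, periods_per_year: int = 252) -> float:
    """Annualized return from a daily portfolio-value series."""
    total = values[-1] / values[0]
    return total ** (periods_per_year / (len(values) - 1)) - 1

def max_drawdown(values) -> float:
    """Largest peak-to-trough decline, expressed as a positive fraction."""
    values = np.asarray(values, dtype=float)
    peaks = np.maximum.accumulate(values)    # running high-water mark
    return float(((peaks - values) / peaks).max())

equity = [100.0, 110.0, 99.0, 120.0, 108.0]   # illustrative portfolio values
mdd = max_drawdown(equity)
```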