Article

A Multi-Model Machine Learning Framework for Daily Stock Price Prediction

by
Bharatendra Rai
* and
Leili Soltanisehat
Department of Decision and Information Sciences, Charlton College of Business, University of Massachusetts Dartmouth, 285 Westport Road, North Dartmouth, MA 02747, USA
*
Author to whom correspondence should be addressed.
Big Data Cogn. Comput. 2025, 9(10), 248; https://doi.org/10.3390/bdcc9100248
Submission received: 26 August 2025 / Revised: 22 September 2025 / Accepted: 23 September 2025 / Published: 28 September 2025
(This article belongs to the Topic Electronic Communications, IOT and Big Data, 2nd Volume)

Abstract

Stock price prediction remains a challenging problem due to the inherent volatility and complexity of financial markets. This study proposes a multi-model machine learning framework for one-day-ahead stock price prediction using thirty-six features derived from technical indicators. Empirical analysis is conducted on data from Apple, Tesla, and NVIDIA, employing nine classification algorithms, including support vector machines, random forests, extreme gradient boosting, and logistic regression. Results indicate that momentum-based indicators are the most influential predictors. While support vector machines achieve the highest accuracy for Apple, extreme gradient boosting performs best for NVIDIA and Tesla. In addition, explainable AI techniques are applied to interpret individual model predictions, thereby enhancing transparency and trust in the results. The study contributes to financial analytics research by providing a comparative evaluation of diverse machine learning methods and highlighting key indicators critical for short-term stock price forecasting.

1. Introduction

Stock price prediction is a well-established yet challenging research domain that has generated significant interest among scholars. Two key theoretical foundations dominate this field: the Efficient Market Hypothesis (EMH), which asserts that current prices fully reflect all relevant information [1], and the Random Walk Hypothesis, which suggests that past price movements have no predictive power for future prices [2]. Prediction approaches typically rely on fundamental analysis, which examines macroeconomic indicators, company financials, and market position, or technical analysis, which studies historical price and volume data along with demand–supply dynamics.
Technical analysis often employs quantitative technical indicators—such as Moving Average (MA), Moving Average Convergence/Divergence (MACD), Relative Strength Index (RSI), and stochastic oscillators—to capture market trends and potential reversals [3,4,5,6]. While many indicators are available, excessive use can complicate decision-making for traders. Predictive modeling can address this challenge by combining selected indicators to classify stock movements, typically into two classes (up, down) or three classes (up, neutral, down). Model performance is influenced by factors such as indicator selection, classification method, and prediction horizon, which may range from minutes to days in short-term forecasts, or weeks to months in long-term forecasts [7].
Daily closing prices are a common and easily accessible input for such models, obtainable from platforms like Yahoo Finance. However, for short-term predictions, extensive historical data may be less useful due to evolving market conditions, technological advancements, and changes in company strategy. For example, while Apple’s stock existed in the 1990s, its price dynamics and investor sentiment today are shaped by innovations like the iPhone. Such shifts underscore the importance of using recent, contextually relevant data in developing effective stock prediction models.
In this paper, we evaluate the performance of nine widely used machine learning models—including Extreme Gradient Boosting (XGBoost), Random Forest (RF), and Support Vector Machine (SVM)—using thirty-six technical indicators derived from historical stock price data. The predictive models are designed to generate one-day-ahead forecasts of stock price direction, classifying movements as up, down, or neutral. The study focuses on three prominent companies—Apple, Tesla, and NVIDIA—selected to represent different stages in their corporate lifecycle. Model performances are assessed using multiple evaluation metrics to provide a comprehensive comparison.
The remainder of the paper is organized as follows: Section 2 presents a literature review relevant to this research. Section 3 outlines the three phases of the overall framework employed for developing the one-day-ahead prediction models. Section 4 describes the data collection process and the technical indicator-based feature extraction. Section 5 details the application of the nine machine learning methods and their performance evaluation. Section 6 discusses the results and key findings. Finally, Section 7 concludes the paper with a summary of contributions, limitations, and potential directions for future research.

2. Literature Review

The prediction of financial asset returns is a multidisciplinary subject drawing upon financial econometrics, investment analysis, corporate finance, and, more recently, behavioral finance. The roots of technical analysis can be traced to 18th century Japan, where Munehisa Homma pioneered the candlestick charting method for analyzing rice market trends [8]. In the context of the modern stock market, one of the earliest academic contributions is Cowles’ analysis of 45 professional forecasting agencies, which examined their ability to identify stocks with superior investment potential or to predict overall price movements [9].
Subsequent work has explored the relevance of technical analysis across various asset classes. Menkhoff and Taylor investigated technical analysis in foreign exchange markets, highlighting that markets may not always be fully rational and discussing why mispricing and arbitrage opportunities may persist [10]. Zhu and Zhou emphasized the role of technical analysis in model specification, demonstrating that moving average-based trading strategies can outperform optimal allocation rules when predictive uncertainty exists, and priors are not overly informative [11]. Their findings suggest that technical analysis can add value even to widely used asset allocation strategies when returns are predictable.
Technical analysis has also been integrated with computational intelligence methods. Gorgulho et al. applied genetic algorithms to manage financial portfolios using technical indicators during the 2003–2009 period, which included the global financial crisis. In their framework, initial random individuals represented alternative asset classification models, though the results indicated performance comparable to random trading strategies over the full study period [12]. Chang et al. introduced evolved partially connected neural networks (EPCNNs) to forecast stock price trends from technical indicators [13]. The EPCNN architecture employed random neuron connections, multiple hidden layers, genetically evolved weights, and a sinusoidal activation function, achieving promising results in financial time series forecasting while addressing limitations of gradient descent learning.
Ensemble learning approaches have also been extensively examined. Ballings et al. compared ensemble models—Random Forest, AdaBoost, and Kernel Factory—with single classifiers such as Neural Networks, Logistic Regression, Support Vector Machines (SVM), and K-Nearest Neighbor, using data from 5767 publicly listed European companies for one-year-ahead predictions [14]. Model performance was evaluated using the area under the ROC curve, with Random Forest achieving the highest median value, followed by SVM, Kernel Factory, and AdaBoost, indicating that ensemble methods often outperform single classifiers. Patel et al. analyzed the Indian stock market, predicting both stock movements and index direction using Artificial Neural Networks, SVM, Random Forest, and Naïve Bayes [15]. They evaluated two input approaches—technical parameters computed from ten years of historical data and trend representations of those parameters—finding that Random Forest performed best in the first approach, and that all models improved when using trend-based inputs.
Hybrid techniques have also been proposed for multi-class classification in trading decisions. Dash and Dash used six technical indicators as inputs to a computationally efficient Functional Link Artificial Neural Network (CEFLANN) to generate continuous trading signals in the range of 0–1 [16]. These signals were mapped to three discrete classes—buy, hold, and sell—demonstrating the potential of CEFLANN models in capturing non-linear relationships between technical indicators and market trends.

3. Three Phases of This Study

In this study, nine widely used machine learning methods, including Support Vector Machine (SVM), Random Forest, and Extreme Gradient Boosting, are used to develop classification models, and the overall process is divided into three phases (see Figure 1).
The first phase involves feature extraction: data are obtained from a publicly available source, and technical indicators are then computed from these data. This study uses the quantitative financial modeling framework provided by the R package “quantmod” to obtain data sourced from Yahoo Finance. The data cover Apple, NVIDIA, and Tesla for the period from 1 January 2018 to 30 June 2025. From these data, 36 technical indicator-based features are extracted. In phase 2, machine learning algorithms are applied for model development, with the features obtained in phase 1 as input variables and the response, classified as up, down, or neutral, as the output variable. These models are developed on a subset of the data called the training dataset. In phase 3, model assessment is performed on the testing dataset using the models obtained in phase 2.
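The study retrieves the raw price data with the R package “quantmod”. As an illustration only, the following sketch performs the equivalent phase 1 retrieval in Python, assuming the yfinance package (not used in the paper) as a stand-in data source.

```python
# Illustrative Python sketch of phase 1 data retrieval. The study itself uses the
# R package "quantmod"; the yfinance package used here is an assumed stand-in.
import yfinance as yf

TICKERS = ["AAPL", "NVDA", "TSLA"]

# Daily open, high, low, close, and volume for the study period.
prices = {
    ticker: yf.download(ticker, start="2018-01-01", end="2025-06-30")
    for ticker in TICKERS
}

print(prices["AAPL"].tail())  # most recent rows of Apple's OHLCV data
```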

4. Feature Extraction Process

This section provides details about the dataset used for the study, the process used for extracting features, and the criteria used for classifying the response variable as up, down, or neutral.

4.1. Data

The dataset used in this study covers a 90-month period from 1 January 2018 to 30 June 2025. Data on three popular companies, viz., Apple, NVIDIA, and Tesla, were obtained from Yahoo Finance, a publicly available data source. The dataset includes the opening price, closing price, high, low, and trading volume for 1863 trading days. These data are used to derive technical indicators that provide investors with clues about the future direction of the stock price. As an example, Figure 2 shows the pattern of market prices for Apple from January 2022 to June 2025.
Figure 2 shows that Apple’s stock price varied between roughly USD 120 and USD 260 from the beginning of 2022 to June 2025. When the stock price falls below the short-term 20-day moving average, depicted in blue, the downtrend usually continues for many days. Similarly, when the price crosses above the 20-day moving average, the uptrend tends to continue for many days. Such patterns provide an investor with clues about the direction of future prices and help in making “buy” and “sell” decisions. Investors utilize various technical indicators to gain such insights for decision-making.
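To make the moving-average pattern concrete, the following sketch (an illustration, not code from the study) flags the days on which the closing price crosses its 20-day simple moving average using pandas.

```python
# Illustrative sketch (not code from the study): flag days on which the closing
# price crosses its 20-day simple moving average, the pattern discussed above.
import pandas as pd

def ma_crossover_signals(close: pd.Series, window: int = 20) -> pd.DataFrame:
    ma = close.rolling(window).mean()
    above = close > ma
    prev_above = above.shift(1, fill_value=False)
    return pd.DataFrame({
        "close": close,
        "ma": ma,
        "cross_up": above & ~prev_above,      # price moves above the moving average
        "cross_down": ~above & prev_above,    # price falls below the moving average
    })
```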

4.2. Thirty-Six Input Features

For developing the one-day-ahead prediction model, thirty-six input features are used in this study. These features are grouped into three categories: trend, momentum, and volatility indicators. Table 1 lists all 36 features used in this study, and [17] covers the various technical indicators in detail with illustrative examples.
It is to be noted that the historical data initially covered 1863 trading days. However, the calculations required for the 36 features produce missing values at the start of the series, and the affected rows were removed from the dataset; in total, the first 134 trading days of data were removed during this process. The final dataset consists of 1731 trading days.
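As an illustration of why the warm-up rows are removed, the sketch below computes a few representative indicator-style features with pandas (a simplified stand-in for the quantmod/TTR implementations used in the study) and drops the rows left incomplete by the rolling windows.

```python
# Simplified illustration of the warm-up effect: rolling-window features leave
# missing values at the start of the series, and those rows are dropped.
# (The study derives its 36 features with quantmod/TTR; the RSI below is a
# simplified stand-in, not the exact implementation.)
import pandas as pd

def build_features(df: pd.DataFrame) -> pd.DataFrame:
    close = df["Close"]
    feats = pd.DataFrame(index=df.index)
    feats["x01_sma"] = close.rolling(20).mean()       # simple moving average
    feats["x02_ema"] = close.ewm(span=20).mean()      # exponential moving average
    delta = close.diff()
    gain = delta.clip(lower=0).rolling(14).mean()
    loss = (-delta.clip(upper=0)).rolling(14).mean()
    feats["x16_rsi"] = 100 - 100 / (1 + gain / loss)  # RSI (simplified)
    # Drop the initial rows made incomplete by the rolling windows.
    return feats.dropna()
```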

4.3. Response Classification and Threshold

In this study, one-day-ahead data points are classified into one of three categories: up, down, or neutral. This classification is based on comparing the one-day-ahead closing stock price with the average of the previous five days’ closing prices. When the percentage difference is greater than a predefined threshold value, the response is classified as “up”; when it falls below the negative of the threshold value, it is classified as “down”. All other cases are classified as “neutral”. This is formalized in the equation below.
$$
y_t =
\begin{cases}
\text{up}, & \text{if } 100 \times \dfrac{C_t - M_n}{M_n} > T \\[4pt]
\text{down}, & \text{if } 100 \times \dfrac{C_t - M_n}{M_n} < -T \\[4pt]
\text{neutral}, & \text{if } -T \le 100 \times \dfrac{C_t - M_n}{M_n} \le T
\end{cases}
$$
where
  • y_t is the one-day-ahead classification of the stock price in time period t;
  • M_n is the moving average of the n days before t;
  • C_t is the closing stock price on day t;
  • T is a percentage threshold (e.g., 0.5%).
In this study we use n = 5, which means the one-day-ahead closing stock price is compared to the average of the closing stock prices of the previous five days. Four threshold values are used in this study: 0.5%, 1.0%, 1.5%, and 2.0%. Table 2 summarizes the number of data points in each category for the different threshold levels.
From Table 2, it is observed that changes in the threshold value T cause the number of data points in the up, down, and neutral categories to change. An increase in the threshold value leads more data points to be classified as neutral relative to the other two categories. This pattern can be observed for all three companies in the study. However, the distribution of data points across the three categories differs for each company because of variability in their historical stock prices. For example, a threshold of 2% places over 1000 trading days of data in the neutral category for Apple, whereas the same threshold for Tesla, which exhibits higher volatility, yields only 552 trading days in the neutral category. The last column of the table provides an imbalance ratio, defined as the maximum number of data points in a category divided by the minimum. Higher imbalance ratio values indicate an uneven spread of data points across the three categories, whereas lower values indicate an even spread. For example, Tesla with T = 0.5% has an imbalance ratio of 5.99, indicating that the “up” category with 862 data points has approximately six times more trading days than the “neutral” category, which has only 144. Using the imbalance ratio, we chose threshold values such that each company had an approximately even spread of data points across the three categories. The threshold values used for Apple, NVIDIA, and Tesla, based on the minimum imbalance ratio, are 1%, 2%, and 2%, respectively.
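A minimal sketch of this labeling rule and the imbalance ratio is shown below, under one reading of the definition above (the moving average M_n is taken over the n closes preceding day t); the function and column names are illustrative.

```python
# Minimal sketch of the labeling rule and imbalance ratio, assuming a pandas
# Series of daily closing prices; M_n is read as the average of the n closes
# preceding day t, and T is the percentage threshold.
import pandas as pd

def label_direction(close: pd.Series, n: int = 5, threshold: float = 1.0) -> pd.Series:
    m_n = close.shift(1).rolling(n).mean()     # average of the n closes before day t
    pct_diff = 100 * (close - m_n) / m_n       # percentage difference versus M_n
    labels = pd.Series("neutral", index=close.index, dtype="object")
    labels[pct_diff > threshold] = "up"
    labels[pct_diff < -threshold] = "down"
    return labels[m_n.notna()]                 # drop warm-up rows with no moving average

def imbalance_ratio(labels: pd.Series) -> float:
    counts = labels.value_counts()
    return counts.max() / counts.min()         # maximum class count / minimum class count
```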

5. Model Development

To evaluate the effectiveness of the classification models, this study benchmarked nine widely used machine learning algorithms. A key challenge in applying these models lies in hyperparameter tuning, which plays a critical role in achieving reliable and accurate classifications. The results obtained in this research not only highlight the comparative performance of these models but also provide insights into the impact of systematic hyperparameter optimization. The nine models were selected to represent diverse learning paradigms, including tree-based methods (Random Forest, Decision Tree, Gradient Boosting, XGBoost), ensemble techniques (AdaBoost), distance-based and probabilistic approaches (K-Nearest Neighbors, Naïve Bayes), and linear classifiers (Logistic Regression, Support Vector Machine).

5.1. Hyperparameter Tuning Methods and Settings

To ensure both validity and predictive performance, this study employed a systematic hyperparameter tuning strategy across nine classification algorithms. Grid Search Cross-Validation (GridSearchCV) with 3-fold stratified cross-validation was applied, using a predefined hyperparameter grid informed by established best practices in the literature and domain expertise. Table 3 summarizes the tuning configurations, reflecting widely accepted standards in the financial prediction domain [18,19].
Appropriate hyperparameter tuning enhances predictive efficacy by mitigating the risks of overfitting or underfitting while balancing model complexity with computational cost (grid size). In the GridSearchCV approach, all parameter combinations within each grid were exhaustively evaluated using accuracy as the primary scoring metric. For models sensitive to input scaling—such as SVM, Logistic Regression, and KNN—features were standardized with StandardScaler. The tuning and evaluation process was conducted separately for each company-specific dataset, employing a fixed train–test split (samples 1–1200 for training and 1201–1732 for testing). The best estimator for each model was selected based on cross-validation accuracy, and final performance was assessed on the test set using accuracy, precision, recall, and F1-score. This systematic and reproducible framework ensures a fair comparison of model performance while effectively addressing the trade-offs between overfitting and variance in financial time series prediction.
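The following sketch reproduces the described tuning setup for one model (SVM) in scikit-learn, assuming a feature matrix X (one row per trading day) and a label vector y; the grid matches Table 3, and the other models follow the same pattern.

```python
# Sketch of the described tuning setup for the SVM model, assuming a feature
# matrix X (numpy array, one row per trading day) and label vector y.
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score, precision_recall_fscore_support

# Fixed chronological split: first 1200 samples for training, the rest for testing.
X_train, X_test = X[:1200], X[1200:]
y_train, y_test = y[:1200], y[1200:]

pipeline = Pipeline([
    ("scaler", StandardScaler()),   # scaling matters for SVM, Logistic Regression, KNN
    ("clf", SVC()),
])
param_grid = {
    "clf__C": [0.1, 1, 10, 100],
    "clf__kernel": ["linear", "rbf", "poly"],
    "clf__gamma": ["scale", "auto", 0.001, 0.01],
}

search = GridSearchCV(pipeline, param_grid,
                      cv=StratifiedKFold(n_splits=3), scoring="accuracy")
search.fit(X_train, y_train)

y_pred = search.best_estimator_.predict(X_test)
precision, recall, f1, _ = precision_recall_fscore_support(
    y_test, y_pred, average="weighted", zero_division=0)
print(search.best_params_, accuracy_score(y_test, y_pred), precision, recall, f1)
```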

5.2. Prediction Results

Table 4 shows the efficacy of the prediction models for each company, based on performance metrics including accuracy, precision, recall, F1-score, and training time.
The results summarized in Table 4 show the prediction performance for the different stocks using the various methods applied in this study. The best method for each company is highlighted in gray. For the prediction of Apple’s stock, the SVM model has the highest accuracy, recall, and F1-score, making it the best model on most metrics. Gradient Boosting shows high precision but comes with higher computational cost, highlighting a trade-off between predictive accuracy and computational efficiency. For NVIDIA’s stock prediction, the XGBoost method shows the best overall performance across the four main metrics of accuracy, precision, recall, and F1-score. For Tesla’s stock as well, the XGBoost model provides the best results for accuracy, precision, recall, and F1-score.
Overall, these results indicate that models such as XGBoost and SVM offer higher predictive power in terms of accuracy and reliability. For SVM, this reflects its effectiveness in finding clear decision boundaries in complex data, combined with strong generalization ability and balanced classification metrics. Gradient Boosting achieves high precision by building many trees sequentially, although this results in longer training times. Other models, such as Logistic Regression or Decision Trees, may be preferable to the ensemble methods when faster training time or lower computational cost is a priority.
The model training results presented in Figure 3 assess the reliability of cross-validation (CV) tuning by comparing the CV scores against out-of-sample test accuracy for each model across the three companies.
The strong alignment between validation and test performance for Tesla (r = 0.904) suggests that its models are both robust and generalizable. Apple showed a moderate positive correlation (r = 0.668), with most models achieving accuracies in the 0.65–0.75 range. In contrast, NVIDIA exhibited a weaker correlation (r = 0.364), indicating that higher cross-validation scores did not consistently translate into better generalization on test data. This inconsistency is likely attributable to greater volatility and non-stationary behavior in NVIDIA’s stock.
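As a check, the Tesla correlation can be reproduced directly from the CV scores and test accuracies listed in Table 4:

```python
# Reproducing the Tesla validation-vs-test agreement from the Table 4 values.
import numpy as np

cv_scores = [0.6180, 0.5171, 0.4920, 0.4896, 0.5355, 0.3336, 0.5188, 0.5488, 0.5296]
test_acc  = [0.6147, 0.5357, 0.5301, 0.5771, 0.5940, 0.3346, 0.6109, 0.6278, 0.6316]

r = np.corrcoef(cv_scores, test_acc)[0, 1]
print(round(r, 3))  # ~0.904, the correlation reported for Tesla
```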
Additionally, the trade-off between test precision and recall was examined across all models, as illustrated in Figure 4.
Among the three companies, Tesla demonstrated the most consistent performance, with a near-perfect linear correlation (r = 0.991) indicating balanced classification outcomes across models and minimal trade-offs between false positives and false negatives. Apple also achieved strong balance (r = 0.987), whereas NVIDIA’s results were more scattered (r = 0.523), reflecting inconsistent behavior likely driven by its higher volatility. Across all models, Naïve Bayes consistently underperformed, yielding high recall but very low precision—suggesting an excessive rate of false positives. These findings underscore the importance of evaluating multiple performance metrics such as precision, recall, and F1-score, rather than relying on accuracy alone. This is particularly critical for noisy financial data, where considering a single metric may result in a misleading judgment about model performance.

6. Model Assessment and Discussions

This section presents a comprehensive evaluation of model performance in classifying stock movements as up, down, or neutral. The analysis emphasizes comparisons across key technical indicator-based features, the three companies under study, and the nine machine learning models applied.

6.1. Feature Importance

Figure 5 highlights the top eight features that play the most significant role in driving predictive performance across Apple, Tesla, and NVIDIA.
Figure 5 indicates that momentum-based indicators consistently dominate predictive performance. Among them, the fastK (X17) stochastic oscillator emerged as the single most influential feature across all three companies, with average importance scores of 0.896, 0.920, and 0.928 for Apple, Tesla, and NVIDIA, respectively. Other key contributors—such as CCI (X11), pctB (X32), and slowD (X19)—further highlight the central role of stochastic and momentum indicators in capturing directional shifts under varying market conditions. While Apple and Tesla display nearly identical feature hierarchies, NVIDIA exhibits greater sensitivity to volatility, as reflected in the higher importance of price range–based indicators.
These findings align with the prior literature, which frequently categorizes technical indicators into market-based groups to enhance interpretability [20,21]. For example, stochastic oscillators such as fastK (X17) and fastD (X18) are effective in detecting momentum shifts and potential reversal points, whereas Bollinger Bands and support/resistance levels capture volatility patterns and psychological boundaries in pricing behavior.
Figure 6 shows the comprehensive result for feature importance analysis considering the combined categories.
It is observed from Figure 6 that momentum indicators have the highest average importance score (~0.049), followed by volatility indicators (~0.017) and trend indicators (~0.007) for the three stocks under study. This suggests that short-term momentum signals are far more influential than long-term trend or volatility measures for predicting stock movement direction. For all three companies (Apple, NVIDIA, and Tesla), momentum indicators show the highest average importance, with Apple slightly leading. Trend indicators have the lowest importance scores, with Apple marginally higher than the other two companies. For volatility indicators, Tesla shows higher importance than Apple and NVIDIA; while momentum indicators dominate across all companies, Tesla relies more on volatility features than the others. According to the pie chart, over half of the most important features are momentum-based metrics, consistent with the results in the first panel. Trend and volatility play a smaller but still notable role in the predictions. The model preferences by category (stacked bar chart at the bottom right) also show that different models prioritize categories differently; XGBoost generally assigns more weight to all categories than Decision Tree, especially volatility.
Overall, momentum indicators are the most critical feature category for stock movement prediction across all companies and models. While Tesla’s models rely more on volatility indicators, the Apple and NVIDIA models rely more heavily on momentum indicators.
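A hedged sketch of how such category-level scores can be aggregated is given below, assuming a fitted tree-based model exposing feature_importances_ and a hypothetical mapping from feature names to the three indicator categories.

```python
# Hedged sketch: aggregate tree-based feature importances by indicator category.
# `model` is any fitted estimator exposing feature_importances_ (e.g., XGBoost,
# Random Forest); `categories` is a hypothetical mapping such as
# {"x17_fastK": "momentum", "x01_sma": "trend", "x32_pctB": "volatility", ...}.
import pandas as pd

def category_importance(model, feature_names, categories) -> pd.Series:
    imp = pd.Series(model.feature_importances_, index=feature_names)
    return imp.groupby(categories).mean().sort_values(ascending=False)
```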

6.2. Sensitivity for Down, Neutral, and Up Categories

The sensitivity metric, also called recall or true positive rate, is calculated directly from the confusion matrix. In a multi-class prediction problem, sensitivity can be computed for each class, showing how well the model identifies that class. Table 5 shows the sensitivity values for the down, neutral, and up categories.
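Per-class sensitivity can be read off the confusion matrix as the diagonal entry divided by the corresponding row sum; a minimal sketch, assuming label vectors y_true and y_pred:

```python
# Per-class sensitivity (recall) from the confusion matrix: diagonal entry
# divided by the corresponding row sum, for each of the three categories.
from sklearn.metrics import confusion_matrix

def class_sensitivity(y_true, y_pred, labels=("down", "neutral", "up")) -> dict:
    cm = confusion_matrix(y_true, y_pred, labels=list(labels))
    return {label: cm[i, i] / cm[i].sum() for i, label in enumerate(labels)}
```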
It is observed from Table 5 that the machine learning models struggle to deliver high prediction performance across all three categories. In general, simple models such as Naïve Bayes show extreme patterns, with either perfect prediction on one class or total collapse on the others, indicating a lack of robustness. Ensemble methods such as Random Forest, Gradient Boosting, and XGBoost generally perform better but still show inferior performance in one or two categories. Regularized models such as Logistic Regression and SVM also deliver only conservative performance in one or two categories. This highlights the difficulty of learning discriminative boundaries for the three categories “neutral”, “up”, and “down”.
For Apple, Naïve Bayes predicts almost all cases as “neutral”, suggesting that the feature-independence assumption does not hold and that the model is a poor choice for capturing stock market patterns. Random Forest provides balanced performance across classes and shows robustness due to averaging over multiple trees, though performance for the “up” category remains weaker. KNN provides reasonably balanced prediction performance across the three categories, suggesting that the local neighborhood structure works well for Apple’s stock, although sensitivity to the distance metric may cap performance. Gradient Boosting shows strong prediction performance for “neutral” but weak performance for the “up” category, suggesting it captures some non-linear trends better than Logistic Regression.
For NVIDIA, SVM achieves the highest sensitivity when classifying the “down” category, but its performance declines sharply for “neutral” and “up”. This indicates that the decision boundary aligns well with one class but fails to generalize across the others. KNN also performs strongly on “down” but collapses on the remaining categories, likely due to overfitting to local data clusters and limited generalization capability. In contrast, XGBoost delivers more balanced performance across all three categories, suggesting that it captures complex non-linear relationships effectively and highlighting why boosting ensembles are particularly well-suited for structured tabular data.
Similarly, for Tesla, Naïve Bayes has a high sensitivity value for the “down” category; however, it struggles with the “neutral” and “up” categories, again highlighting its poor assumption fit for market data. Random Forest, SVM, and XGBoost show consistently balanced performance across classes, indicating that they capture more generalizable structure. Logistic Regression has strong prediction performance for “up” but is weak for the “neutral” category, suggesting that regularization helps but linearity limits neutral separation. AdaBoost shows balanced prediction performance across all three categories, suggesting that adaptive reweighting helped capture minority classes.
Overall, ensemble methods such as Random Forest, Gradient Boosting, XGBoost, and AdaBoost performed well as they can handle non-linearity, reduce variance by combining learners, and are more robust to imbalance when tuned. KNN performed well in some cases as it captures local structure when classes are well-clustered. On the other hand, Naive Bayes performed poorly due to unrealistic independence assumption for stock features as technical indicators can be highly correlated. Lack of good performance from Decision Tree can be attributed to overfitting to noise and poor generalization. When comparing different machine learning methods across “down”, “neutral” and “up” categories, XGBoost provides the highest overall sensitivity average of about 0.593.

6.3. Understanding Individual Predictions

One of the long-standing challenges of machine learning models is their “black-box” nature, which makes it difficult to understand how predictions are generated. In recent years, however, significant efforts have been made to develop techniques that provide interpretability and explainability for these models. Figure 7 illustrates one such approach, which focuses on explaining individual predictions to enhance transparency and trust in model outcomes.
Figure 7 illustrates cases 1201 and 1202 to provide deeper insight into individual predictions. Each prediction shows the case number, the label indicating the predicted category, the probability of that category, and the explanation fit, which indicates how well the model explains this specific prediction. It also includes a plot of the five most influential features that explain the increase or decrease in the prediction probability for each case.
For case 1201, the model predicts the “neutral” category with a probability of 0.60. The explanation fit for this prediction is 4.46%, the highest for this case when predicting the “neutral” category. The strongest contributor to this outcome is feature X02 (exponential moving average) being greater than 188. In addition, features X12 and X28 increase the prediction probability, whereas features X10 and X35 decrease it; however, these four features make smaller contributions to the prediction. The alternative categories, “down” and “up”, receive much lower probabilities of 0.21 and 0.20, respectively.
For case 1202, the model predicts “neutral” with a probability of 0.55. Here, the most influential factor is feature X11 (CCI) falling between −90.2 and −74. However, unlike case 1201, there is no single dominant feature in this case. Instead, several features—X32, X25, X17, and X20—jointly contribute to supporting the “neutral” prediction. The alternative categories, “down” and “up”, receive much lower probabilities of 0.25 and 0.20, respectively.
This type of analysis enhances interpretability by showing how specific features influence model outcomes, allowing users to gain valuable insights into the reasoning behind a machine learning prediction.
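The layout of Figure 7 (predicted label, probability, explanation fit, and the five most influential features) resembles LIME-style local explanations. The sketch below is an illustrative Python analogue, assuming a fitted classifier clf with predict_proba, the training feature matrix X_train, the list feature_names for the 36 indicators, and a single test row; it is not necessarily the exact tool used in the study.

```python
# Illustrative LIME-style explanation of one test case, assuming a fitted
# scikit-learn classifier `clf`, training features X_train (numpy array),
# feature_names for the 36 indicators, and a single test row X_test[0].
# This is an analogue of the kind of explanation shown in Figure 7.
from lime.lime_tabular import LimeTabularExplainer

explainer = LimeTabularExplainer(
    X_train,
    feature_names=feature_names,
    class_names=["down", "neutral", "up"],
    mode="classification",
)
explanation = explainer.explain_instance(
    X_test[0],             # the individual case to explain (e.g., case 1201)
    clf.predict_proba,     # probability function of the trained model
    num_features=5,        # report the five most influential features
)
print(explanation.as_list())  # (feature condition, weight) pairs for the explained class
```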

7. Conclusions, Limitations, and Future Research

This study evaluated nine machine learning models for one-day-ahead stock price prediction of Apple, Tesla, and NVIDIA using technical indicator-based features. Results show that momentum indicators consistently dominate predictive performance, while SVM excelled for Apple and XGBoost performed best for NVIDIA and Tesla. Naïve Bayes underperformed across all datasets. Category-level sensitivity analysis revealed that models often struggled to balance performance across “up,” “down,” and “neutral” classes, underscoring the difficulty of achieving robust classification in volatile financial markets. This study offers several actionable recommendations for investors and traders:
  • Focus on momentum indicators: Stochastic oscillators, CCI, and Bollinger Band–based features consistently emerged as dominant predictors. Traders should prioritize these metrics to identify short-term shifts and refine entry/exit timing.
  • Adopt a stock-specific modeling approach: No single model excelled across all equities. SVM performed best for Apple, while XGBoost outperformed for NVIDIA and Tesla. Investors should tailor algorithms to the characteristics of each stock rather than rely on a one-size-fits-all approach.
  • Leverage explainable AI for trust and adoption: Interpretability tools clarify which indicators drive predictions, helping traders validate signals and enabling institutional investors to justify model-based decisions to stakeholders.
  • Integrate risk management with model outputs: Since model sensitivity varies across “up,” “down,” and “neutral” categories, traders can minimize exposure by acting only in categories where prediction reliability is high and avoiding trades under uncertain conditions.
By aligning these insights with trading practices, investors can better balance predictive accuracy with transparency, ultimately improving strategy robustness in volatile financial markets.
However, this study also has certain limitations. The analysis was confined to three technology firms, limiting generalizability, and focused exclusively on technical indicators without incorporating sentiment or macroeconomic factors. Fixed classification thresholds and reliance on a single train–test split may not fully capture market dynamics and limit robustness. Another limitation is the chosen data range of 1 January 2018 to 30 June 2025, which includes events such as the COVID-19 pandemic and chip shortages and may therefore affect generalization. Future work should expand to other sectors, integrate sentiment and fundamental data, and explore advanced deep learning or hybrid models. Rolling-window validation and dynamic thresholding could further improve adaptability, while broader use of explainability tools may enhance trust and usability in financial decision-making.

Author Contributions

Formal analysis, L.S.; Writing—original draft, B.R. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

The data used in this study are openly available from Yahoo Finance.

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:
AdaBoost – Adaptive Boosting
ADX – Average Directional Movement Index
BBands – Bollinger Bands
CCI – Commodity Channel Index
CEFLANN – Computationally Efficient Functional Link Artificial Neural Network
CMO – Chande Momentum Oscillator
CV – Cross-Validation
DVI – David Varadi Intermediate Oscillator
EMH – Efficient Market Hypothesis
EPCNN – Evolved Partially Connected Neural Networks
GB – Gradient Boosting
GridSearchCV – Grid Search Cross-Validation
KNN – K-Nearest Neighbor
KST – Know Sure Thing
MA – Moving Average
MACD – Moving Average Convergence/Divergence
MFI – Money Flow Index
OBV – On Balance Volume
Pbands – Volatility bands around prices
RF – Random Forest
ROC – Rate of Change/Momentum
RSI – Relative Strength Index
SVM – Support Vector Machine
TDI – Trend Detection Index
TRIX – Triple Smoothed Exponential Oscillator
VHF – Vertical Horizontal Filter
XGBoost – Extreme Gradient Boosting

References

  1. Fama, E.F. The Behavior of Stock-Market Prices. J. Bus. 1965, 38, 34–105. [Google Scholar] [CrossRef]
  2. Fama, E.F. Random walks in stock market prices. Financ. Anal. J. 1995, 51, 75–80. [Google Scholar] [CrossRef]
  3. Majhi, R.; Panda, G.; Sahoo, G. Development and performance evaluation of FLANN based model for forecasting of stock markets. Expert Syst. Appl. 2008, 36, 6800–6808. [Google Scholar] [CrossRef]
  4. Cheng, C.H.; Chen, T.L.; Wei, L.Y. A hybrid model based on rough sets theory and genetic algorithms for stock price forecasting. Inf. Sci. 2010, 180, 1610–1629. [Google Scholar] [CrossRef]
  5. Chourmouziadis, K.; Chatzoglou, P.D. An intelligent short term stock trading fuzzy system for assisting investors in portfolio management. Expert Syst. Appl. 2016, 43, 298–311. [Google Scholar] [CrossRef]
  6. da Costa, T.R.C.C.; Nazário, R.T.; Bergo, G.S.Z.; Sobreiro, V.A.; Kimura, H. Trading System based on the use of technical analysis: A computational experiment. J. Behav. Exp. Financ. 2015, 6, 42–55. [Google Scholar] [CrossRef]
  7. Zhang, J.; Cui, S.; Xu, Y.; Li, Q.; Li, T. A novel data-driven stock price trend prediction system. Expert Syst. Appl. 2018, 97, 60–69. [Google Scholar] [CrossRef]
  8. Northcott, A. The complete guide to using candlestick charting; how to earn high rates of return—Safely. Atl. Publ. 2009, 24, 288. [Google Scholar]
  9. Cowles, A. Can stock market forecasters forecast? Econometrica 1933, 1, 309–324. [Google Scholar] [CrossRef]
  10. Menkhoff, L.; Taylor, M.P. The Obstinate Passion of Foreign Exchange Professionals: Technical Analysis. J. Econ. Lit. 2007, 45, 936–972. [Google Scholar] [CrossRef]
  11. Zhu, Y.; Zhou, G. Technical analysis: An asset allocation perspective on the use of moving averages. J. Financ. Econ. 2009, 92, 519–544. [Google Scholar] [CrossRef]
  12. Gorgulho, A.; Neves, R.; Horta, N. Applying a GA kernel on optimizing technical analysis rules for stock picking and portfolio composition. Expert Syst. Appl. 2011, 38, 14072–14085. [Google Scholar] [CrossRef]
  13. Chang, P.C.; Wang, D.D.; Zhou, C.L. A novel model by evolving partially connected neural network for stock price trend forecasting. Expert Syst. Appl. 2012, 39, 611–620. [Google Scholar] [CrossRef]
  14. Ballings, M.; Poel, D.V.; Hespeels, N.; Gryp, R. Evaluating multiple classifiers for stock price direction prediction. Expert Syst. Appl. 2015, 42, 7046–7056. [Google Scholar] [CrossRef]
  15. Patel, J.; Shah, S.; Thakkar, P.; Kotecha, K. Predicting stock and stock price index movement using trend deterministic data preparation and machine learning techniques. Expert Syst. Appl. 2015, 42, 259–268. [Google Scholar] [CrossRef]
  16. Dash, R.; Dash, P.K. A hybrid stock trading framework integrating technical analysis with machine learning techniques. J. Financ. Data Sci. 2016, 2, 42–57. [Google Scholar] [CrossRef]
  17. Ayalon, Y. Technical Analysis Indicators 101: A Practical Guide to Technical Analysis Indicators; Independently Published: Chicago, IL, USA, 2025; ISBN-13: 979-8292402558. [Google Scholar]
  18. Probst, P.; Wright, M.N.; Boulesteix, A.L. Hyperparameters and tuning strategies for random forest. Wiley Interdiscip. Rev. Data Min. Knowl. Discov. 2019, 9, e1301. [Google Scholar] [CrossRef]
  19. Putatunda, S.; Rama, K. A Comparative Analysis of Hyperopt as Against Other Approaches for Hyper-Parameter Optimization of XGBoost. In Proceedings of the 2018 International Conference on Signal Processing and Machine Learning (SPML ‘18), Shanghai, China, 28–30 November 2018; Association for Computing Machinery: New York, NY, USA, 2018; pp. 6–10. [Google Scholar] [CrossRef]
  20. Mostafavi, S.M.; Hooman, A.R. Key technical indicators for stock market prediction. Mach. Learn. Appl. 2025, 20, 100631. [Google Scholar] [CrossRef]
  21. Saud, A.S.; Shakya, S. Technical indicator empowered intelligent strategies to predict stock trading signals. J. Open Innov. Technol. Mark. Complex. 2024, 10, 100398. [Google Scholar] [CrossRef]
Figure 1. Three phases for the proposed study.
Figure 2. Chart depicting stock price for Apple with blue line indicating 20-day moving averages.
Figure 3. Cross-validation (CV) scores obtained during tuning compared with the accuracy achieved on the test data.
Figure 4. Trade-off between test precision and recall across all classification models.
Figure 5. Top eight important features comparison across three companies.
Figure 6. Category-based feature importance analysis: (a) Feature category performance analysis. (b) Performance by Company. (c) Distribution of feature category performance. (d) Model and feature categories.
Figure 7. Explaining individual predictions.
Table 1. Thirty-six technical indicator-based features capturing trend, momentum, and volatility.
Trend Indicators
  • Simple moving average (x01)
  • Exponential moving average (x02)
  • ADX (x03)
  • Aroon (x04)
  • TDI (x05, x06)
  • Donchian Channel (x07, x08, x09)
  • VHF (x10)
Momentum Indicators
  • CCI (x11)
  • MFI (x12)
  • OBV (x13)
  • MACD (x14, x15)
  • RSI (x16)
  • Stochastics (x17, x18, x19)
  • CMO (x20)
  • KST (x21, x22)
  • TRIX (x23, x24)
  • ROC (x25)
  • DVI (x26, x27, x28)
  • Momentum (x29)
Volatility Indicators
  • BBands: Down (x30), Up (x31), % (x32)
  • Volatility (x33)
  • Pbands: Down (x34), Center (x35), Up (x36)
Table 2. Number of data points in different categories with threshold levels 0.5%, 1.0%, 1.5%, and 2.0%.
Company Name    Threshold % (T)    Up     Down    Neutral    Imbalance Ratio
Apple           0.5                863    603     266        3.24
Apple           1.0                708    491     533        1.44
Apple           1.5                558    396     778        1.96
Apple           2.0                401    305     1026       3.36
NVIDIA          0.5                907    640     185        4.90
NVIDIA          1.0                809    569     354        2.29
NVIDIA          1.5                735    509     488        1.51
NVIDIA          2.0                654    451     627        1.45
Tesla           0.5                862    726     144        5.99
Tesla           1.0                769    663     300        2.56
Tesla           1.5                703    605     424        1.66
Tesla           2.0                640    540     552        1.19
Table 3. The grid setup for the hyperparameter tuning for each predictive model.
Model: Random Forest
  n_estimators: [50, 100, 200]
  max_depth: [10, 20, None]
  min_samples_split: [2, 5, 10]
  min_samples_leaf: [1, 2, 4]
  Rationale: Controls ensemble size and tree complexity to balance bias-variance
Model: Gradient Boosting
  n_estimators: [50, 100, 200]
  learning_rate: [0.01, 0.1, 0.2]
  max_depth: [3, 5, 7]
  min_samples_split: [2, 5, 10]
  Rationale: Balances learning step size and tree complexity
Model: XGBoost
  n_estimators: [50, 100, 200]
  learning_rate: [0.01, 0.1, 0.2]
  max_depth: [3, 5, 7]
  min_child_weight: [1, 3, 5]
  Rationale: Controls overfitting via regularized boosting and leaf constraints
Model: AdaBoost
  n_estimators: [50, 100, 200]
  learning_rate: [0.01, 0.1, 0.5, 1.0]
  algorithm: [‘SAMME’, ‘SAMME.R’]
  Rationale: Tests different boosting strategies and step sizes
Model: SVM
  C: [0.1, 1, 10, 100]
  kernel: [‘linear’, ‘rbf’, ‘poly’]
  gamma: [‘scale’, ‘auto’, 0.001, 0.01]
  Rationale: Explores regularization and kernel-specific behavior
Model: Logistic Regression
  C: [0.01, 0.1, 1, 10, 100]
  penalty: [‘l1’, ‘l2’]
  solver: [‘liblinear’, ‘saga’]
  max_iter: [1000, 2000]
  Rationale: Tests different solvers and regularization schemes
Model: KNN
  n_neighbors: [3, 5, 7, 9, 11]
  weights: [‘uniform’, ‘distance’]
  metric: [‘euclidean’, ‘manhattan’, ‘minkowski’]
  Rationale: Optimizes distance metric and voting strategy
Model: Naive Bayes
  var_smoothing: [1 × 10−9, 8 × 10−3, 1 × 10−7, 1 × 10−6, 1 × 10−5]
  Rationale: Smooths probabilities to avoid numerical instability
Model: Decision Tree
  max_depth: [5, 10, 20, None]
  min_samples_split: [2, 5, 10, 20]
  min_samples_leaf: [1, 2, 5, 10]
  criterion: [‘gini’, ‘entropy’]
  Rationale: Balances depth and purity criteria to control overfitting
Table 4. Prediction results after hyperparameter tuning.
Company    Model                  CV Score    Test Accuracy    Test Precision    Test Recall    Test F1-Score    Training Time
Apple      AdaBoost               0.5938      0.6767           0.6633            0.6767         0.6551           4.3376
Apple      Decision Tree          0.5512      0.7011           0.6843            0.7011         0.6776           2.3036
Apple      Gradient Boosting      0.5253      0.7011           0.7203            0.7011         0.6531           249.9475
Apple      K-Nearest Neighbors    0.5188      0.6654           0.6660            0.6654         0.6628           0.2829
Apple      Logistic Regression    0.5997      0.6955           0.6832            0.6955         0.6822           3.9341
Apple      Naive Bayes            0.4988      0.6504           0.4230            0.6504         0.5126           0.0437
Apple      Random Forest          0.5136      0.6523           0.6622            0.6523         0.6500           38.0964
Apple      SVM                    0.5955      0.7274           0.7174            0.7274         0.7053           36.3812
Apple      XGBoost                0.5411      0.6861           0.6856            0.6861         0.6574           28.9349
NVIDIA     AdaBoost               0.5721      0.6071           0.6436            0.6071         0.6014           5.0047
NVIDIA     Decision Tree          0.4829      0.5677           0.5953            0.5677         0.5694           2.2485
NVIDIA     Gradient Boosting      0.4670      0.5432           0.6190            0.5432         0.5477           229.0147
NVIDIA     K-Nearest Neighbors    0.4679      0.3365           0.5669            0.3365         0.2692           0.2014
NVIDIA     Logistic Regression    0.5354      0.5695           0.5646            0.5695         0.5201           4.0674
NVIDIA     Naive Bayes            0.3995      0.4079           0.1664            0.4079         0.2364           0.0461
NVIDIA     Random Forest          0.4812      0.5188           0.5631            0.5188         0.5209           18.1426
NVIDIA     SVM                    0.5421      0.3929           0.5556            0.3929         0.3483           44.5505
NVIDIA     XGBoost                0.4829      0.6353           0.6574            0.6353         0.6399           30.0013
Tesla      AdaBoost               0.6180      0.6147           0.6186            0.6147         0.6160           4.1529
Tesla      Decision Tree          0.5171      0.5357           0.5429            0.5357         0.5384           2.1489
Tesla      Gradient Boosting      0.4920      0.5301           0.5306            0.5301         0.5303           221.2618
Tesla      K-Nearest Neighbors    0.4896      0.5771           0.5826            0.5771         0.5771           0.1880
Tesla      Logistic Regression    0.5355      0.5940           0.5773            0.5940         0.5758           3.9646
Tesla      Naive Bayes            0.3336      0.3346           0.2590            0.3346         0.2088           0.0340
Tesla      Random Forest          0.5188      0.6109           0.5996            0.6109         0.5984           17.8899
Tesla      SVM                    0.5488      0.6278           0.6222            0.6278         0.6224           36.4019
Tesla      XGBoost                0.5296      0.6316           0.6271            0.6316         0.6289           29.3081
Table 5. Technical indicators and sensitivity.
Company    Model                  Sensitivity (Down)    Sensitivity (Neutral)    Sensitivity (Up)
Apple      AdaBoost               0.345                 0.853                    0.353
Apple      Decision Tree          0.381                 0.882                    0.353
Apple      Gradient Boosting      0.405                 0.934                    0.157
Apple      K-Nearest Neighbors    0.571                 0.760                    0.422
Apple      Logistic Regression    0.500                 0.838                    0.373
Apple      Naive Bayes            0.000                 1.000                    0.000
Apple      Random Forest          0.631                 0.743                    0.363
Apple      SVM                    0.452                 0.902                    0.363
Apple      XGBoost                0.452                 0.873                    0.245
NVIDIA     AdaBoost               0.382                 0.581                    0.770
NVIDIA     Decision Tree          0.441                 0.564                    0.650
NVIDIA     Gradient Boosting      0.772                 0.352                    0.558
NVIDIA     K-Nearest Neighbors    0.949                 0.106                    0.143
NVIDIA     Logistic Regression    0.853                 0.140                    0.747
NVIDIA     Naive Bayes            0.000                 0.000                    1.000
NVIDIA     Random Forest          0.676                 0.497                    0.438
NVIDIA     SVM                    0.956                 0.168                    0.226
NVIDIA     XGBoost                0.551                 0.654                    0.673
Tesla      AdaBoost               0.690                 0.480                    0.670
Tesla      Decision Tree          0.532                 0.451                    0.617
Tesla      Gradient Boosting      0.632                 0.410                    0.548
Tesla      K-Nearest Neighbors    0.690                 0.486                    0.559
Tesla      Logistic Regression    0.684                 0.306                    0.777
Tesla      Naive Bayes            0.942                 0.000                    0.090
Tesla      Random Forest          0.737                 0.358                    0.729
Tesla      SVM                    0.719                 0.434                    0.723
Tesla      XGBoost                0.713                 0.468                    0.707
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
