1. Introduction
Google services have reshaped the use of technology in our daily lives, enhancing our communication, collaboration and access to information. The aim of this paper is to predict Google stock prices using different datasets and machine learning models, and to understand which model performs best.
More specifically, we have built three different datasets, considering the existing economic literature and the context in which Google (Alphabet) operates. We then applied different state-of-the-art machine learning models to these datasets, ranging from linear regression with Ridge regularization to Gradient Boosting and two types of neural networks, i.e., feedforward and recurrent ones.
The original contribution of this paper is that the models have been compared not only in terms of predictive accuracy, but also in terms of robustness and explainability, in line with the recently proposed S.A.F.E. AI model (Babaei et al., 2025) [1].
The application of the S.A.F.E. AI metrics to our data reveals that the choice of the best model for predicting Google stock prices depends on the desired objective. When prioritizing accuracy, the recurrent neural network emerges as the best model. If the main concern is robustness, the Ridge regression model is the most resilient to changes. Meanwhile, for explainability, the Gradient Boosting model is the best choice.
The rest of the paper is organized as follows. Section 2 introduces the considered models; Section 3 describes the datasets we have built for the analysis; Section 4 presents the empirical findings obtained by applying the models to the data and comparing them in terms of the S.A.F.E. AI metrics. Finally, Section 5 presents some concluding remarks.
2. Models
We considered a range of representative models to predict Google (Alphabet) stock prices: from statistical learning models, such as Ridge linear regression, to machine learning models, such as gradient boosting, and to deep learning models, such as feedforward and recurrent neural networks.
2.1. Ridge Linear Regression
The Ridge linear regression model used for Alphabet’s price prediction adds a penalty term λ to the standard linear regression model. In this way, the size of the regression coefficients is controlled so that they do not become too large. Moreover, regularization helps to reduce the variance of the model and to improve predictive accuracy; see, e.g., James et al. (2021) [2], Miller et al. (2022) [3] and Pereira et al. (2016) [4]. We choose λ = 1, a moderate penalty.
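For concreteness, a minimal sketch of how such a Ridge model could be fitted in Python is given below. The scikit-learn library, the placeholder feature matrix and the random train/test split are illustrative assumptions; only the penalty value, alpha = 1, corresponds to the λ = 1 used in the paper.

```python
# Minimal sketch of the Ridge regression step (scikit-learn).
# X and y are placeholders for the explanatory variables and the Alphabet
# closing price; alpha plays the role of the penalty lambda = 1.
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 10))                       # placeholder feature matrix
y = X @ rng.normal(size=10) + rng.normal(scale=0.1, size=500)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

model = Ridge(alpha=1.0)                             # moderate penalty, as in the paper
model.fit(X_train, y_train)
print("Test R^2:", model.score(X_test, y_test))
```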
2.2. Extreme Gradient Boosting
Extreme Gradient Boosting is an ensemble of decision trees, optimized for supervised learning tasks. For Alphabet’s price prediction, we implemented a version of the algorithm introduced by Chen and Guestrin (2016) [5]. The model employs Bayesian Optimization to search for optimal hyperparameters, namely the learning rate, the maximum depth, the size of the data samples and the number of sampled features for each tree. For more details, see also Tarwidi et al. (2023) [6].
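As an illustration of this tuning step, the hedged sketch below couples an XGBoost regressor with a Bayesian search over the four hyperparameters mentioned above. The use of scikit-optimize’s BayesSearchCV, the search ranges and the placeholder data are assumptions, since the paper does not specify the optimization library or the parameter bounds.

```python
# Hedged sketch: XGBoost with Bayesian hyperparameter search.
# The tuned hyperparameters follow the paper (learning rate, maximum depth,
# data sample size, sampled features per tree); library and ranges are assumed.
import numpy as np
from xgboost import XGBRegressor
from skopt import BayesSearchCV
from skopt.space import Integer, Real

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 10))                       # placeholder features
y = X[:, 0] * 2 + rng.normal(scale=0.1, size=500)    # placeholder target

search = BayesSearchCV(
    estimator=XGBRegressor(objective="reg:squarederror", n_estimators=300),
    search_spaces={
        "learning_rate": Real(0.01, 0.3, prior="log-uniform"),
        "max_depth": Integer(2, 8),
        "subsample": Real(0.5, 1.0),                 # size of the data sample per tree
        "colsample_bytree": Real(0.5, 1.0),          # sampled features per tree
    },
    n_iter=25,
    cv=3,
    random_state=0,
)
search.fit(X, y)
print(search.best_params_)
```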
2.3. Feedforward Neural Network
A neural network is a computational model able to process information similarly to the human brain, thanks to its large number of tightly interconnected processing elements. The main types of neural network architecture are Feedforward (FNN), Convolutional (CNN) and Recurrent Neural Networks (RNN); see, for example, Maind and Wankar (2014) [7]. In an FNN, the information flows in one direction, from the input layer, through the hidden layers, to the output layer, with no feedback loops. The model we consider in this paper is a sequential model consisting of two hidden layers, each with 50 neurons, and one neuron for the output.
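The sketch below shows one way this architecture could be written in Python with Keras: two hidden layers of 50 neurons and a single output neuron, as stated above. The activation functions, optimizer, loss and number of epochs are assumptions, since the paper does not report them.

```python
# Sketch of the feedforward architecture: two hidden layers of 50 neurons
# and one output neuron. Activations, optimizer and epochs are assumed.
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

n_features = 10                                      # placeholder number of inputs
model = keras.Sequential([
    keras.Input(shape=(n_features,)),
    layers.Dense(50, activation="relu"),             # first hidden layer
    layers.Dense(50, activation="relu"),             # second hidden layer
    layers.Dense(1),                                 # single output: next-day price
])
model.compile(optimizer="adam", loss="mse")

# Placeholder training data, only to make the sketch runnable end to end.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, n_features))
y = rng.normal(size=(500,))
model.fit(X, y, epochs=5, batch_size=32, verbose=0)
```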
2.4. Long Short-Term Memory Model
The Long Short-Term Memory (LSTM) model is a recurrent neural network (RNN) whose aim is to capture long-range dependencies in temporal sequences, such as time series data; see, e.g., Vasileiadis et al. (2024) [8] and Giudici et al. (2024) [1]. Van Houdt et al. (2020) [9] describe the simplest LSTM version, which consists of three key components representing memory blocks: a cell, an output gate and a forget gate. At each step, the model processes sequences of data, defined by a lookback parameter, keeping track of relevant information over long periods (see, e.g., Sherstinsky, 2020 [10]). In the model considered here, the lookback parameter has been set to 5 days, representing a week of working days. We consider an LSTM model with two layers, where each layer is composed of 50 neurons.
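A minimal sketch of this setup is given below: sequences of 5 trading days (the lookback window) feed two stacked LSTM layers of 50 units each, followed by a single output neuron. The windowing helper, optimizer, loss and number of epochs are our own illustrative assumptions.

```python
# Sketch of the LSTM described above; the 5-day lookback and the two
# 50-unit layers follow the paper, everything else is assumed.
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

LOOKBACK = 5                                         # one week of working days

def make_windows(series, lookback=LOOKBACK):
    """Turn a 1-D price series into (samples, lookback, 1) windows and targets."""
    X, y = [], []
    for t in range(lookback, len(series)):
        X.append(series[t - lookback:t])
        y.append(series[t])
    return np.array(X)[..., None], np.array(y)

model = keras.Sequential([
    keras.Input(shape=(LOOKBACK, 1)),
    layers.LSTM(50, return_sequences=True),          # first recurrent layer
    layers.LSTM(50),                                 # second recurrent layer
    layers.Dense(1),                                 # next-day closing price
])
model.compile(optimizer="adam", loss="mse")

# Placeholder price series, only to make the sketch self-contained.
prices = np.cumsum(np.random.default_rng(0).normal(size=600)) + 100
X, y = make_windows(prices)
model.fit(X, y, epochs=5, batch_size=32, verbose=0)
```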
2.5. The S.A.F.E. AI Model
Babaei et al. (2025) [1] introduced a model to assess, and then monitor and mitigate, the risks generated by artificial intelligence (AI) applications. The model measures the risks of AI by relating the probability of occurrence of a harm due to AI to its lack of sustainability, accuracy, fairness and explainability. Each of these four attributes is measured in terms of normalized metrics that extend the well-known Area Under the ROC Curve and the Gini coefficient from a binary response to a more general ordinal or continuous response, while keeping the simple interpretability of the binary coefficients, expressed in percentages.
Sustainability measures how resilient applications of artificial intelligence are to perturbations deriving from cyber-attacks or extreme events. To this aim, Babaei et al. (2025) have extended the notion of Lorenz Zonoid to measure the variation in model output induced on different population percentiles, leading to the Rank Graduation Robustness (RGR) metric.
The accuracy metrics directly extend the Area Under the ROC Curve (AUC) to all types of response variables, leading to the Rank Graduation Accuracy (RGA) metric, based on the notion of Lorenz Zonoid and on the related concordance curve. Such a measure allows the assessment of model accuracy to become more independent of the underlying data and models.
The fairness metrics are based on the idea of comparing the Gini inequality coefficient for a model, separately calculated in different population groups. This leads to a Rank Graduation Fairness (RGF) measure.
Finally, the explainability metrics allow us to interpret the impact of each explanatory variable in terms of its contribution to the predictive accuracy, by means of the Rank Graduation Explainability (RGE) metrics.
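To make these ideas concrete, the sketch below implements a normalized rank-concordance accuracy score in the spirit of RGA (equal to 1 when predicted and actual rankings coincide and 0 when they are reversed), together with an RGE-style contribution computed as the relative drop of that score when one explanatory variable is scrambled. This is an illustrative approximation under our own assumptions, not the exact formulas of Babaei et al. (2025); the function names and the permutation step are hypothetical.

```python
# Illustrative, NOT the exact S.A.F.E. AI formulas: a normalized
# rank-concordance score (RGA-like) and a scrambling-based contribution
# score (RGE-like) for a fitted regression model.
import numpy as np
from sklearn.linear_model import Ridge

def rank_concordance_score(y_true, y_pred):
    """Normalized concordance between actual values and predicted ranks (1 = best, 0 = worst)."""
    y_true = np.asarray(y_true, dtype=float)
    order = np.argsort(y_pred)                       # order actuals by predicted values
    c = np.cumsum(y_true[order])                     # concordance curve
    best = np.cumsum(np.sort(y_true))                # perfectly concordant ordering
    worst = np.cumsum(np.sort(y_true)[::-1])         # perfectly discordant ordering
    return (worst.sum() - c.sum()) / (worst.sum() - best.sum())

def variable_contribution(model, X_test, y_test, col, rng=np.random.default_rng(0)):
    """RGE-style score: relative loss of concordance when column `col` is scrambled."""
    base = rank_concordance_score(y_test, model.predict(X_test))
    X_perturbed = X_test.copy()
    X_perturbed[:, col] = rng.permutation(X_perturbed[:, col])
    perturbed = rank_concordance_score(y_test, model.predict(X_perturbed))
    return (base - perturbed) / base

# Tiny usage demo on placeholder data with the Ridge model of Section 2.1.
rng = np.random.default_rng(0)
X = rng.normal(size=(400, 5))
y = X[:, 0] * 3 + rng.normal(scale=0.1, size=400)
model = Ridge(alpha=1.0).fit(X[:320], y[:320])
print(rank_concordance_score(y[320:], model.predict(X[320:])))
print(variable_contribution(model, X[320:], y[320:], col=0))
```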
3. Data
Several variables can affect the stock price of Alphabet. We chose those that are most frequently used in the reference literature and most related to the operational context of Google. From Yahoo Finance, besides the closing price of Alphabet, which is our response variable, we then sourced the explanatory variables described in Table 1 below, for the period from June 2014 to May 2024. To avoid inconsistencies, explanatory variables that are functions of the daily price were lagged by one day. To build and compare the machine learning models, we randomly sampled 80% of the observations as the training set and used the remaining 20% as the test set.
In Figure 1, we plot the correlations of the variables described in Table 1 with the Alphabet closing price (Close).
From Figure 1, note that the target variable, Alphabet’s Close price, has a high correlation (0.98) with IXIC (Nasdaq Composite Index) and with TRXFLDUSP (Bloomberg U.S. Dollar Total Return Index), but also with the gold Close price (0.85), the difference between the daily high and low (0.69) and the crude oil Close price (0.56).
Figure 1 also underlines the strong collinearity among the explanatory variables. The presence of multicollinearity may reduce both the predictive accuracy and the interpretability of the models. To improve interpretability, in this paper we have subdivided the collected data into three datasets, which differ in the features they contain. The first dataset contains only the lags, from 1 to 5 days, of GOOG’s close price. The second dataset includes both financial market data related to the Google stock price, such as retGOOG and HlfiddGOOG, and variables describing the market context in which Alphabet operates, such as retGOLD, retCrudeOil, DXYNYB and VIX. The third dataset includes data from annual financial reports, such as retGOOGvsCostRD, VarAdjClosevsRDrev and AdjClosevsTTMNetEPS. All datasets also include the lagged stock price data. In this way, as we move from the first dataset to the third, we can assess the additional (structural) importance of groups of variables: stock market variables, context variables and report variables.
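A hedged sketch of the data construction for the first dataset is given below: lags 1 to 5 of the closing price as predictors, a one-day lag applied to a price-derived feature, and a random 80/20 train/test split. The yfinance package, the ticker string and the variable names are illustrative assumptions; the paper only states that the data come from Yahoo Finance.

```python
# Hedged sketch of the dataset 1 construction; package and names are assumed.
import pandas as pd
import yfinance as yf
from sklearn.model_selection import train_test_split

goog = yf.download("GOOG", start="2014-06-01", end="2024-05-31")

data = pd.DataFrame({"Close": goog["Close"].squeeze()})
for lag in range(1, 6):                              # lags from 1 to 5 trading days
    data[f"Close_lag{lag}"] = data["Close"].shift(lag)

# Example of a price-derived explanatory variable, lagged by one day to avoid
# using same-day information when predicting the close.
data["HighLowDiff_lag1"] = (goog["High"].squeeze() - goog["Low"].squeeze()).shift(1)

data = data.dropna()
X, y = data.drop(columns="Close"), data["Close"]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
```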
4. Results
In this section, we compare the performance of the proposed models on the three datasets. The results of the models, applied to all variables, are shown in Table 2.
Table 2 shows that the Long Short-Term Memory (LSTM) model is the most accurate, as it reaches the lowest MSE, RMSE, MAE and MAPE, as well as the highest R-squared, on the test set. It is, however, not the best-fitting model on the training set.
4.1. LSTM
We then applied the LSTM model separately to each of the three datasets. The results are reported in Table 3.
From Table 3, it is possible to note a decline in the performance of the LSTM when going from the first to the third dataset. We remark that the decrease in accuracy may be compensated by an improvement in explainability and robustness. To understand whether this is the case, we present the application of the S.A.F.E. AI metrics, based on the unifying concept of Lorenz inequality, for the LSTM model in Table 4.
From Table 4, it appears that the second dataset shows the highest total explainability. Connecting Table 3 with Table 4, this means that the high accuracy of the LSTM for the first dataset mainly derives from the lagged price variables, whereas the second dataset gives a more structural explanation. Finally, note that the LSTM model has a similar robustness across all datasets.
4.2. ANN
We then repeated the detailed analysis performed for the LSTM for the ANN model, the second best in performance, as shown in Table 2. We applied the ANN model separately to each of the three datasets. The results are reported in Table 5.
From Table 5, note that the ANN model performs best in predictive accuracy on the first dataset, similarly to the LSTM model. The worst performance occurs for the second dataset. We now consider the application of the S.A.F.E. AI metrics in Table 6.
From Table 6, note that all models have very low explainability and robustness, as in the ANN the lagged variables prevail. In the case of the second dataset, the metrics are equal to zero. A possible strategy to improve the performance of the ANN for this dataset is to reduce the number of lags, possibly to one. A possible exception is dataset 3, which provides about 10% explainability and is highly robust. The low RGR results underline the fact that, in a possibly evolving environment, the results of the ANN may change dramatically.
4.3. XG Boost
We now consider the XG Boost model in Table 7.
From Table 7, the XG Boost model shows, in line with the previous models, the best performance on the first dataset, followed by the second and the third. The features with the highest explainability in the second dataset were IXIC and DXYNYB (0.5581 and 0.0273), while in the third dataset they were IXIC, the crude oil Close price and retGOOG (0.1921, 0.0279 and 0.0199, respectively).
We now consider the application of the S.A.F.E. AI metrics in Table 8.
The S.A.F.E. AI metrics in Table 8 denote generally good accuracy but a large difference in terms of explainability. The second dataset is the most explainable, in line with the result from the LSTM model. The main issue with this model is its overall robustness: when the data are perturbed, the RGR values are small.
4.4. Ridge Linear Regression
We finally consider the Ridge regression model in Table 9.
Table 9 shows a result in line with the previous models: the model is more accurate for the first dataset, followed by the second and the third. The S.A.F.E. AI metrics are shown in Table 10.
The S.A.F.E. AI metrics in Table 10 denote good accuracy but low explainability. The second dataset is the most explainable, in line with the result from the previous model. The robustness is high, as we expect from this type of model.
5. Concluding Remarks
Our aim was to predict Alphabet’s stock price, taking into consideration the possible explanatory factors. To achieve this goal, we built three different datasets, considering autoregressive financial market data (dataset1), economic context data (dataset2) and company financial report data (dataset3).
We then compared four machine learning models for the three considered datasets. The models were compared not only in terms of predictive accuracy, but also in terms of robustness and explainability, in line with the recently proposed S.A.F.E. AI approach.
Our findings underscore the importance of careful model selection in light of the desired outcomes. If the primary goal is predictive accuracy, all of the models show acceptable RGA values, ranging from 0.9625 to 0.9955. If the focus is instead on robustness, the Ridge regression model seems to be the most suitable choice, together with LSTM. If explainability is prioritized over robustness, XG Boost will be the best choice.
The trade-off between all these aspects should guide the choice of the model, depending on the specific application and goals. Alternatively, the value of each metric can be mapped onto a probability, and the three resulting probabilities can be multiplied together. In the case of dataset 2, this led to a probability of about 0.38 for the LSTM, 0.19 for XG Boost and 0.33 for Ridge, indicating that the LSTM is the model with the highest probability of being safe in this case.
More generally, also taking the data comparison into account, if the focus is mainly on obtaining high predictive accuracy, a user should consider an LSTM model, possibly using the first dataset. If the focus is mainly on obtaining a model that is easier to explain and understand, a user should consider XG Boost, possibly using the third dataset. If the focus is on obtaining a model that is robust to data perturbations and outliers, a user should consider Ridge linear regression, for any of the three datasets.
From a financial management perspective, the paper indicates how stock prices can be predicted using several explanatory variables together with their lagged values.
Future research may consider hybrid models integrating LSTM, XG Boost and Ridge to jointly improve explainability, robustness and accuracy, and/or may add further variables, such as technical analysis indicators.
Additionally, the work can be extended to predict other stock prices, and, possibly, to use Google stock prices to predict other stock prices.
Author Contributions
Conceptualization, P.G.; methodology, P.G.; software, C.E.B.; validation, P.G.; formal analysis, C.E.B.; investigation, C.E.B.; resources, P.G.; data curation, C.E.B.; writing—original draft preparation, C.E.B.; writing—review and editing, C.E.B.; visualization, C.E.B.; supervision, P.G.; project administration, P.G.; no funding acquisition. All authors have read and agreed to the published version of the manuscript.
Funding
This research received no external funding.
Data Availability Statement
The raw data supporting the conclusions of this article will be made available by the authors on request.
Conflicts of Interest
The authors declare no conflict of interest.
References
- Babaei, G.; Giudici, P.; Raffinetti, E. A Rank graduation box for SAFE Artificial Intelligence. Expert Syst. Appl. 2025, 29, 125329. [Google Scholar] [CrossRef]
- James, G.; Witten, D.; Hastie, T.; Tibshirani, R. An Introduction to Statistical Learning with Applications in R, Second Edition. 2021. Available online: https://www.statlearning.com/ (accessed on 19 July 2024).
- Miller, A.; Panneerselvam, J.; Liu, L. A review of regression and classification techniques for analysis of common and rare variants and gene-environmental factors. Neurocomputing 2022, 489, 466–485. [Google Scholar] [CrossRef]
- Pereira, J.M.; Basto, M.; Ferreira da Silva, A. The Logistic Lasso and Ridge Regression in Predicting Corporate Failure. Procedia Econ. Financ. 2016, 39, 634–641. [Google Scholar] [CrossRef]
- Chen, T.; Guestrin, C. XGBoost: A Scalable Tree Boosting System; Association for Computing Machinery: New York, NY, USA, 2016; pp. 785–794. [Google Scholar] [CrossRef]
- Tarwidi, D.; Pudjaprasetya, S.R.; Adytia, D.; Apri, M. An optimized XGBoost-based machine learning method for predicting wave run-up on a sloping beach. MethodsX 2023, 10, 102119. [Google Scholar] [CrossRef] [PubMed]
- Maind, B.; Wankar, P. Basics of Artificial Neural Network. Int. J. Recent Innov. Trends Comput. Commun. 2014, 2, 96–100. [Google Scholar] [CrossRef]
- Vasileiadis, A.; Alexandrou, E.; Paschalidou, L. Artificial Neural Networks and Its Applications. arXiv 2024, arXiv:2110.09021. [Google Scholar] [CrossRef]
- Van Houdt, G.; Mosquera, C.; Nápoles, G. A Review on the Long Short-Term Memory Model. Artif. Intell. Rev. 2020, 53, 5929–5955. [Google Scholar] [CrossRef]
- Sherstinsky, A. Fundamentals of Recurrent Neural Network (RNN) and Long Short-Term Memory (LSTM) network. Phys. D Nonlinear Phenom. 2020, 404, 132306. [Google Scholar] [CrossRef]