4.1. Classification
The heatmap (Figure 4) shows the Pearson correlation coefficients between the variables, including ‘Stock Returns’ and ‘Israel Presence’, which are of particular interest in this analysis. These two variables serve as dependent variables in separate tasks: ‘Stock Returns’ in the regression problem and ‘Israel Presence’ in the classification problem. Neither ‘Stock Returns’ nor ‘Israel Presence’ shows a strong correlation (near 0.7 or −0.7) with any other variable, suggesting that predictive models for both will need to combine several factors rather than rely on a single predictor. In the regression model for ‘Stock Returns’, variables with higher absolute correlations might be weighted more heavily, while for the classification model predicting ‘Israel Presence’, variable selection may also depend on domain knowledge and model interpretability, not only on correlation values. The absence of strong linear relationships between the independent and dependent variables in both the regression and classification scenarios is a challenge for traditional modeling techniques, which often rely on such associations to make accurate predictions.
The logistic regression results in Table 4 suggest that none of the examined variables has a statistically significant association with the predicted outcome. The predictors, including the announcement date, donation presence, donation amount, number of articles, Text Sentiment, Title Sentiment, and type of statement, have odds ratios close to 1 or below 1 (indicating a reduced likelihood of the outcome), but their p-values, which range from 0.135 to 0.916, do not reach statistical significance. Furthermore, the confidence intervals are wide and include 1, so they are consistent with both positive and negative effects. The collective insignificance of these variables suggests that the current logistic regression model does not adequately identify the factors that influence the outcome, indicating a need for further analysis or different modeling approaches to discover impactful predictors.
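To make the odds-ratio reading concrete, the minimal sketch below converts a logistic regression coefficient β and its standard error into an odds ratio exp(β), a 95% Wald confidence interval exp(β ± 1.96·SE), and a two-sided p-value. The coefficient and standard error in the example are hypothetical, not values from Table 4; the point is that a small coefficient with a large standard error gives an interval that straddles 1 and a large p-value.

```python
import math

def odds_ratio_summary(beta: float, se: float, z_crit: float = 1.959964) -> dict:
    """Odds ratio, 95% Wald confidence interval, and two-sided p-value for a logit coefficient."""
    z = beta / se
    # Two-sided p-value under the standard normal: P(|Z| > |z|) = erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return {
        "odds_ratio": math.exp(beta),
        "ci_low": math.exp(beta - z_crit * se),
        "ci_high": math.exp(beta + z_crit * se),
        "p_value": p_value,
    }

# Hypothetical coefficient: small effect with a large standard error.
summary = odds_ratio_summary(beta=-0.15, se=0.40)
print(summary)
```

An interval that contains 1 means the data cannot rule out either a higher or a lower likelihood of the outcome.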
Building on this baseline, we evaluated alternative classification techniques on the same dataset: Decision Tree, Gradient Boosting, K-Nearest Neighbors, Naive Bayes, neural network, and Random Forest. Each was assessed on accuracy, sensitivity, specificity, ROC AUC, and F1 Score to provide a comprehensive comparison.
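For reference, all of these metrics except ROC AUC follow from a binary confusion matrix. The minimal sketch below assumes labels coded as 1 (positive, e.g. Israel presence) and 0 (negative); the example labels are made up for illustration.

```python
def classification_metrics(y_true, y_pred):
    """Accuracy, sensitivity (recall), specificity, and F1 for binary labels (1 = positive)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    sensitivity = tp / (tp + fn) if tp + fn else 0.0   # share of positives found
    specificity = tn / (tn + fp) if tn + fp else 0.0   # share of negatives correctly rejected
    precision = tp / (tp + fp) if tp + fp else 0.0
    f1 = (2 * precision * sensitivity / (precision + sensitivity)
          if precision + sensitivity else 0.0)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "sensitivity": sensitivity,
        "specificity": specificity,
        "f1": f1,
    }

y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0]
metrics = classification_metrics(y_true, y_pred)
print(metrics)
```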
The results (Table 5) reveal a spectrum of performance across these models. The Random Forest model was the most effective, with the highest accuracy (0.871795), sensitivity (0.95), and F1 Score (0.857143), indicating that it captures the structure of the dataset well. The Gradient Boosting and neural network models also showed strong predictive power, each with distinct advantages in sensitivity and overall accuracy. In contrast, Naive Bayes and K-Nearest Neighbors performed worse on these metrics, highlighting their relative limitations in this context.
This comparative analysis underscores the varied strengths and weaknesses of different classification approaches for binary outcome prediction. While logistic regression provided an interpretable baseline for assessing individual predictors, the alternative models achieved higher accuracy and predictive performance. These findings support a multifaceted approach to classification problems, in which initial insights from logistic regression inform more nuanced analyses using a range of machine learning techniques.
The ROC (Receiver Operating Characteristic) curves in Figure 5 for the classification models applied to actions, news discussions, and donations in the Israel–Palestine scenario show how well each model distinguishes between the two classes. The Gradient Boosting model stands out with the highest AUC (Area Under the Curve) of 0.90, indicating that across thresholds it achieves a high true positive rate (TPR) on the positive cases (related to actions, news discussion, and donations) while keeping the false positive rate (FPR) low. This suggests that Gradient Boosting is particularly effective for this complex problem, likely because it captures non-linear patterns by combining many Decision Trees, each fitted to correct the errors of the previous ones.
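AUC also has a probabilistic reading: it is the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case. The brute-force sketch below computes it that way (the Mann-Whitney formulation), counting ties as one half; the labels and scores are illustrative only. For larger datasets, a library routine such as scikit-learn's `roc_auc_score` is the practical choice.

```python
def roc_auc(y_true, scores):
    """ROC AUC as P(score of random positive > score of random negative), ties counted as 1/2."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:  # O(|pos| * |neg|): fine for small evaluation sets
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

auc = roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
print(auc)  # 0.75: three of the four positive/negative pairs are ranked correctly
```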
Following Gradient Boosting, the neural network, with an AUC of 0.85, shows good discriminative ability, suggesting that its architecture captures the underlying patterns in the data. The Decision Tree and Support Vector Machine (SVM) models, both with an AUC of 0.84, also perform solidly. The Decision Tree’s stepped ROC curve arises because a tree assigns only a few distinct scores (one per leaf), so TPR and FPR change in jumps as the threshold passes each score. In contrast, the SVM’s smoother ROC curve reflects its continuous decision scores (distances from the separating hyperplane), which produce many distinct thresholds and more gradual changes in TPR and FPR.
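The difference between stepped and smooth curves can be reproduced directly: an ROC curve has one point per distinct score threshold, so a model that outputs only a few distinct scores yields only a few points. In the sketch below, `tree_scores` is a hypothetical stand-in for a tree with three leaves and `smooth_scores` for a model with continuous outputs; neither comes from the study.

```python
def roc_points(y_true, scores):
    """(FPR, TPR) points from sweeping a threshold over each distinct score, highest first."""
    n_pos = sum(y_true)
    n_neg = len(y_true) - n_pos
    points = [(0.0, 0.0)]
    for thr in sorted(set(scores), reverse=True):
        # Predict positive whenever score >= thr.
        tp = sum(1 for t, s in zip(y_true, scores) if t == 1 and s >= thr)
        fp = sum(1 for t, s in zip(y_true, scores) if t == 0 and s >= thr)
        points.append((fp / n_neg, tp / n_pos))
    return points

y_true = [1, 1, 1, 0, 1, 0, 0, 0]
tree_scores = [0.9, 0.9, 0.5, 0.5, 0.5, 0.1, 0.1, 0.1]        # three leaves -> three thresholds
smooth_scores = [0.95, 0.81, 0.66, 0.62, 0.58, 0.31, 0.22, 0.05]
print(roc_points(y_true, tree_scores))         # few, large steps
print(len(roc_points(y_true, smooth_scores)))  # one point per distinct score, plus the origin
```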
The Naive Bayes model, with an AUC of 0.82, and the logistic regression model, with an AUC of 0.77, show reasonable but comparatively lower performance. Naive Bayes may discriminate slightly less well because it assumes that features are conditionally independent given the class, which may not hold in a complex scenario like this one. The logistic regression model’s moderate performance could be attributed to its linear decision boundary, which may not capture the relationships in the data as well as the other models do.
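The independence assumption mentioned above is the standard Naive Bayes factorization of the class posterior over features x_1, …, x_d:

```latex
P(y \mid x_1, \dots, x_d) \;\propto\; P(y) \prod_{j=1}^{d} P(x_j \mid y)
```

When features are correlated (plausibly, for example, Title Sentiment and Text Sentiment), the product counts overlapping evidence more than once, which can push the posterior toward overconfident and less well-ranked scores.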
These results indicate that, by AUC, Gradient Boosting is the most effective model for classifying actions, news discussions, and donations in the Israel–Palestine context, followed by the neural network, the Decision Tree and SVM (tied), Naive Bayes, and logistic regression. The AUC scores and the shapes of the ROC curves show how each model handles the intricacies of real-world data and can guide the choice of model for the specific requirements of the classification task.
The SHAP values shown in Figure 6 serve as an interpretive tool for gauging feature importance in the model predicting a firm’s presence in Israel. The negative skew for ‘Title Sentiment’ suggests an inverse relationship: more positive titles tend to decrease the model’s likelihood of associating a company with Israel. Conversely, ‘Number of Articles’ has a broad influence on the predictions, with higher article counts appearing to raise the predicted probability of a company’s link to Israel. ‘Text Sentiment’ mirrors ‘Title Sentiment’: more positive sentiment in the body text of articles tends to lower the predicted probability of a firm’s presence in Israel. ‘Days since conflict’ has values clustered around the center, indicating that the time elapsed since the start of the conflict has a weaker effect on the model’s output.
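SHAP values are Shapley values: each feature’s value is its average marginal contribution to a prediction over all orderings of the features. The brute-force sketch below computes them exactly for a tiny hypothetical model by replacing “absent” features with baseline values; it is exponential in the number of features and only shows what the plotted quantities mean. The `shap` library uses faster, model-specific algorithms (for example for tree ensembles) instead of this enumeration. The toy model, feature order, input, and baseline are all assumptions for illustration.

```python
from itertools import combinations
from math import factorial

def shapley_values(model, x, baseline):
    """Exact Shapley values of model(x) relative to a baseline input (O(2^d) model calls)."""
    d = len(x)

    def value(coalition):
        # Features in the coalition take their actual values; the rest take baseline values.
        z = [x[i] if i in coalition else baseline[i] for i in range(d)]
        return model(z)

    phi = [0.0] * d
    for i in range(d):
        others = [j for j in range(d) if j != i]
        for size in range(d):
            for subset in combinations(others, size):
                s = set(subset)
                weight = factorial(size) * factorial(d - size - 1) / factorial(d)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi

def toy_model(z):
    # Hypothetical score: title sentiment, number of articles, donation flag, plus an interaction.
    title_sent, n_articles, donation = z
    return -0.8 * title_sent + 0.05 * n_articles - 0.3 * donation + 0.1 * title_sent * donation

x = [0.6, 12, 1]
baseline = [0.0, 5, 0]
phi = shapley_values(toy_model, x, baseline)
print([round(v, 4) for v in phi])
```

The contributions sum to f(x) − f(baseline) (the efficiency property), which is why a SHAP plot can be read as a per-instance decomposition of the prediction.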
Interestingly, ‘Donation’ shows a notable shift towards negative SHAP values, which could imply that the model associates the act of donating with a lower probability of a firm’s presence in Israel. Finally, ‘Announcement Type’ exhibits a significant spread across both positive and negative values, reflecting the varied ways in which the nature of a company’s announcements influences the model’s predictions.
In evaluating ensemble learning models for predicting companies’ connections to Israel, Table 6 shows that the stacking model performs best, with 91.38% accuracy and strong sensitivity and specificity. Bagging and boosting both achieved 90.13% accuracy, with high sensitivity but lower specificity, indicating a possible overestimation of ties to Israel. The voting model had the lowest accuracy, at 82.76%, and lower specificity, suggesting difficulty in correctly identifying companies without links to Israel. The stacking ensemble also outperforms the individual machine learning models in Table 5, making it the most effective approach for this problem.
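The ensembles in Table 6 differ mainly in how they combine base models. Voting is the simplest: a fixed rule, either a majority vote over predicted labels (hard voting) or an average of predicted probabilities (soft voting). The section does not state which variant was used, so the sketch below shows both on made-up probabilities from three hypothetical base models; note that the two rules can disagree. Stacking instead trains a meta-model on the base models’ outputs, letting it learn how much to trust each one.

```python
def hard_vote(label_rows):
    """Majority vote over base-model labels; ties (possible with an even count) go to class 1."""
    n_models = len(label_rows)
    return [1 if 2 * sum(col) >= n_models else 0 for col in zip(*label_rows)]

def soft_vote(prob_rows, threshold=0.5):
    """Average base-model positive-class probabilities, then apply a threshold."""
    n_models = len(prob_rows)
    return [1 if sum(col) / n_models >= threshold else 0 for col in zip(*prob_rows)]

# Rows = hypothetical base models, columns = firms; values are P(linked to Israel).
probs = [
    [0.9, 0.2, 0.6, 0.55],
    [0.6, 0.7, 0.3, 0.10],
    [0.4, 0.1, 0.8, 0.55],
]
labels = [[1 if p >= 0.5 else 0 for p in row] for row in probs]
print(hard_vote(labels))  # two narrow "yes" votes outvote one confident "no" on the last firm
print(soft_vote(probs))   # averaging lets the confident "no" win on the last firm
```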
In our study within the Multifaceted Analysis of Conflict framework, focusing on the Israel–Palestine scenario, we examined three feature groups, individually and in combination, to assess their power to predict a firm’s ties to Israel: actions taken (A), media sentiment and volume (B), and financial donations (C). The ablation results in Table 7 show that actions taken alone achieved high accuracy and sensitivity and a respectable ROC AUC, underscoring the significance of the nature and intensity of actions within the conflict. In contrast, media sentiment and volume alone yielded lower accuracy and sensitivity but higher specificity, indicating that they are more useful for identifying negative cases than for detecting positive ones. Donations, while having the lowest accuracy and sensitivity among the individual feature groups, had the highest specificity, suggesting they are effective at recognizing instances without external support. Combining feature groups, particularly actions with news articles (A + B) and all three groups (A + B + C), significantly improved model performance, achieving higher accuracy and sensitivity. This suggests that the content and sentiment of public announcements and news articles play a critical role in the model’s ability to predict a firm’s ties to Israel, highlighting their predominant influence in such geopolitical association analyses.
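An ablation study of this kind refits the same model on different combinations of feature groups (here, every non-empty combination). The sketch below only enumerates those configurations; the column names assigned to groups A, B, and C are hypothetical placeholders, and the training and scoring step is left as a comment.

```python
from itertools import combinations

# Hypothetical column names for each feature group.
FEATURE_GROUPS = {
    "A": ["announcement_date", "announcement_type"],            # actions taken
    "B": ["n_articles", "title_sentiment", "text_sentiment"],   # media sentiment and volume
    "C": ["donation", "donation_amount"],                       # financial donations
}

def ablation_configs(groups):
    """Yield (label, columns) for every non-empty combination of feature groups."""
    names = sorted(groups)
    for k in range(1, len(names) + 1):
        for combo in combinations(names, k):
            columns = [col for name in combo for col in groups[name]]
            yield " + ".join(combo), columns

for label, cols in ablation_configs(FEATURE_GROUPS):
    # In a real study: train the classifier on `cols` and record accuracy, sensitivity, etc.
    print(label, cols)
```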
4.2. Regression
In analyzing complex geopolitical scenarios, such as the Israel–Palestine conflict, it is crucial to employ a variety of analytical techniques to capture the multifaceted nature of the environment. Initially, we focus on classification models to discern the presence of Israel-related events within the conflict. The classification stage is critical in filtering the vast amounts of data to identify those instances that have marked relevance to Israel’s involvement in the conflict. Upon establishing a reliable classification model that can accurately identify the presence of Israel in the conflict-related events, we proceed to leverage these predictions to construct a regression model aimed at forecasting future stock returns. The rationale behind this approach is grounded in the observation that geopolitical events, particularly those involving Israel in the context of the Middle East, can have significant ripple effects on financial markets.
The regression model uses the outputs of the classification model as a key input variable, on the hypothesis that the frequency, nature, and scale of Israel’s involvement in the conflict may help predict market fluctuations. It also includes additional quantitative metrics and market indicators, so that its stock return forecasts can inform the investment strategies of investors and analysts. Together, the two models form a combined analytical toolset: the classification model acts as a preliminary filter that distills the conflict data into instances with potential market impact, and the regression model uses these instances to forecast economic outcomes. The resulting forecasts link geopolitical and financial analysis, offering a nuanced perspective on the interplay between conflict events and their economic ramifications.
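The two-stage design can be sketched as a classifier whose output becomes one extra regressor. The section does not say whether the predicted label or a predicted probability is passed on; this sketch assumes a probability. Both functions stand in for already fitted models, and all weights, coefficients, and inputs are hypothetical placeholders.

```python
import math

def israel_presence_probability(features, weights, bias):
    """Stage 1 (hypothetical fitted logistic classifier): P(Israel presence | features)."""
    z = bias + sum(w * f for w, f in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-z))

def predict_return(features, p_israel, coefs, p_coef, intercept):
    """Stage 2 (hypothetical fitted linear model): the stage-1 probability is one extra regressor."""
    return intercept + p_coef * p_israel + sum(c * f for c, f in zip(coefs, features))

# Placeholder inputs and parameters; none of these numbers come from the study.
event_features = [3.0, -0.2, 0.1]   # e.g. number of articles, title sentiment, text sentiment
p = israel_presence_probability(event_features, weights=[0.2, -0.5, -0.4], bias=-0.3)
expected_return = predict_return(event_features, p, coefs=[0.001, 0.004, 0.002],
                                 p_coef=-0.003, intercept=0.0)
print(round(p, 3), round(expected_return, 5))
```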
We constructed a regression model with the daily stock returns of various corporations serving as the dependent variable. We meticulously tracked the stock performance over a period of 30 days after the events, treating each day as an independent observation to capture the immediate and short-term effects on stock valuation. Our independent variables comprised outputs from our classification models, which included indicators of Israel’s presence in media coverage and associated actions, alongside financial metrics and sentiment analyses derived from our comprehensive dataset.
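The machine learning models in this section are evaluated on daily cumulative abnormal returns over the 30-day post-event window. The section does not specify the expected-return benchmark, so the sketch below assumes a market model, AR_t = R_t − (α + β·R_m,t); with α = 0 and β = 1 it reduces to market-adjusted returns. The returns in the example are hypothetical.

```python
from itertools import accumulate

def abnormal_and_cumulative(stock_returns, market_returns, alpha=0.0, beta=1.0):
    """Daily abnormal returns AR_t = R_t - (alpha + beta * Rm_t) and cumulative sums CAR_t.

    alpha=0, beta=1 is the market-adjusted model (an assumption); alpha and beta
    estimated on a pre-event window would give the market model instead.
    """
    abnormal = [r - (alpha + beta * m) for r, m in zip(stock_returns, market_returns)]
    cumulative = list(accumulate(abnormal))
    return abnormal, cumulative

# Three hypothetical post-event days for one firm and a market index.
ar, car = abnormal_and_cumulative([0.010, -0.020, 0.005], [0.004, -0.010, 0.001])
# Each post-event day then becomes one row (observation) of the regression dataset.
rows = [{"day": d + 1, "car": c} for d, c in enumerate(car)]
print(rows)
```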
The analysis in Table 8 uses a linear regression model to investigate the impact of various factors on stock returns, with a particular focus on Israel-related variables. The predictors include announcement dates, the number of articles, Text and Title Sentiment, type of statement, and the presence or absence of donations, alongside the binary Israel Presence indicator, which is of specific interest. The coefficients and their standard errors (in parentheses) indicate the magnitude and significance of each predictor’s effect on stock returns.
The positive coefficients for the announcement date in contexts where Israel’s presence is considered and a donation was made (0.0183 and 0.019, both significant at the ** level) suggest that announcement dates in these specific scenarios have a statistically significant, albeit small, positive effect on stock returns. The other predictors show varying effects across scenarios, but most lack statistical significance (no asterisks), suggesting that their impact on stock returns is not consistently strong or detectable. The F-statistic is significant (at the * level) only in the scenario where Israel’s presence is considered and a donation was made, indicating that the model is jointly significant only in that scenario. In the other scenarios, the F-statistics fall below the significance threshold, suggesting that the model does not explain a significant portion of the variance in stock returns. The adjusted R-squared values are very low across all scenarios, ranging from 0.004 to 0.021. Although this means the model accounts for only a very small fraction of the variance in stock returns, such values are not uncommon in financial data analysis, where stock returns are influenced by myriad unpredictable factors.
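Both the overall F-statistic and adjusted R-squared of an OLS model with an intercept are simple functions of R², the number of observations n, and the number of predictors k. The sketch below uses hypothetical inputs, not values from Table 8. Turning F into a p-value needs the F distribution with (k, n − k − 1) degrees of freedom, for example `scipy.stats.f.sf`.

```python
def adjusted_r2_and_f(r2, n, k):
    """Adjusted R^2 and overall F-statistic of an OLS model with an intercept."""
    df_resid = n - k - 1
    adj_r2 = 1.0 - (1.0 - r2) * (n - 1) / df_resid
    f_stat = (r2 / k) / ((1.0 - r2) / df_resid)
    return adj_r2, f_stat

# Hypothetical inputs: a third predictor that raises R^2 only slightly lowers adjusted R^2.
adj_2, f_2 = adjusted_r2_and_f(0.100, n=100, k=2)
adj_3, f_3 = adjusted_r2_and_f(0.105, n=100, k=3)
print(round(adj_2, 4), round(adj_3, 4))
```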
Given the limitations of the traditional linear regression model in capturing the complexities and dynamics of stock returns, as evidenced by low adjusted R-squared values and limited predictive power, machine learning (ML)-based regression methods offer a promising alternative. These methods can model non-linear relationships and interactions among many features, and so may better capture the intricacies of financial market data.
The machine learning models for predicting daily cumulative abnormal stock returns over the 30-day post-announcement period produced a range of performance outcomes. The models were assessed on Mean Squared Error (MSE), Root Mean Squared Error (RMSE), Mean Absolute Error (MAE), and the coefficient of determination (R²).
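The four metrics can be computed as in the minimal sketch below. One detail matters for reading the results: on held-out data, R² is negative whenever the model predicts worse than simply using the mean of the observed values, so a positive R² is itself a meaningful bar. The example values are arbitrary.

```python
import math

def regression_metrics(y_true, y_pred):
    """MSE, RMSE, MAE, and R^2 (R^2 < 0 means worse than predicting the mean)."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    ss_res = sum(e * e for e in errors)
    mean_t = sum(y_true) / n
    ss_tot = sum((t - mean_t) ** 2 for t in y_true)
    mse = ss_res / n
    return {
        "mse": mse,
        "rmse": math.sqrt(mse),
        "mae": sum(abs(e) for e in errors) / n,
        "r2": 1.0 - ss_res / ss_tot,
    }

print(regression_metrics([1.0, 2.0, 3.0], [3.0, 2.0, 1.0]))  # reversed predictions: negative R^2
```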
As shown in Table 9, Random Forest Regression performed best, with the lowest error of all models, indicating that it captures the variability of stock returns more precisely than the alternatives. The Multi-Layer Perceptron (MLP) model also performed well, with a positive R² score suggesting that it captures some of the complex patterns in stock returns. The linear models (Lasso, ElasticNet, Ridge Regression, and ordinary linear regression) produced comparable results but slightly underperformed on these metrics.
To test the Random Forest model rigorously, we excluded the data of four companies from training, so that its ability to predict market behavior was assessed on firms it had not seen, across different market scenarios and business environments. The resulting time series plots (Figure 7) show the model’s forecasts of daily abnormal returns for these companies, which are crucial for gauging stock behavior and potential investment risks.
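Because each company contributes many daily rows, a random row-level split would let the model see other days of the very companies it is tested on. Holding out whole companies avoids that leakage, as in the minimal sketch below. The four held-out names are those discussed for Figure 7; `FirmA` and `FirmB` are hypothetical training companies. Library equivalents include scikit-learn’s `GroupShuffleSplit` and `GroupKFold`.

```python
def group_holdout_split(rows, group_key, held_out_groups):
    """Put every row of a held-out group (e.g. a company) in the test set, all others in train."""
    held_out = set(held_out_groups)
    train = [r for r in rows if r[group_key] not in held_out]
    test = [r for r in rows if r[group_key] in held_out]
    return train, test

companies = ["Boeing", "Microsoft", "Salesforce", "Tesla", "FirmA", "FirmB"]
rows = [{"company": c, "day": d} for c in companies for d in range(1, 31)]  # 30 days per firm
train, test = group_holdout_split(rows, "company", ["Boeing", "Microsoft", "Salesforce", "Tesla"])
print(len(train), len(test))
```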
The plots show that the model can forecast abnormal stock returns, although the precision and reliability of these predictions vary substantially, as the wide 95% confidence intervals indicate. In the immediate aftermath of the onset of the Israel–Palestine conflict, a period of 5 to 10 days, all selected companies registered a downturn of varying intensity. Notably, the model’s predictions did not fully anticipate the extent of the downturn for Microsoft, Salesforce, and Tesla, in contrast to Boeing, which exhibited an initial dip in its stock returns. Overall, the erratic nature of daily stock returns poses a considerable challenge; however, the Random Forest algorithm, identified as the best method in our previous analyses, captured these fluctuations to a reasonable extent.
The SHAP summary plot in Figure 8 shows how the various features influence the model’s predictions of daily cumulative abnormal returns. Each point is the SHAP value of one feature for one instance, colored by the magnitude of that feature’s value. Features are ranked by importance, commonly measured as the mean absolute SHAP value; features whose SHAP values extend further from zero have a more pronounced effect on the model’s predictions.
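The ranking in summary (beeswarm) plots produced by the `shap` library is, by default, by mean absolute SHAP value across instances, which can be reproduced from the raw SHAP matrix as sketched below. The feature names follow the plot; the SHAP values themselves are hypothetical.

```python
def rank_by_mean_abs_shap(shap_matrix, feature_names):
    """Order features by mean absolute SHAP value across instances, largest first."""
    n = len(shap_matrix)
    importance = {
        name: sum(abs(row[j]) for row in shap_matrix) / n
        for j, name in enumerate(feature_names)
    }
    return sorted(importance.items(), key=lambda kv: kv[1], reverse=True)

# Rows = instances, columns = features; values are made up for illustration.
shap_matrix = [
    [0.02, -0.010, -0.005],
    [-0.03, 0.015, -0.004],
    [0.01, -0.020, 0.001],
]
ranking = rank_by_mean_abs_shap(shap_matrix, ["Days", "Docs", "Title Sentiment"])
print(ranking)
```

Signed values can cancel in a plain mean, which is why the absolute value is used: a feature that pushes some predictions up and others down (like ‘Days’ here) still ranks as important.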
For instance, the ‘Days’ feature exhibits a diverse range of impacts, with SHAP values straddling both sides of the zero mark, hinting at how the elapsed time since the initial incident affects returns differently as the situation progresses. ‘Docs’, which could denote the count of relevant documents or articles, also displays a considerable spread around the zero point, indicating its variable influence on returns, both positively and negatively. Notably, sentiment-related features such as ‘Title Sentiment’ and ‘Text Sentiment’ are generally skewed towards the negative, suggesting that adverse media narratives typically suppress predicted returns. ‘Type’ presents a balanced array of positive and negative SHAP values, implying its impact can sway in either direction depending on other factors.
The empirical results in Table 10 show that the Multi-Layer Perceptron (MLP) model, configured with a ‘tanh’ activation function and a hidden layer size of 50, significantly outperforms the other models in predicting daily cumulative abnormal stock returns.
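For readers unfamiliar with the configuration, the sketch below shows the forward pass of a one-hidden-layer regressor with 50 tanh units and a linear output, assuming “hidden layer sizes of 50” means a single hidden layer. The weights are random and untrained, so the output is arbitrary; the number of input features is hypothetical. In scikit-learn, a comparable model would be `MLPRegressor(hidden_layer_sizes=(50,), activation="tanh")`, which also handles training.

```python
import math
import random

def init_mlp(n_inputs, n_hidden=50, seed=0):
    """Randomly initialised (untrained) weights for a one-hidden-layer regressor."""
    rng = random.Random(seed)
    w1 = [[rng.gauss(0.0, 1.0 / math.sqrt(n_inputs)) for _ in range(n_inputs)]
          for _ in range(n_hidden)]
    b1 = [0.0] * n_hidden
    w2 = [rng.gauss(0.0, 1.0 / math.sqrt(n_hidden)) for _ in range(n_hidden)]
    b2 = 0.0
    return w1, b1, w2, b2

def mlp_predict(params, x):
    """Forward pass: tanh hidden layer (50 units by default), then a linear output unit."""
    w1, b1, w2, b2 = params
    hidden = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
              for row, b in zip(w1, b1)]
    return sum(w * h for w, h in zip(w2, hidden)) + b2

params = init_mlp(n_inputs=7)          # e.g. seven event-level features (hypothetical)
print(mlp_predict(params, [0.1] * 7))  # arbitrary value: the weights are untrained
```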
We found substantial performance differences among the models. The linear models, limited by their simplicity, struggled with the complex, non-linear nature of stock returns. Ridge Regression showed modest success owing to its regularization, but, like Lasso and ElasticNet, it did not fully capture the market’s volatility. Non-linear models such as Decision Tree and Random Forest offered better predictions, with Random Forest particularly effective because its ensemble approach reduces overfitting and tracks stock return movements more accurately.
In Figure 9, the boxplot shows the distribution of predicted average abnormal stock returns over the 30-day span following pivotal developments in the Israel–Palestine conflict. These returns account for corporate responses and social media activity. The Multi-Layer Perceptron regression shows varying predictive accuracy across companies: it tends to overestimate the returns for Tesla and underestimate the returns for the other three firms. For Salesforce, the observed return lies within the interquartile range, so the predictions are consistent with actual performance. For Tesla and Microsoft, by contrast, the observed returns fall just outside the range of the boxplot. This suggests that the model’s predictions do not fully capture the market’s response for these companies, possibly because of external variables not included in the model.
The discrepancies between predicted and actual returns indicate that the MLP model may not account for all determinants of the average abnormal stock returns in the 30 days after the respective announcements concerning the Israel–Palestine conflict. A major contributing factor could be the limited dataset: with only one record per company, rather than 30 observations per company as in the earlier models, the model has far fewer data points from which to learn the complexities of stock market behavior. In addition, the substantial variability in returns across companies suggests that stock performance is subject to a diverse array of influences, possibly beyond the scope of the model’s features. Factors such as unquantified market sentiment, undisclosed financial strategies, or evolving geopolitical scenarios may play pivotal roles in shaping returns, reducing the precision of the model’s forecasts.
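The comparison above amounts to checking where each observed return falls relative to the quartiles of the predicted distribution. The minimal sketch below does this with Python’s `statistics.quantiles` (default “exclusive” method); plotting libraries may use a slightly different quartile interpolation, which matters only for very small samples. The predicted values are hypothetical.

```python
import statistics

def position_vs_iqr(predictions, observed):
    """Say whether an observed value lies below, inside, or above the interquartile
    range (the box of a boxplot) of a set of predictions."""
    q1, _median, q3 = statistics.quantiles(predictions, n=4)
    if observed < q1:
        return "below IQR"
    if observed > q3:
        return "above IQR"
    return "inside IQR"

# Hypothetical predicted average abnormal returns for one company.
predicted = [-0.012, -0.008, -0.005, -0.004, -0.002, 0.000, 0.001, 0.003, 0.006]
print(position_vs_iqr(predicted, observed=-0.003))
```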
In the short-term regression analysis, the SHAP summary plot (Figure 10) shows that the sentiment of document titles and text (‘Title Sentiment’ and ‘Text Sentiment’) influences the model’s predictions. However, the relatively narrow spread of these SHAP values around zero suggests that the effect of sentiment on the model’s output is moderate. ‘Docs’, representing the count of related documents or articles, has both positive and negative SHAP values, indicating an inconsistent contribution to predicted returns, which may reflect the varying influence of media coverage on stock performance.
‘Days since conflict’ shows a notable concentration of SHAP values around zero, implying a negligible or very limited effect on the predictions. This could be due to the market’s desensitization to the conflict over time or to its effects already being incorporated into stock prices. On the other hand, ‘Donation’ predominantly has negative SHAP values, suggesting a potential adverse impact on returns when a company engages in philanthropic activities, perhaps reflecting investor concerns over the allocation of resources.
The ‘Type’ feature has SHAP values on both sides of zero, indicating that the direction of its impact on the predictions varies. This may reflect how the effect of different announcement types depends on the values of other features. By contrast, ‘Israel Presence’ clusters around a SHAP value of zero, suggesting that it does not substantially affect the model’s output in this scenario.
Contrasting with the previous SHAP plot, ‘Days’ here demonstrates a concentration of SHAP values at zero, leading to the conclusion that in this model’s configuration, the temporal aspect does not significantly affect the prediction of cumulative abnormal returns. This is a departure from the earlier plot, where ‘Days’ exhibited a wider range of SHAP values.
Overall, the SHAP values cluster more tightly around zero in this plot than in the previous one, indicating that individual features shift the short-term predictions less. The implications are multifaceted: some features contribute consistently to stock return predictions, while others have variable influence. These differences may arise from changes in the data, adjustments to the model’s parameters, or the different market conditions under review, each altering the relative importance and impact of the features.
When combinations of variables are analyzed, the model’s predictive power differs in nuanced ways, as shown in Table 11. For instance, the combination of actions and news articles and the combination of actions and donations yield only marginally different MSE, RMSE, and MAE values, with a slight decrease in R², suggesting a complex relationship between these variables when predicting stock returns. Notably, incorporating all three factors together (actions, news articles, and donations; A + B + C) does not substantially improve the model’s performance on the observed metrics. This suggests that analyzing these variables together does not predict stock returns significantly better than considering them individually or in other combinations.
Among the individual factors, news articles, characterized by the number of mentions and the sentiment of the title and text, show the strongest predictive performance, both as a standalone factor and in comparison with the other individual or combined factors. This indicates the considerable impact of media sentiment and exposure on stock market dynamics and suggests that news articles alone offer a strong predictive signal for stock returns. Furthermore, the slight decline in adjusted R-squared across combinations of factors indicates that adding factors contributes little new information, which is consistent with overlap (possible multicollinearity) among them.
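A falling adjusted R-squared alone does not establish multicollinearity; a direct check is the variance inflation factor (VIF). For a model with exactly two predictors, the VIF of either one is 1 / (1 − r²), where r is their Pearson correlation, as in the sketch below; values well above 1 (rules of thumb often cite 5 or 10) signal strong overlap. The sentiment series are hypothetical. For more predictors, `variance_inflation_factor` in `statsmodels.stats.outliers_influence` computes the general version.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def vif_two_predictors(x1, x2):
    """VIF for either predictor in a model with exactly two predictors: 1 / (1 - r^2)."""
    r = pearson_r(x1, x2)
    return 1.0 / (1.0 - r * r)

# Hypothetical title and text sentiment scores for four articles.
title_sentiment = [1.0, 2.0, 3.0, 4.0]
text_sentiment = [1.0, 3.0, 2.0, 4.0]
print(round(vif_two_predictors(title_sentiment, text_sentiment), 3))
```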
This analysis highlights the nuanced and interconnected nature of the factors influencing stock market movements. While news articles emerge as a particularly strong predictor, the interplay between data on actions taken, media sentiment, and financial support highlights the complexity of forecasting stock returns accurately. Further investigation of these dynamics, possibly with more advanced modeling techniques or additional variables, may clarify these relationships and improve the model’s predictive accuracy.