NLP-Based Application for Analyzing Private and Public Banks Stocks Reaction to News Events in the Indian Stock Exchange

Dogra, Varun; Alharithi, Fahd S.; Álvarez, Roberto Marcelo; Singh, Aman; Qahtani, Abdulrahman M.

doi:10.3390/systems10060233


by Varun Dogra ¹, Fahd S. Alharithi ², Roberto Marcelo Álvarez ³,⁴, Aman Singh ³,⁵,* and Abdulrahman M. Qahtani ²

¹ Department of Computer Science and Engineering, Lovely Professional University, Phagwara 144411, India

² Department of Computer Science, College of Computers and Information Technology, Taif University, P.O. Box 11099, Taif 21944, Saudi Arabia

³ Higher Polytechnic School, Universidad Europea del Atlántico, C/Isabel Torres 21, 39011 Santander, Spain

⁴ Department of Project Management, Universidad Internacional Iberoamericana, Campeche 24560, Mexico

⁵ Department of Engineering, Universidad Internacional Iberoamericana, Arecibo, PR 00613, USA

* Author to whom correspondence should be addressed.

Systems 2022, 10(6), 233; https://doi.org/10.3390/systems10060233

Submission received: 1 October 2022 / Revised: 26 October 2022 / Accepted: 21 November 2022 / Published: 24 November 2022


Abstract

This study analyzes how the stock prices of Indian public and private banks listed on the NSE and BSE react to the announcement of seven key types of news events. Several recent studies have analyzed the correlation between stock prices and news announcements; however, there is no evidence on how private and public sector Indian bank stocks independently react to important news events. We examine this question using a sample of banking and government news events, which we classify into a group of announcements with a negative tone and a group with a positive tone (sentiments). The statistical results show that negative banking news announcements affected private banks for one month, with statistically significant negative mean cumulative abnormal returns (CARs). For public banks, the effect of negative banking news announcements lasted two months after publication, with highly statistically significant negative mean CARs. Positive banking news affected private banks for one month after publication, whereas its effect on public banks lasted only five days. The study concludes that, compared with private bank stocks, public bank stocks react more strongly to negative news announcements than to positive ones, in line with the sentiment polarity of the announcements. To obtain the events, we first retrieved news articles published on prominent online financial news portals between 2017 and 2020 and then extracted and classified the seven major news events using multi-class text classification. A Random Forest classifier using pre-trained embeddings from DistilBERT, a neural network model, achieved an accuracy of 94%, outperforming the traditional TF-IDF feature representation. The classifier's training data were balanced using the SMOTE oversampling technique.

Keywords:

deep learning; transfer learning; DistilBERT; event study; sentiment analysis

1. Introduction

Analysts are routinely required to estimate the impact of particular economic events on the value of businesses. Although this may seem difficult at first, an event study provides a straightforward way to measure it. Using financial market data, an event study investigates the impact of a given event on a company's value. The usefulness of such studies rests on the idea that, if markets are rational, the effects of an event are immediately reflected in stock prices [1]. As a result, stock prices observed over a relatively short period can be used to construct a measure of the event's economic impact. Event studies have many uses. In accounting and finance research, they have been employed to study a variety of firm-specific and economy-wide news events; mergers and acquisitions, earnings releases, fraud announcements, expert ratings, and trade deficit announcements are typical examples [2]. Moreover, because finance relies on interpreting numerous structured and unstructured data sources and demands quick, thorough decision making, the finance industry has become an important test platform for Natural Language Processing (NLP) and Information Retrieval (IR) approaches to the automatic analysis of online financial news and opinions [3].

The first step in conducting an event study is to define the event of interest and to determine the period over which the stock prices of the firms involved will be examined (the event window). For instance, when studying quarterly or annual results with daily returns, the event is the earnings announcement, and the event window might be just the announcement day itself. It is common practice to make the event window wider than the period of interest so that the periods surrounding the event can also be examined. The window is typically designed to include at least the day the story is published and the following day, which captures the price impact of news released after the market closes on the day of publication. The periods preceding and following the event may also be of interest. The event can belong to any sector, such as banking, pharmaceuticals, or technology, in any country.
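To make the event-window idea concrete, the following minimal sketch computes daily abnormal returns and the cumulative abnormal return (CAR) for a single event. It assumes the standard market model, R_it = α_i + β_i R_mt + ε_it, estimated by ordinary least squares over a pre-event estimation window; this section does not state which benchmark model the study used, and the function names and sample returns are hypothetical.

```python
from statistics import mean


def ols_alpha_beta(stock_returns, market_returns):
    """Estimate the market-model intercept (alpha) and slope (beta) by ordinary least squares."""
    m_bar = mean(market_returns)
    s_bar = mean(stock_returns)
    cov = sum((m - m_bar) * (s - s_bar) for m, s in zip(market_returns, stock_returns))
    var = sum((m - m_bar) ** 2 for m in market_returns)
    beta = cov / var
    alpha = s_bar - beta * m_bar
    return alpha, beta


def cumulative_abnormal_return(est_stock, est_market, win_stock, win_market):
    """Fit the market model on the estimation window, then return the daily
    abnormal returns and their sum (the CAR) over the event window."""
    alpha, beta = ols_alpha_beta(est_stock, est_market)
    abnormal = [r - (alpha + beta * m) for r, m in zip(win_stock, win_market)]
    return abnormal, sum(abnormal)


# Hypothetical daily returns for a short estimation window before the event ...
est_stock = [0.01, 0.02, -0.01, 0.00]
est_market = [0.005, 0.01, -0.005, 0.0]
# ... and for a three-day event window: day -1, publication day 0, day +1.
win_stock = [0.01, -0.03, -0.01]
win_market = [0.0, -0.01, 0.0]

abnormal, car = cumulative_abnormal_return(est_stock, est_market, win_stock, win_market)
# abnormal is approximately [0.01, -0.01, -0.01], so car is approximately -0.01
```

Including day +1 in the window is what lets the CAR pick up news released after the close on the publication day; averaging CARs over many events of the same type gives a mean CAR.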

The banking sector is critical to a country's growth and development, and banks are regarded as the backbone of economic activity. Through forward-looking efforts to strengthen the banking sector and its operations within the financial system, India's banking sector has seen substantial development. The Indian banking system comprises private and public sector banks. The Indian government nationalized 14 private banks in 1969 and 6 more in 1980 to advance the country's economic progress. On 1 April 2017, SBI merged with its associate banks and Bharatiya Mahila Bank to form India's leading bank. In 2019, Vijaya Bank and Dena Bank were merged into the Bank of Baroda. On 30 August 2019, Smt. Nirmala Sitharaman, India's finance minister, announced that 10 public sector banks would be merged into 4 larger banks from 1 April 2020, reducing the number of public sector banks from 27 to 12. Furthermore, mergers and acquisitions in the banking sector are more important in developing markets than in the US stock market [4]. Consolidation of the banking sector is important for a variety of reasons, including the need for more capital, risk assessment, funding of development projects, technological advancement, and improved customer service [5,6]. At the same time, the stock market, as a well-organized market, is sensitive to the spread of uncertain news in any economy. Unprecedented news about the COVID-19 pandemic recently affected both developing and established markets, with Asian markets the most affected [7]. Likewise, the announcement that two public sector banks would be privatized had an immediate impact on the capital markets.

Despite being well regulated, the banking system faces a number of problems, especially financial difficulties and a lack of standard regulations. According to the RBI's annual report (2019) and an article in the Economic Times (2019), fraud has increased dramatically in both number and value over the past 10 years. Bank frauds increased from 4669 cases in 2009–2010 to 6801 in 2018–2019, with a total value of ₹71,542.93 crore. The number of fraud cases thus increased by 45.66% between 2009–2010 and 2018–2019, while the amount involved grew more than 35-fold. Fraud at a bank or any other business is an entirely unforeseen event with far-reaching economic and social ramifications, and its negative impact on stock prices has been documented [8].
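As a quick arithmetic check of the reported growth in the number of fraud cases (the growth in the amount involved cannot be recomputed here, because the 2009–2010 value is not given):

```latex
\frac{6801 - 4669}{4669} \times 100\% = \frac{2132}{4669} \times 100\% \approx 45.66\%
```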

Government policies have far-reaching implications for a country's economy [9], particularly for the banking industry. Uncertainty about government policy and election outcomes has serious financial consequences. Existing research on the impact of policy uncertainty on economic outcomes suggests that rising economic policy uncertainty causes enterprises to postpone investments [10], makes firms less likely to engage in new mergers and acquisitions [11], and reduces foreign direct investment [12]. The question of whether and how government policy uncertainty affects the banking industry is notably absent from this research. One objective of this study is to fill this gap by analyzing the performance of banking stocks before and after government policy announcements.

Several factors affect the stock market, including national and international news. Some company-specific news or announcements affect a single industry, whereas others, such as inflation, GDP growth, and the repo rate, affect the market as a whole [13]. In a similar vein, this paper analyzes the influence of certain events on the banking industry. To approximate the mix of banking events in daily financial news, we collected 10,000 financial news articles. This raises the question of which categories are needed to separate banking news from the rest of the collection, which broadly represents the national and global news landscape. To categorize these articles, we separated them into four groups: banking, government (national), global, and non-banking (others). For this study, we analyzed only news about banking and related topics. Additionally, we searched for the most representative news events, whether frequent or infrequent, that affected the banking and financial markets. In the context of a developing market such as India, our research contributes to existing work by examining the short-term reaction of Indian stock prices to key banking sector events such as mergers and acquisitions, frauds, expert ratings, earnings results, government policies, and RBI policies. Furthermore, because the Indian banking sector is divided into public and private banks, we were specifically interested in the separate impact of these news announcements on the stock prices of Indian public and private banks. An existing study compares the returns on public sector bank (PSB) stocks with those of the Sensex, an Indian stock market index, to assess PSB performance following disinvestment [14].
That study also calculated the returns of private sector banks relative to the Sensex (refer to Appendix A, Table A1) in order to compare the performance of public and private sector banks, and found that PSB stock performance did not differ considerably from that of private sector banks or the Sensex. Another study examined the effect of interest rates and foreign exchange rates on the movement of banking stocks in India. Its findings show that all banks' returns are heavily influenced by the performance of the Bank Nifty index (refer to Appendix A, Table A1). There was stronger evidence of return spillover from private sector bank stocks than from public sector bank stocks; for volatility, however, there is evidence of bilateral spillover between private and public bank stocks [15]. The authors of that study also examined how the policy interest rate announcements of the US Federal Reserve and the European Central Bank affect stock returns and volatility for commercial banks listed on the NYSE and in Germany's DAX index [16]. They found that the most significant impact came from Federal Reserve news: in most cases, an unexpected policy rate hike reduced returns and increased volatility for both US and German bank shares.

Important financial news is increasingly available in electronic form on the Web and has become a very useful data source for event studies [17], including stock market evaluations [18]. By monitoring a variety of online news sources and building a news classification system, investors in the banking sector of the Indian stock market can be notified of potential financial banking events. To our knowledge, no news classification system has yet been created exclusively for the banking sector. We therefore classify online financial news into banking news, other related news of interest, and non-banking news. Next, news articles on the events of interest (mergers and acquisitions, frauds, expert ratings, earnings results, government policies, and RBI policies) are extracted from the banking and other related news articles using our classification system. Finally, because the tone in which banking sector news events are reported is correlated with observable stock price volatility, we classify the six banking news events above into positive and negative sentiments and assess their influence on the stock prices of the private and public banks involved.
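The abstract reports statistically significant mean CARs separately for each combination of bank type and news sentiment. This section does not specify the test statistic, so the following is only a minimal sketch assuming a simple cross-sectional one-sample t-test of the null hypothesis that the mean CAR is zero; the tuple layout, the function name, and the sample values are hypothetical.

```python
from collections import defaultdict
from math import sqrt
from statistics import mean, stdev


def mean_car_t_tests(events):
    """Group event CARs by (bank_type, sentiment) and test H0: mean CAR = 0.

    events: iterable of (bank_type, sentiment, car) tuples.
    Returns {(bank_type, sentiment): (n, mean_car, t_stat)}, where
    t_stat = mean_car / (sample_std / sqrt(n)) with n - 1 degrees of freedom.
    """
    groups = defaultdict(list)
    for bank_type, sentiment, car in events:
        groups[(bank_type, sentiment)].append(car)

    results = {}
    for key, cars in groups.items():
        n = len(cars)
        m = mean(cars)
        s = stdev(cars) if n > 1 else 0.0  # sample standard deviation
        t_stat = m / (s / sqrt(n)) if s > 0 else float("nan")
        results[key] = (n, m, t_stat)
    return results


# Hypothetical CARs for a handful of events.
events = [
    ("public", "negative", -0.02),
    ("public", "negative", -0.04),
    ("public", "negative", -0.03),
    ("private", "positive", 0.01),
    ("private", "positive", 0.03),
]
results = mean_car_t_tests(events)
```

A cross-sectional test of this kind treats event CARs as independent; when many events overlap in calendar time, the cross-sectional correlation discussed in [29] can inflate the statistic.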

Sentiment analysis is a computational method for handling a document's subjectivity, sentiments, and views [19]. It is very important for monitoring and detecting key events and suspicious behaviors [20], and it is regarded as a subfield of text mining, information retrieval, and natural language processing [21,22]. In the banking financial domain, sentiment analysis is the process of interpreting readers' sentiments (negative or positive) about banking news events using computational intelligence, such as Machine Learning or rule-based approaches. Text sentiment classification is a basic subfield of NLP. However, sentiment classification is highly domain-specific, and the most popular way to classify text in a given domain is to use domain-specific data samples. Machine learning-based text sentiment analysis, however, requires sufficient labeled training data, and transfer learning is widely used to overcome this problem [23]. With the rapid growth of deep learning, various transfer learning approaches are now in use and have produced numerous notable results. Transfer learning fine-tunes pre-trained deep learning models [24,25] using even small amounts of domain-specific data (the banking financial domain in our study). To collect banking financial domain data for the subsequent sentiment analysis and event study of Indian private and public banks, a text classification framework was designed.

A text classification approach was used to extract banking sector news, and other financial news relevant to it, from the articles collected from various online news sources. In NLP, text classification is a well-known task in which labels are assigned to texts such as phrases or documents. It can be used for a number of tasks, such as question answering, spam filtering, topic modelling, and news classification [26]. Text is a very valuable resource, but extracting ideas from it can be challenging and time-consuming because it is often unstructured [27]. Labels for text classification can be obtained by manual annotation or by automatic labeling. Automatic text classification is becoming more important as the amount of text data in industry grows. However, manual labelling is crucial in edge cases or in sectors where public or synthetic datasets are insufficient or unavailable. In our study, manual labelling was the preferred method, because banking financial NLP is an unaddressed edge case for which synthetic data could not have been generated. Using a team of financial experts to label the data manually was therefore both practical and efficient. Most text classification and document categorization systems can be broken down into four steps: feature extraction, dimensionality reduction, classifier selection, and evaluation [26]. The study shows how an effective financial news classification framework can help ascertain the impact of the tone of such information on the stock prices of related public and private banks, as well as which financial news must be retrieved and which type of news is most relevant to banking stakeholders.
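The abstract names TF-IDF as the traditional feature representation that the DistilBERT embeddings outperformed. As an illustration of the feature-extraction step, here is a minimal pure-Python TF-IDF sketch. It assumes one common weighting, tf(t, d) · log(N / df(t)) with raw counts and no smoothing or normalization, which may differ from the variant used in the study; the tokenizer is deliberately simplistic and the headlines are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and keep runs of letters and digits (deliberately simple)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def tfidf(documents):
    """Return (vocabulary, vectors): one TF-IDF vector per document.

    Weight of term t in document d: raw count of t in d * log(N / df(t)),
    where N is the number of documents and df(t) the number containing t.
    """
    tokenized = [tokenize(doc) for doc in documents]
    vocab = sorted({tok for toks in tokenized for tok in toks})
    n_docs = len(tokenized)
    df = Counter(tok for toks in tokenized for tok in set(toks))
    idf = {tok: math.log(n_docs / df[tok]) for tok in vocab}
    vectors = []
    for toks in tokenized:
        counts = Counter(toks)
        vectors.append([counts[tok] * idf[tok] for tok in vocab])
    return vocab, vectors


# Hypothetical headlines.
docs = [
    "Bank merger announced",
    "Bank fraud reported",
    "Bank quarterly results announced",
]
vocab, vectors = tfidf(docs)
```

A term that occurs in every document (here, "bank") receives zero weight under this formula, which is one reason library implementations such as scikit-learn's TfidfVectorizer use a smoothed IDF by default.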

The key contributions of the study are as follows:

Extracting news events that are relevant to the banking sector from the overall body of financial news articles;
Performing sentiment classification of banking financial news into positive and negative classes using a state-of-the-art NLP approach;
Performing an event study on banking stocks listed on the Indian stock exchanges, i.e., the BSE and NSE (refer to Appendix A, Table A1).

The remainder of this paper is organized as follows. Section 2 reviews the literature, Section 3 describes the methodology, Section 4 presents the data and experiments, and Section 5 concludes.

2. Literature Review

Although recent research has examined a variety of market aspects, this study takes a new approach in line with the current academic focus on online financial information disclosure and other banking events. We emphasize the importance of an efficient financial news classification framework: which financial news must be extracted, which type of news is most relevant to banking stakeholders, and what direction such a framework may take in light of the impact of the tone of this information on the stock prices of related public and private banks. This section covers the literature on the development of the event study methodology and its use in measuring the effects of various news events on banking stock returns.

One study investigated the significance of news articles about the privatization of two public sector banks in India. Its statistical findings show that the announcement's overall effect was negative for both private and public sector banks, even though private sector banks saw positive average abnormal returns on the event day. Significant results before the release date also point to information leakage [28]. Using CRSP daily returns from 1990 to 2005 and a sample of 1000 portfolios of 50 stocks, other authors examined the issue of cross-sectional correlation in event studies [29]. Another study analyzed the short-term abnormal returns of equities listed on the BSE between May 2013 and June 2014 and concluded that investors could earn abnormal returns by investing systematically during periods of political instability, a period in which important national elections brought a noticeable change to the nation's political and economic structure [30]. Other authors developed a dictionary-based sentiment analysis model, built a sentiment lexicon for the financial industry, and evaluated the model for determining how news sentiment affects stock prices [31]. These studies make clear that specific news events must be studied to understand the sentiment of their tone and its impact on the short- to long-term returns of the stocks involved. To formulate our research questions properly, we review the literature on financial event studies and frameworks in the following subsections.

2.1. Event Study

Unexpected stock market news events can affect a company's financial performance. Existing studies of financial market reactions to disclosure and financial reporting are extensive and cover topics such as fraud announcements, merger and acquisition reports, dividend notifications, and changes in government or RBI policy. Most of the literature on fraud focuses on its negative impact on company performance and shareholder wealth [8]. Similarly, unfavorable government policies [32] and RBI announcements (refer to Appendix A, Table A1) may affect banking stocks [33]. Some financial industry sectors show positive event returns, while others show negative ones [34]. One study shows that earnings news releases have a statistically significant negative influence on stock prices and trading volumes in the days after their release [35]. However, existing studies do not give a clear picture of how investors respond to financial news specifically from the banking sector.

2.2. Text Classification

Over the last few decades, text classification problems have been extensively researched and addressed in a variety of real-world applications [36,37,38,39]. Researchers are increasingly interested in building applications that use text classification algorithms, especially given recent advances in NLP and text mining. Text classification is the process of assigning predefined tags to new text data or sentences using a classifier trained on labeled data. During the training phase, the training examples are labeled with these predefined categories; in such systems, labeling is frequently done by hand, for example with hand-coded rules, which is called manual labeling [40]. By examining the features of a set of documents that have been manually assigned to a category, Machine Learning automatically builds a classifier for that category [41]. However, the very high dimensionality of text classification problems makes many Machine Learning classification techniques infeasible. Furthermore, many features may be useless or noisy, and only a small fraction of terms are truly useful for categorization; feature selection is therefore used to limit the number of features and prevent overfitting [42]. Choosing a technique for deploying a text classification system also requires analyzing classification performance [43]. The test collection is divided into two sets: a training set for learning and a test set for classification and assessment. The approaches used in the experiment learned from the examples in the training set and classified the documents in the test set. We adopted two commonly used effectiveness metrics for assessment. Precision measures the proportion of documents classified as positive that are actually positive, and recall measures the proportion of actually positive documents that are correctly classified as positive [41].
The F1 measure of each technique was also computed, giving precision and recall equal importance; a weighted variant of the F-measure allows precision and recall to be weighted differently [43,44].
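The metric definitions above can be written out directly. This minimal sketch computes per-class precision, recall, and the F-beta score (beta = 1 gives F1, which weights precision and recall equally), plus a support-weighted average of the per-class scores, which is what "weighted F1" commonly denotes in multiclass reports. The label names reuse the news groups from the Introduction; the function names and predictions are hypothetical.

```python
from collections import Counter


def per_class_scores(y_true, y_pred, beta=1.0):
    """Per-class (precision, recall, F-beta) for multiclass labels; beta=1 gives F1."""
    labels = sorted(set(y_true) | set(y_pred))
    b2 = beta ** 2
    scores = {}
    for label in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != label and p == label)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == label and p != label)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = b2 * precision + recall
        f = (1 + b2) * precision * recall / denom if denom else 0.0
        scores[label] = (precision, recall, f)
    return scores


def weighted_f(y_true, y_pred, beta=1.0):
    """Average the per-class F-beta scores, weighting each class by its support in y_true."""
    scores = per_class_scores(y_true, y_pred, beta)
    support = Counter(y_true)
    return sum(scores[label][2] * support[label] for label in scores) / len(y_true)


# Hypothetical gold labels and predictions for four news articles.
y_true = ["banking", "banking", "global", "others"]
y_pred = ["banking", "global", "global", "others"]
scores = per_class_scores(y_true, y_pred)
overall = weighted_f(y_true, y_pred)
```

Because the support-weighted average is dominated by large classes, reporting per-class scores alongside it is useful when the categories are imbalanced.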

2.3. News Classification System

Text classification is a well-understood topic. Many approaches have been developed, and many of them can be applied directly to news categorization as long as each category has a good collection of training documents [45,46]. However, the classification problem can become considerably more complicated if the categories are customized and training documents are not readily available [47]. The Web is evolving rapidly toward participation in content creation and use, with a growing number of people turning to online news sources for daily updates. Online financial newspapers began to appear around the same time as the internet became popular. Every day, the many news portals on the Web produce a large number of financial news items, and the rate of production is increasing dramatically.

An online newspaper can take many different forms [48]. One example is the electronic edition of a printed newspaper, which can be read in the same way as the printed paper but offers no classification, either of content or of presentation. Another is a news website that lets users browse via menus sorted by subject area and subcategory. In most of these types of online newspapers, the reader is assumed to access the news via a computer connected to a specific news provider over the internet. These services, however, may not suffice for many readers and reading contexts [49]. Many newspaper readers prefer to read and analyze news from numerous sources, and readers are frequently interested only in news related to their areas of interest [50]. Investors in financial markets must therefore read the news articles about the various events related to a specific sector to find the items of interest to them. An investor interested in banking sector news has to review all the news items for that sector on different news websites, and reading through them to analyze the specific events of that sector becomes tedious. Such an investor would therefore likely prefer a system that extracts and categorizes the news for that sector from the overall financial news collected from various websites. The need for such systems is not limited to financial news classification.

Other authors have created a syndromic surveillance system that automatically monitors and categorizes online news [17,51]. Gathering data, however, is an important first step in developing such a system. The data sources of these systems are expected to provide timely indications of events and are generally stored and transferred electronically [52]. The data needed for the banking sector include bank mergers, fraud announcements, dividend or results announcements, expert recommendations, and announcements of government and RBI policies. Online news portals are becoming essential resources for a new generation of financial event monitoring systems. In this study, we used online public news portals as financial news sources to categorize news for our area of interest, the banking sector, and extracted the training documents for the banking sector classification system from these financial news items. However, the collected financial news contained unequal numbers of articles for the banking sector and for the other categories of the multiclass classification system [53], which required balancing the data for each category [54,55].

2.4. Data Imbalances in the Multiclass Classification System

In a multiclass classification system, a dataset is said to be class-imbalanced if the classes are not equally represented. Dealing with imbalanced datasets has been a prominent topic in research on news article classification. When confronted with imbalanced datasets, traditional Machine Learning techniques may become biased [56], and the accuracy of many classification algorithms is thought to be affected by data imbalance [57]. The vast majority of binary text classification applications fall into this category, with negative examples of the class of interest greatly outnumbering positive ones [58].

With an imbalanced dataset, the classifier's output tends to be biased towards the majority class [59]. The key challenges of imbalanced classification, namely the large variation in the number of examples per class and the difficulty of generalizing to different datasets, remain major concerns in NLP and Machine Learning [60]. Furthermore, classifiers are often designed to maximize accuracy, which may not be a realistic criterion for judging performance when the training data are imbalanced. Our study therefore analyzes Machine Learning classification approaches that may achieve high precision even with imbalanced datasets. It is nevertheless worth examining some of the challenges we encountered with imbalanced data and evaluating metrics other than precision to assess performance. We also used Machine Learning to perform multiclass classification of documents for which the data did not fit perfectly into one of the categories.
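The abstract states that the classifier's training data were balanced with SMOTE (Synthetic Minority Over-sampling Technique). The sketch below shows the basic SMOTE step in pure Python, assuming dense numeric feature vectors (for example TF-IDF or sentence-embedding vectors): each synthetic sample is placed at a random point on the segment between a minority-class sample and one of its k nearest minority-class neighbours. It is an illustration, not the study's implementation; the function name and toy vectors are hypothetical, and libraries such as imbalanced-learn provide a fuller SMOTE class.

```python
import random


def smote(minority, n_new, k=5, seed=0):
    """Generate n_new synthetic samples for a minority class (basic SMOTE).

    minority: list of numeric feature vectors (lists of floats) from one class.
    Each synthetic vector lies on the segment between a randomly chosen minority
    sample and one of its k nearest minority neighbours (squared Euclidean distance).
    """
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)

    def dist2(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        others = [minority[j] for j in range(len(minority)) if j != i]
        neighbours = sorted(others, key=lambda s: dist2(base, s))[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # uniform in [0, 1)
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Hypothetical 2-D feature vectors for an under-represented news category.
minority = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
new_samples = smote(minority, n_new=10, k=2)
```

Oversampling should be applied only to the training split, so that synthetic points derived from test documents never leak into the evaluation.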

2.5. Sentiment Classification

Traditional sentiment classification methods fall into two types: sentiment dictionary approaches and Machine Learning approaches [61,62]. One study presented the SENTiVENT corpus of English business news, which includes token-level annotations of implicit polarity (positive, negative, or neutral investor sentiment), target spans, and polar spans [63]. Classification approaches based on sentiment dictionaries mostly use a manually organized and constructed sentiment dictionary, and the text's sentiment score is then computed using defined rules. In this approach, the sentiment score of each text instance is a floating-point value ranging from −1 (negative) to 1 (positive), with 0 denoting neutral sentiment; this approach was also implemented in [64]. The accuracy of the classification outcomes depends on the size and properties of the sentiment dictionary [65,66], and when the dictionary is not sufficiently rich, classification results are frequently inaccurate. Machine Learning approaches are based on effective feature extraction and a classifier: once the feature set has been created, the classifier is trained on the features of the training set and then assigns sentiment tags to test texts. Machine Learning approaches, however, rely on large amounts of labeled data, and manually labeled datasets are often insufficient for sentiment classification. Because the expression of sentiment in a text is strongly tied to its domain, applying sentiment classifiers trained in other domains reduces their applicability. Cross-domain sentiment classifiers based on transfer learning have therefore recently become a focus of study.
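A minimal sketch of the dictionary-based approach described above, mapping a text to a score in [−1, 1]. The normalization (pos − neg) / (pos + neg) and the tiny lexicons are assumptions chosen for illustration; the scoring rules of [64] and the financial lexicon of [31] are not reproduced here.

```python
import re


def lexicon_score(text, positive, negative):
    """Return (pos - neg) / (pos + neg) in [-1, 1]; 0.0 when no lexicon word occurs."""
    tokens = re.findall(r"[a-z]+", text.lower())
    pos = sum(1 for tok in tokens if tok in positive)
    neg = sum(1 for tok in tokens if tok in negative)
    return (pos - neg) / (pos + neg) if pos + neg else 0.0


# Hypothetical toy lexicons; a real financial lexicon would be far larger.
POSITIVE = {"profit", "growth", "upgrade"}
NEGATIVE = {"fraud", "loss", "downgrade"}

score = lexicon_score("Bank reports profit growth despite fraud probe", POSITIVE, NEGATIVE)
# two positive hits and one negative hit, so score is (2 - 1) / (2 + 1), about 0.33
```

The example also shows the weakness noted above: any sentiment carried by words missing from the lexicon ("probe" here) is simply ignored.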

2.6. Transfer Learning: Pre-Trained Deep Learning Models

Transfer learning is a Machine Learning method that extracts knowledge from one domain and applies it to another (the financial domain in this study) [67]. It is mostly used to resolve distribution differences in cross-domain problems. For example, in banking news event sentiment classification, our task is to automatically classify news about a bank, such as a merger or acquisition, into negative and positive polarities. To begin, we would gather and annotate a large number of news articles about banks and then use these articles and their labels to train a classifier. Because the distribution of news data can differ greatly across banks, a significant quantity of labeled data would be required to train news sentiment classification models for banks and to maintain good classification results. This data labeling, however, can be rather costly. To reduce the work of annotating a huge number of news articles for multiple banks, we might adapt a classification model trained on news events for certain banks to help learn classification models for other banks' news events. Transfer learning can save a large amount of labeling time in such situations [68]. It fine-tunes pre-trained deep learning models using even small amounts of domain-specific (target domain) data. A pre-trained model has already been built and trained to address a similar problem, so fine-tuning takes substantially less time than training from scratch. Fine-tuning the pre-trained model is generally preferable for two reasons: to attain even greater accuracy and to ensure that the output is correctly formatted.

So far, sequential transfer learning has produced the most significant gains in this field. The standard procedure is to pre-train representations on a large unlabeled text corpus using a chosen method and then adapt them to a supervised target task using annotated data [69]. BERT (Bidirectional Encoder Representations from Transformers) is a state-of-the-art language representation model designed to pre-train deep bidirectional representations from unlabeled text and then fine-tune them for diverse NLP applications using labeled text [70]. Thanks to the Transformer's self-attention mechanism [71], BERT can model a variety of downstream tasks, which makes fine-tuning simple: for each task, the task-specific inputs and outputs are fed into BERT and all of its parameters are fine-tuned.

However, the trend toward larger models creates several difficulties. The most significant is the environmental cost of the exponentially increasing computational requirements of these models [72]. In addition, although running these models on-device in real time could enable a variety of novel language applications, their growing computational and memory requirements may limit widespread adoption. Much smaller language models can obtain comparable results on many downstream tasks at a lower training cost; one such model is DistilBERT [73]. Its authors show that a 40% smaller Transformer [71], pre-trained by distillation under the supervision of a larger Transformer language model using a triple loss, achieves comparable performance on a range of downstream tasks while being 60% faster at inference time. This study deals with the following research questions:
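The triple loss used to train DistilBERT in [73] combines a distillation loss on the teacher's softened predictions, the usual masked-language-modelling (MLM) loss, and a cosine loss that aligns the directions of the student and teacher hidden states. The plain-Python sketch below computes the three terms for a single masked token. The loss weights and the temperature are illustrative assumptions, not the values used to train DistilBERT.

```python
import math


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax (numerically stabilised)."""
    scaled = [z / temperature for z in logits]
    top = max(scaled)
    exps = [math.exp(z - top) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, temperature):
    """Cross-entropy of the student's softened predictions against the teacher's."""
    teacher = softmax(teacher_logits, temperature)
    student = softmax(student_logits, temperature)
    return -sum(t * math.log(s) for t, s in zip(teacher, student))


def mlm_loss(student_logits, true_token_id):
    """Masked-language-modelling loss: cross-entropy with the true token."""
    return -math.log(softmax(student_logits)[true_token_id])


def cosine_loss(student_hidden, teacher_hidden):
    """1 - cosine similarity between student and teacher hidden-state vectors."""
    dot = sum(a * b for a, b in zip(student_hidden, teacher_hidden))
    norm_s = math.sqrt(sum(a * a for a in student_hidden))
    norm_t = math.sqrt(sum(b * b for b in teacher_hidden))
    return 1.0 - dot / (norm_s * norm_t)


def triple_loss(student_logits, teacher_logits, true_token_id,
                student_hidden, teacher_hidden,
                w_ce=1.0, w_mlm=1.0, w_cos=1.0, temperature=2.0):
    """Weighted sum of the three terms (weights and temperature are illustrative)."""
    return (w_ce * distillation_loss(student_logits, teacher_logits, temperature)
            + w_mlm * mlm_loss(student_logits, true_token_id)
            + w_cos * cosine_loss(student_hidden, teacher_hidden))
```

The distillation term is smallest when the student reproduces the teacher's distribution, and the cosine term is zero when the two hidden states point in the same direction.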

How can we keep track of online news articles for the banking sector?
Which data sampling approach performs well for balancing data in domain-specific news classification for the banking sector?
Is it possible for a pre-trained deep learning model to outperform classic text representation techniques in domain-specific online news classification for banking news events and sentiment classification?
Which Machine Learning method is superior for the classification and sentiment classification of online financial news?
Do event announcements on mergers and acquisitions, frauds, expert ratings, earnings results, government policies, and RBI policies, paired with positive or negative sentiment, have a considerable and equal influence on the stocks of private and public sector banks relative to the benchmark Bank Nifty index in the Indian stock market?

3. System Architecture: A Proposed Methodology

For the event study of banking news events on Indian private and public banks, financial news was used. The primary idea was to select, from all financial news in the database for the study period, the news events related to mergers and acquisitions, frauds, expert ratings, earnings reports, government policies, and RBI policies for Indian private and public banks. These news events are then classified into positive and negative polarities. Finally, the effect of news events with negative and positive sentiment on the stocks of private and public banks listed on the NSE and BSE is analyzed. The system architecture is shown in Figure 1. The study proceeds in five phases, which are explained in the following subsections.

3.1. Data Acquisition

In phase one, we scraped news from Times of India, Financial Express, Bloomberg, and Money Control using code written in Python, collecting about 10,000 news articles published from 2017 to 2020. These articles were pre-processed so that Machine Learning models could be trained on the training sample and applied to the test set.

3.2. News Classification: Extracting Banking News from the Financial News Corpus

In the second phase, we aimed to separate news articles related to the banking industry, together with news from the most closely linked areas, from the rest of the financial news. We consider a country's financial news to be inextricably linked to 'government news events', which comprise articles on government proposals for good governance, state or national elections, and modifications or new expansions in government policy, as well as to 'global' financial news. As indicated in Table 1, to gather banking news and related news (global and government news) from all financial news articles, we framed a four-class classification problem. There are a total of 10,000 samples, which we classify into banking, global, governmental, and non-banking classes. The non-banking class includes any financial news, gathered from the various sources, that does not fit into one of the other three classes (global, governmental, and banking). We chose manual labeling [40] of the news articles with the assistance of financial specialists; articles that overlapped several classes were discarded so that no class was adversely affected [56].

3.2.1. Dealing with Class Imbalances in Multiclass News Classification

When dealing with imbalanced datasets, the main goal is to increase the share of the minority class while lowering that of the majority class, so that all classes end up with the same number of samples. Random under-sampling optimizes the class distribution by removing majority-class samples at random until the majority and minority classes are fully balanced. As the degree of imbalance increases, evolutionary under-sampling outperforms non-evolutionary methods [74]. Cluster-based instance selection, an under-sampling approach that combines clustering analysis with instance selection, has also been used [75]: the clustering step divides the majority-class data into subclasses of similar samples, and the instance selection step removes unrepresentative samples from each subclass. Under-sampling using KNN has also been reported to be the most successful technique [76].
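Random under-sampling, as described above, can be sketched in a few lines of plain Python. Each class is cut down, without replacement, to the size of the smallest class. The fixed seed is an assumption added for reproducibility. In practice, imbalanced-learn's `RandomUnderSampler` provides the same behaviour.

```python
import random
from collections import defaultdict


def random_undersample(X, y, seed=0):
    """Randomly drop samples so every class keeps as many as the smallest class."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for sample, label in zip(X, y):
        by_class[label].append(sample)
    n_min = min(len(samples) for samples in by_class.values())
    X_out, y_out = [], []
    for label, samples in by_class.items():
        for sample in rng.sample(samples, n_min):  # sampling without replacement
            X_out.append(sample)
            y_out.append(label)
    return X_out, y_out
```

With seven "banking" and three "global" articles, the result keeps three articles of each class. The price of this simplicity is that the discarded majority-class samples are lost to training.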

Oversampling increases the number of minority-class samples, in its simplest form by randomly replicating minority-class cases. One study proposed a Random Walk Over-Sampling method that balances the classes by generating synthetic examples through random walks from the real data [77]. By generating synthetic minority-class samples, this strategy is intended to compensate for the uneven class distribution. The synthetic samples are pooled with the original ones to create a more balanced complete dataset, which is then used to train unbiased classifiers. Nevertheless, conventional over-sampling algorithms have shown various flaws, such as causing severe over-generalization or failing to effectively improve the class imbalance in the data space, when confronted with the more difficult challenge of multi-class imbalance.

One study proposed SMOM, a synthetic minority oversampling method based on k-nearest neighbors (k-NN), to address multi-class imbalance problems [78]. SMOTE is a state-of-the-art over-sampling method that increases the number of minority-class samples by generating synthetic examples, each interpolated between a minority-class sample and one of its k nearest minority-class neighbors, until the classes are balanced [79]. In our study, SMOTE was used to overcome the data imbalance: synthetic samples were added until the training data were distributed evenly over the four labels, so that multiclass classification was performed on a balanced dataset.
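The core SMOTE step can be sketched in plain Python: pick a minority sample, pick one of its k nearest minority-class neighbours, and create a new point at a random position on the segment between them. In practice, imbalanced-learn's `SMOTE` class (via its `fit_resample` method) does this, and the study lists imblearn among its libraries. The re-implementation below, including its seed and default `k`, is only an illustration.

```python
import math
import random


def smote(minority, n_new, k=5, seed=0):
    """Generate n_new synthetic minority-class vectors, SMOTE-style.

    Each synthetic point lies on the line segment between a randomly chosen
    minority sample and one of its k nearest minority-class neighbours.
    """
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        others = [p for j, p in enumerate(minority) if j != i]
        # k nearest minority-class neighbours of `base` (Euclidean distance)
        neighbours = sorted(others, key=lambda p: math.dist(p, base))[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic
```

Because every new point is a convex combination of two real minority samples, SMOTE adds variety inside the minority region rather than exact duplicates, which is what distinguishes it from random oversampling.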

3.2.2. Machine Learning Classifier

Once a classifier has been constructed, its efficacy should be evaluated. In this application, the original corpus is divided into two sets, not necessarily of the same size, before the classifier is built: a training set and a test set. The essential idea is that the classifier receives training data representing existing instances of each class and, using the knowledge gained from a statistical analysis of that data, decides which classes new, unseen data belong to. The test set is then used to evaluate the classifier's efficacy; k-fold cross-validation is another option [80]. In our study, the Machine Learning classifiers used for news classification are Decision Tree [81], Linear SVC [46], Logistic Regression [82], Random Forest [83], and Multilayer Perceptron [84]. Their performance is compared using two metrics, Precision and Recall, and the results are also verified with cross-validation. Before the news articles are fed into the classifiers, the text documents are represented either with the conventional TF-IDF approach [85] or with a neural network model, DistilBERT [73].
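The following is a minimal sketch of the conventional TF-IDF representation. The function turns a list of texts into L2-normalised vectors using raw term counts and the smoothed inverse document frequency idf(t) = ln((1 + n) / (1 + df(t))) + 1, which is the formula scikit-learn's `TfidfVectorizer` uses by default. The lower-case, letters-only tokeniser is a simplifying assumption, so the vectors may not match scikit-learn's output exactly.

```python
import math
import re
from collections import Counter


def tfidf(docs):
    """Return (vocabulary, L2-normalised TF-IDF vectors) for a list of texts.

    Uses raw term counts and the smoothed idf(t) = ln((1 + n) / (1 + df(t))) + 1.
    """
    tokenised = [re.findall(r"[a-z]+", doc.lower()) for doc in docs]
    vocab = sorted({tok for toks in tokenised for tok in toks})
    n = len(docs)
    df = Counter(tok for toks in tokenised for tok in set(toks))
    idf = {tok: math.log((1 + n) / (1 + df[tok])) + 1 for tok in vocab}
    vectors = []
    for toks in tokenised:
        tf = Counter(toks)
        vec = [tf[tok] * idf[tok] for tok in vocab]
        norm = math.sqrt(sum(v * v for v in vec)) or 1.0  # avoid division by zero
        vectors.append([v / norm for v in vec])
    return vocab, vectors
```

In "bank merger approved" versus "bank fraud detected", the shared word "bank" receives a lower weight than "merger", because terms that occur in every document carry little information for telling classes apart.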

3.3. Banking News Events Extraction and Classification

In phase 3, we focused on further classifying the banking news and closely linked news articles (the output of phase 2, discussed in Section 3.2) into seven events, which are discussed in more depth in the next section. Event extraction and classification is the process of finding and extracting structured information about events from text and assigning the appropriate label. Because the extracted information is complete and relevant to a range of real-world scenarios, this task has attracted the attention of many researchers and businesses [86], and a growing number of approaches to improving the quality of the extracted data and the efficacy of event extraction systems have been reported. In this study, we designed a hybrid model [87] that integrates rule-based and Machine Learning approaches to obtain better results than either approach alone.

3.3.1. News Events Representation: A Transfer Learning Approach

Because vast unlabelled corpora are available, embeddings are one of the most compelling uses of both unsupervised and transfer learning, and demand for transfer learning is correspondingly high. Transformer-based models have a particular advantage: they accept the entire sequence as input rather than processing it token by token, as is standard in RNN-based models. This is a major gain because it allows GPUs to parallelize the computation and speed up the model. No labelled data are needed to pre-train these models; a large amount of unlabelled text is sufficient to train a Transformer-based model. Such a pre-trained model can then be used, through transfer learning, for the NLP task of news-event classification.

Although large-scale pre-trained models for transfer learning are becoming more popular in Natural Language Processing, running these huge models under limited CPU training or inference budgets remains a challenge. The authors of [73] propose pre-training DistilBERT, a smaller general-purpose language representation model that can then be fine-tuned with strong results on a variety of tasks, like its larger counterparts. As with BERT, the Transformer's self-attention mechanism allows DistilBERT to represent various downstream tasks and makes fine-tuning simple. The Transformer uses an encoder–decoder architecture with stacked encoder and decoder layers. Each encoder layer consists of two sublayers: self-attention and a position-wise feed-forward layer. Each decoder layer consists of three sublayers: self-attention, encoder–decoder attention, and a position-wise feed-forward layer.

In an RNN, the hidden state is computed from the input $x_t$ and the prior hidden state $h_{t-1}$. In an attention-based system, however, the input $x$ is replaced by attention, as given in Equation (1).

$$h_t = f(x_t, h_{t-1})$$

$$h_t = f(\operatorname{attention}(x, h_{t-1}),\, h_{t-1}) \tag{1}$$

We can think of the attention process as storing the information that is currently relevant and important. For each input feature $x_i$, for example, we may train a fully connected layer to score how significant feature $i$ is in the context of the previous hidden state $h_{t-1}$, as in Equation (2).

$$s_i = \tanh(W_c h_{t-1} + W_x x_i) \tag{2}$$

We then normalize the scores with a softmax function, as in Equation (3), to generate the attention weights.

$$\alpha_i = \operatorname{softmax}(s_1, s_2, \dots, s_i, \dots)_i = \frac{\exp(s_i)}{\sum_k \exp(s_k)} \tag{3}$$

Finally, the attention-weighted sum of the input features, $Z$, replaces the input $x$, as shown in Equation (4).

$$Z = \sum_i \alpha_i x_i \tag{4}$$
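Equations (2) to (4) can be traced with a small plain-Python sketch. One assumption is needed to keep each score $s_i$ a scalar: the weights $W_c$ and $W_x$ are treated as row vectors. A model with matrix weights would need an extra projection to reduce each score to a single number.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def additive_attention(xs, h_prev, w_c, w_x):
    """Equations (2)-(4) for one time step.

    xs       -- list of input feature vectors x_i
    h_prev   -- previous hidden state h_{t-1}
    w_c, w_x -- weight vectors; treated as row vectors so each s_i is a scalar
    """
    # Eq. (2): s_i = tanh(W_c h_{t-1} + W_x x_i)
    scores = [math.tanh(dot(w_c, h_prev) + dot(w_x, x)) for x in xs]
    # Eq. (3): softmax over the scores gives the attention weights alpha_i
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    alphas = [e / total for e in exps]
    # Eq. (4): Z = sum_i alpha_i x_i, the attention-weighted input
    z = [sum(a * x[d] for a, x in zip(alphas, xs)) for d in range(len(xs[0]))]
    return alphas, z
```

If `w_x` is all zeros, every feature gets the same score, the weights become uniform, and `Z` is simply the mean of the inputs. A non-zero `w_x` shifts weight toward the features it favours.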

The idea of attention is often described with a (query, key, value) model. A query $Q$ is a 'context'; in the earlier equations, the previous hidden state serves as the query context, since we want to predict what comes next based on what we already know. The values represent the input features, and the keys are encoded representations of the values. Attention is computed by measuring the relevance between the query and each key; values whose keys are not relevant to the query are then down-weighted, effectively masked out. The Transformer architecture is an encoder–decoder model built around this attention mechanism. In the Transformer, Q, K, and V are not fed directly to the attention module. Depending on the attention block, Q, K, and V are representations of the encoder or decoder states. They are first transformed using the trainable parameter matrices $W_Q$, $W_K$, and $W_V$, which apply linear transformations to those states and are trained together with the rest of the network. These parameter matrices are distinct for each layer and attention head.

$$\operatorname{Attention}(Q, K, V) = \operatorname{softmax}\!\left(\frac{QK^{T}}{\sqrt{d_K}}\right) V \tag{5}$$
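A minimal plain-Python sketch of Equation (5) follows, applied to inputs projected by $W_Q$, $W_K$, and $W_V$. To keep it small, it assumes a single attention head, no masking, no batching, and matrices stored as lists of rows.

```python
import math


def matmul(a, b):
    """Multiply matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(m):
    return [list(row) for row in zip(*m)]


def softmax_rows(m):
    out = []
    for row in m:
        top = max(row)
        exps = [math.exp(v - top) for v in row]
        total = sum(exps)
        out.append([e / total for e in exps])
    return out


def self_attention(x, w_q, w_k, w_v):
    """Single-head self-attention: project x with W_Q, W_K, W_V, then apply Eq. (5)."""
    q, k, v = matmul(x, w_q), matmul(x, w_k), matmul(x, w_v)
    d_k = len(k[0])
    scores = [[s / math.sqrt(d_k) for s in row] for row in matmul(q, transpose(k))]
    weights = softmax_rows(scores)  # each row sums to 1
    return matmul(weights, v), weights
```

Each output row is a relevance-weighted mixture of the value rows. If $W_Q$ is all zeros, every query is equally similar to every key, so each output row becomes the plain average of the values.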

To create a dense representation of a sentence, it can be parsed with an RNN to produce an embedding vector. With attention, a query is instead run for each word to encode the entire sentence. The query is derived from the word itself, and its relevance to each key in the sentence is calculated. The word's representation, which is the attention output, is the relevance-weighted sum of the values. During encoding, the Transformer uses learned word embeddings to turn words into word embedding vectors. These are then fed through an attention-based encoder, which produces a context-sensitive representation for each word, so there is one output vector $h_i$ for each word embedding. The attention for all words can then be computed concurrently. The queries, keys, and values are stacked into the matrices Q, K, and V, respectively, and the matrix product $QK^{T}$ gives the similarity between every query and every key, as shown in Equation (5). When the dimension $d_K$ is large, however, the dot products in $QK^{T}$ grow large in magnitude, pushing the softmax into regions where its gradients are extremely small; dividing by $\sqrt{d_K}$ in Equation (5) counteracts this effect.
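The scaling argument can be made concrete under an illustrative assumption, following the reasoning of the original Transformer paper: the components of a query $q$ and a key $k$ are independent random variables with mean 0 and variance 1. Under that assumption, the variance of their dot product grows linearly with $d_K$, and dividing by $\sqrt{d_K}$ restores unit variance:

```latex
\operatorname{Var}(q \cdot k)
  = \sum_{i=1}^{d_K} \operatorname{Var}(q_i k_i)
  = \sum_{i=1}^{d_K} \mathbb{E}[q_i^2]\,\mathbb{E}[k_i^2]
  = d_K,
\qquad
\operatorname{Var}\!\left(\frac{q \cdot k}{\sqrt{d_K}}\right) = \frac{d_K}{d_K} = 1
```

The first step uses the independence of the terms $q_i k_i$. The second uses $\operatorname{Var}(q_i k_i) = \mathbb{E}[q_i^2 k_i^2] - (\mathbb{E}[q_i]\,\mathbb{E}[k_i])^2 = \mathbb{E}[q_i^2]\,\mathbb{E}[k_i^2]$, since both means are 0.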

Position embeddings reflect the fact that the relative placement of words matters. Transformers encode a word together with all of its context and take positional information into account as part of the word embedding. Even without a fixed position embedding, one may argue that the model weights will eventually learn to take relative position into account. Relative position attention reformulates the attention formula and inserts two learned parameters, one for values and one for keys, that account for the relative position of words. Equation (6) denotes these parameters, learned between positions $i$ and $j$:

$$a_{ij}^{V}, \quad a_{ij}^{K} \tag{6}$$

The contribution of the $j$th word is modified using $a_{ij}$ when computing the attention output $z_i$ for the $i$th word. These parameters are trained rather than fixed. Standard self-attention is given by Equations (7)–(9), and attention with relative position by Equations (10) and (11).

$$z_i = \sum_{j=1}^{n} \alpha_{ij}\,(x_j W^{V}) \tag{7}$$

$$\alpha_{ij} = \frac{\exp(e_{ij})}{\sum_{k=1}^{n} \exp(e_{ik})} \tag{8}$$

$$e_{ij} = \frac{(x_i W^{Q})\,(x_j W^{K})^{T}}{\sqrt{d_z}} \tag{9}$$

$$z_i = \sum_{j=1}^{n} \alpha_{ij}\,(x_j W^{V} + a_{ij}^{V}) \tag{10}$$

$$e_{ij} = \frac{(x_i W^{Q})\,(x_j W^{K} + a_{ij}^{K})^{T}}{\sqrt{d_z}} \tag{11}$$
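Equations (8), (10), and (11) can be sketched for a single head in plain Python. The relative-position vectors $a_{ij}^K$ and $a_{ij}^V$ are passed in directly here. In a model they would be trainable parameters, and their layout as nested lists is an assumption of this sketch. With all $a_{ij}$ set to zero, the computation reduces to Equations (7)–(9).

```python
import math


def project(x, w):
    """Row-vector projection: returns x @ w for matrices stored as lists of rows."""
    return [[sum(row[k] * w[k][j] for k in range(len(w))) for j in range(len(w[0]))]
            for row in x]


def relative_attention(x, w_q, w_k, w_v, a_k, a_v):
    """Single-head self-attention with relative position terms, Eqs. (8), (10), (11).

    a_k[i][j] and a_v[i][j] hold the vectors a_ij^K and a_ij^V (length d_z).
    """
    q, k, v = project(x, w_q), project(x, w_k), project(x, w_v)
    n, d_z = len(x), len(q[0])
    z = []
    for i in range(n):
        # Eq. (11): e_ij = q_i . (k_j + a_ij^K) / sqrt(d_z)
        e = [sum(qc * (kc + ac) for qc, kc, ac in zip(q[i], k[j], a_k[i][j])) / math.sqrt(d_z)
             for j in range(n)]
        # Eq. (8): softmax over j
        top = max(e)
        exps = [math.exp(score - top) for score in e]
        total = sum(exps)
        alpha = [ex / total for ex in exps]
        # Eq. (10): z_i = sum_j alpha_ij (v_j + a_ij^V)
        z.append([sum(alpha[j] * (v[j][c] + a_v[i][j][c]) for j in range(n))
                  for c in range(d_z)])
    return z
```

Because the weights $\alpha_{ij}$ sum to 1 over $j$, adding the same vector $a^V$ for every pair shifts each output $z_i$ by exactly that vector. Position-dependent values of $a_{ij}^V$ and $a_{ij}^K$ are what let the model treat near and far words differently.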

The original Transformer uses a fixed (sinusoidal) position embedding because it performs similarly to learned alternatives and may allow the model to handle sequences longer than those seen during training. BERT uses the Transformer encoder described above to generate its vector representations. The trend toward bigger models with more parameters, however, raises various difficulties [88]. The findings of [73] showed that a 40% smaller Transformer, DistilBERT, pre-trained by distillation under the supervision of a larger BERT-based Transformer language model, achieves comparable performance on a range of downstream tasks while being 60% faster at inference. In our study, we use DistilBERT to represent the news event articles before they are fed into the classifiers.
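The fixed sinusoidal embedding of the original Transformer sets PE(pos, 2i) = sin(pos / 10000^(2i/d_model)) and PE(pos, 2i+1) = cos(pos / 10000^(2i/d_model)). BERT-family models, including DistilBERT, instead learn their position embeddings, so the sketch below illustrates the original Transformer only.

```python
import math


def sinusoidal_positions(n_positions, d_model):
    """Fixed position embeddings of the original Transformer.

    Even dimensions use sin, odd dimensions use cos, and each sin/cos pair
    shares the same frequency 1 / 10000^(2i / d_model).
    """
    pe = []
    for pos in range(n_positions):
        row = []
        for dim in range(d_model):
            pair_index = dim - dim % 2  # 2i for both members of a sin/cos pair
            angle = pos / (10000 ** (pair_index / d_model))
            row.append(math.sin(angle) if dim % 2 == 0 else math.cos(angle))
        pe.append(row)
    return pe
```

Since the embedding is computed rather than learned, rows can be produced for any position, including positions beyond the longest training sequence. This property is the motivation for the fixed scheme mentioned above.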

3.3.2. News Events Classification: A Hybrid Approach

In this part of phase 3, the Random Forest ensemble classifier was used to classify news events. The base model (Random Forest as the classifier and DistilBERT for document embedding) was refined by our rule-based component, which is based on simple or complex logical expressions over the text. In this part of our hybrid model, we define specific rules that constrain which phrases must or must not be present in the data. To make rule creation easier in our experiments, a basic rule language was created. One or more rules may be associated with each event label, and each input news article was compared with each rule to determine whether it met the rule's requirements. Rejecting a label whose rules are not satisfied keeps the classifier's false positives to a minimum, which increases accuracy, while adding a new class removes false negatives and further improves recall. By fine-tuning the classifier and establishing rules specific to each label, we improved accuracy by 1.0% over the standard model. Writing rules for each label did not require a substantial amount of expert labour compared with the effort required to train the classifier on a training set containing a collection of documents for each label.
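One way to picture the hybrid step is sketched below: the classifier proposes labels in order of confidence, and a label is accepted only if the article passes that label's rules. The `RULES` table, its trigger and blocking terms, and the fallback of returning `None` are all hypothetical. They are not the authors' actual rules or rule language. The ranked labels could come, for example, from a Random Forest's `predict_proba` over DistilBERT embeddings.

```python
import re

# Hypothetical rules: each event label lists trigger terms (at least one must
# appear) and blocking terms (none may appear). Not the study's actual rules.
RULES = {
    "Merger or Acquisition": {"any": {"merger", "acquisition", "acquire", "amalgamation"},
                              "none": set()},
    "Fraud": {"any": {"fraud", "scam", "cbi"}, "none": set()},
    "RBI Policies": {"any": {"rbi", "repo"}, "none": {"fraud"}},
}


def rule_accepts(label, text):
    """Check an article against the rule for one label (no rule: accept)."""
    rule = RULES.get(label)
    if rule is None:
        return True  # no rule for this label: trust the classifier
    words = set(re.findall(r"[a-z]+", text.lower()))
    return bool(words & rule["any"]) and not (words & rule["none"])


def hybrid_predict(text, ranked_labels):
    """Return the highest-ranked classifier label that the rules accept.

    ranked_labels: candidate labels ordered by classifier confidence.
    """
    for label in ranked_labels:
        if rule_accepts(label, text):
            return label
    return None  # every candidate rejected: leave the article unlabelled
```

For instance, if the classifier ranks "Fraud" above "RBI Policies" for "RBI keeps repo rate unchanged", the Fraud rule rejects the top guess and the article is labelled "RBI Policies". Rejections of this kind are how rules cut false positives.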

3.4. Sentiment Classification of Banking News Events

In phase 4 of the study, the news events were divided into two polarities, negative and positive. The model takes a set of news events as input and generates a sentiment label from {negative, positive}, as shown in Figure 2. To create the sentiment classifier, we used a pre-trained DistilBERT model for transfer learning. The model was trained and tested on news events from the banking and other correlated domains, extracted from the overall financial news as described in Section 3.3.

3.5. Event Study: Indian Banking (Private and Public Banks)

In the Indian banking system, the banks are categorized into private and public sectors. We investigated how private and public bank stocks react to banking news, as well as the way these responses fluctuate based on whether the news is negative or positive. We also investigated whether the private and public banks react equally to the negative and positive tone of news events.

Using the event study approach, we examined the impact of six banking and governmental events on the stock returns of Indian private and public banks. This approach estimates the abnormal return on day $t$ after a banking news event is made public; the announcement of the event is taken to explain fluctuations in the corresponding bank's stock price beyond those explained by the market. The market model is the most widely adopted expected-return model. It is based on the actual returns of a reference market and on the relationship between the company's shares and that market. The abnormal return is the difference between the actual return of a bank's shares on day $t$, $R_{i,t}$, and the expected return that the bank's stock would have shown in the absence of the event. The expected return depends on two elements: the specific relationship between the bank's stock and the reference index (captured by the parameters $\alpha_i$ and $\beta_i$), and the actual return of the reference market, $R_{m,t}$. The model is expressed as follows:

$$AR_{i,t} = R_{i,t} - (\alpha_i + \beta_i R_{m,t}) \tag{12}$$
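Before Equation (12) can be applied, $\alpha_i$ and $\beta_i$ must be estimated. A common choice, assumed here because this section does not state the procedure, is ordinary least squares over an estimation window of daily returns preceding the event window. The reference market could be, for example, the Bank Nifty index named in the research questions. The function names below are illustrative.

```python
def fit_market_model(stock_returns, market_returns):
    """OLS estimates of alpha_i and beta_i over an estimation window."""
    n = len(stock_returns)
    mean_r = sum(stock_returns) / n
    mean_m = sum(market_returns) / n
    cov = sum((r - mean_r) * (m - mean_m) for r, m in zip(stock_returns, market_returns))
    var = sum((m - mean_m) ** 2 for m in market_returns)
    beta = cov / var
    alpha = mean_r - beta * mean_m
    return alpha, beta


def abnormal_returns(stock_returns, market_returns, alpha, beta):
    """Equation (12): AR_t = R_t - (alpha + beta * R_m,t) for each day in the window."""
    return [r - (alpha + beta * m) for r, m in zip(stock_returns, market_returns)]
```

If a bank's return on an event day exceeds what its estimated alpha and beta predict from the market's return that day, the difference is that day's abnormal return.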

Different occurrences of the same type of event may produce different stock market reaction patterns, so the abnormal returns are averaged across the $N$ events for each day $t$ before and after the event day, giving the average abnormal return:

$$AAR_t = \frac{1}{N} \sum_{i=1}^{N} AR_{i,t} \tag{13}$$

The cumulative abnormal return is obtained by summing the individual abnormal returns. It represents the entire impact of an event over a given period, called the event window and denoted $(t_1, t_2)$:

$$CAR_i(t_1, t_2) = \sum_{t=t_1}^{t_2} AR_{i,t} \tag{14}$$

When an event study contains multiple observations of an event type, the cumulative average abnormal return (CAAR), the mean CAR over similar events, is calculated (see Appendix A, Table A1):

$$CAAR(t_1, t_2) = \frac{1}{N} \sum_{i=1}^{N} CAR_i(t_1, t_2) \tag{15}$$
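Equations (13) to (15) map directly onto a matrix of abnormal returns with one row per event. The sketch assumes that each row holds consecutive daily abnormal returns, beginning at relative day `first_day` (e.g., −60), where D, the event day, is relative day 0. The data layout is an assumption of the sketch.

```python
def aar(ar_matrix, t, first_day):
    """Eq. (13): mean abnormal return across the N events on relative day t."""
    return sum(row[t - first_day] for row in ar_matrix) / len(ar_matrix)


def car(ar_row, t1, t2, first_day):
    """Eq. (14): sum of one event's abnormal returns over the window (t1, t2), inclusive."""
    return sum(ar_row[t1 - first_day : t2 - first_day + 1])


def caar(ar_matrix, t1, t2, first_day):
    """Eq. (15): mean CAR over the N events for the window (t1, t2)."""
    return sum(car(row, t1, t2, first_day) for row in ar_matrix) / len(ar_matrix)
```

A window such as (D, 30) then corresponds to `car(row, 0, 30, first_day)`, and its mean across the events is `caar(ar_matrix, 0, 30, first_day)`.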

The Standardised Cross-Sectional t-test technique is used to assess the statistical significance of the CARs estimates [89] (Refer to Appendix A Table A1).
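The standardized cross-sectional test is usually attributed to Boehmer, Musumeci, and Poulsen. It divides each event's CAR by that event's estimation-period standard deviation of abnormal returns, scaled by the square root of the window length. It then applies a cross-sectional t-test to these standardized CARs. The sketch below is a simplified version that omits the forecast-error adjustment of the full test, and the function name is illustrative. The statistic is compared against a t distribution with N − 1 degrees of freedom.

```python
import math
import statistics


def standardized_cross_sectional_t(cars, est_sds, window_len):
    """Simplified standardized cross-sectional t-statistic.

    cars       -- CAR_i for each of the N events over the same event window
    est_sds    -- standard deviation of each event's abnormal returns in its estimation window
    window_len -- number of days in the event window
    """
    scars = [c / (s * math.sqrt(window_len)) for c, s in zip(cars, est_sds)]
    n = len(scars)
    return statistics.mean(scars) / (statistics.stdev(scars) / math.sqrt(n))
```

Standardizing first keeps highly volatile bank stocks from dominating the average. Using the cross-sectional spread of the standardized CARs lets the test tolerate event-induced changes in volatility.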

4. Experiments and Analysis

We used Python code developed in Google Colab to collect news for our experiments from public news sources such as Times of India, Money Control, Bloomberg, and Financial Express. The Python script collected news articles several times each day, and we accumulated roughly 10,000 financial news articles published between 2017 and 2020. We cleaned and prepared the articles using the Tableau Prep tool. To extract banking and other relevant news, we classified the news stories as banking, government, global, or non-banking, and these four labels were assigned to the news items manually. Manual labeling of text articles by human experts (or users) is time-consuming and labor-intensive, but it produces greater accuracy because expert knowledge is used to label the texts with the proper information. We labeled a selection of representative articles for each class as we went along. The labelers were experts in the financial industry and financial markets. For the four-class classification, a team of three experts performed feature selection to find the key or representative terms for each class; each text document was then examined and assigned to the appropriate class based on those representative terms. The classification tests were run on Python 3.8 with Python libraries (scikit-learn and imblearn) that provide Machine Learning and deep learning classifiers. The data were skewed, as shown in Figure 3 (see Appendix A, Table A2), so several sampling procedures were employed to balance the data across classes.

We further classified the news articles separated from the overall financial news in the previous phase into seven events (see Appendix A, Table A3): RBI Policies, Merger or Acquisition, Results, Rating Agencies or Expert's View, Governmental, Global, and Fraud [87]. The news events were relevant to the private and public Indian banking sectors. We also used transfer learning to classify the news events into negative and positive polarity, as shown in Table 2. The pre-trained DistilBERT model was fine-tuned while our hand-labeled dataset was fed into the supervised Machine Learning classifier, Random Forest. Classifying financial reports or documents is challenging even for professionals in the field, and tagging news items with a suitable sentiment requires a competent annotator. For classification, we conducted experiments using the TensorFlow library for Python, created and released by Google.

In this study, we investigated the relationship between the CARs of Indian banks by sector (private and public) and news sentiment (negative and positive) following banking news events. The dependent variable was the CAR obtained from the event study data. We created event windows of different lengths, from the longest window (−60, +60) down to a single day (the event day alone). The Standardised Cross-Sectional t-test was used to assess the statistical significance of the CAR estimates. As mentioned earlier, the banks fall under the public and private sectors. We evaluated how public and private banks listed on the NSE and BSE reacted to banking news events over horizons from the short term (1–5 days before and after publication of the news event) to the long term (60 days before and after publication).

We ran numerous experiments on our pre-processed data collected from web news portals, applying the standard Machine Learning techniques described in the previous section. The main goal of these trials was to find the best classifier for each situation. The classification performance of each classifier was measured with Precision, Recall, and F1-score. Accuracy was computed for all classifiers using both a 75%/25% train/test split and five-fold cross-validation.
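The evaluation protocol above (a 75%/25% split, five-fold cross-validation, and per-class Precision, Recall, and F1) is sketched below in plain Python. scikit-learn offers `train_test_split`, `KFold`, and `precision_recall_fscore_support` for the same purposes. The shuffling seed and the unstratified folds are assumptions, since the section does not specify them.

```python
import random


def train_test_split_indices(n, test_fraction=0.25, seed=0):
    """Shuffle indices 0..n-1 and split them into (train, test)."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    n_test = round(n * test_fraction)
    return idx[n_test:], idx[:n_test]


def kfold_indices(n, k=5, seed=0):
    """Yield (train, test) index lists; every sample is tested exactly once."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    for i in range(k):
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, folds[i]


def per_class_scores(y_true, y_pred):
    """Map each label to its (precision, recall, F1)."""
    scores = {}
    for label in sorted(set(y_true) | set(y_pred)):
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        scores[label] = (precision, recall, f1)
    return scores
```

For instance, if two "b" articles are predicted as "b" and "g", then "b" has precision 1.0 but recall 0.5. This is why per-class F1 scores, as reported in the tables, can differ across classes that share the same overall accuracy.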

SMOTE balances class representation by generating synthetic minority-class samples, interpolated between existing minority samples and their nearest minority-class neighbors, rather than by duplicating existing samples. According to previous research, the Random Forest classifier with data balanced using SMOTE achieved the highest accuracy compared with alternative down- and up-sampling strategies [90]. The data were first vectorized with the TF-IDF feature representation and balanced with SMOTE before being fed into the Machine Learning classifiers. The data were also vectorized using DistilBERT feature representations and balanced with SMOTE, and the resulting feature set was fed into the different Machine Learning classifiers. The results for the selected classifiers are provided in the tables below.

Table 3 and Table 4 demonstrate the results of each classifier when the TF-IDF feature extraction approach was used to vectorize the data and it was balanced between classes using the SMOTE over-sampling method.

Among the classifiers tested on all classes with datasets balanced by SMOTE up-sampling (Multilayer Perceptron, Logistic Regression, Random Forest, Decision Tree, and Linear SVC), the Random Forest performed best in terms of accuracy, with 93% using the train/test method and 94% with cross-validation, as shown in Table 4. For the Banking, Global, Non-Banking, and Governmental classes, the Random Forest achieved F1 scores of 0.90, 0.94, 0.90, and 1.00, respectively. Table 3 and Table 4 compare all of the described classifiers for the four classes.

Table 5 shows the results of each classifier when data were vectorized using the DistilBERT feature extraction and when data were balanced across classes using the over-sampling approach SMOTE.

Among the classifiers tested on all classes with datasets balanced by SMOTE up-sampling (Linear SVC, Decision Tree, Random Forest, Logistic Regression, and Multilayer Perceptron), the Random Forest performed best, with 94% accuracy using the train/test method and 94% with cross-validation, as shown in Table 6. For the Banking, Global, Non-Banking, and Governmental classes, the Random Forest achieved F1 scores of 0.93, 0.94, 0.87, and 1.00, respectively. Table 5 and Table 6 compare all of the described classifiers for the four classes.

The Random Forest classifier is effective with features extracted by the pre-trained neural model DistilBERT, and it outperforms the same classifier with TF-IDF features by 1% in classification accuracy. Although the MLP classifier with DistilBERT features also achieved 94% accuracy with the train/test method, the same as the Random Forest, it was slightly less accurate (by 0.02%) with cross-validation. The Random Forest with DistilBERT is therefore considered the best classifier overall, with the highest accuracy under both the train/test split and cross-validation.

In addition, we used a hybrid strategy that combines a rule-based approach with a Machine Learning algorithm to perform an experimental evaluation of event extraction and categorization on Indian banking news, identifying event scope and event triggers. The banking news was first labeled into seven classes, or events: Results, Rating Agencies or Expert's View, Merger or Acquisition, Governmental, Global, Fraud, and RBI Policies. Table 7 shows the accuracy, recall, and F1 score of the proposed hybrid model and of DistilBERT fine-tuned on the banking news events dataset with a Random Forest classifier.

Table 8 shows that, of the two techniques (DistilBERT with a Random Forest classifier, and the proposed hybrid model, i.e., transfer learning through DistilBERT with fine-tuning by our own rules for the Random Forest), the hybrid model performed best, with an accuracy of 100%.

Furthermore, these news events were classified into negative, positive, and neutral sentiments. Table 9 shows the accuracy, recall, and F1 score on banking news-event sentiment for DistilBERT fine-tuned with different Machine Learning classifiers.

As shown in Table 10, the Random Forest outperformed the other classifiers (Decision Tree, Logistic Regression, and Linear SVC) with an accuracy of 78%. We next examine the influence of these news events, grouped by sentiment, on private and public bank stocks listed on the NSE and BSE.

In Table 11, we observe highly statistically significant negative mean CARs of −1.05% and −5.05% in the event windows (−5, 5) and (D, 30) for private banks following the publication of a negative banking news event. This indicates that investors responded in line with the tone of the news events. The influence of negative banking news on private banks continued for one month after the news was published. We therefore assume that banking news events with negative polarity have an adverse impact on private banking stocks or indexes over a short-to-medium period.

For public banks, however, we observe highly statistically significant negative mean CARs of −2.15%, −3.96%, −8.74%, −11.44%, −0.76%, −4.64%, −9.55%, −14.86%, −24.81%, −17.25%, −8.3%, and −2.87% in the event windows (D, 1), (D, 5), (D, 30), (D, 60), (−1, D), (−5, −1), (−30, −1), (−60, −1), (−60, 60), (−30, 30), (−5, 5), and (−1, 1) following the publication of a negative banking news event. This indicates that investors responded in line with the polarity of the news articles. The impact of negative banking news events on public banks was observed for two months after the news was released. We therefore anticipate that banking news with negative polarity will have a long-term adverse influence on public banking stocks or indexes. Furthermore, the mean CARs are negative in all event windows preceding the publication of negatively polarized banking news, which suggests that investors may anticipate negative news events for public banks before they are published. Banking news events with negative polarity have a greater impact on public banking stock or index returns than on private banking stocks.

In Table 12, we observe highly statistically significant negative mean CARs of −2.46% and −6.23% in the event windows (D, 5) and (D, 30) for private banks following the announcement of a banking news event with positive polarity. The impact of positive banking news events on private banks lasted one month after the news was published. It is therefore assumed that positive banking news events have a substantial influence on private banking stocks or indexes over a short-to-medium period. In the symmetric event window (−30, 30), a statistically significant negative mean CAR of −6.99% is observed, which suggests that investors may anticipate positive news events for private banks before they are published.

We also observe statistically significant negative mean CARs of −1.61% and −3.66% in the event windows (D, 1) and (D, 5) for public banks following the publication of a positive banking news event. The impact of positive banking news events on public banks lasted five days after the news was published, so we expect positive banking news to have a substantial but short-lived impact on public banking stocks or indexes. Public bank stocks therefore react more strongly to negative news events than to positive ones, in line with the tone of the news.
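The persistence statements in this section (one month, two months, five days) can be read directly from the post-event windows (D, 1), (D, 5), (D, 30), and (D, 60). The sketch below encodes the t-values reported in Tables 11 and 12 and returns, for each group, the longest post-event window whose |t| reaches a critical value. The threshold of 1.645 (two-sided 10% level under a normal approximation) is an assumption; the significance stars in the tables come from the authors' own test, so this illustrates the reading rather than re-testing the data.

```python
# t-values of the post-event windows (D, k), keyed by k, from Tables 11 and 12.
POST_EVENT_T = {
    "negative/private": {1: -0.369, 5: -0.102, 30: 2.249, 60: 1.473},
    "negative/public": {1: -3.942, 5: -2.433, 30: -4.556, 60: -3.357},
    "positive/private": {1: -0.344, 5: -2.732, 30: -2.463, 60: -1.338},
    "positive/public": {1: -2.390, 5: -2.973, 30: -0.937, 60: 0.180},
}

CRITICAL_T = 1.645  # assumed: two-sided 10% level, normal approximation


def longest_significant_window(t_by_end, critical=CRITICAL_T):
    """Return the largest k such that window (D, k) is significant, or None."""
    significant = [k for k, t in t_by_end.items() if abs(t) >= critical]
    return max(significant) if significant else None


for group, t_by_end in POST_EVENT_T.items():
    print(group, longest_significant_window(t_by_end))
```

Under this threshold the sketch returns 30, 60, 30, and 5 trading days for the four groups, matching the one-month, two-month, one-month, and five-day readings given in the text.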

5. Conclusions and Future Works

The goal of this paper was to perform an event study on private and public bank stocks listed on the NSE and BSE. The influence of banking news with negative polarity on private banks lasted one month after the news was published, with statistically significant negative mean CARs of −1.05% and −5.05% in the event windows (−5, 5) and (D, 30) following the announcement of the negative banking news event. In contrast, the impact of negative banking news events on public banks was observed for two months after the news was released, with statistically significant negative mean CARs of −2.15%, −3.96%, −8.74%, −11.44%, −0.76%, −4.64%, −9.55%, −14.86%, −24.81%, −17.25%, −8.3%, and −2.87% in the event windows (D, 1), (D, 5), (D, 30), (D, 60), (−1, D), (−5, −1), (−30, −1), (−60, −1), (−60, 60), (−30, 30), (−5, 5), and (−1, 1), respectively. Banking news with negative polarity is therefore expected to have a longer-lasting negative influence on public banking stocks or indexes than on private bank stocks.

Moreover, the impact of positive banking news events on private banks lasted one month after the news was published, suggesting a substantial short- to medium-term effect on private banking stocks or indexes. The impact of positive banking news events on public banks lasted only five days, suggesting a substantial but very short-term effect on public banking stocks or indexes. Public bank stocks also react more strongly to negative news events than to positive ones, in line with the tone of the news. This study examined the effects of news events on the banking sector in India; future studies might investigate how the tone of major news announcements affects other sectors. Furthermore, although we analyzed Indian data, future studies might focus on capturing cross-country implications.

The results show that the Random Forest classifier performs better than the Linear SVC, Decision Tree, MLP, and Logistic Regression models for multiclass classification of financial news. Furthermore, for financial news classification, event classification, and sentiment classification, transformer-based pre-trained DistilBERT embeddings outperform standard TF-IDF features with the Random Forest classifier. The SMOTE sampling technique also handles the imbalanced datasets effectively, producing appropriate samples for each class in multiclass classification and yielding highly accurate classification with the Random Forest classifier. In the future, we intend to apply sentiment classification to new analytic areas, to obtain sufficient training data for novel transfer-learning-based approaches, and to increase the number of labels for multi-class prediction.
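SMOTE balances classes by synthesizing new minority-class vectors rather than duplicating existing ones. The following is a minimal pure-Python sketch of that idea, operating on feature vectors such as TF-IDF weights or DistilBERT embeddings: each synthetic vector is placed at a random point on the segment between a minority sample and one of its k nearest same-class neighbours. The function names, `k`, the seed, and the choice to oversample every class to the majority-class size are assumptions for illustration, not the authors' configuration; in practice the `SMOTE` class in the imbalanced-learn library (`imblearn.over_sampling`) is the usual choice.

```python
import random


def _squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def smote(samples, n_new, k=5, seed=0):
    """Create n_new synthetic vectors by interpolating between a randomly chosen
    sample and one of its k nearest neighbours within the same class."""
    if n_new > 0 and len(samples) < 2:
        raise ValueError("SMOTE needs at least two samples in a class")
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(samples))
        base = samples[i]
        others = [s for j, s in enumerate(samples) if j != i]
        neighbours = sorted(others, key=lambda s: _squared_distance(base, s))[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> neighbour, in [0, 1)
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


def balance(vectors_by_class, k=5, seed=0):
    """Oversample every class up to the size of the largest class (assumed target)."""
    target = max(len(rows) for rows in vectors_by_class.values())
    return {
        label: rows + smote(rows, target - len(rows), k, seed)
        for label, rows in vectors_by_class.items()
    }


# Toy 2-D "embeddings" for an imbalanced two-class problem (hypothetical values).
data = {
    "Banking": [[0.10, 0.20], [0.20, 0.10], [0.15, 0.25]],
    "Non-Banking": [[0.90, 0.80], [0.80, 0.90], [0.85, 0.95], [0.95, 0.85], [0.90, 0.90]],
}
balanced = balance(data, k=2)
print({label: len(rows) for label, rows in balanced.items()})
# {'Banking': 5, 'Non-Banking': 5}
```

Interpolating only between same-class neighbours keeps synthetic points inside the region already occupied by that class, which is why SMOTE tends to generalize better than simply repeating minority documents.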

Author Contributions

Conceptualization, V.D., F.S.A. and R.M.Á.; Methodology, A.S., A.M.Q. and V.D.; Investigation, A.S. and F.S.A.; Data Curation, V.D.; Writing – Original Draft, V.D., A.S., R.M.Á. and A.M.Q.; Writing – Review and Editing, A.S.; Supervision, R.M.Á.; Project Administration, F.S.A. and A.M.Q. All authors have read and agreed to the published version of the manuscript.

Funding

This research was supported by Taif University Researchers Supporting Project number (TURSP-2020/347), Taif University, Taif, Saudi Arabia.

Acknowledgments

This research was supported by Taif University Researchers Supporting Project number (TURSP-2020/347), Taif University, Taif, Saudi Arabia.

Conflicts of Interest

The authors declare no conflict of interest.

Appendix A

Table A1. The terms and symbols used in the event study.

Term/Symbol	Description
BSE	Bombay Stock Exchange
NSE	National Stock Exchange
RBI	Reserve Bank of India
Nifty	‘National Stock Exchange Fifty’, the benchmark index for the National Stock Exchange in India
Sensex	The benchmark index of the Bombay Stock Exchange in India
CAR	Cumulative Abnormal Return
CAAR	Cumulative Average Abnormal Return

Table A2. Class-wise document counts before data balancing, 2017–2020.

Class	Document Count
Banking	916
Global	1609
Governmental	421
Non-Banking (Others)	7054
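Table A2 shows how imbalanced the four classes were: the Non-Banking class is about 16.8 times larger than the Governmental class. Assuming SMOTE oversamples every minority class up to the majority-class size (a common default, but not stated in this section), the number of synthetic documents per class follows directly from the counts, as in this short sketch:

```python
# Document counts before balancing, taken from Table A2 (2017-2020).
counts = {"Banking": 916, "Global": 1609, "Governmental": 421, "Non-Banking (Others)": 7054}

# Assumption: every class is oversampled to the size of the largest class.
target = max(counts.values())
synthetic = {label: target - n for label, n in counts.items()}
imbalance_ratio = target / min(counts.values())

print(synthetic)
print("documents after balancing:", target * len(counts))
print(f"majority/minority ratio before balancing: {imbalance_ratio:.1f}")
```

Under this assumption, 6138 Banking, 5445 Global, and 6633 Governmental documents would be synthesized, giving 28,216 documents in total.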

Table A3. Event-wise document counts used in the event study, 2017–2020.

Event	Document Count
RBI Policies	100
Merger or Acquisition	100
Results	100
Rating Agencies or Expert’s View	100
Governmental	100
Global	100
Fraud	100

References

1. MacKinlay, A.C. Event Studies in Economics and Finance. J. Econ. Lit. 1997, 35, 13–39.
2. Acharya, S. A Generalized Econometric Model and Tests of a Signalling Hypothesis with Two Discrete Signals. J. Finance 1988, 43, 413–429.
3. Maia, M.; Handschuh, S.; Freitas, A.; Davis, B.; McDermott, R.; Zarrouk, M.; Balahur, A. WWW’18 Open Challenge. In Proceedings of The Web Conference 2018, Lyon, France, 23–27 April 2018; pp. 1941–1942.
4. Rai, V.K.; Pandey, D.K. Does privatization of public sector banks affect stock prices? An event study approach on the Indian banking sector stocks. Asian J. Account. Res. 2021, 7, 71–83.
5. Yadav, Y.; Aggarwal, S. Impact of mergers and acquisitions on the performance of the Indian bank’s share price: An event study approach. Int. J. Econ. Res. 2017, 14, 237–248.
6. Gugler, K.; Mueller, D.C.; Yurtoglu, B.B.; Zulehner, C. The effects of mergers: An international comparison. Int. J. Ind. Organ. 2003, 21, 625–653.
7. Pandey, D.K.; Kumari, V. An event study on the impacts of COVID-19 on the global stock markets. Int. J. Financ. Mark. Deriv. 2021, 8, 148–168.
8. Sharma, D.; Verma, R. Reaction of Stock Price to Frauds’ Announcements: Evidence from Indian Banking Sector. Asia-Pacific J. Manag. Res. Innov. 2020, 16, 157–166.
9. McGrattan, E.R.; Prescott, E.C. Taxes, Regulations, and the Value of U.S. and U.K. Corporations. Rev. Econ. Stud. 2005, 72, 767–796.
10. Azzimonti, M. Partisan conflict and private investment. J. Monet. Econ. 2018, 93, 114–131.
11. Bonaime, A.; Gulen, H.; Ion, M. Does policy uncertainty affect mergers and acquisitions? J. Financ. Econ. 2018, 129, 531–558.
12. Julio, B.; Yook, Y. Policy uncertainty, irreversibility, and cross-border flows of capital. J. Int. Econ. 2016, 103, 13–26.
13. Singh, G.; Padmakumari, L. Stock market reaction to inflation announcement in the Indian stock market: A sectoral analysis. Cogent Econ. Financ. 2020, 8, 1723827.
14. Mohan, T.T.R. Long-run Performance of Public and Private Sector Bank Stocks. Econ. Polit. Wkly. 2003, 38, 785–788.
15. Shahani, R.; Nagpal, N. A Study of the Movement of Interest Rates and Spillover of Volatility and Returns Amongst the Leading Bank Stocks in India. IUP J. Financ. Risk Manag. 2019, 16, 7–22.
16. Kim, S.J.; Lee, L.; Wu, E. The Impact of Domestic and International Monetary Policy News on U.S. and German Bank Stocks; Emerald Group Publishing Limited: Bingley, UK, 2013; Volume 14, ISBN 9781783501700.
17. Zhang, Y.; Dang, Y.; Chen, H.; Thurmond, M.; Larson, C. Automatic online news monitoring and classification for syndromic surveillance. Decis. Support Syst. 2009, 47, 508–517.
18. Vicari, M.; Gaspari, M. Analysis of news sentiments using natural language processing and deep learning. AI Soc. 2021, 36, 931–937.
19. Medhat, W.; Hassan, A.; Korashy, H. Sentiment analysis algorithms and applications: A survey. Ain Shams Eng. J. 2014, 5, 1093–1113.
20. Caschera, M.C.; Ferri, F.; Grifoni, P. Sentiment analysis from textual to multimodal features in digital environments. In Proceedings of the 8th International Conference on Management of Digital EcoSystems, Hendaye, France, 2–4 November 2016; pp. 137–144.
21. Hemmatian, F.; Sohrabi, M.K. A survey on classification techniques for opinion mining and sentiment analysis. Artif. Intell. Rev. 2019, 52, 1495–1545.
22. Wankhade, M.; Rao, A.C.S.; Kulkarni, C. A Survey on Sentiment Analysis Methods, Applications, and Challenges; Springer: Dordrecht, The Netherlands, 2022.
23. Pan, S.J.; Yang, Q. A survey on transfer learning. IEEE Trans. Knowl. Data Eng. 2010, 22, 1345–1359.
24. Wang, S.; Sun, Y.; Xiang, Y.; Wu, Z.; Ding, S.; Gong, W.; Feng, S.; Shang, J.; Zhao, Y.; Pang, C.; et al. ERNIE 3.0 Titan: Exploring Larger-scale Knowledge Enhanced Pre-training for Language Understanding and Generation. arXiv 2021, arXiv:2112.12731.
25. Yang, Z.; Dai, Z.; Yang, Y.; Carbonell, J.; Salakhutdinov, R.; Le, Q.V. XLNet: Generalized autoregressive pretraining for language understanding. Adv. Neural Inf. Process. Syst. 2019, 32, 1–11.
26. Kowsari, K.; Meimandi, K.J.; Heidarysafa, M.; Mendu, S.; Barnes, L.; Brown, D. Text classification algorithms: A survey. Information 2019, 10, 150.
27. Minaee, S. Deep Learning Based Text Classification: A Comprehensive Review. ACM Comput. Surv. 2020, 1, 1–42.
28. Okpanachi, E. Privatisation and universal access to water: Examining the recent phase of water governance in Nigeria. In Water and Urban Development Paradigms; CRC Press: Boca Raton, FL, USA, 2008; pp. 637–664.
29. Kolari, J.W.; Pynnönen, S. Event study testing with cross-sectional correlation of abnormal returns. Rev. Financ. Stud. 2010, 23, 3996–4025.
30. Savita; Ramesh, A. Return volatility around national elections: Evidence from India. Procedia Soc. Behav. Sci. 2015, 189, 163–168.
31. Shah, D.; Isah, H.; Zulkernine, F. Predicting the Effects of News Sentiments on the Stock Market. In Proceedings of the 2018 IEEE International Conference on Big Data, Seattle, WA, USA, 10–13 December 2018; pp. 4705–4708.
32. Pástor, Ľ.; Veronesi, P. Uncertainty about Government Policy and Stock Prices. J. Finance 2012, 67, 1219–1264.
33. Khuntia, S.; Hiremath, G.S. Monetary Policy Announcements and Stock Returns: Some Further Evidence from India. J. Quant. Econ. 2019, 17, 801–827.
34. Carow, K.A.; Kane, E.J. Event-study evidence of the value of relaxing long-standing regulatory restraints on banks, 1970–2000. Q. Rev. Econ. Financ. 2002, 42, 439–463.
35. Lagasio, V.; Brogi, M. Market reaction to banks’ interim press releases: An event study analysis. J. Manag. Gov. 2021, 25, 95–119.
36. Atkins, A.; Niranjan, M.; Gerding, E. Financial news predicts stock market volatility better than close price. J. Financ. Data Sci. 2018, 4, 120–137.
37. Schumaker, R.P.; Chen, H. A quantitative stock prediction system based on financial news. Inf. Process. Manag. 2009, 45, 571–583.
38. Tanguy, L.; Tulechki, N.; Urieli, A.; Hermann, E.; Raynal, C. Natural language processing for aviation safety reports: From classification to interactive analysis. Comput. Ind. 2016, 78, 80–95.
39. Majumder, N.; Poria, S.; Gelbukh, A.; Cambria, E. Deep Learning-Based Document Modeling for Personality Detection from Text. IEEE Intell. Syst. 2017, 32, 74–79.
40. Dogra, V.; Verma, S. Challenges and Opportunities in Labeling for Text Classification. Think India 2019, 22, 4390–4400.
41. Montañés, E.; Díaz, I.; Ranilla, J.; Combarro, E.F.; Fernández, J. Scoring and Selecting Terms for Text Categorization. IEEE Intell. Syst. 2005, 20, 40–47.
42. Yang, Y.; Pedersen, J.O. A comparative study on feature selection in Text Categorization. In Proceedings of the 14th International Conference on Machine Learning (ICML-97), Nashville, TN, USA, 1997; pp. 412–420.
43. Yang, Y.; Joachims, T. Text categorization. Scholarpedia 2008, 3, 4242.
44. Wang, Y.; Wang, X.J. A new approach to feature selection in text classification. In Proceedings of the 2005 International Conference on Machine Learning and Cybernetics, Guangzhou, China, 18–21 August 2005; Volume 6, pp. 3814–3819.
45. Sebastiani, F. Machine learning in automated text categorization. ACM Comput. Surv. 2002, 34, 1–47.
46. Yang, Y.; Liu, X. A re-examination of text categorization methods. In Proceedings of the 22nd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, Berkeley, CA, USA, 15–19 August 1999; pp. 42–49.
47. Chan, C.H.; Sun, A.; Lim, E.P. Automated Online News Classification with Personalization. In Proceedings of the 4th International Conference on Asian Digital Libraries, Bangalore, India, 10–12 December 2001.
48. Tenenboim, L.; Shapira, B.; Shoval, P. Ontology-Based Classification of News in an Electronic Newspaper; Institute of Information Theories and Applications FOI ITHEA: Sofia, Bulgaria, 2008; pp. 89–97.
49. Chy, A.N.; Seddiqui, M.H.; Das, S. Bangla news classification using naive Bayes classifier. In Proceedings of the 16th International Conference on Computer and Information Technology, Khulna, Bangladesh, 8–10 March 2014; pp. 366–371.
50. Rabib, M.; Sarkar, S.; Rahman, M. Different Machine Learning based Approaches of Baseline and Deep Learning Models for Bengali News Categorization. Int. J. Comput. Appl. 2020, 176, 10–16.
51. Pinner, R.W.; Rebmann, C.A.; Schuchat, A.; Hughes, J.M. Disease surveillance and the academic, clinical, and public health communities. Emerg. Infect. Dis. 2003, 9, 781–787.
52. Yan, P.; Chen, H.; Zeng, D.D. Syndromic Surveillance Systems: Public Health and Biodefence. Rev. Inf. Sci. Technol. (ARIST) 2008, 42.
53. Sun, Y.; Wong, A.K.C.; Kamel, M.S. Classification of imbalanced data: A review. Int. J. Pattern Recognit. Artif. Intell. 2009, 23, 687–719.
54. Krawczyk, B. Learning from imbalanced data: Open challenges and future directions. Prog. Artif. Intell. 2016, 5, 221–232.
55. Verma, S.; Dickerson, J.; Hines, K. Counterfactual Explanations for Machine Learning: Challenges Revisited. arXiv 2021, arXiv:2106.07756.
56. He, H.; Garcia, E.A. Learning from Imbalanced Data. IEEE Trans. Knowl. Data Eng. 2009, 21, 1263–1284.
57. Moreo, A.; Esuli, A.; Sebastiani, F. Distributional random oversampling for imbalanced text classification. In Proceedings of the 39th International ACM SIGIR Conference on Research and Development in Information Retrieval, Pisa, Italy, 17–21 July 2016; pp. 805–808.
58. Cao, P.; Zhao, D.; Zaiane, O. An optimized cost-sensitive SVM for imbalanced data learning. In Pacific-Asia Conference on Knowledge Discovery and Data Mining; Springer: Berlin/Heidelberg, Germany, 2013; pp. 280–292.
59. Kaur, H.; Pannu, H.S.; Malhi, A.K. A systematic review on imbalanced data challenges in machine learning: Applications and solutions. ACM Comput. Surv. 2019, 52, 1–36.
60. Madabushi, H.T.; Kochkina, E.; Castelle, M. Cost-Sensitive BERT for Generalisable Sentence Classification on Imbalanced Data. arXiv 2020, arXiv:2003.11563.
61. Zhang, S.; Wei, Z.; Wang, Y.; Liao, T. Sentiment analysis of Chinese micro-blog text based on extended sentiment dictionary. Futur. Gener. Comput. Syst. 2018, 81, 395–403.
62. Schumaker, R.P.; Zhang, Y.; Huang, C.-N.; Chen, H. Evaluating sentiment in financial news articles. Decis. Support Syst. 2012, 53, 458–464.
63. Jacobs, G.; Hoste, V. Fine-grained implicit sentiment in financial news: Uncovering hidden bulls and bears. Electronics 2021, 10, 2554.
64. Chen, C.-C.; Huang, H.-H.; Chen, H.-H. NLG301 at SemEval-2017 Task 5: Fine-Grained Sentiment Analysis on Financial Microblogs and News; Association for Computational Linguistics (ACL): Stroudsburg, PA, USA, 2018; pp. 847–851.
65. Zhang, S. Sentiment Classification of News Text Data Using Intelligent Model. Front. Psychol. 2021, 12, 758967.
66. Chen, L.-C.; Lee, C.-M.; Chen, M.-Y. Exploration of social media for sentiment analysis using deep learning. Soft Comput. 2020, 24, 8187–8197.
67. Meng, J.; Long, Y.; Yu, Y.; Zhao, D.; Liu, S. Cross-Domain Text Sentiment Analysis Based on CNN_FT Method. Information 2019, 10, 162.
68. Blitzer, J.; Dredze, M.; Pereira, F. Biographies, bollywood, boom-boxes and blenders: Domain adaptation for sentiment classification. In Proceedings of the 45th Annual Meeting of the Association of Computational Linguistics, Prague, Czech Republic, 23–30 June 2007; pp. 440–447.
69. Liu, R.; Shi, Y.; Ji, C.; Jia, M. A Survey of Sentiment Analysis Based on Transfer Learning. IEEE Access 2019, 7, 85401–85412.
70. Devlin, J.; Chang, M.-W.; Lee, K.; Toutanova, K. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. arXiv 2018, arXiv:1810.04805.
71. Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, Ł.; Polosukhin, I. Attention is all you need. Proc. Adv. Neural Inf. Process. Syst. 2017, 2017, 5999–6009.
72. Schwartz, R.; Dodge, J.; Smith, N.A.; Etzioni, O. Green AI. arXiv 2019, arXiv:1907.10597.
73. Sanh, V.; Debut, L.; Chaumond, J.; Wolf, T. DistilBERT, a distilled version of BERT: Smaller, faster, cheaper and lighter. arXiv 2019, arXiv:1910.01108.
74. García, S.; Herrera, F. Evolutionary undersampling for classification with imbalanced datasets: Proposals and taxonomy. Evol. Comput. 2009, 17, 275–306.
75. Tsai, C.F.; Lin, W.C.; Hu, Y.H.; Yao, G.T. Under-sampling class imbalanced datasets by combining clustering analysis and instance selection. Inf. Sci. 2019, 477, 47–54.
76. Liang, G.; Zhang, C. A comparative study of sampling methods and algorithms for imbalanced time series classification. In Australasian Joint Conference on Artificial Intelligence; Springer: Berlin/Heidelberg, Germany, 2012; pp. 637–648.
77. Zhang, H.; Li, M. RWO-Sampling: A random walk over-sampling approach to imbalanced data classification. Inf. Fusion 2014, 20, 99–116.
78. Zhu, T.; Lin, Y.; Liu, Y. Synthetic minority oversampling technique for multiclass imbalance problems. Pattern Recognit. 2017, 72, 327–340.
79. Raza, M.; Hussain, F.K.; Hussain, O.K.; Zhao, M.; Rehman, Z.U. A comparative analysis of machine learning models for quality pillar assessment of SaaS services by multi-class text classification of users’ reviews. Futur. Gener. Comput. Syst. 2019, 101, 341–371.
80. Refaeilzadeh, P.; Tang, L.; Liu, H. Cross-validation. Encycl. Database Syst. 2009, 5, 532–538.
81. Safavian, S.R.; Landgrebe, D. A Survey of Decision Tree Classifier Methodology. IEEE Trans. Syst. Man Cybern. 1991, 21, 660–674.
82. Jurafsky, D.; Martin, J.H. Logistic Regression. In Speech and Language Processing; Pearson: London, UK, 2020.
83. Breiman, L. Random forests. Mach. Learn. 2001, 45, 5–32.
84. Chatterjee, C.; Roychowdhury, V. Statistical Risk Analysis for Classification and Feature Extraction by Multilayer. In Proceedings of the International Conference on Neural Networks (ICNN’96), Washington, DC, USA, 3–6 June 1996; pp. 1610–1615.
85. Robertson, S. Understanding inverse document frequency: On theoretical arguments for IDF. J. Doc. 2004, 60, 503–520.
86. Jacobs, G.; Lefever, E.; Hoste, V. Economic Event Detection in Company-Specific News Text. In Proceedings of the 1st Workshop on Economics and Natural Language Processing (ECONLP) at Meeting of the Association-for-Computational-Linguistics (ACL), Melbourne, Australia, 20 July 2018; pp. 1–10.
87. Dogra, V.; Verma, S.; Singh, A.; Talib, M.N.; Humayun, M. Banking news-events representation and classification with a novel hybrid model using DistilBERT and rule-based features. Turk. J. Comput. Math. Educ. (TURCOMAT) 2021, 12, 3039–3054.
88. Strubell, E.; Ganesh, A.; McCallum, A. Energy and policy considerations for deep learning in NLP. arXiv 2019, arXiv:1906.02243.
89. de Winter, J.C.F. Using the Student’s t-test with extremely small sample sizes. Pract. Assess. Res. Eval. 2013, 18, 10.
90. Dogra, V.; Verma, S.; Jhanjhi, N.Z.; Ghosh, U.; Le, D. A Comparative Analysis of Machine Learning Models for Banking News Extraction by Multiclass Classification With Imbalanced Datasets of Financial News: Challenges and Solutions. Int. J. Interact. Multimed. Artif. Intell. 2022, 7, 35.

Figure 1. Extraction of banking news events from financial news articles, paired with sentiment polarity, for banking stock analysis.

Figure 2. The input news events to Sentiment Classifier and output as sentiment label.

Figure 3. The distribution of news article instances amongst the four classes [90].

Table 1. A collection of news articles from various sources and classes.

Source	News Article	Class
www.financialexpress.com 11 April 2018	ICICI Bank loses more than 2% since the bank has exposure to Gitanjali Group rather than Nirav Modi enterprises. According to CBI authorities, a consortium of 31 lenders loaned Rs 5280 crore to the Gitanjali Group.	Banking
www.moneycontrol.com 9 March 2018	India’s current account deficit has increased but remains below acceptable limits, and GDP growth is expected to range between 7.5 and 7.7%.	Governmental
www.bloombergquint.com 28 February 2018	The Federal Reserve of the United States has lowered its key interest rate by a half-point, the first time it has performed so other than regular appointments since the 2008 financial crisis.	Global
www.moneycontrol.com 7 March 2018	For the sixth day in a row, the Nifty50 has generated a bearish candle, and analysts believe it would be tough for the Nifty to rapidly overcome the 200-DEMA.	Non-Banking

Table 2. A collection of news articles with event labels from public and private Indian banks, along with sentiment polarity.

News Article	Event	Bank Sector	Sentiment
The Long-Term Issuer Default Rating (IDR) of Axis Bank Limited is ‘BB+’ with a Negative Outlook has been maintained by Fitch Ratings, as has its Viability Rating (VR) of ‘bb’.	RatingsAgencies_Experts_View	Private	Negative
Yes Bank reported a drop in revenue on a quarterly and annual basis. Total income for the current quarter was Rs 5972.12 crore, down −2.46% quarter on quarter and −28.46% year on year.	Results	Private	Negative
The industry and markets have long speculated about a merger between HDFC Limited (HDFC) and its banking business HDFC Bank. The RBI’s internal working group has recommended that well-run major non-banking financial firms (NBFCs) be evaluated for conversion into banks, which has sparked speculation of the HDFC twins merging.	Mergers_and_Acquisitions	Private	Positive
After the merger of Allahabad Bank with it, Indian Bank sees growth in lending and deposits and new opportunities emerging despite the lockdown, according to Padmaja Chunduru, Managing Director & CEO of Indian Bank.	Mergers_and_Acquisitions	Public	Positive
Because of decreased provisions and a one-time gain, the State Bank of India’s quarterly earnings climbed. The net profit of India’s largest lender climbed by 81% year on year to Rs 4189 crore.	Results	Public	Positive
Punjab National Bank reported a surprise quarterly loss on higher-than-expected provisions. The government-owned lender reported a loss of Rs 4750 crore in the quarter ended March compared to a Rs 13,417-crore loss in the same quarter last year, according to a press release.	Results	Public	Negative

Table 3. Per-class precision (P), recall (R), and F₁ of the classifiers on four-class financial news classification, using TF-IDF features and SMOTE-balanced data.

Classifier	Banking			Global			Non-Banking			Governmental
Classifier	P	R	F₁	P	R	F₁	P	R	F₁	P	R	F₁
Multi-layer Perceptron	0.90	0.82	0.86	0.86	0.91	0.88	0.90	0.93	0.92	0.99	1.00	1.00
Decision Tree	0.86	0.81	0.83	0.91	0.95	0.93	0.85	0.86	0.86	0.98	1.00	0.99
Linear SVC	0.91	0.83	0.87	0.90	0.92	0.91	0.90	0.95	0.92	0.97	1.00	0.99
Logistic Regression	0.88	0.86	0.87	0.91	0.90	0.91	0.92	0.91	0.91	0.95	1.00	0.97
Random Forest	0.86	0.94	0.90	0.96	0.92	0.94	0.93	0.88	0.90	1.00	1.00	1.00
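The P, R, and F₁ columns are per-class precision, recall, and their harmonic mean. As a minimal sketch (the labels and predictions below are invented toy values, not the paper's data), they can be computed from true and predicted labels as follows; scikit-learn's `classification_report` reports the same per-class quantities.

```python
from collections import Counter


def per_class_prf(y_true, y_pred):
    """Return {label: (precision, recall, F1)} rounded to two decimals."""
    true_positives = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    predicted = Counter(y_pred)
    actual = Counter(y_true)
    scores = {}
    for label in sorted(set(y_true) | set(y_pred)):
        precision = true_positives[label] / predicted[label] if predicted[label] else 0.0
        recall = true_positives[label] / actual[label] if actual[label] else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        scores[label] = (round(precision, 2), round(recall, 2), round(f1, 2))
    return scores


# Hypothetical gold labels and predictions for six news articles.
y_true = ["Banking", "Banking", "Global", "Global", "Governmental", "Banking"]
y_pred = ["Banking", "Global", "Global", "Global", "Governmental", "Banking"]
for label, (p, r, f1) in per_class_prf(y_true, y_pred).items():
    print(f"{label:<13} P={p:.2f} R={r:.2f} F1={f1:.2f}")
```

In this toy run one Banking article is mislabelled as Global, which lowers Banking recall and Global precision while leaving both F₁ scores at 0.80.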

Table 4. Accuracy of the classifiers on four-class financial news classification, using TF-IDF features and data balanced with the SMOTE up-sampling technique.

Classifier	Accuracy (Train/Test)	Cross-Validation
Random Forest	0.93	0.944
Logistic Regression	0.92	0.916
Linear SVC	0.92	0.922
Multi-layer Perceptron	0.91	0.919
Decision Tree	0.90	0.916
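For readers unfamiliar with the TF-IDF baseline in Tables 3 and 4, a minimal sketch of the weighting is given below: a term's frequency in a document is scaled by the log of the inverse fraction of documents containing it, so terms that occur everywhere receive little weight. It uses the textbook formula tf × log(N / df) with lowercase whitespace tokenization; libraries differ in the details (scikit-learn's `TfidfVectorizer`, for example, smooths the IDF and L2-normalizes each vector by default), and the headlines are invented, so this is not the paper's exact preprocessing.

```python
import math
from collections import Counter


def tfidf(docs):
    """Return one {term: weight} dict per document using tf * log(N / df)."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents in which each term appears.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        vectors.append(
            {term: (count / len(tokens)) * math.log(n_docs / df[term]) for term, count in tf.items()}
        )
    return vectors


# Hypothetical headlines.
docs = ["RBI cuts repo rate", "SBI quarterly profit rises", "RBI holds repo rate"]
vectors = tfidf(docs)
print(sorted(vectors[0].items(), key=lambda item: -item[1]))
```

Here "cuts" outweighs "rbi", "repo", and "rate" in the first headline because it is the only term unique to that document, which is exactly the discriminative signal TF-IDF is designed to capture.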

Table 5. Per-class precision (P), recall (R), and F₁ of the classifiers on four-class financial news classification, using DistilBERT features and SMOTE-balanced data.

Classifier	Banking			Global			Non-Banking			Governmental
Classifier	P	R	F₁	P	R	F₁	P	R	F₁	P	R	F₁
Decision Tree	0.85	0.89	0.87	0.80	0.88	0.84	0.78	0.63	0.69	0.99	1.00	1.00
Linear SVC	0.94	0.94	0.94	0.87	0.95	0.91	0.87	0.72	0.79	0.97	1.00	0.99
Logistic Regression	0.95	0.90	0.93	0.90	0.91	0.90	0.81	0.81	0.81	0.97	1.00	0.99
Random Forest	0.95	0.92	0.93	0.93	0.94	0.94	0.87	0.88	0.87	1.00	1.00	1.00
Multi-layer Perceptron	0.95	0.94	0.95	0.91	0.94	0.93	0.89	0.84	0.87	0.99	1.00	1.00

Table 6. Accuracy of the classifiers on four-class financial news classification, using DistilBERT features and data balanced with the SMOTE up-sampling technique.

Classifier	Accuracy (Train/Test)	Cross-Validation
Decision Tree	0.86	0.848
Random Forest	0.94	0.935
Linear SVC	0.91	0.912
Multi-layer Perceptron	0.94	0.913
Logistic Regression	0.91	0.898
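Tables 4 and 6 report a cross-validation score alongside the single train/test accuracy. The sketch below shows plain k-fold cross-validation around an arbitrary `fit_predict` function; the contiguous, unshuffled folds and the majority-class baseline are simplifications chosen for clarity, and the number of folds used in the paper is not stated in this section. In practice, folds are usually shuffled and stratified (e.g. scikit-learn's `StratifiedKFold`), and any oversampling such as SMOTE is generally applied to the training folds only, so that synthetic samples derived from test items do not leak into evaluation.

```python
from collections import Counter


def k_fold_indices(n_samples, k=5):
    """Yield (train_indices, test_indices) for k contiguous folds."""
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, test
        start += size


def cross_val_accuracy(fit_predict, X, y, k=5):
    """Mean accuracy over k folds; fit_predict(X_train, y_train, X_test) returns labels."""
    scores = []
    for train, test in k_fold_indices(len(y), k):
        predictions = fit_predict([X[i] for i in train], [y[i] for i in train], [X[i] for i in test])
        correct = sum(pred == y[i] for pred, i in zip(predictions, test))
        scores.append(correct / len(test))
    return sum(scores) / len(scores)


def majority_baseline(X_train, y_train, X_test):
    """Hypothetical stand-in classifier: always predicts the most frequent training label."""
    label = Counter(y_train).most_common(1)[0][0]
    return [label] * len(X_test)


X = list(range(10))
y = ["Non-Banking"] * 8 + ["Banking"] * 2
print(cross_val_accuracy(majority_baseline, X, y, k=5))  # 0.8
```

A classifier that never predicts the minority class still reaches 0.8 accuracy on this imbalanced toy set, which is why the per-class P, R, and F₁ scores matter alongside overall accuracy.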

Table 7. Results of fine-tuned DistilBERT with the Random Forest classifier and of the proposed hybrid model on event classification [87].

Approach	(Hybrid) DistilBERT + Random Forest + Rules			DistilBERT + Random Forest
	P	R	F₁	P	R	F₁
Global	1.00	1.00	1.00	1.00	1.00	1.00
Results	1.00	1.00	1.00	0.96	1.00	0.98
Fraud	1.00	1.00	1.00	1.00	1.00	1.00
RatingsAgencies_Experts_View	1.00	0.98	0.99	1.00	0.96	0.98
RBI_Policies	1.00	1.00	1.00	1.00	1.00	1.00
Merger_Or_Acquisition	0.98	1.00	0.99	0.97	1.00	0.98
Governmental	1.00	1.00	1.00	1.00	1.00	1.00

Table 8. Accuracy of fine-tuned DistilBERT with the Random Forest classifier and of the proposed hybrid model on event classification [87].

Classifier	Accuracy (Train/Test)
DistilBERT + Random Forest	0.99
(Hybrid) DistilBERT + Random Forest + Rules	1.00

Table 9. Per-class results of machine learning classifiers with DistilBERT fine-tuned on banking news-event sentiments.

Classifier	Logistic Regression			Random Forest			Decision Tree			Linear SVC
Classifier	P	R	F1	P	R	F1	P	R	F1	P	R	F1
Positive	0.68	0.77	0.72	0.88	0.74	0.81	0.88	0.74	0.81	0.70	0.82	0.75
Negative	0.73	0.59	0.65	0.57	0.72	0.63	0.52	0.62	0.56	0.68	0.65	0.67
Neutral	0.85	0.92	0.88	0.88	0.88	0.88	0.78	0.84	0.81	0.82	0.69	0.75

Table 10. Accuracy of the classifiers with DistilBERT on banking news-event sentiments.

Classifier	Accuracy
Linear SVC	0.73
Decision Tree	0.74
Random Forest	0.78
Logistic Regression	0.76

Table 11. The effect of banking news events with negative polarity on private and public bank stocks.

Event Window	Private Banks		Public Banks
Event Window	Mean CAR	t-Value	Mean CAR	t-Value
(−60, 60)	0.0624	0.961	−0.2481 ***	−4.921
(−30, 30)	0.0181	0.560	−0.1725 ***	−4.301
(−5, 5)	−0.0105 *	−1.653	−0.0830 *	−2.197
(−1, 1)	−0.0061	−0.360	−0.0287 ***	−3.982
(−60, −1)	0.0219	0.312	−0.1486 ***	−5.716
(−30, −1)	−0.0268	−0.929	−0.0955 ***	−3.030
(−5, −1)	−0.0052	−0.307	−0.0464 *	−1.763
(−1, D)	−0.0006	−0.135	−0.0076 *	−1.850
(D, 1)	−0.0053	−0.369	−0.0215 ***	−3.942
(D, 5)	−0.0019	−0.102	−0.0396 **	−2.433
(D, 30)	0.0505 **	2.249	−0.0874 ***	−4.556
(D, 60)	0.0500	1.473	−0.1144 ***	−3.357

Note: *, **, and *** denote statistical significance at the 10%, 5%, and 1% levels, respectively.

Table 12. The effect of banking news events with positive polarity on private and public bank stocks.

Event Window	Private Banks		Public Banks
Event Window	Mean CAR	t-Value	Mean CAR	t-Value
(−60, 60)	−0.0882	−1.160	0.0993	0.544
(−30, 30)	−0.0699 *	−1.710	−0.0688	−1.436
(−5, 5)	−0.0134	−1.030	−0.0231	−0.675
(−1, 1)	−0.0017	−0.180	−0.0036	−0.310
(−60, −1)	−0.0007	−0.015	0.1100	0.540
(−30, −1)	0.0017	0.068	−0.0329	−0.821
(−5, −1)	0.0116	1.335	0.0008	0.040
(−1, D)	0.0001	0.023	0.0133	1.423
(D, 1)	−0.0020	−0.344	−0.0161 **	−2.390
(D, 5)	−0.0246 ***	−2.732	−0.0366 ***	−2.973
(D, 30)	−0.0623 **	−2.463	−0.0339	−0.937
(D, 60)	−0.0599	−1.338	0.0095	0.180

Note: *, **, and *** denote statistical significance at the 10%, 5%, and 1% levels, respectively.

Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Share and Cite

MDPI and ACS Style

Dogra, V.; Alharithi, F.S.; Álvarez, R.M.; Singh, A.; Qahtani, A.M. NLP-Based Application for Analyzing Private and Public Banks Stocks Reaction to News Events in the Indian Stock Exchange. Systems 2022, 10, 233. https://doi.org/10.3390/systems10060233

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers.
