Performance Analysis of Statistical and Supervised Learning Techniques in Stock Data Mining

Nowadays, an overwhelming amount of stock data is available, which is of use only if it is properly examined and mined. In this paper, the last twelve years of ICICI Bank's stock data have been extensively examined using statistical and supervised learning techniques. This study may be of great interest to those who wish to mine or study the stock data of banks or any financial organization. Different statistical measures have been computed to explore the nature, range, distribution, and deviation of the data. The descriptive statistical measures assist in finding valuable metrics such as the mean, variance, skewness, kurtosis, p-value, A-squared value, and 95% confidence interval for the mean of ICICI Bank's stock data. Moreover, the daily percentage changes occurring over the last 12 years have also been recorded and examined. Additionally, the intraday stock status has been mined using ten different classifiers. The performance of the classifiers has been evaluated on the basis of various parameters such as accuracy, misclassification rate, precision, recall, specificity, and sensitivity. Based upon these parameters, the predictive results obtained using logistic regression are more acceptable than the outcomes of the other classifiers, whereas naïve Bayes, C4.5, random forest, linear discriminant, and cubic support vector machine (SVM) merely act as random guessing machines. The outstanding performance of logistic regression has been validated using TOPSIS (technique for order preference by similarity to ideal solution) and WSA (weighted sum approach).


Introduction
The deep statistical analysis of a bank's stock data, along with the performance analysis of different classifiers, can significantly assist financial analysts and data scientists in predicting intraday, weekly, monthly, and future values of the stock. In this manuscript, the stock data of ICICI Bank have been examined using several statistical and supervised learning techniques. ICICI Bank is one of the leading Indian private banks, with more than 4000 branches operating in 19 different countries [1]. The bank offers a range of financial services related to savings, current, and fixed deposit accounts. It also offers a range of loans to rural and urban customers. In the last few years, ICICI Bank has gained considerable trust and confidence from its customers; for example, the TRA Brand Trust Report 2018 placed ICICI at the top of the private Indian banks [2]. Figure 1 presents some of the key figures of ICICI Bank. It is observed that, compared with financial year (FY) 2016, FY 2017 had a good rate of net interest margin. A growth of 0.71% was observed in 2017. For the same period, net NPAs stood at 4.89% (2.67% for FY 2016), which is slightly higher than the industry average for private banks.
Nowadays, a momentous volume of data is produced daily by commercial, administrative, and scientific organizations, which is rapidly expanding the sizes of databases. Social media, the Internet, and Internet of Things (IoT) devices further fuel the growth of these databases [3]. Therefore, the availability of data is no longer the issue; the real challenge lies in transforming this mountain of data into useful information. Statistics and data mining techniques assist in the automatic extraction of hidden data patterns that are otherwise buried in unprocessed data [4]. Data mining is a systematic data processing approach that collects, cleans, processes, examines, and extracts substantial qualities and hidden patterns from data [5,6]. In other words, data mining techniques help extract precise and significant insights from the deluge of data collected from several manual and digital sources [4,7]. Data mining has been used in several domains such as agriculture [7][8][9], finance [4][10][11][12], medical science [13][14][15][16][17], and bioinformatics [18,19].
Chandralekha and Shenbagavadivu examined the performance of different machine learning techniques used to predict cardiovascular disorders. The study compared the performance of both supervised and unsupervised learning techniques on three metrics, namely, accuracy, recall, and precision. The authors found that the results obtained using decision trees were more accurate than those obtained using other machine learning techniques [20]. Belavagi and Muniyal evaluated the performance of different machine learning techniques employed for intrusion detection and found that random forest outperformed the other techniques in identifying intrusions [21].
Previously, different authors have examined the financial and multivariate aspects of ICICI Bank [22,23]. However, no significant work has rigorously analyzed ICICI Bank's stock using a combination of descriptive statistics and supervised learning techniques. The overall aim of this study is to extensively mine and analyze the data of one of the leading private Indian banks (ICICI, Mumbai, India) using descriptive statistics and supervised learning techniques. Statistical techniques have been employed to examine the nature, trend, variation, and distribution of the data. The last five years' key figures of ICICI Bank's stock have been examined to assess the economic status of ICICI Bank. Different descriptive statistical measures, such as mean, standard deviation, variance, skewness, and kurtosis, along with p-values and A-squared values, have been computed for the major attributes of ICICI Bank's stock data. The daily deviation in the opening value of the stock over the last twelve years has been recorded and analyzed to examine the intraday status of ICICI Bank's stock. Moreover, ten different classifiers, namely, naïve Bayes; C4.5; random forest; logistic regression; linear discriminant; and linear, quadratic, cubic, fine Gaussian, and medium Gaussian support vector machines, have been used to classify the intraday status of ICICI Bank's stock data. The performance of the different classifiers has been computed using eight different performance metrics. Additionally, the misclassification rate, sensitivity, and F1-score have also been computed and examined. This study will be beneficial for financial analysts and researchers who wish to extensively mine the financial data of leading Indian private or government banks. Moreover, it can be of potential interest to quantitative traders.
Data 2018, 3, 54

Related Work
Numerous researchers have strived to classify and forecast the future value of stocks using different statistical, data mining, and soft computing techniques. J. Patel et al. predicted stock and stock index movements using trend deterministic data and machine learning [24]. The authors compared the performance of four machine learning approaches, namely, SVM (support vector machine), RF (random forest), NB (naïve Bayes), and ANN (artificial neural network), in predicting the future values of Reliance and Infosys, and found that the prediction rate increases when the trading parameters are represented as trend deterministic data. Al-Radqidah et al. predicted the stock prices of three enterprises listed on the Amman Stock Exchange using ID3 and C4.5 [25]. Özorhan et al. employed a hybrid approach based on SVM and GA (genetic algorithm) to predict the best currency pair for exchange. The authors used primary technical indicators for their analysis and found that combining the raw data with technical financial indicators yields more accurate results [26]. Khedr et al. predicted stock values using news sentiment analysis; the authors classified the results for Yahoo, Microsoft, and Facebook using three approaches, namely, K-NN, SVM, and naïve Bayes [27]. Desai and Gandhi designed a natural language processing (NLP) module for stock forecasting that uses online news to determine the future stock value; NLP was employed to find the polarity of sentences [28]. Zhao and Wang used an outlier data mining technique for stock forecasting, attempting to remove anomalies from the time series, and found that their method generated better long-term forecasts for the Chinese market [29]. Bini and Mathew used clustering and multiple regression techniques for stock forecasting. The objective of clustering was to find a set of companies in which a customer should invest to obtain better returns. Different indexes, such as Jaccard, C, Rand, and Silhouette, were used to validate the results. In general, the focus was on technical analysis, classification, and prediction only [30]. Huang and Gang devised a kernel manifold learning approach for financial data set analysis and found it useful in improving accuracy. Moreover, the objective criteria provided by the kernel manifold learning approach also assist in depicting and predicting the precise volatility of the stock market [31]. Ye and Li reviewed the literature on the role of big data in the capital market. The authors concluded that internet big data plays a significant role in stock analysis through sentiment analysis, but they did not find clear evidence that the capital market can be predicted using internet big data [32]. Khashei and Hajirahimi examined the performance of series and parallel hybridization strategies in forecasting financial time series and found that hybridizing a multilayer perceptron model with ARIMA produces better results than the individual models [33]. Nayak and Misra employed a GA-weighted condensed polynomial neural network (GACPNN) on five stock indexes, namely, BSE, DJIA, NASDAQ, FTSE, and TAIEX. The model was validated using the Diebold-Mariano test and found to produce more accurate results [34].

Methodology
Statistics is a multidisciplinary data exploration approach that has been used effectively in various fields such as engineering, physics, chemistry, economics, finance, commerce, and computer science [35][36][37][38][39]. The effective use of statistical techniques can help examine the nature, distribution, and trends of data. Descriptive and inferential techniques are the two main classes of statistical techniques. Descriptive techniques provide aggregated information; that is, they summarize the average and dispersion of the data. They do not attempt to describe the population from which the sample was taken; rather, they examine the distribution of the data at hand. Measures of central tendency, dispersion, skewness, kurtosis, and correlation are some of the conventional tools of descriptive statistics. Inferential statistics come into the picture when one has to analyze a population of the order of millions, billions, or even more items. For a population of this size, it is not feasible to acquire data for every item. Thus, inferential statistics are used to infer the nature of the entire population from a sample. Estimation and hypothesis testing are common methods of inferential statistics [40].
Supervised learning (classification) is an important data mining technique that assists in categorizing data into meaningful classes. Classification is a learning process in which a function F maps each instance of a data set to one of a set of predefined classes. There are two types of classification modeling, namely, descriptive and predictive. Descriptive modeling assists in finding a set of features that can effectively be used to distinguish different classes, whereas predictive modeling is used to forecast the unknown class of data instances. These techniques are most effective for binary or nominal classes and are less suitable for ordinal classes [41]. Previous research shows that many supervised learning techniques have been used to solve different data mining problems [5,6,41]. Some of the dominant classification techniques are briefly introduced in the remainder of this section.
Naïve Bayes is a conditional probability-based classifier that is highly scalable and gives equal importance to all attributes of the classification problem, assuming they are conditionally independent given the class [42]. C4.5 is a decision tree-based technique that employs a top-down, recursive, divide-and-conquer approach for data classification [43,44]. Random forest is an ensemble-based classifier that can be used for exploration, classification, and prediction on large and complex databases [45,46]. SVM is a discriminative classifier in which classification is based on decision planes (hyperplanes in multidimensional space) and their boundaries; it is effectively used for both classification and regression. On the basis of the kernel function, SVM can be further categorized as linear, quadratic, cubic, fine Gaussian, and medium Gaussian SVM [47,48]. In its basic (binomial) form, logistic regression is a binary classification technique; problems with more than two classes require its multinomial extension. It is well suited to real-life problems such as spam detection and banking, health, and marketing applications. Linear discriminant analysis (LDA), by contrast, is a statistical classifier that natively handles data that have to be categorized into two or more classes [49,50].
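Since logistic regression emerged as the strongest classifier in this study, a minimal sketch of how a binary logistic regression model is fitted may help fix ideas. The pure-Python version below minimizes log-loss by batch gradient descent; the function names, learning rate, epoch count, and toy data are illustrative assumptions, and this is not the toolbox implementation the authors used.

```python
import math
from typing import List, Sequence


def sigmoid(z: float) -> float:
    """Numerically stable logistic function mapping a real number to (0, 1)."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def fit_logistic(X: Sequence[Sequence[float]], y: Sequence[int],
                 lr: float = 0.1, epochs: int = 2000) -> List[float]:
    """Fit weights [bias, w1, ..., wd] by batch gradient descent on log-loss.

    lr and epochs are illustrative assumptions, not tuned values.
    """
    d = len(X[0])
    w = [0.0] * (d + 1)
    n = len(X)
    for _ in range(epochs):
        grad = [0.0] * (d + 1)
        for xi, yi in zip(X, y):
            p = sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], xi)))
            err = p - yi  # derivative of log-loss with respect to the linear score
            grad[0] += err
            for j, xj in enumerate(xi):
                grad[j + 1] += err * xj
        w = [wj - lr * g / n for wj, g in zip(w, grad)]
    return w


def predict(w: Sequence[float], x: Sequence[float]) -> int:
    """Return class 1 when the predicted probability is at least 0.5."""
    return int(sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], x))) >= 0.5)


# Hypothetical one-dimensional toy feature; label 1 = profitable day.
X = [[-2.0], [-1.0], [1.0], [2.0]]
y = [0, 0, 1, 1]
w = fit_logistic(X, y)
print([predict(w, x) for x in X])  # [0, 0, 1, 1]
```

Gradient descent is used here only because it keeps the sketch dependency-free; production libraries typically use quasi-Newton or coordinate-descent solvers with regularization.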
In addition to the classification techniques, ICICI Bank's stock data were also examined using distinct statistical measures of central tendency and dispersion. Skewness and kurtosis were analyzed to investigate the shape of the distribution of different attributes of the stock data. Additionally, the A-squared values and p-values of different attributes of ICICI Bank's stock were investigated. Furthermore, the data were classified using different classifiers, namely, naïve Bayes; C4.5; random forest; logistic regression; linear discriminant; and linear, quadratic, cubic, fine Gaussian, and medium Gaussian SVM.

Data Set
The last twelve years' (2007 to 2018) stock data of ICICI Bank, extracted from Yahoo Finance, were extensively analyzed using several statistical and supervised learning techniques. There were 2714 distinct instances with seven different attributes. For a precise analysis, the instances containing missing values were eliminated, leaving 2706 instances for analysis. The status attribute represents the intraday investment outcome: if the closing value of ICICI Bank's stock on a given day is higher than its opening value, the day is profitable for the investor; otherwise, it represents a loss. The data were examined from different statistical perspectives. Moreover, different classifiers were employed to classify 2714 distinct instances, and their performance was examined on the basis of different parameters.
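The labeling rule above (a day is profitable when the closing value exceeds the opening value) and the removal of incomplete instances can be sketched as follows. The row layout (dictionaries with `Open` and `Close` keys, `None` marking a missing value), the function name `clean_and_label`, and the sample values are hypothetical; the source does not say how days with equal opening and closing values were labeled, so treating them as non-profitable is an assumption.

```python
from typing import Dict, List, Optional

Row = Dict[str, Optional[float]]


def clean_and_label(rows: List[Row]) -> List[Dict[str, object]]:
    """Drop rows with any missing value, then attach the intraday status label."""
    labeled = []
    for row in rows:
        if any(v is None for v in row.values()):
            continue  # instances with missing values are eliminated
        # Assumption: a tie (close == open) is treated as a non-profitable day.
        status = "profit" if row["Close"] > row["Open"] else "loss"
        labeled.append({**row, "Status": status})
    return labeled


sample = [
    {"Open": 300.0, "Close": 305.5},
    {"Open": 310.0, "Close": None},   # missing value, so this row is dropped
    {"Open": 304.0, "Close": 301.2},
]
for r in clean_and_label(sample):
    print(r["Open"], r["Close"], r["Status"])
```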

Results
Different descriptive statistical measures were examined to explore the nature of ICICI Bank's stock data. Table 1 presents four essential attributes of ICICI Bank's stock data, along with the values of several descriptive statistical measures.

From the data, it is also clear that the maximum and minimum low-high variations were recorded in 2008 and 2013, respectively. However, the common range of variation lies between 90 and 120. Furthermore, over the last twelve years, 2009 appears to have been a very negative year for sellers, as it had the lowest minimum and maximum opening values. However, the variation between the minimum and maximum was found to be significant (264.2%). The year 2018 represents a sound state for investors who invested their money in previous years, as the maximum opening value (Rs. 360.8) was achieved in this year. Additionally, the maximum and minimum variations in opening value were recorded in 2008 and 2013, respectively. Figure 4 summarizes the descriptive statistical measures for the high-low variation of ICICI Bank's stock. The mean, minimum, and maximum values of the high-low variation were found to be 6.36, 0, and 42.4, respectively. This means that the intraday variation in ICICI Bank's stock ranged between 0 and 42.4 Indian rupees.
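The high-low summary reported above (mean, minimum, and maximum of the daily difference between the high and low values) can be computed with a few lines of standard-library Python. The function name `high_low_summary` is hypothetical, and the daily values are made up for illustration; they are not ICICI Bank data.

```python
from statistics import mean
from typing import List, Tuple


def high_low_summary(highs: List[float], lows: List[float]) -> Tuple[float, float, float]:
    """Return (mean, min, max) of the intraday high-low variation."""
    if len(highs) != len(lows):
        raise ValueError("highs and lows must have the same length")
    variation = [h - l for h, l in zip(highs, lows)]
    return mean(variation), min(variation), max(variation)


highs = [310.0, 305.0, 298.5]
lows = [300.0, 305.0, 290.5]
print(high_low_summary(highs, lows))  # (6.0, 0.0, 10.0)
```

A minimum of 0, as in the reported summary, corresponds to a day on which the high and low values coincided.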

Evaluation Criteria and Analysis of Different Classifiers
The performance of predictive classification models is based upon the numbers of correctly and incorrectly classified instances. A confusion matrix summarizes the performance of a classifier by recording the number and types of errors made during classification, in terms of the following four outcomes:

• Positive instances classified as positive (TP)
• Positive instances classified as negative (FN)
• Negative instances classified as negative (TN)
• Negative instances classified as positive (FP)

Several metrics can be computed from the confusion matrix, such as accuracy, precision, recall, F1-score, specificity, and sensitivity, to assess the performance of classifiers from different perspectives.
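The four confusion-matrix outcomes can be counted directly from paired lists of actual and predicted labels. This sketch uses the standard convention, in which FN denotes a positive instance classified as negative and FP a negative instance classified as positive. It assumes a binary encoding where 1 is the positive class (for example, a profitable day) and 0 the negative class; the function name and label vectors are hypothetical.

```python
from typing import Dict, Sequence


def confusion_counts(actual: Sequence[int], predicted: Sequence[int]) -> Dict[str, int]:
    """Count TP, FN, TN, and FP for binary labels (1 = positive, 0 = negative)."""
    counts = {"TP": 0, "FN": 0, "TN": 0, "FP": 0}
    for a, p in zip(actual, predicted):
        if a == 1 and p == 1:
            counts["TP"] += 1   # positive classified as positive
        elif a == 1 and p == 0:
            counts["FN"] += 1   # positive classified as negative
        elif a == 0 and p == 0:
            counts["TN"] += 1   # negative classified as negative
        else:
            counts["FP"] += 1   # negative classified as positive
    return counts


actual = [1, 1, 0, 0, 1, 0]
predicted = [1, 0, 0, 1, 1, 0]
print(confusion_counts(actual, predicted))
# {'TP': 2, 'FN': 1, 'TN': 2, 'FP': 1}
```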

Accuracy is the most intuitive performance metric; it represents the ratio of correctly predicted observations to the total number of observations, that is,
$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \qquad (1)$$
The rate of misclassification is an important measure for classification techniques. It is based upon three parameters of the confusion matrix, namely, true positives, true negatives, and the total number of instances. A classifier with a zero misclassification rate would be perfect and preferred; however, because of the presence of noise in data, such a classifier is difficult to find. Mathematically, the rate of misclassification, denoted as err, is computed as
$$err = \frac{N - (TP + TN)}{N} = 1 - \text{Accuracy}$$
Here, TP, TN, and N represent true positives, true negatives, and the total number of instances, respectively. Mean absolute error (MAE) represents the magnitude of the average absolute error between the actual labels $y_i$ and the predicted labels $\hat{y}_i$. Mathematically,
$$MAE = \frac{1}{N}\sum_{i=1}^{N} \lvert y_i - \hat{y}_i \rvert$$
F1 denotes the harmonic mean (a weighted average) of precision and recall:
$$F1 = \frac{2 \cdot \text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}}$$
It should be noted that a high F1-score does not by itself guarantee that the classifier performs well; this depends upon the circumstances.
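The accuracy, misclassification-rate, MAE, and F1 formulas can be checked with small helpers. The function names and the counts in the example are hypothetical. For 0/1 labels, MAE equals the misclassification rate, because every mismatch contributes exactly 1 to the sum.

```python
from typing import Sequence


def accuracy(tp: int, tn: int, fp: int, fn: int) -> float:
    """Share of correctly classified instances."""
    return (tp + tn) / (tp + tn + fp + fn)


def misclassification_rate(tp: int, tn: int, n: int) -> float:
    """err = (N - (TP + TN)) / N."""
    return (n - (tp + tn)) / n


def mae(actual: Sequence[int], predicted: Sequence[int]) -> float:
    """Mean absolute error between actual and predicted labels."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def f1_score(tp: int, fp: int, fn: int) -> float:
    """Harmonic mean of precision TP/(TP+FP) and recall TP/(TP+FN)."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


tp, tn, fp, fn = 40, 45, 5, 10   # hypothetical counts
n = tp + tn + fp + fn
print(accuracy(tp, tn, fp, fn))            # 0.85
print(misclassification_rate(tp, tn, n))   # 0.15
print(round(f1_score(tp, fp, fn), 4))      # 0.8421
print(mae([1, 0, 1, 1], [1, 1, 1, 0]))     # 0.5
```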
In this section, the performance of ten different classifiers in classifying the instances of ICICI Bank's stock data is examined. Tables 2-11 present the values of different metrics, such as FP, TP, TN, and FN, along with the numbers of correctly and incorrectly classified instances, accuracy, precision, recall, and F1-score of the different supervised classifiers on the ICICI Bank stock data set. From Tables 2-11, it is observed that the classification rates of naïve Bayes; C4.5; random forest; logistic regression; linear discriminant; and linear, quadratic, cubic, fine Gaussian, and medium Gaussian support vector machines lie between 47.6% and 48.9%, 53.6% and 53.6%, 53.0% and 53.6%, 99.7% and 99.8%, 53.6% and 53.6%, 98.7% and 99.8%, 91.0% and 93.9%, 75.0% and 91.7%, 78.2% and 79.9%, and 69.6% and 72.4%, respectively. To examine the performance of the classifiers precisely, K-fold cross-validation was used. In K-fold cross-validation, the data are first decomposed into K mutually exclusive, equal-sized folds (subsets). In 5-fold cross-validation, the data are decomposed into five folds (F1, F2, F3, F4, and F5), and training and testing are carried out five times. In the first iteration, fold F1 acts as the test set and the remaining four folds as the training set; in the second iteration, F2 acts as the test set and the remaining folds are used for training, and so on until each fold has been used once for testing. The data were mined by varying the number of folds from 5 to 10. The experimental analysis shows that logistic regression, followed by linear SVM, is the best-suited classifier for ICICI Bank's stock analysis, whereas NB, C4.5, RF, LD, and CSVM (cubic SVM) merely act as random guessing machines. The accuracy achieved using logistic regression lies between 99.7% and 99.8%. Moreover, this classifier had high precision as well as high recall. Naïve Bayes appears to be merely a guessing machine, as it has the lowest accuracy among all the classifiers; its classification rate ranged between 47.6% and 48.9%. In addition, when precision was considered, C4.5 seemed to be the best classifier.
The precision achieved by C4.5 with six- to ten-fold cross-validation remained constant and, surprisingly, was 100%. However, this classifier utterly failed to classify true negative cases. Additionally, the rate of false negative cases for naïve Bayes was extremely high. As with accuracy, logistic regression also showed outstanding performance in terms of F1-score.
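The K-fold cross-validation procedure used in this section can be sketched as an index generator: the instances are split into K mutually exclusive folds whose sizes differ by at most one, and each fold serves as the test set exactly once. The function name `k_fold_indices` is hypothetical, and the sketch neither shuffles nor stratifies the data; the source does not state whether the authors shuffled before splitting.

```python
from typing import Iterator, List, Tuple


def k_fold_indices(n: int, k: int) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train_indices, test_indices) for each of the k folds."""
    if not 2 <= k <= n:
        raise ValueError("k must be between 2 and n")
    # Distribute the remainder so that fold sizes differ by at most one.
    sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in sizes:
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n))
        yield train, test
        start += size


for train, test in k_fold_indices(n=10, k=5):
    print(test)
```

With K = 5 and the 2706 instances retained after cleaning, the first fold would hold 542 instances and the other four 541 each; varying K from 5 to 10 only changes the `k` argument.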
The numbers of correctly and incorrectly classified instances achieved using the different classifiers are depicted in Figure 5. Additionally, the analysis of sensitivity, specificity, F1-score, and recall is presented in Figure 7. Selecting the best classifier against several performance criteria at once is a multi-criteria decision (MCD) problem. This MCD problem has been solved using two different techniques, namely, TOPSIS (technique for order preference by similarity to ideal solution) and WSA (weighted sum approach) [51,52]. The ranking of the different approaches based upon seven different criteria is presented in Table 13.

Ranking Using MCD Techniques
Table 13. Ranking of different classification approaches using the weighted sum approach (WSA) and the technique for order preference by similarity to ideal solution (TOPSIS).
The WSA method is based on the utility maximization principle: it ranks the alternatives on the basis of their total utility across all the chosen criteria. In TOPSIS, $d_i^{+}$ and $d_i^{-}$ represent the distances of alternative i from the ideal and basal variants, respectively:
$$d_i^{+} = \sqrt{\sum_{j} (v_{ij} - H_j)^2}, \qquad d_i^{-} = \sqrt{\sum_{j} (v_{ij} - D_j)^2}$$
where $v_{ij}$ denotes the weighted normalized value of alternative i on criterion j, and $H_j$ and $D_j$ are the maximum or minimum values of criterion j corresponding to the ideal or basal variants.

Finally, the relative closeness of each alternative to the ideal solution, $C_i$, is calculated as
$$C_i = \frac{d_i^{-}}{d_i^{+} + d_i^{-}}$$
In order to get a realistic picture of the predicted rate of return, the ICICI stock data were also predicted using linear and multiple regression. Table 14 presents the difference between the actual and predicted rates of return obtained using linear and multiple regression. Here, the rate of return was computed for the month of February 2018, with the buy-and-hold period fixed at one month. From Table 14, it was found that the results obtained using linear regression were more precise than those obtained using multiple regression, as the difference between the actual and predicted rates of return was very small for linear regression.
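The WSA and TOPSIS rankings described in this section can be sketched in standard-library Python. This is a minimal sketch under stated assumptions: WSA normalizes benefit criteria by dividing by the column best and cost criteria as best/value (so all values must be positive), TOPSIS uses vector normalization, and the equal weights, the three hypothetical alternatives, and their criterion values are illustrative rather than the data behind Table 13. Function names are hypothetical.

```python
import math
from typing import List, Sequence


def wsa_scores(matrix: Sequence[Sequence[float]], weights: Sequence[float],
               benefit: Sequence[bool]) -> List[float]:
    """Weighted sum of linearly normalized values (assumes all values > 0)."""
    cols = list(zip(*matrix))
    best = [max(c) if b else min(c) for c, b in zip(cols, benefit)]
    return [
        sum(w * (v / bst if b else bst / v)
            for w, v, bst, b in zip(weights, row, best, benefit))
        for row in matrix
    ]


def topsis_closeness(matrix: Sequence[Sequence[float]], weights: Sequence[float],
                     benefit: Sequence[bool]) -> List[float]:
    """Relative closeness C_i = d_i^- / (d_i^+ + d_i^-) for each alternative."""
    norms = [math.sqrt(sum(v * v for v in col)) for col in zip(*matrix)]
    weighted = [[w * v / nrm for w, v, nrm in zip(weights, row, norms)] for row in matrix]
    cols = list(zip(*weighted))
    # H_j: ideal value per criterion; D_j: basal (negative-ideal) value.
    ideal = [max(c) if b else min(c) for c, b in zip(cols, benefit)]
    basal = [min(c) if b else max(c) for c, b in zip(cols, benefit)]
    closeness = []
    for row in weighted:
        d_plus = math.sqrt(sum((v - h) ** 2 for v, h in zip(row, ideal)))
        d_minus = math.sqrt(sum((v - d) ** 2 for v, d in zip(row, basal)))
        closeness.append(d_minus / (d_plus + d_minus))
    return closeness


def rank(scores: Sequence[float]) -> List[int]:
    """Rank 1 = best (highest score)."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    ranks = [0] * len(scores)
    for position, i in enumerate(order, start=1):
        ranks[i] = position
    return ranks


# Hypothetical alternatives scored on accuracy, precision, misclassification rate.
criteria = [
    [0.99, 0.99, 0.01],   # classifier A
    [0.90, 0.92, 0.10],   # classifier B
    [0.50, 0.55, 0.50],   # classifier C
]
weights = [1 / 3, 1 / 3, 1 / 3]
benefit = [True, True, False]   # misclassification rate is a cost criterion
print(rank(wsa_scores(criteria, weights, benefit)))        # [1, 2, 3]
print(rank(topsis_closeness(criteria, weights, benefit)))  # [1, 2, 3]
```

Treating the misclassification rate as a cost criterion (smaller is better) is the main design choice; if it were mistakenly treated as a benefit, both methods would reward error.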

Conclusions
ICICI Bank's stock was extensively examined using different statistical and supervised learning techniques. The large negative variation observed in five years (2008, 2011, 2015, 2016, and 2017) indicates that momentous intraday losses were recorded in these years. The negative kurtosis values indicate that the distribution curves are platykurtic, that is, flatter than a normal distribution. The lowest minimum and maximum opening values were recorded in 2009; therefore, long-term investors who invested their money in that year are likely to have achieved a good rate of return. The year 2018 represented a sound state of ICICI Bank's stock, as the maximum opening value of Rs. 360.8 was achieved in this year; therefore, this year should not be seen as a favorable year for new investment. This study can be extended to predict the daily, weekly, and monthly future values of ICICI Bank's stock. Furthermore, it was observed that the classification rates of naïve Bayes; C4.5; random forest; logistic regression; linear discriminant; and linear, quadratic, cubic, fine Gaussian, and medium Gaussian support vector machines lie between 47.6% and 48.9%, 53.6% and 53.6%, 53.0% and 53.6%, 99.7% and 99.8%, 53.6% and 53.6%, 98.7% and 99.8%, 91.0% and 93.9%, 75.0% and 91.7%, 78.2% and 79.9%, and 69.6% and 72.4%, respectively. The performance of logistic regression was outstanding compared with the other classifiers, and this was validated using two multi-criteria decision techniques, namely, TOPSIS and WSA, whose rankings confirmed the outstanding performance of logistic regression. In addition, the average values of the major attributes (open, close, low, and high) lie between 198.43 and 204.79. Moreover, based upon the performance of the different classifiers, a novel ensemble-based classifier could be designed. As far as the rate of return is concerned, the results produced using linear regression are better than those obtained using multiple regression.

Figure 1. Key figures of ICICI Bank's stock. FY: financial year.

Figure 2 depicts a brief, consolidated picture of the major attributes of ICICI Bank's stock data. It is observed that over the last 12 years, the opening values ranged from 47.95 to 360.80 and the closing values from 47.81 to 362.30. A significant variation (652.45%) between the minimum and maximum opening values of ICICI Bank's stock has been witnessed.
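The reported 652.45% variation can be reproduced from the two opening values, assuming the variation is measured as the percentage increase relative to the minimum value (this assumption matches the reported figure). The function name is hypothetical.

```python
def percent_variation(minimum: float, maximum: float) -> float:
    """Percentage increase from the minimum to the maximum value."""
    return (maximum - minimum) / minimum * 100


print(round(percent_variation(47.95, 360.80), 2))  # 652.45
```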


Figure 3 depicts the daily percentage change recorded over the last 12 years.It is observed that in the years 2008, 2011, 2015, 2016, and 2017, large numbers of negative daily percentage changes were witnessed.
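Daily percentage changes of the kind plotted in Figure 3 can be computed as the change between consecutive trading days of a single price series. The source does not state which series was used; since the introduction mentions the daily deviation in the opening value, this sketch applies the formula 100 × (value_t − value_{t−1}) / value_{t−1} to hypothetical opening values and counts the negative days. The function name is hypothetical.

```python
from typing import List, Sequence


def daily_pct_change(values: Sequence[float]) -> List[float]:
    """Percentage change between consecutive trading days."""
    return [(cur - prev) / prev * 100 for prev, cur in zip(values, values[1:])]


opens = [200.0, 210.0, 189.0, 189.0]   # hypothetical opening values
changes = daily_pct_change(opens)
print([round(c, 2) for c in changes])      # [5.0, -10.0, 0.0]
print(sum(1 for c in changes if c < 0))    # number of negative days: 1
```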

Figure 4. Summary of variation in the high-low values of ICICI Bank's stock.

Sensitivity and specificity are computed to examine the rates of correctly identified true positive and true negative instances:

$$\text{Sensitivity (TP rate)} = \frac{TP}{TP + FN}$$

$$\text{Specificity (TN rate)} = \frac{TN}{TN + FP}$$

Precision and recall are computed to determine the exactness and the completeness of the classifier, respectively:

$$\text{Precision} = \frac{TP}{TP + FP} \tag{5}$$

$$\text{Recall} = \frac{TP}{TP + FN} \tag{6}$$
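The measures above, together with accuracy and misclassification rate, can all be derived from a binary confusion matrix. A minimal sketch, assuming a two-class problem (for example, the intraday status going up or down); the class and function names and the counts in the example are hypothetical:

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class ConfusionMatrix:
    """Counts for a binary classifier."""
    tp: int
    fp: int
    tn: int
    fn: int


def classification_metrics(cm: ConfusionMatrix) -> Dict[str, float]:
    """Compute the evaluation measures used in this section from a confusion matrix."""
    total = cm.tp + cm.fp + cm.tn + cm.fn
    accuracy = (cm.tp + cm.tn) / total
    return {
        "accuracy": accuracy,
        "misclassification_rate": 1.0 - accuracy,
        "sensitivity": cm.tp / (cm.tp + cm.fn),  # TP rate, identical to recall
        "specificity": cm.tn / (cm.tn + cm.fp),  # TN rate
        "precision": cm.tp / (cm.tp + cm.fp),
        "recall": cm.tp / (cm.tp + cm.fn),
    }


if __name__ == "__main__":
    # Hypothetical counts, not taken from the paper's experiments.
    cm = ConfusionMatrix(tp=40, fp=10, tn=45, fn=5)
    for name, value in classification_metrics(cm).items():
        print(f"{name}: {value:.3f}")
```

Note that sensitivity and recall are the same quantity; the two names come from different literatures.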

Figure 5. Number of correctly classified instances using different classifiers.

Figure 6 depicts the misclassification rates of the different classifiers. Naïve Bayes had the highest misclassification rate, whereas logistic regression and linear SVM had the lowest.
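Since misclassification rate equals 100% minus accuracy, the ordering above can be cross-checked against the accuracy ranges reported in the conclusion. This sketch assumes the figure reflects each classifier's best run (the upper end of each reported range), which the text does not state explicitly; the names are our own.

```python
from typing import Dict, List, Tuple

# Upper end of each accuracy range (%) reported in the conclusion.
BEST_ACCURACY: Dict[str, float] = {
    "naïve Bayes": 48.9,
    "C4.5": 53.6,
    "random forest": 53.6,
    "logistic regression": 99.8,
    "linear discriminant": 53.6,
    "linear SVM": 99.8,
    "quadratic SVM": 93.9,
    "cubic SVM": 91.7,
    "fine Gaussian SVM": 79.9,
    "medium Gaussian SVM": 72.4,
}


def misclassification_rates(accuracy: Dict[str, float]) -> Dict[str, float]:
    """Misclassification rate (%) = 100 - accuracy (%), rounded to one decimal place."""
    return {name: round(100.0 - acc, 1) for name, acc in accuracy.items()}


def extremes(rates: Dict[str, float]) -> Tuple[List[str], List[str]]:
    """Return (classifiers with the highest rate, classifiers with the lowest rate)."""
    highest_value, lowest_value = max(rates.values()), min(rates.values())
    highest = sorted(name for name, r in rates.items() if r == highest_value)
    lowest = sorted(name for name, r in rates.items() if r == lowest_value)
    return highest, lowest


if __name__ == "__main__":
    print(extremes(misclassification_rates(BEST_ACCURACY)))
```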

Table 1. Descriptive statistical measures of ICICI Bank's stock.

Table 1 lists four essential attributes of ICICI Bank's stock data, along with the values of several descriptive statistical measures. The average values of the open, close, low, and high attributes lie between 198.43 and 204.79. The 95% confidence interval for the mean of the open attribute is 199.50 to 204.13. Applying the standard deviation through the empirical (68-95-99.7) rule, approximately 68% of the opening values over the last twelve years lie between 140.47 and 263.17, 95% between 79.12 and 324.52, and 99.7% between 17.77 and 385.87. The negative kurtosis values of the open, close, low, and high attributes indicate that their distributions are platykurtic, i.e., flatter than a normal distribution, so these empirical-rule percentages should be read as approximations.
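The measures discussed above can be reproduced with a short sketch. It assumes uncorrected moment-based skewness and excess kurtosis and a normal-approximation 95% confidence interval (critical value 1.96, reasonable for roughly 3000 trading days); statistical packages often apply small-sample corrections, so their values may differ slightly. The mean of 201.82 and standard deviation of 61.35 used in the example are inferred from the midpoint and half-width of the reported 68% interval, not stated in the text, and the function names are our own.

```python
import math
from statistics import mean, stdev
from typing import Dict, Sequence, Tuple


def describe(values: Sequence[float]) -> Dict[str, float]:
    """Mean, sample SD, moment skewness, excess kurtosis, and a 95% CI of the mean."""
    n = len(values)
    m = mean(values)
    s = stdev(values)  # sample standard deviation (n - 1 denominator)
    m2 = sum((x - m) ** 2 for x in values) / n
    m3 = sum((x - m) ** 3 for x in values) / n
    m4 = sum((x - m) ** 4 for x in values) / n
    half_width = 1.96 * s / math.sqrt(n)  # normal approximation
    return {
        "mean": m,
        "sd": s,
        "skewness": m3 / m2 ** 1.5,
        "excess_kurtosis": m4 / m2 ** 2 - 3.0,  # negative means platykurtic
        "ci95_low": m - half_width,
        "ci95_high": m + half_width,
    }


def empirical_rule(m: float, s: float) -> Dict[int, Tuple[float, float]]:
    """Intervals mean +/- k*SD for k = 1, 2, 3 (about 68%, 95%, 99.7% under normality)."""
    return {k: (round(m - k * s, 2), round(m + k * s, 2)) for k in (1, 2, 3)}


if __name__ == "__main__":
    # Mean and SD inferred from the 68% interval for the open attribute.
    print(empirical_rule(201.82, 61.35))
    # A uniform-looking toy sample is platykurtic (negative excess kurtosis).
    print(describe([1, 2, 3, 4, 5]))
```

The empirical-rule check shows that the three reported intervals share one mean and standard deviation, which is why the upper bound of the 99.7% interval must be 385.87.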

Table 4. Performance analysis of random forest.

Table 5. Performance analysis of logistic regression.

Table 7. Performance analysis of linear support vector machine (SVM).

Table 8. Performance analysis of quadratic SVM.

Table 9. Performance analysis of cubic SVM.

Table 10. Performance analysis of fine Gaussian SVM.

Table 11. Performance analysis of medium Gaussian SVM.

Table 12 summarizes the performance of the different classifiers in classifying ICICI Bank's stock data. It represents a multi-criteria decision problem with ten alternatives (the classifiers) and seven performance criteria. The table was created from Table 4 by taking the best value obtained across the different cross-validation folds.
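TOPSIS ranks each alternative by its relative closeness to an ideal solution. The sketch below follows the standard formulation (vector normalization, weighting, Euclidean distances to the positive-ideal and negative-ideal solutions); it is not the authors' code, and the equal weights, the benefit/cost labels, and the reduced two-criterion matrix in the example are illustrative assumptions rather than Table 12. The matrix uses the upper ends of the reported accuracy ranges for three classifiers, with misclassification rate taken as 100 minus accuracy.

```python
import math
from typing import List, Sequence


def topsis(
    matrix: Sequence[Sequence[float]],
    weights: Sequence[float],
    benefit: Sequence[bool],
) -> List[float]:
    """TOPSIS closeness coefficients in [0, 1] for each alternative (row); higher is better."""
    n_criteria = len(weights)
    # 1. Vector-normalize each criterion column, then apply the weights.
    norms = [math.sqrt(sum(row[j] ** 2 for row in matrix)) for j in range(n_criteria)]
    weighted = [[weights[j] * row[j] / norms[j] for j in range(n_criteria)] for row in matrix]
    columns = list(zip(*weighted))
    # 2. Positive-ideal and negative-ideal solutions for each criterion.
    ideal = [max(c) if benefit[j] else min(c) for j, c in enumerate(columns)]
    anti_ideal = [min(c) if benefit[j] else max(c) for j, c in enumerate(columns)]
    # 3. Euclidean distances to both ideals, then relative closeness to the ideal.
    scores = []
    for row in weighted:
        d_plus = math.dist(row, ideal)
        d_minus = math.dist(row, anti_ideal)
        scores.append(d_minus / (d_plus + d_minus))
    return scores


if __name__ == "__main__":
    names = ["logistic regression", "naïve Bayes", "quadratic SVM"]
    # Columns: accuracy (%) [benefit], misclassification rate (%) [cost].
    matrix = [[99.8, 0.2], [48.9, 51.1], [93.9, 6.1]]
    scores = topsis(matrix, weights=[0.5, 0.5], benefit=[True, False])
    for name, score in sorted(zip(names, scores), key=lambda p: p[1], reverse=True):
        print(f"{name}: {score:.3f}")
```

The closeness coefficient is undefined only if every alternative is identical on every criterion (both distances are zero), which does not occur with real classifier results.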

Table 14. Difference between actual and predicted rate of return.