# Applied Identification of Industry Data Science Using an Advanced Multi-Componential Discretization Model


## Abstract


## 1. Introduction

## 2. Literature Review

#### 2.1. Applied Financial Ratio from Scientific Data

#### 2.2. Decision Tree Learning

#### 2.3. K-Nearest Neighbors Algorithm

#### 2.4. Applied Ensemble Learning of Stacking Classifier

#### 2.5. Radial Basis Function Network

#### 2.6. Applied Naïve Bayes

## 3. Methods and Materials

#### 3.1. Background of the Applied Study Framework

#### 3.2. Algorithm of the Applied Hybrid Models Proposed

## 4. Experimental Data Analysis and Research Findings

#### 4.1. Empirical Results with Implication

#### 4.2. Rule-based Knowledge Construction

1. **Rule 1**: IF X10 > −0.01 THEN Class = P.
2. **Rule 2**: IF X10 ≤ −0.01 and X10 ≤ −0.20 THEN Class = N.
3. **Rule 3**: IF X10 ≤ −0.01 and X10 > −0.20 and X10 ≤ −0.07 THEN Class = N.
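Expressed as code, the two-class rule set is a simple threshold predicate on X10 (operating income per share). The sketch below is illustrative: the function name is ours, and the interval −0.07 < X10 ≤ −0.01, which the listed rules leave uncovered, is defaulted to P as an assumption.

```python
def classify_two_class(x10: float) -> str:
    """Two-class EPS rules induced from the decision tree (X10 = operating income per share)."""
    if x10 > -0.01:
        return "P"   # Rule 1: positive EPS
    if x10 <= -0.20:
        return "N"   # Rule 2: negative EPS
    if x10 <= -0.07:
        return "N"   # Rule 3: negative EPS (covers -0.20 < X10 <= -0.07)
    return "P"       # interval (-0.07, -0.01] is not covered by the listed rules; assumed P
```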

1. **Rule 1**: IF X10 ≤ −0.01 THEN Class = A (semantically low EPS).
2. **Rule 2**: IF −0.01 < X10 ≤ 4.23 THEN Class = B (semantically medium EPS).
3. **Rule 3**: IF X10 > 4.23 and X20 ≤ 97.67 THEN Class = B (semantically medium EPS).
4. **Rule 4**: IF X10 > 4.23 and X20 > 97.67 THEN Class = C (semantically high EPS).
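The three-class rule set can likewise be sketched as a cascade of threshold tests; the function name is ours, and the thresholds are taken directly from the rules above.

```python
def classify_three_class(x10: float, x20: float) -> str:
    """Three-class EPS rules (X10 = operating income per share, X20 = times interest earned)."""
    if x10 <= -0.01:
        return "A"   # Rule 1: low EPS
    if x10 <= 4.23:
        return "B"   # Rule 2: medium EPS (-0.01 < X10 <= 4.23)
    if x20 <= 97.67:
        return "B"   # Rule 3: medium EPS (X10 > 4.23, low interest coverage)
    return "C"       # Rule 4: high EPS (X10 > 4.23, X20 > 97.67)
```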

#### 4.3. Helpful Research Findings and Management Implications

#### 4.4. Closing the Gap Report from the Applied Hybrid Models Proposed

#### 4.5. Research Limitations of the Study

1. A significant shortfall of this study is that the sample contained only companies listed on the TWSE. Other investment choices, such as companies traded over the counter and on the emerging stock market, were not considered, and with 969 such companies currently operating this is a sizeable omission.
2. Another limitation is that the pool of possible predictors was confined to company financial statements; for the task of forecasting EPS, a broader range of information sources could be considered.
3. The financial ratios and variables were predefined and calculated from the financial statements in online databases; researchers therefore need professional knowledge of the industry background and environment to interpret the financial topics properly.
4. The classifiers were limited to the five that were used and validated: DT-C4.5, KNNs, STK, RBFN, and NB. Future studies may consider other classification algorithms, in both hybrid and stand-alone models, for a more generalized application to EPS identification in financial diagnosis.

## 5. Discussion

## 6. Conclusions

## Author Contributions

## Funding

## Acknowledgments

## Conflicts of Interest

## References


**Figure 1.** Flowchart of the applied hybrid models proposed, with corresponding tools in detailed steps.

Stage (Component) | Model A | Model B | Model C | Model D | Model E
---|---|---|---|---|---
1. Data-preprocessing | √ | √ | √ | √ | √
2. Data-discretization | | √ | | √(1) | √(2)
3. Feature-selection | | | √ | √(2) | √(1)
4. Measurement of data-split methods | √ | √ | √ | √ | √
5. Classifiers | √ | √ | √ | √ | √
6. Measurement of rule-based knowledge | √ | √ | √ | √ | √
7. Measurement of times of running experiments | √ | √ | √ | √ | √
8. Measurement of time-lag effects | √ | √ | √ | √ | √
9. Measurement of types of classes | √ | √ | √ | √ | √

Code | Feature | Type | Min. | Max. | Mean (μ) | S.D. (σ)
---|---|---|---|---|---|---
X1 | Year | Symbolic | - | - | - | -
X2 | Season | Symbolic | - | - | - | -
X3 | Industrial classification | Symbolic | - | - | - | -
X4 | Total capital | Numeric | 200,000 | 259,291,239 | 7,721,909.23 | 23,565,808.62
X5 | Current liabilities | Numeric | 42,034 | 419,171,745 | 8,008,758.41 | 21,742,627.73
X6 | Cash flow ratio | Numeric | −176.05 | 297.83 | 9.09 | 18.18
X7 | Net asset value of each share | Numeric | 0.09 | 249.42 | 21.48 | 14.48
X8 | Cash flow of each share | Numeric | −23.34 | 37.03 | 0.88 | 1.89
X9 | Sales per share | Numeric | −56.6 | 69.90 | 8.05 | 7.30
X10 | Operating income per share | Numeric | −17.44 | 28.43 | 0.48 | 1.32
X11 | Current ratio | Numeric | 19.93 | 2247.51 | 224.54 | 175.44
X12 | Quick ratio | Numeric | 5.42 | 2127.72 | 165.93 | 152.63
X13 | Debt ratio | Numeric | 2.70 | 97.95 | 42.04 | 15.82
X14 | Accounts receivable turnover ratio | Numeric | −2.63 | 305.33 | 1.80 | 7.90
X15 | Inventory turnover ratio | Numeric | −1681.03 | 1274.55 | 4.59 | 75.30
X16 | Fixed asset turnover ratio | Numeric | −6.38 | 91.33 | 1.31 | 4.97
X17 | Operating income margin | Numeric | −640.32 | 167.52 | 17.66 | 23.38
X18 | Return on net asset | Numeric | −84.40 | 45.29 | 4.21 | 10.17
X19 | Cash ratio | Numeric | 0 | 20.41 | 0.87 | 1.31
X20 | Times interest earned | Numeric | −266,442 | 600,237 | 957.50 | 15,209.79
X21 | Interest expense | Numeric | −77,927 | 2,173,528 | 38,994.51 | 130,730.96
X22 | Year-on-year percentage total assets | Numeric | −58.02 | 349.27 | 5.62 | 22.05
X23 | Total asset turnover ratio | Numeric | −0.55 | 1.49 | 0.19 | 0.10
X24 | Net worth turnover ratio | Numeric | −1.06 | 4.37 | 0.37 | 0.27
X25 | EPS (two classes and three classes) | Numeric | −11.95 | 25.38 | 0.40 | 1.19
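The Min., Max., Mean, and S.D. columns above are descriptive statistics over the TEJ samples. A minimal sketch of how such a feature profile can be computed follows; the function name and sample values are ours, invented for illustration.

```python
import statistics

def profile(values):
    """Return (min, max, mean, sample S.D.) for a numeric feature column, rounded to 2 d.p."""
    return (min(values), max(values),
            round(statistics.mean(values), 2),
            round(statistics.stdev(values), 2))  # sample (n-1) standard deviation

x6 = [9.1, -3.0, 12.5, 7.8, 18.2]  # hypothetical cash-flow-ratio samples
summary = profile(x6)              # (min, max, mean, S.D.)
```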

Model | A (%) | B (%) | C (%) | D (%) | E (%) | Avg. (%)
---|---|---|---|---|---|---
DT-C4.5 | 91.4292 | 91.4292 | 91.1740 | 91.4292 | 91.4292 | 91.3782
KNNs | 74.6704 | 88.0476 | 86.2399 | 91.3441 | 91.3441 | 86.3292
STK | 72.2033 | 72.2033 | 72.2033 | 72.2033 | 72.2033 | 72.2033
RBFN | 74.3301 | 90.0893 | 85.8783 | 91.3016 | 91.3016 | 86.5802
NB | 65.3126 | 88.9834 | 55.8911 | 91.3886 | 91.3886 | 78.5929
Avg. accuracy | 75.5891 | 86.1506 | 78.3284 | 87.5334 | 87.5334 | 83.0270

Model | A (%) | B (%) | C (%) | D (%) | E (%) | Avg. (%)
---|---|---|---|---|---|---
DT-C4.5 | 92.1392 | 91.6881 | 92.4613 | 92.0103 | 92.0103 | 92.0618
KNNs | 76.4175 | 88.7887 | 86.3402 | 91.9459 | 91.9459 | 87.0876
STK | 72.6804 | 72.6804 | 72.6804 | 72.6804 | 72.6804 | 72.6804
RBFN | 73.6469 | 91.3015 | 88.6598 | 92.0747 | 92.0747 | 87.5515
NB | 69.4588 | 90.6572 | 59.1495 | 92.0103 | 92.0103 | 80.6572
Avg. accuracy | 76.8686 | 87.0232 | 79.8582 | 88.1443 | 88.1443 | 84.0077

Model | A (%) | B (%) | C (%) | D (%) | E (%) | Avg. (%)
---|---|---|---|---|---|---
DT-C4.5 | 90.8762 | 90.9188 | 90.8550 | 90.9188 | 90.9188 | 90.8975
KNNs | 73.4794 | 88.2178 | 86.0910 | 90.8550 | 90.8550 | 85.8996
STK | 71.2675 | 71.2675 | 71.2675 | 71.2675 | 71.2675 | 71.2675
RBFN | 77.3926 | 88.7495 | 88.6431 | 90.5395 | 90.5395 | 87.1728
NB | 66.9715 | 86.4738 | 63.6112 | 90.4509 | 90.4509 | 79.5917
Avg. accuracy | 75.9974 | 85.1255 | 80.0936 | 86.8063 | 86.8063 | 82.9658

Model | A (%) | B (%) | C (%) | D (%) | E (%) | Avg. (%) | Total Avg. (%)
---|---|---|---|---|---|---|---
DT-C4.5 | 91.2371 | 91.5573 | 91.5593 | 91.5593 | 91.5593 | 91.4945 | 91.4708
KNNs | 75.6443 | 88.2732 | 88.2732 | 91.4948 | 91.4948 | 87.0361 | 86.5881
STK | 71.7784 | 71.7784 | 71.7784 | 71.7784 | 71.7784 | 71.7784 | 71.9824
RBFN | 78.6727 | 90.3995 | 90.3995 | 91.5593 | 91.5593 | 88.5181 | 87.4557
NB | 70.6186 | 89.1108 | 89.1778 | 91.1727 | 91.1727 | 86.2505 | 81.2731
Avg. accuracy | 77.5902 | 86.2238 | 86.2376 | 87.5129 | 87.5129 | 85.0155 | 83.7540
Total avg. | 76.5113 | 86.1308 | 81.1295 | 87.4992 | 87.4992 | - | -

**Table 7.** Accuracy mean and standard deviation of the cross-validation method in two classes for the TEJ dataset.

Model | A Mean | A S.D. | B Mean | B S.D. | C Mean | C S.D. | D Mean | D S.D. | E Mean | E S.D.
---|---|---|---|---|---|---|---|---|---|---
DT-C4.5 | 91.43 | 1.25 | 91.36 | 1.26 | 91.18 | 1.20 | 91.39 | 1.25 | 91.39 | 1.25
KNNs | 74.70 | 1.74 | 88.28 | 1.31 | 86.20 | 1.47 | 91.28 | 1.23 | 91.28 | 1.23
STK | 72.20 | 0.09 | 72.20 | 0.09 | 72.20 | 0.09 | 72.20 | 0.09 | 72.20 | 0.09
RBFN | 74.09 | 2.61 | 90.23 | 1.35 | 86.10 | 1.89 | 91.35 | 1.25 | 91.35 | 1.25
NB | 65.17 | 2.23 | 89.11 | 1.32 | 55.89 | 3.40 | 91.44 | 1.18 | 91.44 | 1.18
Avg. accuracy | 75.52 | 1.58 | 86.24 | 1.07 | 78.31 | 1.61 | 87.53 | 1.00 | 87.53 | 1.00
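Tables such as Table 7 report the mean and sample standard deviation of per-fold accuracies from k-fold cross-validation. The sketch below is simplified (contiguous, unshuffled folds) and the fold accuracies shown are hypothetical.

```python
import statistics

def kfold_indices(n, k=10):
    """Split indices 0..n-1 into k contiguous folds (simplified: no shuffling or stratification)."""
    fold_size, folds = n // k, []
    for i in range(k):
        start = i * fold_size
        end = start + fold_size if i < k - 1 else n  # last fold absorbs the remainder
        folds.append(list(range(start, end)))
    return folds

folds = kfold_indices(100, k=10)
# Each fold's accuracy would come from training on the other 9 folds and testing on this one.
accs = [91.2, 90.8, 92.1, 91.5, 90.9, 91.8, 91.3, 91.0, 91.6, 91.1]  # hypothetical
mean, sd = statistics.mean(accs), statistics.stdev(accs)  # the reported Mean and S.D.
```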

**Table 8.** Accuracy mean and standard deviation of the percentage-split method in two classes for the TEJ dataset.

Model | A Mean | A S.D. | B Mean | B S.D. | C Mean | C S.D. | D Mean | D S.D. | E Mean | E S.D.
---|---|---|---|---|---|---|---|---|---|---
DT-C4.5 | 91.32 | 0.60 | 91.15 | 0.51 | 91.16 | 0.77 | 91.26 | 0.57 | 91.26 | 0.57
KNNs | 74.58 | 1.13 | 88.48 | 0.79 | 85.86 | 0.63 | 91.09 | 0.66 | 91.09 | 0.66
STK | 72.20 | 0.02 | 72.20 | 0.02 | 72.20 | 0.02 | 72.20 | 0.02 | 72.20 | 0.02
RBFN | 74.58 | 1.81 | 90.49 | 0.67 | 86.64 | 1.53 | 91.30 | 0.62 | 91.30 | 0.62
NB | 65.78 | 2.25 | 89.61 | 0.51 | 55.08 | 3.76 | 91.44 | 0.61 | 91.44 | 0.61
Avg. accuracy | 75.69 | 1.16 | 86.39 | 0.50 | 78.19 | 1.34 | 87.46 | 0.50 | 87.46 | 0.50

**Table 9.** Accuracy mean and standard deviation of the cross-validation method in three classes for the TEJ dataset.

Model | A Mean | A S.D. | B Mean | B S.D. | C Mean | C S.D. | D Mean | D S.D. | E Mean | E S.D.
---|---|---|---|---|---|---|---|---|---|---
DT-C4.5 | 91.13 | 1.21 | 91.14 | 1.21 | 91.21 | 1.16 | 91.15 | 1.20 | 91.15 | 1.20
KNNs | 74.10 | 2.00 | 88.56 | 1.54 | 86.92 | 1.32 | 91.15 | 1.19 | 91.15 | 1.19
STK | 71.74 | 0.08 | 71.74 | 0.08 | 71.74 | 0.08 | 71.74 | 0.08 | 71.74 | 0.08
RBFN | 77.06 | 2.27 | 89.69 | 1.24 | 90.96 | 1.17 | 90.86 | 1.26 | 90.86 | 1.26
NB | 67.42 | 2.46 | 87.80 | 1.42 | 62.27 | 4.85 | 90.57 | 1.20 | 90.57 | 1.20
Avg. accuracy | 76.29 | 1.60 | 85.79 | 1.10 | 80.62 | 1.72 | 87.09 | 0.99 | 87.09 | 0.99

**Table 10.** Accuracy mean and standard deviation of the percentage-split method in three classes for the TEJ dataset.

Model | A Mean | A S.D. | B Mean | B S.D. | C Mean | C S.D. | D Mean | D S.D. | E Mean | E S.D.
---|---|---|---|---|---|---|---|---|---|---
DT-C4.5 | 90.93 | 0.54 | 91.12 | 0.56 | 91.02 | 0.61 | 91.05 | 0.56 | 91.05 | 0.56
KNNs | 74.08 | 1.11 | 88.55 | 0.70 | 86.85 | 0.62 | 91.15 | 0.58 | 91.15 | 0.58
STK | 71.73 | 0.02 | 71.73 | 0.02 | 71.73 | 0.02 | 71.73 | 0.02 | 71.73 | 0.02
RBFN | 77.39 | 1.11 | 90.09 | 0.49 | 90.76 | 0.63 | 90.70 | 0.44 | 90.70 | 0.44
NB | 67.94 | 2.11 | 88.39 | 0.47 | 59.09 | 5.70 | 90.56 | 0.60 | 90.56 | 0.60
Avg. accuracy | 76.41 | 0.98 | 85.98 | 0.45 | 79.89 | 1.52 | 87.04 | 0.44 | 87.04 | 0.44

Time lag | A (%) | B (%) | C (%) | D (%) | E (%) | Avg. (%)
---|---|---|---|---|---|---
T+0 | 91.4292 | 91.4292 | 91.1740 | 91.4292 | 91.4292 | 91.3782
T+1 | 83.2508 | 83.8536 | 84.9085 | 84.6932 | 84.6932 | 84.2799
T+2 | 78.8851 | 80.5424 | 80.1119 | 80.5424 | 80.5424 | 80.1248
T+3 | 76.7857 | 80.1282 | 79.8306 | 80.1282 | 80.1282 | 79.4002
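The time-lag measurements pair the features of season t with the EPS class of season t+k (T+0 through T+3). A minimal sketch of constructing such lagged training pairs follows; the function name and sample data are ours.

```python
def lag_pairs(features, labels, lag):
    """Pair the feature vector at season t with the class label at season t+lag."""
    if lag == 0:
        return list(zip(features, labels))
    return list(zip(features[:-lag], labels[lag:]))  # drop the last `lag` seasons' features

X = [[0.1], [0.2], [0.3], [0.4]]  # hypothetical per-season feature vectors
y = ["P", "N", "P", "P"]          # hypothetical per-season EPS classes
pairs = lag_pairs(X, y, lag=1)    # features at t predict the class at t+1
```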

Time lag | A (%) | B (%) | C (%) | D (%) | E (%) | Avg. (%)
---|---|---|---|---|---|---
T+0 | 92.1392 | 91.6881 | 92.4613 | 92.0103 | 92.0103 | 92.0618
T+1 | 82.9746 | 83.1050 | 84.5401 | 83.4964 | 83.4964 | 83.5225
T+2 | 77.8865 | 81.0176 | 79.9087 | 81.0176 | 81.0176 | 80.1696
T+3 | 76.6135 | 79.1117 | 79.1117 | 79.1117 | 79.1117 | 78.6121

Time lag | A (%) | B (%) | C (%) | D (%) | E (%) | Avg. (%)
---|---|---|---|---|---|---
T+0 | 74.3301 | 90.0893 | 85.8783 | 91.3016 | 91.3016 | 86.5802
T+1 | 71.0011 | 84.3272 | 81.0334 | 84.1550 | 54.1550 | 74.9343
T+2 | 72.7938 | 80.4348 | 80.3056 | 68.6612 | 68.6612 | 74.1713
T+3 | 72.4359 | 79.6016 | 79.0064 | 75.8700 | 75.8700 | 76.5568

Time lag | A (%) | B (%) | C (%) | D (%) | E (%) | Avg. (%)
---|---|---|---|---|---|---
T+0 | 73.6469 | 91.3015 | 88.6598 | 92.0747 | 92.0747 | 87.5515
T+1 | 72.7984 | 84.2140 | 82.3875 | 83.4964 | 83.4964 | 81.2785
T+2 | 72.0809 | 79.8434 | 80.5610 | 77.9517 | 77.9517 | 77.6777
T+3 | 75.1561 | 78.1402 | 77.9320 | 76.6135 | 76.6135 | 76.8911

**Table 15.** Information on data-discretization for the four features in two classes in the TEJ dataset.

Feature | Cutoff Point | Interval | Linguistic Term | Natural Language | Corresponding Instances
---|---|---|---|---|---
X10 | −0.195, −0.005, 0.135, 0.235, 0.385 | 6 | A_1–A_6 | Very low, Low, Medium, Medium high, High, and Very high | 814, 472, 517, 352, 448, 2099
X20 | −0.195, −0.005, 0.135, 0.235, 0.385 | 6 | B_1–B_6 | Very low, Low, Medium, Medium high, High, and Very high | 20, 1080, 340, 282, 610, 2370
X22 | −0.195, −0.005, 0.135, 0.235 | 5 | C_1–C_5 | Very low, Low, Medium, High, and Very high | 110, 745, 1046, 471, 2330
X25 | By expert suggestion | 2 | P and N | Positive and Negative | -
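The interval coding in Table 15 maps a raw value to a linguistic term via its cutoff points. A minimal sketch for X10 follows; the function name is ours, and the boundary handling is an assumption (here a value exactly equal to a cutoff falls in the upper interval).

```python
from bisect import bisect_right

CUTOFFS = [-0.195, -0.005, 0.135, 0.235, 0.385]     # X10 cutoff points from Table 15
TERMS = ["A_1", "A_2", "A_3", "A_4", "A_5", "A_6"]  # Very low ... Very high

def discretize_x10(value: float) -> str:
    """Map a raw X10 value to its linguistic interval via binary search over the cutoffs."""
    return TERMS[bisect_right(CUTOFFS, value)]
```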

© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

## Share and Cite

**MDPI and ACS Style**

Chen, Y.-S.; Sangaiah, A.K.; Chen, S.-F.; Huang, H.-C.
Applied Identification of Industry Data Science Using an Advanced Multi-Componential Discretization Model. *Symmetry* **2020**, *12*, 1620.
https://doi.org/10.3390/sym12101620
