Establishing a Multiple-Criteria Decision-Making Model for Stock Investment Decisions Using Data Mining Techniques
Abstract
1. Introduction
- Use decision tree analysis to find easy-to-understand classification rules in the data and build an investment decision model from them.
- Decision tree analysis may not reveal the degree of mutual influence among the variables; this study therefore adds a second algorithm, the Apriori algorithm for association rules, to help explain how the variables influence one another.
2. Literature Review
2.1. Data Mining
2.2. Decision Tree
2.3. Association Rules
2.4. Stock Investment Using Data Mining Techniques
3. The Methodology
3.1. The Framework of a Stock Investment Decision Model
3.1.1. Data Preparing
3.1.2. Data Analysis
- Decision tree analysis: the pre-processed data are analyzed with the J48 classifier, the implementation of the C4.5 algorithm in Weka 3.6. Before the analysis, a simple holdout validation divides the original dataset into two parts: approximately 66.67% serves as the training set for building the predictive model, and the remaining 33.33% or so serves as the test set for measuring the model's accuracy.
- Analysis of association rules: to keep the number of generated rules manageable and to discard rules without substantial meaning, association rules are usually pruned according to support and confidence. In this study, the Apriori algorithm in Weka 3.6 is run with a support threshold of 10% and a confidence threshold of 85%; only rules that meet both thresholds are retained.
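The 66.67%/33.33% holdout split described above can be sketched in a few lines of Python. This is only an illustration, not the Weka procedure itself: the function name `holdout_split`, the shuffling step, and the fixed seed are assumptions, and the 133 placeholder ids merely stand in for the 133 samples reported in the descriptive statistics.

```python
import random


def holdout_split(records, train_fraction=2 / 3, seed=0):
    """Shuffle a copy of the records and split it into (train, test)."""
    shuffled = list(records)
    # Assumption: records are shuffled with a fixed seed before the cut.
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


# Placeholder record ids standing in for the 133 samples (hypothetical data).
records = list(range(133))
train, test = holdout_split(records)
print(len(train), len(test))  # 89 44
```

With 133 records, a two-thirds cut leaves 89 records for training and 44 for testing.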
3.1.3. Results Evaluation
3.2. Meanings of Variables
3.3. The Decision Tree Algorithm
3.4. Apriori Algorithm
- Calculate the support of the itemsets of size k = 1 in the transactional database (support is the frequency of occurrence of an itemset). This is called generating the candidate set.
- Prune the candidate set by eliminating itemsets whose support is below the given threshold.
- Join the frequent itemsets to form itemsets of size k + 1, and repeat the above steps until no more itemsets can be formed. This happens when every newly formed itemset has a support below the given threshold.
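The three steps above can be sketched as a small pure-Python routine. It is a minimal illustration under stated assumptions, not the Weka implementation used in the study: support is expressed as a fraction of transactions, the function name `apriori` and the item labels are hypothetical, and the toy transactions are invented for the example.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {frozenset(itemset): support} for every frequent itemset."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)

    def support(itemset):
        # Fraction of transactions that contain every item of the itemset.
        return sum(itemset <= t for t in transactions) / n

    # Step 1: candidate itemsets of size k = 1.
    candidates = {frozenset([item]) for t in transactions for item in t}
    frequent = {}
    k = 1
    while candidates:
        # Step 2: prune candidates whose support is below the threshold.
        survivors = {}
        for c in candidates:
            s = support(c)
            if s >= min_support:
                survivors[c] = s
        frequent.update(survivors)
        # Step 3: join surviving k-itemsets into itemsets of size k + 1,
        # keeping only those whose k-item subsets are all frequent.
        k += 1
        joined = {a | b for a, b in combinations(survivors, 2) if len(a | b) == k}
        candidates = {
            c for c in joined
            if all(frozenset(sub) in survivors for sub in combinations(c, k - 1))
        }
    return frequent


# Toy transactions (hypothetical): each label marks a variable that is at or
# above the industry average for one company.
transactions = [
    {"ROE", "ROA", "OI"},
    {"ROE", "ROA", "OI"},
    {"ROA", "BIG4"},
    {"OI", "BIG4"},
    {"ROE", "ROA", "OI", "BIG4"},
]
frequent = apriori(transactions, min_support=0.6)
for itemset, s in sorted(frequent.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), s)
```

The loop ends once the join step yields no candidates, which matches the stopping condition in the last step.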
4. Case Study
4.1. Descriptive Statistics
4.2. Decision Tree Analysis
4.3. Association Rules
- Association rule 1 indicates that if the return on equity (ROE) and the operating profit rate are both at or above the industry average, then the return on assets (ROA) is also at or above the industry average.
- Association rule 2 indicates that if the ROE and the ROA are both at or above the industry average, then the operating profit rate is also at or above the industry average.
- Association rule 3 indicates that if the ROE is at or above the industry average, then both the ROA and the operating profit rate are at or above the industry average.
- Association rule 4 indicates that if the ROA is at or above the industry average and the company's financial statements are signed by one of the four large accounting firms, then the operating profit rate is at or above the industry average.
- Association rule 5 indicates that if the ROA and the operating profit rate are both at or above the industry average, then the ROE is also at or above the industry average.
- Association rule 6 indicates that if the ROA is at or above the industry average, then both the ROE and the operating profit rate are at or above the industry average.
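To make the confidence figures attached to such rules concrete, the following sketch computes the confidence of a rule as the number of rows that satisfy both the condition and the result, divided by the number of rows that satisfy the condition. The rows are invented toy data, not the study's dataset, and the helper names `count` and `confidence` are hypothetical; each label means that the variable is at or above the industry average (or, for BIG4, that the flag is set).

```python
def count(rows, items):
    """Number of rows that contain every item in `items`."""
    items = set(items)
    return sum(items <= row for row in rows)


def confidence(rows, antecedent, consequent):
    """Share of rows matching the antecedent that also match the consequent."""
    both = set(antecedent) | set(consequent)
    return count(rows, both) / count(rows, antecedent)


# Toy rows (hypothetical data, one set of satisfied conditions per company).
rows = [
    {"ROE", "ROA", "OI", "BIG4"},
    {"ROE", "ROA", "OI"},
    {"ROA", "OI", "BIG4"},
    {"ROA"},
    {"BIG4"},
]

print(confidence(rows, {"ROE"}, {"ROA", "OI"}))  # 1.0
print(confidence(rows, {"ROE", "OI"}, {"ROA"}))  # 1.0
print(confidence(rows, {"ROA"}, {"ROE", "OI"}))  # 0.5
```

On any dataset, a rule of the form ROE → {ROA, OI} with 100% confidence forces the narrower rules {ROE, OI} → ROA and {ROE, ROA} → OI to 100% as well, because every row that satisfies the narrower condition also satisfies ROE.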
4.4. Discussion
5. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
Ethical Approval
Variable Feature | Codename | Variable Name |
---|---|---|
Dependent variable | Return | Annual rate of return |
Independent variable | Price | Stock price |
Independent variable | Year | Years on the market |
Independent variable | DS&F-holding | Directors, supervisors, and foreign shareholding ratio |
Independent variable | DS-Pledge | Pledge rate of shares held by directors and supervisors |
Independent variable | BIG4 | Whether it is signed by one of the four large accounting firms |
Independent variable | R&D | Research and development expense rate |
Independent variable | EPS | Earnings per share |
Independent variable | ROE | Return on equity |
Independent variable | ROA | Return on assets |
Independent variable | FA-turnover | Turnover rate of fixed assets |
Independent variable | INV-turnover | Inventory turnover |
Independent variable | A/R-turnover | Accounts receivable turnover rate |
Independent variable | TA-turnover | Total asset turnover |
Independent variable | OR-growing | Revenue growth rate |
Independent variable | OE-growing | Net equity growth rate |
Independent variable | OI | Operating profit rate |
Independent variable | OP | Gross profit rate |
Independent variable | DR | Debt ratio |
Independent variable | CR | Current ratio |
Variable | Sample Number | Min | Max | Average | Standard Deviation |
---|---|---|---|---|---|
Annual rate of return | 133.00 | −0.63 | 2.62 | 0.17 | 0.54 |
Stock price | 133.00 | 2.69 | 281.00 | 37.95 | 45.96 |
Years on the market | 133.00 | 2.00 | 25.00 | 12.80 | 5.44 |
Directors, supervisors, and foreign shareholding ratio | 133.00 | 0.00 | 77.26 | 27.82 | 19.88 |
Pledge rate of shares held by directors and supervisors | 133.00 | 0.00 | 61.84 | 4.88 | 10.60 |
Whether it is signed by one of the four large accounting firms | 133.00 | 0.00 | 1.00 | 0.75 | 0.43 |
Research and development expense rate | 133.00 | 0.00 | 6.72 | 1.71 | 1.38 |
Earnings per share | 133.00 | −3.06 | 11.20 | 2.36 | 2.97 |
Return on equity | 133.00 | −36.78 | 39.92 | 9.55 | 13.61 |
Return on assets | 133.00 | −32.84 | 21.87 | 5.18 | 8.02 |
Turnover rate of fixed assets | 133.00 | 0.67 | 77.12 | 6.80 | 10.34 |
Inventory turnover | 133.00 | 1.54 | 30.86 | 5.32 | 3.48 |
Accounts receivable turnover rate | 133.00 | 2.86 | 16.49 | 7.06 | 3.14 |
Total asset turnover | 133.00 | 0.23 | 2.03 | 1.17 | 0.39 |
Revenue growth rate | 133.00 | −46.71 | 680.14 | 12.00 | 61.18 |
Net equity growth rate | 133.00 | −36.27 | 98.44 | 7.27 | 15.92 |
Operating profit rate | 133.00 | −59.66 | 27.57 | 3.11 | 10.96 |
Gross profit rate | 133.00 | −14.84 | 35.36 | 17.14 | 6.31 |
Debt ratio | 133.00 | 2.46 | 74.62 | 46.85 | 13.97 |
Current ratio | 133.00 | 72.91 | 3663.80 | 247.15 | 369.81 |
No | DS&F-holding | DS-Pledge | Year | OI | FA-turnover | OP | OE-growing | Return |
---|---|---|---|---|---|---|---|---|
1 | <45 | - | ≤14 | - | - | - | - | Bad |
2 | <45 | - | >14 | ≤6.48 | ≤6.48 | - | - | Bad |
3 | <45 | - | >14 | ≤6.48 | >6.48 | - | - | Good ≥ ave * |
4 | <45 | - | >14 | >6.48 | - | ≤16.52 | ≤7.52 | Bad |
5 | <45 | - | >14 | >6.48 | - | ≤16.52 | >7.52 | Good ≥ ave |
6 | <45 | - | >14 | >6.48 | - | >16.52 | - | Good ≥ ave |
7 | ≥45 | ≤33 | - | - | - | - | - | Good ≥ ave |
8 | ≥45 | >33 | - | - | - | - | - | Bad |
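The eight decision tree rules in the table above can be read as one nested set of conditions. The sketch below encodes them as a Python function for illustration only: the function name `classify`, the parameter names, and the plain "Good"/"Bad" labels are assumptions, and the inputs are assumed to be in the same units as the thresholds in the table.

```python
def classify(ds_f_holding, ds_pledge, year, oi, fa_turnover, op, oe_growing):
    """Return "Good" or "Bad" by following the eight rules of the table."""
    if ds_f_holding >= 45:
        # Rules 7 and 8: only the pledge rate matters for high shareholding.
        return "Good" if ds_pledge <= 33 else "Bad"
    if year <= 14:
        return "Bad"  # Rule 1
    if oi <= 6.48:
        # Rules 2 and 3: low operating profit rate, split on FA-turnover.
        return "Good" if fa_turnover > 6.48 else "Bad"
    if op > 16.52:
        return "Good"  # Rule 6
    # Rules 4 and 5: split on net equity growth rate.
    return "Good" if oe_growing > 7.52 else "Bad"


# Hypothetical company: low shareholding ratio, listed for 20 years, high
# operating profit rate, modest gross profit rate, strong equity growth.
print(classify(ds_f_holding=30, ds_pledge=0, year=20, oi=10,
               fa_turnover=5, op=12, oe_growing=9))  # Good
```

Writing the rules this way makes it visible that each company falls under exactly one rule, since the branches are mutually exclusive.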
Rule No | Condition | Result | Confidence |
---|---|---|---|
1 | ROE ≥ ave * and OI ≥ ave | ROA ≥ ave | 100% |
2 | ROE ≥ ave and ROA ≥ ave | OI ≥ ave | 100% |
3 | ROE ≥ ave | ROA ≥ ave and OI ≥ ave | 100% |
4 | ROA ≥ ave and BIG4 = Y | OI ≥ ave | 98% |
5 | ROA ≥ ave and OI ≥ ave | ROE ≥ ave | 96% |
6 | ROA ≥ ave | ROE ≥ ave and OI ≥ ave | 94% |
No | Condition | Result | Remark |
---|---|---|---|
1 | DS&F-holding < 45 and Year ≤ 14 | Bad | Decision tree rules |
2 | DS&F-holding < 45 and Year > 14 and OI ≤ 6.48 and FA-turnover ≤ 6.48 | Bad | Decision tree rules |
3 | DS&F-holding < 45 and Year > 14 and OI ≤ 6.48 and FA-turnover > 6.48 | Good ≥ ave * | Decision tree rules |
4 | DS&F-holding < 45 and Year > 14 and OI > 6.48 and OP ≤ 16.52 and OE-growing ≤ 7.52 | Bad | Decision tree rules |
5 | DS&F-holding < 45 and Year > 14 and OI > 6.48 and OP ≤ 16.52 and OE-growing > 7.52 | Good ≥ ave | Decision tree rules |
6 | DS&F-holding < 45 and Year > 14 and OI > 6.48 and OP > 16.52 | Good ≥ ave | Decision tree rules |
7 | DS&F-holding ≥ 45 and DS-Pledge ≤ 33 | Good ≥ ave | Decision tree rules |
8 | DS&F-holding ≥ 45 and DS-Pledge > 33 | Bad | Decision tree rules |
9 | ROE ≥ ave and OI ≥ ave | ROA ≥ ave | Association rules |
10 | ROE ≥ ave and ROA ≥ ave | OI ≥ ave | Association rules |
11 | ROE ≥ ave | ROA ≥ ave and OI ≥ ave | Association rules |
12 | ROA ≥ ave and BIG4 = Y ** | OI ≥ ave | Association rules |
13 | ROA ≥ ave and OI ≥ ave | ROE ≥ ave | Association rules |
14 | ROA ≥ ave | ROE ≥ ave and OI ≥ ave | Association rules |
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
© 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
Share and Cite
Cheng, K.-C.; Huang, M.-J.; Fu, C.-K.; Wang, K.-H.; Wang, H.-M.; Lin, L.-H. Establishing a Multiple-Criteria Decision-Making Model for Stock Investment Decisions Using Data Mining Techniques. Sustainability 2021, 13, 3100. https://doi.org/10.3390/su13063100