Is It Worth the Effort? Considerations on Text Mining in AI-Based Corporate Failure Prediction
Abstract
1. Introduction
2. Related Research
3. Data Presentation, Understanding, and Preparation
4. Modeling
5. Evaluation
Evaluated Models

Training Data (See Table 1) | FA | FA + FT | FA + FC | FA + FT + FC |
---|---|---|---|---|
Precision | 0.9953 | 0.9957 | 0.9964 | 0.9968 |
Recall | 0.6831 | 0.7007 | 0.7200 | 0.7441 |
Balanced Accuracy | 0.7112 | 0.7262 | 0.7541 | 0.7755 |
F1-Score | 0.8102 | 0.8225 | 0.8360 | 0.8521 |
False-Positive-Rate (FPR) | 0.2608 | 0.2482 | 0.2119 | 0.1930 |
False-Negative-Rate (FNR) | 0.3169 | 0.2993 | 0.2800 | 0.2559 |
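Several rows of the evaluation table are functions of the others: FNR is 1 − recall, balanced accuracy is the mean of recall and specificity (1 − FPR), and the F1-score is the harmonic mean of precision and recall. The minimal Python sketch below recomputes these derived rows from the reported precision, recall, and FPR. The names `ReportedMetrics` and `derived_metrics` are illustrative, and the sketch assumes (as the FNR row implies) that recall and FPR refer to the same positive class.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReportedMetrics:
    """Precision, recall and false-positive rate as reported for one model."""
    precision: float
    recall: float
    fpr: float


def derived_metrics(m: ReportedMetrics) -> dict[str, float]:
    """Derive F1, FNR and balanced accuracy from precision, recall and FPR."""
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * m.precision * m.recall / (m.precision + m.recall)
    # The false-negative rate is the complement of recall (true-positive rate).
    fnr = 1 - m.recall
    # Specificity (true-negative rate) is the complement of the false-positive rate.
    specificity = 1 - m.fpr
    # Balanced accuracy averages recall and specificity.
    balanced_accuracy = (m.recall + specificity) / 2
    return {"f1": f1, "fnr": fnr, "balanced_accuracy": balanced_accuracy}


# Reported values for the FA + FT + FC column of the evaluation table.
full_model = ReportedMetrics(precision=0.9968, recall=0.7441, fpr=0.1930)
for name, value in derived_metrics(full_model).items():
    print(f"{name}: {value:.4f}")
```

Applied to each of the four columns, the recomputed F1-score, FNR, and balanced accuracy agree with the table to within about 0.0001, i.e., up to rounding of the reported inputs.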
6. Discussion and Implications
6.1. Contributions to Research and Practice
6.2. Limitations and Future Research Opportunities
7. Conclusions
Author Contributions
Funding
Data Availability Statement
Acknowledgments
Conflicts of Interest
References
1. Jones, S. Corporate bankruptcy prediction: A high dimensional analysis. Rev. Account. Stud. 2017, 22, 1366–1422.
2. Kloptchenko, A.; Eklund, T.; Karlsson, J.; Back, B.; Vanharanta, H.; Visa, A. Combining data and text mining techniques for analyzing financial reports. Intell. Syst. Account. Financ. Manag. 2004, 12, 29–41.
3. Nassirtoussi, A.; Aghabozorgi, S.; Ying Wah, T.; Ngo, D.C.L. Text mining for market prediction: A systematic review. Expert Syst. Appl. 2014, 41, 7653–7670.
4. Schumaker, R.P.; Zhang, Y.; Huang, C.-N.; Chen, H. Evaluating sentiment in financial news articles. Decis. Support Syst. 2012, 53, 458–464.
5. Kloptchenko, A.; Magnusson, C.; Back, B.; Visa, A.; Vanharanta, H. Mining Textual Contents of Financial Reports. Int. J. Digit. Account. Res. 2004, 4, 1–29.
6. Veganzones, D.; Severin, E. Corporate failure prediction models in the twenty-first century: A review. Eur. Bus. Rev. 2021, 33, 204–226.
7. Wirth, R.; Hipp, J. CRISP-DM: Towards a standard process model for data mining. In Proceedings of the Fourth International Conference on the Practical Application of Knowledge Discovery and Data Mining, Manchester, UK, 11–13 April 2000; pp. 29–39.
8. Altman, E.I. Financial ratios, discriminant analysis and the prediction of corporate bankruptcy. J. Financ. 1968, 23, 589–609.
9. Altman, E.I.; Haldeman, R.G.; Narayanan, P. ZETA™ analysis: A new model to identify bankruptcy risk of corporations. J. Bank. Financ. 1977, 1, 29–54.
10. Ohlson, J.A. Financial Ratios and the Probabilistic Prediction of Bankruptcy. J. Account. Res. 1980, 18, 109.
11. Kirkos, E. Assessing methodologies for intelligent bankruptcy prediction. Artif. Intell. Rev. 2015, 43, 83–123.
12. Lohmann, C.; Ohliger, T. Bankruptcy prediction and the discriminatory power of annual reports: Empirical evidence from financially distressed German companies. J. Bus. Econ. 2020, 90, 137–172.
13. Caserio, C.; Panaro, D.; Trucco, S. Management discussion and analysis: A tone analysis on US financial listed companies. Manag. Decis. 2020, 58, 510–525.
14. Le Maux, J.; Smaili, N. Annual Report Readability And Corporate Bankruptcy. J. Appl. Bus. Res. 2021, 37, 73–80.
15. Ajina, A.; Laouiti, M.; Msolli, B. Guiding through the Fog: Does annual report readability reveal earnings management? Res. Int. Bus. Financ. 2016, 38, 509–516.
16. Bjornsson, C.H. Readability of Newspapers in 11 Languages. Read. Res. Q. 1983, 18, 480.
17. Loughran, T.; McDonald, B. When Is a Liability Not a Liability? Textual Analysis, Dictionaries, and 10-Ks. J. Financ. 2011, 66, 35–65.
18. Loughran, T.; McDonald, B. Measuring Readability in Financial Disclosures. J. Financ. 2014, 69, 1643–1671.
19. Bannier, C.; Pauls, T.; Walter, A. Content analysis of business communication: Introducing a German dictionary. J. Bus. Econ. 2019, 89, 79–123.
20. Wambsganss, T.; Engel, C.; Fromm, H. Improving Explainability and Accuracy through Feature Engineering: A Taxonomy of Features in NLP-based Machine Learning. In Proceedings of the ICIS 2021, Austin, TX, USA, 12–15 December 2021.
21. Zhao, W.; Zhang, G.; Yuan, G.; Liu, J.; Shan, H.; Zhang, S. The Study on the Text Classification for Financial News Based on Partial Information. IEEE Access 2020, 8, 100426–100437.
22. Bureau van Dijk. Amadeus Database. Available online: https://www.bvdinfo.com/de-de/unsere-losungen/daten/international/amadeus (accessed on 3 November 2021).
23. Nießner, T.; Gross, D.H.; Schumann, M. Evidential Strategies in Financial Statement Analysis: A Corpus Linguistic Text Mining Approach to Bankruptcy Prediction. J. Risk Financ. Manag. 2022, 15, 459.
24. Nießner, T.; Wiederspan, O.; Schumann, M. Consideration of the Use of Language in Corporate Bankruptcy Prediction: A data analysis on German Companies. In Proceedings of the PACIS 2022, Virtual, 5–9 July 2022.
25. Remus, R.; Quasthoff, U.; Heyer, G. SentiWS: A Publicly Available German-language Resource for Sentiment Analysis. In Proceedings of the 7th International Conference on Language Resources and Evaluation, Valletta, Malta, 17–23 May 2010; pp. 1168–1171.
26. Humpherys, S. Discriminating Fraudulent Financial Statements by Identifying Linguistic Hedging. In Proceedings of the AMCIS 2009, San Francisco, CA, USA, 6–9 August 2009.
27. Brants, S.; Dipper, S.; Eisenberg, P.; Hansen-Schirra, S.; König, E.; Lezius, W.; Rohrer, C.; Smith, G.; Uszkoreit, H. TIGER: Linguistic Interpretation of a German Corpus. Res. Lang. Comput. 2004, 2, 597–620.
28. Pamuk, M.; Grendel, R.O.; Schumann, M. Towards ML-based Platforms in Finance Industry: An ML approach to Generate Corporate Bankruptcy Probabilities based on Annual Financial Statements. In Proceedings of the 32nd Australasian Conference on Information Systems, Sydney, Australia, 6–10 December 2021; pp. 1–12.
29. Brédart, X.; Séverin, E.; Veganzones, D. Human resources and corporate failure prediction modeling: Evidence from Belgium. J. Forecast. 2021, 40, 1325–1341.
30. Nießner, T.; Nießner, S.; Schumann, M. Influence of corporate industry affiliation in Financial Business Forecasting: A data analysis concerning competition. In Proceedings of the AMCIS 2022, Minneapolis, MN, USA, 10–14 August 2022.
31. Statista. Fläche der Deutschen Bundesländer zum 31. Dezember 2020 [Area of the German federal states as of 31 December 2020]. Available online: https://de.statista.com/statistik/daten/studie/154868/umfrage/flaeche-der-deutschen-bundeslaender/ (accessed on 23 January 2022).
32. Zensus. Die Ergebnisse des Zensus [The results of the census]. Available online: https://ergebnisse2011.zensus2022.de/datenbank/online/ (accessed on 24 January 2022).
33. Eurostat. Aufstellung der Statistischen System der Wirtschaftszweige [Listing of the statistical classification of economic activities]. Available online: https://ec.europa.eu/eurostat/de/web/products-manuals-and-guidelines/-/ks-ra-07-015 (accessed on 17 February 2022).
34. scikit-learn. Machine Learning in Python. Available online: https://scikit-learn.org/stable/ (accessed on 28 January 2022).
35. Fromm, H.; Wambsganss, T.; Söllner, M. Towards a taxonomy of text mining features. In Proceedings of the 27th European Conference on Information Systems, Uppsala, Sweden, 8–14 June 2019; pp. 8–14.
Feature Group | Category | Features |
---|---|---|
Accounting-Based Features (FA) | | Asset coverage ratio; Capital; Debt ratio; Debt structure; Debt-to-Equity ratio; EBITDA; Equity multiplier; Equity ratio; Leverage ratio; Long-term liabilities; Net loss for the year; Profit margin; Return on Assets ratio; Return on Capital Employed ratio; Return on Equity ratio; Short-term debt ratio; Short-term liabilities; Working capital ratio |
Text-Related Features (FT) | Meta-Information | Publication quarter (M); Text_length ratio (D); Time_difference_publication (M) |
Text-Related Features (FT) | Character/Word matching (count) | Comma ratio (C); Exclamation_Marks ratio (C); Question_Marks ratio (C); Bankruptcy_words ratio (W); Chances_words ratio (W); Opportunities_word ratio (W); Project_words ratio (W); Research_words ratio (W); Risk_words ratio (W) |
Text-Related Features (FT) | POS-Tagging (count) | ADJ_ratio (W); ADP_ratio (W); ADV_ratio (W); AUXVERB_ratio (W); CONJ_ratio (W); DET_ratio (W); NOUN_ratio (W); NUMERAL_ratio; PROPNOUN_ratio (W); VERB_ratio (W) |
Text-Related Features (FT) | Dictionary approaches | Hedging score (D) (see [24]); SentiWS_Hits_ratio (D); SentiWS_POS_Hits_ratio (D); SentiWS_NEG_Hits_ratio (D); SentiWS_POS_Score_ratio (D); SentiWS_NEG_Score_ratio (D); SentiWS_Score (D); BPW_UNC_ratio (W); BPW_POS_ratio (W); BPW_NEG_ratio (W) |
Text-Related Features (FT) | Semantic POS-tagging (see [23]) | Evidential_Strategies_ratio (S); Evidential_Strategies_Pos_ratio (S); Evidential_Strategies_Neg_ratio (S) |
Text-Related Features (FT) | POS-Pattern matcher | Bool_Deficit (D); All_Sentence_ratio (S); Passive_Sentence_ratio (S) (see [24]) |
Company Features (FC) | | Count_branch_offices; Count_companies_group; Count_employees; Count_shareholders; Count_subsidiaries; Count_CEO_change; Count_CEO_last_10_years; Company_age; Company_size; NACE_identification; Competitor_density; Population_density |

The letter after each text-related feature gives its granularity level: C = character, W = word, S = sentence, D = document; M appears only for meta-information features.
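The dictionary-based features in the table (SentiWS_Hits_ratio, SentiWS_POS_Score_ratio, SentiWS_Score, and so on) rely on a lexicon such as SentiWS, which assigns German word forms a polarity weight in the interval [−1, 1]. The Python sketch below shows one plausible way to turn lexicon hits into such features. The mini-lexicon and its weights are hypothetical (not taken from SentiWS), and normalizing hit counts and summed weights by the number of word tokens is an assumption for illustration, not the paper's documented formula.

```python
import re

# Hypothetical mini-lexicon in the spirit of SentiWS: word form -> polarity
# weight in [-1, 1]. Entries and weights are illustrative only and are NOT
# taken from the SentiWS resource.
LEXICON: dict[str, float] = {
    "gut": 0.37,
    "erfolgreich": 0.5,
    "verlust": -0.5,
    "risiko": -0.3,
}


def sentiment_features(text: str, lexicon: dict[str, float]) -> dict[str, float]:
    """Dictionary-based sentiment features for one document (hypothetical definitions).

    Hit counts and summed weights are divided by the number of word tokens;
    this normalization is an assumption, not the paper's formula.
    """
    # \w matches Unicode letters (including umlauts) for str patterns.
    tokens = re.findall(r"\w+", text.lower())
    n_tokens = len(tokens) or 1  # avoid division by zero for empty texts
    weights = [lexicon[t] for t in tokens if t in lexicon]
    pos = [w for w in weights if w > 0]
    neg = [w for w in weights if w < 0]
    return {
        "hits_ratio": len(weights) / n_tokens,
        "pos_hits_ratio": len(pos) / n_tokens,
        "neg_hits_ratio": len(neg) / n_tokens,
        "pos_score_ratio": sum(pos) / n_tokens,
        "neg_score_ratio": sum(neg) / n_tokens,
        "score": sum(weights),  # unnormalized overall polarity
    }


features = sentiment_features(
    "Das Geschäftsjahr war gut, aber der Verlust und das Risiko steigen.", LEXICON
)
for name, value in features.items():
    print(f"{name}: {value:.4f}")
```

Separating positive and negative hits keeps both directions visible to the classifier; a single summed score would let a few strongly positive words mask many negative ones.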
Feature Expressions

Dimension | Expressions |
---|---|
Granularity Level | Character; Word; Sentence; Document |
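A granularity level determines what a raw count is divided by. The Python sketch below illustrates this for a few character-level (C) and word-level (W) features from the feature table, such as Comma ratio and Risk_words ratio. The keyword stems, the function name `granular_ratios`, and the choice to normalize by the number of characters or word tokens are assumptions made for illustration; the paper's actual word lists and formulas are not given in this section.

```python
import re

# Hypothetical keyword stems standing in for a "risk words" list; the word
# lists actually used for features such as Risk_words ratio are not given here.
RISK_STEMS = ("risik", "gefahr", "unsicher")


def granular_ratios(text: str) -> dict[str, float]:
    """Normalize raw counts by the number of units at each feature's granularity.

    Character-level (C) features divide by the number of characters,
    word-level (W) features by the number of word tokens (assumed definitions).
    """
    n_chars = len(text) or 1  # guard against empty input
    words = re.findall(r"\w+", text.lower())
    n_words = len(words) or 1
    # str.startswith accepts a tuple, so this matches any of the stems.
    risk_hits = sum(1 for w in words if w.startswith(RISK_STEMS))
    return {
        "comma_ratio": text.count(",") / n_chars,  # C
        "question_marks_ratio": text.count("?") / n_chars,  # C
        "exclamation_marks_ratio": text.count("!") / n_chars,  # C
        "risk_words_ratio": risk_hits / n_words,  # W
    }


print(granular_ratios("Risiken bestehen, Chancen auch."))
```

Normalizing by the unit count makes the features comparable across reports of very different lengths, which raw counts would not be.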
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Nießner, T.; Nießner, S.; Schumann, M. Is It Worth the Effort? Considerations on Text Mining in AI-Based Corporate Failure Prediction. Information 2023, 14, 215. https://doi.org/10.3390/info14040215