Epistemological Considerations of Text Mining: Implications for Systematic Literature Review
Abstract
1. Introduction
2. Text Mining (Textual Statistical Analysis)
3. Big Data, Research and the Evolution of Science
4. Systematic Literature Review (SLR)
5. Methodology
5.1. Step 1: Design of the Systematic Literature Review
- Does every single publication talk about a different issue, or are there general topics that appear in the publications?
- Can we simplify the analysis of the literature?
- Are there vocabularies that are specific to each period?
- What are they talking about?
- Can we identify some articles that appear in every period? Which ones?
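The question of period-specific vocabulary can be made concrete with a small sketch. The example below builds a word-by-period frequency table from abstracts and lists the words that occur in only one period. It is not the authors' procedure: the toy `records`, the `STOPWORDS` set, the period labels, and the "occurs in exactly one period" criterion are all assumptions chosen for illustration, and this criterion is much cruder than the characteristic-word tests used in textual statistics.

```python
from collections import Counter, defaultdict
import re

# Hypothetical toy corpus: (period label, abstract text). Not data from the article.
records = [
    ("2000-2010", "Women in sport face unequal pay and limited media access."),
    ("2000-2010", "Unequal access to sport facilities persists for women."),
    ("2011-2021", "Professional leagues for women in sport expand contracts."),
    ("2011-2021", "Contracts and pay in professional sport for women."),
]

# Hypothetical, deliberately tiny stopword list.
STOPWORDS = {"in", "and", "to", "for", "the", "a", "of"}


def tokenize(text):
    """Lowercase the text and keep alphabetic tokens that are not stopwords."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]


def lexical_table(records):
    """Return {period: Counter(word -> frequency)}, a word-by-period table."""
    table = defaultdict(Counter)
    for period, text in records:
        table[period].update(tokenize(text))
    return dict(table)


def exclusive_vocabulary(table):
    """For each period, return the sorted words that occur in no other period."""
    result = {}
    for period, counts in table.items():
        others = set()
        for other_period, other_counts in table.items():
            if other_period != period:
                others.update(other_counts)
        result[period] = sorted(set(counts) - others)
    return result


table = lexical_table(records)
exclusive = exclusive_vocabulary(table)
# Words present in every period: a rough proxy for "general topics".
shared = sorted(set.intersection(*(set(c) for c in table.values())))

for period, words in exclusive.items():
    print(period, "->", words)
print("shared ->", shared)
```

On a real corpus the table would be built from the retained abstracts, and a statistical criterion would normally replace the exclusive-word rule, since rare words are trivially exclusive to a single period.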
5.2. Step 2: Developing a Research Plan
5.3. Step 3: Definition of SLR Analysis Criteria
5.4. Data Characterization
6. Results and Discussion
6.1. Example 1: SLR in Sport Sciences (Inequalities and Professionalization of Women’s Sport)
6.2. Example 2: SLR in Health Sciences (Low Back Pain Prevention)
6.3. Example 3: SLR in Multidisciplinary Area (Text Mining)
7. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Conflicts of Interest

| Perspective | Distributive (Quantitative) | Structural (Qualitative) |
|---|---|---|
| Epistemology | Facts | Meaning units |
| Nature of knowledge | Analyse patterns and correlations | Analyse discourses and associated social meaning |
| Object of analysis | Study the distribution of phenomena; Positivism | Study the relationship between phenomena; Constructivism |
| Method | Deductive | Inductive |
| Data collection techniques and instruments | Questionnaires, formal statistics, etc. | Focus groups, interviews, life stories, etc. |

| Specifications for SLR | Professionalization of Women’s Sport | Inequalities in Women’s Sport |
|---|---|---|
| Database | Web of Science; Scopus | Web of Science; Scopus |
| Keywords | Sport; Women; Professional | Sport; Women; Inequality |
| Periods | All years to present | All years to present |
| Exclusion criteria | The physical aspect of a woman; woman’s body; the physiology of women; the media coverage of sports; physical performance; women-specific workouts; sports nutrition; professions other than sports; incomplete bibliographic information (abstract, year of publication) | Inequalities in sport outside women’s sport; gender inequalities outside sport; women’s sport excluding inequality; incomplete bibliographic information (abstract, year of publication) |
| Number of articles obtained | 1283 (1257 WoS; 26 Scopus) | 384 (179 WoS; 205 Scopus) |
| Number of articles retained | 128 (115 WoS; 13 Scopus) | 165 (70 WoS; 94 Scopus) |

| Specifications for SLR | Text Mining | Low Back Pain Prevention |
|---|---|---|
| Database | Web of Science | Web of Science; Scopus; PubMed |
| Keywords | Text Mining | Low Back Pain; Education; Prevention |
| Periods | The last five years (2017–2021) | All years to present |
| Exclusion criteria | Articles not using or describing any text mining technique; documents that are not articles; articles without an abstract or year of publication | Medical treatment of pain; unrelated articles; articles without an abstract or year of publication |
| Number of articles obtained | 6053 references, of which 3690 are articles | 141 |
| Number of articles retained | 3521 | 77 |
© 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Caballero-Julia, D.; Campillo, P. Epistemological Considerations of Text Mining: Implications for Systematic Literature Review. Mathematics 2021, 9, 1865. https://doi.org/10.3390/math9161865