Quartz: A Template for Quantitative Corpus Data Visualization Tools
Abstract
1. Introduction
1.1. Recurrent Challenges with Visualizing Corpus Data
- Comparing large, complex datasets (including multiple corpora);
- Filtering data with multiple variables;
- Utilizing a variety of visualization types, from bar charts to innovative methods;
- Balancing easy interpretation with the limits of visualization types;
- Increasing the interactivity of visualizations;
- Speeding up hypothesis development and testing;
- Reducing technical barriers to users;
- Linking text and quantitative data in interfaces;
- Increasing the standardization of formats and procedures;
- Incorporating statistical procedures;
- Connecting several tools, i.e., via application programming interface (API);
- Developing publicly available tools adaptable to other projects.
1.2. Analysis and Visualization Goals for the Humanitarian Encyclopedia
- Integration with at least one corpus management system, keeping visualization available throughout data exploration and analysis;
- Visualization as a modality of data exploration, where clicking quantitative data points opens the corresponding query and concordance viewer;
- Automatic drawing of visualizations with every query, utilizing a corpus software’s full query syntax;
- Comparing multiple queries with side-by-side visualizations;
- Querying any corpus available to the software, with comparisons of multiple corpora based on comparable text types;
- Filtering of data points (corpus attribute values), as well as cross-filtering (i.e., viewing the frequency of a query by attribute B within a point of attribute A);
- Open-source, modular design that allows for the addition of new visualization types and adaptation to new research avenues;
- Portability, to facilitate implementation on multiple servers;
- Navigation to any visualization via URL query string;
- Exportation of visualized data in tabular format;
- Incorporation of frequency statistics (absolute, relative, etc.);
- Features for advanced and lay users in one interface.
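One of the goals above, navigation to any visualization via URL query string, can be illustrated with a minimal sketch using only the Python standard library. The parameter names (`corpora`, `query`, `attribute`) and the corpus identifier `rw_en` are hypothetical and chosen for illustration; they are not taken from Quartz itself.

```python
from urllib.parse import urlencode, parse_qs


def state_to_query_string(state: dict) -> str:
    """Serialize a visualization state; list values repeat the key."""
    return urlencode(state, doseq=True)


def query_string_to_state(query_string: str) -> dict:
    """Parse a query string back into a dict mapping keys to lists of values."""
    return parse_qs(query_string.lstrip("?"))


# Hypothetical state: two corpora, one query, one attribute to plot.
state = {
    "corpora": ["he19", "rw_en"],
    "query": "localisation",
    "attribute": "class.DATE",
}
qs = state_to_query_string(state)
restored = query_string_to_state(qs)
print(qs)
```

Because the whole state lives in the URL, a link of this kind could be shared or bookmarked and would reproduce the same view when opened.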
2. Materials and Methods
lemma_lc="against"][lc="women" | lemma_lc="women"]|[lc="VAW" | lemma_lc="VAW"]
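The query fragment above appears to match each word on either its `lc` or its `lemma_lc` attribute and to offer an acronym as an alternative. A minimal sketch of how such a pattern could be assembled programmatically follows. The helper names are hypothetical, and the example uses only the words visible in the fragment, since the start of the original query is not shown here.

```python
def token(word: str) -> str:
    """Build one CQL token matching a word by lc or lemma_lc."""
    return f'[lc="{word}" | lemma_lc="{word}"]'


def phrase_or_acronym(phrase: str, acronym: str) -> str:
    """Join the tokens of a phrase and add the acronym as an alternative."""
    tokens = "".join(token(word) for word in phrase.split())
    return f"{tokens}|{token(acronym)}"


cql = phrase_or_acronym("against women", "VAW")
print(cql)
```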
3. Results
- “there is no single definition of localization”;
- “localization has become the buzzword in 2017”;
- “the localization agenda is a Pandora’s Box of issues”;
- “not only is localisation a vague concept, it is also an ongoing and difficult debate”;
- “there is not yet a globally accepted definition of aid localization”;
- “the fundamental issue is between a technical or a political interpretation”;
- “the ongoing debate over localisation is complicated by the multitude of different understandings of the concept”;
- “could potentially create or worsen tensions between local and international actors”;
- “‘localisation’ as ‘decentralisation’ actually turns into an incentive to accelerate the ‘multi-nationalisation’ of INGOs”;
- “localization is counterproductive and ‘likely to produce sub-optimal results for the effective delivery of aid to people in need of immediate relief’ in armed conflicts”.
4. Discussion
5. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
Correction Statement
1. https://github.com/engisalor/quartz, accessed on 9 December 2023.
2. https://humanitarianencyclopedia.org, accessed on 9 December 2023.
he19:
  name: HE
  color: "#AB63FA"
  comparable:
    - class.DATE
    - class.TYPE
    - class.ORGANIZATION_TYPE
  label:
    class.DATE: year
    class.ID: id
    class.ORGANIZATION_SUBTYPE: organization subtype
    class.ORGANIZATION_TYPE: organization type
    class.REGION: publication region
    class.TYPE: format
  exclude:
    - doc.filename
    - doc.wordcount
    - doc.id
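A corpus entry like the one above could plausibly be used to decide which attributes an interface shows and under which names. The sketch below assumes that `exclude` hides attributes and that `label` supplies display names; both readings are inferred from the key names rather than confirmed by the source, and `display_attributes` is a hypothetical helper. The configuration is written as a plain dictionary so that the example needs no YAML parser.

```python
# Subset of the entry above, written as a plain dict (assumed semantics).
config = {
    "he19": {
        "name": "HE",
        "label": {"class.DATE": "year", "class.TYPE": "format"},
        "exclude": ["doc.filename", "doc.wordcount", "doc.id"],
    }
}


def display_attributes(corpus_cfg: dict, attributes: list) -> dict:
    """Drop excluded attributes and map the rest to their display labels."""
    excluded = set(corpus_cfg.get("exclude", []))
    labels = corpus_cfg.get("label", {})
    # Fall back to the raw attribute name when no label is configured.
    return {a: labels.get(a, a) for a in attributes if a not in excluded}


shown = display_attributes(config["he19"], ["class.DATE", "class.TYPE", "doc.id"])
print(shown)
```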
RW_EN | HE | Statistic |
---|---|---|
24 | 14 | Attributes (count of year values) |
20,125 | 551 | Corpus absolute frequency |
20,125 | 551 | Sum of attribute frequency |
10.15 | 6.49 | Corpus FPM |
74.17 | 110.13 | Mean relative frequency % |
7.53 | 7.14 | Mean relative text type FPM |
0.42 | 0.46 | Mean FPM |
838.54 | 39.36 | Mean absolute frequency |
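The figures in the table are consistent with the mean absolute frequency being the sum of attribute frequency divided by the number of attribute values (20,125 / 24 ≈ 838.54 and 551 / 14 ≈ 39.36). The sketch below reproduces that arithmetic and adds the usual frequency-per-million (FPM) formula. The function names are hypothetical, and the corpus size passed to `fpm` is an illustrative value, because corpus sizes are not stated in this section.

```python
def mean_absolute_frequency(total_frequency: int, n_values: int) -> float:
    """Average number of hits per attribute value."""
    return total_frequency / n_values


def fpm(frequency: int, corpus_size: int) -> float:
    """Frequency per million tokens."""
    return frequency / corpus_size * 1_000_000


# Values taken from the table above.
rw_en_mean = round(mean_absolute_frequency(20_125, 24), 2)
he_mean = round(mean_absolute_frequency(551, 14), 2)

# Illustrative only: a hypothetical corpus of one million tokens.
example_fpm = fpm(551, 1_000_000)

print(rw_en_mean, he_mean, example_fpm)
```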
There is no single definition of ‘localisation’ but for the purpose of this research, it refers to a series of measures which different constituent parts of the international humanitarian system should adopt in order to re-balance the system more in favour of national actors, so that a re-calibrated system works to the relevant strengths of its constituent parts and enhances partnership approaches to humanitarian action. | Catholic Agency for Overseas Development |
The term ‘localisation’ has become the buzzword of 2017, a subject that has taken on a new dimension due to the commitments made as part of the Grand Bargain agreed at the World Humanitarian Summit in May 2016. | Trócaire—Groupe Urgence |
The research found that the localisation ‘agenda’ is a Pandora’s Box of issues linked to the political economy of aid and North/South relations. If badly managed, it could potentially create or worsen tensions between local and international actors. | Trócaire—Groupe Urgence |
The fundamental issue is between a technical or a political interpretation of ‘localisation’. A technical interpretation puts the emphasis on ‘proximity’ to the crisis-area. If international agencies and their decision-making can ‘decentralise’, then localisation will have been achieved. The political interpretation sees it as a ‘shifting the power’, from international to national actors. Problematically, the discussions and working groups on different aspects of ‘localisation’ are currently all concentrated in Western capitals like Geneva, New York and London. There is very little participation and input from ‘national’ actors. | All India Disaster Mitigation Institute |
‘Transformers’ are concerned that ‘localisation’ as ‘decentralisation’ actually turns into an incentive to accelerate the ‘multi-nationalisation’ of INGOs: creating more and more ‘national’ offices and national ‘affiliates’, that sooner or later will also compete in fundraising from the domestic market. | Start Network |
Within the TSC Project there were differing opinions as to what constituted localisation reflecting the Start Network’s own research that “not only is localisation a vague concept, it is also an ongoing and difficult debate” | Start Network |
While MSF has seen many examples of the important humanitarian contributions that national and local actors make, it has also witnessed a number of constraints and challenges that confront these actors when delivering humanitarian assistance, especially in situations of armed conflict. These limitations, which have been largely ignored by the localisation agenda, are examined from both a conceptual and practical point of view in: Schenkenberg, E. 2016. | Médecins sans Frontières |
There is not yet a globally accepted definition of aid localization. To frame the discussion around the different components of this concept, the following common definition emerged: Aid localisation is a collective process involving different stakeholders that aims to return local actors, whether civil society organisations or local public institutions, to the centre of the humanitarian system with a greater role in humanitarian response. | Trócaire—Groupe Urgence |
However, until now localisation had not been afforded a broad definition, whilst international agreements focused only on financial targets. The new framework published today takes a deeper and more critical view of localisation, looking at the quality (not just the quantity) of funding, partnerships and participation, capacity development, and the influence of local and national organisations. | Start Network |
For example, Médecins Sans Frontières (MSF) argues that localisation is counterproductive and “likely to produce sub-optimal results for the effective delivery of aid to people in need of immediate relief” in armed conflicts. The main argument is that local and national NGOs would not be able to deliver impartial humanitarian aid to conflict-affected populations. | ActionAid |
The ongoing debate over localisation is complicated by the multitude of different understandings of the concept. This plurality is the result of several factors, namely the vagueness of the concept, a lack of clarity regarding the problem it is supposed to mitigate, and sometimes even institutional interest. | Start Network |
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Isaacs, L.; Odlum, A.; León-Araúz, P. Quartz: A Template for Quantitative Corpus Data Visualization Tools. Languages 2024, 9, 81. https://doi.org/10.3390/languages9030081