Toward a Model to Evaluate Machine-Processing Quality in Scientific Documentation and Its Impact on Information Retrieval
Abstract
1. Introduction
2. Related Work
3. Materials and Methods
3.1. Background
- Evaluate the licenses to reuse the data in a legal context.
- Evaluate whether the dataset is contained in the metadata.
- Determine whether metadata remain even if the data are no longer available.
- Determine the level of access to publications and public or restricted data, and the conditions of access.
- Evaluate the standard and machine-readable format to describe metadata.
- Measure domain-independent core metadata.
3.2. Applying the AHP Method to Prioritize Quality Indicators in Scientific Documents
1. Definition of the indicators and metrics that allow us to evaluate the quality of the document and thus improve its retrieval; they were grouped into three areas: accessibility, content, and reproducibility.
2. Building the hierarchical model of quality indicators and metrics to be machine-processable. As shown in Figure 1, two hierarchical levels have been identified: the first (red) corresponds to the scopes defined in the previous step, which cover the quality indicators, and the second (green) corresponds to the sub-scopes (nine in total) into which the first-level scopes are divided to classify each of the previously defined indicators. The AHP method allows us to group the different indicators, which makes it easier to measure their influence on the general objective.
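
The prioritization step can be illustrated with a minimal sketch of how AHP turns pairwise judgments into priorities. The comparison matrix below is hypothetical (it is not taken from the paper), the row geometric-mean method is one common approximation of the priority vector, and 0.58 is the usual random index for a 3 × 3 matrix.

```python
import math

def ahp_priorities(matrix):
    """Approximate the AHP priority vector with the row geometric-mean method."""
    n = len(matrix)
    geo_means = [math.prod(row) ** (1.0 / n) for row in matrix]
    total = sum(geo_means)
    return [g / total for g in geo_means]

def consistency_ratio(matrix, priorities, random_index=0.58):
    """Saaty's consistency ratio; 0.58 is the usual random index for n = 3."""
    n = len(matrix)
    # lambda_max is estimated as the mean of (A w)_i / w_i.
    weighted = [sum(a * w for a, w in zip(row, priorities)) for row in matrix]
    lambda_max = sum(v / w for v, w in zip(weighted, priorities)) / n
    consistency_index = (lambda_max - n) / (n - 1)
    return consistency_index / random_index

# Hypothetical judgments on Saaty's 1-9 scale (NOT the paper's values),
# ordered as accessibility, content, reproducibility.
pairwise = [
    [1.0, 3.0, 1 / 2],
    [1 / 3, 1.0, 1 / 5],
    [2.0, 5.0, 1.0],
]
weights = ahp_priorities(pairwise)
cr = consistency_ratio(pairwise, weights)
print([round(w, 3) for w in weights], round(cr, 3))
```

A consistency ratio below 0.1 is conventionally taken to mean that the judgments are coherent enough to use.
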
3.3. Technological Implementation: Indexing and Retrieval Process
3.4. Applying the Quality Model to a Set of Documents
4. Experimentation
5. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
- Zhang, X.; Li, X.; Jiang, S.; Li, X.; Xie, B. Evolution Analysis of Information Retrieval based on co-word network. In Proceedings of the 2019 3rd International Conference on Electronic Information Technology and Computer Engineering (EITCE), Xiamen, China, 18–20 October 2019; IEEE: Xiamen, China, 2019; pp. 1837–1840.
- Tan, J.; Tian, Y. Fuzzy retrieval algorithm for film and television animation resource database based on deep neural network. J. Radiat. Res. Appl. Sci. 2023, 16, 100675.
- Wang, Y.; Chen, L.; Wu, G.; Yu, K.; Lu, T. Efficient and secure content-based image retrieval with deep neural networks in the mobile cloud computing. Comput. Secur. 2023, 128, 103163.
- Bhopale, A.P.; Tiwari, A. Transformer based contextual text representation framework for intelligent information retrieval. Expert Syst. Appl. 2023, 238, 121629.
- Thakur, N.; Reimers, N.; Rücklé, A.; Srivastava, A.; Gurevych, I. BEIR: A Heterogenous Benchmark for Zero-shot Evaluation of Information Retrieval Models. arXiv 2021, arXiv:2104.08663.
- Koga, S.; Martin, N.B.; Dickson, D.W. Evaluating the performance of large language models: ChatGPT and Google Bard in generating differential diagnoses in clinicopathological conferences of neurodegenerative disorders. Brain Pathol. 2023, e13207.
- Zhang, R.; Han, J.; Zhou, A.; Hu, X.; Yan, S.; Lu, P.; Li, H.; Gao, P.; Qiao, Y. LLaMA-Adapter: Efficient Fine-tuning of Language Models with Zero-init Attention. arXiv 2023, arXiv:2303.16199.
- Touvron, H.; Lavril, T.; Izacard, G.; Martinet, X. LLaMA: Open and Efficient Foundation Language Models. arXiv 2023, arXiv:2302.13971.
- Sánchez-Ruiz, L.M.; Moll-López, S.; Nuñez-Pérez, A.; Moraño-Fernández, J.A.; Vega-Fleitas, E. ChatGPT Challenges Blended Learning Methodologies in Engineering Education: A Case Study in Mathematics. Appl. Sci. 2023, 13, 6039.
- Bommasani, R.; Hudson, D.A.; Adeli, E.; Altman, R.; Arora, S.; von Arx, S.; Bernstein, M.S.; Bohg, J.; Bosselut, A.; Brunskill, E.; et al. On the Opportunities and Risks of Foundation Models. arXiv 2021, arXiv:2108.07258.
- Feilmayr, C. Optimizing Selection of Assessment Solutions for Completing Information Extraction Results. Comput. Y Sist. 2013, 17, 169–178.
- Zaman, G.; Mahdin, H.; Hussain, K.; Atta-Ur-Rahman; Abawajy, J.; Mostafa, S.A. An Ontological Framework for Information Extraction from Diverse Scientific Sources. IEEE Access 2021, 9, 42111–42124.
- Suárez López, D.; Alvarez Rodriguez, J.M. Quality in Documentation: Key Factor for the Retrieval Process. In Proceedings of the Information Technology and Systems, Bogota, Colombia, 5–7 February 2020; Rocha, Á., Ferrás, C., Montenegro Marin, C.E., Medina García, V.H., Eds.; Springer International Publishing: Cham, Switzerland, 2020; pp. 67–74.
- Rodríguez Leyva, P.; Delgado Mesa, Y.; Viltres Sala, H.; Estrada Sentí, V.; Febles, J.P. Modelo computacional para el desarrollo de sistemas de recuperación de información. Rev. Cuba. Cienc. Informáticas 2018, 12, 173–188.
- Tamrakar, A.; Vishwakarma, S.K. Analysis of Probabilistic Model for Document Retrieval in Information Retrieval. In Proceedings of the 2015 International Conference on Computational Intelligence and Communication Networks (CICN), Jabalpur, India, 12–14 December 2015; IEEE: Jabalpur, India, 2015; pp. 760–765.
- Li, X.; Li, K.; Qiao, D.; Ding, Y.; Wei, D. Application Research of Machine Learning Method Based on Distributed Cluster in Information Retrieval. In Proceedings of the 2019 International Conference on Communications, Information System and Computer Engineering (CISCE), Haikou, China, 5–7 July 2019; IEEE: Haikou, China, 2019; pp. 411–414.
- Taylor, S.J.E.; Anagnostou, A.; Fabiyi, A.; Currie, C.; Monks, T.; Barbera, R.; Becker, B. Open science: Approaches and benefits for modeling & simulation. In Proceedings of the 2017 Winter Simulation Conference (WSC), Las Vegas, NV, USA, 3–6 December 2017; IEEE: Las Vegas, NV, 2017; pp. 535–549.
- Sidi, M.L.; Gunal, S. A Purely Entity-Based Semantic Search Approach for Document Retrieval. Appl. Sci. 2023, 13, 10285.
- Nagumothu, D.; Eklund, P.W.; Ofoghi, B.; Bouadjenek, M.R. Linked Data Triples Enhance Document Relevance Classification. Appl. Sci. 2021, 11, 6636.
- Frihat, S.; Beckmann, C.L.; Hartmann, E.M.; Fuhr, N. Document Difficulty Aspects for Medical Practitioners: Enhancing Information Retrieval in Personalized Search Engines. Appl. Sci. 2023, 13, 10612.
- Al Sibahee, M.A.; Abdulsada, A.I.; Abduljabbar, Z.A.; Ma, J.; Nyangaresi, V.O.; Umran, S.M. Lightweight, Secure, Similar-Document Retrieval over Encrypted Data. Appl. Sci. 2021, 11, 12040.
- Yeshambel, T.; Mothe, J.; Assabie, Y. Amharic Adhoc Information Retrieval System Based on Morphological Features. Appl. Sci. 2022, 12, 1294.
- Novak, E.; Bizjak, L.; Mladenić, D.; Grobelnik, M. Why is a document relevant? Understanding the relevance scores in cross-lingual document retrieval. Knowl.-Based Syst. 2022, 244, 108545.
- Lechtenberg, F.; Farreres, J.; Galvan-Cara, A.-L.; Somoza-Tornos, A.; Espuña, A.; Graells, M. Information retrieval from scientific abstract and citation databases: A query-by-documents approach based on Monte-Carlo sampling. Expert Syst. Appl. 2022, 199, 116967.
- Abadal Falgueras, E.; Anglada Ferrer, L. Ciencia Abierta: Cómo han evolucionado la denominación y el concepto. An. Doc. 2020, 23, 1.
- Hasselbring, W.; Carr, L.; Hettrick, S.; Packer, H.; Tiropanis, T. FAIR and Open Computer Science Research Software. arXiv 2019, arXiv:1908.05986.
- Bezjak, S.; Clyburne-Sherin, A.; Conzett, P.; Fernandes, P.; Görögh, E.; Helbig, K.; Kramer, B.; Labastida, I.; Niemeyer, K.; Psomopoulos, F.; et al. Open Science Training Handbook; Zenodo: Genève, Switzerland, 2018.
- Mons, B.; Neylon, C.; Velterop, J.; Dumontier, M.; da Silva Santos, L.O.B.; Wilkinson, M.D. Cloudy, increasingly FAIR; revisiting the FAIR Data guiding principles for the European Open Science Cloud. Inf. Serv. Use 2017, 37, 49–56.
- FAIR-Aware Online Assessment Tool. Available online: https://fairaware.dans.knaw.nl (accessed on 28 July 2021).
- FAIRsFAIR Data Object Assessment Metrics: Request for Comments; FAIRsFAIR: Calgary, AB, USA, 2020.
- DG for Research and Innovation. Reproducibility of Scientific Results in the EU Scoping Report; DG for Research and Innovation: Brussels, Belgium, 2020; p. 32.
- Echtler, F.; Häußler, M. Open Source, Open Science, and the Replication Crisis in HCI. In Proceedings of the Extended Abstracts of the 2018 CHI Conference on Human Factors in Computing Systems, Montreal, QC, Canada, 21–26 April 2018; ACM: Montreal, QC, Canada, 2018; pp. 1–8.
- Hasselbring, W.; Carr, L.; Hettrick, S.; Packer, H.; Tiropanis, T. From FAIR research data toward FAIR and open research software. IT—Inf. Technol. 2020, 62, 39–47.
- Munafò, M.R.; Nosek, B.A.; Bishop, D.V.M.; Button, K.S.; Chambers, C.D.; Percie du Sert, N.; Simonsohn, U.; Wagenmakers, E.-J.; Ware, J.J.; Ioannidis, J.P.A. A manifesto for reproducible science. Nat. Hum. Behav. 2017, 1, 0021.
- Shokraneh, F. Reproducibility and replicability of systematic reviews. World J. Meta-Anal. 2019, 7, 66–76.
- Sivagnanam, S.; Nandigam, V.; Lin, K. Introducing the Open Science Chain: Protecting Integrity and Provenance of Research Data. In Proceedings of the Practice and Experience in Advanced Research Computing on Rise of the Machines (Learning), Chicago, IL, USA, 28 July–1 August 2019; ACM: Chicago, IL, USA, 2019; pp. 1–5.
- Kedron, P.; Li, W.; Fotheringham, S.; Goodchild, M. Reproducibility and replicability: Opportunities and challenges for geospatial research. Int. J. Geogr. Inf. Sci. 2021, 35, 427–445.
- Carreño, R.L.; Méndez, F.J.M. Sistemas de recuperación de información implementados a partir de CORD-19: Herramientas clave en la gestión de la información sobre COVID-19. Rev. Española Doc. Científica 2020, 43, e275.
- Roberts, K.; Alam, T.; Bedrick, S.; Demner-Fushman, D.; Lo, K.; Soboroff, I.; Voorhees, E.; Wang, L.L.; Hersh, W.R. TREC-COVID: Rationale and structure of an information retrieval shared task for COVID-19. J. Am. Med. Inform. Assoc. 2020, 27, 1431–1436.
- Lipovetsky, S. AHP in nonlinear scaling: From two-envelope problem to modeling by predictors. Production 2021, 31, e20210007.
- Willmer Escobar, J. Metodología para la toma de decisiones de inversión en portafolio de acciones utilizando la técnica multicriterio AHP. Contaduría Y Adm. 2015, 60, 346–366.
- Clinio, A. Ciência Aberta na América Latina: Duas perspectivas em disputa. Transinformação 2019, 31, e190028.
- Hernandez, D.; León, D.; Torres, D. Importancia de las revistas de acceso abierto: La indización como meta fundamental. Dictam. Libre 2020, 13, 81–98.
- Vainshtein, R.; Katz, G.; Shapira, B.; Rokach, L. Assessing the Quality of Scientific Papers. arXiv 2019, arXiv:1908.04200.
- 5.22. File Location (Ubicación del Archivo) (MA)—Documentación de Directrices Para Repositorios Institucionales de Investigación de la Red Colombiana de Información Científica (RedCol) 2020—1.0. Available online: https://redcol.readthedocs.io/es/latest/field_filelocation.html#aire-file (accessed on 27 July 2021).
- Metadata in Science Publishing. Available online: http://wwwis.win.tue.nl/infwet03/proceedings/8/ (accessed on 27 July 2021).
- López-Anguita, R.; Montejo-Ráez, A.; Martínez-Santiago, F.; Díaz-Galiano, M. Legibilidad del texto, métricas de complejidad y la importancia de las palabras. Proces. Del Leng. Nat. 2018, 61, 101–108.
- Baquedano, M.M. Legibilidad Y Variabilidad de los Textos. Boletín Investig. Educ. 2006, 21, 13–25.
- Goepel, K.D. Implementation of an Online Software Tool for the Analytic Hierarchy Process (AHP-OS). Int. J. Anal. Hierarchy Process 2018, 10, 469–487.
- Mendoza, A.; Solano, C.; Palencia, D.; Garcia, D. Application of the Analytical Hierarchy Process (AHP) for decision-making with expert judgment. Ingeniare Rev. Chil. Ing. 2019, 27, 348–360.
- Shah, N.; Willick, D.; Mago, V. A framework for social media data analytics using Elasticsearch and Kibana. Red Ina. 2022, 1179–1187.
- Metadata 2020 Principles. Available online: https://metadata2020.org/resources/metadata-principles/ (accessed on 21 May 2023).
- Yang, A.; Zhu, S.; Li, X.; Yu, J.; Wei, M.; Li, C. The research of policy big data retrieval and analysis based on elastic search. In Proceedings of the 2018 International Conference on Artificial Intelligence and Big Data (ICAIBD), Chengdu, China, 26–28 May 2018; pp. 43–46.
Indicator | Metrics | Description | Value |
---|---|---|---|
Visibility | Open access | Identify that there are no restrictions for users to access digital resources freely. | 0 or 1 |
Visibility | Restricted access | Validate whether a username and password or payment are required for access or download. | 0 or 1 |
Visibility | Embargoed access | Verify if the resources are available for a limited time. | 0 or 1 |
Visibility | Access to metadata only | Limited access to only metadata implies that the resources are not available in this case. | 0 or 1 |
Visibility | Full text | Verify full access to the metadata of the document or a portion of it. | 0 or 1 |
Visibility | Dataset | Verify access to the structured set of information such as images, videos, numbers, text, tables, etc. | 0 or 1 |
Content Metadata | Author’s name | Check for the existence of metadata; if found, a value of 1 is assigned; otherwise, 0 is assigned. This assignment is conducted for each of the metadata elements. | 0 or 1 |
Content Metadata | Title | | |
Content Metadata | Year | | |
Content Metadata | Keywords | | |
Content Metadata | Classification codes | | |
Content Metadata | Abstract | | |
Content Metadata | Multimedia objects | | |
Content Metadata | From location | | |
Content Metadata | DOI | | |
Content Metadata | URI | | |
Content Metadata | URL | | |
Content Metadata | Format and versions | | |
Content Metadata | Multimedia objects | | |
Content Metadata | From location | | |
Content Metadata | Links | | |
Editorial Policy | Full open access | Verify that the documents are available for free. | 0 or 1 |
Editorial Policy | Pay per download | Validate that articles can be individually downloaded for a fee without a subscription. | 0 or 1 |
Editorial Policy | Partial access | Confirm the existence of a partial access model for accessing the content. | 0 or 1 |
Editorial Policy | Subscription | Identify if the content can be accessed via subscription through a regular fee. | 0 or 1 |
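
The binary metadata check described in the table above can be sketched as follows. This is a minimal illustration, assuming the document's metadata has already been parsed into a dictionary; the key names in `METADATA_FIELDS` are hypothetical and cover only part of the listed elements.

```python
# Hypothetical key names for some of the metadata elements in the table.
METADATA_FIELDS = ["author", "title", "year", "keywords", "abstract", "doi", "url"]

def is_present(value):
    """Treat None, blank strings, and empty collections as missing."""
    if value is None:
        return False
    if isinstance(value, str):
        return value.strip() != ""
    if isinstance(value, (list, tuple, set, dict)):
        return len(value) > 0
    return True

def metadata_presence(record, fields=METADATA_FIELDS):
    """Assign 1 to every metadata element that exists, otherwise 0."""
    return {field: 1 if is_present(record.get(field)) else 0 for field in fields}

record = {
    "author": ["Suárez López, D.", "Álvarez-Rodríguez, J.M.", "Molina-Cardenas, M."],
    "title": "Toward a Model to Evaluate Machine-Processing Quality in Scientific Documentation",
    "year": 2023,
    "keywords": [],   # empty list counts as missing
    "abstract": "",   # blank string counts as missing
    "doi": "10.3390/app132413075",
}
scores = metadata_presence(record)
print(scores)
```
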
Indicator | Metrics | Description | Value |
---|---|---|---|
Comprehensibility | Lexical density | A greater number of different words per text results in an increased difficulty for comprehension. | 0–1 |
Comprehensibility | Frequency of use | The more frequent a word is, the fewer cognitive resources it will demand for perception, recognition, and integration into text processing. As the words in a text become less frequent, reading becomes more burdensome, and the process slows down. | 0–1 |
Comprehensibility | Sentence complexity | Measure the number of words per sentence, thus obtaining the sentence length index, and the number of complex clauses per sentence, yielding a complex clause index. | 0–1 |
Comprehensibility | Syntactic complexity | Measure sentence length and the quantity of modifiers. | 0–1 |
Comprehensibility | Punctuation marks | The average number of punctuation marks is used as one of the complexity indicators. | 0–1 |
Readability | SSR index (measures vocabulary) | The focus is on measuring vocabulary and sentence structure to predict the relative readability difficulty of a text. | 0–100 [47] |
Readability | Readability index | Calculate the number of words, the mean number of letters per word, and its variance. | 0–100 [48] |
Readability | Text analysis metrics | Represent the grammatical structure of a text in the form of an abstract syntax tree to facilitate the measurement of its depth and density. In this structure, each node represents a word or phrase, and the connections between them symbolize grammatical relationships. | 0–1 |
Content structure | Degree of compliance with a standard structure | Measures compliance according to the number of items of the standard structure found. | 1–3: low; 3–6: medium; more than 6: high |
Content structure | Depth of sections | Measure the levels of depth of the sections within the document, according to their importance: main sections, subsections, and sub-subsections. | Add one point (1) for each section level identified |
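
Some of the comprehensibility metrics in the table above can be approximated with very little code. The sketch below is only illustrative: it assumes that lexical density is computed as the ratio of distinct words to total words, uses a naive regular-expression tokenizer, and does not reproduce any normalization the authors may have applied.

```python
import re

def comprehensibility_metrics(text):
    """Rough text statistics; the tokenization rules are simplifying assumptions."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[^\W\d_]+", text.lower())  # alphabetic tokens only
    punctuation = re.findall(r"[,;:.!?]", text)
    if not words or not sentences:
        return {"lexical_density": 0.0, "words_per_sentence": 0.0,
                "punctuation_per_sentence": 0.0}
    return {
        # Share of distinct words: higher values suggest harder text.
        "lexical_density": len(set(words)) / len(words),
        # Sentence length index.
        "words_per_sentence": len(words) / len(sentences),
        # Average number of punctuation marks per sentence.
        "punctuation_per_sentence": len(punctuation) / len(sentences),
    }

sample = "The sensor reads data. The gateway sends the data to the cloud!"
metrics = comprehensibility_metrics(sample)
print(metrics)
```
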
| Resource Type | Metrics | Description | Value |
|---|---|---|---|
| | Repository | Verify the existence of a digital asset management system in which digital resources such as documents, software, multimedia files, etc., are stored and controlled. | 0 or 1 |
| | Platform | Verify if it provides services or resources that are useful for algorithms or source code. | 0 or 1 |
| | Site | Confirm access to the location where the information is hosted, typically personal websites or blogs. | 0 or 1 |
| | License of use | Ensure that the resources can be used without restrictions by the scientific community. | 0 or 1 |
| | Authorization license | Combine copyright with non-commercial use of the resources. | 0 or 1 |
| | Dataset | Verify the existence of a structured set of information resulting from analysis and studies, such as images, videos, numbers, text, etc. | 0 or 1 |
| | Text format | Formats may vary depending on the repository, so it is necessary to identify whether they are in plain text, i.e., those without formatting; with programming language extensions such as Java or Python; or structured in JSON or XML format. | 0 or 1 |

Decision hierarchy:

Level 0 | Level 1 | Global Priorities | Rank |
---|---|---|---|
Quality papers | Accessibility | 31.9% | 2 |
Quality papers | Content | 13.8% | 3 |
Quality papers | Reproducibility | 54.3% | 1 |

Decision hierarchy:

Level 1 | Level 2 | Global Priorities | Rank |
---|---|---|---|
Accessibility | Visibility | 33.3% | 2 |
Accessibility | Metadata | 45.2% | 1 |
Accessibility | Editorial policy | 21.5% | 3 |
Content | Comprehensibility | 43.7% | 2 |
Content | Readability | 11.9% | 3 |
Content | Content structure | 44.4% | 1 |
Reproducibility | Algorithms/source | 42.8% | 1 |
Reproducibility | Equations/theorems | 13.4% | 3 |
Reproducibility | Raw data | 40.2% | 2 |
Reproducibility | Processed data | 3.6% | 4 |

Indicator | Metrics | Normalized | Weight |
---|---|---|---|
Accessibility (w = 0.319) | Visibility | 0.333 | 0.1062 |
Accessibility (w = 0.319) | Metadata | 0.452 | 0.1442 |
Accessibility (w = 0.319) | Editorial policy | 0.215 | 0.0686 |
Content (w = 0.138) | Comprehensibility | 0.437 | 0.0603 |
Content (w = 0.138) | Readability | 0.119 | 0.0164 |
Content (w = 0.138) | Content structure | 0.444 | 0.0613 |
Reproducibility (w = 0.543) | Algorithms/source | 0.4280 | 0.2324 |
Reproducibility (w = 0.543) | Equations/models | 0.1340 | 0.0728 |
Reproducibility (w = 0.543) | Raw data | 0.4020 | 0.2183 |
Reproducibility (w = 0.543) | Processed data | 0.0360 | 0.0195 |
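
The Weight column above can be reproduced by multiplying each Level 1 priority (31.9%, 13.8%, and 54.3% for accessibility, content, and reproducibility) by the normalized Level 2 priority. The following sketch uses only values reported in the tables; the variable and function names are illustrative.

```python
# Level 1 priorities from the decision hierarchy.
SCOPE_WEIGHTS = {"Accessibility": 0.319, "Content": 0.138, "Reproducibility": 0.543}

# Normalized Level 2 priorities within each scope.
LOCAL_PRIORITIES = {
    "Accessibility": {"Visibility": 0.333, "Metadata": 0.452, "Editorial policy": 0.215},
    "Content": {"Comprehensibility": 0.437, "Readability": 0.119, "Content structure": 0.444},
    "Reproducibility": {"Algorithms/source": 0.428, "Equations/models": 0.134,
                        "Raw data": 0.402, "Processed data": 0.036},
}

def global_weights(scope_weights, local_priorities):
    """Global weight = scope weight x normalized priority inside the scope."""
    return {
        (scope, metric): scope_weights[scope] * priority
        for scope, metrics in local_priorities.items()
        for metric, priority in metrics.items()
    }

weights = global_weights(SCOPE_WEIGHTS, LOCAL_PRIORITIES)
for (scope, metric), value in weights.items():
    print(f"{scope:16s} {metric:20s} {value:.4f}")
```

Because the priorities sum to 1 at both levels, the ten global weights also sum to 1.
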
DOI | Accessibility | Content | Reproducibility | Weight | Category |
---|---|---|---|---|---|
10.1007/s11831-020-09496-0 | 11.98 | 10.6064 | 0 | 22.1044 | Low |
10.1007/s11277-020-07108-5 | 12.5376 | 13.713 | 0 | 26.2506 | Moderate |
10.1109/JIOT.2017.2683200 | 11.0576 | 12.8324 | 0 | 23.89 | Low |
10.1007/s42979-021-00521-y | 12.5376 | 12.849 | 0 | 25.3866 | Moderate |
10.1007/s11277-021-08439-7 | 13.1876 | 9.8987 | 0 | 23.0863 | Low |
10.1109/ACCESS.2019.2930345 | 12.5778 | 13.9471 | 0 | 26.5249 | Moderate |
10.1109/ACCESS.2018.2842034 | 13.2278 | 12.6112 | 0 | 25.839 | Moderate |
10.1109/ACCESS.2019.2908684 | 12.5778 | 9.6833 | 0 | 22.2611 | Low |
10.1109/ACCESS.2018.2877293 | 12.5778 | 10.931 | 0 | 23.5088 | Low |
10.1109/JTEHM.2018.2822681 | 12.5778 | 11.653 | 0 | 24.2308 | Low |
10.1109/ACCESS.2018.2864675 | 12.5778 | 13.365 | 0 | 25.9428 | Moderate |
10.22430/22565337.1485 | 6.3678 | 13.2353 | 0 | 19.6031 | Low |
10.1109/ACCESS.2020.3024066 | 13.2278 | 12.9525 | 0 | 26.1803 | Moderate |
10.1007/s40860-020-00116-z | 12.5376 | 15.7444 | 0 | 28.282 | Moderate |
10.1109/ACCESS.2020.3004486 | 12.5778 | 12.3583 | 0 | 24.9361 | Low |
10.1109/ACCESS.2020.2998983 | 12.5778 | 12.3886 | 0 | 24.9664 | Low |
10.1109/ACCESS.2020.2986381 | 12.5778 | 14.1379 | 0 | 26.7157 | Moderate |
10.1007/s11036-018-1085-0 | 12.5376 | 16.4839 | 0 | 29.0215 | Moderate |
10.1109/ACCESS.2019.2951164 | 12.5778 | 12.9714 | 0 | 25.5492 | Moderate |
10.1109/TASE.2020.3004313 | 13.0976 | 12.5862 | 0 | 25.6838 | Moderate |
10.15446/esrj.v24n2.87441 | 6.3678 | 12.8262 | 0 | 19.194 | Low |
10.1007/s11227-021-03653-3 | 12.5376 | 12.8196 | 0 | 25.3572 | Moderate |
10.1109/MS.2017.2 | 11.0576 | 9.6594 | 0 | 20.717 | Low |
10.1016/j.jnca.2016.10.013 | 11.0576 | 16.9416 | 0 | 27.9992 | Moderate |
10.1109/ACCESS.2020.3022641 | 12.5778 | 12.7838 | 0 | 25.3616 | Moderate |
10.1109/ACCESS.2019.2893445 | 6.3678 | 8.6874 | 0 | 15.0552 | Low |
10.19053/01211129.v26.n46.2017.7326 | 6.3678 | 10.5453 | 0 | 16.9131 | Low |
10.1109/ACCESS.2019.2956980 | 12.5778 | 14.2777 | 0 | 26.8555 | Moderate |
10.1109/ACCESS.2019.2910411 | 12.5778 | 13.4773 | 0 | 26.0551 | Moderate |
10.1109/ACCESS.2019.2906265 | 12.5778 | 10.4612 | 0 | 23.039 | Low |
10.1109/ACCESS.2019.2905017 | 12.5778 | 13.3965 | 0 | 25.9743 | Moderate |
10.2991/icaset-18.2018.20 | 3.1188 | 5.2043 | 0 | 8.3231 | Low |
10.1007/s11277-020-07446-4 | 12.5376 | 13.3678 | 0 | 25.9054 | Moderate |
10.1109/ACCESS.2019.2932609 | 12.5778 | 13.2871 | 0 | 25.8649 | Moderate |
10.1007/s11227-018-2288-7 | 12.5376 | 12.4159 | 0 | 24.9535 | Low |
10.1109/CCAA.2016.7813916 | 10.018 | 13.9867 | 0 | 24.0047 | Low |
10.1109/ACCESS.2020.2988059 | 13.0976 | 16.497 | 0 | 29.5946 | Moderate |
10.15517/eci.v8i1.30010 | 5.848 | 14.3093 | 0 | 20.1573 | Low |
10.1109/ACCESS.2020.2986681 | 13.0976 | 12.4434 | 0 | 25.541 | Moderate |
10.1007/s12525-020-00405-8 | 13.2278 | 12.6494 | 0 | 25.8772 | Moderate |
10.1109/ACCESS.2019.2941978 | 12.5778 | 13.8413 | 0 | 26.4191 | Moderate |
10.1109/ACCESS.2019.2958257 | 12.5778 | 13.4266 | 0 | 26.0044 | Moderate |
10.1007/s10270-020-00785-7 | 13.7476 | 16.165 | 23.2404 | 53.153 | High |
10.11144/Javeriana.iyu21-1.iprc | 5.848 | 10.9357 | 0 | 16.7837 | Low |
10.1109/ACCESS.2018.2793280 | 12.5778 | 10.9511 | 0 | 23.5289 | Low |
10.1109/ACCESS.2019.2895368 | 12.5778 | 9.9894 | 0 | 22.5672 | Low |
10.1109/JIOT.2020.2988321 | 13.0976 | 12.6334 | 0 | 25.731 | Moderate |
10.1186/s13635-020-00111-0 | 13.7476 | 11.9492 | 0 | 25.6968 | Moderate |
10.1109/ACCESS.2019.2946400 | 12.5778 | 14.5298 | 0 | 27.1076 | Moderate |
10.15446/dyna.v85n204.68264 | 6.3678 | 14.245 | 0 | 20.6128 | Low |
10.1007/s00521-020-04874-y | 12.5376 | 15.6602 | 0 | 28.1978 | Moderate |
10.1109/ACCESS.2019.2951168 | 12.5778 | 13.2015 | 0 | 25.7793 | Moderate |
10.1109/ACCESS.2020.2998739 | 12.5778 | 13.2083 | 0 | 25.7861 | Moderate |
10.1109/ACCESS.2019.2902865 | 12.5778 | 10.6249 | 0 | 23.2027 | Low |
10.1007/s11227-019-02928-0 | 12.5376 | 9.9435 | 0 | 22.4811 | Low |
10.1109/ACCESS.2020.2997761 | 12.5778 | 13.1285 | 0 | 25.7063 | Moderate |
10.1109/ACCESS.2019.2899828 | 12.5778 | 10.6818 | 0 | 23.2596 | Low |
10.1007/s40860-016-0027-5 | 8.8376 | 15.2348 | 0 | 24.0724 | Low |
10.1109/ACCESS.2017.2692247 | 12.5778 | 15.162 | 0 | 27.7398 | Moderate |
10.1109/TCC.2019.2902380 | 13.0976 | 10.0035 | 0 | 23.1011 | Low |
10.1109/ACCESS.2017.2717818 | 12.5778 | 9.7447 | 0 | 22.3225 | Low |
10.1109/TVT.2019.2944926 | 13.0976 | 13.2034 | 0 | 26.301 | Moderate |
10.1109/ACCESS.2018.2872799 | 12.5778 | 13.018 | 0 | 25.5958 | Moderate |
10.1109/ACCESS.2020.2987749 | 12.5778 | 13.4614 | 0 | 26.0392 | Moderate |
10.1109/ACCESS.2019.2933014 | 12.5778 | 12.8661 | 0 | 25.4439 | Moderate |
10.1109/ACCESS.2020.3034466 | 12.5778 | 12.3171 | 0 | 24.8949 | Low |
10.1109/JIOT.2015.2483023 | 11.0576 | 11.3964 | 0 | 22.454 | Low |
10.1109/ACCESS.2016.2607786 | 12.5778 | 9.7146 | 0 | 22.2924 | Low |
10.1109/ACCESS.2018.2871271 | 12.5778 | 13.7031 | 0 | 26.2809 | Moderate |
10.1007/s11227-016-1684-0 | 12.5376 | 12.5583 | 0 | 25.0959 | Moderate |
10.1007/s11277-020-07649-9 | 12.5376 | 9.5646 | 0 | 22.1022 | Low |
10.1016/j.jnca.2016.08.007 | 11.0576 | 13.1881 | 0 | 24.2457 | Low |
10.1109/IoTDI.2015.22 | 10.018 | 11.8547 | 0 | 21.8727 | Low |
10.1109/ACCESS.2019.2927394 | 12.5778 | 12.1005 | 0 | 24.6783 | Low |
10.1109/JCN.2019.000049 | 13.0976 | 11.5436 | 0 | 24.6412 | Low |
10.1007/s10916-019-1158-z | 12.5376 | 10.9914 | 0 | 23.529 | Low |
10.1109/ACCESS.2019.2929915 | 12.5778 | 12.5362 | 0 | 25.114 | Moderate |
10.1109/TCSI.2020.2973908 | 13.0976 | 12.3713 | 0 | 25.4689 | Moderate |
10.1109/ACCESS.2019.2931868 | 12.5778 | 13.0582 | 0 | 25.636 | Moderate |
10.1016/j.softx.2022.101218 | 13.7476 | 17.1149 | 23.2404 | 54.1029 | High |
10.1016/j.softx.2022.101081 | 13.7476 | 15.689 | 47.0238 | 76.4604 | High |
10.1016/j.iot.2022.100677 | 13.7476 | 13.5239 | 40.725 | 67.9965 | High |
10.1016/j.comnet.2020.107673 | 13.1876 | 14.5269 | 47.0238 | 74.7383 | High |
10.1016/j.softx.2022.101089 | 13.7476 | 16.6358 | 47.0238 | 77.4072 | High |
10.1016/j.softx.2021.100661 | 13.7476 | 16.4539 | 23.2404 | 53.4419 | High |
10.1016/j.iot.2020.100255 | 13.1876 | 14.0631 | 47.0238 | 74.2745 | High |
10.1016/j.softx.2023.101390 | 13.7476 | 14.0765 | 47.0238 | 74.8479 | High |
10.1016/j.dib.2023.109248 | 13.7476 | 12.45 | 47.0238 | 73.2214 | High |
10.1016/j.dib.2022.108400 | 13.0976 | 10.6664 | 35.5 | 59.264 | High |
10.1016/j.dib.2021.107530 | 13.7476 | 11.0984 | 47.0238 | 71.8698 | High |
10.1016/j.simpa.2022.100282 | 13.7476 | 12.1564 | 47.0238 | 72.9278 | High |
10.1016/j.dib.2022.108026 | 13.7476 | 12.4982 | 47.0238 | 73.2696 | High |
10.1016/j.dib.2021.106826 | 13.0976 | 11.3757 | 28.154 | 52.6273 | High |
10.1016/j.dib.2021.107453 | 13.7476 | 11.0544 | 47.0238 | 71.8258 | High |
10.1016/j.simpa.2020.100029 | 13.7476 | 12.9906 | 47.0238 | 73.762 | High |
10.1016/j.comnet.2021.108627 | 13.1876 | 12.7975 | 54.3 | 80.2851 | High |
10.1016/j.softx.2022.100991 | 13.7476 | 12.6896 | 47.0238 | 73.461 | High |
10.1016/j.softx.2022.101180 | 13.7476 | 13.7839 | 47.0238 | 74.5553 | High |
10.1016/j.dib.2022.108366 | 13.7476 | 13.0354 | 45.2500 | 72.0330 | High |
10.1016/j.softx.2022.100991 | 13.7476 | 17.1149 | 23.2404 | 54.1029 | High |
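
The per-document totals above are the sum of the three scope scores. The sketch below illustrates this under two assumptions that are not stated in the text: that a reproducibility score is 100 times the sum of the global weights of the reproducibility sub-scopes a document satisfies (which reproduces the recurring value 47.0238), and that the Low/Moderate/High cut-offs lie at 25 and 50 (hypothetical thresholds inferred from rows of the table).

```python
REPRO_WEIGHT = 0.543  # Level 1 priority of reproducibility
REPRO_PRIORITIES = {
    "Algorithms/source": 0.428,
    "Equations/models": 0.134,
    "Raw data": 0.402,
    "Processed data": 0.036,
}

def reproducibility_score(fulfilled):
    """Assumed scoring: 100 x scope weight x sum of satisfied local priorities."""
    return 100 * REPRO_WEIGHT * sum(REPRO_PRIORITIES[m] for m in fulfilled)

def total_score(accessibility, content, reproducibility):
    """Total score of a document as the sum of its three scope scores."""
    return accessibility + content + reproducibility

def categorize(total, moderate_from=25.0, high_from=50.0):
    """Hypothetical cut-offs inferred from the table, not given by the authors."""
    if total >= high_from:
        return "High"
    if total >= moderate_from:
        return "Moderate"
    return "Low"

repro = reproducibility_score(["Algorithms/source", "Raw data", "Processed data"])
total = total_score(13.7476, 15.689, repro)
print(round(repro, 4), round(total, 4), categorize(total))
```
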
ID | Queries |
---|---|
Q1 | Smart home IoT |
Q2 | Security system protocol |
Q3 | Protocol access control |
Q4 | Arduino base |
Q5 | Standalone device to internet |
Q6 | Smart energy management |
Q7 | Global positioning system |
Q8 | IoT hardware service |
Q8 | Blockchain network IoT |
Q9 | Management protocol |
Q10 | Big data for IoT |
Q11 | Sensor measurement |
Q12 | Industrial control system |
Q13 | Security and privacy data |
Q14 | Data analytics for IoT |
Q15 | Communication architecture for IoT |
Q16 | Information systems |
Q17 | Machine learning for IoT |
Q18 | Biometric data authentication |
Q19 | Storage data management |
Q20 | Smart agriculture |
Q21 | Remote IoT users |
Q22 | Wearable sensor |
Q23 | Secure IoT framework |
Q24 | Data quality of service |
Q25 | Smart city |
ID of Query | Low Documents: P | Low Documents: R | Low Documents: F1 | High Documents: P | High Documents: R | High Documents: F1 |
---|---|---|---|---|---|---|
Q1 | 1.00 | 0.40 | 0.50 | 0.67 | 0.67 | 0.67 |
Q2 | 0.00 | 0.00 | 0.00 | 0.67 | 0.50 | 0.00 |
Q3 | 1.00 | 0.25 | 0.29 | 1.00 | 0.50 | 0.67 |
Q4 | 0.00 | 0.00 | 0.00 | 0.50 | 1.00 | 0.67 |
Q5 | 1.00 | 0.50 | 0.67 | 0.50 | 0.50 | 0.50 |
Q6 | 0.00 | 0.00 | 0.00 | 1.00 | 0.50 | 0.67 |
Q7 | 0.00 | 0.00 | 0.00 | 1.00 | 1.00 | 1.00 |
Q8 | 1.00 | 0.50 | 0.67 | 0.80 | 0.57 | 0.67 |
Q9 | 0.50 | 0.20 | 0.20 | 0.50 | 0.50 | 0.50 |
Q10 | 0.50 | 0.50 | 0.50 | 0.60 | 1.00 | 0.75 |
Q11 | 1.00 | 0.33 | 0.40 | 0.50 | 1.00 | 0.67 |
Q12 | 1.00 | 0.50 | 0.67 | 0.67 | 0.25 | 0.36 |
Q13 | 0.00 | 0.00 | 0.00 | 1.00 | 1.00 | 1.00 |
Q14 | 1.00 | 0.33 | 0.40 | 0.75 | 1.00 | 0.86 |
Q15 | 0.00 | 0.00 | 0.00 | 0.50 | 0.50 | 0.50 |
Q16 | 0.00 | 0.00 | 0.00 | 1.00 | 0.33 | 0.50 |
Q17 | 1.00 | 0.25 | 0.29 | 0.67 | 0.67 | 0.67 |
Q18 | 0.00 | 0.00 | 0.00 | 0.60 | 1.00 | 0.75 |
Q19 | 0.00 | 0.00 | 0.00 | 0.50 | 1.00 | 0.67 |
Q20 | 0.00 | 0.00 | 0.00 | 0.67 | 1.00 | 0.80 |
Q21 | 0.00 | 0.00 | 0.00 | 1.00 | 1.00 | 1.00 |
Q22 | 1.00 | 0.20 | 0.22 | 1.00 | 0.50 | 0.67 |
Q23 | 0.00 | 0.00 | 0.00 | 0.75 | 0.75 | 0.75 |
Q24 | 0.00 | 0.00 | 0.00 | 1.00 | 0.50 | 0.67 |
Q25 | 1.00 | 0.25 | 0.00 | 0.75 | 0.50 | 0.60 |
AVG | 0.42 | 0.17 | 0.20 | 0.74 | 0.71 | 0.66 |
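
For reference, precision (P), recall (R), and F1 for a single query can be computed from the set of retrieved documents and the set of relevant documents. The sketch below uses the standard definitions; the document identifiers are hypothetical and the paper's exact evaluation procedure may differ.

```python
def precision_recall_f1(retrieved, relevant):
    """Standard set-based precision, recall, and F1 for one query."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

# Hypothetical identifiers: two of the three retrieved documents are relevant.
p, r, f1 = precision_recall_f1(["d1", "d2", "d3"], ["d1", "d2", "d4"])
print(round(p, 2), round(r, 2), round(f1, 2))
```
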

Precision | Low Documents | High Documents |
---|---|---|
Mean | 0.440 | 0.744 |
Mean standard error | 0.097 | 0.039 |
Standard deviation | 0.485 | 0.199 |
Observational sample | 20 | |
Variance | 0.235 | 0.039 |
Student t-test | | |
t-test–one tailed | 0.0033 | |
t-test–two tailed | 0.0067 | |
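
The descriptive statistics for precision can be recomputed from the per-query P columns of the results table with the Python standard library. The Welch-style t statistic at the end is an assumption about the form of the test (the text does not say which variant was used), and obtaining a p-value from it would additionally require a t-distribution.

```python
import math
import statistics

# Per-query precision for Q1..Q25, copied from the results table.
low_p = [1, 0, 1, 0, 1, 0, 0, 1, 0.5, 0.5, 1, 1, 0,
         1, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 1]
high_p = [0.67, 0.67, 1, 0.5, 0.5, 1, 1, 0.8, 0.5, 0.6, 0.5, 0.67, 1,
          0.75, 0.5, 1, 0.67, 0.6, 0.5, 0.67, 1, 1, 0.75, 1, 0.75]

def describe(values):
    """Mean, sample standard deviation, standard error, and sample variance."""
    sd = statistics.stdev(values)
    return {
        "mean": statistics.mean(values),
        "sd": sd,
        "se": sd / math.sqrt(len(values)),
        "variance": statistics.variance(values),
    }

def welch_t(a, b):
    """Welch's t statistic for two samples (assumed test variant)."""
    var_a, var_b = statistics.variance(a), statistics.variance(b)
    return (statistics.mean(a) - statistics.mean(b)) / math.sqrt(
        var_a / len(a) + var_b / len(b))

low_stats, high_stats = describe(low_p), describe(high_p)
t_stat = welch_t(high_p, low_p)
print({k: round(v, 3) for k, v in low_stats.items()})
print({k: round(v, 3) for k, v in high_stats.items()})
```
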

Recall | Low Documents | High Documents |
---|---|---|
Mean | 0.158 | 0.709 |
Mean standard error | 0.039 | 0.051 |
Standard deviation | 0.199 | 0.259 |
Observational sample | 20 | |
Variance | 0.039 | 0.067 |
Student t-test | | |
t-test–one tailed | 0.043 | |
t-test–two tailed | 1 | |

F1 | Low Documents | High Documents |
---|---|---|
Mean | 0.192 | 0.662 |
Mean standard error | 0.049 | 0.042 |
Standard deviation | 0.249 | 0.210 |
Observational sample | 20 | |
Variance | 0.062 | 0.044 |
Student t-test | | |
t-test–one tailed | 0.059 | |
t-test–two tailed | 0.042 | |
© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Suárez López, D.; Álvarez-Rodríguez, J.M.; Molina-Cardenas, M. Toward a Model to Evaluate Machine-Processing Quality in Scientific Documentation and Its Impact on Information Retrieval. Appl. Sci. 2023, 13, 13075. https://doi.org/10.3390/app132413075