Special Issue on the Curative Power of Medical Data

Abstract: With the massive amounts of medical data made available online, language technologies have proven to be indispensable in processing biomedical and molecular biology literature, health data and patient records. With such a huge number of reports, evaluating their impact has long ceased to be a trivial task. Linking the contents of these documents to each other, as well as to specialized ontologies, could enable access to and the discovery of structured clinical information and could foster a major leap in natural language processing and in health research. The aim of this Special Issue, “Curative Power of Medical Data”, in Data is to gather innovative approaches for the exploitation of biomedical data using semantic web technologies and linked data by fostering community involvement in biomedical research. This Special Issue contains four surveys, which cover a wide range of topics, from the analysis of the writing style of biomedical articles, to automatically generating tests from medical references, constructing a gold standard biomedical corpus and visualizing biomedical data.


Introduction
The interest in biomedical data, along with the continuous development of various qualitative and quantitative text analysis tools, has made language technologies a natural choice for analyzing the evolution of scientific life.
Biomedical Text Mining (BTM), also known as Health Informatics (HI), is a discipline situated at the intersection of computer science, medical care, information systems and linguistics. It focuses on the resources, methods and applications needed to optimize the acquisition, storage, retrieval and mining of information in biomedicine. Additionally, BTM uses sophisticated predictive models to understand, identify and extract concepts from large collections of scientific texts in medicine, biology, biophysics, chemistry, etc., with the aim of discovering knowledge that can add value to medical research [1].
In this context, a wide range of language resources has been developed, including complex lexicons, thesauri and ontologies that cover the entire spectrum of clinical concepts. In order to provide a uniform conceptual understanding, some authors have described a terminological and typological system [2,3].
Innovative approaches for the understanding of biomedical information using semantic web technologies and linked data have started to be investigated by bringing together practitioners, researchers and scholars to share examples, use cases, theories and analyses of biomedical data. A suitable forum for this was the MEDA Workshop on the Curative Power of Medical Data, the starting point for this Special Issue, aimed at consolidating an internationally appreciated forum for scientific research in biomedical text mining, with an emphasis on crowdsourcing, the semantic web, knowledge integration and data linking.
The purpose of this guest editorial is to present the articles contributed to this Special Issue [4] so that they can serve as a basis for further work in this area of science, whilst also encouraging communication among the various disciplines by identifying and grouping together complementary research solutions.

Challenges in the Biomedical Field
Due to the diversity of formats and text collections, analysing biomedical data is a challenging task. Most online information is now available as unstructured documents [5], although several initiatives have been proposed to convince users to manually label their online publications using a specialized markup language [6].
Among the most active research topics in the biomedical domain, we can mention: crowdsourcing and collaborative approaches in biomedicine in the era of big data [7,8]; creating, organizing and storing biomedical data in digital libraries [9]; extracting various kinds of information from medical texts and images [10,11]; the creation of conceptual graphs, medical ontologies and specialized semantic networks [12]; applications specially designed for medicine and biology, such as medical question answering or summarization; medical search engines [13]; deep learning techniques for bioinformatics [14]; and knowledge discovery using text mining [15].
One of the major enablers of the recent progress in the field has been the enormous growth in the amount of annotated data available. A recent publication cited 25 corpora of biomedical data that had become available within the past five years alone [16,17], an unprecedented growth rate.

Summary
The aim of the Special Issue "Curative Power of Medical Data" of the journal Data was to develop a community of researchers involved in biomedical research. This Special Issue contains four surveys, which cover a wide range of topics: author confidence in biomedical articles [18]; generating tests from medical references [19]; constructing a gold standard biomedical corpus for the Romanian language [20]; and the visualization of biomedical data among the Chinese elderly [21].
Mining biomedical literature to extract the science behind it, such as concepts, patterns or relations, is a very productive research area. However, extracting non-scientific information from biomedical data has recently also attracted increasing interest, with applications ranging from identifying speculative language to retrieving papers with a specific writing style, in an attempt to cope with different reader preferences. Aside from mining knowledge, a new research direction, presented in [18], tries to identify the factors that drive readers to choose one scientific article over another.
Another direction proposed in the Special Issue was the automatic generation of tests for the evaluation of a student's academic performance in the medical domain. Using medical reference texts as input and supported by a specially designed medical ontology, the authors of [19] propose a system that generates different types of test questions (multiple-choice, fill-in-the-blank, true/false and matching), which can have a customizable length and difficulty and, most importantly, can be graded automatically.
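To make the idea of automatically generated and automatically graded questions more concrete, the following is a minimal sketch of fill-in-the-blank generation. It is not the system described in [19]: the flat term list stands in for a medical ontology, and the names `BlankQuestion`, `make_blank_questions` and `grade`, as well as the sample text, are hypothetical and chosen only for illustration.

```python
import re
from dataclasses import dataclass


@dataclass
class BlankQuestion:
    prompt: str  # sentence with the target term replaced by a blank
    answer: str  # the removed term, used as the grading key


def make_blank_questions(text, terms):
    """Create at most one fill-in-the-blank question per sentence."""
    questions = []
    # Naive sentence split on end punctuation (assumption: simple prose input).
    for sentence in re.split(r"(?<=[.!?])\s+", text.strip()):
        for term in terms:
            pattern = re.compile(r"\b" + re.escape(term) + r"\b", re.IGNORECASE)
            match = pattern.search(sentence)
            if match:
                prompt = sentence[:match.start()] + "_____" + sentence[match.end():]
                questions.append(BlankQuestion(prompt, match.group(0)))
                break  # move on to the next sentence
    return questions


def grade(question, response):
    """Automatic grading: case-insensitive exact match against the key."""
    return response.strip().lower() == question.answer.lower()


# Hypothetical reference text and term list (a stand-in for an ontology).
reference = "Insulin is produced by the pancreas. The thyroid gland regulates metabolism."
ontology_terms = ["insulin", "thyroid gland"]

questions = make_blank_questions(reference, ontology_terms)
for q in questions:
    print(q.prompt)
```

Exact string matching keeps grading trivial; an ontology-backed system could plausibly also accept synonyms or use related concepts as distractors for multiple-choice items, but that is beyond this sketch.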
Besides the development of tools, this Special Issue also introduces a resource, i.e., the first morphologically and terminologically annotated biomedical corpus of the Romanian language [20]. With almost 14,000 tokens distributed across three medical subdomains (cardiology, diabetes and endocrinology), the corpus contains manually validated part-of-speech and named entity annotations, useful for training specific biomedical applications.
Another research direction is studied in [21], which aims to explore, through a visual analysis, the factors that contribute to the life satisfaction of elderly people, ranging from demographic, physiological, psychological and economic perspectives up to social characteristics. The findings revealed that the influence of most factors has been changing over the past ten years. In particular, the influence of bright personality has been increasing since 2005, while the influence of medical accessibility has been declining. At the same time, self-rated health, the self-evaluation of the economic level, economic self-sufficiency and bright personality have proven to be the most important factors.