“Who Is the FAIRest of Them All?” Authors, Entities, and Journals Regarding FAIR Data Principles

The perceived need to improve the infrastructure supporting the re-use of scholarly data since the second decade of the 21st century led to the design of a concise number of principles and metrics, named FAIR Data Principles. This paper, part of an extended study, intends to identify the main authors, entities, and scientific journals linked to research conducted within the FAIR Data Principles. The research was developed by means of a qualitative approach, using documentary research and a constant comparison method for codification and categorization of the sampled data. The sample studied showed that the largest share of authors was located in the Netherlands, with Europe accounting for more than 70% of the number of authors considered. Most of these are researchers and work in higher education institutions. These entities can be found in most of the territorial-administrative areas under consideration, with the USA being the country with the most entities and Europe being the world region where they are most numerous. The journal with the most texts in the sample was Insights, with 2020 being the year when the most texts were published. Two of the most prominent authors present in the sample texts were located in the Netherlands, while the other two were in France and Australia.


Introduction
Since the second decade of the 21st century, there has been a perceived need to improve the infrastructure supporting the re-use of scholarly data. As part of this, a set of stakeholders, including representatives from academia, industry, funding agencies, and academic publishers, have designed a concise number of principles and metrics, which they named FAIR Data Principles [1,2]. The acronym FAIR refers to the characteristics of Findability (Findable), Accessibility (Accessible), Interoperability (Interoperable), and Reusability (Reusable). These principles specifically emphasize improving the ability of machines (in this context, interpreted as digital repositories) to find and use data automatically, as well as supporting its reuse by individuals [2] (p. 1).
Within this framework, the 15 principles presented appear divided by the identified categories. In order to be findable (Findable), (1) data (and/or metadata) are assigned a globally unique and persistent identifier; (2) data are described with enriched metadata (defined by principle 10); (3) metadata clearly and explicitly include the identifier of the data they describe; (4) data (and/or metadata) are recorded or indexed in a searchable resource. In order to be accessible (Accessible), (5) the data (and/or metadata) are retrievable via their identifier using a standardized communication protocol; (5.1) the protocol is open, free, and universally implementable; (5.2) the protocol allows an authentication and authorization procedure, where necessary; and (6) the metadata are accessible, even when the data are no longer available. In order to be interoperable (Interoperable), (7) the data (and/or metadata) use a formal, accessible, shared, and widely applicable language for knowledge representation; (8) the data (and/or metadata) use vocabularies that follow FAIR principles; and (9) the data (and/or metadata) include qualified references to other data (and/or metadata).
In this regard, within the scope of the cutting-edge international scientific literature and by means of document analysis of papers regarding FAIR Data Principles, dating from 2016 onward, collected in three scientific databases, the study presented in this paper specifically intended to identify the main authors, entities, and scientific journals linked to research emphasizing the FAIR Data Principles since their inception. To that end, it defined questions concerning the authors, their locations, and their professional occupations; the organizations/entities where the authors perform their professional and research activities, including the place where such entities are based; the journals where and when the authors' research was published; and the most prominent authors in terms of authorship.
The research questions can be summarized as follows:
1. Who is writing about FAIR Data Principles?
2. Where are they based?
3. What institutions and countries are leading the effort?
4. In which scientific journals are they publishing their research?
The study allowed us to establish the landscape in which research dedicated to, or adopting, the FAIR Principles takes place. This considers both theoretical research and exercises of the practical application of the said principles.
Within this scope, the study of the researchers' profiles revealed the original contexts of those who developed this type of research.
We established the year 2016 and later as the chronological boundaries for collecting the sample of scientific papers used for the document analysis since that was the year when the "FAIR Guiding Principles for scientific data management and stewardship" were published [2]. We chose the LISA-Library, Information Science Abstracts; LISTA-Library, Information Science & Technology Abstracts; and Scopus databases because we were given access to them. Only peer-reviewed documents with full text were considered so that they could be used in future expansions of this research. Future research will allow us to collect and analyze an expanded number of information sources to build a structured information corpus that will be further analyzed and interpreted.
Although delimited by the sample, this allows us to identify trends that, in further research, may be contrasted with data collected from a larger number of sources and with broader research criteria.
It should be noted that it is not the focus of this paper to make a thematic distinction between the texts in the sample or a typological delimitation as to the types of research presented in those texts.

Materials and Methods
We developed a study with a qualitative approach, using documentary research, which is considered a systematic procedure for reviewing or assessing documentary material with textual or image information recorded without human intervention. The researcher then uses the examination and interpretation of data to extract meaning, gain understanding and develop empirical knowledge [11]. This analytic procedure involves identifying, selecting, evaluating, and synthesizing the documents' data. Document analysis produces data (excerpts, quotes, or entire passages), which are then organized into main themes, categories, and case examples specifically through content analysis [11].
This documentary research provided the means to establish the contextualization necessary to outline the problem and define the FAIR Data Principles scenario, and, on the other hand, it helped to develop the approach to that issue within the international scientific literature.
We then developed a systematization proposal, delimited by the data collection, and elaborated upon the categories collected and identified through the constant comparison method (CCM). CCM is a qualitative analysis approach developed within the grounded theory methodology, which focuses on the comparison of and between all the data elements, which can be identified as a procedure for interpreting texts, through coding and analysis, in order to develop theory [12,13] (p. 437). This method has four phases: (1) the comparison of incidents applicable to each category, (2) the integration of categories and their properties, (3) the delimitation of theory, and (4) the writing of theory [12] (p. 51-53) [13] (p.439-443). The analysis, which was carried out on the texts collected through the CCM, was developed by means of a back-and-forth process of coding, categorization, and saturation of the data on which this work is based, which constitutes a progressive spiral whose reflections allowed us to verify the main authors, entities, and scientific journals linked to the research conducted, with a special emphasis on the FAIR Data Principles since its inception. The use of the CCM will allow proceeding with the future analysis and interpretation of the information corpus resulting from expanded data gathering to be performed by the use of the grounded theory method.
The constructed model enabled the content analysis of the collected texts' main themes, originating a set of memoranda per category, which this study identifies. We sought, specifically, to reveal an invisible dimension, as explained by Bardin: "From the moment the content analysis decides to codify its material, it must produce a system of categories. The categorization has as its first objective (in the same way as the documentary analysis) to provide, by condensation, a simplified representation of the raw data. (...) Content analysis implicitly rests on the belief that categorization (the passage from raw data to organized data) does not introduce deviations (by excess or by refusal) in the material, but that it makes invisible indices known, at the level of the raw data" [14] (p. 148-149, our translation).
On 5 April 2022, searches of the LISA-Library, Information Science Abstracts; LISTA-Library, Information Science & Technology Abstracts; and Scopus databases were carried out in order to collect bibliographic references containing the term "FAIR Data Principles". This delimitation was applied because the acronym "FAIR" is easily confused with the word "fair": it is easy to find papers that use the concept of "fair principles" but are not concerned with the FAIR Data Principles. Nevertheless, future research will take into account the uses of the concept of "fair principles" that are in fact concerned with the FAIR Data Principles.
As stated before, the search strategy defined for collecting the sample of scientific papers used for the document analysis considered the year 2016 and later as the chronological boundaries since that was the year when the "FAIR Guiding Principles for scientific data management and stewardship" were published [2]. The three databases were chosen because the author has authorized access to them. Only peer-reviewed documents with full text were considered so that they could be used in future work expanding this research. Table 1 identifies the search strategies used. A total of 41 articles were retrieved. Duplicates (3) and articles that were not in English (5) were then removed. After reading the abstracts, one paper was excluded on account of the full text being behind a paywall. This resulted in 32 articles. Table A1 in Appendix A presents the papers that were retrieved and analyzed in the present research.
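The screening arithmetic described above can be retraced with a minimal sketch. The counts are those reported in the text; the variable names are illustrative only and are not part of the original study.

```python
# Screening tally for the sample, using the counts reported in the text.
retrieved = 41  # records returned by the three database searches

# Exclusion steps in the order in which the text describes them.
exclusions = {
    "duplicates": 3,
    "not in English": 5,
    "full text behind a paywall": 1,
}

remaining = retrieved
for reason, count in exclusions.items():
    remaining -= count
    print(f"after excluding {reason} ({count}): {remaining} articles")

final_sample = remaining
print(f"final sample: {final_sample} articles")
```

Listing the exclusion steps separately makes it easy to see how many records survive each stage of the selection.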
It is reiterated that it was not the focus of this particular paper to make a thematic distinction of the texts in the sample or a typological delimitation as to the types of research presented in them. Therefore, in the collection and selection of texts, as well as in the content analysis, there was no distinction or typological or thematic analysis of the research presented in the studied texts.
As mentioned before, the analysis was performed by means of CCM, a qualitative analysis approach. During this analysis, incidents in the data were compared for the creation of pre-textual codes, such as the names of authors, their professional occupations, the countries and institutions where they were based, the scientific journals that published the sampled texts, and the year in which they were published, in a back-and-forth process. These codes led to the identification of categories (which are more abstract codes) that led to further analysis of the collected texts to saturate the data regarding the professional occupation and the entity type.
Although this effort is presented very briefly, it constitutes the central task of this research, both for the time it took to be executed and for allowing the verification of which main authors, entities, and scientific journals were linked to the research conducted and for defining the organizational logic of the data that are reflected in the descriptions provided in the results and discussion section.
Since this paper is part of qualitative research, the results produced at the current stage must be confirmed or tested by further research, gathering data from other databases and information sources to produce sound theoretical assumptions and as a form of quality control. In order to help the validation and quality control of this research, memoranda will be produced during the future analysis and interpretation of the information corpus resulting from expanded data gathering as a methodological tool of the grounded theory method.

Results and Discussion
The papers from the sample were analyzed, and the data were collected and codified. The resulting codes from this process allowed for a more abstract codification, which permitted the development of categories. These categories revealed the need to saturate the properties of each of these categories, which led to the search for information in other sources (such as academic and professional platforms, personal or institutional, and journal websites). This information allowed us to establish information about the authors, entities, and journals and structure the presentation of what emerged from the analysis, as shown below.

Distribution of Authors by Country
The data on the provenance of the authors and their professional occupation were extracted from analyzed texts that were accompanied by a biographical note, from the authors' biographical notes that were available on the publications' websites and/or websites of the professional associations that publish these publications, from the curricula vitae available on the websites of the higher education institutions where they carry out their academic activity, and from the curricula vitae available on the ORCID and LinkedIn platforms. Unless otherwise stated, the information refers to the moment of authorship of the text(s). The provenance considers the location of the authors at the time of their publications.
From the sample of thirty-three texts used for this part of the research, 128 authors were identified.
The authors' distribution data by country at the time of authorship of the texts are shown in Figure 1 and Table A2 (in Appendix A). The largest share of authors is from the Netherlands (25%, a quarter of the authors), followed ex aequo by Germany and the United States (19.53% each, almost a fifth of the authors), Ireland (7.03%, with nine authors), and the United Kingdom (6.25%, with eight authors). The list of countries with more than one author also includes France, Italy, Australia, Canada, China, Spain, and Switzerland. It should be noted that an author from the Netherlands was at the time working in the VASCERN European Reference Centre, an International Reference Network (Leo Schultze Kool). Regarding the cases in which a single author published only one text, we indicate a Czech university lecturer (Lenka Kourimska), a Japanese researcher (Kai Nishikawa), a Portuguese researcher (Isabel Castanheira), a Romanian researcher (Nastasia Belc), a Slovenian university lecturer and researcher (Nives Ogrinc) and a PhD student at two higher education institutions in the United States and France (Coline Ferrant).
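As a hedged illustration of how the shares above relate to the 128 identified authors, the following sketch recomputes the percentages. The counts for Ireland and the United Kingdom are stated in the text; the counts for the Netherlands, Germany, and the United States are not stated and are derived here from the reported percentages, so they should be read as assumptions.

```python
TOTAL_AUTHORS = 128  # stated in the text

# Ireland (9) and the United Kingdom (8) are stated in the text.
# The other three counts are ASSUMPTIONS derived from the reported
# percentages (25% and 19.53% of 128 authors).
authors_by_country = {
    "Netherlands": 32,
    "Germany": 25,
    "United States": 25,
    "Ireland": 9,
    "United Kingdom": 8,
}


def share(count, total=TOTAL_AUTHORS):
    """Return the percentage share of `count` in `total`, to two decimals."""
    return round(100 * count / total, 2)


shares = {country: share(n) for country, n in authors_by_country.items()}

for country, pct in shares.items():
    print(f"{country}: {authors_by_country[country]} authors ({pct}%)")
```

Under these assumed counts, the recomputed shares match the percentages reported in the text.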

Professional Occupation
The coding of the authors' professional occupation resulted in a set of categories, which include CEO, independent consultant, information professionals, IT professionals, management, professors, researchers, and students. The independent consultant category includes senior analyst contractors and science publishers who work independently. Information professionals (library and information science, LIS) are employees, directors, and coordinators who work as librarians, data stewards, data management consultants, research data services professionals, data librarians, repository managers, and open access service managers. IT professionals are computer or software engineers and/or specialists, technical data services professionals, software developers, scientific and technical officers (STO) for data management, and data analysts/coordinators. Management refers to professionals who perform activities such as communication manager, director of partnerships, director of biosciences, and manager of the technology's programs, mainly in a corporate context. Professors exercise, in some way, teaching activities in higher education institutions, while researchers are those who develop professional activities in the field of research. Finally, students are those individuals who were undertaking their PhD.
We also found that some authors accumulate more than one professional role, which leads to the development of specific categories for those cases.
Despite being attached to an organization (in this case, the Marine Institute, an Irish Governmental Agency), authors such as Caoimhín Kelly defined themselves as contractors and were therefore coded as belonging to the category of independent consultants.
The data concerning the professional occupation of the authors can be found in Figure 2 and Table A3 (in Appendix A). These data allowed us to verify that 51 authors carried out research activities (39.85%), 12 of whom were also professors (9.38%), one (Karl Presser) was a CEO, and another (Martijn Kersloot) a manager (0.78% each). Within the academic universe, there were 34 higher education professors, corresponding to more than a quarter of the authors (26.56%): in addition to the 12 authors who combined teaching with research activities (9.38%, as previously indicated), there were two who also worked as information professionals, Ayla Stein Kenfield and John J. Meier (1.56%), and 20 full-time professors (15.62%). The existence of six students (4.69%) is also worth mentioning in this context.
More than a fifth of the authors (21.09%) were information (LIS) professionals. Apart from the two authors who combined such functions with university professorship (1.56%), 25 were exclusively involved in LIS activities (19.53%).
The fifteen IT professionals corresponded to more than a tenth of the authors (11.72%). There were five authors with management activities (3.91%), four of whom (Heather Staines, Linda van den Berg, Maryann E Martone, and Merlijn N. van Rijswijk) were full-time managers (3.13%). Likewise, four authors had CEO functions (3.13%): three exclusively, Dominic Farace, Tiberius Ignat, and Wolfgang Colsman (2.34%); and Karl Presser, who, as already indicated, also took part in research activities (0.78%).
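Because some authors hold more than one role, the category totals above overlap. The sketch below splits the 128 authors into non-overlapping role combinations, as one possible way of checking that the reported figures are mutually consistent. It rests on several assumptions: the number of authors who are researchers only is derived by subtraction, IT professionals and students are assumed to hold no second role, and the two independent consultants are taken from the two self-identified consultants mentioned in the discussion of entities.

```python
TOTAL_AUTHORS = 128  # stated in the text

# Disjoint role combinations: each author is counted exactly once.
role_groups = {
    ("researcher",): 51 - 12 - 1 - 1,  # derived: researchers with no second role
    ("researcher", "professor"): 12,
    ("researcher", "CEO"): 1,
    ("researcher", "manager"): 1,
    ("professor",): 20,
    ("professor", "information professional"): 2,
    ("information professional",): 25,
    ("IT professional",): 15,          # assumed to hold no second role
    ("manager",): 4,
    ("CEO",): 3,
    ("student",): 6,                   # assumed to hold no second role
    ("independent consultant",): 2,    # taken from the entities discussion
}


def total_with(role):
    """Count every author whose role combination includes `role`."""
    return sum(n for roles, n in role_groups.items() if role in roles)


# The disjoint groups should add up to the full set of authors.
assert sum(role_groups.values()) == TOTAL_AUTHORS

for role in ("researcher", "professor", "information professional", "manager", "CEO"):
    n = total_with(role)
    print(f"{role}: {n} authors ({round(100 * n / TOTAL_AUTHORS, 2)}%)")
```

Under these assumptions, the groups add up to 128 and the overlapping totals (51 researchers, 34 professors, 27 information professionals, five managers, and four CEOs) are recovered.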

Distribution of Authors by Organizations and Entities
It is also important to know where the authors work and in which type of organization. This information is provided in Table A4 in Appendix A.
The coding process regarding the organizations where the authors carried out their activities required the identification of the parent entities where such organizations were integrated. There were many situations in which the entities were public, namely higher education organizations and research centers. However, it was decided to identify as governmental agencies only those organizations whose mandate specifically identified them as such. Furthermore, it was deemed necessary to separate independent organizations from private organizations, given that the former refers to non-profit entities and the latter is more associated with the corporate world. Moreover, international reference networks are organizations created or funded by international entities to gather resources for research purposes (in this case, the VASCERN European Reference Centre).
Despite being attached to an organization (in this case, the Marine Institute, an Irish Governmental Agency), we reiterate that authors such as Caoimhín Kelly defined themselves as contractors and were therefore coded as belonging to the category of independent consultants.
In the data presented, it should be noted that Barend Mons was a lecturer at three entities, namely Leiden University, Dutch Techcentre for Life Sciences, and The Netherlands eScience Centre; Coline Ferrant was a PhD student at Northwestern University (USA) and at the Paris Institute of Political Studies (Sciences Po, in France); Hélène Prost was an IT professional (Information engineer) at the University of Lille III and at the Centre National de la Recherche Scientifique, both in France; Jaap Heringa was a researcher at the Dutch Techcentre for Life Sciences and a professor at the Vrije Universiteit (VU) Amsterdam, both in the Netherlands; Karl Presser was a researcher at the Swiss Federal Institute of Technology and CEO of Premotec GmbH, both in Switzerland; Leo Schultze Kool was a professor at the VASCERN European Reference Centre and Radboud University (the Netherlands); Luiz Olavo Bonino da Silva Santos was a professor at the Vrije Universiteit (VU) Amsterdam and at the Dutch Techcentre for Life Sciences, both in the Netherlands; Marc Rittberger was a professor at the Darmstadt University of Applied Sciences and a researcher at the Leibniz Institute for Research and Information in Education, both in Germany; Marco Roos was a professor at Leiden University and researcher at the Dutch Techcentre for Life Sciences, both in the Netherlands; Martijn Kersloot was a researcher at Vrije Universiteit (VU) Amsterdam, and a Manager (Product Owner Data and Innovation) at Castor EDC, Amsterdam, both in the Netherlands; and Renaud Fabre was a professor at the University of Paris VIII and a researcher at the Centre National de la Recherche Scientifique. Figure 3 provides information regarding the number of authors per type of entity.
Almost two-thirds of the authors were working in higher education institutions (60.28%), and slightly less than a quarter were working in research centers (23.40%). The entities with the most referenced authors were, ex aequo, a higher education institution and a research center, respectively, the Delft University of Technology and the Dutch Techcentre for Life Sciences (6.38% of the referred authors each), both from the Netherlands. Government agencies and private organizations had nine authors each (6.38%), and independent organizations had two authors (1.42%). Two authors identified themselves as independent consultants (1.42%). Only one type of entity has only one author, corresponding to the international reference networks (VASCERN European Reference Centre).
The analysis covered the 68 entities where the addressed authors develop their activities, plus two independent consultants (Jan Velterop and Caoimhín Kelly). Figure 4 presents the number of organizations per type of entity. These data show that almost two-thirds of the organizations are higher education institutions (62.86%), with research centers accounting for almost one-fifth of the organizations surveyed (17.14%). Private organizations represent less than one-tenth of the entities identified (8.57%). Government agencies include three organizations (4.29%). Two types of entities include two organizations or refer to two persons (the independent organizations and the independent consultant), and one type of entity refers to one organization (international reference network).
Figure 5 presents the distribution of entity types per territorial-administrative area (countries and the European Union). This shows us that higher education institutions are also those that cover most of the territorial-administrative areas under consideration and are absent from the sample only at the international level and in countries such as Portugal and Romania. They are followed by the research centers, which limit their representation within the sample studied to France, Germany, the Netherlands, Portugal, Slovenia, the United Kingdom, and the United States.
Only one type of entity is represented in only one country or territorial-administrative unit: the International Reference Network in the European Union.
The largest share of the entities retrieved from the sample is based in the United States (18.57%, just under a fifth of the entities), followed by the Netherlands (17.14%), Germany (15.71%, just over a sixth), France (8.57%, less than a tenth), and the United Kingdom (7.14%). Four countries have three entities listed (Australia, Ireland, Italy, and Switzerland), two countries have two entities listed (Canada and Spain), and seven countries or territorial entities have only one of the entities identified (China, Czechia, European Union, Japan, Portugal, Romania, and Slovenia).
About one-seventh of the organizations surveyed were higher education institutions from the United States (14.29%), followed by German and Dutch higher education institutions, which each account for one-tenth of the organizations surveyed (10% each), French higher education institutions (7.14%), and Australian higher education institutions (4.29%), together with German research centers (also 4.29%). Overall, there is a preponderance of higher education institutions from the European Union, which represent more than a third (35.71%) of the institutions surveyed, while this type of institution from the English-speaking countries represents less than a quarter of the total number of institutions (24.3%).
The Netherlands had the broadest coverage in terms of entity types, since the only types it lacks are independent consultants and international reference networks. It was followed in this respect by Germany, Ireland, the United Kingdom, and the United States, each with three different types of entity. Canada, France, Italy, Spain, and Switzerland each had two types of entities. Finally, Australia, China, Czechia, the European Union, Japan, Portugal, Romania, and Slovenia had only one type of entity in the sample analyzed.

Distribution of Papers by Journals and Years
The information on the distribution of analyzed texts by scientific journals is presented in Figure 6 and Table A6 (in the Appendix A).  Two journals account for a single author in the sample (International Journal of Information Management and Library Technology Reports).
The chronological distribution of the scientific texts in the sample is shown in Table 2 and Figure 7. Within the defined chronological delimitation, only 2016 and 2022 (up to 5 April, when the research and collection of texts were carried out) returned no results. Thus, each year between 2017 and 2021 is represented by at least two of the sample texts. 2020 is the year in which the greatest number of texts was published (37.5%, more than a third of the sample), followed by 2018 (28.13%, more than a quarter of the texts) and 2019 (21.88%, more than a fifth of the texts). The years 2017 and 2021 hold, ex aequo, the fewest texts in this sample (only two each).
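The yearly shares reported for the sample can be reproduced as simple proportions over the 32 sampled documents. The following is a minimal sketch in Python, not part of the original study. The counts for 2017 and 2021 (two texts each) are reported in the text, whereas the counts of 12, 9, and 7 texts for 2020, 2018, and 2019 are assumptions, back-calculated here from the reported percentages and the sample size; the function name `share` is hypothetical.

```python
from decimal import Decimal, ROUND_HALF_UP

# Texts per year. The values for 2017 and 2021 (two each) are reported in the
# text; the values for 2018, 2019, and 2020 are ASSUMED, back-calculated from
# the reported percentages and the sample size of 32 documents.
texts_per_year = {2016: 0, 2017: 2, 2018: 9, 2019: 7, 2020: 12, 2021: 2, 2022: 0}


def share(count, total):
    """Return count/total as a percentage, rounded half up to two decimals."""
    pct = Decimal(count) * 100 / Decimal(total)
    return pct.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


total = sum(texts_per_year.values())
shares = {year: share(n, total) for year, n in texts_per_year.items()}

# List the years from the largest to the smallest share of the sample.
for year, pct in sorted(shares.items(), key=lambda item: item[1], reverse=True):
    print(year, f"{pct}%")
```

Under these assumed counts, the total matches the final set of 32 documents and the rounded shares agree with the percentages given for 2018, 2019, and 2020. Exact decimal arithmetic is used so that values such as 28.125% round half up rather than depending on binary floating-point representation.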

Distribution of Authors by Journal and Place of Origin
Figure 8 and the corresponding table in the Appendix A show the data concerning the authors covered in the universe of publications used in this part of the study. Figure 8 is reproduced on a greater scale in Figure A1, in the Appendix B, for better visualization. Information regarding journals by countries of origin of the authors of the articles is provided in Figure 9 and Table A8 in the Appendix A. Figure 9 is reproduced on a greater scale in Figure A2, in the Appendix B, for better visualization. We underline that the provenance/geographical distribution of the authors considers the location of the authors at the time of their publications. The study does not assume their nationality.

This information shows that most of the articles from Studies in Health Technology and Informatics in the studied sample were written by authors located in the Netherlands (7.63%), including one article whose authorship was shared between eight authors located in Germany (Christian-Alexander Behrendt, Dennis Kadioglu, Fatlume Sadiku, Frank Ückert, Holger Storf, Jannik Shaaf, Jens Goebel, and Thomas O.F. Wagner) and two from the Netherlands (David van Enckevort and Marco Roos). The journal Frontiers in Chemistry appears in the sample with only one text, which has the greatest number of authors in the sample and also the greatest number of authors from different geographical provenance. It has two authors located in Italy (Claudia Zoani and Giovanna Zappa) (1.53%), in addition to one author located in each of these countries: Czechia (Lenka Kourimska); France (Olivier F.X. Donard); Germany (Michael Rychlik); the Netherlands (Marga C. Ocké); Portugal (Isabel Castanheira); Romania (Nastasia Belc); Slovenia (Nives Ogrinc); Spain (Larraitz Añorga); and Switzerland (Karl Presser).
The journals Computers and Geosciences and Ecological Informatics share the largest number of authors from the same country (6.87% each), considering that each one is represented in the studied sample by a single scientific paper. The authors who published in the first journal were located in Germany (Carsten Hoffmann, Kristian Senkler, M.A. Muqit Zoarder, Markus Stecker, Nikolai Svoboda, Philipp Gärtner, Udo Einspanier, Uwe Heinrich, and Xenia Specka), while those of the second journal were located in Ireland (Adam Leadbetter, Andrew Conway, Caoimhín Kelly, Deirdre Brophy, Elizabeth Tray, Elvira de Eyto, Niall Ó Maoiléidigh, Siobhan Moran, and Will Meaney).
The journal LIBER Quarterly is represented in the sample by two papers, one authored by eight authors (6.11%) from the Netherlands (Esther Plomp, Heather Andrews Mancilla, Jasper van Dijck, Kees den Heijer, Marta Teperek, Robbert Eggermont, Shalini Kurapati, and Yasemin Turkyilmaz-van der Velden), and the other by an author residing in Canada (David Wilcox).
Another journal with nine authors in the sample is Insights: The UKSG Journal. From this journal, the sample includes two single-author papers, one by an author from the United States (William H Walters) and the other by an author from Australia (Cameron Neylon). It also includes two articles with two authors each: one by two authors from the United States (Heather Staines and Maryann E Martone), and the other by authors located in the United Kingdom (Paul Ayris) and Switzerland (Tiberius Ignat). A further paper shares authorship between two authors from the United Kingdom (Rosie Higman and Sarah Jones) and one from Germany (Daniel Bangert).
Regarding the other journals in the sample, seven were identified as having authors from different countries. Information Services & Use has only one text, written by three authors located in the Netherlands (Barend Mons, Luiz Olavo Bonino da Silva Santos, and Michel Dumontier), one from Australia (Cameron Neylon), one resident in Spain (Mark D. Wilkinson), and another from the United Kingdom (Jan Velterop). Knowledge Organization presents a text by two authors from France (Hélène Prost and Joachim Schöpfel), one from Italy (Antonella Zane), and one from the Netherlands (Dominic Farace). The ASLIB Journal of Information Management presents a text by three authors located in Germany (Christoph Schindler, Julian Hocker, and Marc Rittberger) and another text by an author located in Japan (Kai Nishikawa). From Data Technologies and Applications comes a text published by three authors living in France (Francis Andre, Joachim Schöpfel, and Renaud Fabre) and a PhD student whose location is shared between France and the United States (Coline Ferrant). The Journal of Medical Internet Research is represented by one text from three authors located in Germany (Atinkut Alamirrew Zeleke, Dagmar Waltemath, and Esther Thea Inau) and one from the United States (Jean Sack). Library Trends features a text shared by three authors from China (Jie Hu, Jilong Zhang, and Shenqin Yin) and one located in Australia (Menghao Jia). Finally, the Journal of New Music Research presents a text by an author located in France (Francesca Frontini) and another located in Italy (Silvia Calamai).
In addition to these, there are journals whose texts in the sample were written only by authors located in the same country. The United States has the largest number of scientific journals that appear in the sample with texts published by authors located in that country. Within this scope, Information Technology and Libraries presents a text by seven authors located in the same country (Guillaume Viger, Joseph P.

Most Frequent Authors: Production and Profile
The information about the authors with more than one scientific paper in the sample used in this study is shown in Table 3.
Table 3. Authors with more than one paper in the used sample.
The data allow us to perceive that the authors with the most scientific papers in the sample used in this study are Barend Mons, Cameron Neylon, Joachim Schöpfel, and Marco Roos, who appear with two articles each. From this group of authors, it can be noted that Barend Mons and Cameron Neylon share the same article, which has a total of six authors. The other paper with Barend Mons' participation has a total of eight authors, while Cameron Neylon is the sole author of his second paper. Each of Joachim Schöpfel's articles has a total of four authors, while Marco Roos accounts for one article written by ten authors and another by nine authors. Moreover, Marco Roos' papers were published in the same journal (Studies in Health Technology and Informatics). Figure 10 presents the number of scientific papers from the sample, distributed by the number of authors.
Overall, the sample shows one paper with eleven authors [15], one with ten authors [16], three with nine authors [17–19], two with eight authors [8,20], one with seven authors [21], one with six authors [3], seven with four authors [7,9,22–26], two with three authors [4,27], seven with two authors [10,28–33], and seven with one author (referring to the following authors: Cameron Neylon [1]; Bohyun Kim [34]; William H Walters [35]; Matthew I. Bellgard [36]; Kai Nishikawa [37]; Ayla Stein Kenfield [38]; and David Wilcox [39]).
Barend Mons [40] obtained a PhD in molecular biology from Leiden University in 1986. His research focuses on malaria, in close collaboration with endemic countries, and computer-assisted knowledge discovery. He took part (as an expert) in the INCO-DC European Commission program (1993–1996) and in the Netherlands Organisation for Scientific Research (NWO 1966–1999). The author also co-founded several spin-off companies, such as the Biosemantics group. Currently, he is a professor in biosemantics at the Leiden University Medical Center. He was also Head of ELIXIR-NL at the Dutch Techcentre for Life Sciences (until 2015), Integrator Life Sciences at the Netherlands eScience Center, and board member of the Leiden Centre of Data Science. He was one of the authors who, in 2014, initiated the FAIR data initiative and, in the following year, was appointed Chair of the European Commission's High-Level Expert Group for the "European Open Science Cloud" until 2016. Currently, he is an ambassador of GO FAIR and co-founder of the GO FAIR initiative and was elected President of the Executive Committee of CODATA.
Cameron Neylon's [41] earlier research focused on structural biology and biophysics and on researchers' culture, the political economy of research institutions, and how these interact and collide with changing technological environments. He is currently a professor of research communication at Curtin University, where he co-leads the Curtin Open Knowledge Initiative, a project examining the future of universities in a networked world. He is also a director of KU research and an advocate of open research practice who has worked in research and support areas, including chemistry, advocacy, policy, technology, publishing, political economy, and cultural studies. He was a contributor to the Panton Principles for Open Data, the Principles for Open Scholarly Infrastructure, and the altmetrics manifesto; is a founding board member and past president of FORCE11; and has served on the boards and advisory boards of organizations such as Impact Story, Crossref, altmetric.com, OpenAIRE, the LSE Impact Blog, and various editorial boards. His previous positions include Advocacy Director at PLOS, Senior Scientist (Biological Sciences) at the STFC, and tenured faculty at the University of Southampton.
Joachim Schöpfel [42] holds a PhD in psychology from the University of Hamburg and is a lecturer in information and communication sciences at the University of Lille and a member of the GERiiCO laboratory. He is interested in scientific communication, in particular in open science and grey literature, and in the evolution of the functions, professions, and institutions of scientific and technical information. His current projects focus on the use of digital resources in different contexts, on the link between informational practices and scientific production, on the evolution of scientific information systems and the link with research infrastructures and systems, on the legal aspects of scientific communication, and on the development of libraries and documentary services. He directed the UFR IDIST from 2009 to 2012, was director of the Atelier National de la Reproduction des Thèses from 2012 to 2018, and is responsible for the first year of the Master Information Documentation in the SID department. He is an independent consultant and partner of Ourouk, Paris.
Marco Roos [43–45] is an advocate of FAIR Data Principles and Linked Data to create a powerful substrate and a robust worldwide infrastructure for knowledge discovery across heterogeneous data distributed over institutes and countries. His earlier scientific interest was in biology, regarding the role of chromatin in the functioning of the cell, to bridge between genotype and phenotype using data linking techniques and data science. After including computer science subjects in his MSc in molecular biology, Marco worked as a multidisciplinary researcher in research groups of life science and computer science. Currently, his research focuses on state-of-the-art computer science applied to enhancing biomedical research, particularly for rare human diseases, with knowledge discovery and data linking techniques. As group leader of the Biosemantics research group of Prof. Dr. Barend Mons, LUMC, he leads the research, development, and application of knowledge discovery methods for human genetics research. He co-leads the rare disease community of the European life science data infrastructure ELIXIR and the 'FAIR Data Principles at source' activities in the European Joint Program Rare Diseases, and initiated the Rare Diseases Global Open FAIR implementation network.
This allowed us to perceive that the most prominent authors have a career as professors in higher education and that the original scientific area of most of them is Biology, except for Joachim Schöpfel, whose scientific area of origin is Psychology.
As stated earlier, since this paper is part of qualitative research, the results presented at the current stage of research must be confirmed or tested by further research, gathering data from other databases and information sources, to produce sound theoretical assumptions, and as a form of quality control.
This research takes into consideration the dynamic dimension of the phenomenon under study. The need for continued research to ensure that the developments concerning this phenomenon are captured is evident. This implies the periodic repetition of the same research, which will allow comparison with the current dataset and its updating.

Conclusions
This paper is part of ongoing research focused on identifying and analyzing, in a comparative way, the main programs and projects regarding, or making use of, FAIR Data Principles at a worldwide level; identifying the main actors and contrasting their perceptions and meanings about the said principles; and distinguishing the proposals and solutions that will emerge from the analysis of perceptions and meaning related to the said principles. The intended future results of this research are a critical and trend-based theoretical construction of the examined literature. This might allow us to formulate recommendations for the use of FAIR Data Principles, in addition to showing possible consensuses and dissents, uncertainties, and certainties behind what is perceived of the said principles and their uses.
Our intention was, within the scope of cutting-edge international scientific literature, to identify the main authors, entities, and scientific journals linked to research conducted, with a special emphasis on the FAIR Data Principles since their inception. This allowed the establishment of the general scenario in which research dedicated to or adopting FAIR Principles takes place. Within this scope, the study of the researchers' profiles affords better awareness of the original contexts of those who develop this type of research. For such purpose, we defined questions regarding the authors, their locations, and professional occupation; the organizations/entities where the authors perform their professional and research activities, including the place where such entities are based; the journals where and when the authors' research was published; and the most prominent authors in terms of authorship. This research was developed by means of a qualitative approach, using documentary research and a constant comparison method for codification and categorization of the sampled data extracted from a final set of 32 documents.
In conclusion, it can be stated that, with regard to the authors in the sample, the largest share is located in the Netherlands, that the European continent (including the United Kingdom) accounts for more than 70% of the authors, and that the English-speaking countries (including Ireland) comprise just over one-third of the authors discussed in this paper. Four Asian authors are also noted (three from China and one from Japan). The only authors located in the Southern Hemisphere are based in Australia.
Most of these authors are researchers, followed by information (LIS) professionals, and thirdly, the professors in higher education institutions (a quarter of the authors in the sample). Less than five percent of the authors are students.
There are also four CEOs, two independent consultants, and five authors that perform management roles.
More than half of the authors in the sample work in a higher education institution (either professionally or as students), with the institutions with more authors in this sample being, ex aequo, the Delft University of Technology and the Dutch Techcentre for Life Sciences. In addition to the research centers, private organizations, and governmental agencies, there are types of entities represented by two authors (Independent Organisations) and one author (International Reference Network, being the VASCERN European Reference Centre). There are also two independent consultants (Jan Velterop and Caoimhín Kelly).
Since more than half of the organizations with which the authors of the sample are associated are higher education institutions, these cover most of the territorial-administrative areas under consideration. Research centers account for more than one-sixth of the countries in the sample. All types of entities exist in more than one country, including the International Reference Network, as it is an international entity.
The preponderance of the United States, countrywide, can be seen in the number of entities to which this sample refers, with 70% of the entities being from the European world region (58.57% from the European Union) and the English-speaking countries (including Ireland) accounting for just over a third (37.14%). In the case of the higher education institutions considered in the sample, the United States is predominant, and the European Union countries represent more than a third (35.71%) of this type of institution. In the case of the English-speaking countries, there is less than a quarter (24.3%) of the organizations in the sample. The Netherlands appears as the country with the broadest range of entity types in the sample.
At the publication level, it can be seen that the journal where more texts of the sample were published was Insights: The UKSG Journal, and that the largest number of authors in the sample published in Studies in Health Technology and Informatics. Most of the articles in the sample published in this journal originate from authors from the Netherlands, followed by authors located in Germany. In addition to Studies in Health Technology and Informatics, only Frontiers in Chemistry has articles written by ten or more authors in the sample. This last journal appears in the sample with only one text, which has the greatest number of authors in the sample and also the greatest number of authors from different geographical provenance (Czechia, France, Germany, Italy, the Netherlands, Portugal, Romania, Slovenia, Spain, and Switzerland). Finally, the most prominent authors in terms of authorship in the sample texts are Barend Mons and Marco Roos, both from the Netherlands, Cameron Neylon (Australia), and Joachim Schöpfel (France). Although delimited by the sample, this allows us to identify trends that, in further research, may be contrasted with data collected from a larger number of sources and with broader research criteria. In this case, it was perceived that the most prominent authors have a career as university professors and that the scientific area of origin of most of them is Biology, except Joachim Schöpfel, whose scientific area of origin is Psychology.
This research takes into consideration the dynamic dimension of the phenomenon under study. The need for continued research to ensure that the developments concerning this phenomenon are captured is evident. This implies the periodic repetition of the same research, which will allow comparison with the current dataset and its updating.
In future work, we will consider expanding the data gathering from other academic journals, conference proceedings, reports, and theses from other databases and collections as sources to verify and compare with the results obtained in the present study. It is also intended to, by means of a trend analysis of the specific scientific literature, identify projects, initiatives, and programs of international expression on the FAIR Data Principles.
Moreover, it is intended to proceed to identify the authors and their thoughts, addressing the discussions, perceptions, and meanings that are carried out in and around this phenomenon. This will allow us to identify and analyze, in a comparative way, the main programs and projects regarding, or making use of, FAIR Data Principles at a worldwide level; identify the main actors (also as a way of validating the results brought out by this paper) and contrast their perceptions and meanings about the said principles; and distinguish the proposals and solutions that will emerge from the analysis of perceptions and meaning related to the said principles. Since the current paper makes use of the CCM, the future analysis and interpretation of the information corpus resulting from expanded data gathering will be performed by the use of the Grounded Theory Method. Following the particular nature of this methodology, the intended results are a critical and trend-based theoretical construction of the examined literature. This might allow the formulation of recommendations for the use of FAIR Data Principles, besides showing possible consensuses and dissents, uncertainties, and certainties behind what is perceived as the said principles and their uses.
We will also consider guiding the documentary research into developing an analysis that will allow coding and categorization to discern thematic or typological distinctions regarding the research presented by the studied texts. This will also take into account the uses of the concept of "fair principles" that are especially concerned with FAIR Data Principles.
Furthermore, future research will look into the scientific areas of origin of the researchers (those already found in this paper and others that will be presented with the expanded data gathering). This will allow us to check the hypothesis that most of the authors who engage in this type of research have biology as their scientific area of origin, as it was perceived by the analysis of the researchers' profiles of the most prominent authors in terms of authorship in the sample texts.
The main limitation of this study concerns the amount of data retrieved and the time needed for a deeper analysis, as this theme is already well documented in the scientific literature. Nevertheless, it should be noted that there were constraints on the collection of the full texts of scientific papers, because most periodical publications are not freely accessible and are not part of the publishers' contractual packages with the institutions to which we belong. This matter is intrinsically linked to the question of open science and affects the way research can be conducted.

Tables
Table A1. List of papers analyzed.
Table A8. Geographical distribution of authors by journal.

Journals/Provenance of Authors    Number of Authors    % Authors
Applications in Plant Sciences    4                    3.05%
USA                               4                    3.05%

Figure A2. Geographical distribution of authors by journal.