A Comprehensive Dataset of the Spanish Research Output and Its Associated Social Media and Altmetric Mentions (2016–2020)

This paper presents data on research publications authored by scientists affiliated with Spanish institutions between 2016 and 2020, along with their associated social media and altmetric mentions, and on researchers affiliated with Spanish institutions whose work is highly mentioned on social media and non-academic outlets. The first dataset contains 219,988 records and 24 attributes. Each observation represents a scientific publication (article, review or letter) extracted from the Web of Science database. For each record, we provide bibliographic metadata, its subject area and a battery of altmetric indicators extracted from Altmetric.com. The second dataset includes 4209 records and four attributes. Each record corresponds to a researcher. For each record, we include their full name, an author identifier (ORCID), their affiliation and their list of publications connecting to the first dataset. Dataset: https://doi.org/10.6084/m9.figshare.19204686. Dataset License: CC-BY.


Summary
Altmetrics, alternative metrics based on mentions of scientific publications in social media [1], have been proposed for research evaluation [2]. They still have a long way to go, as several limitations have been attributed to their use [3,4]. However, they offer a different perspective from that offered by citations and can potentially inform on the consumption of scientific literature beyond academia. The research project "InfluScience-Scientists with social influence: a model to measure knowledge transfer in the digital society" (https://influscience.eu/, accessed on 11 March 2022) was launched with the aim of exploring the potential of altmetrics and studying the social influence of Spanish researchers.
As a result of this project, two datasets have been generated in tab-separated values (tsv) format. The first one includes scientific publications authored by scientists affiliated with Spanish institutions between 2016 and 2020 that were retrieved from Web of Science and InCites, thematically classified, and with their altmetric mentions retrieved from Altmetric.com. The most influential Spanish researchers are included in the second one.
Using statistical methods, Spanish scientific activity and the attention it receives in social media can be studied to identify patterns, trends and distributions within the different metrics, differentiating by scientific areas. These data are of interest to researchers working on scientometrics, altmetrics, science of science or science communication who are interested in analyzing bibliometric and altmetric production at the macro level.

Methods
Data were collected from Web of Science, InCites and Altmetric.com on 3 March 2021. We first downloaded from Web of Science the publications (articles, editorial material, letters, and proceedings papers) published between 2016 and 2020 in which at least one author with a Spanish affiliation is listed. After reassigning multidisciplinary publications to specific categories, these were classified into the 22 research fields included in the Essential Science Indicators (ESI). We created an equivalence scheme in which each of the 254 subject categories from Web of Science was matched with one ESI field [5]. This classification was conducted following the schema proposed by Tan [6]. Subject categories included in the Arts & Humanities Citation Index (A&HCI) are not integrated into the ESI classification, so we included arts and humanities as an extra research field.
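The equivalence scheme can be pictured as a lookup table from Web of Science subject categories to ESI fields. The sketch below is a minimal illustration and not the authors' implementation: the three category-to-field pairs and the function name `to_esi_field` are assumptions chosen for the example, whereas the actual scheme covers all 254 categories [5].

```python
# Illustrative subset of a category-to-field lookup. The entries are
# assumptions for this example, not taken from the published scheme.
WOS_TO_ESI = {
    "Oncology": "Clinical Medicine",
    "Astronomy & Astrophysics": "Space Science",
    "Philosophy": "Arts & Humanities",  # extra field added for A&HCI categories
}


def to_esi_field(wos_category: str) -> str:
    """Return the single research field matched to a subject category."""
    try:
        return WOS_TO_ESI[wos_category]
    except KeyError:
        raise ValueError(f"No field mapped for category: {wos_category!r}")


print(to_esi_field("Philosophy"))
```

Raising an error for unmapped categories, rather than returning a default, makes gaps in the lookup visible, which matters when every category is expected to map to exactly one field.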
To retrieve altmetric mentions, we used the Digital Object Identifier (DOI) assigned to each publication to query Altmetric.com and obtain all tracked publications and their mentions. Of the 406,621 records that included a DOI (93.51% of the total), 238,508 were indexed in Altmetric.com (54.85% of the total).
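The two percentages share the same denominator, the full set of downloaded records, which is not stated directly. As a rough consistency check, the sketch below derives that total from the reported DOI share and recomputes the coverage figures. The variable names are ours, and the implied total is an estimate limited by the rounding of the published percentage.

```python
with_doi = 406_621   # records that included a DOI
indexed = 238_508    # records indexed in Altmetric.com
doi_share = 0.9351   # reported share of records with a DOI

# Total number of downloaded records implied by the reported DOI share
# (an estimate, since the share is rounded to two decimals).
implied_total = round(with_doi / doi_share)

# Coverage relative to all records and relative to records with a DOI.
coverage_of_total = 100 * indexed / implied_total
coverage_of_doi = 100 * indexed / with_doi

print(f"implied total records: {implied_total}")
print(f"indexed / total: {coverage_of_total:.2f}%")
print(f"indexed / with DOI: {coverage_of_doi:.2f}%")
```

The second ratio is necessarily higher than the first, because it excludes records without a DOI, which could not be queried at all.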

Publications Dataset
The first dataset contains scientific publications from the 2016–2020 period that have at least one author with a Spanish affiliation and that were mentioned at least once in the Altmetric.com database. It includes 219,988 records, each being a publication, and 25 variables, including bibliographic information and mentions received. Each publication is assigned to a subject area based on the Essential Science Indicators (ESI) schema provided by Clarivate. Additionally, we include the Arts and Humanities subject area. In total, 18 altmetric variables are provided: an aggregated score (Influscore) and 17 indicators corresponding to different social and non-academic sources (e.g., Twitter mentions, Facebook mentions, news media). The Influscore is the Altmetric Attention Score (AAS) [7] provided by Altmetric.com on 3 March 2021. These variables are detailed in Table 1. Figure 1 shows the volume of publications retrieved with and without altmetric mentions, differentiated by ESI field, to reflect the coverage of the dataset provided.
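Because the file is distributed in tab-separated format, it can be read with standard tooling. The following sketch averages the aggregated score per subject area using only the Python standard library. The column names (`doi`, `esi_field`, `influscore`, `twitter`, `news`) and the sample rows are hypothetical stand-ins; the actual variable names are those listed in Table 1.

```python
import csv
import io

# Tiny in-memory stand-in for the publications file. Column names and
# values are hypothetical; see Table 1 for the real variables.
SAMPLE_TSV = (
    "doi\tesi_field\tinfluscore\ttwitter\tnews\n"
    "10.0000/example.1\tClinical Medicine\t12.5\t20\t1\n"
    "10.0000/example.2\tSpace Science\t3.0\t4\t0\n"
    "10.0000/example.3\tClinical Medicine\t7.5\t9\t0\n"
)


def mean_influscore_by_field(tsv_file):
    """Average the aggregated score per subject area."""
    totals, counts = {}, {}
    for row in csv.DictReader(tsv_file, delimiter="\t"):
        field = row["esi_field"]
        totals[field] = totals.get(field, 0.0) + float(row["influscore"])
        counts[field] = counts.get(field, 0) + 1
    return {field: totals[field] / counts[field] for field in totals}


means = mean_influscore_by_field(io.StringIO(SAMPLE_TSV))
print(means)
```

To run this against the downloaded file, the `io.StringIO` object would be replaced by a file handle opened with `open(path, newline="", encoding="utf-8")`, after adjusting the column names to match Table 1.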

Authors Dataset
The second dataset includes the top 250 most influential authors at the general level and for each of the ESI fields based on their Influscore. Their information has been reviewed and normalized to produce the Influscience ranking (https://ranking.influscience.eu/, accessed on 11 March 2022). For the disambiguation of authors and institutions, we used the algorithm proposed by Caron and van Eck [8]. The dataset is composed of a total of 4209 observations, each being a researcher affiliated with a Spanish institution, and four variables covering biographical data and the link to the publications dataset. The variables of the second dataset are detailed in Table 2.

Informed Consent Statement:
Not applicable.

Data Availability Statement:
The dataset is openly available in figshare at https://doi.org/10.6084/m9.figshare.19204686. It does not include any data directly extracted from any Clarivate platform (e.g., Web of Science) as they have only been used as intermediary sources to identify publications authored by Spanish researchers and then to recover their mentions from Altmetric.com. In this sense, the authors' names of the publications have been normalized and clustered and a new paper subject classification has been implemented. With regard to Altmetric.com, a licensing agreement allows us to share this dataset.

Conflicts of Interest:
The authors declare no conflict of interest.