Representation of Slovak Research Information (A Case Study)

Abstract: In light of the increasing importance of the societal impact of research, this article addresses the question of how social sciences and humanities (SSH) research outputs from 2019 are represented in the Slovak research portfolio in comparison with those of the EU-28 and the world. The data used for the analysis originate from the national databases R&D SK CRIS and the bibliographic Central Register of Publication Activities (CREPČ), and from WoS Core Collection/InCites. Research data were considered appropriate for the analysis if they were structured and available at the national level, were of high quality and consistency, and covered as many components as possible, including their mutual relations. The data resources also had to enable the research outputs to be assigned to research categories. The analysis prompts the conclusion that social sciences and humanities research outputs in Slovakia in 2019 are appropriately represented and in general show an increasing trend. This is documented by the proportion of SSH research projects and other entities in the overall Slovak research outputs, and by the higher ratio of SSH research publications in comparison with the EU-28 and the world. Recommendations of a technical character concern research data management, data quality, and the integration of individual systems and available analytical tools.


Introduction
The information system on research, development and innovation at the national level has been operated in Slovakia for more than five years. This system is a part of the information support to science at the national level, which is provided by the Slovak Centre of Scientific and Technical Information. Work with the system has changed over time. Initially, the data were consolidated and migrated from an older version of the software, and at the same time a methodology was prepared for data collection. Later, the Ministry of Education, Science, Research and Sport and other central state administration bodies began to demand more and more information on research outputs from the system. The data have served mainly as a substantial contribution to the annual reports on research in Slovakia issued by the Ministry of Education, Science, Research and Sport. The present study builds in part on the previous reports for 2017 [1] and 2018 [2], but the research presented has a much broader scope and includes more aspects: it attempts to identify the available data sources, indicate what data are monitored in these resources and point out the interpretation options provided by the processed data. While processing the data from 2019 [3], we established the basis for the selection of indicators and drafted the procedure developed as part of this research. It was tested and used for the first time on data from the social sciences and humanities.
In the Slovak national portfolio of research activities, social sciences and humanities are often relegated to the margins. Slovak national science policies are mainly focused on developments in the natural and technical sciences. The priorities of the state science policy of Slovakia [4] largely follow pan-European trends. However, it should be emphasised that the preferred scientific disciplines such as materials research and nanotechnology, information and communication technologies, biotechnology and biomedicine, agriculture and the environment, sustainable energy and energy all bear an inherent societal impact.
Here the question arises as to how well the social sciences and humanities are represented in the Slovak research portfolio in comparison with other sciences. To answer this question, an analysis of the available data on entities, activities and results of research and development for 2019 was carried out. The year 2019 was selected in order to work with the most complete, up-to-date data. The year 2020 was not suitable since, at the time of preparation of this article, the data in the databases were incomplete.

Materials and Methods
The theoretical background on which we based our study is quite broad. Current research information systems (CRIS) are relatively powerful tools, supporting the mutual communication of the research community [5] and the formation of scientific policy [6]. They are intended for a broad target group comprising researchers, research managers, research strategists, publication editors, intermediaries and those responsible for technology transfer, as well as the media and the public. CRIS are closely associated with the relational data format CERIF (Common European Research Information Format) [7], a standard for managing and exchanging research data [8]. To process research results and their main category, publications, bibliometric methods and tools are used, mainly applied to the data recorded in the CRIS [9].
The operation of systems and databases containing research information enables evaluation of science and support of open science [10], and influences target groups and their behaviour in the online space. It helps to provide information support to the work of communities in the online space [11] and to assess the impact of social networking services [12]. However, quality data management should be applied in the processing of research information with an emphasis on their completeness, timeliness and quality [13].
Regarding the practical implementation of the analysis of research information at the national level, we relied primarily on our own experience and the data we had available [1][2][3].
In our analysis, the following procedures were applied:
1. The first step was to identify the R&D data affiliated to authors with an address in Slovakia and the sources of the R&D data.
2. In the second step, the usability of the data was analysed to determine whether they were fit for purpose. A data source can be considered suitable for determining the representation of individual scientific disciplines when it meets several criteria: it contains structured data on research and development at the national level; it contains data of sufficient quality and consistency; it monitors R&D data as widely as possible and in their interrelationships; and it classifies R&D data according to scientific disciplines.
3. The third step was the data search. At first glance, locating data may appear to be a simple step. However, a variety of obstacles must be overcome when formulating search requirements, for example ambiguity in data entry or inclusion, incomplete data, search interface properties, or missing tools.
4. The final step was the evaluation and interpretation of the findings, including evaluation of the whole process and proposals for improving R&D data processing.

Data and Their Sources
A comprehensive picture of all aspects resulting from R&D activities is usually provided by the current research information system (CRIS). The scope and structure of the data to be registered in CRIS is determined by the methodology and standards for research information. This is primarily the CERIF format [14], supported by the European Commission, which is being developed by the international research information organisation euroCRIS.
The basic entities (objects) of the CRIS system are projects, persons (researchers) and organisational units. The CERIF relational data model enables the interconnection of registered data objects using linking entities. Linking entities connect the basic system entities with each other: cfProj_Pers, cfPers_OrgUnit, cfProj_OrgUnit.
Hence it is possible to discover, for example, how many projects a particular researcher participated in, but also aggregated data such as how many organisations and researchers worked on projects over a selected period of time and/or in specific scientific areas or geographic regions. Another possible source of data is international or national bibliographic databases. To determine the representation of individual scientific disciplines in the total number of R&D outputs, information on the scientific disciplines of publications must also be available. To map the situation in Slovakia, it was necessary to combine several sources.
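The way linking entities support such questions can be sketched in a few lines of Python. This is a minimal illustration and not the SK CRIS implementation: the three link names come from the CERIF linking entities mentioned above, while all identifiers (P1, R1, O1, ...) and function names are invented for the example.

```python
from collections import defaultdict

# Hypothetical rows of three CERIF-style linking entities.
# Each tuple links two base-entity identifiers; all IDs are made up.
cf_proj_pers = [("P1", "R1"), ("P1", "R2"), ("P2", "R1"), ("P3", "R3")]
cf_pers_orgunit = [("R1", "O1"), ("R2", "O1"), ("R3", "O2")]
cf_proj_orgunit = [("P1", "O1"), ("P2", "O1"), ("P3", "O2")]


def projects_per_researcher(proj_pers):
    """Count the distinct projects each researcher is linked to."""
    projects = defaultdict(set)
    for project_id, person_id in proj_pers:
        projects[person_id].add(project_id)
    return {person: len(ids) for person, ids in projects.items()}


def researchers_per_organisation(proj_pers, pers_orgunit):
    """Count researchers of each organisation who appear in at least one project."""
    active = {person_id for _, person_id in proj_pers}
    counts = defaultdict(int)
    for person_id, org_id in pers_orgunit:
        if person_id in active:
            counts[org_id] += 1
    return dict(counts)


per_researcher = projects_per_researcher(cf_proj_pers)
per_organisation = researchers_per_organisation(cf_proj_pers, cf_pers_orgunit)
organisations_with_projects = len({org_id for _, org_id in cf_proj_orgunit})

print(per_researcher)
print(per_organisation)
print(organisations_with_projects)
```

In a real CRIS the same questions would be answered by joins over the link tables, typically with additional filters on time period, scientific area or region.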
SK CRIS, the information system on research, development and innovation [15], registers research projects, research organisations and researchers at the national level, together with their science and technology research areas.
The CERIF data format also contains research results in three subcategories: publications, patents and products. However, not all of these subcategories are adequately covered in the Slovak Current Research Information System (SK CRIS) and, moreover, this system does not identify the basic research areas of science and technology with which the research results are associated.
In this analysis, publications with a Slovak affiliation published in 2019 were selected as the most appropriate research outputs, since they have structured data and research areas assigned through subject classification. The data originated from the international database Web of Science Core Collection and the national Central Register of Publication Activity (CREPČ) of Slovak universities [16].
The data sources used are characterised as follows: The Slovak Current Research Information System (SK CRIS), as part of the Central Information Portal for Research, Development and Innovation (CIP RDI), has the status of an information system of public administration, defined by Act 275/2006 on information systems of public administration. The Slovak Centre of Scientific and Technical Information (SCSTI) is responsible for operating, maintaining and providing technical support to this system on behalf of the Ministry of Education, Science, Research and Sport of the Slovak Republic. SK CRIS uses the CERIF 1.3 data model, put into operation in 2013.
The SK CRIS information system integrates data on research in Slovakia from four registers: R&D organisations, R&D projects, researchers and R&D results. This system is also deployed in the administration of the assessment of competence of organisations to perform R&D, as well as in the supplementary statistical survey of the research and development potential for the Ministry of Education, Science, Research and Sport of the Slovak Republic [17]. As of 31 December 2019, the SK CRIS information system registered the following numbers of structured records: 19,452 R&D projects; 448 calls for R&D project proposals; 2292 R&D organisations; 32,705 researchers; and 426,169 R&D results.
The operational aims of SK CRIS may be divided into four groups:
1. To present information and new knowledge and to share both among scientists and researchers in an electronic environment.
2. To accelerate the implementation of knowledge in practice, to inform enterprises and business about research results and to involve them in the application of this knowledge to practice.
3. To gain general support for science and research.
4. To make science, research and their practical results available to the public in a comprehensible and attractive way.
These activities should positively influence the public perception of science and the use of public funding [18].
The Central Register of Publication Activity (CREPČ) was established in 2007. It is a bibliographic database of the publishing activities of 37 public, state and private universities in the Slovak Republic [19]. To date, the CREPČ database has accumulated more than half a million publications.
In 2019, 32 Slovak universities contributed to the register with 42,088 publications of various document types [20]. These publications are divided into 84 categories according to the type of document published, e.g., journal articles, books, anthologies/edited books, book chapters, conference proceedings, and doctoral theses [21]. The categories are highly detailed; for simplicity, aggregated groups of categories are therefore often used. The CREPČ is deployed in particular to evaluate the scientific outputs of universities and their faculties. Publication activities represent one of several criteria for the allocation of public funding to higher education institutions. Like SK CRIS, CREPČ is operated by SCSTI.
Data from the Slovak Academy of Sciences are not collected because the academy is not obliged to contribute to the CREPČ: the publications are registered in its own system based on almost the same rules as those of the CREPČ system [22]. However, no scientific disciplines are registered in this system.

Web of Science Core Collection (WoS) is a well-established bibliographical database and citation index on the Web of Science platform that covers the core of the global scientific and scholarly literature. The collection covers over 21,000 peer-reviewed, high-quality scholarly journals, including open-access journals, published globally in more than 250 scientific disciplines and grouped into the Science Citation Index Expanded, the Social Sciences Citation Index and the Arts & Humanities Citation Index. Conference proceedings and book data are also included, in the Conference Proceedings Citation Index and the Book Citation Index. The journals in the collection are indexed cover-to-cover. Each record in the collection includes all the authors, their affiliations, the abstract and keywords (if provided by the author), funding acknowledgements, including agency and grant numbers (if provided), and all the cited references. Where possible, additional metadata are provided, such as ORCID identifiers (https://orcid.org/ (accessed on 16 March 2021)), further funding data from other resources (e.g., Medline, researchfish), and unified names for over 5000 institutions globally, which expedite the search for an institution's total publication output.

Data Usability
The applicability of data to an analysis is determined by general assumptions, mainly by structure, data coverage at the national level and their quality and interoperability.
Here, the authors wish to focus attention on the scientific disciplines in relation to the subjects, activities and results of research.
Each of the data sources used applies a different classification of the scientific disciplines. The basic groups of R&D fields, as they are specified in the Frascati manual (OECD) [23], form one of the classification schemes used in SK CRIS in the description of basic entities (R&D projects, R&D organisations and researchers). For specification of the scientific disciplines in the SK CRIS system, the three-level code list of R&D fields [24] is used. The first level is identical with the basic fields in the Frascati manual. The second level differs slightly from the fields in the Frascati manual in some items, but this does not apply to the social sciences or humanities. The third level of the code list specifies scientific disciplines on the national level with higher granularity than the Frascati manual.
An example of three levels from the Slovak/SK CRIS code list of R&D fields is provided by way of illustration:

Social sciences > Psychological sciences > Clinical psychology
The full code list of R&D fields deployed in SK CRIS is given in Appendix A, A1 for social sciences and in Appendix A, A2 for humanities.
A classification of research areas is also used in CREPČ. The register uses a classification based on 25 research areas for the accreditation of universities [25], of which 8 areas relate to the social sciences and humanities (Appendix A, A3). However, the relatively small number of categories means that publications included in a particular category cannot always be clearly assigned to the respective scientific fields.
The WoS database uses 254 categories, of which 58 categories are assigned to the Social Sciences (Appendix A, A4) and 28 categories to the Arts and Humanities (Appendix A, A5). With the InCites tool, it is also possible to use a variety of different classification schemes of research areas to re-classify publications in the dataset (the data from WoS Core Collection platform or elsewhere). The first and slightly modified second level of the Frascati manual classification scheme are also available.
Disparity in the categorisation of scientific disciplines in the data resources under study did not, in principle, interfere with the analysis, as it was possible to identify social sciences and humanities (SSH) categories in each of the classification schemes listed in Appendix A. A small number of categories in the scheme used in CREPČ (Appendix A, A3) can negatively affect the accuracy of the analysis. For example, due to their multidisciplinary and interdisciplinary character, some publications can be assigned to different categories, e.g., Transportation and Security to both Social Sciences and Engineering and Technology depending on the context of a particular publication.
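Identifying SSH categories across differently structured schemes amounts to maintaining a crosswalk to the first-level groups. The following sketch shows one possible shape of such a crosswalk, including a category that maps to more than one group. The category labels and their targets are assumptions chosen for illustration and are not the mapping used in this study.

```python
# Illustrative crosswalk from source-specific categories to first-level
# groups of R&D fields. All entries are example assumptions.
CROSSWALK = {
    ("SK CRIS", "Clinical psychology"): {"Social sciences"},
    ("WoS", "History"): {"Humanities"},
    ("WoS", "Economics"): {"Social sciences"},
    # A multidisciplinary category may map to several groups at once.
    ("CREPČ", "Transportation and Security"): {
        "Social sciences",
        "Engineering and technology",
    },
}

SSH_GROUPS = {"Social sciences", "Humanities"}


def classify(source, category):
    """Return (is_ssh, is_ambiguous) for one category of one data source."""
    groups = CROSSWALK.get((source, category), set())
    is_ssh = bool(groups & SSH_GROUPS)
    is_ambiguous = len(groups) > 1
    return is_ssh, is_ambiguous


print(classify("SK CRIS", "Clinical psychology"))
print(classify("CREPČ", "Transportation and Security"))
```

Flagging ambiguous categories explicitly, rather than forcing them into a single group, makes it possible to report how much of an SSH count depends on such borderline assignments.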

Data Search
Data were searched using the commonly available user interfaces of the software applications (SK CRIS, CREPČ, WoS Core Collection), InCites B&A, a specialised tool for benchmarking and analysis using WoS Core Collection data, and Microsoft SQL Server tools (for SK CRIS).
A sample search for projects in the MS SQL Management Studio environment is shown in Figure 1. Figure 2 shows the search interface of SK CRIS.

Results
The results of the analysis can be divided by the basic entities (objects) of the CERIF format into four basic groups, including mutual connections by linking entities.

R&D Projects
The Research and Development Project is the main object (entity) of the current research information system. It contains the most relevant information about the R&D activities for users from different target groups.
For illustration, the representation in Table 1 documents projects solved in 2019 aggregated into basic groups of R&D fields. These were mostly projects spanning a period of several years that started in 2019 or earlier and, at the same time, were to be completed in 2019 or later. In total, 4193 R&D projects were retrieved, in which a total of 414 scientific research organisations took part.
The projects broken down by basic groups of R&D fields are shown in Table 1 and Figure 3. It is important to note that not all projects are assigned to one of these groups of R&D fields. The unassigned projects are mainly multidisciplinary projects funded by international grant schemes or the EU Structural Funds. SK CRIS currently enables the registration of just one R&D field for one object, and in some cases it is not possible to readily identify an appropriate field. For such projects, the SK CRIS register does not at present provide information on the R&D field. Table 1, which also includes a comparison with 2017 and 2018, shows that, out of the total number of R&D projects solved in 2019, 18% were projects in social sciences and 11% were projects in humanities.
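The selection rule described above (a project counts for 2019 if it started in 2019 or earlier and ended in 2019 or later) and the grouping by R&D field can be sketched as a query. The example uses Python's built-in sqlite3 module as a stand-in for the production database; the table name, column names and sample rows are hypothetical and do not reflect the real SK CRIS schema.

```python
import sqlite3

# In-memory stand-in for a project register (hypothetical schema and data).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE project (id TEXT, start_year INTEGER, end_year INTEGER, field TEXT)"
)
conn.executemany(
    "INSERT INTO project VALUES (?, ?, ?, ?)",
    [
        ("P1", 2017, 2020, "Social sciences"),
        ("P2", 2019, 2019, "Humanities"),
        ("P3", 2016, 2018, "Social sciences"),  # ended before 2019
        ("P4", 2018, 2021, None),               # no R&D field recorded
        ("P5", 2019, 2022, "Natural sciences"),
    ],
)


def projects_by_field(connection, year):
    """Count projects active in `year`, grouped by R&D field."""
    rows = connection.execute(
        """
        SELECT COALESCE(field, 'Unassigned'), COUNT(*)
        FROM project
        WHERE start_year <= ? AND end_year >= ?
        GROUP BY COALESCE(field, 'Unassigned')
        """,
        (year, year),
    ).fetchall()
    return dict(rows)


counts_2019 = projects_by_field(conn, 2019)
print(counts_2019)
```

Keeping projects without a recorded field in a separate "Unassigned" group makes explicit which denominator a reported percentage refers to.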
In the period under study, a total of 1120 projects in the social sciences and humanities were solved, which represents 29% of the total number of projects. In total, 1120 SSH projects were solved by 183 R&D organisations from Slovakia (each university faculty is counted individually). The Faculty of Arts of Comenius University with 103 projects was the most active, followed by the Faculty of Arts of the University of Prešov in Prešov with 72 SSH projects. Fifty out of the total number of organisations solved just one project in 2019. Universities and their faculties predominate among the research organisations. Overall, 30 institutes and organisations of the Slovak Academy of Sciences (including those with no prime engagement in social sciences and humanities research), as well as 30 state, private and non-profit R&D organisations were involved in solving SSH projects in 2019.
A total of 4865 researchers were involved in the SSH projects.

R&D Organisations
The register of R&D organisations contains basic data on research organisations carrying out a research project, on organisations that have applied for a certificate of competence to perform R&D, and on organisations that already hold such a certificate. The database also contains records on other organisations which declared a focus on R&D.
Hence, the register contains various historical records of R&D organisations, as well as data on those R&D organisations that, other than being registered in SK CRIS, have not declared any research activity. This study focuses on those organisations that are holders of certificates of competence to perform research and development. These organisations are a subset of the register of organisations, their records are up to date (as of 31 December 2019) and they are usually high-quality, efficient and active research organisations. The results and the comparison with 2017 and 2018 are shown in Table 2 and Figure 4. In 2019, only 7% of R&D organisations were engaged in social sciences research, while 4% of organisations were dedicated to research in the humanities.
Certified organisations represent the core of research project beneficiaries, which holds for all groups of science and technology disciplines. All universities and Slovak Academy of Sciences organisations automatically acquire a certificate of competence. In addition, the majority of grant schemes in Slovakia require grant applicants to submit this certificate.

Researchers
The register of researchers contains records on researchers, support staff and staff providing scientific and technical services dating from when the database was established. Data on researchers who are not active are retained without change and open.
Information on the field of science and technology for support and administrative staff and staff providing scientific and technical services is not included. SK CRIS also has the facility to create a simplified record on a researcher, where information on the field of science and technology is also not required. This functionality was designed for a situation in which data are inputted by the project manager or a person who acts on behalf of the entire organisation or for the project. Accordingly, in Table 3 the number of researchers broken down into groups of R&D fields does not correspond to the total number of researchers recorded in the register. Almost 24% of researchers are engaged in social sciences and 12% of researchers are engaged in humanities, representing a total of 8432 individuals. The proportion of researchers recorded in the register of researchers who at the same time declare themselves to be engaged in social sciences or humanities represents 36% of the total number of researchers assigned to the field of science and technology.
In 2019, 3584 of the 8432 researchers in the social sciences or humanities took part in projects. However, the number of researchers who worked on SSH projects in 2019 is greater, at 4865. The difference of almost 1300 people consists of experts who work outside the social sciences and humanities. A considerable number of these researchers have no specified field of expertise.

R&D Results
Relevant outputs and outcomes resulting from research projects determine the success and significance of the projects. This analysis focused on publications as the most numerous and best-registered type of research results.
The bibliometric analysis of research results from 2019 was based on data registered in the Web of Science Core Collection (WoS) database (data acquired on 7 and 10 February 2021). Although there has been sufficient time for Web of Science services to register all the publications from 2019, the number of conference proceedings, in particular, may not necessarily represent the final number. It usually requires a considerable period of time to index this type of document. On the other hand, proceedings from conferences which were held prior to 2019 may also be included. All such publications were excluded from the WoS Core Collection dataset under scrutiny.
In total, more than 9000 publications from 2019 assigned to Slovakia are registered in the Web of Science Core Collection. Of those, 1214 publications are categorised in the Social Sciences Citation Index as publications in social sciences (Appendix A, A4), with the dominant categories illustrated in Figure 5. Unlike in arts and humanities, proceedings papers are the most frequent type of document used for scientific communication in social sciences: the number of proceedings papers (824) is more than twice the number of articles (378). This is also reflected in the lower number of cited publications.
Among the six basic sciences corresponding to the Frascati first-level categorisation, publications in social sciences have a higher representation in Slovakia than in the EU-28 countries, and higher still in comparison with the world, as displayed in Figure 7. In arts and humanities, the publications from Slovakia are slightly more represented than those from the EU-28 and the world. In comparison with previous years, an absolute and relative increase in publishing in the social sciences and humanities (WoS) is observed in Slovakia [1,2]. This trend is particularly evident in the social sciences, where the number of publications increased by 16% in 2018 and by 43% in 2019, in each case compared with 2017. In arts and humanities, the corresponding increases were 18% in 2018 and 12% in 2019.
From the WoS database, it is also possible to obtain information on how many publications in the social sciences and humanities were the result of an R&D project. Out of 1214 publications in social sciences, 66% were published as a result of funding, with proceedings papers once again predominating. In arts and humanities, 19% out of 223 publications resulted from projects funded by various grant agencies. It is important to note that several publications may be an output of one project and, conversely, that one publication may be funded by several grant agencies, which renders analysis more complicated.
By comparison, the CREPČ central registry of publication activities contains 42,088 publications from 2019. A search of publications published in 2019 in the CREPČ database retrieved a list of 53,472 publications. This discrepancy may be due to the fact that the publications published in any given year may not be identical with the publications registered and reported in any given year.
Of the publications retrieved for 2019, 20,993 covered social sciences and humanities (i.e., they were assigned to the 8 groups of SSH scientific disciplines listed in Appendix A, A3). Of those, 1492 publications belonged to CREPČ groups B and C, i.e., they were registered in the WoS and Scopus databases. The number of group B and C publications from CREPČ (1492) and the number retrieved from WoS for social sciences and humanities (1437 publications) for Slovakia and 2019 are in good agreement. The surplus in the CREPČ data may be due to the fact that CREPČ also includes publications indexed in the Scopus database. On the other hand, as stated previously, CREPČ only comprises university research outputs. Of the 131 SSH papers from 2019 in WoS/InCites published by the Slovak Academy of Sciences, the other major contributor to research outputs in Slovakia, one third were joint publications with universities.
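The agreement between the two sources can be checked with simple arithmetic on the figures quoted in this section. The short script below uses only numbers stated in the text; the variable names are ours.

```python
# Figures quoted in the text for Slovakia, publication year 2019.
wos_social_sciences = 1214
wos_arts_humanities = 223
crepc_groups_b_c = 1492
proceedings_papers = 824
articles = 378

# WoS total for social sciences and humanities.
wos_ssh_total = wos_social_sciences + wos_arts_humanities

# Surplus of CREPČ group B and C publications over the WoS total.
difference = crepc_groups_b_c - wos_ssh_total
relative_difference = difference / wos_ssh_total

# Ratio of proceedings papers to articles in social sciences.
proceedings_to_articles = proceedings_papers / articles

print(wos_ssh_total)                     # 1437
print(difference)                        # 55
print(f"{relative_difference:.1%}")      # 3.8%
print(f"{proceedings_to_articles:.2f}")  # 2.18
```

The difference of 55 publications corresponds to roughly 4% of the WoS total, which is consistent with the statement that the two counts are in good agreement.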
The CREPČ search interface does not show how many of the publications registered in CREPČ are outputs of an R&D project, despite the fact that information on the project underlying a publication is registered in the CREPČ system.

Discussion
As part of the analysis, we summarised what information about research is monitored in Slovakia and in which data sources. We conclude that data on subjects and activities are sufficiently monitored within SK CRIS, the information system on science and research.
When formulating the criteria, we were limited by the incompleteness of data on the annual budgets of R&D projects. We therefore had to abandon this indicator. We also encountered limitations in carrying out the analysis by research area. We identified a problem with multidisciplinary projects: as it is not possible to assign more than one research area to one project, the field is left empty when the appropriate area is unclear.
Although SK CRIS also monitors research results, data for bibliometric analysis need to be obtained from other sources, because the research areas of publications are not monitored in SK CRIS. For this reason, we worked with WoS and CREPČ data. However, these databases do not contain all types of research results for the whole of Slovakia. Publications indexed in WoS are only a fraction of publishing activity, while CREPČ records all publishing activity, but only for universities. Each of the databases also uses its own categorisation of disciplines. This is limiting especially when working with CREPČ data, where the categories are conceived so broadly that we consider the assignment of some categories to one of the six basic research areas to be ambiguous.

Conclusions
This analysis leads to the conclusion that in Slovakia in 2019 the area of social sciences is proportionally represented in R&D activities. This can be confirmed, on the one hand, by the rate by which solved projects and subjects contribute to the overall number of these in all the scientific disciplines in Slovakia. On the other hand, it is the representation of SSH publications in the entire Slovak research publication portfolio, which exceeds the proportion (in percentage) of SSH publications in the EU-28 and even more in the world.
The analysis also reveals a disparity between the relatively small number of R&D organisations active in the social sciences and humanities and the relatively higher share of SSH projects in the total number of research projects. The study also shows a clear decrease in the number of organisations involved in research projects in SSH disciplines over the three-year period from 2017 to 2019. There are two possible explanations for the disparity: the first is that SSH organisations are more active than average in carrying out projects, while the second is that some organisations which predominantly conduct research in other sciences are also actively involved in SSH projects (e.g., educational or environmental studies). The present data analysis supports both explanations. The experience acquired in this data analysis can be summarised in the following technical recommendations:
1. Despite the evident improvement in scientometric data curation and the implementation of new automated algorithms (such as citation topics), many challenges remain which demand careful attention. All possible efforts should be focused on research data management, so that the data are validated and of high quality and cover all aspects of R&D, not only in CRIS systems but also in internationally recognised bibliographic/scientometric databases. The main problem is the ambiguous entry of the names of authors' organisations in bibliographic databases. As regards researchers, the various forms in which author names are registered complicate data analysis, mostly in CRIS systems.
2. It is imperative to achieve the maximum possible integration of available data resources for diverse purposes, including the analysis of research data, in order to avoid working with different interfaces and software applications. This means using the available APIs of bibliographic databases. In particular, it would be useful to interconnect publications indexed in WoS with authors and researchers registered in SK CRIS.
3. It is desirable to introduce high-quality tools for searching and analysing published data that are available to the user and not restricted to the data administrator. At present, the user interfaces of publication databases and CRIS systems do not provide enough search criteria or combinations of them. It should be possible to prepare comprehensive and meaningful analyses without the use of administrative tools such as MS SQL.
4. The unique identification of R&D entities (organisations, researchers) at national and international levels needs to be established and strictly maintained.
5. Categorisations of research areas need to be unified, at least at the national level and, ideally, also internationally. Otherwise, items from different scientific classifications need to be mapped to each other, which can distort the results of the analyses.
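As an illustration of the recommendation to interconnect WoS publications with researchers registered in SK CRIS, the sketch below joins the two on a shared persistent identifier (here an ORCID iD). It is a sketch under stated assumptions: the record layout, field names and sample records are hypothetical, and neither system's actual API or export format is implied.

```python
def link_by_orcid(publications, researchers):
    """Attach researcher IDs to publications via shared ORCID iDs.

    Returns (linked, unmatched): `linked` holds (title, [researcher IDs])
    pairs, `unmatched` holds titles with no identifiable researcher.
    """
    by_orcid = {r["orcid"]: r["id"] for r in researchers if r.get("orcid")}
    linked, unmatched = [], []
    for pub in publications:
        ids = sorted({by_orcid[o] for o in pub.get("orcids", []) if o in by_orcid})
        if ids:
            linked.append((pub["title"], ids))
        else:
            unmatched.append(pub["title"])
    return linked, unmatched


# Hypothetical sample records; the ORCID iD is a placeholder value.
researchers = [
    {"id": "R1", "orcid": "0000-0002-1825-0097"},
    {"id": "R2", "orcid": None},  # researcher without a registered identifier
]
publications = [
    {"title": "Paper A", "orcids": ["0000-0002-1825-0097"]},
    {"title": "Paper B", "orcids": []},  # no identifier in the publication record
]

linked, unmatched = link_by_orcid(publications, researchers)
print(linked)
print(unmatched)
```

The unmatched list shows why unique identification matters: records without a persistent identifier fall back to name matching, which is exactly where the ambiguities described above arise.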
In future, implementation of the proposed measures would facilitate not only the analysis of data for the representation of the social sciences and humanities, but also any analyses of the R&D sector. Analysis based on underlying quality data and their correct interpretation is key to research management and the formulation of state science policies.