Many studies on the measurement of research outputs are based on publication counts of scientists, institutions, cities, countries, and topics. Common scientometric indicators are scientific productivity (measured by publication counts), scientific impact (measured by citation counts), and mentions in social media (measured by altmetrics). The basic indicator is the publication count, which determines the visibility of the scientist, the institution, etc. (in the following, "unit of assessment"). A publication has to be visible in order to be cited; it has to be visible as well in order to be mentioned on any social media. The quality of all mentioned measures depends on the completeness of the unit of assessment's publication list. In this article, we introduce an indicator for the analysis and evaluation of publication lists: boundedness. The boundedness of publication lists has three manifestations:
truebounded publication lists;
overbounded publication lists;
underbounded publication lists.
A truebounded publication list of a scientific author (or of any other unit of assessment) consists of all publications that meet two criteria: (1) they are formally published (e.g., as a journal article, an article in conference proceedings or anthologies, or as a book); (2) they have scientific, scholarly, or academic content. A publication list is overbounded if it includes documents that do not meet the two criteria (e.g., unpublished documents or novels); a publication list is underbounded if it is incomplete. It is possible that a publication list is both underbounded (missing publications) and overbounded (including documents not meeting our two criteria or publications which do not originate from the particular author).
We borrowed the terms of boundedness from city research ([1], pp. 6–7). Boundedness describes the relation between a metropolitan area and its administrative unit(s). In an underbounded city, the administrative unit is smaller than the entire metropolitan region. An example of an underbounded city is New York, as the metropolis covers not only the city of New York, NY, but additionally parts of New Jersey and large areas of Long Island. An overbounded city is larger than the core city and includes rural areas and further smaller towns, like Chóngqìng in China (an area of more than 82,000 km², roughly as large as Austria). The ideal is the truebounded city, where administrative unit and metropolitan region match (e.g., Singapore).
To determine the state of boundedness, we apply the measure of relative visibility [2]. Relative visibility is the share of the unit of assessment's publications covered by a certain information service or repository relative to the unit of assessment's entire œuvre. It is possible to calculate relative visibility from the results lists of scientific information services (say, Web of Science, Scopus, or Google Scholar) or from publication lists found on the Web, provided by the authors themselves or by institutional repositories. Personal publication lists apply different criteria for the arrangement of the documents. Among others, they could be arranged in inverse chronological order, by document type, or by subject [3].
As we work only on case studies at the author level, we do not consider other units of assessment in this paper. However, the lessons learned will probably apply to those levels as well. This article pursues two research questions (RQs).
(RQ 1). Are authors’ personal publication lists, found on their personal sites on the Internet or on institutional repositories, truebounded, overbounded, or underbounded?
(RQ 2). Are the respective publication lists generated through bibliographic information services truebounded, overbounded, or underbounded?
1.1. Relative Visibility
In informetrics and scientometrics, it is well known that all information services, including general science databases such as Web of Science (WoS), Scopus, or Google Scholar as well as domain-specific bibliographic services such as Medline or EMBASE for medicine, are incomplete [4]. "The coverage of journals in cited reference enhanced databases can be surprisingly uneven," Jacsó ([5], p. 278) states. So completeness can vary from author to author as well as from database to database [6]. Furthermore, not every researcher has access to the same database collections (e.g., WoS with or without the Book Citation Index). This makes it difficult to compare such studies and, based on them, the visibility of authors. Regarding the concept of "metric-wiseness" proposed in an opinion letter by Rousseau and Rousseau, "metric-wise" authors may tend to publish only in sources that are covered by certain information services (especially WoS and Scopus) to achieve a high visibility [10].
Databases like WoS or Scopus are rated as "quality" information services. Every covered publication is called a "quality paper," so bibliometric studies applying WoS or Scopus always focus on an author's "quality papers." What is a "quality paper"? A paper indexed in WoS or Scopus is considered a "quality" paper because the publishing journal has been assessed and found to meet a series of quality thresholds. Why are WoS and Scopus "quality information services"? Because they include "quality papers." This circularity is why some authors prefer to speak of "mainstream journals" [11] instead of "quality journals" included in such information services.
What is visibility? There exists a variety of definitions for this concept. For Cole and Cole ([12], p. 398), it indicates through questionnaires "how well known" a scientist is and "characterizes the men (nowadays, of course, women as well) being looked at." Ingwersen [13] limits visibility to an author's publications, namely the absolute number of publications in the National Science Indicators (NSI) database. NSI is derived from WoS, so visibility depends on an author's publication count within this information service. The same applies to Schlögl [14], who defines visibility as the absolute number of publications in an information service (again, in WoS). However, visibility may refer not only to publication and citation counts within established academic databases. Social media services (like Twitter or Mendeley) can also be used to study the visibility of an author [15], so the use of social media for scientific purposes can increase an author's visibility. There is one further, different approach. Dorsch [2], following Gaillard [11], Kirkwood [16], and Hilbert et al. [4], relies on personal publication lists published on authors' or institutions' websites. Complete personal publication lists can also allow for a comprehensive picture of a scientific institution's research activities [17].
If we are able to collect 100 percent of an author's publications (in order to create a truebounded publication list), the relative visibility of the author in the different databases and in his or her personal publication list can be derived. With the total number of an author's publications within a database, homepage, or repository (d) and the union of all publications in all databases and in the personal publication list of the same author (r), the database-specific author visibility can be calculated:

Relative Visibility(Author, IS) = (d/r) × 100
where IS is an information service (such as WoS or Scopus) or a personal publication list. If the visibility equals 100, the publication list is truebounded; if the visibility is below 100, the list is underbounded (in other words, there are missing items); if it is above 100, the list is overbounded (there are items which do not meet the criteria).
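As a minimal sketch (function and variable names are ours, not the study's actual tooling), the formula and the three boundedness states can be expressed as:

```python
def relative_visibility(d: int, r: int) -> float:
    """Relative visibility of an author in an information service IS:
    d = number of the author's publications found in IS,
    r = number of publications in the author's truebounded list
        (the union over all databases and the personal list)."""
    if r == 0:
        raise ValueError("truebounded list must not be empty")
    return d / r * 100


def boundedness(visibility: float) -> str:
    """Map a relative visibility value to a boundedness state."""
    if visibility < 100:
        return "underbounded"  # items are missing
    if visibility > 100:
        return "overbounded"   # items not meeting the criteria
    return "truebounded"
```

For example, an author with 83 of 100 publications covered by a database has a relative visibility of 83 there, i.e., an underbounded results list.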
As a database may include "false" hits (such as papers erroneously attributed to the author) and at the same time miss articles, the overall value of relative visibility may be misleading. Say five articles are missing and, simultaneously, five "wrong" documents appear in the publication list; the relative visibility would be 100, which definitely does not reflect the whole story. Therefore, we have to take a closer look at over- and underbounded publication lists.
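This caveat can be made explicit with set operations: the aggregate value can equal 100 even when a list is simultaneously under- and overbounded. A minimal sketch (identifiers are illustrative; it assumes publications are keyed by normalized title strings):

```python
def boundedness_detail(listed: set[str], truebounded: set[str]):
    """Decompose a publication list against the truebounded reference:
    returns (relative visibility, missing items, wrong items)."""
    missing = truebounded - listed  # non-empty -> underbounded
    wrong = listed - truebounded    # non-empty -> overbounded
    visibility = len(listed) / len(truebounded) * 100
    return visibility, missing, wrong


# Five missing and five wrong items: the aggregate visibility is a deceptive 100.
reference = {f"paper-{i}" for i in range(100)}
candidate = (reference - {f"paper-{i}" for i in range(5)}) | {f"bogus-{i}" for i in range(5)}
vis, missing, wrong = boundedness_detail(candidate, reference)
```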
1.2. What Is a Publication?
The crucial question for all publication lists of scientists is: What is a (scientific, academic, scholarly) publication? And what is not [18]? For the decision for or against accepting a document as an author's publication, we propose two criteria: (1) the rule of formal publishing and (2) the rule of scientific content. Of course, there is an additional (rather self-evident) norm: the author's name is stated in the publication's by-line.
With the emergence of social media, the concept of "publication" has changed. Tokar [19] distinguishes between publications (also by scientists) on the social web (for instance, on Facebook, Twitter, or Reddit) and academic or scholarly publications in "classical" (online as well as offline) media (such as journals, proceedings, or books). Only academic publications are formally published; authors can publish documents on the social web without any formal gatekeeping instance. Preprints on arXiv or other platforms are likewise not formally published. Only formally published documents can be considered publications in authors' or databases' publication lists.
What is a scientific (formally published) document? In the philosophy of science, many authors have discussed criteria for the demarcation between science and non-science or pseudo-science. For Carnap [20], only reasonable (empirical as well as formal) sentences can be scientific; Popper [21] calls only falsifiable propositions scientific; for Stegmüller [22], science is the rational search for truth; for Haller [23], it is adequacy in practice; and, finally, for the OECD's Frascati Manual [24], new knowledge and new applications determine science ([18], pp. 252–256). Chase [25] found normative criteria for scientific publications such as logical rigor, replicability of research techniques, clarity and conciseness of writing style, and originality, among others. In scientometrics, it would be very problematic to check demarcation and normative criteria in every single case. We propose to be guided by the scientific, academic, or scholarly character of the publishing source and to exclude all documents that are not published in such media. Applying this rule, Dorsch [2] skipped all novels from the publication list of a distinguished scientist.
Following Tillett [26], the translation of a document is an independent expression of a work, so translated publications are to be considered separate publications. Moreover, revised proceedings or journal publications that appeared in a different venue were counted as independent publications.
2. Materials and Methods
As case studies, we investigated nine authors from the ISSI Scientific Committee in order to determine their relative visibility in WoS, Scopus, and Google Scholar as well as in their personal publication lists. (The goal of this article is not to present a visibility study of selected authors, but only to test whether the proposed method generally works.) The author selection is based on the ISSI Scientific Committee list included in the Proceedings of ISSI 2013 [27]. From the 200 listed members, we excluded authors with few publications and authors without personal publication lists. Finally, nine scholars who meet the above-mentioned criteria and who are well known in the field of informetrics were selected. We chose ISSI Scientific Committee authors because they are researchers in the same subject area, the field of information science, and are therefore widely comparable. Two sources of data are required: the personal publication list of each author and their publication lists in diverse information services.
A personal publication list consists of the publications' metadata (for example, title, author(s), document type, volume, and publisher for each publication). For the creation of the truebounded personal publication lists, we used the publication information on the authors' personal or institutional websites. For authors who do not report their publications online, it would also be possible to request this information directly from the author. We checked the lists' completeness through online searches in different databases (among others, WoS, Scopus, Google Scholar, the ACM Digital Library, and LISTA). We skipped passages in the personal publication lists that obviously did not contain publications, such as the "Media Coverage and Reviews" section in Haustein's publication list. We selected all scientific publications published between 1 January 2007 and 31 December 2016 from the personal publication list of each author and from the information services corresponding to the stated selection criteria.
For Criterion 1, our lists include the following formally published documents: book chapters, monographs, proceedings papers (including poster or workshop contributions if they were published in the proceedings), journal articles, editorials, and reviews. Edited material, web/blog contributions (not to be confused with formally, but only online, published articles), first-online/in-press articles, white papers, reports, preprints, lectures, talks, and all other informally or not-yet-published materials were excluded from the analysis. For Criterion 2, we checked the scientific content by applying title lists of scientifically, academically, or scholarly recognized sources (e.g., journal titles, proceedings titles, and publishing houses). We systematically excluded fiction such as novels. To simplify the analysis, all entries were counted as 1 (regardless of document type, length, and number of co-authors). Additionally found publications (for instance, on Google Scholar) could not be considered for the truebounded publication list when there was no information as to whether the document had been published (such as a declaration on the relevant journal's website or in the relevant conference proceedings). However, we marked such occurrences for the analysis of overbounded lists.
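A filter implementing the two criteria could look as follows. This is a sketch: the document-type set mirrors the list above, while the source whitelist is a hypothetical stand-in for the journal, proceedings, and publisher title lists actually used in the study.

```python
# Criterion 1: document types that count as formally published
FORMALLY_PUBLISHED = {
    "book chapter", "monograph", "proceedings paper",
    "journal article", "editorial", "review",
}

# Criterion 2: recognized scientific sources; an illustrative stand-in
# for the title lists of scientific, academic, or scholarly sources
SCIENTIFIC_SOURCES = {
    "Scientometrics", "Journal of Informetrics", "Proceedings of ISSI",
}


def meets_criteria(doc_type: str, source: str) -> bool:
    """True if an entry belongs on the truebounded publication list."""
    return doc_type in FORMALLY_PUBLISHED and source in SCIENTIFIC_SOURCES


def publication_count(entries: list[tuple[str, str]]) -> int:
    """Every qualifying entry counts as 1, regardless of type or length."""
    return sum(1 for doc_type, source in entries if meets_criteria(doc_type, source))
```

With this filter, a preprint or a novel is dropped while a journal article in a recognized source counts exactly once.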
Based on the generated truebounded publication lists, the relative visibility in WoS (Core Collection), Scopus, and Google Scholar could be determined. The databases represent a selection of widely used fee-based and free search services and have been the subject of many visibility studies. Scopus and WoS are commercial multidisciplinary citation databases with a thematically widespread focus. However, WoS is the more selective index, since it includes about 18,200 journals and proceedings in its Core Collection. According to Mongeon and Paul-Hus [9], there were exactly 13,605 journals in WoS in 2014. With about 22,800 serial titles, 20,346 of which are journals [9], Scopus is more inclusive. Compared to Ulrich's, with around 63,013 active academic journals [9], both databases cover only minor parts of the entire scientific production. Google Scholar is a free web search engine indexing multidisciplinary scholarly literature.
We searched every database by author (limited to the fixed time period) in March 2017. Umlauts in author names were considered when searching WoS (e.g., AU = (Schlogl C* OR Schlögl C* OR Schloegl C*)). In Scopus, we searched by author ID. For Google Scholar, existing author profiles were considered (excluding publications with no date, except if they were found through title term search). In all concerned information services, an additional search by title terms took place to ensure that all publications were found. Due to continuous updating of the databases, current publication counts may vary from our stated results.
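The umlaut handling can be sketched as a small helper that expands a surname into its spelling variants and assembles a WoS author search argument in the syntax of the example above (a sketch with hypothetical helper names; only ä, ö, and ü are handled):

```python
# German umlauts and their variants: plain base letter and digraph transliteration
UMLAUT_VARIANTS = {"ä": ("a", "ae"), "ö": ("o", "oe"), "ü": ("u", "ue")}


def name_variants(surname: str) -> list[str]:
    """Expand a surname into all umlaut spelling variants."""
    variants = [""]
    for ch in surname:
        options = (ch,) + UMLAUT_VARIANTS.get(ch, ())
        variants = [v + o for v in variants for o in options]
    return variants


def wos_author_query(surname: str, initial: str) -> str:
    """Build a WoS AU= search argument covering all spelling variants."""
    terms = " OR ".join(f"{v} {initial}*" for v in name_variants(surname))
    return f"AU = ({terms})"
```

For "Schlögl" this yields the three variants Schlögl, Schlogl, and Schloegl, combined into one OR-query as in the example above.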
3. Results
During the specified time window, the investigated ISSI Scientific Committee authors published between 65 and 304 documents each (Table 1). These publication counts include all publications of an author's personal publication list, completed with additionally found publications in diverse databases; all listed documents were checked against the two criteria, and thus the lists are recognized as truebounded publication lists.
In order to analyze the levels of underbounded publication lists, relative visibility values were calculated for each database as well as for the personal publication lists. No single database includes all publications of an author. Egghe and Bornmann achieve the highest relative visibility values in all three databases. (It could be that both are notably metric-wise [10], but of course we cannot know and would have to ask the authors.) Generally, relative visibility varies from database to database and is always highest in Google Scholar, followed by Scopus and WoS. Eight of our nine case-study authors reach 80 percent or more visibility in Google Scholar, whereas such high values are achieved in Scopus by three authors and in WoS by only one. Closer inspection of the WoS values shows that, for seven out of nine authors, more than half of their publications are missing; for one third, not even 30 percent are covered in this database. In contrast, seven authors reach relative visibility values of more than 50 percent in Scopus.
The generally lower values for Schlögl can be explained by the fact that 46 percent of his publications (30 out of 65) are not published in English. For scientific publications in information science, English is the lingua franca. Other languages, including German, do not have the same standing within the scientific community, and publications in them are less likely to be listed in international information services than publications in English [28]. Likewise, Harzing [29] reported a higher coverage of non-English publications in Google Scholar compared to WoS and Scopus. Furthermore, 12 of Schlögl's publications are German lexicon entries.
Although none of the personal publication lists is totally complete, they cover the majority of each author's publications. What is included always depends on what the authors themselves consider a publication and therefore state in their lists. Some authors did not count revisions as independent publications. Furthermore, we assume some authors simply forgot to list publications (or did not realize that one of their papers had been published). In addition, a few authors had conference and journal publications with the same (or almost the same) title but stated only the journal articles in their publication lists. A few items on the personal publication lists do not meet the two criteria of scientific publications. Therefore, most of the lists are slightly overbounded.
For WoS and Scopus, we did not find overbounded results lists; Google Scholar, however, produces such unwanted results. Depending on the search argument (searching for names versus searching inside author profiles), the lists can be strongly overbounded. There are two kinds of mistakes (both were found on author profiles): wrong author names (for instance, on Leydesdorff's profile, position no. 7 is an article by Richard Rogers with no relation to Leydesdorff) and informally published documents found by Google anywhere on the web (such as "Wissenschaftliche Zeitschriften im Web 2.0" on Haustein's profile, which is an unpublished slide set).
4. Discussion and Conclusions
What are the objectives of this study? First of all, we intended to focus attention on the characteristics of publication lists. Therefore, we introduced the concepts of truebounded, underbounded, and overbounded publication lists. As a measure for the state of boundedness, we applied the authors’ relative visibility on bibliographic databases and on their personal publication lists.
To avoid confusion, we clearly have to differentiate between visibility and coverage. "Relative visibility" is a property of the unit of assessment (e.g., an author), while coverage is a property of an information service. Relative visibility shows how visible a unit of assessment is in a specific database; its focus is the perspective of the unit of assessment. While the two concepts are inextricably linked and the distinction may seem merely semantic, treating them as different may reveal new insights. For example, the ACM Digital Library is a full-text database covering all ACM publications and thus comprehensively covering the field of ACM-based computing; an author from a bordering discipline might check his or her relative visibility in this information service in order to assess his or her positioning in computing.
In contrast to the application of results lists of bibliographic information services (especially WoS and Scopus) in scientometrics, the use of personal publication lists for research output measurement is a promising alternative approach.
To answer RQ1: all personal publication lists of our case studies are slightly underbounded. However, with visibility values between 83 and 99 percent, those lists come relatively close to the truebounded lists. Nearly all personal publication lists of our case-study authors are also (again, slightly) overbounded. There are missing items (leading to underboundedness) and items which do not meet the criteria for publications (leading to overboundedness); the analyzed publication lists are thus both underbounded and overbounded. This clearly demonstrates the importance of boundedness in addition to the simple calculation of relative visibility. Concerning RQ2, all publication lists of our case studies in the bibliographic information services WoS, Scopus, and Google Scholar are underbounded. The authors' relative visibility is low in WoS and somewhat better in Scopus. The best visibility values are found on Google Scholar, which is remarkable since this information service is based largely on automated algorithms and crowdsourced editing. There are no overbounded results lists in WoS or Scopus, but there are on Google Scholar. This information service is problematic for bibliometrics due to missing standardized data formats, poor metadata descriptions, and overbounded publication lists.
The main limitation of this article is the small number of author case studies. Further research has to broaden this massively to other scientific subjects and other authors (who may not be as metric-wise as scientometricians). With other authors and other disciplines, results could differ; for example, some authors might also publish in journals that are not indexed by traditional indexes like WoS and Scopus. As we have only considered the general scientific information services WoS, Scopus, and Google Scholar, it would be very interesting to extend the research to discipline-specific databases such as Medline for the biosciences, the ACM Digital Library for computer science, or LISTA for library and information science. With respect to Google Scholar, it would be interesting to include Microsoft Academic as well. We also invite other researchers to discuss the criteria for determining scientific publications.
This research study investigates the relative visibility of selected ISSI Scientific Committee authors in WoS, Scopus, and Google Scholar compared to the relative visibility in their personal publication lists and reveals a publication visibility imbalance across the observed information services. Personal publication lists provide high coverage of an author's publications; they are only slightly underbounded and overbounded. In WoS especially, some authors' publications are sparsely covered. "The use of personal publication lists are reliable calibration parameters to compare coverage of information scientists in academic citation databases with scientific social media," writes Hughes ([30], p. 126), following Hilbert et al. [4].
There is a need for scholarly information management to make authors' personal publication lists available online. Dorsch [2] discussed the application of linked open data techniques and the establishment of institutional, national, and scholarly-society-based repositories. A further option could be the inclusion of such lists, free of charge, in commercial information services (such as WoS or Scopus) in order to add the authors' (more or less) truebounded publication lists to the underbounded so-called "quality paper" lists.