Scientific Production and Productivity for Characterizing an Author's Publication History: Simple and Nested Gini's and Hirsch's Indexes Combined

In this study, I developed operational versions of Gini’s and Hirsch’s indexes that can be applied to characterize each researcher’s publication history (PH) as heterodox, orthodox, and interdisciplinary. In addition, the new indicators warn against anomalies that potentially arise from tactical or opportunistic citation and publication behaviors by authors and editors, and can be calculated from readily available information. I split the original Hirsch index into nested indexes to isolate networking activity, as well as to distinguish scientific production (number of articles) from scientific productivity (rate of production), and used nested Gini indexes to identify intentional and successful intertopical and interdisciplinary research. I applied the most popular standardizations (i.e., per author and per year), and used simple methodologies (i.e., least-squares linear and cubic fitting, whole-career vs. subperiods, two-dimensional graphs). I provide three representative numerical examples based on an orthodox multidisciplinary PH, a heterodox PH from the social sciences, and an orthodox unidisciplinary PH from the physical sciences. Two additional numerical examples based on PHs from the life and health sciences show that the suggested PH characterization can be applied to different disciplines where different publication and citation practices prevail. Software is provided to help readers explore the use of these indicators.


Introduction
All theoretical definitions of scientific activity depend on social institutions, which change in time and space. For example, astrology used to be a science, but is no longer, whereas sociology was not considered to be a science, but now is. However, regardless of a discipline's classification, researchers in a discipline and their managers seek ways to evaluate research activity, and this is most commonly done based on a researcher's publication history (PH). As a result, many theoretical and numerical methods (ranging from simple counts of the total number of publications to complex indexes) have been proposed to make the evaluation process more objective and more effective [1]. Unfortunately, there are significant problems with each of these methods, some of which relate to a lack of practical simplicity, and some of which relate to how researchers can "game the system" and artificially improve their rating.
In the present paper, my goal was to improve these methods for assessing scientific researchers by characterizing each researcher's PH as heterodox, orthodox, unidisciplinary, multidisciplinary, or interdisciplinary. To do so, I will first constrain my context by proposing the following operational definition of scientific activity: disseminating original scientific knowledge. In this definition, [...] the original integer-value H indexes rather than H indexes based on standardizations per author. Note that some new H indexes (i.e., related to journals and disciplines) obtained from the first aspect of the methodology are required to implement the second aspect of the methodology. Finally, the percentage values that characterize each PH can be multiplied by the (net per-capita) H indexes that arise from linear fitting for production and cubic fitting for productivity in order to rank individual scientists not only in terms of their overall research performance, but also in terms of specific aspects of that performance such as orthodoxy, heterodoxy, or interdisciplinarity. For example, a PH could be ranked better overall, but worse as an interdisciplinary PH or worse as a heterodox PH.

Methodology
The implementation of algorithms for disentangling scientific production and productivity and for characterizing a PH requires preliminary definitions of the key terminology and assumptions. Thus, Section 2.1 will present these definitions, Section 2.2 will suggest new algorithms based on these definitions for distinguishing scientific production from scientific productivity, and Section 2.3 will suggest new algorithms for characterizing a PH. First, I will refer to the following publication features:
• I will focus on full-length peer-reviewed articles (as opposed to notes, comments, or letters) to rely on a prior scrutiny of their originality by peer reviewers.
• I will focus on articles in English, to emphasize international dissemination. Note that citations of an article by non-English articles are also included in this analysis.
• I will focus on net citations, after eliminating self-citations (citations of the author's other papers) and reciprocal citations (citations of papers by all coauthors and colleagues), by deleting records in which the same author appears in both the citing publication and the cited article. Although this will exclude some legitimate self-citations, it also mitigates the problem of excessive citation of one's own papers. I will also delete records in which the same affiliation appears in the citing publication and the cited article. Although this will exclude some legitimate citations of the work of colleagues that provide important context, it also mitigates the problem of excessive reciprocal citation. Here, I define reciprocal citations as situations in which coauthors cite each other's work. This will mitigate "apostle" effects (i.e., inflating citations by relying on temporal linkages such as citations of a supervisor's or manager's papers) and network effects (i.e., boosting citations by relying on personal linkages). Note that "coauthors" refers to coauthors of any kind of publication (e.g., citations of articles by coauthors in books, symposium proceedings, or research notes) and "colleagues" refers to all researchers affiliated at any time with the author whose PH is being studied (e.g., citations of articles by colleagues in the same PhD courses).
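The record-deletion rule for net citations described in the last bullet can be sketched as follows. This is a minimal illustration under stated assumptions, not the software that accompanies the paper: the `Record` layout and the function name `net_citations` are hypothetical, and author and affiliation identifiers are assumed to be already disambiguated.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    """One citation record (hypothetical layout): a citing publication and a cited article."""
    citing_authors: frozenset
    cited_authors: frozenset
    citing_affiliations: frozenset
    cited_affiliations: frozenset


def net_citations(records):
    """Keep only the records that share no author and no affiliation.

    A shared author stands for a self-citation or a citation by a coauthor
    of the cited article; a shared affiliation stands for a citation by a
    colleague. Both kinds of record are deleted.
    """
    kept = []
    for r in records:
        shares_author = bool(r.citing_authors & r.cited_authors)
        shares_affiliation = bool(r.citing_affiliations & r.cited_affiliations)
        if not shares_author and not shares_affiliation:
            kept.append(r)
    return kept
```

The gross citation count is then `len(records)` and the net count is `len(net_citations(records))`, computed per cited article.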
Second, deleting reciprocal citations could discriminate against heterodox scientists, who are typically few, familiar to each other, and likely to be coauthors (e.g., post-Keynesian or Marxist economists). However, I will deal with this issue by characterizing the PH in terms of publication inequality: specifically, I will apply the Gini index [15], as described in Section 2.1, to measure dispersion of articles across different journals and different disciplines. Moreover, instead of "purifying" or "cleaning" the individual scientific activity by estimating the individual researcher's network from the database of all scientists [16,17], I will delete reciprocal citations from the dataset in order to keep calculations simple for each researcher. Finally, to be conservative, I will retain records in which the citing and cited authors are, or have been, members of the same editorial board, the same family, the same PhD course, the same workshop, or similar relationships.
Third, consistent with the Scopus dataset, I will consider citations from indirect coauthors and colleagues (i.e., the coauthors of coauthors who are not themselves the researcher's coauthors; similarly, the colleagues of colleagues who are not themselves the researcher's colleagues). However, authors working on a given topic within the same organization are likely to be coauthors, at least occasionally. The focus on full-length articles in peer-reviewed journals will minimize the probability of considering spontaneous or induced citations due to casual errors (i.e., mistakes) or deliberate errors (i.e., conscious deception) by identifying citations used in new (i.e., not yet published) research and distinguishing them from the existing literature.

Definitions and Assumptions
In this section, I will clarify the meaning of nested versions of the H index to estimate individual scientific production and productivity; the use of the G index to measure multidisciplinary and multitopical PHs; and the interaction between the H indexes and G indexes to estimate orthodoxy, heterodoxy, and interdisciplinarity. I will formalize these definitions mathematically in Sections 2.2 and 2.3, and relate these definitions logically in Section 2.3.
The definitions used in my analysis (see Supplemental Materials I for detailed characterizations of alternative PHs) can be summarized as follows.
• Production = the number of articles up to a given point in time, used as a total (stock) variable to estimate the researcher's total scientific activity, where core production (as defined in Section 2.2) de-emphasizes the most frequently cited articles.
• Productivity = a marginal (flow) variable used to evaluate production per unit time or changes over time in scientific activity, where core productivity de-emphasizes the most popular articles.
• A multidisciplinary PH = the author submits their manuscripts to journals belonging to different disciplines; it will be measured by a Gini index applied to disciplines related to published manuscripts. The opposite would be a unidisciplinary PH.
• A multitopical PH = the author submits their manuscripts to many different journals belonging to the same discipline; it will be measured using a Gini index applied to journals related to the author's published manuscripts. The opposite would be a unitopical PH.
• An intentional PH = the author deliberately submits their manuscripts in order to shape their PH; it is related to the choice of journal publication, it will be applied to disciplines (i.e., multi- or unidisciplinary) and journals (i.e., multi- or unitopical), and it will be measured by the Gini index.
• A successful PH = publications are cited many times by other papers within the same journal and within the same discipline (i.e., intratopical), by different journals within the same discipline (i.e., intertopical), or by different journals from different disciplines (i.e., interdisciplinary); it is related to the actions of other researchers (i.e., to cite or not to cite a given article), it will be applied to interdisciplinary and intertopical PHs, and it will be measured by H indexes.
• An orthodox PH = the author publishes in a single discipline and in many journals, and the vast majority of the citations are in few disciplines but in many different journals; it is intentional and successful, and it will be measured by combining H indexes and G indexes.
• A heterodox PH = the author publishes in a single discipline and in a few journals devoted to that discipline, so that the vast majority of citations are in few disciplines and few journals; it is intentional and successful, and it will be measured by combining H indexes and G indexes.
• An interdisciplinary PH = the author publishes in many disciplines and journals, and the vast majority of citations are in many different disciplines and journals; it is intentional and successful, and it will be measured by combining H indexes and G indexes.
• An intertopical PH = the author publishes in a single discipline and in many journals, and the vast majority of citations are in many journals within this discipline; it is intentional and successful, and it will be measured by combining H indexes and G indexes.
In addition, I have defined a discipline as a broad field of study (e.g., economics) that includes two or more topics. I define a topic as a specialized area within a discipline (e.g., environmental economics). To make the definitions more rigorous, I have tentatively defined "a few" disciplines and journals as (approximately) two or fewer and four or fewer, respectively, although future experimental work is required to empirically evaluate the applicability of these thresholds.
The assumptions used in my analysis, which considers the standardization procedures or the bibliometric indexes suggested in the Introduction as operational choices, can be summarized as follows.

• Each journal represents a single topic within a discipline: that is, a journal cannot be attached to two different topics. See Section 5 for suggestions of future research to account for exceptions to this assumption.
• Each journal is linked to the most representative discipline: that is, a journal cannot be attached to two different disciplines. See Section 5 for suggestions of future research to account for exceptions to this assumption.
Note that an interdisciplinary PH and an intertopical PH are identified by referring to both publications and citations, whereas a multidisciplinary PH and a multitopical PH are identified by referring to publications only. Moreover, an orthodox PH is intentional and successful as well as being unidisciplinary and multitopical, whereas a heterodox PH is intentional and successful as well as being unidisciplinary and relating to few topics. Finally, successful, unsuccessful, intentional, and unintentional features do not refer to the PH per se, but to characterizations such as whether it is inter- versus intratopical or inter- versus intradisciplinary. In particular, "intentional" is based on the journals chosen by the author, whereas "successful" is based on the citations received by the author. For example, an unintentional and successful interdisciplinary PH means that the author's articles receive many citations from researchers in other disciplines even though the author did not deliberately pursue an interdisciplinary characterization, by choosing journals in a single discipline. In other words, "intention" can be applied to a PH since the author chooses to submit to (and publish in) few or many different journals (i.e., intentional multitopical) and in a single or many different disciplines (i.e., intentional and multidisciplinary). However, "success" is based on citations by other researchers. Consequently, an author could try to make a PH intertopical by publishing in many journals, but this attempt could turn out to be successful if articles in many journals are cited by papers in different journals; in contrast, it could turn out to be unsuccessful if only a few articles are cited and only by the same journals (i.e., the PH is intratopical). Similar evaluations apply to successful and unsuccessful interdisciplinary PHs.
In this context, I treat the production of articles up to a given point in time as a total (stock) variable. In contrast, I treat productivity as a marginal (flow) variable that can be used to measure the sensitivity (or dynamics) of the total production to a changed or potentially changing factor. The integral of a series of marginal variables (i.e., productivity) results in the total value of that variable (i.e., production); conversely, the derivative of the total variable (i.e., production) with respect to a factor (e.g., with respect to time) amounts to the marginal variable (i.e., productivity). In particular, I will use production to estimate the total scientific activity, but will use productivity to evaluate changes over time in scientific activity, where the productivities for each part of the overall period sum to the production during the whole period. Indeed, production and productivity refer to different goals: the decision to recruit a junior researcher as an Assistant or Associate Professor based on scientific productivity is different from the assessment of a senior researcher for promotion to an endowed chair or from the awarding of ad honorem degrees based on their scientific production.
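The stock-flow relationship described in this paragraph can be written compactly. The symbols below are illustrative and do not appear in the original text: $P(T)$ denotes production up to time $T$, $p(t)$ denotes productivity at time $t$, the career is assumed to start at $t_0$ with $P(t_0) = 0$, and $t_0 < t_1 < \dots < t_K = T$ delimit the subperiods.

```latex
\[
P(T) = \int_{t_0}^{T} p(t)\,dt,
\qquad
p(t) = \frac{dP(t)}{dt},
\qquad
P(T) = \sum_{k=1}^{K} \int_{t_{k-1}}^{t_k} p(t)\,dt .
\]
```

The last identity is the statement that the contributions of the subperiods add up to the production over the whole period.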
Many alternative diversity indexes have been suggested in the literature [18] to measure interdisciplinarity (i.e., conceptual and methodological integration in the process and outcome of the research activity [19]): variety, balance, and disparity of citing or cited publications within a top-down structuralist approach [20], which is closer to a cognitive interdisciplinarity; and entropy and between measures applied to coauthors within a bottom-up spatialist approach [21], which is closer to a social interdisciplinarity. Here, I will apply the G index to published articles in different journals and disciplines to highlight the intentional balance of the research activity process, under the assumption that science is incomplete without publication. In particular, I will not use coauthor analysis, because it is inadequate for single-author articles. Moreover, variety is irrelevant in characterizing a PH, so the G index is well suited to measuring the balance of the research activity process [22]; the research activity outcome will be depicted by the nested H indexes. Finally, I will not apply the G index to the article's list of citations, because this is inadequate to distinguish a unidisciplinary from a multidisciplinary PH.
Many standardizations have been suggested for use in cross-disciplinary comparisons of individual scientific activity [9-11,23,24]: the number of articles per author, number of citations per author or per year for the cited article, number of citations in the citing article, average number of citations or publications in each discipline, and the weighted number of citations or publications according to an author's position in the list of authors for an article. Here, I will use the standardization "per author" to assess production over the total career of a researcher (i.e., based on the number of years from the first publication to the present), but will use the standardization "per author per year" (i.e., based on the number of years since publication for each article) to evaluate productivity in specific subperiods. Note that I will not use the average numbers of citations in each discipline [25], since this continuously changing figure is not provided by the available bibliometric datasets and would be prohibitively difficult to calculate and update. Moreover, I will not apply the number of citations in the citing publication [26], since it has been empirically shown to be ineffective in cross-disciplinary comparisons [25]. Finally, I will not use the weighted number of citations or articles according to an author's position in the list of authors for an article [27,28], since different practices prevail in different universities, disciplines, and countries. In particular, I will refer to the 22-year period from 1995 to 2016 to obtain a reliable dataset on production. I chose this period because the Scopus Web site suggests that many inconsistencies might arise for publications and citations before 1995. To account for productivity, I have also analyzed the data for the 10-year period from 2007 to 2016 (see Supplemental Materials II for details of the data). Note that the two periods that I considered (1995 to 2016 and 2007 to 2016) refer to the publication year, and account for citations in any year: by standardizing the citations per year, I reduce the potential bias against the more recent articles, since these articles have been cited for a shorter period than older articles. However, shorter periods are more likely to miss the citation cycle of articles in some disciplines, since the length of the cycle often increases with increasing originality of an article; for example, the most innovative articles might be ignored for some time after publication, but then be cited for a long time; see an application of the H index to show a scientist's dynamic research trajectory and scientific performance during different periods [29]. Note that the focus on articles reduces the dependence of my results on the dataset that is used because most databases include all full-length journal articles by an author. Alternatively, one could rely on expert panels [30], despite their subjectivity, or on factor analysis, despite its methodological issues [31].
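The two standardizations adopted here ("per author" and "per author per year") can be sketched as follows. The function names are hypothetical, and the way years are counted (inclusive of the publication year, with 2016 as the default reference year) is an assumption made only so that the sketch is well defined for articles published in the reference year.

```python
def citations_per_author(citations, n_authors):
    """Citations of one article divided by its number of authors."""
    if n_authors < 1:
        raise ValueError("an article needs at least one author")
    return citations / n_authors


def citations_per_author_per_year(citations, n_authors, pub_year, ref_year=2016):
    """Citations per author, further divided by the years since publication.

    Assumption: years are counted inclusively (ref_year - pub_year + 1),
    so an article published in the reference year counts as one year.
    """
    if pub_year > ref_year:
        raise ValueError("publication year is after the reference year")
    years = ref_year - pub_year + 1
    return citations_per_author(citations, n_authors) / years
```

For example, an article with 40 net citations, two authors, and a publication year of 2007 yields 20 citations per author and 2 citations per author per year under these assumptions.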
Many alternative operational bibliometric indexes have been suggested in the literature, but very often without demonstrating significant positive correlations between the metrics [32]. For example, indexes discussed in recent papers include the c index [33], the Egghe [34] index (see also [35-42]), the z index [43], and the generalized Hirsch index [44-46]. Here, I will apply the H index to citing publications to highlight the (possibly unintentional) intertopical or interdisciplinary outcome of the research activity. In particular, I will not consider the importance of citations based on algorithms for ranking Web pages such as PageRank [47,48]. Moreover, I will not use the impact factors of the cited journals [49,50], because these figures change continuously and there is no consensus on the value of such metrics of a journal's influence. Finally, I will not compare alternative datasets in terms of their reliability and stability [10,51]. Note that I will disregard individual characteristics that change over time, such as the researcher's age or position [2,3,52], and individual characteristics that are fixed in time, such as gender or ethnicity [53-55]. In this paper, my interest is in indexes for, rather than determinants of, scientific activity [56]. Moreover, I will omit papers that use indexes to compare countries [57], institutions [58], or journals [59,60]. Finally, I will disregard indexes for cross-disciplinary comparisons, such as the P100 of Prathap [61] and the percentiles of Schreiber [62-64], which are not based on common information available in the Scopus dataset.
In the context of nested indexes, the Hirsch index (H), which is based on the number of articles and citations in different journals and disciplines, can be coupled with the Gini index (G), which accounts for differences between journals and disciplines (i.e., dispersion of articles among journals and disciplines). I will define both indexes analytically in Sections 2.2 and 2.3. In the Scopus dataset, each publication is attached to a source (i.e., a journal, including reviews) and classified into one or more subject, discipline, and topic areas (i.e., four subjects, 27 disciplines within those subjects, and 306 topics within those disciplines). For example, social sciences is a subject, economics is a discipline within that subject, and environmental economics is a topic within that discipline. Consistent with the Scopus dataset, I will assume that each article is linked to the most representative discipline for its source. In future analyses, a more complex classification (i.e., based on topics) might be possible that associates a publication with two or more different topics, although this could unreasonably enlarge the interdisciplinary features of PHs. Of course, the opposite problem applies for a less complex classification (i.e., subjects); this level artificially decreases the degree of interdisciplinarity. In other words, I will make the simplifying assumption that a journal cannot be attached to two or more different disciplines. In future analyses, a multi-discipline classification for journals such as Ecological Economics that clearly span two or more disciplines should be considered, although this will require formal definition of the distance between disciplines so that a standard that is as objective (quantifiable) as possible (i.e., because there is some subjectivity involved) can be applied to the vectors (i.e., 27 relative weights for the 27 disciplines) that characterize each journal. Next, I will make the simplifying assumption that each journal represents a single topic within a discipline; in other words, I will assume that a journal cannot be attached to two different topics. Although this assumption is an obvious simplification, some combinations of fields and methodologies are only accepted by a few journals. Consistent with the Scopus dataset, each article could be linked to the most representative topic for its source.
Therefore, to characterize PHs, I will apply a differential approach. This is similar to the approach used by Blagus et al. [65], who applied it to alternative versions of the H index. In the present approach, both levels and differences between levels are meaningful for both the Hirsch and Gini indexes, for deleting records at each step, and for constructing a system of nested indicators. This is true even though the Gini index values (hereafter, G values) are based on the topic and discipline of each publication, whereas the Hirsch index values (hereafter, H values) are based on the gross number of citations (i.e., citations without excluding self-citations and reciprocal citations) and the net number of citations after deleting records based on the abovementioned criteria, together with the topic and discipline of a given publication. Note that I will use similar inequality constraints for both indexes: G values for disciplines are less than or equal to G values for journals (i.e., topics), and H values for disciplines are less than or equal to H values for journals (i.e., topics). Moreover, I will present G indexes as percentages, but H indexes as levels. Indeed, there is a maximum inequality that can be used to standardize G: (N − 1)/N, with N being the total number of articles by an author. In contrast, there is no ex ante maximum level for the different H indexes. Finally, I will multiply nested G indexes by nested H indexes (i.e., values measured on different scales) to measure nested areas (see a similar approach applied to the citation impact of journals) [66], but I will then express the new indicators as a percentage of the comprehensive area.
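The standardization of G by its maximum (N − 1)/N can be illustrated with a short sketch. It assumes that the G index for journals (or disciplines) is the share of ordered article pairs that appear in different journals (or disciplines); this form is an assumption chosen because it attains exactly (N − 1)/N when every article is in a different journal, and it is not taken from the paper's software. The function names are hypothetical.

```python
def gini_categorical(labels):
    """Share of ordered article pairs (i, k) whose labels differ.

    `labels` holds one journal title (or discipline name) per article.
    The difference between two articles is 0 if the labels match and 1
    otherwise, and the sum is divided by N**2 (assumed form).
    """
    n = len(labels)
    if n == 0:
        raise ValueError("at least one article is required")
    differing = sum(1 for a in labels for b in labels if a != b)
    return differing / (n * n)


def gini_percent(labels):
    """G expressed as a percentage of its maximum, (N - 1) / N."""
    n = len(labels)
    if n < 2:
        return 0.0  # a single article cannot be dispersed
    return 100.0 * gini_categorical(labels) / ((n - 1) / n)
```

Because each journal is assumed to belong to a single discipline, two articles in the same journal always share a discipline, so under this sketch the value computed on disciplines cannot exceed the value computed on journals, in line with the inequality constraint stated above.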

Scientific Production and Productivity
In this section, I will apply least-squares linear and cubic fitting for the relationship between the net citations and the number of articles to calculate scientific production. See Supplemental Materials II for calculation of scientific productivity. Note that a quadratic fitting would be inappropriate because it would produce an increasing curve for the less frequently cited articles.
Table 1 summarizes the notation for the different calculated G and H indexes. In particular, linear fitting will emphasize net citations above the minimum required by the original H index. As a result, the linear index values are likely to be larger than the original H values. In contrast, cubic fitting will de-emphasize them. See an application of the H index to highly cited papers [67]. For this reason, I will name the four cases that I study total production (i.e., a linear fitting applied to production) and total core production (i.e., a cubic fitting applied to production). See Supplemental Materials II for the algorithms used to calculate average productivity (i.e., a linear fitting applied to productivity) and average core productivity (i.e., a cubic fitting applied to productivity). See the notion of core documents, which focuses on identifying the main research topics by disregarding incidental citations [68], whereas the definition of core in the present study refers to the central H indexes by disregarding exceptional citations.
Thus, once the H index has been focused on articles in English and has been standardized for the number of authors, the H index for total production ($H_{ltn}$, where l = linear, t = total, and n = net citations) will be calculated by applying the following formula to the number of articles per author as the independent variable (here, a generic scalar variable x) and to the number of citations per author for articles in a decreasing order as the dependent variable (here, the fitting curve lp(x)):

$$H_{ltn} = x \ \text{such that}\ lp(x) = a_0 - a_1 x = x$$

where lp(x) stands for linear polynomial fitting and positive parameters $a_0$ and $a_1$ come from a linear regression of the total number of net citations over the total number of articles x. This procedure uses continuous variables to replace the calculation of the H index based on discrete variables (i.e., $\max_i \min[f(i), i]$, where f(i) represents the number of citations in decreasing order from the largest to the smallest value for each article i and i is the counter for the article number). For example, $H_{ltn}$ is the value of x such that $a_0 - a_1 x = x$ with $a_0$ and $a_1$ positive parameters (i.e., the solution is $x = a_0/(1 + a_1)$). In graphical terms, this solution is represented by the intersection between the line y = x (i.e., the 45 degree line) and the linear polynomial fitting curve. The H index for total core production ($H_{ctn}$, where c = cubic, t = total, and n = net) will be calculated as follows:

$$H_{ctn} = x \ \text{such that}\ cp(x) = x$$

where cp(x) stands for cubic polynomial fitting, and the positive parameters $b_0$, $b_1$, $b_2$, and $b_3$ come from a cubic regression of the total number of net citations over the total number of articles x. In graphical terms, this solution is represented by the intersection between the line y = x (i.e., the 45 degree line) and the cubic polynomial fitting curve.
In other words, the fitting curves (i.e., lp(x) and cp(x)) transform the discontinuous number of citations for all articles in a decreasing order into a continuous series so that solutions (i.e., x such that lp(x) = x and x such that cp(x) = x) are good estimations of total production (i.e., $H_{ltn}$) and total core production (i.e., $H_{ctn}$), respectively. By analogy, the same procedure (i.e., linear and cubic fitting curves) applied to a subset of the articles (e.g., articles published in a specified period) estimates the average productivity (i.e., $H_{ltn10}$) and the average core productivity (i.e., $H_{ctn10}$), respectively. Similarly, the same procedure (i.e., linear and cubic fitting curves) applied to citations standardized per year estimates the total production (i.e., $H_{lyn}$) and the total core production (i.e., $H_{cyn}$) per year, respectively.
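The fitting procedure can be sketched in a few lines with NumPy. This is an illustration under stated assumptions rather than the software provided with the paper: the function names are hypothetical, the independent variable is taken to be the article rank 1, 2, ..., N (rather than a per-author article count), and, because the text does not say which intersection to use when the cubic crosses the 45 degree line more than once, the sketch returns the smallest positive real intersection.

```python
import numpy as np


def _ranked(citations):
    """Return article ranks x = 1..N and citations sorted in decreasing order."""
    y = np.sort(np.asarray(citations, dtype=float))[::-1]
    x = np.arange(1, len(y) + 1, dtype=float)
    return x, y


def h_discrete(citations):
    """Original H index: max over i of min(f(i), i)."""
    x, y = _ranked(citations)
    return max(min(f, i) for f, i in zip(y, x))


def h_linear(citations):
    """Solve lp(x) = a0 - a1 * x = x, i.e., x = a0 / (1 + a1)."""
    x, y = _ranked(citations)
    if len(x) < 2:
        raise ValueError("a linear fit needs at least two articles")
    slope, intercept = np.polyfit(x, y, 1)  # least-squares line
    a0, a1 = intercept, -slope              # a1 >= 0 for decreasing data
    return a0 / (1.0 + a1)


def h_cubic(citations):
    """Solve cp(x) = x for a least-squares cubic cp (smallest positive root)."""
    x, y = _ranked(citations)
    if len(x) < 4:
        raise ValueError("a cubic fit needs at least four articles")
    coeffs = np.polyfit(x, y, 3)  # highest power first
    coeffs[2] -= 1.0              # cp(x) - x: subtract 1 from the x coefficient
    roots = np.roots(coeffs)
    real = roots[np.abs(roots.imag) < 1e-9].real
    positive = real[real > 0]
    if positive.size == 0:
        raise ValueError("the cubic fit does not cross the line y = x")
    return float(positive.min())
```

Applying the same functions to the articles of a subperiod, or to citations standardized per year, gives the productivity and per-year variants described above; gross and net variants differ only in the citation counts that are passed in.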
Note that a fitting based on x such that $y = 1/(a_0 - a_1 x) = x$ (i.e., two parameters) would produce similar results. Moreover, the obtained H values are continuous and do not change abruptly when the number of citations of a single article changes; that is, they solve the problem of the discontinuity that could potentially be created by an additional citation received by the marginal article [69], because they account for the citations received by the entire set of published articles [70]. Finally, linear fitting gives too much weight to fashionable articles (i.e., articles with many citations in a few years), whereas a cubic fitting disregards them by giving more weight to articles with few citations in many years. Consequently, a linear fitting (i.e., $H_{ltn}$ or $H_{lyn}$) seems to be most representative for total scientific production, whereas a cubic fitting (i.e., $H_{ctn10}$ or $H_{cyn10}$) seems to be most representative for the average core scientific productivity.

PH Characterization
In this section, I will apply nested Gini (G) and Hirsch (H) indexes for PH characterization, with the G indexes based on the number of articles and the H indexes based on both gross and net citations.
First, I will disentangle networking from scientific activities. Then, I will apply the G indexes to distinguish multiple-topic and multiple-discipline PHs from single-topic and single-discipline PHs [71]. Finally, I will apply the G and H indexes to distinguish heterodox from orthodox PHs [72] and to identify intentional and successful intertopical and interdisciplinary PHs. Note that I will refer to the total scientific production in this analysis, although similar reasoning could be applied to shorter periods (i.e., scientific productivity). In particular, by applying linear fitting to the gross number of citations (i.e., citations of English articles, including self-citations and reciprocal citations) to calculate the linear total gross H index (i.e., $H_{ltg}$), and by calculating the percentage difference between $H_{ltg}$ and $H_{ltn}$ (i.e., $[H_{ltg} - H_{ltn}]/H_{ltg}$), this approach provides a measure of the relative importance of networking activity in a PH. Table 2 summarizes the PH characterizations in terms of the values of the key parameters that define each type of PH. Note that $H_t$ is not used here, since successful, unsuccessful, intentional, and unintentional features do not refer to the PH per se, but rather to characterizations of the PH such as inter- or intratopical and inter- or intradisciplinary characterization. Second, I calculated $H_{ljn}$ as a linear fitting of points where citations are in journals other than the journal that published the cited article (i.e., an intertopical measure). Similarly, I computed $H_{ldn}$ as a linear fitting of points where citations are in disciplines other than the discipline of the cited article (i.e., an interdisciplinary measure), and calculated the G values for journals ($G_j$) and disciplines ($G_d$) by applying the following formulas.
where N is the total number of articles in the author's career and (N−1)/N is the maximum value for both G_d (i.e., a multidisciplinary measure) and G_j (i.e., a multitopical measure); j represents the journal title, with j_i − j_k = 0 if articles i and k appear in the same journal and j_i − j_k = 1 otherwise; and d represents the discipline name, with d_i − d_k = 0 if articles i and k belong to the same discipline and d_i − d_k = 1 otherwise. See [73] for an analysis of H indexes based on citations by different citers. Note that this classification cannot be criticized as ambiguous (i.e., either the journal is the same or it is different), although it could be criticized because it overestimates PH differentiation (e.g., a heterodox post-Keynesian economist publishes in very few journals, such as the Cambridge Journal of Economics, the Journal of Post Keynesian Economics, or the Review of Political Economy, but not in a single journal). However, heterogeneity of PHs can be estimated by comparing percentages.
Third, I define a PH as an unintentional interdisciplinary or intertopical PH if the author publishes in a single discipline or few journals, but is nonetheless cited by authors in many different disciplines or journals. This is represented by a decrease in G_d and G_j for a given H_ldn and H_ljn (i.e., a decrease in the inequality of articles for a given number of citations). Moreover, I define a PH as an unsuccessful intradisciplinary and intratopical PH if the author publishes in a single discipline and in few journals, and is also cited by authors in few different journals. I define a PH as an unsuccessful intradisciplinary and intertopical PH if the author publishes in a single discipline and in many journals, but is nonetheless cited by authors in few different journals. This is represented by a decrease of H_ldn and H_ljn for a given G_d and G_j (i.e., a decrease in the number of citations at a given inequality of articles). This case could depict an intradiscipline reputation if G_j is large while H_ljn is small; that is, it is possible that the author publishes in many journals because editors expect many citations of papers in their journal and consequently an increase in its impact factor, but this does not happen because papers are published without suitable scrutiny to ensure their quality. Similarly, I define a PH as an unsuccessful interdisciplinary and intertopical PH if the author publishes in many disciplines and in many journals, but is cited by authors in journals in few different disciplines. This is represented by a reduction of H_ldn and H_ljn for a given G_d and G_j (i.e., a decrease in the number of citations at a given inequality of articles). This case could depict an intratopical reputation if G_d is large while H_ldn is small (i.e., it is possible that the author publishes in few journals because the author knows the editors).
Finally, I calculated the areas as percentages using the following equations, with different colors applied in the two-dimensional graphs presented in Section 4 to characterize the PHs:
Red area (intradisciplinary intratopical heterodoxy)
Yellow area (intradisciplinary intertopical orthodoxy)
Blue area (interdisciplinary intertopical orthodoxy)
The percentages do not sum to 100% because the analysis only considered intentional and successful intertopical and interdisciplinary features. Note that if G_d = 0, then the yellow area is given by [H_ljn × G_j]/H_ltn in order to depict only intentional interdisciplinary PHs.
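The pairwise definition of the G indexes can be illustrated with a minimal Python sketch. It assumes that each G value is the share of ordered article pairs with different labels, which reproduces the stated maximum of (N−1)/N; the function names are hypothetical, and the second function only covers the special case G_d = 0 that is spelled out in the text, returning a fraction rather than a percentage.

```python
def nested_gini(labels):
    """Share of ordered article pairs (i, k) whose labels differ.

    `labels` holds one journal title (for G_j) or one discipline name
    (for G_d) per article. The 1/N^2 normalization is an assumption chosen
    so that the maximum equals (N - 1) / N, as stated in the text.
    """
    n = len(labels)
    if n == 0:
        raise ValueError("at least one article is required")
    differing_pairs = sum(1 for a in labels for b in labels if a != b)
    return differing_pairs / n ** 2


def yellow_area_single_discipline(h_ljn, g_j, h_ltn):
    """Yellow area for the case G_d = 0, i.e. (H_ljn * G_j) / H_ltn."""
    if h_ltn <= 0:
        raise ValueError("H_ltn must be positive")
    return h_ljn * g_j / h_ltn


# Four articles in four different journals reach the maximum (N - 1) / N.
print(nested_gini(["J1", "J2", "J3", "J4"]))
# Three articles in a single discipline give G_d = 0.
print(nested_gini(["Eco", "Eco", "Eco"]))
```

Multiplying the second function's result by 100 gives the area as a percentage.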

Data
The Scopus data set includes the following variables for both cited and citing articles.

Application of the Indexes
To demonstrate the application of my methodology, I chose three representative PHs from three different subjects such that the total career of the first author from the multidisciplinary subject perfectly matches the period after 1995 when reliable Scopus data are available. The second author, from the social sciences, published articles both before and during this period, and the total career of the third author, from the physical sciences, occurred before the period with reliable data. Note that I will rely on the numbers of citations per author and per year for the articles included in my analysis, although these standardizations have only been supported by statistical analyses of small samples [10,74]. In other words, I deliberately chose three authors with qualitatively different PHs. In particular, the three authors were chosen such that the first author published in many disciplines and topics (i.e., to be associated with a multidisciplinary subject), the second author published in a single discipline (i.e., to be associated with the social sciences) and few topics, and the third author published in a single discipline (i.e., to be associated with the physical sciences) and many topics. Note that a random choice of authors could lead to PHs with the same characteristics, thereby failing to demonstrate the potential of the metrics that I developed to distinguish among the PHs. In contrast, choosing PHs with known differences in their characteristics, as I have done, provides a reality check on the method's ability to distinguish differences in publication strategies. Nonetheless, I have presented two additional PHs from subjects that were not included in this initial analysis (i.e., life and health sciences) to test whether the suggested methodology could be applied to disciplines characterized by different publication practices in terms of coauthorship and citation numbers. Note that although the construction of artificial PHs is not necessary for the purposes of this paper, this approach could nonetheless be used to test the potential of the suggested metrics for specific applications (e.g., assessments of a research institution's publication outcomes).
Figures 1 and 2 show the values of the H indexes for the total and core scientific production based on linear and cubic fitting, respectively (i.e., intersections between the y = x lines and the linear and cubic fittings, respectively). Table A1 in Appendix A presents the corresponding data for the first PH. Note that the cubic fitting de-emphasizes the most frequently cited articles. Figures 3 and 4 show the H indexes for the average and core scientific productivity based on linear and cubic fitting, respectively (i.e., intersections between the y = x lines and the linear and cubic fittings, respectively). Table A2 in Appendix A presents the H index values calculated for the first PH. Note that the cubic fitting emphasizes the most persistently successful articles. Table 3 summarizes the values of the H indexes for the first analyzed PH. Note that the application of relative weights (e.g., the harmonic mean in [75]) to emphasize the author who serves as the author for correspondence or the order of author names (i.e., larger weights to authors listed first) would not significantly change these results; indeed, the vast majority of articles had a single author for all three analyzed PHs. For authors who generally publish articles with two or more coauthors, it might be necessary to apply relative weights.
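The idea of reading an H index off the intersection between the y = x line and a fitted line can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the software that accompanies the paper: articles are sorted by decreasing (already standardized) citation counts, x is the article rank, and the function name is hypothetical. Only the linear case is shown.

```python
def linear_h_index(citations):
    """Intersection of y = x with the least-squares line through (rank, citations).

    `citations` holds one value per article, for example net citations
    divided by the number of authors. Ranks 1..N are assigned after sorting
    in decreasing order, which is an assumption made for this sketch.
    """
    ys = sorted(citations, reverse=True)
    n = len(ys)
    if n < 2:
        raise ValueError("at least two articles are required for a linear fit")
    xs = list(range(1, n + 1))
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx                      # non-positive for sorted data
    intercept = mean_y - slope * mean_x
    # Solve intercept + slope * x = x for x.
    return intercept / (1.0 - slope)


# Perfectly linear toy data: the fitted line y = 12 - 2x meets y = x at x = 4.
print(linear_h_index([10, 8, 6, 4, 2]))
```

Unlike the traditional H index, the result is generally not an integer, which is why values such as 6.29 appear in the results.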
Table 3. The estimated values of the H indexes per author for the first analyzed PH (Table A1). All values are for the net production or productivity (i.e., after removal of reciprocal citations). Abbreviations: c = cubic fitting; l = linear fitting; n = net number of citations; t = total articles; y = citations are divided by the number of years since publication.

Note that summary statistics from the Scopus dataset let me calculate both parameters (i.e., α and λ) that characterize the gamma distributions for the H indexes. Here, I chose the gamma distribution because, within non-negative distributions (i.e., H indexes can only have non-negative values) and asymmetric distributions (i.e., scientists are more likely to achieve relatively small than relatively large H values), the gamma distribution can account for qualitatively different frequencies that depend on alternative values of its two parameters. For example, the first analyzed PH (Table A1) turns out to be within the best 0.53% of scientists over the whole career (based on H_ltn = 6.29) by standardizing for an average of four authors per article, and within the best 0.0008% of scientists from 2007 to 2016 (based on H_cyn10 = 2.57) by standardizing for an average publication life cycle of five years. In addition, similar calculations for the second and third analyzed PHs (Tables A3 and A4, respectively) suggest that H indexes per year should be used to compare careers of senior researchers. Indeed, H_ltn for the second PH is slightly larger than H_ltn for the third PH (i.e., 10.00 > 8.34), whereas H_lyn for the second PH is considerably smaller than H_lyn for the third PH (i.e., 1.52 < 4.17). See Supplemental Materials III for the calculation of the H indexes.
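A percentile of this kind can be approximated with a short Python sketch. It rests on assumptions that are not confirmed by the text: the gamma parameters are obtained by the method of moments from the mean (2.45) and standard deviation (4.37) reported for the Scopus H values that exclude reciprocal citations, and the per-author H index is multiplied by four to standardize for the average number of authors. The function name is hypothetical.

```python
import math


def gamma_upper_tail(x, mean, sd):
    """P(X > x) for a gamma distribution matched to `mean` and `sd` by moments."""
    alpha = (mean / sd) ** 2       # shape parameter
    lam = mean / sd ** 2           # rate parameter
    z = lam * x
    # Regularized lower incomplete gamma function through its power series:
    # P(alpha, z) = z**alpha * exp(-z) * sum_{n>=0} z**n / Gamma(alpha + n + 1)
    term = 1.0 / math.gamma(alpha + 1.0)
    total = term
    for n in range(1, 1000):
        term *= z / (alpha + n)
        total += term
        if term < 1e-17 * total:
            break
    lower = z ** alpha * math.exp(-z) * total
    return 1.0 - lower


# H_ltn = 6.29 per author, rescaled by an assumed average of four authors.
share = gamma_upper_tail(6.29 * 4, mean=2.45, sd=4.37)
print(f"{100 * share:.2f}% of scientists are expected above this value")
```

Under these assumptions the tail share comes out at roughly half a percent, in line with the figure quoted for the whole career. The series form is adequate for moderate tails only; very small tails such as the 10-year figure would need a more careful numerical method and the 10-year summary statistics, which are not reported here.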
These results are internally consistent, since the index values achieved by the first analyzed author are smaller for the 22-year period than for the 10-year period, when the majority of the articles were published (i.e., 18.6 and 15.3 English articles per author in these periods, respectively), with H_lyn = 2.43 being smaller than H_cyn10 = 2.57. These results are also externally consistent. For example, calculations for the third PH (Table A4) show 11.2 English articles per author and H_lyn = 4.17, with the third author's index being 7.5 times better than the index achieved by the first analyzed author (Table A1), after standardization of citations per year. Moreover, the present results are clearly better than those calculated using traditional versions of the H index: six for the first representative author for 22 years versus 11 for the third representative author for 33 years, where the latter is only 1.83 times the former. Finally, these results can be easily interpreted. Indeed, if 2.57 articles are cited 2.57 times per year per author, there are several implications: for four authors, the same H index could be achieved only if 10.3 articles were cited 10.3 times per year, which, over a period of 10 years, means that the 10.3 articles would each be cited 103 times. These results can also be simply justified. Indeed, it is difficult to support the belief that citation of one article with 10 authors only one time will directly or indirectly benefit science or society to the same extent as 10 articles with a single author, each cited 10 times.
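The arithmetic behind the four-author interpretation can be written out explicitly; the rounding to 10.3 follows the text, and the layout below is only an illustration:

```latex
H_{cyn10} \times 4 = 2.57 \times 4 = 10.28 \approx 10.3, \qquad
10.3 \ \text{citations per year} \times 10 \ \text{years} = 103 \ \text{citations per article}
```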
Figures 5-7 are based on all H values calculated for the first, second, and third analyzed PHs (Tables A1, A3 and A4, respectively) using the indexes developed in this paper. See Supplemental Materials IV for calculations of the G indexes. I define a PH as (interdisciplinary and intertopical) orthodox (e.g., Figure 5) if G_d is large (i.e., the author publishes in many disciplines), G_j is large (i.e., the author publishes in many journals), H_ljn is slightly larger than H_ldn, and H_ldn is large (i.e., the vast majority of citations are in different disciplines). Moreover, I define a PH as (intradisciplinary and intratopical) heterodox (e.g., Figure 6) if G_d is 0 (i.e., the author publishes in a single discipline), G_j is small (i.e., the author publishes in few journals), and H_ljn is small (i.e., the vast majority of citations are in the same journals). Finally, I define a PH as (intradisciplinary and intertopical) orthodox (e.g., Figure 7) if G_d is 0 (i.e., the author publishes in a single discipline), G_j is large (i.e., the author publishes in many journals), and H_ljn is large (i.e., the vast majority of citations are in different journals). In other words, an orthodox PH can be either intra- or interdisciplinary. Note that in Figures 6 and 7, H_ldn is greater than 0. Indeed, in a heterodox PH, H_ljn = 0 only if each article is cited by an article in the same journal, whereas a more likely citation by an article in a different journal from a small group of journals is excluded. Similarly, in an intradisciplinary orthodox PH, H_ldn = 0 only if each article is cited by an article in the same discipline, whereas a less likely citation by an article in a different discipline is excluded.
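The three definitions above can be turned into a rough decision rule. The sketch below is an assumption-laden illustration rather than the rule used to build Table 2: it borrows the approximate thresholds given in the Table 2 caption (small G_d ≤ 0.2, small G_j ≤ 0.6, small H_d ≤ 3, small H_j ≤ 6), reads "large" as "above the small threshold", and replaces "slightly larger" by "not smaller". The function name and labels are hypothetical.

```python
def classify_ph(g_d, g_j, h_ljn, h_ldn,
                small_g_d=0.2, small_g_j=0.6, small_h_d=3.0, small_h_j=6.0):
    """Return a coarse PH label from the nested G and H indexes."""
    many_journals = g_j > small_g_j
    many_citing_journals = h_ljn > small_h_j
    if g_d == 0:
        # Single discipline: the journal dimension separates the two cases.
        if not many_journals and not many_citing_journals:
            return "heterodox (intradisciplinary, intratopical)"
        if many_journals and many_citing_journals:
            return "orthodox (intradisciplinary, intertopical)"
    elif (g_d > small_g_d and many_journals
          and h_ldn > small_h_d and h_ljn >= h_ldn):
        return "orthodox (interdisciplinary, intertopical)"
    # Unsuccessful or unintentional cases are not resolved by this sketch.
    return "unclassified"


print(classify_ph(g_d=0.0, g_j=0.4, h_ljn=3.0, h_ldn=0.5))
print(classify_ph(g_d=0.7, g_j=0.9, h_ljn=7.0, h_ldn=6.0))
```

Cases that fall between the definitions return "unclassified" on purpose, since the text treats unsuccessful and unintentional PHs separately.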
Figures A1 and A2 in Appendix B illustrate two sample calculation results for two randomly selected authors from the two remaining subjects: agricultural and biological sciences from life sciences (Table A5) and medicine from health sciences (Table A6). These representative authors were randomly selected from the Scopus sample described in Supplemental Materials V. In particular, the proportions for the authors in the health, life, physical, and social subjects were 29, 18, 44, and 9%, respectively; the relative standard deviations of the authors' numbers across disciplines in the health, life, physical, and social subjects were 1.90, 0.90, 0.65, and 0.85, respectively. The algorithms I have proposed for measuring the degree of interdisciplinarity (in percentages) prove to be applicable to these disciplines too, although the H-index values (in levels) are affected by the large number (an average of more than six) of coauthors that characterizes these two disciplines. In particular, if an organization is interested in encouraging its authors to develop an interdisciplinary PH, it should look for PHs with large blue areas, whereas if it is more interested in heterodox PHs, it should look for PHs with large red areas. This approach could reduce the risk of discrimination against heterodox or interdisciplinary PHs, for which smaller values of H_ltn are likely to be observed, by introducing some form of compensation (e.g., a smaller H_ltn with large red or blue areas could be preferred to a larger H_ltn with small red or blue areas). This is why I have focused on intentional and successful interdisciplinary and heterodoxy criteria in this study.
I have developed software that facilitates the calculations (http://www.mdpi.com/2304-6775/7/2/32/s1). Figures S1 and S2 in Supplemental Materials VI illustrate the software's interface for the same representative authors from the life and health sciences, respectively.

Discussion
Comparisons across disciplines might be irrelevant for the recruitment of junior researchers, since the latter authors are more likely to compete for a position within their own discipline. That is, if the goal is to recruit a good scientist in a given discipline, you do not need to compare them with scientists in other disciplines because (for example) only economists will apply for a position in economics. However, senior and junior researchers should be ranked according to their scientific production and productivity (i.e., H_ltn and H_cyn10, respectively), which requires the ability to disentangle networking from research activity (i.e., H_ltg vs. H_ltn) so that these activities can be separately and positively evaluated. The research activity should then be evaluated by potentially favoring or contrasting PHs according to basic intentional characteristics such as heterodox vs. orthodox articles and intra- vs. interdisciplinary articles (expressed as proportions), and these should be distinguished from unsuccessful or unintentional characteristics.
Section 2.2 suggested the use of alternative nested H indexes based on linear and cubic fittings of standardized numbers of articles and citations, whereas Section 2.3 presented two-dimensional graphs based on alternative nested H and G indexes. These approaches would reduce incentives to engage in tactical or opportunistic behaviors in publication and citation by authors and journal editors [76–82], and should reduce discrimination against heterodox and interdisciplinary PHs that would be characterized by few citations and few articles [83,84]. Table 4 summarizes suggested warning symptoms that could be used to identify potentially questionable practices by editors and authors, although future experimental work based on analytical insights will be necessary to test whether these symptoms truly indicate manipulation of the PH quality. It is important to note that if an indicator suggests the possibility of questionable behavior, this does not indicate the certainty of such behavior. Instead, the actual articles and citations should be carefully examined; there are many legitimate reasons for publishing many papers in the same journal (e.g., because it has the most suitable audience for a research result), for citing a colleague's work (e.g., because that work is most relevant to the author's paper), and so on.
In other words, the suggested methodology provides support for organizations that are interested in supporting networking and favoring orthodox and intradisciplinary researchers, or instead favoring heterodox and interdisciplinary researchers. Indeed, departments are often ranked according to articles by their members in few and specialized journals.
Although I found no papers that suggested algorithms to characterize a PH (i.e., it is not possible to compare the suggested methodology with other examples from the literature), the following main strengths of the new approach should be emphasized:

• Many proposals for modifying the original H index have been accounted for [85], including the elimination of self- and reciprocal citations, an increased weighting of highly cited articles, a focus on peer-reviewed scientific journals, the use of fractional citations to account for the number of authors (i.e., awarding authors a fraction of a point instead of a full point for multi-author articles), an increased sensitivity to variability of the overall citation profile, and a consideration of the life cycle of an article.

• Discrimination against interdisciplinary and heterodox PHs can be reduced by mitigating the bias created by conventional rankings, without relying on the application of advanced methodologies to complex datasets, as in the case of applying empirically based scaling factors to different disciplines [86], comparisons with the performance of other researchers in the same field [87], or comparison with the average number of citations per paper in a given discipline [25].

• Most of the main questions left open by the original description of the H index have been tackled [88], including the attribution of an article to a given discipline, since this is done by the author. This is done while retaining the practicality and simplicity that made the original H metric so appealing to a large audience.

• Indicators are distinguished according to the goals being pursued by amending well-established procedures such as years from publication rather than academic age (i.e., the duration of a researcher's career at the time of the analysis [89]), and the indicators can be applied at different levels of aggregation (e.g., at department or university levels).

• Indicators are based on information that is available at an individual level, including citations that would be disregarded by the original H index [70], and the indicators can be easily computed.

• Rankings can also be obtained when the publication period is prior to the citation period under consideration (e.g., neglecting citations older than 22 years rather than articles published more than 22 years ago). Indeed, I chose the third PH in Section 4 as a reference example to show how this feature of the proposed model works.
Although the suggested methodology is the first attempt in the literature to characterize a PH, some main weaknesses of this approach should be stressed:

• Results depend on the dataset used, and many alternatives could be applied [10]. However, the Scopus dataset for the last 22 years is both authoritative and comprehensive, and the same criticism could be raised for other datasets.

• The focus is on past (retrospective) real performance rather than on future expected (prospective) performance [90,91]. However, using impact factors to account for expected future performance would require a reliance on debatable information, such as the 2-year vs. 5-year impact factors described by Sangwal [57], from a dispersed and always in-progress dataset, as in the case of the temporal evolution of impact factors that is discussed by Finardi [92]. In addition, there are potentially opposite interpretations. For example, the presence of few citations in journals with a high impact factor could be a negative feature, because it would represent the lack of ability to exploit an important audience.

• Insights are not based on axiomatization, in which many alternatives could be suggested [93]. However, the formulas are easy to implement and straightforward to interpret.

• Characterization of the PHs depended on the simplifying assumption that a journal could not belong to two or more disciplines [25]. Although factor analysis could be used to univocally sort journals into single hypothetical disciplines in terms of estimated correlations, this is unrealistic in practice because researchers may be unable to perform this analysis without support from suitable software. However, accounting for multidisciplinary journals remains a challenge for future research.

Conclusions
To characterize a PH in terms of heterodoxy, orthodoxy, and interdisciplinarity, traditional bibliometric indexes of a researcher's PH must be modified. In the present study, I modified one of these indexes (Hirsch's H) by accounting for the feasible standardizations that have been suggested in the literature. I also accounted for the dispersion of a researcher's PH among disciplines or subjects by using the Gini index.
To allocate public funds among researchers, scientific activity must be prioritized and potentially questionable behaviors by authors and journal editors must be identified so that they can be accounted for in researcher evaluations. In the present analysis, I purified the traditional H index by eliminating information other than scientific activity.
To be widely used, bibliometric indexes must be easily calculated and highly relevant to the goals of the organization that is using them to evaluate researchers. The present study used information that is common to any bibliometric dataset. Because ongoing updates would be required as new papers by an author are published as well as when new citations of published articles are recorded, software is necessary to help authors and their managers rapidly recalculate the revised indexes described in this paper. Such software was developed as part of the present study. In other words, this study provides a simple methodology based on insights from the literature that allows researchers and their managers to easily characterize a researcher's PH. I do not use the adjective fair to describe this methodology, since that implies a value scale, but would instead describe it as a way to define the suitability for a given goal, which is a more objective criterion.
This study adopted a top-down structuralist approach (i.e., the structure of disciplines is fixed) as the most appropriate to measure the degree of interdisciplinarity, by using nested G indexes to measure the balance of publications in terms of journals and disciplines: variety is unessential in characterizing a PH. Moreover, this paper applied standardizations per author and per year to favor comparisons across disciplines. Finally, this study used nested H indexes based on citing articles to measure production and productivity of an author (using the percentage of citations outside the discipline in the article's list of references to measure the degree of process interdisciplinarity is inconsistent with the H index as an outcome index). Note that a bottom-up spatialist approach (i.e., the structure of disciplines is elicited from the analysis) based on coauthor analysis, by using diversity measures within a system or between measures in knowledge networks, seemed to be inadequate to characterize a PH consisting of single-author articles. However, there is a lack of consensus on which approach should be adopted. In particular, the G index misses disparity, the per-author and per-year standardizations do not solve all problems in cross-disciplinary comparisons, and the H index shows inconsistency [94], although the per-author and per-year standardizations applied in this study mitigate it. Specifically, the inconsistency highlighted by Waltman and Van Eck [94] in example 1 is accounted for by using a cubic interpolation for citations per year, whereas the inconsistency highlighted in example 2 is accounted for by using a linear interpolation for citations per author.
Consequently, the findings presented in this paper require further investigation. In particular, the G index could be replaced by other interdisciplinary indexes (e.g., the Shannon entropy index or the Rao-Stirling diversity index), to account for variety and disparity among journals and disciplines. Moreover, a statistical analysis based on a sufficiently large sample, both in terms of time (i.e., at least 10 years to consider the life cycle of articles) and in terms of authors (i.e., at least 10,000 PHs from all 27 disciplines to adequately represent the 10,000,000 scientists who have published at least one English article from 2007 to 2016 that were included in Scopus) would let us test whether the applied standardizations mitigate the problems in cross-disciplinary comparisons. Note that a statistical analysis based on a similarly large sample could be used to characterize the average author, whereas the micro-scale approach adopted in this study does not need to be scaled up. Finally, the H index could be replaced by other bibliometric indexes (e.g., the highly cited publications indicator by Waltman and Van Eck [94]), to account for its inconsistency. In other words, future research could implement the same methodology proposed by this study by using a set of indexes and standardizations which will be supported by a large theoretical consensus, and which can be operationally calculated from readily available information.

Figure 1. Least-squares linear interpolation fitting for the H index per author (i.e., scientific production): H_ltn = 6.29 (R^2 = 0.66; regression sum of squares = 389; sum of squares for the residuals = 195; F(2, 22) = 44 (p < 0.001) with 2 parameters and 24 observations). The red increasing line represents y = x; the blue decreasing line represents the fitted curve. Abbreviations: l = linear interpolation; t = total number of publications and citations in the period 1995 to 2016; n = net citations.
Notation: Disc = Discipline; AN = Number of authors; Com = Computer Sciences; Eco = Economics, Econometrics and Finance; Eng = Engineering; Env = Environmental Sciences; Hum = Arts & Humanities; Man = Business, Management & Accounting; Mat = Mathematics. Boldfaced values represent the most recent 10 years. Disciplines in italics are in the social sciences, whereas other disciplines are in the physical sciences.

H_cdn = 4.31
Notation: a = all authors; c = cubic fitting; d = citations are in disciplines other than the discipline that published the cited article; g = gross number of citations; j = citations are in journals other than the journal that published the cited article; l = linear fitting; n = net number of citations; t = total articles; y = citations are divided by the number of years since publication; 10 = the 10-year period from 2007 to 2016 used in the present study. Both the gross and the net citation H indexes calculated based on data from www.scopus.com equaled 6.

Table 2. Summary of publication history (PH) characterizations in terms of key parameters. H_d = H index for different disciplines, G_d = G index for different disciplines, H_j = H index for different journals, and G_j = G index for different journals. Approximate thresholds: small G_d for G_d ≤ 0.2; small G_j for G_j ≤ 0.6; tiny H_d for H_d ≤ 1; small H_d for 1 < H_d ≤ 3; tiny H_j for H_j ≤ 2; small H_j for 2 < H_j ≤ 6.
Note that in this study, I used all information included in the Scopus dataset, together with the provided calculations of alternative H values that exclude reciprocal citations, and focus on the 22-year period from 1995 to 2016.Additional analysis for the more recent 10-year period from 2007 to 2016 is presented in Supplemental Materials II.In particular, the Scopus dataset includes 17,325,760 authors, and 23,953,840 English articles, with the following distribution among the 4 subjects in the 1995-2016 period: health 22%, life 24%, physical 43%, social 9%, 1% multidisciplinary, and 1% no subject.The means and standard deviations of H values that include reciprocal citations are 2.70 and 4.89, respectively; the means and standard deviations of H values that exclude reciprocal citations are 2.45 and 4.37, respectively.
• Disciplines: five in health sciences (medicine, veterinary, nursing, dentistry, and health professions), five in life sciences (pharmacology & toxicology, biological, neurology, agricultural, and immunology), nine in physical sciences (chemistry, physics & astronomy, mathematics, Earth & planetary, energy, environmental, materials, engineering, and computing & information), and eight in social sciences (psychology, economics & econometrics & finance, arts & humanities, business & management & accounting, decision, politics, architecture, and sociology).

Table 4. Summary of potentially questionable practices by editors and authors: sources, behavior types, observations, and warning indicators.

Table A2. Summary of the H index values calculated for the first analyzed PH (Table A1) over the author's whole career (i.e., the 22-year period from 1995 to 2016).

Table A3. Descriptive statistics and Scopus categories for the second analyzed PH, which is a representative intradisciplinary and intratopical heterodox PH. Each row represents a single publication; multiple publications in the same journal and same year are represented by separate rows.
Notation: Disc = Discipline; AN = Number of authors; Eco = Economics, Econometrics and Finance.

Table A4. Descriptive statistics and Scopus categories for the third analyzed PH, as a representative intradisciplinary and intertopical orthodox PH. Each row represents a single publication; multiple publications in the same journal and same year are represented by separate rows. Note that in contrast with the other two PHs, all publications by this author occurred before 1995.

Table A5. Descriptive statistics and Scopus categories for the randomly selected PH analyzed for the life sciences. Each row represents a single publication; multiple publications in the same journal and same year are represented by separate rows.
Notation: Disc = Discipline; AN = Number of authors; Agr = Agricultural and Biological Sciences; Env = Environmental Sciences.

Table A6. Descriptive statistics and Scopus categories for the randomly selected PH analyzed for the health sciences. Each row represents a single publication; multiple publications in the same journal and same year are represented by separate rows.