Coping with the Inequity and Inefficiency of the H-Index: A Cross-Disciplinary Empirical Analysis

Abstract: This paper measures two main inefficiency features (many publications other than articles; many co-authors’ reciprocal citations) and two main inequity features (more co-authors in some disciplines; more citations for authors with more experience). It constructs a representative dataset based on a cross-disciplinary balanced sample (10,000 authors with at least one publication indexed in Scopus from 2006 to 2015). It estimates to what extent four additional improvements of the H-index as top-down regulations (∆H h = H h − H h+1, from H 1 = based on publications to H 5 = net per-capita per-year based on articles) account for inefficiency and inequity across twenty-five disciplines and four subjects. Linear regressions and ANOVA results show that the single improvements of the H-index considerably and decreasingly explain the inefficiency and inequity features but make these vaguely comparable across disciplines and subjects, while the overall improvement of the H-index (H 1 –H 5 ) marginally explains these features but makes disciplines and subjects clearly comparable, to a greater extent across subjects than disciplines. Fitting a gamma distribution to H 5 for each discipline and subject by maximum likelihood shows that the estimated probability densities and the percentages of authors characterised by H 5 ≥ 1 to H 5 ≥ 3 are different across disciplines but similar across subjects.


Introduction
To the best of our knowledge, few papers (e.g., [1,2]) suggest an index to evaluate interdisciplinary CVs (i.e., authors applying usual methodologies to unusual topics or vice versa). In particular, Zagonari [1] identifies the interdisciplinary percentage of any CV (i.e., articles in a discipline or subject quoted by articles in different disciplines or subjects) to be applied to the H-index characterising each author, where the H-index is chosen as an easily generated quantitative index. However, this interdisciplinary index requires a homogeneous H-index across disciplines or subjects to avoid gains for some interdisciplinary scientists (e.g., across medicine and computing) and losses for other interdisciplinary scientists (e.g., across art and mathematics) [3].
Within the huge theoretical and empirical literature on variants and extensions of the H-index (e.g., a-index, ar-index, m-quotient, raw h-rate, contemporary h-index, f-index, t-index, wu-index, maxpord index, q 2 -index within variants, and hw-index, hm-index, hi-index, hc-index, m-quotient, ht-index, fraction count on citation, fraction count on paper, age-based h-index) [4][5][6], some papers suggest some improvements of the H-index to increase homogeneity across disciplines (e.g., [7][8][9][10]). In particular, Zagonari [9] develops an empirically validated theoretical model of a researcher's publication goal by providing two internal criteria (i.e., efficiency and equity) to evaluate a bibliometric index (i.e., it grounds these concepts on an analytical model representing the researchers' incentives to maximise their H-index) and by suggesting two standardisations (i.e., calculate publications

• Inefficiency (i.e., biased incentives to the research activity in terms of scientific achievements) is managed by focusing on articles instead of publications (i.e., publications include non-peer reviewed research) (Inefficiency a, Ifa hereafter) and by using net instead of gross citations (i.e., gross citations include co-authors' reciprocal citations) (Inefficiency b, Ifb hereafter). In terms of H-index improvements, ∆H 1 = H 1 − H 2 deals with the overvaluation of possibly non-original research such as reviews, proceedings, and editorials, where H 1 = H-index based on publications and H 2 = H-index based on articles; ∆H 2 = H 2 − H 3 deals with the overemphasis put on co-authors' reciprocal citations as a measure of actual knowledge diffusion, where H 3 = H-index based on net citations for articles.
• Inequity (i.e., biased rankings in favour of some authors and some disciplines) is managed by using a per-capita H-index to account for the different co-authorship practices prevailing in different disciplines (i.e., more co-authors in some disciplines) (Inequity a, Iqa hereafter) and by using a per-year H-index to account for the different citation periods related to authors with more scientific experience (i.e., they can rely on a longer citation period) (Inequity b, Iqb hereafter). In terms of H-index improvements, ∆H 3 = H 3 − H 4 deals with the huge differences in the number of co-authors and thus in the number of articles in favour of some disciplines, where H 4 = net per-capita H-index based on articles; ∆H 4 = H 4 − H 5 deals with the obviously large number of citations received by researchers with more experience and thus the likely worse assessment of the scientific production in disfavour of researchers with less experience, where H 5 = net per-capita per-year H-index based on articles.
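The chain from H 1 to H 5 and the differences ∆H h can be illustrated with a minimal Python sketch. The record layout (the fields is_article, gross, net, n_authors) and the two normalisations used here for H 4 (net citations shared equally among the authors of an article) and H 5 (H 4 divided by the number of career years) are assumptions made for illustration only; the exact definitions are those given in Zagonari [9].

```python
def h_index(citations):
    """Largest h such that at least h items have at least h citations each."""
    h = 0
    for rank, c in enumerate(sorted(citations, reverse=True), start=1):
        if c >= rank:
            h = rank
        else:
            break
    return h


def h_chain(pubs, career_years):
    """Return [H1, H2, H3, H4, H5] for one author.

    pubs is a list of dicts with a hypothetical layout:
    is_article, gross (citations), net (citations), n_authors.
    """
    articles = [p for p in pubs if p["is_article"]]
    h1 = h_index([p["gross"] for p in pubs])      # all publications, gross citations
    h2 = h_index([p["gross"] for p in articles])  # articles only
    h3 = h_index([p["net"] for p in articles])    # articles, net citations
    # ASSUMPTION: per-capita means net citations divided by the number of authors.
    h4 = h_index([p["net"] / p["n_authors"] for p in articles])
    # ASSUMPTION: per-year means the per-capita index divided by career years.
    h5 = h4 / career_years
    return [h1, h2, h3, h4, h5]


def deltas(h):
    """Differences Delta H_h = H_h - H_(h+1) for h = 1, ..., 4."""
    return [h[i] - h[i + 1] for i in range(4)]


example_pubs = [
    {"is_article": True, "gross": 10, "net": 8, "n_authors": 2},
    {"is_article": True, "gross": 6, "net": 4, "n_authors": 4},
    {"is_article": False, "gross": 5, "net": 5, "n_authors": 1},
    {"is_article": True, "gross": 3, "net": 2, "n_authors": 1},
]
example_h = h_chain(example_pubs, career_years=4)
example_deltas = deltas(example_h)
```

In this toy record, the non-article publication does not change the index (∆H 1 = 0), whereas removing co-authors' citations and dividing by career years both lower it.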
Note that all acronyms and variables are described in Table 1. Zagonari [9] is focused on the degree of efficiency and equity rather than on the homogeneity across disciplines which can be achieved by the suggested standardisations and guidelines, and a theoretical approach is adopted (although the structural model is validated in terms of means and variances) rather than a statistical approach (where reduced models are tested in terms of residuals and distributions). Note that Ifa favours authors who minimise efforts and risks related to a collaborative and creative scientific activity by relying on the large number of citations to reviews. Ifb favours authors who spend efforts on networking at no risk rather than on a creative scientific activity. Iqa favours authors who reduce efforts and risks (i.e., they misuse the prevailing measurement of scientific activity based on the principle "one article with n co-authors is n articles"), by spending efforts on networking rather than on a creative scientific activity. Iqb favours authors who minimise efforts at no risk (i.e., they misuse the prevailing measurement of scientific activity based on the principle "the overall sum of citations matters") by spending efforts on networking rather than on a creative scientific activity.
Within the recent empirical literature on indexes for interdisciplinary science (e.g., [11]), the purpose of this paper is to statistically test to what extent the suggested improvements of the H-indexes [9], considered as top-down regulations, account for different publication and citation habits characterising different disciplines and subjects in order to enable suitable comparisons of interdisciplinary scientists [1] across disciplines and subjects. To do so, we suggest some measures of inefficiency and inequity in Section 2. We construct a representative sample in Section 3. We provide results for each single H-index improvement ∆H h based on linear regressions and analysis of variance (ANOVA) in Section 4.1 as well as results for H 5 based on maximum likelihood fittings and quantile analysis in Section 4.2 by introducing the assumption that the observed H 5 values for both disciplines and subjects are realisations of a gamma distribution. This is followed by a discussion of findings, weaknesses, and strengths in Section 5, before conclusions and final remarks about methodological and practical potentials in Section 6.
D j: Dummy variable for discipline j (j = 1, ..., 27) (i.e., D j takes value 1 for a discipline j and 0 for disciplines other than j)
S k: Dummy variable for subject k (k = 1, 2, 3, 4) (i.e., S k takes value 1 for a subject k and 0 for subjects other than k)
Aaut: Dummy variable for age (Aaut = 1 if author's first publication is after 2009, otherwise Aaut = 0)

Note that the use of H-index improvements suggested by Zagonari [9] instead of other developments will be justified in Section 5. Moreover, our observation unit will be each author (A i ), rather than journals (e.g., [12]) or institutions (e.g., [13]), since our goal is the evaluation of interdisciplinary scientists. Finally, the use of H-index improvements as policies suggested by Zagonari [9] will be justified in Section 6.
In other words, the research questions of the present study can be summarised as follows:
1. Does each single H-index improvement ∆H h properly solve inefficiency and inequity issues?
2. Does each single H-index improvement ∆H h spread inefficiency and inequity issues uniformly across disciplines D j and subjects S k ?
3. Can any discipline D j and subject S k be distinguished from other disciplines and subjects, respectively, net of ∆H h ?
4. Does the comprehensive H-index improvement H 1 -H 5 properly solve inefficiency and inequity issues?
5. Does the comprehensive H-index improvement H 1 -H 5 spread inefficiency and inequity issues uniformly across disciplines D j and subjects S k ?
6. Can any discipline D j and subject S k be distinguished from other disciplines and subjects, respectively, net of H 1 -H 5 ?
7. Are disciplines D j and subjects S k characterised by similar parametric distributions for H 5 (i.e., plots have similar shapes) and by similar right tails (i.e., similar percentages of authors with H 5 ≥ 1, H 5 ≥ 1.5, H 5 ≥ 2, H 5 ≥ 2.5, H 5 ≥ 3)?

Note that improvements of H-indexes are taken as policies based on limited information about each single author (e.g., year of the first publication) or discipline and subject (e.g., average number of citations per article). Moreover, the application of the interdisciplinary index to a homogeneous H-index refers to subjects to a greater extent than disciplines (i.e., really powerful interdisciplinary research is across subjects). Statistical analyses of this feature are presented for subjects in the Results and for disciplines in the Appendices A-D. Finally, as for a classification of studies on bibliometrics in terms of internal vs. external criteria (e.g., [14]) and in terms of theoretical vs. empirical approaches (e.g., [15]), the present paper refers to external theoretical concepts (i.e., efficiency and equity, by adding the concept of homogeneity across disciplines), but it adopts an empirical approach. This feature implies that our results will depend on the used sample: in other words, a theoretical proof based on internal criteria will not be attained, similarly to all other empirical studies. However, we will refer to external criteria of judgment supported by the structural model validated in Zagonari [9] and we will perform a statistical analysis of reduced forms of the same model by referring to the same dataset used in Zagonari [9].
In summary, by focusing on subjects S k , apart from ∆H 2 , neither each single H-index improvement nor the comprehensive H-index improvement solves inefficiency and inequity issues (i.e., answer NO to research questions 1 and 4), although the comprehensive H-index improvement makes them confidently uniform across subjects (i.e., answer NO to research question 2 and answer YES to research question 5). Moreover, apart from subject S 1 for Ifa and subject S 4 for Ifb, all subjects are similar in terms of inefficiency and inequity (i.e., answer NO to research question 3 and answer NO to research question 6). Finally, subjects show similar gamma distribution fits and similar right tails (i.e., answer YES to research question 7).

Measuring Inefficiency and Inequity by H-Indexes
In order to check whether the improved H-indexes solve efficiency and equity problems on average and to a greater or smaller extent in each single discipline, the efficiency and equity features and the H-index improvements described in Section 1 are specified as follows (i.e., Npub = number of publications, Nart = number of articles, Ngro = number of citations including co-author citations, Nnet = number of citations excluding co-author citations, Naut = mean number of co-authors for each author, expert = authors with the first publication before 2011 to include up to 10 years from 2006 to 2015, inexpert = authors with the first publication after 2010 to include up to 5 years from 2011 to 2015):

• Inefficiency a (i.e., many publications other than articles for each author):
• Inefficiency b (i.e., many co-authors' reciprocal citations for each author):
• Inequity a (i.e., more co-authors in some disciplines):
• Inequity b (i.e., more citations for authors with more experience):

where issues are depicted on the left-hand side (lhs), policies and disciplines are depicted on the right-hand side (rhs), and "lhs ~ rhs" stands for "lhs depends on the variables on the rhs". Note that Equations (1)-(4) are reduced analytical forms of the structural theoretical model specified in Zagonari [9]. In particular, each ∆H h depicts to what extent each bias on the left-hand side is tackled by the improved H-index as a top-down regulation (i.e., ∆H h ), whereas the dummy variables D j (i.e., D j takes value 1 for a discipline j and 0 for disciplines other than j) represent to what extent each discipline is not well represented by each ∆H h . We will perform similar analyses for subjects S k . Appendix A provides the list of the 27 disciplines used by Scopus, whereas the 4 subjects can be detailed as follows: 1. health (i.e., medicine, veterinary, nursing, dentistry, health professions), 2. life (i.e., pharmacology and toxicology, biological, neurology, agricultural, immunology), 3. physical (i.e., chemistry, physics and astronomy, mathematics, Earth and planetary, energy, environmental, materials, engineering, computing and information), and 4. social (i.e., psychology, economics and econometrics and finance, arts and humanities, business and management and accounting, decision, sociology).

Note that Npub is likely to be underestimated, since many reviews are published as articles. Moreover, estimations are based on differences if the bias under consideration affects the number of citations only (i.e., Equations (2) and (4)) as well as if the bias affects both the number of authors and the number of citations (i.e., Equations (1) and (3)). Finally, each subsequent bias is additional to the previous one. Thus, in order to test the performances of the comprehensive H-index improvement for addressing the overall bias, we will refer to the following equation: (5) where the lhs represents the overall bias, since the focus is on publications for expert authors and on articles for inexpert authors. We will perform similar analyses for subjects by replacing the dummy variables for disciplines D j with the dummy variables for subjects S k .
Note that Zagonari [9] does not include H 4 and H 5 . Moreover, Zagonari [1] shows that interdisciplinary science requires an additional category, together with orthodox science (i.e., authors publish in a single discipline and in many journals, and the vast majority of the citations are in few disciplines but in many different journals) and heterodox science (i.e., authors publish in a single discipline and in a few journals devoted to that discipline, so that the vast majority of citations are in few disciplines and few journals), to be combined with H 5 to reduce unfair rankings between interdisciplinary scientists (i.e., authors publish in many disciplines and journals, and the vast majority of citations are in many different disciplines and journals) across different disciplines as well as between interdisciplinary scientists and single-discipline scientists collaborating with many authors from different disciplines. Finally, Zagonari [9] does not include quantitative results based on linear regressions or parameter estimations.

Constructing the Dataset
In order to obtain a representative dataset for authors, we applied the following stratified sampling. The reference population consists of authors with at least one publication in the Scopus dataset from 2006 to 2015. This population is partitioned by discipline: we used the 27 scientific disciplines suggested by Scopus [16].
By preserving the percentages of authors in each scientific discipline, 10,000 authors are then randomly extracted from the Scopus database. This design required the attribution of each author to a single discipline: we used the attribution suggested by Scopus, where an author is linked to the discipline with the largest percentage of publications.
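The stratified design described above can be sketched as a proportional allocation followed by a simple random draw within each discipline. This is a minimal illustration under stated assumptions, not the procedure actually run against Scopus: the function name, the dictionary mapping author identifiers to disciplines, and the largest-remainder rounding of the quotas are all hypothetical choices.

```python
import random
from collections import defaultdict


def proportional_stratified_sample(author_discipline, sample_size, seed=0):
    """Draw authors at random within each discipline, preserving discipline shares.

    author_discipline: dict mapping a (hypothetical) author id to its discipline.
    ASSUMPTION: quotas are rounded with the largest-remainder rule so that they
    sum exactly to sample_size.
    """
    rng = random.Random(seed)
    strata = defaultdict(list)
    for author, discipline in author_discipline.items():
        strata[discipline].append(author)

    total = len(author_discipline)
    exact = {d: sample_size * len(a) / total for d, a in strata.items()}
    quota = {d: int(q) for d, q in exact.items()}

    # Hand the seats lost to rounding down to the largest fractional parts.
    leftover = sample_size - sum(quota.values())
    by_remainder = sorted(exact, key=lambda d: exact[d] - quota[d], reverse=True)
    for d in by_remainder[:leftover]:
        quota[d] += 1

    sample = []
    for d, authors in strata.items():
        sample.extend(rng.sample(authors, quota[d]))
    return sample


# Toy population: 1000 authors in three disciplines with shares 60/30/10.
toy_population = {}
for i in range(1000):
    toy_population[i] = "medicine" if i < 600 else ("physics" if i < 900 else "arts")
toy_sample = proportional_stratified_sample(toy_population, sample_size=100)
```

Because the quotas follow the population shares, a discipline with very few authors in the population receives very few (possibly zero) authors in the sample.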
Tables 2 and 3 provide the summary statistics for subjects S k , while Appendix B provides the summary statistics for disciplines D j . Altogether, the dataset includes 1,487,866 co-authors, 507,557 papers, 31,950 journals, and 562,688 citations. The Supplementary Materials provide the histograms of H 1 and H 5 for both disciplines D j and subjects S k .

ANOVA and Linear Regressions
In order to check if the improved H-indexes solve efficiency and equity problems on average and to a greater or smaller extent in each discipline, we will perform ANOVA based on linear regressions [17] (see Appendix D for ANOVA based on a quasi-Poisson distribution). In particular, we will translate the theoretical models presented in Section 2 into regression models as follows (i.e., Aaut takes value 1 for authors with the first publication after 2010):

• Inefficiency a (i.e., many publications other than articles for each author):
• Inefficiency b (i.e., many co-authors' reciprocal citations for each author):
• Inequity a (i.e., more co-authors in some disciplines):
• Inequity b (i.e., more citations for authors with more experience):

where ~ means that the variable on the lhs is linearly dependent on the variables on the rhs. Note that we disregarded discipline #10 (multidisciplinary) and discipline #36 (health professions), since few authors are attached to them in our sample (i.e., 2 authors for discipline #10 and 1 author for discipline #36). Moreover, in the model for Iqa (i.e., Equation (8)), Naut is moved to the rhs. Indeed, this inequity is based on the prediction that authors with several co-authors are likely to achieve more citations (i.e., Naut is a stochastic variable). We estimated this relationship before trying to explain the impact of this H-index improvement on rhs and check for residual heterogeneity explained by the discipline dummies. Finally, in the model for Iqb (i.e., Equation (9)), both Naut and Aaut are moved to the rhs. Indeed, this inequity is based on the prediction that expert authors with several co-authors are likely to achieve more citations (i.e., Naut and Aaut are stochastic variables). Again, we estimated these relationships before trying to explain the impact of this H-index improvement on rhs and check for residual heterogeneity explained by the discipline dummies. Thus, the overall bias can be represented by: where the focus is on articles. We will perform similar analyses for subjects S k .
Note that we will check firstly for the significance levels of variables and secondly for their coefficient values. Moreover, Equation (10) can be obtained by summing up terms of rhs and lhs in Equations (6)-(9). Finally, we will check for differences between specific disciplines or subjects only if their general explanation of variability is significant.
A methodological remark is worth making here. An ANOVA exercise as a descriptive method (i.e., calculated significance levels) relies on the assumption of normal distributions, although its descriptive statistics (i.e., estimated coefficients and explained variance) allow a simple interpretation of results. Each ANOVA table in Section 4.1 is associated with an analogous table in Appendix D based on a log-linear model involving Poisson and negative binomial distributions; additional methodological details are provided in Appendix D.
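The sequential reading of an ANOVA table used in Section 4.1 (first the variability explained by ∆H h , then the additional variability explained by the dummies, then the residual variability within groups) can be sketched for the simplest case of one regressor and one set of group dummies. The function name and the toy data are hypothetical, the closed-form expressions assume a common slope with group-specific intercepts, and the sketch is not the code used to produce the tables; it also omits significance tests and assumes that the regressor varies within at least one group.

```python
def sequential_anova(y, x, groups):
    """Type I (sequential) sums of squares for y ~ x + group dummies.

    Returns the total sum of squares and its split into the part explained by
    x alone, the extra part explained by the dummies once x is in the model,
    and the residual (within-group) part.
    """
    def mean(values):
        return sum(values) / len(values)

    def cross(u, v):
        # Centred cross-product: sum of (u - mean(u)) * (v - mean(v)).
        mu, mv = mean(u), mean(v)
        return sum((a - mu) * (b - mv) for a, b in zip(u, v))

    ss_total = cross(y, y)
    ss_x = cross(x, y) ** 2 / cross(x, x)       # simple regression of y on x

    # Pool centred sums within each group (common slope, own intercepts).
    by_group = {}
    for xi, yi, g in zip(x, y, groups):
        xs, ys = by_group.setdefault(g, ([], []))
        xs.append(xi)
        ys.append(yi)
    wxx = sum(cross(xs, xs) for xs, ys in by_group.values())
    wxy = sum(cross(xs, ys) for xs, ys in by_group.values())
    wyy = sum(cross(ys, ys) for xs, ys in by_group.values())
    ss_residual = wyy - wxy ** 2 / wxx           # residual of the full model

    ss_groups = ss_total - ss_x - ss_residual    # gain from adding the dummies
    return {"total": ss_total, "x": ss_x, "groups": ss_groups,
            "residual": ss_residual}


# Toy data: y = 2 * x plus an offset of 3 for group "b" (no noise).
toy = sequential_anova(
    y=[2, 4, 6, 11, 13, 15],
    x=[1, 2, 3, 4, 5, 6],
    groups=["a", "a", "a", "b", "b", "b"],
)
toy_shares = {k: 100 * v / toy["total"] for k, v in toy.items() if k != "total"}
```

In the toy data the regressor absorbs most of the variability, the dummies absorb the rest, and nothing is left within groups; in an actual table the three shares are read in the same order.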

Inefficiency a (Ifa) (Many Publications Other Than Articles)
All authors in our dataset have H 1 = H 2 (i.e., ∆H 1 does not affect Inefficiency a). In other words, publications other than articles do not affect their H-index (e.g., eminent authors are asked to write a review or a book). Consequently, we will apply ANOVA only to disciplines and subjects. In particular, Table 4 shows that the variance explained by disciplines D j is mildly significant but tiny. In contrast, Table 5 shows that the variance explained by subjects S k is significant but tiny. Note that we will hereafter use slightly significant whenever * applies (i.e., significant at 95%), mildly significant whenever ** applies (i.e., significant at 99%), and significant whenever *** applies (i.e., significant at 99.9%). Next, Table A4 in Appendix C shows that, apart from D 27 and D 35 , all disciplines D j are characterised by a percentage of publications other than articles smaller than 1. Similarly, Table 6 shows that, apart from S 1 , all subjects S k are characterised by a percentage of publications other than articles smaller than 1.

Inefficiency b (Ifb) (Many Co-Authors' Reciprocal Citations)
The application of ANOVA to Ifb (i.e., Equation (7)) shows that ∆H 2 explains 26.79% of its variability. In particular, Tables 7 and 8 show that the residual variances explained by disciplines D j and subjects S k are slightly significant and tiny (i.e., ∆H 2 makes D j and S k homogeneous with respect to Ifb). In other words, 26.79% of the Ifb variability is explained by ∆H 2 . The remaining 73.21% of its variability can be decomposed into variance between disciplines in Table 7 (subjects in Table 8), which accounts for only 0.29% (0.06% in Table 8), and variance within disciplines (subjects), which accounts for the remaining 72.92% (73.15% in Table 8). Thus, even if disciplines and subjects are slightly significant, these factors explain very little of Ifb (i.e., we can safely affirm that, once the Ifb variability explained by ∆H 2 is removed, the remaining variance is within disciplines and subjects to a greater extent than across disciplines and subjects: 72.92% > 0.29%). Next, Table A5 in Appendix C shows that, apart from D 13 , D 27 , and D 31 , all disciplines D j are characterised by an insignificant intercept in the linear regression (7) (i.e., ∆H 2 depicts intercepts for those disciplines), where two significant coefficients are positive and large (i.e., larger than 0.1). Similarly, Table 9 shows that, apart from S 4 , all subjects S k are characterised by a significant and positive intercept in the linear regression (7) (i.e., ∆H 2 depicts intercepts only for that subject), where one significant coefficient is positive and tiny (i.e., smaller than 0.1). Note that we did not detail comparisons between subjects and disciplines, since the variance explained by subjects and disciplines altogether is slightly significant and tiny.
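The decomposition of the Ifb variability reported above can be written out as a worked identity. The grouping into three terms follows the text; only the notation (underbraces and labels) is added here for illustration.

```latex
\begin{align*}
100\% &= \underbrace{26.79\%}_{\text{explained by } \Delta H_2}
       + \underbrace{0.29\%}_{\text{between disciplines}}
       + \underbrace{72.92\%}_{\text{within disciplines}}, \\
100\% &= \underbrace{26.79\%}_{\text{explained by } \Delta H_2}
       + \underbrace{0.06\%}_{\text{between subjects}}
       + \underbrace{73.15\%}_{\text{within subjects}}.
\end{align*}
```

In both cases the last two terms sum to the remaining 73.21%, and the between-group term is two orders of magnitude smaller than the within-group term.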

Inequity a (Iqa) (More Co-Authors in Some Disciplines)
The application of ANOVA to Iqa (i.e., Equation (8)) shows that ∆H 3 describes 4.38% of its variability. In particular, Table 10 shows that the variance explained by disciplines D j is significant and small (i.e., ∆H 3 does not make Iqa homogeneous across disciplines D j ). In contrast, Table 11 shows that the variance explained by subjects S k is significant and tiny (i.e., ∆H 3 makes Iqa homogeneous across subjects S k ). Note that Table 12 for subjects and Table A6 in Appendix C for disciplines suggest that adding one co-author to an author significantly affects the number of net citations by around 0.01. In other words, Iqa, although statistically significant, turns out to be a marginal feature for most authors (i.e., to have a substantial effect on the number of citations, an author should publish with several hundreds or thousands of co-authors). However, ∆H 3 contains more information than the number of authors to explain the number of net citations, both for subjects (i.e., 1.68 > 0.0097) and disciplines (i.e., 1.61 > 0.0105). [...] are characterised by a significant intercept in the linear regression (8) (i.e., ∆H 3 depicts intercepts only for those disciplines), where four significant coefficients are positive and large (i.e., larger than 5). Similarly, Table 12 shows that all subjects S k are characterised by a significant and positive intercept in the linear regression (8) (i.e., ∆H 3 does not depict intercepts for subjects), where two significant coefficients are positive and large (i.e., larger than 5).
Note that Tables 12 and A6 in Appendix C show that there is still some heterogeneity not explained by ∆H 3 across disciplines and subjects. In particular, looking at the estimated coefficients and their standard errors, it seems that Subject 1 (health) is similar to Subject 2 (life) and Subject 3 (physical) is similar to Subject 4 (social). The significance levels reported in Table 13 sustain the above conjecture: once the effect of ∆H 3 is removed, the residual Iqa is different between Subject 1 (health) and Subject 2 (life) on one side and between Subject 4 (social) and the other subjects on the other side. Similarly, Figure S5 in the Supplementary Materials reports the significance levels of the differences between the dummies for disciplines D j : there is evidence of residual heterogeneity only for disciplines 13, 27, and 28. This suggests that the set of disciplines could be partitioned into two groups, with disciplines 13, 27, and 28 in the first group, and the other disciplines in the second group.

Inequity b (Iqb) (More Citations for Authors with More Experience)
The application of ANOVA to Iqb (i.e., Equation (9)) shows that ∆H 4 explains 0.46% of its variability. In particular, Table 14 shows that the variance explained by disciplines D j is significant and small (i.e., ∆H 4 does not make Iqb homogeneous across disciplines D j ). In contrast, Table 15 shows that the variance explained by subjects S k is significant and tiny (i.e., ∆H 4 makes Iqb homogeneous across subjects S k ).
Note that, looking at the estimated coefficients, an expert researcher receives on average 8.84 and 8.86 more net citations per article than an inexpert researcher (Table 16 for subjects and Table A7 in Appendix C for disciplines). However, this relation describes 4.48% of the net citation variability (Table 14 for disciplines and Table 15 for subjects). In other words, Iqb is significantly present in our sample. Next, Table A7 in Appendix C shows that, apart from D 18 and D 29 , all disciplines D j are characterised by a significant intercept in the linear regression (9) (i.e., ∆H 4 depicts intercepts only for those disciplines), where ten significant coefficients are positive and large (i.e., larger than 10). Similarly, Table 16 shows that all subjects S k are characterised by a significant and positive intercept in the linear regression (9) (i.e., ∆H 4 does not depict intercepts for subjects), where two significant coefficients are positive and large (i.e., larger than 10).
Note that Table 17 shows that, once Iqb is explained by ∆H 4 , there is still some heterogeneity across subjects, where Subject 1 (health) is similar to Subject 2 (life) on one side and Subject 3 (physical) is similar to Subject 4 (social) on the other side. Similarly, Figure S6 in the Supplementary Materials shows that apart from disciplines 13, 28, 27, and 30, all disciplines are similar.

Overall Bias
The application of ANOVA to the overall bias (i.e., Equation (10)) shows that ∆H 5 explains 0.04% of its variability. In particular, Table 18 shows that the variance explained by disciplines D j is significant and null (i.e., ∆H 5 makes D j homogeneous with respect to the overall bias). In contrast, Table 19 shows that the variance explained by subjects S k is slightly significant and null (i.e., ∆H 5 makes S k homogeneous with respect to the overall bias). [...] are characterised by a significant intercept in the linear regression (10) (i.e., ∆H 5 depicts intercepts only for those disciplines), where one significant coefficient is negative and large (i.e., larger than 0.5). Similarly, Table 20 shows that all subjects S k are characterised by a significant and negative intercept in the linear regression (10) (i.e., ∆H 5 does not depict intercepts for subjects), where all coefficients are negative and small (i.e., smaller than 0.5). Note that Figure S7 in the Supplementary Materials highlights an overall homogeneity across disciplines, apart from D 31 and D 16 (i.e., only D 31 and D 16 are statistically different from the other dummies), while we did not detail the differences between subjects, since the variance explained by subjects altogether is slightly significant and tiny. In other words, ∆H 5 explains the overall bias across disciplines, except for those three disciplines. Therefore, H 5 turns out to be satisfactory in making the overall bias homogeneous across disciplines and subjects. In Section 4.2, we will focus on H 5 to perform additional analyses. Note that the author profile within Scopus enables the calculation of H 1 , H 2 , and H 3 .

Maximum Likelihood Fittings
Figure 1 shows the histograms of H 5 for the 25 disciplines. Figure 2 presents the maximum likelihood fittings of gamma distributions for the 25 disciplines. Table A9 in Appendix C shows the percentages of authors characterised by H 5 larger than 1, 1.5, 2, 2.5, and 3 for the 25 disciplines. Thus, disciplines are characterised by different gamma distributions and different quantiles.
Figure 3 shows the histograms of H 5 for the four subjects. Figure 4 presents the maximum likelihood fittings of gamma distributions for the four subjects. Table 21 shows the percentages of authors characterised by H 5 larger than 1, 1.5, 2, 2.5, and 3 for the four subjects. Thus, subjects are characterised by similar gamma distributions and similar quantiles.
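A maximum likelihood fit of a gamma distribution, followed by the right-tail percentages at the thresholds used above, can be sketched with SciPy. This is a minimal illustration on synthetic data: the function name is hypothetical, the location parameter is fixed at zero by assumption, all values are assumed to be strictly positive, and the tail shares computed here are those implied by the fitted distribution, which need not coincide with the way the percentages in Tables 21 and A9 were obtained.

```python
import numpy as np
from scipy import stats


def fit_gamma_tails(h5_values, thresholds=(1, 1.5, 2, 2.5, 3)):
    """Fit a gamma distribution by maximum likelihood and return tail shares.

    ASSUMPTION: location fixed at 0 and all h5_values strictly positive.
    Returns (shape, scale, {threshold: percentage of mass at or above it}).
    """
    data = np.asarray(h5_values, dtype=float)
    shape, loc, scale = stats.gamma.fit(data, floc=0)
    tails = {
        t: 100 * float(stats.gamma.sf(t, shape, loc=loc, scale=scale))
        for t in thresholds
    }
    return shape, scale, tails


# Synthetic stand-in for the H5 values of one subject (not the paper's data).
rng = np.random.default_rng(0)
synthetic_h5 = rng.gamma(shape=2.0, scale=0.4, size=5000)
fitted_shape, fitted_scale, fitted_tails = fit_gamma_tails(synthetic_h5)
```

Repeating the fit for each subject or discipline and comparing the fitted shape, scale, and tail shares is one way to make the comparison across groups concrete.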

Discussion
We applied ANOVA analyses and linear regressions together with maximum likelihood fittings and quantile analyses to answer the seven research questions specified in Section 1.
The main specific insights obtained can be summarised as follows. By focusing on disciplines D j , apart from ∆H 2 , neither each single H-index improvement nor the comprehensive H-index improvement solves inefficiency and inequity issues (i.e., answer NO to research questions 1 and 4), although the comprehensive H-index improvement makes them slightly different across disciplines (i.e., [...]).

The main general insights obtained can be summarised as follows. By referring to Table 18 for disciplines, the variability of the overall bias amounts to 10,162 (i.e., sum of squares is 1562 + 71 + 8529), where 15.37% (i.e., 1562/10,162) of this bias could have been reduced by using ∆H 5 . The remaining bias is within disciplines for 83.93% (i.e., 8529/10,162) and across disciplines for less than 1%. Similarly, by referring to Table 19 for subjects, the variability of the overall bias amounts to 10,162 (i.e., sum of squares is 1562 + 14 + 8586), where 15.37% (i.e., 1562/10,162) of this bias could have been reduced by using ∆H 5 . The remaining bias is within subjects for 84.49% (i.e., 8586/10,162) and across subjects for less than 1%.
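The percentages quoted above follow directly from the sums of squares, as the short Python check below shows. The function and key names are hypothetical; the three inputs per table are the sums of squares quoted in the text for ∆H 5 , for the dummies, and for the residual.

```python
def variance_shares(ss_policy, ss_between, ss_within):
    """Split a total sum of squares into percentage shares (two decimals)."""
    total = ss_policy + ss_between + ss_within
    shares = {
        "policy": round(100 * ss_policy / total, 2),    # explained by Delta H5
        "between": round(100 * ss_between / total, 2),  # across groups
        "within": round(100 * ss_within / total, 2),    # within groups
    }
    return total, shares


# Sums of squares quoted in the text for Table 18 and Table 19.
total_disciplines, shares_disciplines = variance_shares(1562, 71, 8529)
total_subjects, shares_subjects = variance_shares(1562, 14, 8586)
```

Both tables share the same total (10,162) and the same share for ∆H 5 ; they differ only in how the remainder is split between and within groups.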
Note that Inefficiency b (i.e., many co-authors' reciprocal citations) turning out to be statistically significant for few disciplines (i.e., D 13 , D 27 , and D 31 ) could be interpreted as inequity.
Therefore, the present study shows that the net per-capita per-year H-index based on articles can be used to evaluate interdisciplinary scientists.In particular, the empirical approach adopted in the present study highlighted that the suggested improvements of the H-index as policies did not implement efficiency and equity across disciplines and subjects in the dataset under consideration, although all suggested improvements combined produced homogeneity across subjects (i.e., a crucial feature in evaluating interdisciplinary science).Note that a homogeneous H-index across subjects is a necessary condition for a proper assessment of interdisciplinary science, whereas interdisciplinary science does not solve theoretical and empirical problems identified for the H-index.Next, the empirical demonstration that the suggested improvements of the H-index represent a useful tool to evaluate interdisciplinary science does not imply that it will be used whenever comparisons between subjects are required (e.g., in allocating funds in interdisciplinary departments), although it can be easily implemented (e.g., an algorithm is available on Scopus.com to compute the efficiency improvements of the H-index; Zagonari [1] provides software to calculate all suggested improvements of the H-index) [18].In other words, the adoption of homogeneity as a criterion is a political/academic decision rather than a technical/scientific issue [19].
Nevertheless, two main limits of the present study must be highlighted. First, the applications of ANOVA and linear regressions are justified by the straightforward interpretation of their results, although they provide a statistical description of the sample under consideration based on the assumption of a normal distribution. However, the references to reduced forms (see Equations (1)-(5)) of the structural model developed by Zagonari [9] (i.e., a very plausible model for authors who aim at maximising their H-index) and the consistent results (see Appendix D) obtained by applying a weighted quasi-Poisson distribution (i.e., a very plausible distribution for the stochastic phenomenon of articles' citations) [20] seem to also support similar insights outside the sample under consideration. Second, the association of each author with a single discipline is justified by the classification of authors adopted by the Scopus dataset, although it might be too simplistic for some authors. However, a continuous classification of authors in terms of percentages to weigh all disciplines in the publication experience of each author would require a similar continuous classification for all journals and all articles.

Some methodological remarks are worth making here.
Improvements of the H-index other than those suggested by Zagonari [9] could have been used. In particular, we disregarded:
1. Impact factors [21]. However, this feature is misleading, since a paper poorly cited but published in a high-impact journal should be punished rather than rewarded, since it wasted a popular stage.
2. Gender differences [22]. However, this feature is irrelevant in making disciplines and subjects homogeneous.
4. Negative citations [6]. However, this bias is likely to be negligible, since papers criticising a paper do not need to quote it many times.
5. Country differences [26]. However, this feature is irrelevant in making disciplines and subjects homogeneous.
6. Co-authorship networks [27]. However, this feature is misleading in focusing on inefficiency and inequity across authors in different disciplines and subjects.
Note that we omitted the editors' trick of magnifying citations of papers published in a journal as a precondition to publish in it, since some journals are often tightly linked to some topics.
1. H-index dynamics. In fact, other papers focused on the same feature [11].
2. Linear regressions. However, non-linear estimations require additional assumptions (e.g., a Poisson distribution based on random and over-time independent citations for over-time constant authors) and make interpretations of results more complicated (e.g., impacts of alternative policies ∆H h and different disciplines D j or subjects S k are non-additive) [6].
3. Gamma distributions. In fact, other papers used the same distribution [24].
Note that we standardised with respect to each author rather than with respect to disciplines, while possible specific features of disciplines and subjects are caught by dummies D j and S k .
In summary, the main strength of the present study is the reference to 27 scientific disciplines and four subjects. For example, Ryan [28] estimates the same variant of the H-index (i.e., H 5), but it refers to 474 observations in five colleges. Next, the main weakness of the present study is its descriptive rather than predictive purpose, by discussing which top-down regulation could have made disciplines and subjects homogeneous in the sample rather than which top-down regulation could make disciplines and subjects homogeneous in the future. For example, Moreira et al. [29] apply the functional form of the distribution of the asymptotic number of citations but to 1283 authors in seven disciplines only (i.e., a similar topic but a smaller sample). Similarly, Kupper [30] applies random forests and gradient boosting machines to 111,156 authors in a single discipline but to predict gender bias (i.e., a similar sample but a narrower topic).

Conclusions
The purpose of the present study was to identify an improvement of the H-index, as an easily generated quantitative index based on a readily accessible set of information, in order to enable suitable comparisons of interdisciplinary scientists. We succeeded by considering alternative H-index improvements as top-down regulations (i.e., aware that there is no single bibliometric index accounting for all biases in all disciplines and subjects) and by focusing on both disciplines and subjects (i.e., aware that differences across subjects are more important than differences across disciplines to compare interdisciplinary scientists). Indeed, the net per-capita per-year H-index based on articles does not account for the total variance, although it makes disciplines significant but irrelevant (i.e., research question 1) and subjects insignificant and irrelevant in explaining the total variance (i.e., research questions 2 and 5). Moreover, some disciplines and subjects are highlighted for some H-index improvements (i.e., research questions 3 and 6). Finally, the net per-capita per-year H-index based on articles produces similar gamma distributions and quantiles for subjects but not for disciplines (i.e., research question 7).
In fact, we did much more than identify an H-index improvement to compare interdisciplinary scientists: we suggested a procedure to empirically evaluate alternative bibliometric indexes. Indeed, it is weak to criticise an index because it does not identify a specific award in a given year (e.g., [4,31,32]) (i.e., critiques from outside but empirical). Moreover, it is not possible to identify a bibliometric index accounting for the many different practices across disciplines (e.g., patents are useful for engineering and chemistry, but inapplicable to arts or economics; the many authors in physics and the many citations in computing cannot be properly compared with the few authors in humanities and the few citations in economics) [33]. Finally, it is weak to criticise an index because it does not account for a specific feature (i.e., critiques from inside but theoretical) [34].
In other words, without any ambition to solve all different (good and bad) practices across disciplines and subjects by relying on general information, we criticised alternative variants of the H-indexes in terms of external and theoretically straightforward criteria (i.e., inefficiency and inequity) by testing the improvements of the H-indexes as policies in achieving an external and empirically straightforward goal such as homogeneity of disciplines and subjects (i.e., theoretical critiques from outside but empirically tested). Note that a possible change in standards in measuring scientific production towards the net per-capita per-year H-index based on articles could foster a potential change in behaviours in publication practices. For example, instead of adding authors as a costless practice, one could organise a network of authors in triplets, where one author appears in each triplet. However, this practice is not costless in terms of coordination efforts and it will favour a smaller number of authors such as the department heads.
The present study could be developed by using a more recent dataset to test the same structural model behind it. However, researchers at Scopus should be engaged to produce a similar sample (i.e., a stratified random sampling requires the complete list of authors). Moreover, the structural model we referred to in our study should be validated again. Finally, in case of adoption of our improved H-indexes within the Scopus framework, everybody could test the present study in any alternative period of time by referring to the same statistics for the whole population of authors.

Author Contributions: The authors contributed equally to all activities required by the present study (i.e., conceptualization, methodology, validation, formal analysis, investigation, resources, writing-original draft preparation, writing-review and editing, visualization). All authors have read and agreed to the published version of the manuscript.
Funding: This research received no external funding.

The dependent variable in the model behind Tables A10 and A11 is the number of publications that are not articles. This count model has been estimated by an over-dispersed Poisson (also known as quasi-Poisson) GLM with the canonical log link function. In particular, Tables A10 and A11 report the resulting ANOVA deviance tables, where the p-values are computed by likelihood ratio tests [20]. In contrast, the models behind Tables A12-A19 explain an average of different kinds of citations per article (i.e., an average of count data): for example, Ngro, Nnet, or Ngro-Nnet in Tables A12 and A13. We used an over-dispersed Poisson distribution to model these dependent variables. In particular, Tables A12-A19 show ANOVA deviance analysis for weighted GLM regressions based on over-dispersed Poisson distributional assumptions (also known as weighted quasi-Poisson regressions). Note that the concept of variance used in Tables 4, 5, 7, 8, 10, 11, 14, 15, 18 and 19 is now replaced by the concept of deviance in Tables A10-A19, where deviance is not simply a sum or average of squared residuals, although the levels of deviances represent a measure of information progressively explained by each factor.
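As an illustration only, the deviance decomposition described above can be sketched for the simplest possible case: an unweighted log-link Poisson model with a single categorical factor, where the fitted mean of each observation is its group mean. The function names, the group labels, and the toy counts below are hypothetical; the models behind Tables A10-A19 include further factors and weights and report likelihood ratio p-values, none of which this sketch attempts to reproduce.

```python
import math


def poisson_deviance(y, mu):
    """Poisson deviance 2 * sum(y * log(y / mu) - (y - mu)); 0 * log(0) is taken as 0."""
    total = 0.0
    for yi, mi in zip(y, mu):
        term = yi * math.log(yi / mi) if yi > 0 else 0.0
        total += term - (yi - mi)
    return 2.0 * total


def one_factor_deviance_table(counts_by_group):
    """Deviance table for a log-link Poisson model with one categorical factor.

    With a single factor the maximum-likelihood fitted mean of each observation
    is the mean of its group, so no iterative fitting is needed.
    Every group mean must be strictly positive.
    """
    groups = list(counts_by_group.values())
    y = [value for group in groups for value in group]
    n, p = len(y), len(groups)
    grand_mean = sum(y) / n
    fitted = [sum(group) / len(group) for group in groups for _ in group]
    null_deviance = poisson_deviance(y, [grand_mean] * n)
    residual_deviance = poisson_deviance(y, fitted)
    pearson = sum((yi - mi) ** 2 / mi for yi, mi in zip(y, fitted))
    return {
        "null_deviance": null_deviance,
        "factor_deviance": null_deviance - residual_deviance,
        "residual_deviance": residual_deviance,
        # Quasi-Poisson dispersion: Pearson chi-square over residual degrees of freedom.
        "dispersion": pearson / (n - p),
    }


# Hypothetical counts of non-article publications for two made-up groups.
toy_counts = {"group_a": [0, 2, 5, 1], "group_b": [7, 12, 3, 10]}
table = one_factor_deviance_table(toy_counts)
for name, value in table.items():
    print(f"{name}: {value:.3f}")
```

A dispersion estimate above 1 indicates more variability than a plain Poisson model allows, which is the situation the quasi-Poisson specification is meant to accommodate.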
Moreover, the quasi-Poisson regression is robust with respect to the distribution specification as it relies only on an assumption of proportionality between the variance and expectation parameters rather than on the specific distribution shape. In particular, with a proper parameterisation, the widely used negative binomial shows the following property: Var[Y] = E[Y]/p, when Y is negative binomial, and p is its "success probability" parameter [20,35]. Since the use of the hurdle or zero-inflated model would have required the specification of the dependence of its additional parameter on the exogenous variable in each model [35], we chose to avoid this level of complexity for our statistical models and analyses; the extremely high significance levels obtained seem to support our distributional choice.
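The proportionality property quoted above can be checked in a few lines. The derivation below assumes one common parameterisation, which the text does not spell out: Y counts the failures observed before the r-th success, with success probability p. The symbols r and φ are introduced here for illustration only.

```latex
% Assumed parameterisation: Y = number of failures before the r-th success,
% with success probability p in (0, 1].
\begin{align}
  E[Y] &= \frac{r(1-p)}{p}, \\
  \mathrm{Var}[Y] &= \frac{r(1-p)}{p^{2}} = \frac{1}{p}\,E[Y].
\end{align}
% Quasi-Poisson assumption: Var[Y] = \phi E[Y], matched here by \phi = 1/p \ge 1.
```

Under this reading, the quasi-Poisson dispersion parameter plays the role of 1/p, so the variance is proportional to the mean, as the quasi-Poisson specification requires.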
Finally, the variance progressively explained by each factor and the residual variance shown in Tables 4, 5, 7, 8, 10, 11, 14, 15, 18 and 19 have a similar interpretation to the deviance reported in Tables A10-A19. In particular, a similar but weaker phenomenon is observed. For example, the deviance associated with ∆H 5 in Tables A18 and A19 on the

Figure 1. Histograms of H 5 for disciplines D j. Different colors for different disciplines. The numbers of disciplines from 11 to 35 are consistent with the scientific categories used by the Scopus dataset.


Figure 2. Gamma Probability Density Functions of H 5 for disciplines D j. Different colors for different disciplines, consistently with colors used in Figure 1.

Publications 2024, 12, x FOR PEER REVIEW 14 of 28

Thus, disciplines are characterised by different gamma distributions and different quantiles.
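A gamma fit by maximum likelihood, and the resulting share of authors above a given H 5 threshold, can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions and not the estimation code used in the study: the H 5 values are invented, all values are assumed strictly positive (how zero values were handled is not stated here), and the helper names `digamma`, `fit_gamma_mle`, and `gamma_sf` are hypothetical.

```python
import math


def digamma(x):
    """Digamma function for x > 0: upward recurrence, then an asymptotic series."""
    result = 0.0
    while x < 6.0:
        result -= 1.0 / x
        x += 1.0
    inv2 = 1.0 / (x * x)
    # psi(x) ~ ln x - 1/(2x) - 1/(12 x^2) + 1/(120 x^4) - 1/(252 x^6)
    series = inv2 * (1.0 / 12 - inv2 * (1.0 / 120 - inv2 / 252))
    return result + math.log(x) - 0.5 / x - series


def fit_gamma_mle(values):
    """Maximum-likelihood shape and scale of a gamma distribution (values > 0).

    The shape k solves ln(k) - psi(k) = ln(mean) - mean(ln x); the left-hand
    side decreases in k, so bisection on a log scale is sufficient.
    """
    n = len(values)
    mean = sum(values) / n
    target = math.log(mean) - sum(math.log(v) for v in values) / n
    low, high = 1e-6, 1e6
    for _ in range(200):
        mid = math.sqrt(low * high)
        if math.log(mid) - digamma(mid) > target:
            low = mid  # left-hand side still too large: the shape must be larger
        else:
            high = mid
    shape = math.sqrt(low * high)
    return shape, mean / shape


def gamma_sf(threshold, shape, scale):
    """P(X >= threshold) via the power series of the lower incomplete gamma function."""
    x = threshold / scale
    if x <= 0:
        return 1.0
    term = 1.0 / shape
    total = term
    n = 1
    while term > 1e-16 * total and n < 10000:
        term *= x / (shape + n)
        total += term
        n += 1
    cdf = total * math.exp(shape * math.log(x) - x - math.lgamma(shape))
    return 1.0 - cdf


# Invented H5 values for one hypothetical discipline.
h5_values = [0.2, 0.4, 0.5, 0.7, 0.9, 1.1, 1.3, 1.8, 2.4, 3.1]
shape, scale = fit_gamma_mle(h5_values)
thresholds = [1.0, 1.5, 2.0, 2.5, 3.0]
shares = [gamma_sf(t, shape, scale) for t in thresholds]
for t, share in zip(thresholds, shares):
    print(f"share of authors with H5 >= {t}: {100 * share:.1f}%")
```

Repeating the fit for each discipline or subject and comparing the fitted shapes, scales, and exceedance shares is one way to read the comparison of gamma distributions and quantiles made in the text.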

Figure 4. Gamma Probability Density Functions of H 5 for subjects S k. Pink = Health, Green = Life, Blue = Physical, Purple = Social.

e., answer NO to research question 2 and answer NO to research question 5). Moreover, apart from D 27 and D 35, all disciplines are characterised by a similar level of Inefficiency a; apart from D 13, D 27, and D 31, all disciplines are characterised by an insignificant level of Inefficiency b; apart from D 12, D 18, D 20, D 21, D 26, D 29, D 30, D 34, and D 35, all disciplines are characterised by a significant level of Inequity a; apart from D 18 and D 29, all disciplines are characterised by a significant level of Inequity b; apart from D 12, D 18, D 26, and D 29, all disciplines are characterised by a significant level of overall bias, and some disciplines are similar but some disciplines are different in terms of inefficiency and inequity (i.e., answer YES to research question 3 and answer YES to research question 6). Finally, disciplines show different gamma distributions and different quantiles (i.e., answer NO to research question 7).

Table 1. Description of acronyms and variables.
H 5: Net per-capita per-year H-index based on articles

Table 2. Summary statistics on independent variables for subjects S k. Notations: mean (SD) is in the first row for each subject, median [min-max] is in the second row for each subject; in columns, Npub = No. publications, Nart = No. articles, Naut = No. co-authors, Ngro = No. gross citations, Nnet = No. net citations.

Table 3. Summary statistics on all H-indexes for subjects S k. Notations: mean (SD) is in the first row for each subject, median [min-max] is in the second row for each subject.

Table 13. Differences between subjects S k (below diagonal) and related p-values (above diagonal).

Table 17. Differences between subjects S k (below diagonal) and related p-values (above diagonal).

Table 21. Percentages of authors characterised by H 5 larger than 1, 1.5, 2, 2.5 and 3 for subjects S k.

Table A2. Summary statistics for disciplines D j. Notations: mean (SD) is in the first row for each discipline, median [min-max] is in the second row for each discipline; in columns, Npub = No. publications, Nart = No. articles, Naut = No. co-authors, Ngro = No. gross citations, Nnet = No. net citations.

Table A3. Summary statistics for disciplines D j. Notations: mean (SD) is in the first row for each discipline, median [min-max] is in the second row for each discipline.

Table A4. Number of publications Npub and of articles Nart in disciplines D j. Nobs = No. observations.

Table A5. Linear regression of Ngro-Nnet on ∆H 2 and disciplines D j. *** = significant at 99.9%. Note that all disciplines apart from D 13, D 27, and D 31 cannot be distinguished statistically from other disciplines.