Inferring Authors’ Relative Contributions to Publications from the Order of Their Names When Default Order Is Alphabetical

In attributing individual credit for co-authored academic publications, one issue is how to apportion (unequal) credit, based on the order of authorship. Apportioning credit for completed joint undertakings has always been a challenge. Academic promotion committees are faced with such tasks regularly, when trying to infer a candidate’s contribution to an article they coauthored with others. We propose a method for achieving this goal in disciplines (such as the author’s) where the default order is alphabetical. The credits are those maximizing Shannon entropy subject to order constraints.


Introduction
More and more published research is a collaboration of several researchers (Shapiro et al., 1994 [1]). As promotion committees need to estimate the contribution and quality of an individual researcher, this raises the bibliometric issue of apportioning individual credit by the "fractional counting" of joint publications (e.g., references [2][3][4]). Abbas (2011) [2] proposes a set of indices to evaluate the quality of research produced by an author, while Egghe (2008) [4] focuses on a mathematical theory of the h- and g-index in the case of the fractional counting of authorship.
In this paper, we focus on disciplines where the default order of authors is alphabetical (e.g., Social Science and Mathematics; Liu and Fang 2014 [5]). Thus, for example, an order such as (B, A, C) indicates that B's contribution was "significantly" larger than A's, while C's was not. We shall thus assume that each discipline has a "standard", where if, for example, B's contribution exceeds A's by more than the standard, their order will be switched. Therefore, the default order (A, B, C) indicates that neither B's nor C's excess contribution exceeds the standard. Other than these inferences, which become constraints on the fractional contributions, we shall assume that the contributions are as uncertain as possible.
Suppose there is a disciplinary standard ε, 0 < ε < 1, such that the alphabetical order is not changed unless the difference in contributions, in favor of the alphabetically later author, is deemed to be larger than ε. Thus, if, for example, the order of three authors is (B, A, C) (meaning that the author who is alphabetically second is listed before the one who is alphabetically first), it reflects the fact that B's contribution exceeds A's by more than ε, while C's contribution does not exceed A's (and thus also not B's) by more than ε. A large value of ε indicates strong adherence to an alphabetical order, while small values correspond to a high sensitivity to the actual relative contributions. The standard ε may or may not be known to those wishing to evaluate the contributions.
We shall make use of the constrained maximal (Shannon) entropy approach, reflecting the most diffused contribution distribution that satisfies the implications of the limited information given by the order. Constrained maximal entropy has been used, among other applications, in physics [6] and finance [7], where the constraints were the mean and/or variance of the distribution. Our constraints are simpler, so solving the problem is often rather trivial. First, we deal with estimating the mean contribution, and then, we propose an appropriate multivariate distribution.

Mean Contribution
Start with two authors (A and B), with the respective unknown expected contributions $p$ for A and $1 - p$ for B, which we wish to infer. Note that although $p$ is a share and not a probability, we shall perform probability operations on it. The entropy function, $H(p) = -p \log p - (1 - p) \log(1 - p)$, is concave in $p$ with its maximum at $p = \frac{1}{2}$, regardless of the base of the logarithm (e.g., Cover and Thomas 2006 [8]). We shall assume that each author's contribution is at least $\delta$, where $\delta < \frac{1}{4}$. Now, the order (A, B) implies that $p > 1 - p - \varepsilon$, i.e., that $p > \frac{1 - \varepsilon}{2}$. Since $\frac{1 - \varepsilon}{2} < \frac{1}{2}$ for all $\varepsilon$, it follows that the constrained entropy is maximized at $p^* = \frac{1}{2}$. Thus, the authors are deemed to have contributed equally(!). The order (B, A) implies that $1 - p > p + \varepsilon$, i.e., that $p < \frac{1 - \varepsilon}{2}$, so the constrained entropy is maximized at $p^* = \max\left\{\frac{1 - \varepsilon}{2}, \delta\right\}$. Thus, if, for example, $\varepsilon = \frac{1}{4}$, A's mean contribution is estimated at $\frac{3}{8}$. If $\varepsilon = \frac{3}{4}$, A's mean contribution is $\max\left\{\frac{1}{8}, \delta\right\}$, where the low value reflects A's demotion despite the high threshold.
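The two-author rule can be sketched in a few lines of Python. This is a minimal illustration, not the author's implementation: the function names `binary_entropy`, `share_of_a` and `grid_check` are hypothetical, the entropy is computed in nats, and the strict inequality for the order (B, A) is treated as a closed bound so that the maximizer sits at (1 − ε)/2.

```python
import math


def binary_entropy(p: float) -> float:
    """Shannon entropy (in nats) of the two-way split (p, 1 - p)."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log(p) - (1.0 - p) * math.log(1.0 - p)


def share_of_a(order: str, eps: float, delta: float) -> float:
    """Inferred mean share of author A for the printed order 'AB' or 'BA'."""
    if order == "AB":
        # Default order: the constraint p > (1 - eps) / 2 does not bind.
        return 0.5
    if order == "BA":
        # Reversed order: A sits at the bound, but never below the floor delta.
        return max((1.0 - eps) / 2.0, delta)
    raise ValueError("order must be 'AB' or 'BA'")


def grid_check(eps: float, delta: float, steps: int = 10_000) -> float:
    """Brute-force the entropy maximizer over p in [delta, (1 - eps) / 2].

    Assumption: the strict bound p < (1 - eps) / 2 is relaxed to p <= (1 - eps) / 2.
    """
    upper = max((1.0 - eps) / 2.0, delta)
    grid = [delta + (upper - delta) * i / steps for i in range(steps + 1)]
    return max(grid, key=binary_entropy)


print(share_of_a("BA", 0.25, 0.05))  # 0.375
print(share_of_a("BA", 0.75, 0.05))  # 0.125
```

The grid search is only a sanity check: because the entropy is increasing on the interval below 1/2, it returns the upper end of the feasible interval, which agrees with the closed form.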
For three authors A, B and C, and respective unknown mean shares $p$, $q$, $1 - p - q$, the unconstrained entropy is jointly concave and maximized at $p = q = \frac{1}{3}$. If the order is (B, A, C), the constraints are $q > p + \varepsilon$, $q > 1 - p - q - \varepsilon$ and $0 \le p + q \le 1$. Among the feasible solutions, the one that maximizes the entropy is $\left(\frac{1}{3} - \frac{2}{3}\varepsilon,\ \frac{1}{3} + \frac{1}{3}\varepsilon,\ \frac{1}{3} + \frac{1}{3}\varepsilon\right)$, assuming $\varepsilon < \frac{1}{2}$ (note that our notation lists the relative contributions in the alphabetical order of the authors).
Note that C's contribution, $\frac{1}{3} + \frac{1}{3}\varepsilon$, is deemed to be larger than A's, $\frac{1}{3} - \frac{2}{3}\varepsilon$ (but not by more than $2\varepsilon$), even though C appears later in the order. Note that if $\varepsilon$ is large and A was nevertheless demoted to last, its contribution is deemed to be negligible. Another school of thought is that if all the authors were essential to the research, they should receive equal credit. If one adopts this philosophy, a possible approach would be to average the above order-dependent shares with equal shares. Thus, for example, the order (B, C, A) will result in . For (A, C, B), the solution is $\left(\frac{1}{3} - \frac{1}{3}\varepsilon,\ \frac{1}{3} - \frac{1}{3}\varepsilon,\ \frac{1}{3} + \frac{2}{3}\varepsilon\right)$, $\varepsilon < 1$. If $\frac{1}{3} - \frac{1}{3}\varepsilon < \delta$, then the allocation is $(\delta, \delta, 1 - 2\delta)$. If $\varepsilon \ge 1$, then the allocation is also $(\delta, \delta, 1 - 2\delta)$, so A's and B's contributions are deemed negligible. The intuition here is that despite the high requirement for reversing an order ($\varepsilon \ge 1$), C has overtaken B.
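The three-author allocations quoted above for the orders (B, A, C) and (A, C, B) can be evaluated with exact fractions. The sketch below only encodes the closed forms as stated in the text and does not re-derive them. The function names are hypothetical, the floor δ is ignored, and `blend_with_equal` assumes that "averaging with equal shares" means a simple arithmetic mean with 1/3 each, which is only one possible reading.

```python
from fractions import Fraction

THIRD = Fraction(1, 3)


def shares_bac(eps: Fraction) -> tuple:
    """(A, B, C) shares for the printed order (B, A, C), as stated for eps < 1/2."""
    eps = Fraction(eps)
    if not 0 < eps < Fraction(1, 2):
        raise ValueError("the stated formula assumes 0 < eps < 1/2")
    return (THIRD - 2 * eps / 3, THIRD + eps / 3, THIRD + eps / 3)


def shares_acb(eps: Fraction) -> tuple:
    """(A, B, C) shares for the printed order (A, C, B), as stated for eps < 1."""
    eps = Fraction(eps)
    if not 0 < eps < 1:
        raise ValueError("the stated formula assumes 0 < eps < 1")
    return (THIRD - eps / 3, THIRD - eps / 3, THIRD + 2 * eps / 3)


def blend_with_equal(shares: tuple) -> tuple:
    """Assumed reading of the 'essential authors' view: mean of each share and 1/3."""
    return tuple((s + THIRD) / 2 for s in shares)


eps = Fraction(1, 4)
print(shares_bac(eps))                    # shares 1/6, 5/12, 5/12
print(shares_acb(eps))                    # shares 1/4, 1/4, 1/2
print(blend_with_equal(shares_bac(eps)))  # shares 1/4, 3/8, 3/8
```

In both cases the shares sum to one and the promoted author leads the overtaken one by exactly ε, which is the binding order constraint.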
Note that in all cases, the expected contributions are either independent of ε or dependent on it linearly. Thus, if ε is a random variable (with some subjective distribution), the only change required is the substitution of E(ε) for ε wherever it appears.
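The substitution of E(ε) for ε can be checked on a small example. The sketch below uses A's two-author share for the order (B, A), (1 − ε)/2, and a made-up two-point distribution for ε. Both the distribution and the function names are hypothetical, and the floor δ is assumed not to bind, since taking a maximum with δ would break the linearity.

```python
from fractions import Fraction


def share_a_when_demoted(eps: Fraction) -> Fraction:
    """A's share for the two-author order (B, A), with the floor delta ignored."""
    return (1 - eps) / 2


def expectation(dist: dict, f=lambda x: x) -> Fraction:
    """Expected value of f(value) under a discrete distribution {value: weight}."""
    return sum(weight * f(value) for value, weight in dist.items())


# Hypothetical subjective belief: eps is 1/4 or 1/2 with equal weight.
eps_dist = {Fraction(1, 4): Fraction(1, 2), Fraction(1, 2): Fraction(1, 2)}

mean_eps = expectation(eps_dist)                            # E(eps) = 3/8
averaged_share = expectation(eps_dist, share_a_when_demoted)  # E[share(eps)]
plugged_in_share = share_a_when_demoted(mean_eps)           # share(E(eps))

print(mean_eps, averaged_share, plugged_in_share)  # 3/8 5/16 5/16
```

Averaging the share over the distribution and plugging the mean of ε into the formula give the same value, as expected for a function that is linear in ε.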

Joint Distribution of Relative Contributions
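We shall now assume that the joint distribution of the relative contributions is believed to be Dirichlet (e.g., Kotz, Balakrishnan and Johnson 2000, 40.1 [9]). That is, the joint density of the relative contributions $p_1, \ldots, p_m$ is $f(p_1, \ldots, p_m) = \frac{\Gamma\left(\sum_{i=1}^{m} \theta_i\right)}{\prod_{i=1}^{m} \Gamma(\theta_i)} \prod_{i=1}^{m} p_i^{\theta_i - 1}$, with $\theta_i > 0$. Note that the marginal density of $p_i$ is beta with parameters $\theta_i$ and $\sum_{j \ne i} \theta_j$. As we wish to allocate the whole credit to the authors, we have $\sum_{i=1}^{m} p_i = 1$. The mean of the $i$-th component is $\theta_i / \sum_{j=1}^{m} \theta_j$. Therefore, if we have already estimated (inferred) the mean relative contributions $p_1, \ldots, p_m$, then, to maintain these ratios, we need to have $\theta_i = k p_i$, $i = 1, \ldots, m$, for some $k > 0$. The choice of $k$ will determine the parameters of the (Dirichlet) distribution.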
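The role of k can be illustrated with the standard Dirichlet moment formulas: the mean of component i is θ_i divided by the sum of all θ, and its variance is θ_i(θ_0 − θ_i)/(θ_0²(θ_0 + 1)), where θ_0 denotes that sum. The sketch below is a hedged illustration only. The function names are hypothetical, the inputs are exact fractions, and the example means (3/8, 5/8) are the two-author shares for the order (B, A) with ε = 1/4.

```python
from fractions import Fraction


def dirichlet_parameters(mean_shares: list, k) -> list:
    """Scale inferred mean shares into Dirichlet parameters theta_i = k * p_i."""
    if k <= 0:
        raise ValueError("k must be positive")
    if sum(mean_shares) != 1:
        raise ValueError("mean shares must sum to 1 (pass exact Fractions)")
    return [k * m for m in mean_shares]


def dirichlet_mean_and_variance(thetas: list) -> tuple:
    """Component means and variances of a Dirichlet with parameters thetas."""
    total = sum(thetas)
    means = [t / total for t in thetas]
    variances = [t * (total - t) / (total ** 2 * (total + 1)) for t in thetas]
    return means, variances


mean_shares = [Fraction(3, 8), Fraction(5, 8)]

for k in (8, 80):
    thetas = dirichlet_parameters(mean_shares, k)
    means, variances = dirichlet_mean_and_variance(thetas)
    print(k, thetas, means, variances)
```

Any positive k reproduces the inferred means, so k only controls how tightly the distribution concentrates around them: with θ_i = k p_i the variance of component i equals p_i(1 − p_i)/(k + 1), which shrinks as k grows.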

Effect of Number of Authors
How does the number of authors affect the relative contribution of one of them? Consider, for concreteness, A's contribution in orders where he/she is last: We see that, in the case of three authors, A's relative contribution depends on the order of B and C; in 3a, C needs to be rewarded for "overtaking" B (as well as A), which reduces A's contribution.

Short Discussion and Concluding Remarks
In this paper, we attempted to quantify the significance of deviations from some "natural" or "default" order of authors. We assumed that the unknown relative contributions maximize entropy subject to constraints reflecting default order reversal.
Future study could further demonstrate and validate the proposed method by empirical and data-driven methods. For example, academics' rankings by the h-index (or some other measure) could be compared with their rankings by total fractional journal-paper contributions. Another option is to apply this study to journals where the authors' contributions are required and published, and to compare those stated contributions with the order of the authors' names. Alternatively, a survey could be conducted over some well-cited papers, asking the authors to ascribe their fractional contributions and comparing them to those determined by the proposed method.
We note that the proposed measure, if and when it is used for personal evaluation and promotion, should be combined with qualitative assessments, as is often done in tenure and promotion (T&P) committees. Otherwise, "automated" evaluation metrics alone could encourage game-playing among collaborators, which would be an unwelcome outcome.
Finally, let us note that several related applications could use the proposed method, with the required modifications; for example, assessing the contribution of programmers and AI agents in tasks distributed and published over the Internet. One scenario with some similarity to the considered problem is the following.
In sports, some media "power rank" (PR) teams during the season. The PR is usually consistent with the number of wins the teams have achieved, but not always (there might be other factors such as recent injuries, recent performance, etc.).
Funding: This research was partially funded by the Koret Foundation grant for "Smart Cities and Digital Living 2030".