Do Successful Researchers Reach the Self-Organized Critical Point?

Researchers' success is now mostly measured using the Hirsch index ($h$). Our recent precise demonstration that statistically $h \sim \sqrt {N_c} \sim \sqrt {N_p}$, where $N_p$ and $N_c$ denote respectively the total number of publications and total citations for the researcher, suggests that the average number of citations per paper ($N_c/N_p$), and hence $h$, are statistical numbers (Dunbar numbers) depending on the community or network to which the researcher belongs. We show here, extending our earlier observations, that the indications of success are reflected not by the total citations $N_c$ but by the inequalities in citations from publication to publication. Specifically, we show that for very successful authors, the yearly values of the Gini index ($g$, giving the average inequality of citations for the publications) and the Kolkata index ($k$, giving the fraction of total citations received by the top $1 - k$ fraction of publications; $k = 0.80$ corresponds to Pareto's 80/20 law) approach each other at $g = k \simeq 0.82$, signaling a precursor for the arrival of (or departure from) the Self-Organized Critical (SOC) state of their publication statistics. Analyzing the citation statistics (from Google Scholar) of thirty successful scientists throughout their recorded publication history, we find that $g$ and $k$ for the very successful among them (mostly Nobel Laureates, highest-rank Stanford Cite-Scorers, and a few others) reach and hover just above and below that $g = k \simeq 0.82$ mark, while for others they remain below that mark. We also find that all the values of $k$ and $g$ below the SOC mark 0.82 fit a linear relationship $k = 1/2 + cg$, with $c = 0.39$.


I. INTRODUCTION
Inspiring research in sociophysics (see e.g., [1][2][3][4][5][6]) has, in recent years, led to intense research activity on statistical and statistical-physical models and analyses of socio-dynamical problems. Examples include the social opinion formation models of Galam (see e.g., [7,8]) and of Biswas-Chatterjee-Sen (see e.g., [9,10]), Minority Games (see e.g., [11]), Kolkata Paise Restaurant games (see e.g., [12,13]), etc. In view of the automatically encoded wide range of citation data for the publications of scientists, and their easy availability on the internet, we study here the inequality statistics from Google Scholar data. The ubiquitous presence of inequalities has recently allowed studies of various scaling and other properties in the statistics (see e.g., [14,15]) of the Hirsch index [16], of the universal (or limiting) Self-Organized Critical (SOC) behavior (see e.g., [17][18][19]), and of citation inequality measures like the century-old Gini (g) [20] and the recently introduced Kolkata (k) [21,22] indices. It may be noted at this stage that the Gini index (g) measures the overall inequality in the distribution, while the Kolkata index (k) gives the fraction of "mass" or of total citations coming from the top (1 − k) fraction of avalanches or publications. These studies [17][18][19] indicated that the inequalities in the avalanche size distributions, measured by g and k, become equal (g = k = 0.84 ± 0.04) just prior to the arrival of the SOC point in several standard physical models (like the sand-pile models of Bak-Tang-Wiesenfeld [23], Manna [24], and others) and in the social context of citations from publications [18,19]. It may also be noted that k = 0.80 corresponds to Pareto's 80/20 law (see e.g., [21,22]). This Pareto Principle asserts that 20% of the causes are responsible for 80% of the outcomes. In other words, the principle suggests that a small fraction of the factors contributes to a large fraction of the major events, in fields ranging from economics to quality management and even personal development. In business, it is often used to identify the most important areas for improvement. It may be mentioned here that our earlier studies of the inequality indices g and k [17][18][19][20][21][22] corresponded to the cumulative dynamics as the system approaches its SOC state: the sand-pile dynamics progresses and the cluster distributions grow, or the publications by the authors or institutions accumulate over time and the citation size distributions grow since the start of the dynamics. Our study here is for the same inequality indices, but for small time intervals along the growth paths of individual researchers.
We intend to study here the inequality dynamics, measured by the Gini (g) and Kolkata (k) indices, of several successful researchers (mostly winners of international prizes, medals or awards like the Nobel, Fields, Boltzmann and Breakthrough prizes, highest-level Stanford c-score achievers, etc.), some distinguished sociophysics researchers, and a few high-level researchers (with lower Stanford c-scores, though still within the "Top 2%"), for data from their first recorded publication year up to 2022. We collected the citation data of their publications from the free online Google Scholar (where an individual Google Scholar page exists). We calculate the g and k indices for each year, starting from their first publication, using the citation counts as of today (collected and analyzed in July-August 2023). We extracted the values of g and k for all the recorded publications of the scientist in each overlapping five-year window (since the first publication), where the window shifts by one year at a time until the year 2022 (corresponding to the last central year 2020), as shown in the following figures for each researcher. The five-year window size is found to give optimal stability in the statistics (a smaller three-year window did not give stable citation statistics for quite a few of the scientists).
We find that the majority of the chosen scientists crossed the g = k ≃ 0.82 mark (which we interpret here as the precursor level of the SOC point [17]) early in their careers, and they often hover just above or below that level of inequality. Some others touched the precursor mark (g = k) once or even multiple times, and a few remained below it. For some other well-known researchers considered here, the g = k mark is reached only marginally and is never crossed. It is to be noted that this mark of reaching the SOC state (beyond the g = k ≃ 0.82 level of inequality) is for yearly statistics (within a 5-year window which slides yearly) and not for the overall indices (from cumulative citation statistics) studied earlier for some distinguished researchers (see e.g., [14]), where the SOC mark is observed to be a little higher (g = k ≃ 0.86).
As mentioned earlier, the Hirsch index (h) [25], defined as the largest number h of publications by a researcher each of which has received at least h citations, is perhaps not an excellent measure [15,26] of the success of individual researchers. It has now been clearly demonstrated [15] (using kinetic theoretical exchange model ideas), by analyzing the Scopus citation data for the top 120,000 Stanford cite score achievers (within the "Top 2%"), that statistically $h \sim \sqrt{N_c} \sim \sqrt{N_p}$, where $N_c$ and $N_p$ denote respectively the total number of citations and the total number of publications by the researcher. This suggests convincingly that the average number of citations per paper ($N_c/N_p$), and hence h, are statistical numbers (given by the effective Dunbar number [27,28]) depending on the community or network to which the researcher belongs [15,18]. We show here, extending our earlier observations (see e.g., [14,18]), that the indications of success are reflected not by the total citations $N_c$, or for that matter by the Hirsch index h, but by the inequalities among the citations from publication to publication. Specifically, we show that for very successful authors, the yearly values (given by the statistics with overlapping 5-year windows) of the Gini index (g, giving the average inequality of the citations for the publications; 0 ≤ g ≤ 1) and the Kolkata index (k, giving the fraction of total citations received by the top (1 − k) fraction of publications; 0.5 ≤ k ≤ 1) approach each other. In particular, achieving g = k ≃ 0.82 signals a precursor to the Self-Organized Critical (SOC) state in the publication statistics. Analyzing the citation statistics (from the open-access Google Scholar) of 30 successful scientists throughout their recorded publication history, starting from their first recorded publication, we find that the very successful among them (mostly Nobel Laureates, very high ranking Stanford c-scorers and a few others) reach and hover just above and below that g = k ≃ 0.82 mark, characteristic of the SOC state (k = 0.82 means 82% of the citations come from 18% of the publications). Others remain below that (SOC) level of extreme inequality in publication statistics.

II. SOCIO-STATISTICAL INEQUALITY AND ITS MEASURES
In 1905, the American economist Lorenz [2,3] developed the Lorenz curve, a graphical representation of the distribution of wealth in a society. To construct this curve (illustrated by the red curve in Fig. 1), one orders the society's population by ascending wealth and then plots the cumulative fraction of wealth, denoted L(p), held by the poorest p fraction of individuals. One can similarly plot the cumulative fraction of citations against the fraction of papers that attracted those citations. As indicated in Fig. 1, the Gini index is calculated as the area between the equality line and the Lorenz curve, divided by the area (1/2) below the equality line for normalization. As such, g = 0 signifies perfect equality and g = 1 corresponds to extreme inequality. The Kolkata index k is given by the fixed point of the complementary Lorenz function $L_c(p) \equiv 1 - L(p)$, i.e., $L_c(k) = k$. As such, k gives the fraction of citations attracted by the top-cited (1 − k) fraction of papers; k = 0.5 means perfect equality, while extreme inequality corresponds to k = 1. A minimal expansion [29] of the Lorenz function L(p), employing a Landau-like expansion of free energy, suggests $L(p) = Ap + Bp^2$, with A > 0, B > 0 and A + B = 1. This gives L(0) = 0 and L(1) = 1 (with B = 0, the Lorenz function can represent only the equality line in Fig. 1).
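The two definitions above translate directly into a short numerical recipe. The following is a minimal sketch, not the authors' code: it assumes a piecewise-linear Lorenz curve through the sorted citation counts, uses the trapezoid rule for the area, and locates the fixed point of the complementary Lorenz function by linear interpolation. The function names are illustrative.

```python
import numpy as np

def lorenz_curve(citations):
    """Return (p, L): cumulative paper fraction and cumulative citation fraction."""
    c = np.sort(np.asarray(citations, dtype=float))  # ascending citation counts
    total = c.sum()
    if c.size == 0 or total <= 0:
        raise ValueError("need at least one paper and one citation")
    p = np.arange(c.size + 1) / c.size
    L = np.concatenate(([0.0], np.cumsum(c) / total))
    return p, L

def gini_index(citations):
    """g = 1 - 2 * (area under the Lorenz curve), via the trapezoid rule."""
    p, L = lorenz_curve(citations)
    area = np.sum((L[1:] + L[:-1]) * np.diff(p)) / 2.0
    return float(1.0 - 2.0 * area)

def kolkata_index(citations):
    """k solves 1 - L(k) = k; found by linear interpolation between nodes."""
    p, L = lorenz_curve(citations)
    f = 1.0 - L - p                  # strictly decreasing from +1 to -1
    i = int(np.argmax(f <= 0.0))     # first node where f is non-positive (i >= 1)
    return float(p[i - 1] + f[i - 1] * (p[i] - p[i - 1]) / (f[i - 1] - f[i]))
```

For four equally cited papers this sketch gives g = 0 and k = 0.5 (perfect equality), while for citation counts 0, 0, 0, 10 it gives g = 0.75 and k = 0.8 up to rounding. The value of g for small samples depends on the interpolation convention, which is an assumption here.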

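One can then calculate $g = 1 - 2\int_0^1 L(p)\,dp = B/3$ and, from the fixed-point condition $1 - L(k) = k$ with $A = 1 - B$ and $B = 3g$, one can obtain a quadratic equation involving g and k: $3gk^2 + (2 - 3g)k - 1 = 0$. An approximate solution of it, in the g → 0 limit, gives
$$k = \frac{1}{2} + Cg, \tag{1}$$
where C = 3/8 [29], suggesting that g = k will occur at the Pareto value k = 0.80. We will see here a little deviation in the value of the constant C in relation (1) for all the reported observations.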
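As a numerical check of this step, the sketch below assumes the quadratic Lorenz form $L(p) = Ap + Bp^2$ with $A + B = 1$. For it, direct integration gives $g = B/3$, and the fixed-point condition $1 - L(k) = k$ becomes $3gk^2 + (2 - 3g)k - 1 = 0$. The code compares the exact root with the linear form $k = 1/2 + Cg$ and evaluates where that line meets g = k. The function names are illustrative.

```python
import math

def kolkata_exact(g):
    """Positive root of 3*g*k**2 + (2 - 3*g)*k - 1 = 0.

    Assumes the quadratic Lorenz form, for which g = B/3 with 0 < B <= 1,
    so the formula is only meaningful for 0 < g <= 1/3.
    """
    if not 0.0 < g <= 1.0 / 3.0:
        raise ValueError("quadratic Lorenz form requires 0 < g <= 1/3")
    b = 2.0 - 3.0 * g
    return (-b + math.sqrt(b * b + 12.0 * g)) / (6.0 * g)

def kolkata_linear(g, c=3.0 / 8.0):
    """Small-g approximation k = 1/2 + c*g."""
    return 0.5 + c * g

def crossing_point(c):
    """Value of g = k on the line k = 1/2 + c*g, i.e. k = 1 / (2 * (1 - c))."""
    return 0.5 / (1.0 - c)
```

With C = 3/8 the crossing sits at k = 0.80, and with the fitted slope c = 0.39 quoted in the abstract it moves to about 0.82. Note that within the quadratic family itself g cannot exceed 1/3, so the g = k point is an extrapolation of the linear relation rather than a root of the quadratic.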

III. INEQUALITY DATA ANALYSIS FROM GOOGLE SCHOLAR
We collect the citation data for all the recorded publications in each year since the first entry in the record for thirty successful researchers, each having an individual Google Scholar page. Over these selected researchers, the minimum and maximum numbers of total publications are $N_p$ = 127 and 2954, the minimum and maximum numbers of total citations are $N_c$ = 5769 and 463382, and the minimum and maximum values of the Hirsch index are h = 22 and 328, respectively. We considered three Nobel prize winners in each of the science subjects: Physics  [15]), and six well-known contributors in Econophysics and Sociophysics: W. Brian Arthur (known for the "El Farol Bar Problem" of minority choice, see e.g., [30]), B. K. Chakrabarti (one of the "Fathers of Econophysics" [31,32]), R. I. M. Dunbar (known for Dunbar's number of social connectivity, see e.g., [33]), S. Galam (considered a pioneer of Sociophysics, see e.g., contributions in this Special Issue [34]), R. Mantegna (one of the "Fathers of Econophysics" [31,32]), and V. M. Yakovenko (pioneer of kinetic exchange models of income/wealth distributions, see e.g., [35]). We considered three of the highest-ranked Stanford Cite-Scorers for 2022 (M. Graetzel, R. C. Kessler and Z. L. Wang [36]) and, for comparison, three lower-rank holders from the same "Top 2% Stanford Cite-Scores" list (I. Fofana, U. Sennur and N. Tomoyuki [36]).
To study the growth of inequality in the citation statistics of each of these researchers, we select a 5-year window, starting from the earliest publication, and note the present-day citations of each publication in that window. We then construct the Lorenz function (see Fig. 1) and extract the g and k indices as described in the last section. We associate the g and k values with the middle year of the respective 5-year window, then shift the window by one year and obtain the values of the inequality indices for each successive year up to 2020 (considering data up to 2022). These are shown in Figs. 2-6.
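The sliding-window procedure can be sketched as follows. This is an illustrative reconstruction under stated assumptions, not the authors' script: the input is a hypothetical list of (publication year, present-day citations) pairs, the Lorenz curve is taken piecewise linear, and windows with fewer than two papers or no citations are skipped (a choice made here, not stated in the text).

```python
import numpy as np

def gini_kolkata(citations):
    """Return (g, k) from a piecewise-linear Lorenz curve of citation counts."""
    c = np.sort(np.asarray(citations, dtype=float))
    p = np.arange(c.size + 1) / c.size
    L = np.concatenate(([0.0], np.cumsum(c) / c.sum()))
    g = 1.0 - np.sum((L[1:] + L[:-1]) * np.diff(p))   # 1 - 2 * trapezoid area
    f = 1.0 - L - p                                   # zero of f gives k
    i = int(np.argmax(f <= 0.0))
    k = p[i - 1] + f[i - 1] * (p[i] - p[i - 1]) / (f[i - 1] - f[i])
    return float(g), float(k)

def windowed_indices(papers, window=5, last_year=2022):
    """Map each window's central year to (g, k).

    papers: iterable of (publication_year, present_day_citations) pairs.
    The window slides by one year; the last one ends at last_year.
    """
    papers = list(papers)
    first_year = min(year for year, _ in papers)
    result = {}
    for start in range(first_year, last_year - window + 2):
        end = start + window - 1
        cites = [c for year, c in papers if start <= year <= end]
        if len(cites) < 2 or sum(cites) == 0:
            continue  # assumption: too little data for a meaningful Lorenz curve
        result[start + window // 2] = gini_kolkata(cites)
    return result
```

With data up to 2022 and a 5-year window, the last window covers 2018 to 2022 and is reported at the central year 2020, as described above.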
We can see from Figs. 2-6 that, for many of the above-mentioned 30 scientists (mostly Nobel Prize winners and highest-rank c-scorers), the Gini index g goes over the Kolkata index k in one or more years by crossing the k = g ≃ 0.82 line (see the corresponding insets). These crossings of the indices (at values above 0.80) clearly indicate large inequalities and entry into the Self-Organized Critical (SOC) state of the citation statistics of these scientists [17].
In Table I, FEP ("Father of Econophysics") refers to [31,32]; EFBP means "El Farol Bar problem" (see e.g., [30]), DN means "Dunbar Number" (see e.g., [33]), FSP means "Father of Sociophysics" (see e.g., contributions in this Special Issue [34]), PKEM means Pioneer in Kinetic Exchange Modeling of Wealth Distribution [35], and SCS-x means Stanford Cite Score rank (x denoting the rank) among the "Top 2%" scientists in 2022 [36].
Although studying the time variations of the Gini (g) and Kolkata (k) indices (as shown in Figs. 2-6) and checking whether g ever goes over k by crossing the k vs. g line (as shown in the respective insets) is indispensable for detecting whether the SOC state has arrived, one can also obtain an easy (but only approximate) indication of the SOC state by looking at the ratio R of the citation number $n_c^{\max}$ of the highest-cited paper to the effective Dunbar number D, given by the average citation $N_c/N_p$ of the researcher. In Table II, we compare these $R = n_c^{\max}/D$ values (where $D = N_c/N_p$) and see how the higher values compare with the observation of SOC (when the k vs. g line is crossed affirmatively). We find that for R ≥ 40, more than 94% of the cases correspond to the SOC level.
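The rough indicator R is simple enough to compute in a few lines. The sketch below assumes only the definitions given above ($D = N_c/N_p$, $R = n_c^{\max}/D$) and the empirical threshold R ≥ 40; the function name and the sample citation lists in the usage note are hypothetical, not data from Table II.

```python
def soc_indicator(citations, threshold=40.0):
    """Return (R, flag) with R = n_max / D and D = N_c / N_p.

    citations: per-paper citation counts of one researcher.
    flag is True when R >= threshold (the rough SOC criterion R >= 40).
    """
    n_p = len(citations)
    n_c = sum(citations)
    if n_p == 0 or n_c == 0:
        raise ValueError("need at least one paper and one citation")
    dunbar = n_c / n_p               # effective Dunbar number D
    ratio = max(citations) / dunbar  # R = n_max / D
    return ratio, ratio >= threshold
```

For example, a hypothetical record of one paper with 1000 citations and 99 papers with 10 citations each has D = 19.9 and R ≈ 50, so it passes the threshold, whereas a uniformly cited record always has R = 1.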

IV. SUMMARY AND DISCUSSIONS
Our earlier analysis [15] of the Scopus citation data for the 120,000 top Stanford Cite-Score scientists showed that the Hirsch index $h \sim \sqrt{N_c} \sim \sqrt{N_p}$, where $N_c$ and $N_p$ denote respectively the total number of citations and the total number of publications by the researcher. This, in turn, says that the average number of citations per paper ($N_c/N_p$), and hence h, are statistical numbers (determined by the effective Dunbar number [27,33]) of the community or network (coauthors and followers) to which the researcher belongs [15,18]. Indeed, the anticipated increase of research impact through collaboration (by increasing the number of coauthors) has been studied in [37], by looking at the average value of the community Dunbar number $N_c/N_p$. Also, a detailed study from Google Scholar data on the relation between the Hirsch index of individual scientists and their average number of coauthors per paper has been reported in Ref. [38]. Our study here shows that the Hirsch index cannot be a good measure of success for researchers (even in Table I, the highest h = 328 does not correspond to a Nobel Prize winner, while the lowest, h = 22, does).
In an earlier work [18], we proposed that the citation inequality indices Gini (0 ≤ g ≤ 1) and Kolkata (0.5 ≤ k ≤ 1) might give better measures of the success of a scientist (than $N_c$ or h), and that perhaps g and k approach equality at g = k ≃ 0.86 for successful researchers. It may be mentioned here that we used there the entire citation data (over all the years) to get the Lorenz curve and the overall values of g and k of the researcher, and this gave a slightly higher value of the g = k ≃ 0.86 point. Indeed, our numerical study [17] of the overall or cumulative inequality statistics of the avalanche or cluster sizes in some well-studied and well-established Self-Organized Critical (SOC) models also suggested that the equality point of the avalanche size inequality indices (g = k ≃ 0.86) appears as a precursor of the SOC point of the respective sand-pile or SOC models. As mentioned already, the SOC point in sand-pile models of physics (like BTW, Manna, etc.) signifies a critical state where sand grain avalanches of all sizes occur, following a power law distribution. As shown in Ref. [17], even in these physics SOC models, the inequality indices (Gini and Kolkata) of the avalanche size statistics reach values similar to those for the unequal citations (considered here equivalent to the sand mass avalanches in sand piles).
We analyzed here the citation data for all the recorded publications in each year since the first entry in the record for the chosen 30 successful scientists, each having an individual Google Scholar page. They have minimum and maximum numbers of total publications $N_p$ = 127 and 2954, and minimum and maximum numbers of total citations $N_c$ = 5769 and 463382, respectively. To study the growth of inequality in the citation statistics of each of these scientists, we select 5-year windows, where the central year of each window moves by one year at a time. We constructed the Lorenz function for each of these windows (see Fig. 1) and extracted the yearly values (corresponding to the central year of the window) of the g and k indices. We have plotted these yearly g and k values for all the working years, starting from the third recorded year (the central year of the first 5-year window) and continuing for successive years up to 2020 (by considering data up to 2022), for each of these chosen 30 scientists. These are shown in Figs. 2-6. The inset in each figure shows the corresponding plot of k vs. g (disregarding the yearly sequence). These plots, in all 30 cases, show a very good linear fit to k = 0.5 + 0.39g (cf. Eqn. (1)), as obtained approximately using a (Landau-like) minimal polynomial expansion of the Lorenz function (see Section II). The insets also show the actual or extrapolated (precursor of sand-pile SOC) point at k = g = 0.82 ± 0.02. As we can see from Figs. 2-6, for 10 of the 12 Nobel Prize winners, several of the other international prize winners considered here, the well-known Sociophysicists and Econophysicists, and all 3 of the highest-rank Stanford Cite-Scorers, the crossing(s) of k vs. g (often in multiple years) do take place convincingly. The same is also true (often marginally) for several others. The 3 lower-rank (yet from the "Top 2%") Stanford Cite-Scorers did not come up to the g = k point. There are of course a few notable anomalies in this analysis of the data set; e.g., B. Josephson, J. Frank (both Nobel Laureates), D. Dhar (Boltzmann Award winner) and A. Sen (Breakthrough Prize winner) do not fit this picture of clearly reaching the SOC point. These anomalies perhaps indicate some shortcomings of this kind of analysis. On the other hand, out of the 27 researchers chosen here (neglecting the 3 lower-rank, though from the "Top 2%", Stanford Cite-Scorers), clear evidence of SOC is seen for 19 (neglecting the "no" and "marginal" entries in the last column of Table I for these 27 researchers), indicating a success rate of more than 70% in identifying the outstanding researchers. In Table II, we give a simple (though rough) indicator $R = n_c^{\max}/D$ (where $n_c^{\max}$ denotes the maximum citation of any paper and D the effective Dunbar number of the researcher) to check whether the researcher has achieved the SOC level. We see that the SOC level is achieved for R ≥ 40, with more than a 94% coincidence rate.
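The linear relation between k and g reported in the insets can be estimated with a one-parameter least-squares fit. The sketch below is an assumption about the fitting procedure, which the text does not specify: the intercept is fixed at 1/2, only points below the SOC mark (g < 0.82) are used, and the slope follows from minimizing the squared residuals of k − 1/2 − cg. The function name is illustrative.

```python
import numpy as np

def fit_slope(g, k, g_max=0.82):
    """Least-squares slope c in k = 1/2 + c*g, with the intercept fixed at 1/2.

    Only points with g < g_max are used (assumption: values below the SOC mark).
    Minimizing sum((k - 0.5 - c*g)**2) gives c = sum(g*(k - 0.5)) / sum(g**2).
    """
    g = np.asarray(g, dtype=float)
    k = np.asarray(k, dtype=float)
    mask = g < g_max
    g, k = g[mask], k[mask]
    if g.size == 0 or not np.any(g > 0):
        raise ValueError("need at least one point with 0 < g < g_max")
    return float(np.sum(g * (k - 0.5)) / np.sum(g * g))
```

Applied to the yearly (g, k) pairs of a researcher, a slope near 0.39 would reproduce the relation quoted above; the extrapolated crossing then lies at k = 1/(2(1 − c)), which is about 0.82 for c = 0.39.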
In summary, statistically the Hirsch index h of a prolific researcher grows with the total citations $N_c$ as $h = 0.5\sqrt{N_c}$ [15], and $N_c$ grows linearly with the total number $N_p$ of publications by the researcher, $N_c = DN_p$ (see [15,18]), where D (∼ 75 [15]) is the effective Dunbar number of the network community to which the scientist belongs. Hence h and $N_c$ can only give some average measures of success. In fact, very well appreciated members of the community can in principle have uniformly high citations of order D for each of their publications, and hence h ≥ D ≃ 75. However, such uniformly appreciated or cited scientists will have very low values of the Gini (g ≃ 0) and Kolkata (k ≃ 0.5) indices. Our study here shows that, notwithstanding some anomalies, most successful researchers have large fluctuations in the citations of one or more of their publications (presumably due to uneven but accurate appreciation from the usual Dunbar network or community, and also perhaps from outside it). These fluctuations do not directly increase the D or h values, but lead to larger values of the inequality indices g and k, which may then hover around the SOC level value g = k ≃ 0.82, a little above the Pareto value (k = 0.80).

FIG. 1. The Lorenz curve, represented by L(p) in red, denotes the cumulative proportion of total citations possessed by a fraction (p) of papers, when organized in ascending order of citation counts. Conversely, the black dotted line indicates perfect equality, where each paper receives an equal number of citations. The Gini index (g) is computed from the area (S) between the Lorenz curve and the equality line (the shaded region), normalized by the total area under the equality line (S + S′ = 1/2). The Kolkata index (k) is obtained by locating the fixed point of the complementary Lorenz function ($L_c$), defined as $L_c(p) \equiv 1 - L(p)$: $L_c(k) = k$. By geometry, the value of k gives the proportion of total citations owned or possessed by the (1 − k) fraction of top-cited papers.

FIG. 2. Yearly variations of the citation inequality indices, Gini (g) and Kolkata (k), for 3 Nobel prize winners in Physics and 3 in Chemistry. The indices are calculated using the present citation data for the publications within a 5-year window, starting from the first one recorded in Google Scholar, with the window sliding by one year. The year shown is the middle year of the window (so the last 5-year window, ending in 2022, is shown at year 2020). The g value crossing above (and coming back below) the k value marks the precursor of the onset of (departure from) the SOC state with time. The inset shows the plot of k vs. g over the entire career of the scientist. It fits well with the linear (Landau-like) relationship k = 1/2 + 0.39g, suggesting a crossing SOC precursor point at k = g = 0.82 ± 0.02.

TABLE I. Consolidated inequality index (g, k) results for the citation statistics (from Figs. 2-6) of the 30 chosen science researchers, including 12 Nobel prize winners, 2 Fields Medalists, 2 Boltzmann Award winners, 2 Breakthrough Prize winners, 6 distinguished Sociophysics and Econophysics researchers, and 3 from the top and 3 from the bottom of the "Top 2% Stanford Cite-Score Scientists" (2022 list). NP(P) means Nobel Prize in Physics, NP(C) means Nobel Prize in Chemistry, NP(M) means Nobel Prize in Physiology or Medicine and NP(E) means Nobel Prize in Economics, FM means Fields Medal in Mathematics, BA means Boltzmann Award in Statistical Physics, BP(P) means Breakthrough Prize in Physics, FEP means "Father of Econophysics".

TABLE II. A rough indicator $R = n_c^{\max}/D$, where the effective Dunbar number $D = N_c/N_p$ ($N_c$ denotes the total number of citations for $N_p$ papers by the researcher) and $n_c^{\max}$ denotes the citation count of the most-cited paper by the researcher, to check whether the researcher has achieved the SOC level. We find that for R ≥ 40 the corresponding researchers clearly belong to the SOC level (94% success rate).