1. Introduction
In order to make a meaningful impact in their field, should scientists focus on producing a large quantity of papers? Chance models of scientific productivity, such as Sinatra et al.’s Q model (Janosov et al. 2020; Sinatra et al. 2016) or Simonton’s equal odds baseline (EOB; Simonton 1988, 2004, 2010), imply exactly that. For example, the equal odds baseline proposes that every creative work a scientist produces has (on average) ‘equal odds’ of being a ‘hit’ and being recognized as high-quality by their peers. Therefore, the number of high-quality papers is a linear function of the number of papers, and the chance to produce a scientific ‘hit’ increases simply as a function of a researcher’s productivity.
Yet, it has been highlighted that the equal odds baseline also implies further propositions concerning the relationship between quantity and quality of ideas (Forthmann et al. 2020b, 2020c, 2021b). First, the correlation between the number of papers and average paper quality is proposed to be zero, such that an individual scientist’s ‘hit-rate’ does not depend on their quantity of work (Simonton 1988, 2003, 2004). Second, scientists who produce the highest quantity of work are also expected to vary the most from one another in terms of the overall quality of their output (Forthmann et al. 2021b).
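These two propositions can be illustrated with a short simulation of an idealized EOB world. The paper’s own analyses use R and brms; the sketch below uses Python/numpy as a self-contained stand-in, and the hit probability and sample sizes are hypothetical values chosen only for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)

# Idealized EOB: every paper by every scientist has the same hit
# probability p (hypothetical value).
p = 0.3
T = rng.integers(5, 200, size=5000)   # quantity: number of papers per scientist
H = rng.binomial(T, p)                # additive quality: number of 'hits'

hit_ratio = H / T                     # average quality per scientist

# Proposition 1: average quality is (essentially) uncorrelated with quantity.
r = np.corrcoef(T, hit_ratio)[0, 1]

# Proposition 2: variability in overall quality is largest among the
# most prolific scientists.
low_var = H[T < 50].var()
high_var = H[T >= 150].var()

print(abs(r) < 0.05, low_var < high_var)
```

Both checks come out true in this idealized setting: the hit-ratio carries no information about productivity, while the spread of H widens with T.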
In a linear regression model in which quantity predicts quality of production, the equal odds baseline therefore implies heteroscedasticity: model residuals are hypothesized to display the shape of a tilted funnel that widens across the scale of quantity (Forthmann et al. 2020c). To test the EOB in general, and this ‘tilted funnel hypothesis’ in particular, statistical methods such as structural equation modeling (Forthmann et al. 2020b, 2021a) and quantile regression (Forthmann et al. 2020c) have been proposed and employed. The current work extends these previous efforts by transitioning to Bayesian statistical inference within a regression framework that encompasses and, hence, unifies existing approaches.
1.1. The Equal Odds Baseline and the Tilted Funnel Hypothesis
In general, research on creative productivity relies on open-ended count variables as measures of quantity of output (e.g., the number of publications, musical compositions, responses in a divergent thinking test, or ideas generated in a brainstorming session). Beyond quantity, each single product can also be evaluated for its creative quality. In its simplest variant, such an evaluation might lead to a dichotomous quality score (product is of high quality vs. product is not of high quality) and, in turn, a count of high-quality products. To add greater complexity, creative quality can be measured as an open-ended count variable, which has allowed for a more nuanced investigation into the correlates of creative quality. For instance, Simonton relied on the number of citations as the key quality indicator of a publication (e.g., Simonton 2003, 2004, 2009), and other researchers have put forth similar arguments (e.g., Wang 2016). Citation counts can also be summed across publications to yield a quality score for an individual scientist’s overall output. All such quality scores in which scores of single products are summed across all products (e.g., all publications of a scientist) are referred to as additive quality scores (Forthmann et al. 2020a; Mouchiroud and Lubart 2001). Moreover, average creative quality scores (i.e., hit-ratios) result from dividing an additive quality score by the total number of products. Average quality scores are important for disentangling the intricate relationship between quantity and additive quality (Hocevar 1979; Forthmann et al. 2020a; Prathap 2018), and they allow the testing of competing explanations of the relationship between quality and quantity in creative production (Forthmann et al. 2020c; Kozbelt 2008).
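In code, the distinction between quantity, additive quality, and average quality is simply a count, a sum, and a ratio. The citation counts below are hypothetical values for one scientist, used only to make the definitions concrete:

```python
# Hypothetical citation counts for one scientist's publications.
citations = [12, 0, 3, 0, 25, 1, 0, 7]

quantity = len(citations)             # T: number of publications
additive_quality = sum(citations)     # summed citations (an additive quality score)
hits = sum(c > 0 for c in citations)  # H: publications cited at least once
hit_ratio = hits / quantity           # average quality (hit-ratio)

print(quantity, additive_quality, hits, hit_ratio)  # 8 48 5 0.625
```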
The cross-sectional EOB is a parsimonious model for the relationship between additive quality H (e.g., the number of high-quality products) and quantity T (i.e., total production). The EOB posits that H is a linear function of individual differences in T (Simonton 2003, 2010):

H = λT + u,  (1)

with regression slope λ (i.e., the hit-ratio) and random shock term u (Simonton 2004, 2009, 2010). The shock term in the model is incorporated to take individual differences in hit-ratios into account. Indeed, there is empirical evidence suggesting that some scientists produce a lot but are rarely cited (i.e., mass producers), whereas others publish rarely but almost exclusively high-impact works (i.e., perfectionists; Sidiropoulos et al. 2015; Cole and Cole 1967; Feist 1997). Consequently, in accordance with the EOB, detectable individual differences in hit-ratios beyond mere sampling variation are expected (Forthmann et al. 2020b, 2021b). Moreover, average quality H/T must be uncorrelated with T for the EOB to hold (Simonton 1988, 2003, 2004). If H/T were correlated with T, a non-linear relationship between H and T would be observed, running counter to the basic linear tenet of the EOB (i.e., a quadratic term would additionally be needed in Equation (1) to predict H by T; Simonton 2003, 2004).
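The linear tenet can also be checked in simulation: when data are generated under an idealized EOB (each work a hit with the same probability), a quadratic term contributes essentially nothing to the prediction of H by T. The sketch below (Python/numpy, hypothetical hit probability) fits a quadratic polynomial and confirms the curvature coefficient is negligible:

```python
import numpy as np

rng = np.random.default_rng(5)

# Idealized EOB: each of T works is a hit with the same probability
# (hypothetical value p = 0.3).
p = 0.3
T = rng.integers(5, 200, size=5000)
H = rng.binomial(T, p)

# Fit H by T with a quadratic term; under the EOB the quadratic
# coefficient should vanish and the linear slope should recover p.
c2, c1, c0 = np.polyfit(T.astype(float), H.astype(float), 2)

print(abs(c2) < 1e-3, abs(c1 - p) < 0.02)
```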
Another critical assumption related to the EOB has been coined the ‘tilted funnel hypothesis’ (Forthmann et al. 2020c). The tilted funnel hypothesis refers to the fact that the residual variance, Var(u), in the EOB cannot be homoscedastic. Instead, residual variance must be lower for lower values of T than for higher values of T. Hence, a bivariate scatterplot with H on the y-axis and T on the x-axis (cf. left sides in Figure 1) should give the impression of a funnel tilted to the right. Forthmann et al. (2020c) examined this by means of quantile regression. In quantile regression, the tilted funnel hypothesis leads to the expectation that regression slopes at higher conditional quantiles of the distribution of H (e.g., the 0.80 quantile) are larger than regression slopes at lower conditional quantiles (e.g., the 0.20 quantile), such that additive quality increases steeply for every unit increase in quantity among those scientists with the highest additive quality. Furthermore, Simonton (2010) argued that the distribution of u should be approximately log-normal. In this work, we aim to estimate a model in which u is normally distributed on the log-scale (i.e., an overdispersed Poisson formulation of the EOB; for details see Section 2.1).
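The tilted funnel pattern can be made visible without fitting any model by comparing empirical conditional quantiles of H between low- and high-quantity groups. The sketch below (Python/numpy; the hit-ratio and shock scale are hypothetical) generates EOB-like data with a multiplicative log-normal shock and shows that the upper quantile of H rises faster with quantity than the lower one:

```python
import numpy as np

rng = np.random.default_rng(7)

# EOB-like process with a multiplicative (log-normal) shock, which
# produces the tilted funnel: spread in H grows with T.
n = 20000
T = rng.integers(5, 200, size=n)
u = rng.lognormal(mean=0.0, sigma=0.5, size=n)  # hypothetical shock scale
H = rng.poisson(0.3 * T * u)                    # hypothetical hit-ratio 0.3

# Rise of the 0.20 and 0.80 conditional quantiles of H from the
# low-quantity to the high-quantity group.
low, high = T < 50, T >= 150
q20_rise = np.quantile(H[high], 0.2) - np.quantile(H[low], 0.2)
q80_rise = np.quantile(H[high], 0.8) - np.quantile(H[low], 0.8)

# Tilted funnel: the upper quantile rises faster with quantity.
print(q80_rise > q20_rise)
```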
1.2. Aims of the Current Work
The aim of the current paper was to test approaches for modeling the EOB within Bayesian regression frameworks. Importantly, Bayesian statistical analysis is increasingly used and discussed as a viable alternative to classical frequentist statistical approaches (especially when it comes to null hypothesis testing) in various scientific fields, such as psychological science (Wagenmakers et al. 2018) and scientometrics (Schneider 2018). The expected benefits of Bayesian modeling include greater flexibility (e.g., ways to account for heteroscedasticity) and richer information for Bayesian inference. For example, Bayesian modeling provides complete posterior distributions for all estimated model parameters. Hence, even inference with respect to the random error term u can be facilitated this way. In addition, previous work in which the tilted funnel hypothesis was tested relied on quantile regression, which typically requires a different software package than other EOB modeling approaches (e.g., structural equation modeling packages). In this work, we employed Bayesian quantile regression (i.e., the tilted funnel goes Bayesian) and other EOB models within the same regression framework, as implemented in the R package brms (Bürkner 2017, 2018). Thus, all steps of the analysis testing the propositions of the EOB can be carried out within a unified framework.
As a more far-reaching goal, this study may pave the way for future EOB modeling when only small samples are available. Models that are rather complex in relation to the sample size, and hence often out of reach for frequentist approaches, can be fitted by means of Bayesian modeling if weakly informative priors are used. This capability of Bayesian modeling is currently being discussed in relation to measurement invariance testing (van de Schoot et al. 2013), for example. In further support of this general aim of our work, we intentionally use a small classical sample of eminent neurosurgeons (drawn from Davis 1987), which allows for quick Bayesian model estimation. Other researchers, who are hopefully inspired by this work, can readily use the openly available data and code to test, refine, and extend Bayesian EOB modeling.
2. Methods
We reanalyzed a dataset of 50 American neurosurgeons who were active within the period from 1965 to 1979 (Davis 1987). Notably, Simonton argued that the cross-sectional EOB is more likely to hold if ideas (i.e., those obtained by researchers during training and so forth) are sampled randomly from a domain (Simonton 2010). This would clearly be more likely if people were also chosen randomly. Yet, the sample of neurosurgeons used here was not randomly chosen, as all of the people in the sample were eminent neurosurgeons (i.e., non-prolific neurosurgeons had no chance to enter the sample). The data reported by Davis were recovered by Forthmann et al. (2021b), who found that the dataset mostly adhered to the EOB tenets. In this work, we use this dataset for illustration and as a proof of concept of the proposed Bayesian modeling of the EOB. We explored the proposed models for two measures of quality: (a) the total number of citations for papers; and (b) the number of first-authored papers that received at least one citation. The measure of quantity was the number of first-authored papers. For exact details on how the bibliometric data were retrieved, consult Davis (1987). Importantly, the correlations between quantity and quality measures as reported in the original paper (between quantity and citations: r = 0.62; between quantity and cited papers: r = 0.86) were almost exactly recovered (between quantity and citations: r = 0.61; between quantity and cited papers: r = 0.86; see Forthmann et al. 2021b).
2.1. Bayesian Estimation
All models were estimated with the statistical software R (R Core Team 2021) by means of the brms package (Bürkner 2017, 2018) for Bayesian regression modeling, which relies on Stan for model estimation (Carpenter et al. 2017). All files needed to reproduce the analyses reported in this paper are openly available in a repository of the Open Science Framework (https://osf.io/yq5mb/).
The tilted funnel hypothesis was tested with Bayesian quantile regression (Yu and Moyeed 2001), which is implemented in brms via the asymmetric Laplace distribution. We estimated two models, one at the 0.20 quantile and one at the 0.80 quantile. The difference in the slope coefficients obtained from these models was derived from the difference in the posterior samples of the respective slopes. As we expected the slope at the 0.80 quantile to be higher than the slope at the 0.20 quantile (i.e., the tilted funnel hypothesis), we subtracted the slope at the 0.20 quantile from the slope at the 0.80 quantile, such that a positive difference is in accordance with the tilted funnel hypothesis. We further examined a Bayesian 95% credible interval of the slope difference.
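Computationally, the slope difference reduces to elementwise arithmetic on posterior draws. The sketch below (Python/numpy) uses simulated normal draws as stand-ins for the posterior samples that brms would return; the means and spreads are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(42)

# Stand-ins for posterior draws of the 0.80- and 0.20-quantile slopes
# (in practice these come from the fitted brms models).
slope_q80 = rng.normal(loc=1.4, scale=0.10, size=4000)
slope_q20 = rng.normal(loc=0.9, scale=0.08, size=4000)

# Positive values support the tilted funnel hypothesis.
diff = slope_q80 - slope_q20
ci_lower, ci_upper = np.quantile(diff, [0.025, 0.975])

# A 95% credible interval excluding zero is consistent with the hypothesis.
print(ci_lower > 0)
```

Note that subtracting draws like this treats the two posteriors as independent, which is exactly the pragmatic assumption flagged in the Discussion.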
Next, we fit an EOB model based on the normal distribution by omitting the intercept and using a constant prior that fixes the slope parameter at the hit-ratio λ implied by the EOB (Forthmann et al. 2021a). The same model was fit with the intercept and slope parameters freely estimated. We further fitted the EOB model as a simple linear regression in which the residual standard deviation was regressed on the number of publications, in accordance with the tilted funnel hypothesis. We also tried to fit such a model with the residual standard deviation as a function of the number of publications and freely estimated intercept and regression slope, yet for this model the estimation process terminated with several technical issues flagged for both quality measures considered in this work. Then, the EOB was fit as a Poisson model. For this model, the intercept at the log-level was fixed to log(λ), the log of the EOB hit-ratio, and the logarithm of the number of publications was added as an offset at the log-level, which implies

log E(H) = log(λ) + log(T).  (2)

Indeed, this model did not require any parameters to be estimated and was run with the argument algorithm set to “fixed_param”. For model comparison purposes, however, we fitted a variant of this model in which the slope was freely estimated. In a final model, we extended Equation (2) by explicitly adding the u parameter to the Poisson model as a random effect across authors (i.e., analogous to overdispersed Poisson modeling), resulting in

log E(H) = log(λ) + log(T) + ε,  (3)

with exp(ε) = u and ε normally distributed with mean zero. In this model, log-quantity was again added as an offset and the log-level intercept was fixed to log(λ).
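This overdispersed Poisson formulation says that, conditional on T and u, H is Poisson with mean λTu, where u is log-normal. A short simulation (Python/numpy; λ and the shock scale are hypothetical values) shows that the model keeps the EOB’s linear mean while producing extra-Poisson variation:

```python
import numpy as np

rng = np.random.default_rng(3)

lam, sigma = 0.3, 0.5                       # hypothetical hit-ratio and shock scale
n = 50_000
T = rng.integers(5, 200, size=n)            # quantity
u = np.exp(rng.normal(0.0, sigma, size=n))  # log-normal shock u = exp(eps)

H = rng.poisson(lam * T * u)                # overdispersed Poisson EOB

# Mean is still linear in T; the marginal slope equals lam * E(u),
# i.e., lam * exp(sigma^2 / 2) for a log-normal shock.
slope = H.mean() / T.mean()

# Conditional variance exceeds the Poisson mean (overdispersion).
sub = H[T == 100]
overdispersed = sub.var() > sub.mean()
print(overdispersed)
```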
All models were fit to both dependent variables using brms’ default priors when possible. However, the simple EOB model based on the normal distribution with citations as the dependent variable, in which the residual standard deviation was modeled as a function of the number of first-authored papers, produced technical errors; weakly informative priors were needed to fit this model. Given that this model could be estimated without technical problems for the number of cited first-authored papers, we rescaled the intercept and slope estimates obtained there for the prediction of the residual standard deviation. These rescaled estimates were used as the means of normal priors, each with a standard deviation of 0.25. With this setup, model estimation terminated without technical errors. All models were estimated with four chains and 2000 iterations; only for the overdispersed Poisson variant of the EOB model was the number of iterations increased to 5000. All available convergence diagnostics (i.e., R̂, Bulk-ESS, and Tail-ESS; see Vehtari et al. 2021) were in the recommended ranges, indicating that Bayesian inference was accurate. Next, the expected log-pointwise predictive density (ELPD; Vehtari et al. 2017) was used for model comparison purposes. Thus, models were compared in terms of their expected capability to predict new data. The best-fitting model has the highest ELPD and is used as the reference for evaluating ELPD differences (i.e., the best-fitting model receives an ELPD difference of zero). Specifically, we used ELPD differences and their respective standard errors for multi-model inference; we consider an ELPD difference of at least twice its SE as substantial. Finally, we focus on estimates of u as a reflection of individual differences in the hit-ratio. We checked the relative positioning of neurosurgeons by correlational analysis of the u estimates derived from the various models.
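The decision rule can be sketched directly from pointwise log-predictive densities: the ELPD difference is the sum of pointwise differences, and its standard error is the standard deviation of those differences scaled by the square root of the sample size. The values below are hypothetical stand-ins for what ELPD computations (e.g., via the loo machinery) would return:

```python
import numpy as np

rng = np.random.default_rng(0)

n = 50  # one pointwise value per neurosurgeon

# Hypothetical pointwise expected log-predictive densities for two models;
# model B is simulated to predict somewhat worse than model A.
elpd_a = rng.normal(-3.0, 1.0, size=n)
elpd_b = elpd_a - rng.normal(0.4, 0.3, size=n)

pointwise_diff = elpd_a - elpd_b
elpd_diff = pointwise_diff.sum()
se_diff = np.sqrt(n) * pointwise_diff.std(ddof=1)

# Rule used in the text: a difference of at least twice its SE is substantial.
substantial = abs(elpd_diff) > 2 * se_diff
print(substantial)
```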
4. Discussion
In this work, we have shown how the assumptions underlying the EOB can be tested within a unified Bayesian regression modeling framework. First, we examined the tilted funnel hypothesis by means of Bayesian quantile regression (i.e., a regression model based on the asymmetric Laplace distribution). Second, we compared various formulations of the EOB model by means of Bayesian multi-model inference utilizing ELPD differences. Like previous work using structural equation modeling frameworks such as those implemented in the R package lavaan (Rosseel 2012), we used the capabilities of brms to fix regression coefficients at their expected values under the assumption that the EOB holds for the data. This model can be, and was, easily compared with a model in which intercept and slope are freely estimated. Across the dependent variables tested here, we found that the EOB model with fixed intercept and slope performed substantially better than the simple linear regression model with freely estimated coefficients. In addition, we found that a model that explicitly takes the tilted funnel hypothesis into account, by regressing the residual standard deviation on quantity, performed better than the simple EOB model (yet not substantially better for the citation analysis). The best-fitting model was an overdispersed Poisson model that included the u parameter as a normally distributed random effect at the log-level, following the theorizing of Simonton (2010). Clearly, such a model is flexible enough to handle the tilted funnel data pattern. A Poisson model without explicit modeling of the u parameter was by no means competitive.
Correlational analysis of residuals further showed that modeling choices had mostly negligible effects on the quantification of residuals. Only the u parameters obtained from the overdispersed Poisson model displayed a visible degree of differentiation from the other model formulations. These observations are useful in case a researcher is interested in the u parameter as a quantification of individual differences in the hit-ratio. Conceptually, the u parameter refers to a researcher’s capacity to produce high-quality works, but it is theoretically inseparable from other factors such as luck, institutional factors (which are at least partially confounded with individuals in the EOB model), and other random sources (cf. Janosov et al. 2020; Sinatra et al. 2016). Simply speaking, the u parameter quantifies how a researcher performed in comparison to what was expected. Thus, in a sense, it quantifies a researcher’s efficiency and adds useful information beyond the strongly correlated indicators H and T. This was illustrated by reconsidering the ten neurosurgeons Davis selected as most creative, who were chosen based on indicators of H and T; in the current work, we observed that two of these eminent neurosurgeons were not in the top ten based on the u parameter estimates. While Davidoff dropped only a few ranking positions, Elsberg moved to the lower end of the distribution. Elsberg was clearly far less impactful than expected based on his level of productivity (i.e., he was a mass producer).
Thus, our empirical illustration emphasizes several advantages of Bayesian modeling. First, Bayesian modeling is very flexible. This is highlighted by modeling dispersion directly as a function of quantity in a heteroscedastic variant of the EOB, which was implemented within the model syntax of the brms package (Bürkner 2017, 2018). Second, the capability of Bayesian modeling for small samples has been highlighted in other areas; for example, Bayesian modeling has been shown to be useful even for rather short single-case time series (Solomon and Forsberg 2017; Christ and Desjardins 2018). In this work, this is emphasized by the fact that the EOB model with explicit modeling of the tilted funnel pattern could not be estimated with the default priors implemented in brms. However, borrowing and rescaling the needed information from the same model successfully estimated for cited papers resulted in informative priors that allowed estimation of the model for citations. Third, posterior distributions for all model parameters are immediately available and provide critical information for Bayesian inference (this is not necessarily the case for frequentist approaches). We illustrated this by looking more closely at the u parameters along with their credible intervals, which nicely show which u parameters were estimated with the least and which with the highest precision. Finally, it should be acknowledged that the practice of null hypothesis testing has been flagged by researchers as being logically flawed (e.g., Schneider 2018), and in this work we focused much more on multi-model inference than on null hypothesis testing. Hence, moving beyond a null hypothesis testing perspective on the EOB is expected to further emphasize the usefulness and capability of Bayesian modeling to account for the quantity–quality relationship in scientific productivity.
It should be acknowledged that the sample size was very small, yet this was chosen intentionally. Besides its various advantages, Bayesian estimation can sometimes take weeks, and we found it constructive to examine ideas of Bayesian EOB modeling on a sample for which results are available within minutes. This allows other researchers to more quickly test and extend the analyses employed in this work. Finally, the way we derived credible intervals for the difference in slopes at different conditional quantiles was highly pragmatic. We argue that the reported differences, and particularly their credible intervals, should be interpreted with caution. Simply subtracting the posterior samples carries the assumption that the posterior distributions of both slopes are independent. However, datasets that adhere to the EOB are more likely to produce positively correlated posterior distributions, which implies that statistical inference here was most likely conservative (i.e., credible intervals that are too wide rather than too narrow). We suggest that future work should investigate this in more detail.