Our study is based on the Crunchbase 2013 Snapshot (Crunchbase Inc. 2013), which contains a replica of the Crunchbase database up to 31 December 2013 and is the most recent release of the database licensed under Creative Commons. The Crunchbase 2013 Snapshot has a relational structure consisting of 11 tables in SQL dump format, and collects information on 21,789 ventures and on the related products, investors, acquisitions, and funding rounds. We considered the following two tables:
object, which contains time-invariant information on each venture (country, economic sector, founding and exit dates, number of milestones achieved, number of strategic relationships established, etc.), and
funding_rounds, which includes information on all funding rounds in which each venture was involved (date, typology, participants, funds raised, etc.).
2.2. Sample Selection
The three target events ‘operating’, ‘exited’, and ‘closed’ are highly imbalanced across the 21,789 ventures in the Crunchbase 2013 Snapshot: 18,406 (84.5%) ventures are still operating, 2117 (9.7%) have exited, and 1266 (5.8%) have closed. The fourth quintile of age among exited ventures equals 20 years, which suggests that successful exit (i.e., through M&A or IPO) may also occur at advanced stages of development. Therefore, in order to balance the representation of the three events, we excluded all ventures aged more than 20 years. We also excluded all ventures that were involved in no funding rounds, because they provide no information on the relationship between successful exit and equity funding dynamics.
These selection criteria led to a sample of 5147 ventures, of which 1924 (37.4%) are still operating, 1976 (38.4%) have exited, and 1247 (24.2%) have closed.
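The two selection criteria amount to a simple filter on a per-venture record. The sketch below is a minimal illustration under stated assumptions: the `Venture` record and its field names are hypothetical and do not reproduce the schema of the snapshot tables. It also recomputes the outcome shares from the counts reported above, before and after selection.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Venture:
    # Hypothetical, simplified record: field names are illustrative and do
    # not reproduce the schema of the Crunchbase 2013 Snapshot tables.
    venture_id: str
    status: str       # 'operating', 'exited' or 'closed'
    age_years: float  # age in years at the snapshot date
    n_rounds: int     # number of funding rounds the venture took part in


def select_sample(ventures, max_age=20):
    """Keep ventures aged at most `max_age` years with at least one funding round."""
    return [v for v in ventures if v.age_years <= max_age and v.n_rounds >= 1]


def status_counts(ventures):
    """Count ventures by outcome."""
    return dict(Counter(v.status for v in ventures))


def status_shares(counts):
    """Percentage of each outcome, rounded to one decimal place."""
    total = sum(counts.values())
    return {status: round(100 * c / total, 1) for status, c in counts.items()}


toy = [
    Venture("a", "operating", 5.0, 2),
    Venture("b", "exited", 25.0, 3),  # dropped: older than 20 years
    Venture("c", "closed", 3.0, 0),   # dropped: no funding rounds
    Venture("d", "exited", 12.0, 1),
]
print([v.venture_id for v in select_sample(toy)])  # ['a', 'd']
print(status_shares({"operating": 1924, "exited": 1976, "closed": 1247}))
# {'operating': 37.4, 'exited': 38.4, 'closed': 24.2}
```

Applying `status_shares` to the full-snapshot counts (18,406, 2117, and 1266) likewise returns 84.5, 9.7, and 5.8.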
2.3. Research Variables
In order to represent the equity funding dynamics of ventures, we defined several variables, each equal to the total amount of funds raised across all funding rounds of a given typology. Typologies of funding rounds in the Crunchbase 2013 Snapshot include:
‘Angel’: small rounds designed for new ventures, where participants can be individual angel investors, angel investor groups, friends, and family members;
‘Venture—Series A’ and ‘Venture—Series B’: funding rounds for earlier stage ventures, ranging on average between USD 1 million and USD 30 million;
‘Venture—Series C’ and ‘Venture—Series D and onwards’: later funding rounds designed for established ventures, typically consisting of amounts over USD 10 million;
‘Venture—Series unknown’: funding rounds for established ventures where the series has not been specified.
We also defined two global measures of equity funding dynamics: (i) the total number of funding rounds (variable funding_rounds), and (ii) the number of unique investors across all funding rounds (variable participants).
Research variables are listed and described in Table 2, while their sample statistics are reported in Table 3.
It can be noted that the number of funding rounds ranges from 1 to 13, with a mean of 1.9, while the typology of funding rounds with the highest average amount is ‘Venture—Series unknown’ (USD 4.6 million), followed by rounds of series D and onwards (USD 3.4 million). Furthermore, rounds of series A, B, and C involve average amounts between USD 2.3 and 2.7 million, while rounds of type ‘Angel’ involve the lowest average amount (USD 0.1 million). In the sample, 521 ventures (10.1%) raised no funds (datum not shown in the tables).
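As an illustration of how the research variables could be derived from the funding_rounds table, the sketch below aggregates round records per venture into one amount per typology, the number of rounds (funding_rounds), and the number of unique investors (participants). The tuple layout and the typology codes (`series_a`, `series_d_plus`, etc.) are hypothetical and are not the actual column names or values of the snapshot.

```python
# Hypothetical codes for the round typologies described above.
TYPOLOGIES = ("angel", "series_a", "series_b", "series_c",
              "series_d_plus", "series_unknown")


def research_variables(rounds):
    """Aggregate funding-round records into per-venture research variables.

    `rounds` is an iterable of (venture_id, typology, amount_usd, investors)
    tuples, where `investors` is a collection of investor identifiers.
    """
    out = {}
    for venture_id, typology, amount_usd, investors in rounds:
        if typology not in TYPOLOGIES:
            raise ValueError(f"unexpected round typology: {typology!r}")
        rec = out.get(venture_id)
        if rec is None:
            # One amount per typology, plus the two global measures.
            rec = {t: 0.0 for t in TYPOLOGIES}
            rec["funding_rounds"] = 0
            rec["_investors"] = set()
            out[venture_id] = rec
        rec[typology] += amount_usd
        rec["funding_rounds"] += 1
        rec["_investors"].update(investors)  # a set keeps each investor once
    for rec in out.values():
        rec["participants"] = len(rec.pop("_investors"))
    return out


rounds = [
    ("v1", "angel", 100_000, ["a1", "a2"]),
    ("v1", "series_a", 2_000_000, ["f1", "a1"]),  # a1 invests again
    ("v2", "series_unknown", 5_000_000, []),
]
result = research_variables(rounds)
print(result["v1"]["funding_rounds"], result["v1"]["participants"])  # 2 3
```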
2.4. Control Variables
Since a venture’s chance of successful exit and risk of closure may be determined by factors other than our research variables, we also defined several control variables. The first two control variables are geographical location (variable location) and economic sector (variable sector), which should account for different market conditions across ventures. Age (variable age) is a third control variable, which should account for the fact that older ventures may have a higher chance of exit and a lower risk of closure. Age at first and at last funding (variables age_first_fund and age_last_fund, respectively) were also considered in order to account for the distribution of funds raised across a venture’s life. A further control variable is the number of strategic relationships established (variable relationships), which is intended to reflect network ties. Finally, we defined three proxies of effectiveness: the total number of milestones achieved (variable milestone), and two dummy variables indicating whether at least one milestone was achieved before the first and before the last funding (variables miles_first_fund and miles_last_fund).
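The timing-related controls can be sketched as follows, assuming that founding, funding, and milestone dates are available for each venture. The helper name, the date inputs, and the conversion of days into years (division by 365.25) are illustrative assumptions; a milestone counts as achieved "before" a funding round only if its date is strictly earlier.

```python
from datetime import date


def timing_controls(founded, funding_dates, milestone_dates):
    """Derive timing and milestone controls for one venture (hypothetical helper)."""
    if not funding_dates:
        raise ValueError("every sampled venture has at least one funding round")
    first, last = min(funding_dates), max(funding_dates)

    def years_since_founding(d):
        # Assumption: ages in years computed as elapsed days / 365.25.
        return (d - founded).days / 365.25

    return {
        "age_first_fund": years_since_founding(first),
        "age_last_fund": years_since_founding(last),
        "milestone": len(milestone_dates),
        # Dummies: 1 if at least one milestone strictly precedes the round.
        "miles_first_fund": int(any(m < first for m in milestone_dates)),
        "miles_last_fund": int(any(m < last for m in milestone_dates)),
    }


controls = timing_controls(
    founded=date(2008, 1, 1),
    funding_dates=[date(2010, 1, 1), date(2012, 1, 1)],
    milestone_dates=[date(2011, 6, 1)],
)
print(controls["age_last_fund"], controls["miles_first_fund"], controls["miles_last_fund"])
# 4.0 0 1
```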
Control variables are listed and described in Table 2, while their sample statistics are reported in Table 3 and Table 4. In order to limit the occurrence of small frequencies, geographical locations were reclassified into ‘USA’, ‘Europe’, ‘Asia’, and ‘other’, and the economic sectors ‘cleantech’, ‘security’, ‘manufacturing’, ‘transportation’, ‘automotive’, ‘nonprofit’, and ‘local’ were merged into the category ‘other’. It can be noted that the majority of ventures are located in the USA (75.3%), followed by Europe (15.7%), and that the most prevalent economic sectors are ‘software’ (17.3%) and ‘web’ (15.5%), followed by ‘entertainment’ (12.9%), ‘sales’ (11.3%), ‘enterprise’ (10.7%), and ‘health’ (10.4%). Also, 14.8% and 15.7% of ventures achieved at least one milestone before the first and the last funding, respectively.
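A minimal sketch of the reclassification, assuming ISO 3166-1 alpha-3 country codes. The country sets below are deliberately partial and hypothetical, since the full mapping is not reported; only the list of sectors merged into ‘other’ comes from the text.

```python
# Hypothetical, deliberately partial country sets (ISO 3166-1 alpha-3);
# the complete mapping used in the study is not reported.
EUROPE = {"GBR", "DEU", "FRA", "ESP", "ITA", "NLD", "SWE", "IRL", "CHE", "DNK"}
ASIA = {"CHN", "IND", "JPN", "KOR", "SGP", "HKG"}

# Sectors merged into 'other', as listed in the text.
OTHER_SECTORS = {"cleantech", "security", "manufacturing", "transportation",
                 "automotive", "nonprofit", "local"}


def location_class(country_code):
    """Map a country code to 'USA', 'Europe', 'Asia' or 'other'."""
    if country_code == "USA":
        return "USA"
    if country_code in EUROPE:
        return "Europe"
    if country_code in ASIA:
        return "Asia"
    return "other"


def sector_class(sector):
    """Collapse low-frequency sectors into 'other'; keep the rest unchanged."""
    return "other" if sector in OTHER_SECTORS else sector


print(location_class("DEU"), location_class("BRA"), sector_class("cleantech"))
# Europe other other
```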
2.5. Statistical Model
In order to assess the impact of research and control variables on a venture’s chance of successful exit and risk of closure, multinomial logistic regression (McCullagh and Nelder 2002, p. 159ff; Hosmer and Lemeshow 2000, p. 269ff) was employed. Multinomial logistic regression describes the probability distribution of a categorical outcome variable $Y$ as a function of the values taken by a set of explanatory variables, and can be viewed as a generalization of logistic regression to outcome variables with more than two unordered categories.
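Denote the categories of the outcome variable $Y$ with values $0, 1, \ldots, K-1$, where value 0 is assigned to the reference category and $K$ is the total number of categories. Also, let $\mathbf{x}_i = (1, x_{i1}, \ldots, x_{ip})'$ be a vector including value 1 in the first position and, in the subsequent positions, the values of the explanatory variables observed on unit $i$ (with dummy coding for categorical variables). The multinomial logistic regression model is defined as:

$$\log \frac{P(Y = k \mid \mathbf{x}_i)}{P(Y = 0 \mid \mathbf{x}_i)} = \mathbf{x}_i'\boldsymbol{\beta}_k = \beta_{0k} + \sum_{j=1}^{p} \beta_{jk}\, x_{ij}, \qquad k = 1, \ldots, K-1, \qquad (1)$$

where $\boldsymbol{\beta}_k = (\beta_{0k}, \beta_{1k}, \ldots, \beta_{pk})'$ is the vector of parameters to be estimated for the category of $Y$ labeled $k$, and $p$ is the number of explanatory variables after dummy coding. The probability distribution of $Y$ predicted by model (1) for unit $i$ is:

$$P(Y = k \mid \mathbf{x}_i) = \frac{\exp\{\mathbb{1}(k > 0)\, \mathbf{x}_i'\boldsymbol{\beta}_k\}}{1 + \sum_{h=1}^{K-1} \exp\{\mathbf{x}_i'\boldsymbol{\beta}_h\}}, \qquad k = 0, 1, \ldots, K-1,$$

where $\mathbb{1}(k > 0)$ is an indicator function taking value 1 if $k > 0$, and value 0 otherwise.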
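To make the predicted distribution concrete, the sketch below computes $P(Y = k \mid \mathbf{x}_i)$ for one unit from its covariate vector (intercept first) and the $K-1$ coefficient vectors of the non-reference categories, whose linear predictor for category 0 is fixed at 0. The function name and argument layout are hypothetical. Subtracting the largest linear predictor before exponentiating leaves the probabilities unchanged but avoids overflow.

```python
import math


def mnl_probabilities(x, betas):
    """Category probabilities of a multinomial logit for one unit.

    x     : covariate vector (1, x_1, ..., x_p), intercept first
    betas : list of K-1 coefficient vectors for categories 1, ..., K-1
    Returns [P(Y=0|x), P(Y=1|x), ..., P(Y=K-1|x)].
    """
    if any(len(beta) != len(x) for beta in betas):
        raise ValueError("each coefficient vector must match the length of x")
    # Linear predictors; the reference category 0 contributes exp(0) = 1.
    etas = [0.0] + [sum(b * v for b, v in zip(beta, x)) for beta in betas]
    top = max(etas)  # shift for numerical stability
    weights = [math.exp(e - top) for e in etas]
    total = sum(weights)
    return [w / total for w in weights]


# Odds of 2 and 3 against the reference give probabilities 1/6, 2/6, 3/6.
p = mnl_probabilities([1.0, 2.0], [[math.log(2), 0.0], [math.log(3), 0.0]])
```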
It can be noted that model (1) expresses the logarithm of the ratio between the probability of each non-reference category of $Y$ and that of the reference category as a linear combination of the values taken by the explanatory variables. Such probability ratios are often called odds, while their logarithm is known as logit. The $k$-th odds

$$\frac{P(Y = k \mid \mathbf{x}_i)}{P(Y = 0 \mid \mathbf{x}_i)} = \exp\{\mathbf{x}_i'\boldsymbol{\beta}_k\}$$

indicates how much more or less likely the category of $Y$ labeled $k$ is than the reference one (labeled 0), and it can be shown that, for $j = 1, \ldots, p$, the quantity $\exp(\beta_{jk})$ equals the factor by which the $k$-th odds changes following a unit increase in the value of the $j$-th explanatory variable, holding the other explanatory variables constant. Therefore, parameter $\beta_{jk}$ with $j = 1, \ldots, p$ represents the net effect of the $j$-th explanatory variable on the $k$-th odds. Note that parameters $\beta_{0k}$ act as intercepts and, in particular, $\exp(\beta_{0k})$ represents the $k$-th odds when all explanatory variables take value 0.
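The claim that $\exp(\beta_{jk})$ is the multiplicative change in the $k$-th odds can be checked directly. Writing the linear predictor of model (1) as $\mathbf{x}_i'\boldsymbol{\beta}_k$, so that the $k$-th odds equals $\exp\{\mathbf{x}_i'\boldsymbol{\beta}_k\}$, and letting $\mathbf{e}_j$ denote a vector with 1 in the position of the $j$-th explanatory variable and 0 elsewhere (notation introduced here only for this derivation), a unit increase in that variable gives:

$$\frac{P(Y = k \mid \mathbf{x}_i + \mathbf{e}_j) \,/\, P(Y = 0 \mid \mathbf{x}_i + \mathbf{e}_j)}{P(Y = k \mid \mathbf{x}_i) \,/\, P(Y = 0 \mid \mathbf{x}_i)} = \frac{\exp\{(\mathbf{x}_i + \mathbf{e}_j)'\boldsymbol{\beta}_k\}}{\exp\{\mathbf{x}_i'\boldsymbol{\beta}_k\}} = \exp\{\mathbf{e}_j'\boldsymbol{\beta}_k\} = \exp(\beta_{jk}).$$

The ratio does not depend on $\mathbf{x}_i$, which is why $\beta_{jk}$ can be read as a net effect; for a dummy variable, the unit increase corresponds to switching from 0 to 1.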
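Parameters $\boldsymbol{\beta}_1, \ldots, \boldsymbol{\beta}_{K-1}$ can be estimated through the maximum likelihood method, which consists in selecting the parameter values that maximize the likelihood (or, more conveniently, the log likelihood) of the model. Maximum likelihood is a widely employed estimation technique owing to its desirable statistical properties, namely consistency, asymptotic efficiency, and asymptotic normality. Assuming that the sample units are independent, the log likelihood of model (1) is:

$$\ell(\boldsymbol{\beta}_1, \ldots, \boldsymbol{\beta}_{K-1}) = \sum_{i=1}^{n} \left[ \log \frac{m_i!}{\prod_{k=0}^{K-1} y_{ik}!} + \sum_{k=1}^{K-1} y_{ik}\, \mathbf{x}_i'\boldsymbol{\beta}_k - m_i \log\left(1 + \sum_{h=1}^{K-1} \exp\{\mathbf{x}_i'\boldsymbol{\beta}_h\}\right) \right],$$

where $y_{ik}$ is the number of events experienced by unit $i$ ($i = 1, \ldots, n$, with $n$ the number of sample units) falling in the category of $Y$ labeled $k$, and $m_i = \sum_{k=0}^{K-1} y_{ik}$ is the total number of events experienced by unit $i$. In the case where each sampling unit experiences one and only one event, i.e., $m_i = 1$ for all $i$, the multinomial coefficient equals 1 and the log likelihood simplifies into:

$$\ell(\boldsymbol{\beta}_1, \ldots, \boldsymbol{\beta}_{K-1}) = \sum_{i=1}^{n} \left[ \sum_{k=1}^{K-1} \mathbb{1}_{ik}\, \mathbf{x}_i'\boldsymbol{\beta}_k - \log\left(1 + \sum_{h=1}^{K-1} \exp\{\mathbf{x}_i'\boldsymbol{\beta}_h\}\right) \right],$$

where $\mathbb{1}_{ik}$ is an indicator function taking value 1 if the event experienced by unit $i$ falls in the category of $Y$ labeled $k$, and value 0 otherwise.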
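The one-event-per-unit log likelihood can be maximized numerically. The sketch below uses plain gradient ascent, where the gradient with respect to $\boldsymbol{\beta}_k$ is $\sum_i (\mathbb{1}_{ik} - P(Y = k \mid \mathbf{x}_i))\,\mathbf{x}_i$ and $\mathbb{1}_{ik}$ equals 1 when unit $i$'s event falls in category $k$. This optimizer is chosen only for transparency and is not necessarily the one used in the study; statistical software typically relies on Newton-type or quasi-Newton methods. Function names, the learning rate, and the iteration count are illustrative assumptions.

```python
import math


def predicted_probs(x, betas):
    """P(Y = k | x) for k = 0, ..., K-1, with category 0 as reference."""
    etas = [0.0] + [sum(b * v for b, v in zip(beta, x)) for beta in betas]
    top = max(etas)
    weights = [math.exp(e - top) for e in etas]
    total = sum(weights)
    return [w / total for w in weights]


def log_likelihood(X, y, betas):
    """Log likelihood when every unit experiences exactly one event (m_i = 1).

    X: covariate vectors (1, x_i1, ..., x_ip); y: observed categories 0..K-1.
    """
    return sum(math.log(predicted_probs(x, betas)[k]) for x, k in zip(X, y))


def fit_by_gradient_ascent(X, y, K, lr=0.1, n_iter=2000):
    """Maximize the log likelihood by gradient ascent (illustrative only)."""
    q = len(X[0])  # p + 1 coefficients per non-reference category
    n = len(X)
    betas = [[0.0] * q for _ in range(K - 1)]
    for _ in range(n_iter):
        grads = [[0.0] * q for _ in range(K - 1)]
        for x, yi in zip(X, y):
            probs = predicted_probs(x, betas)
            for k in range(1, K):
                resid = (1.0 if yi == k else 0.0) - probs[k]
                for j in range(q):
                    grads[k - 1][j] += resid * x[j]
        for k in range(K - 1):
            for j in range(q):
                betas[k][j] += lr * grads[k][j] / n  # step along the mean gradient
    return betas


# Intercept-only toy data: 2 units in category 0, 3 in category 1, 5 in category 2.
X = [[1.0]] * 10
y = [0, 0, 1, 1, 1, 2, 2, 2, 2, 2]
betas = fit_by_gradient_ascent(X, y, K=3)
```

On intercept-only data such as this toy example, the estimates should approach $\log(n_k/n_0)$, the log ratio of the category frequencies, which is a convenient sanity check.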
Owing to the asymptotic normality of maximum likelihood estimates, it is possible to compute, for each $j$ and $k$, the $p$-value of the significance test on the hypothesis $H_0\colon \beta_{jk} = 0$, which corresponds to the absence of an effect of the $j$-th explanatory variable on the $k$-th odds. It is also possible to assess the importance of each explanatory variable through the analysis of deviance. The deviance of model (1) is defined as minus twice the log likelihood evaluated at the maximum likelihood estimates:

$$D = -2\, \ell(\hat{\boldsymbol{\beta}}_1, \ldots, \hat{\boldsymbol{\beta}}_{K-1}).$$

Analogously, we can define the deviance of any reduced model obtained by setting to zero one or more parameters in model (1). The deviance can be interpreted as the amount of variability left unexplained by a model compared to a situation of perfect fit, which is provided by the saturated model, i.e., a model with one parameter per sampling unit. When each unit experiences exactly one event, the saturated model assigns probability 1 to every observed category, so its log likelihood is 0 and the deviance reduces to minus twice the log likelihood of the fitted model.
Let $D_{(j)}$ be the deviance of the reduced model obtained by setting to zero the parameters associated with the $j$-th explanatory variable in model (1). The difference $D_{(j)} - D$ represents the increase in deviance when the $j$-th explanatory variable is excluded from the model; therefore, it can be interpreted as the deviance explained by that variable. Consequently, the relative importance of the $j$-th explanatory variable among all the variables included in the model can be measured as:

$$I_j = \frac{D_{(j)} - D}{\sum_{h} \left( D_{(h)} - D \right)},$$

where the sum in the denominator runs over all explanatory variables included in the model.
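Given the deviance of the full model and that of each reduced model, the relative importance measure of this subsection amounts to normalizing the deviance explained by each variable by the total explained across variables. The function below is a hypothetical helper, and the deviances in the usage example are made-up numbers.

```python
def relative_importance(full_deviance, reduced_deviances):
    """Share of explained deviance attributable to each explanatory variable.

    full_deviance     : deviance D of the full model
    reduced_deviances : {variable name: deviance of the model without it}
    """
    # Deviance explained by each variable: increase when it is dropped.
    explained = {v: d - full_deviance for v, d in reduced_deviances.items()}
    total = sum(explained.values())
    if total <= 0:
        raise ValueError("reduced models should not fit better than the full model")
    return {v: e / total for v, e in explained.items()}


# Hypothetical deviances: dropping 'a' raises D by 30, dropping 'b' by 10.
shares = relative_importance(100.0, {"a": 130.0, "b": 110.0})
print(shares)  # {'a': 0.75, 'b': 0.25}
```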
It can be shown that, under the hypothesis that the $j$-th explanatory variable has no effect, $D_{(j)} - D$ is asymptotically distributed as a chi-squared random variable with degrees of freedom equal to the number of parameters associated with the $j$-th explanatory variable. This result can be exploited to assess the statistical significance of the contribution of each explanatory variable to the model.
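Under the chi-squared approximation, the $p$-value for a variable is the upper-tail probability of the increase in deviance caused by excluding it. In practice one would call a library routine such as `scipy.stats.chi2.sf`; to keep the sketch dependency-free, the hypothetical function below uses the closed-form series for integer degrees of freedom. The degrees of freedom equal the number of parameters tied to the variable: with $K = 3$ outcome categories, a numeric variable carries $K - 1 = 2$ parameters, while location, recoded into four classes and hence three dummies, carries $3 \times 2 = 6$.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability P(X > x) for a chi-squared variable with integer df."""
    if df < 1 or int(df) != df:
        raise ValueError("df must be a positive integer")
    if x <= 0:
        return 1.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{i=0}^{df/2 - 1} (x/2)^i / i!
        term, total = 1.0, 1.0
        for i in range(1, df // 2):
            term *= (x / 2) / i
            total += term
        return math.exp(-x / 2) * total
    # Odd df: erfc(sqrt(x/2)) plus a finite series in x^(r - 1/2) / (1*3*...*(2r-1)).
    series, term = 0.0, math.sqrt(x)
    for r in range(1, (df - 1) // 2 + 1):
        if r > 1:
            term *= x / (2 * r - 1)
        series += term
    return math.erfc(math.sqrt(x / 2)) + math.sqrt(2 / math.pi) * math.exp(-x / 2) * series


# Hypothetical deviance increase of 12.0 for a variable with 6 parameters.
p_value = chi2_sf(12.0, 6)
```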