Revisiting the Spatial Autoregressive Exponential Model for Counts and Other Nonnegative Variables, with Application to the Knowledge Production Function

: This paper proposes a two-step pseudo-maximum likelihood estimator of a spatial autoregressive exponential model for counts and other nonnegative variables; it is particularly useful for dealing with zeros. It considers a model speciﬁcation allowing us to easily determine the direct and indirect partial effects of explanatory variables (spatial spillovers and externalities). A simulation study shows that this method generally behaves better in terms of bias and root mean square error than existing procedures. An empirical example estimating a knowledge production function for the NUTS II European regions is analyzed. Results show that there is spatial dependence between regions on the creation of innovation, where regions less able to transform R&D expenses into innovation beneﬁt from knowledge spatial spillovers through indirect effects. It is also concluded that the socioeconomic environment is important and that, unlike public R&D institutions, private companies are efﬁcient at knowledge production.


Introduction
Many empirical applications with spatial data concern the modeling of counts and other nonnegative response variables. Examples are the modeling of trade flows, migration flows, patenting citation and patent creation, number of crashes, firm location and firm birth, number of new patients contracting a given disease, etc. Conventional practices opt to logarithmically transform the dependent variable in order to apply the well-known spatial linear models. This is the approach followed in [1] to model the interregional trade of goods at the NUTS3 level in Spain, in [2] to explain labor migration flows in China, and in [3] to investigate the effect of intraregional labor mobility in the production of knowledge in Europe, to give just a few examples. However, Silva and Tenreyro [4], in the context of cross-sectional data, note that modeling logarithmically transformed variables with a linear model may lead to bias in estimation when heteroscedasticity is present, or to distortions in parameter estimates caused by the need to add a constant to zero observations. The authors propose using the Poisson pseudo-maximum likelihood (PPML) estimator of the model for the untransformed variables as an alternative to ordinary least squares (OLS) of the loglinear model. Spatial autoregressive models are popular to address spatial dependence. Elhorst [5] discusses the relevance of such models in recent applied spatial econometrics. One reason is that they quantify indirect spatial spillovers, as is pointed in [6,7]. While linear spatial autoregressive models are widely used in the literature, nonlinear spatial autoregressive models, namely models for counts or other nonnegative variables, are not so popular because of their complexity in estimation and in the derivation of marginal effects and This work proposes to model an outcome that is a count or other nonnegative variable, showing spatial dependence, by the following spatial autoregressive exponential specification of the conditional mean (SAR-E regression), which is based in the spatial lag model of counts of [12]: E(y|X) = µ = exp(ρW log(µ) + Xβ), where y is the vector with observations of the dependent variable for n spatial locations, µ is the vector with a conditional mean of y, X is a matrix with observations of k explanatory variables for n spatial regions, W is a spatial weighting matrix, and β and ρ are unknown coefficients to be estimated. Observe that, according to Equation (1), the conditional mean of one location i, µ i , is determined as a function of the characteristics of location i through the observed values for the explanatory variables and of a weighted average of the conditional mean of neighboring locations. Equation (1) serves three purposes. Firstly, it expresses the conditional expectation of a nonnegative and, in particular, a count variable. Observe that count variables are often assumed to have a Poisson distribution, whose expected conditional mean is an exponential function of a set of explanatory variables. Secondly, it incorporates the spatial dependence of the data by means of an autoregressive term, extending the well-known SAR or Spatial Lag linear model to the nonlinear context. Finally, it is invertible, which allows us to easily calculate the partial effects of variables and, in particular, analyze global spatial interactions between regions with the identification of spatial spillovers and externalities.
The reduced form of Equation (1) is: The partial effects of the explanatory variables are deduced from Equation (2) leading to, where µ DI AG is a diagonal matrix of order n with elements µ i . Observe that Equation (3) is a n × n matrix of partial effects, where the elements in the main diagonal are direct effects of the kth explanatory while the off-diagonal elements are indirect effects.
Considering that A = (I − ρW) −1 , then the direct partial effects in region i are equal to ∂µ i ∂x ik = β k a ii µ i i = 1, . . . , n, and give the expected outcome of a given location due to a one-unit variation of the kth explanatory in the same location. When the spatial weighting matrix is row-normalized, indirect effects can be divided into spill-in and spill-out spatial spillovers. The spill-in spillover measures the cumulative sum of spatial spillovers that location i receives from all neighboring locations-that is, the sum of expected impacts in the outcome of location i due to a one-unit variation of the kth explanatory in each neighboring location j-and can be calculated as follows: which is the cumulative sum of all off-diagonal elements in row i of Equation (3). The spill-out spillover effect is the sum of all spatial spillovers that location i transfers to neighboring locations-that is, the sum of the expected impact in the outcome of each location j neighbor from i when the kth explanatory in location i varies by one unit and is equal to spill − out i = β k n ∑ j = 1 j = 1 a ji µ j i = 1, . . . , n, (6) or, equivalently, the sum of all off-diagonal elements in column i of Equation (3). Each region has a direct, a spill-in and a spill-out partial effects. The values analyzed in empirical applications are usually the average of each of these effects over all regions, constituting, respectively, the average direct partial effect, the average spill-in spatial spillover (Aspill-in), and the average spill-out spatial spillover (Aspill-out), that is,

Estimation
When the dependent variable is a count with a Poisson distribution, the full information maximum likelihood (FIML) estimator of the reduced form in Equation (2) is derived in [12]. For variables that are not Poisson distributed, which includes some types of counts and other general nonnegative variables, those results can be used in a Poisson pseudomaximum likelihood (PPML) context, assuming that the conditional mean is correctly specified according to Equation (1). Since the seminal work of [18], PPML has become popular for model estimation because it extends the technique of maximum likelihood to situations where the conditional distribution of the outcome does not need to be specified, but its conditional expectation has to be an exponential function of a linear index. The idea is to use the Poisson probability function to build the likelihood function, even if the outcome is not Poisson distributed, requiring that its expectation coincides with the expectation of a Poisson-distributed variable. As a misspecified distribution was used to define the likelihood function, the covariance matrix of the estimator and, in particular, standard errors, need to be estimated with a robust estimator. Silva and Tenreyro [4] disseminated PPML to estimate the gravity model, which is a particular case of exponential regression.
The PPML approach proposed here results in estimating the unknown coefficients by FIML and the respective standard errors with a robust estimator to safeguard them from variance misspecification, like in situations where there is overdispersion. However, the authors of [12] report severe difficulties in obtaining numerical solutions for FIML estimates. Therefore, they recommend instead a limited information maximum likelihood (LIML) two-step estimator. The first step delivers an estimate for the unknown variable W log(µ) obtained with an OLS regression of W log(y) in the set of regressors X, WX and W 2 X. In the second step, a Poisson regression is performed with regressorsŴ log(y) and X. An expression for the second stage adjusted covariance matrix is given in [12].
This paper proposes a two-step procedure that extends and refines the method described above in two ways. First, it suggests extending the estimation for a pseudomaximum likelihood framework in order to encompass the modeling of a vast set of outcomes. This approach requires additional care in the estimation of the covariance matrix in the second step. Second, it proposes a different estimation procedure for the first step that circumvents the problem of observations that are zeros. Therefore, the following two-step PPML procedure to estimate the SAR-E model in Equation (1) is recommended: Run a PPML regression of y on X, WX and W 2 X and calculate the predicted valuesŷ.

2.
Run a PPML regression of y on W log(ŷ) and X.
Observe that the second step of [12] usesŴ log(y), the fitted values for the variable W log(y), while the second step of the proposed method uses W log(ŷ), which is a transformation of the fitted values of variables y,ŷ.
Standard errors in the second step should take into consideration the pseudo-maximum likelihood framework where the Poisson variance may be misspecified, and should account for the sampling variation in the regressor W log(ŷ). To overcome these issues, the use of bootstrap standard errors is recommended in the second step. This procedure is easy to implement because it requires only software with a command for Poisson regression and bootstrap standard errors, like STATA [19]. (For non-negative outcomes other than counts, we advise using the command "glm" in STATA with the option "family(Poisson)" instead of "Poisson".)

Simulation Study
In the simulation study, the two-step estimator introduced in Section 2 with a first step based on maximum likelihood (SAR-PPML 1stStep-ML) is compared with the two-step estimator of [12], with the first step being an OLS regression (SAR-PPML 1stStep-OLS) and aspatial PPML. For SAR-PPML 1stStep-OLS, when calculations need the logarithm of the outcome, an ad hoc constant equal to 1 was added to observations that were zeros. Simulations were performed with R [20].

Simulation Design
The simulation design closely follows that in [12], which is most closely related to other spatial econometric experimental designs, such as those in [21][22][23].
The random dependent variable is generated as The design matrix includes two covariates, X 1 and X 2 , where the first was randomly generated from a normal distribution, with mean and variance equal to 1 and 2, respectively. Since econometric studies usually incorporate a mix of continuous and dummy variables, following [4], X 2 is a dummy variable randomly generated from the Bernoulli distribution with a mean equal to 0.5.
The study considers three alternative spatial weight matrices. They were calculated using the same two-step procedure found in other spatial econometrics simulation studies (see, e.g., [24]). First, n space units are randomly drawn within in the unit square. Secondly, a matrix W0 is constructed given a criterion, and normalized by rows, so that the sum of all elements in each row is 1. In the present study, two different criteria were used, resulting in three alternative spatial weighting matrices. W1 and W3 intend to replicate matrices generated with a contiguity criterion, with neighbors chosen based on the nearest neighbor distance, fixing for W1 that each unit has seven neighbors (the seven units that are closest), while for W3 the number of neighbors is four, which is close to the average number of neighbors observed in the empirical study included in the next section. On the other hand, W2 is created based on an inverse distance criterion, using the Euclidean distance between each unit. The matrix W2 is said to be denser than the matrix W1, since W2 contains more nonzero entries, and matrix W1 is denser than matrix W3.
Monte Carlo simulations were conducted for each design of W and for each of the three estimators described above. The sample size, n, varied over the set [100; 250; 500; 750; 1000], and the spatial autoregressive parameter, ρ, varied over the set [0; 0.2; 0.4; 0.6; 0.8]. The parameters associated with variables X 1 and X 2 , β 1 and β 2 , respectively, were held fixed at 0.5. The intercept was set to zero.
For each experiment, 1000 replications were used. This is the usual number of replications used in Monte Carlo studies with spatial data (see [12,[21][22][23][24][25], among others). The bias was calculated as the average in the 1000 replications of the difference between the estimated value of the coefficient in each simulated sample and the respective true value. The RMSE was also calculated for each estimated coefficient as the square root of the sum between the square of the bias and the empirical variance in the 1000 replications.

Monte Carlo Results
It should be noted that the results obtained referring to W1 are quite similar to those obtained with W3. This suggests that estimators should not be sensitive to the density of the spatial weighting matrix when using a contiguity criterion. For this reason, the analysis of the remaining results will focus only on experiments related to the use of W1 and W2 matrices. The results for W3 can be found in Tables A5 and A6 of Appendix A.
Tables A1 and A3 in Appendix A show the results for the bias of the estimated coefficients for each estimation method, considering the spatial weighting matrix based, respectively, on the contiguity criterion, W1, and the inverse distance criterion, W2. Both SAR-PPML estimators show similar and quite satisfactory results, with the SAR-PPML 1stStep-ML presenting a lower absolute value for bias for lower and median levels of spatial dependence, while the SAR-PPML 1stStep-OLS appears to behave better for values of ρ closer to 1. It is worth noting that both estimators have lower bias, as an absolute value, to estimate the coefficient of the continuous variable than that of the dummy variable. Note, also, that when ρ increases, both estimators present a smaller absolute value of bias when using matrix W2 compared to matrix W1. Nevertheless, this difference is negligible, especially for a large n. Finally, the aspatial PPML estimator shows progressively worse results as ρ increases, as expected, being slightly better than the SAR methods when there is no spatial dependence (ρ = 0).
Concerning the bias of the spatial autoregressive coefficient, ρ, globally, the SAR-PPML 1stStep-ML shows better performance than the remaining estimators, especially when n is large. However, for ρ = 0.8 it shows a higher bias, in absolute value, particularly for the W2 matrix. Although slightly worse than the first, the SAR-PPML 1stStep-OLS presents satisfactory results, namely for high levels of spatial dependence. On the other hand, in general, the use of a special weighting matrix based on the inverse distance between locations produces higher bias when estimating the spatial autoregressive parameter.
Tables A2 and A4 in Appendix A show the results referring to RMSE. From a general point of view and regarding β 1 , the SAR-PPML 1stStep-ML presents the best results, particularly for W1. However, the SAR-PPML 1stStep-OLS produces a more desirable set of results for higher values of ρ. In both estimators, it is noted that, as the spatial dependence and the sample size increase, the RMSE decreases. This result is only slightly altered when ρ = 0.8. As expected, the aspatial ML estimator only shows satisfactory results when ρ = 0. As for β 2 , the conclusions are quite similar to β 1 , with the disclaimer that the RMSEs for this coefficient are much higher, especially when the sample size is small. Estimations involving the W1 matrix have slightly better results. Lastly, the aspatial estimator is, again, quite far from the results of the other estimators.
Both SAR-PPML estimators present quite similar results regarding the RMSE values for the estimation of the coefficient of spatial dependence, ρ, with the SAR-PPML 1stStep-ML showing better results as the sample size increases. It is also important to note that the SAR-PPML 1stStep-ML exhibits a higher RMSE for matrix W2 for high levels of spatial dependence when compared to SAR-PPML 1stStep-OLS. However, in general, the use of W1 seems to trigger better results.
In summary, these results are in line with those obtained in other simulation studies such as [4,12,22,24,25], suggesting the following conclusions. First, the estimator SAR-PPML 1stStep-ML presents the best performance, except for high spatial dependence, when ρ = 0.8. Keep in mind, however, that most empirical applications give low and median values for the spatial dependence parameter. Since this estimator does not rely on logarithmic transformation of the dependent variable and uses PPML regression instead of a loglinear estimation in the first step, this result seems to be in agreement with that found by [4]. Another interesting result is that there is a higher distortion for the estimated coefficient of the dummy variable compared to the estimated coefficient of the continuous variable, suggesting that the distribution of the explanatory variables can condition the performance of the estimators, a conclusion that is also made by [12]. Other similar conclusions between studies are the fact that the RMSE decreases as the spatial dependence and sample size increase, and that the spatial weighting matrix criteria influence the results. Several studies have already addressed this issue, such as [24], where the authors found that the RMSE of coefficient estimators appears to be generally higher for the spatial weighting matrix based on inverse distance, suggesting that the variance of the estimated coefficients may, somehow, be related to the density of the spatial weights matrix chosen. Another expected conclusion was the poor performance of the Aspatial PPML estimator in the presence of spatial dependence, which presented an accentuated upward bias for the coefficients of X 1 and X 2 . This result is in agreement with [26], who found biased and inconsistent estimators when spatial dependence was not taken into consideration. In addition, it is interesting to note that the distortion of results is more significant for values of ρ near 1, which is in line with the results of [22].
To assess the performance of both estimators under misspecification, a new design was considered where X 1 shows spatial dependence instead of being i.i.d. Therefore, X 1 was simulated according to the following spatial autoregressive process: The other variables were generated as before, with the coefficients retaining the same values. Estimation was implemented as if X 1 was i.i.d. (ignoring that it is spatially autocorrelated). The results obtained for 1000 replications, considering the spatial weighting matrix W1, are included in Table A7 for bias and in Table A8 for RMSE, while Tables A9 and A10 show, respectively, the bias and RMSE when the spatial weighted matrix is W2. Results show that ignoring spatial autocorrelation in the explanatory variable leads to noticeably higher bias and RMSE in the estimation of all parameters, especially in the estimation of the spatial autocorrelation coefficient. Both estimators show similar performance in estimating the coefficient of X 1 , whether the spatial matrix is based on the nearest neighbor criterion (W1) or the inverse distance (W2). The new estimator introduced, SAR-PPML 1stStep-ML, shows better performance than SAR-PPML 1stStep-OLS for the coefficient of X 2 when the spatial matrix is W1. The improvement in performance of SAR-PPML 1stStep-ML over the SAR-PPML 1stStep-OLS is especially visible in the estimation of ρ for both spatial weighting matrices.

Empirical Application
This section illustrates the usefulness of the SAR-E regression introduced in Section 2 by an empirical example that estimates a knowledge production function to explain the creation of innovation in European regions. For the sake of comparison, the estimator of [12] is also calculated.
Following the arguments of [27], the number of patents in a given region per million inhabitants (Pat) is used as a proxy for knowledge creation. See also [28] for a discussion on measuring innovation. The equation to be estimated is where x i is a vector with explanatory variables that will be introduced in Section 4.1, β is a vector of unknown coefficients to be estimated, ρ is the unknown spatial autocorrelation coefficient, and w ij are the elements of a spatial weighting matrix. In this empirical application, the spatial weighting matrix was calculated based on a queen contiguity criterion and is row normalized. All estimations were conducted using R [20]. The exploratory data analysis was performed according to QGIS [29] and GeoDa [30].

Data and Variables
The data were collected from Eurostat regional statistics. They contain data on 234 NUTS II regions from 24 European countries, of which 22 belong to the European Union, with the addition of the United Kingdom and Norway. NUTS is a nomenclature of territorial units for statistics developed and regulated by the European Union, defining a hierarchical system of regions with three different levels. At the top of the hierarchy are the NUTS 0 regions, referring to countries. The next level is NUTS 1, representing major socioeconomic regions within countries, followed by NUTS 2 regions, which are subdivisions of NUTS 1, and NUTS 3 regions, which are subdivisions of NUTS 2. All data refer to 2012. Regions with no neighbors were excluded (like Portuguese and French islands). Finally, NUTS II London (UK) and Centre (France), were discarded for incongruity of data. The list of countries in the database is in Appendix B.
The description of variables used in this study, together with the expected outcome of the estimated associated coefficient, can be found in Table 1. Since [31] introduced the knowledge production function, the use of variables related to R&D has become normal when modeling the creation of innovation. Following [27,28,[32][33][34] different impacts on the creation of knowledge from expenditure and human resources in R&D were considered according to its source (from the private and business sector, from government, or from universities). It is expected that more R&D expenditure, as well as more full-time R&D employees, will trigger an increase in knowledge creation. Therefore, the expected outcome of the estimated coefficients related to these variables should be positive. However, the literature suggests that this happens only for the R&D resources of the private sector. For both the public sector and universities, the effect of those variables often appears to be negative or statistically negligible (see, e.g., [27,33,35]). This behavior may be explained in the case of universities by the fact that their main contribution to knowledge creation arises in the form of scientific articles and not patents, while for the public sector it may be due to a certain inefficiency of public institutions in the production of knowledge (see [33,34]).
Three variables aiming to capture the effect of the "innovative environment" are considered. The first is the percentage of graduates in the population between 25 and 65 years old, proxying the level of education of the population in the region. The second is GDP per capita, which proxies the technological sophistication and the size of the economy. Finally, the third is the tuberculosis mortality rate, considered as a proxy for the level of poverty of the inhabitants, as several studies relate tuberculosis with poverty (see, for example, [33]). It is expected that a better socioeconomic environment will boost innovation (as in [33]). In addition, the number of inhabitants was defined as the control variable. Table 2 includes the descriptive statistics of the variables used in this study. Additionally, we note that 6% of the regions in the sample registered no patents.
The correlation matrix of these variables is shown in Table A11 of Appendix B. Pairwise correlations between explanatory variables do not exceed the common threshold of 0.8, as recommended in [36], which leads us to not anticipate collinearity problems in estimation.

Exploratory Spatial Analysis
Analyzing the spatial distribution map of the variable Pat per quartile in Figure A1 in Appendix B, we see the existence of a cluster effect, with a concentration of patenting taking place in Central Europe, Southern England, and Scandinavia, with the number of new patents in Southern and Eastern Europe being modest. On the other hand, Moran's I test for spatial autocorrelation, applied to patents, shows a value for the test statistic equal to 0.6045, with a p-value equal to 0.000, denoting evidence of positive spatial dependence. This conclusion is supported by the Moran diagram ( Figure A2 in Appendix B). Analyzing the latter, it is worth noting that most of the observations are in the 1st and 3rd quadrant, and therefore, the majority of regions with a higher (smaller) number of new patents have neighboring regions where this number is also higher (smaller). Analyzing the LISA3 indicators in Figure A3 of Appendix B, we see two high patent clusters in Central Europe and Scandinavia, together with low patent clusters in the Iberian Peninsula and Eastern Europe. It is also possible to identify two other clusters where patenting tends to be low, northern Britain and southern Italy. Figure A4 of Appendix B shows the LISA significance map, inferring that the results are more significant concerning the Central European, Iberian Peninsula, and Eastern Europe clusters.

Estimation Results
Equation (7) is estimated with the introduced SAR-PPML 1stStep-ML estimator. For comparative purposes, the results obtained with the alternative estimator from [12], the SAR-PPML 1stStep-OLS, are presented as well.
In SAR-PPML 1stStep-OLS an ad hoc constant (c = 1) was added when patents were 0. Table 3 includes the estimates for coefficients of the knowledge production function, together with their bootstrap standard errors. Bear in mind that, because both estimators use an explanatory variable that is the result of a fit obtained in the first step, the usual standard errors are not valid. The introduced PPML 1stStep-ML behaves better in terms of goodness of fit, with the value for the loglikelihood being noticeably higher.
For both estimations, the coefficient related to the spatially lagged variable is positive and significant (p-value < 0.01), thus inferring that there is a clear positive spatial dependence between regions regarding innovation creation, which matches the results of [34,37].
As for the remaining explanatory variables, the variable R&D_B appears to be significant at 1% in all estimates. In contrast, R&D_U is not significant, which can be explained by the fact that university contributions are mostly in the form of scientific articles and not patents, as mentioned before. On the other hand, R&D_G is significant at 10% in SAR-PPML 1stSep-ML; however, it presents a negative sign. These results converge with those of [32][33][34], which also disclose evidence of inefficiency in the use of R&D resources of the public sector. In addition, these authors also conclude that R&D expenditures in the private sector are more important to trigger knowledge creation than those from the public sector or universities. Notes: Standard errors were computed using the bootstrap method. Significance levels: * 10%, ** 5%, *** 1%.
The variables related to the "Innovative Environment," Educ and Pop, are not statistically significant, while GDP is statistically significant at 1% in both estimations, with a positive sign. Finally, the mortality rate appears significant at 5% only in the SAR-PPML 1stStep-ML, showing a negative sign. These results are in line with expectations, as greater technological sophistication is generally associated with lower levels of poverty and higher quality of life, which fosters the growth of innovation in a region. These results corroborate studies such as [28,33], the authors of which conclude that an "innovative environment" is important for increasing knowledge creation.
Given the nonlinearity of the model, it is through the average partial effects (APE) that it is possible to quantify the impact of the variation of the explanatory variables on the dependent variable, ceteris paribus. Given the autoregressive structure of the model, it is possible to measure the indirect partial effects, that is, spatial externalities, together with the direct ones. These are included in Table 4. Concerning the average direct effects, and the SAR-PPML 1stStep-ML, an increase of 1 percentage point in the tuberculosis mortality rate in the region results, on average, in a drop of 20.4 patents per million inhabitants in the same region. On the other hand, an increase in GDP per capita of just 100 euros in the region may trigger an increase, on average, of 0.4 patents per million inhabitants in the same region.
Regarding the variables of expenditure on R&D, these can present the most interesting results for economic decision makers. An increase of 10 euros per capita in a region in public R&D entities means, on average, a decrease of 2.25 patents in that region per million inhabitants. Keep in mind that the respective coefficient estimate is statistically significant only at 10%. Now, given the inefficiency inferred there, a policy maker may transfer the financial resources of these institutions to private R&D companies, since these, for each increase of 10 euros per capita in R&D expenses, trigger an increase of approximately one patent per million inhabitants in the same region. The spatial distribution map of the SAR-Poisson 1stStep-ML direct partial effect (DPE) per quartile related to the variable R&D_B is represented in Figure A5 in Appendix B. It is clear that the regions with the most efficient companies for transforming R&D expenses into patents are located in Central Europe, southern Great Britain, and Scandinavia. Therefore, regions in Eastern and Southern Europe should initiate reforms in the private R&D creation system, seeking an increase in efficiency. These reforms require the recruitment of more qualified personnel and investment in more sophisticated technology.
As for the indirect effects, the variables related to the "innovative environment" show higher indirect effects in absolute value than direct, showing that not only the socioeconomic situation of the region is central to the creation of knowledge, but also the interregional environment.
Concerning the R&D expenditure variables, investment in government R&D institutions also does not benefit neighboring regions in the knowledge creation process, since both the spill-out effect and spill-in effect are negative. On the other hand, investment in private R&D in one region will have a positive impact in neighboring regions: a variation of 10 euros per inhabitant in private R&D expenditure in all neighboring regions of i results in an increase of 1.74 new patents in region i. Conversely, an increase of 10 euros per inhabitant in expenditure on private R&D in region i results in an increase, on average, of 1.68 in all neighboring regions. These facts highlight the presence of knowledge spillovers between regions. Figures A6 and A7 in Appendix B represent the spatial distribution map of spill-in and spill-out effects per quartile obtained with the SAR-PPML 1stStep-ML, respectively, of the variable R&D_B. It can be concluded that, in addition to the Central European cluster, which shows a strong relationship in the creation of knowledge, regions in Southern and Eastern Europe, as well as some regions in Southern England, have a remarkable capacity for absorbing innovation. Regarding the spill-out effects, the Central European and Scandinavian clusters are the biggest "exporters" of knowledge spillovers. Interestingly, some regions that present a lower DPE with the investment in private R&D, as is the case in Eastern Europe and the North of the United Kingdom, present higher values of spill-out and spill-in. Therefore, one may conclude that, despite having a lower capacity for innovation, these regions have a strong interconnection, which leads to high levels of knowledge spillover. This can be explained by a possible commitment of companies to strong interregional cooperation links, so that investment in one company is positively reflected in the others. These links can be a strategy to overcome the difficulty of competing solo against regions with high numbers of patents. Therefore, political and economic decision-makers in regions with lower patent capabilities should create incentives for the creation of knowledge-sharing networks, thus enabling increased competitiveness.

Conclusions
Many applications in spatial econometrics concern the modeling of count outcomes or other nonnegative variables. This work proposes modeling such variables by a spatial autoregressive exponential (SAR-E) regression instead of using SAR loglinear models, in line with the reasoning of [4] in the context of cross-sectional data. A two-step PPML procedure for the SAR-E model is suggested that circumvents the problem of dealing with zeros. A simulation study verifies that the introduced estimator shows better performance than the previous estimation procedures, independently of the sample size, especially when the autoregressive coefficient is not close to 1, which is the case for many applications with economic data.
The usefulness of the proposed approach is illustrated in an empirical application to analyze the main determinants of knowledge creation and to quantify the spatial knowledge spillovers across different European NUTS II regions. There, evidence of the spatial dependence on the creation of innovation in Europe is found. In addition, it is inferred that social and economic factors determine the creation of knowledge, as is the case with quality-of-life standards and technological sophistication. It also appears that public R&D institutions are inefficient, unlike private institutions, with the latter being the major promoters of innovation creation in the analyzed regions. It is also inferred that an increase in R&D expenditure by private institutions positively influences the creation of innovation in neighboring regions. Given these results, it is possible to conclude that regions with low levels of knowledge creation try to overcome this limitation by strengthening relationships with neighboring regions, thereby increasing the absorptive capacity for innovation and creating strong clusters of knowledge sharing.
In the empirical study, there were some noteworthy differences in the results obtained with the method introduced in this paper to estimate the SAR-E, two-step PPML, and the existing method of [12] concerning both the statistical significance and magnitude of some coefficient estimates, namely the autoregressive parameter. Differences in the latter explain the visible differences in the indirect effects of variables, the spill-in and spill-out spillovers, obtained with the two methods, where spillovers obtained with the proposed method are higher in absolute value. These differences are not unexpected because the data for the response variable show a clear percentage of zeros. Given the results of the simulation study and the fact that the method introduced here is better able to handle zeros in the dependent variable, it is expected that it will deliver better estimates.

Conflicts of Interest:
The authors declare no conflict of interest.    Table A3. Bias: SAR-Poisson, SAR-LogLinear, and Aspatial ML Poisson with W2 for 1000 replicates.   Notes: (1) Bias is estimated as the average of 1000 simulation replicates of the difference between the parameter estimate and its true value.

Appendix A
(2) RMSE is estimated as the square root of the sum between the square of the bias and the empirical variance of the estimated coefficient calculated after 1000 replicates. (3) SAR-Poisson 1stStep-ML is estimated using a two-step process. In the first step, the unobservable conditional mean, µ, is estimated using a PPML regression, and in the second step, the coefficients β1, β2, and ρ are also estimated using a PPML regression. (4) SAR-Poisson 1stStep-OLS is estimated using a two-step process. In the first step, the unobservable variable Wlog(µ) is estimated using an OLS regression of Wlog(y), adding an ad hoc constant (c = 1) when y = 0 and, in the second step, the coefficients β1, β2, and ρ are estimated using a PPML regression. (5) W1 and W3 are contiguity matrices created using the nearest neighbor criterion, where it is computationally defined that each unit will have seven units as neighbors for W1 and four units as neighbors for W2, which are the closest. W2 is created based on an inverse distance criterion, using the Euclidean distance between each unit.

Appendix B. Countries in the Sample
The 234 NUTS II regions in the dataset used in the empirical application come from the following countries: Bulgaria, the Czech Republic, Denmark, Germany, Estonia, Ireland, Spain, France, Croatia, Italy, Latvia, Lithuania, Hungary, the Netherlands, Austria, Poland, Portugal, Romania, Slovakia, Finland, Sweden, the United Kingdom, and Norway. Belgium, Switzerland, and Greece were discarded given the considerable lack of data in several Nuts II from these countries.