The location pattern of any industry is the product of a large number of individual decisions. Industrial location analysis investigates these location decisions and seeks to detect the determinants that trigger and influence them. These determinants are generally referred to as location factors. A thorough understanding of the impact of location factors on firms’ location decisions and firm performance can have important implications for stakeholders. Managers and entrepreneurs can integrate valuable information into the decision-making process when choosing the location of a new venture [1].
Policy makers at the regional, national, and multinational level want to promote economic growth by developing the right location factors to create a beneficial environment for firms. The long-standing field of industrial location research [2] has brought forward a wide range of location factors which can be studied at different levels of geographic aggregation, from the immediate firm neighborhood to highly aggregated spatial units. However, the analyzed location factors may vary in direction and strength at different levels of analysis, and findings from aggregated spatial units vary depending on the spatial scale at which the analysis is conducted [3
]. This issue is generally referred to as the Modifiable Areal Unit Problem (MAUP), which is defined through a location, a scale and a shape dimension [4
]. The selection of the appropriate level of analysis is therefore crucial, especially in studies which evaluate public policies [7
], and must be based on reasonable and transparent assumptions.
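The scale dimension of the MAUP can be illustrated with a minimal sketch. The values below are hypothetical and are not taken from the study; `pearson` and `merge_adjacent` are illustrative helper names. The sketch merges neighboring cells into coarser units and shows that the correlation between a location factor and firm counts changes with the level of aggregation.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def merge_adjacent(values, k):
    """Sum every k consecutive cells into one coarser spatial unit."""
    return [sum(values[i:i + k]) for i in range(0, len(values), k)]


# Hypothetical values for eight neighboring cells along a transect.
factor_fine = [1, 2, 3, 4, 5, 6, 7, 8]   # some location factor
firms_fine = [2, 1, 4, 3, 6, 5, 8, 7]    # firm counts

r_fine = pearson(factor_fine, firms_fine)
r_coarse = pearson(merge_adjacent(factor_fine, 2),
                   merge_adjacent(firms_fine, 2))

print(f"fine scale:   r = {r_fine:.3f}")    # about 0.905
print(f"coarse scale: r = {r_coarse:.3f}")  # about 1.000
```

In this toy case the local disagreement between the two variables cancels out within each merged unit, so the coarser units suggest a stronger relationship than the fine cells do.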
Such assumptions rely on a thorough understanding of geographic aggregation effects on statistical inference. While the effects of non-geographic aggregation on inference are well studied in economics [9
], research on geographic aggregation is rather scarce. Amrhein [11
] finds that scaling has strong effects on regression coefficients and correlation statistics. However, it is unclear how robust these results are in an empirical setting as simulated data was used in this study. Arauzo-Carod et al. [12
] and Manjon-Antolin et al. [13
] find only minimal zonation effects on regression results. Briant et al. [14
] use administrative spatial units and gridding to assess both the scaling and shape dimensions of the MAUP. They find that the use of different spatial units results in different regression coefficients. Overall, the understanding of the MAUP in industrial location analysis remains incomplete, and Arauzo-Carod et al. conclude in their meta-study on industrial location research that “[…] the reported effects may not be robust to the use of alternative geographical units and the presence of spatial effects. In general, it is not clear what effects spatial aggregation and spatial dependence may have on the inference” [15
]. Most previous studies analyzed firm location patterns aggregated at rather crude spatial scales, such as counties or metropolitan areas, and thus there is a lack of understanding of location determinants at the microgeographic level. The varying direction and strength of location factors at different levels of aggregation may lead to superimposed location factors which are missed when aggregated geographic units are analyzed. Some relationships between location factors and firms which are relevant at the macro (aggregate) level may not hold at the micro level (ecological fallacy).
Suitable data for such a microgeographic analysis has become available only recently through the emergence of Volunteered Geographic Information (VGI) [16
] and the increasing availability of official (open) geodata [17
]. The OpenStreetMap (OSM) project is of particular interest in the context of firm location analysis as it goes beyond mapping ordinary road networks: The informal OSM standard contains hundreds of tags in over 25 categories and includes map features such as amenities and public transport stations [20
]. Up to now, only a few studies have exploited the potential of OSM in firm location analysis and geographic economic analysis in general [21
]. However, these studies did not use OSM in a large-scale spatial analysis but concentrated on single cities and a strongly limited set of location factors. Following the analysis of previous research efforts, the research questions for our work are defined as follows:
Are the effects of location factors, as reported by previous studies using aggregated spatial units, robust at the microgeographic level?
How does a firm location prediction model perform at the microgeographic level and to what degree does it provide valuable new insights into the firm allocation process? What are the distinct requirements for the data and the statistical model?
To answer the research questions above, we analyze firm location patterns at the microgeographic level using spatial firm-related data that are available in unprecedented detail compared with previous studies. We combine this unique data set of three million geocoded street-level firm observations in Germany with OSM data and other detailed geodata (population density, land cover, railway stations, education levels, life expectancy, and many others). We investigate whether findings from previous industrial location studies hold true at a small spatial scale, i.e., at fine spatial resolutions. In general, regular gridding reduces the bias induced by the use of predefined administrative units [24
]. In our study, we focus on the software industry, which is rather unrestricted in its location decisions [22
], inducing only little bias from unobservable location determinants.
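Regular gridding, as referred to above, can be sketched as follows. The sketch assumes that firm coordinates are already given in a projected coordinate system measured in meters; the coordinates and the function name `grid_counts` are hypothetical and do not reproduce the authors' processing chain.

```python
from collections import Counter
from math import floor


def grid_counts(points, cell_size_m=1000.0):
    """Count points per square grid cell.

    `points` holds (x, y) pairs in a projected CRS measured in meters
    (an assumption of this sketch). Cells are identified by their integer
    column and row index.
    """
    counts = Counter()
    for x, y in points:
        cell = (floor(x / cell_size_m), floor(y / cell_size_m))
        counts[cell] += 1
    return counts


# Hypothetical firm coordinates in meters.
firms = [(120.0, 80.0), (950.0, 999.0), (1000.0, 10.0),
         (2500.0, 1800.0), (-10.0, 5.0)]

cells_1km = grid_counts(firms)             # 1 km grid
cells_25km = grid_counts(firms, 25000.0)   # 25 km grid

print(dict(cells_1km))
print(dict(cells_25km))
```

The `Counter` only stores cells that contain at least one firm. For a count regression, the empty cells of the study area would have to be added explicitly with a count of zero.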
First, we investigate the software firm location pattern in an Exploratory Spatial Data Analysis (ESDA). We find that Poisson regression is likely to be an appropriate method to model the pattern of software firms aggregated to a regular 1 km grid, whereas negative binomial regression seems to be appropriate for higher levels of aggregation due to over-dispersion in the point pattern. Further, we find that software firms are an urban phenomenon, as they are disproportionately frequent in and around urban areas and even form statistically significant hotspots in some city regions. We further conclude that the regional settlement structure (polycentric vs. monocentric) seems to have an impact on the location pattern of software firms.
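A common first check for over-dispersion is the variance-to-mean ratio of the cell counts. Under a Poisson distribution the variance equals the mean, so a ratio close to 1 is consistent with a Poisson model, while a ratio well above 1 points towards alternatives such as the negative binomial. The following is a minimal sketch with hypothetical counts; it is not the diagnostic procedure used in the ESDA, and `dispersion_index` is an illustrative name.

```python
def dispersion_index(counts):
    """Variance-to-mean ratio of a list of counts (sample variance, n - 1)."""
    n = len(counts)
    mean = sum(counts) / n
    variance = sum((c - mean) ** 2 for c in counts) / (n - 1)
    return variance / mean


# Hypothetical firm counts for eight grid cells.
cell_counts = [0, 0, 1, 0, 2, 0, 0, 9]

ratio = dispersion_index(cell_counts)
print(f"variance-to-mean ratio: {ratio:.2f}")  # values well above 1 suggest over-dispersion
```

Note that this raw ratio ignores covariates. Dispersion that remains after conditioning on the location factors is what matters for the choice of the regression model.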
In a subsequent step, we construct a Poisson regression model to predict the number of software firms per 1 km grid cell using a large set of location factors. In the regression analysis, we include 24 different agglomeration, infrastructure, socio-economic, topographical, and amenity location factors. We interpret the estimated regression coefficients to deduce the relationships between the location factors and software firm counts. Due to identification limitations [25
] in our model, we abstain from claiming causal relationships and rather concentrate on the predictive performance of our model. However, by comparing our estimates with estimates from previous studies, we are able to discuss differences in the relationships between location factors and firm counts at different levels of geographic aggregation. We find that our model’s overall performance is good, as it is able to reproduce the software firm pattern to a high degree and yields reasonable coefficients, which are in line with prior research. Inter alia, we are able to show that regional population centrality (which we operationalize using the Urban Centrality Index [27
]) is a significant predictor of local software firm numbers at the microgeographic level. However, we also find that our model performs poorly in highly segregated cities with quarters characterized by populations with dissimilar socio-economic profiles. Due to data limitations, we are not able to capture this microgeographic heterogeneity in the population structure. When considered at the aggregate city level (25 km grid), this systematic prediction error is levelled out and the model yields systematic (spatially autocorrelated) errors in areas which were identified as software industry hotspots in the ESDA. This indicates that our model specification misses some crucial location factors present in these areas or that some of the model’s assumptions are violated (e.g., the independence between individual location choices).
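To make the regression step more concrete, the following sketch fits a Poisson regression with a log link, log(mu_i) = b0 + b1 * x_i, by Newton-Raphson iterations. It is a deliberately reduced illustration with a single hypothetical location factor and hypothetical counts, not the 24-factor specification described above. The function name `fit_poisson` and all data are assumptions made for this example.

```python
from math import exp, log


def fit_poisson(xs, ys, iterations=50, tol=1e-10):
    """Maximum likelihood fit of log(mu) = b0 + b1 * x via Newton-Raphson."""
    b0, b1 = log(sum(ys) / len(ys)), 0.0   # start at the intercept-only fit
    for _ in range(iterations):
        mus = [exp(b0 + b1 * x) for x in xs]
        # Score vector (gradient of the log-likelihood).
        u0 = sum(y - m for y, m in zip(ys, mus))
        u1 = sum(x * (y - m) for x, y, m in zip(xs, ys, mus))
        # Fisher information matrix [[i00, i01], [i01, i11]].
        i00 = sum(mus)
        i01 = sum(x * m for x, m in zip(xs, mus))
        i11 = sum(x * x * m for x, m in zip(xs, mus))
        det = i00 * i11 - i01 * i01
        # Newton step: inverse information times score.
        d0 = (i11 * u0 - i01 * u1) / det
        d1 = (i00 * u1 - i01 * u0) / det
        b0 += d0
        b1 += d1
        if abs(d0) + abs(d1) < tol:
            break
    return b0, b1


# Hypothetical data: one location factor and firm counts for six grid cells.
factor = [0, 1, 2, 3, 4, 5]
firm_counts = [1, 0, 2, 3, 6, 9]

b0, b1 = fit_poisson(factor, firm_counts)
predicted = [exp(b0 + b1 * x) for x in factor]
print(f"b0 = {b0:.3f}, b1 = {b1:.3f}")
```

In this parameterization, exp(b1) is the multiplicative change in the expected firm count per unit increase of the factor. In practice, an established statistics package would be preferable to a hand-written optimizer.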
In this paper, we presented a software firm location prediction model using Poisson regression and OSM data. We used a comprehensive dataset of three million street-level geocoded firm observations to explore the location pattern of software firms in an Exploratory Spatial Data Analysis (ESDA). Then, we used a variety of predictor variables to assess spatial factors that influence the location process of software firms. Our study shows that OSM can be used to construct location factors which are suitable for an encompassing microgeographic firm location analysis. Its coverage, completeness, and degree of detail make OSM a promising yet underused data source in the context of firm location analysis and geographic economic analysis in general, also because the data are easy to obtain for many parts of the world. We also highlighted further application opportunities for OSM and other VGI data (e.g., geocoded data from social network sites) in this context. Our research questions defined in the introductory section can be answered as follows.
6.1. RS1: Scale-Robust Location Factors
We found that the microgeographic level of analysis provides new insights into the firm allocation process, but also that most location factors are scale-robust. That is, our findings with respect to location factor effects are in line with prior research using aggregated spatial units. However, for a thorough understanding of MAUP scaling effects on the correlations between location factors and firm counts, our encompassing regression specification should be applied to different levels of geographic aggregation. Such an analysis could also investigate whether some location factors are more scale-sensitive than others and whether the chosen operationalization approach alters the estimated effect of the location factors (e.g., “proximity to universities” could be measured by a binary variable, a count variable, or a continuous distance variable; recent research indicates that distance-based methods may be scale-robust [89]).
6.2. RS2: Microgeographic Location Prediction
We demonstrated that our microgeographic prediction model is able to predict the location of software firms to a satisfactory degree, but it comes with particular requirements for the statistical model and the data employed in the analysis. The detailed level of geographic aggregation requires the researcher to employ a statistical model that is adapted to the specific requirements of the level of analysis. In our specific case, statistical over-dispersion is less problematic, whereas excess zeros are a serious issue. At the same time, our analysis requires high-resolution geodata, which may not be available in all domains. In our study, low-resolution geodata on socio-economic population characteristics lead to unobserved microgeographic heterogeneity within cities, causing systematic prediction errors.
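The excess-zeros issue mentioned above can be checked in a simple way by comparing the observed share of empty grid cells with the share that a Poisson distribution with the same mean would imply, which is exp(-mean). The sketch below uses hypothetical counts and an intercept-only comparison; it is not the authors' diagnostic, and `zero_shares` is an illustrative name.

```python
from math import exp


def zero_shares(counts):
    """Return (observed share of zeros, share expected under Poisson(mean))."""
    n = len(counts)
    mean = sum(counts) / n
    observed = sum(1 for c in counts if c == 0) / n
    expected = exp(-mean)
    return observed, expected


# Hypothetical firm counts: most cells are empty, a few hold several firms.
cell_counts = [0, 0, 0, 0, 0, 0, 3, 5]

observed, expected = zero_shares(cell_counts)
print(f"observed zero share: {observed:.2f}")
print(f"Poisson-implied zero share: {expected:.2f}")
```

An observed share far above the implied share is a sign of excess zeros. Zero-inflated and hurdle count models are commonly used in such cases, although the text above does not state which remedy, if any, was applied in the study.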