Geographical and Economic Factors Affecting the Spatial Distribution of Micro, Small, and Medium Enterprises: An Empirical Study of the Kujawsko-Pomorskie Region in Poland

Abstract: Micro, small, and medium-sized enterprises (MSMEs) are an essential part of economies at the national, regional, and local levels. Understanding the determinants of the development of this sector is interesting not only for researchers but also for local governments to support the development of this sector. This paper analyses micro, small, and medium enterprises at the gmina (local) level in one region, the Kujawsko-Pomorskie voivodship (NUTS2) in Poland. The authors use multivariate linear regression, spatial econometrics, and classification trees to model the influence of different factors on the number of enterprises relative to population size. The authors found that the most crucial factor in all cases, independently of the method used, is the local government's revenue from personal income tax per capita. This finding, together with the lack of significance of variables related to the distance to technological parks or economic zones, indicates that the enterprises in the region produce mainly for local consumption and lack innovativeness. The authors also examined the influence of spatial context on the number of enterprises. The most important factor seems to be the percentage of built-up areas, but there are also others, depending on the model type; again, this confirms the local character of the activity of micro, small, and medium enterprises in the region. Variables representing the spatial context can explain the relative number of enterprises with a coefficient of determination (R²) between 0.30 and 0.45, which shows that this context played a relatively significant role in the development of the MSME sector in the region. On the other hand, the econometric models (that include the neighborhood) are only significant (improving R²) for medium enterprises, which means that medium enterprises expand their activity beyond the local range.


Introduction
Micro, small, and medium-sized enterprises (MSMEs) are an essential part of economies at the national, regional, and local levels. They provide jobs, create investment opportunities, and develop the economic potential required for development. Therefore, the determinants of the development of this sector have been widely studied in the literature. As emphasized by [1], a region facilitates the formation and transmission of social capital as well as providing a geographic platform; geographical proximity may facilitate interorganizational learning, but it is neither a necessary nor a sufficient condition. Understanding the diverse patterns of geographical proximity at the local level is therefore an important factor contributing to the identification of possible solutions aimed at enhancing innovation.
The degree of urbanization affects localization economies. Spatial agglomerations create localization advantages in terms of spillovers and cooperation between firms. The benefit of urbanization comes, among others, from lower transport costs and proximity to suppliers and customers [10][11][12]. Access to customers can be proxied by the degree of urbanization. One of the important characteristics related to urbanization is access to transportation infrastructure, which has been found to have a significant effect on urban growth [13].
Firm size also matters. Companies of different sizes have different characteristics, with respect to both sectoral structure and the ability to innovate. Studies show that there is a negative relationship between firm size and employees' tendency to become entrepreneurs (e.g., [14][15][16]).
Ref. [17] shows that the occurrence of the small firm effect is closely related to low entry costs in industries typically populated by small firms. Not only do the industry entry conditions of small parent firms have a strong and positive impact on the likelihood of employees' entrepreneurial entry, but entrepreneurs spawned by small firms also show a strong preference for starting a business in the exact same industry as their previous employer. Additionally, evidence in the literature shows that there is a link between firm size and innovative performance [18][19][20]. Finally, smaller firms face more obstacles than larger firms with regard to financing, taxes and regulations, inflation, and anti-competitive practices. Small firms are affected most by these obstacles, followed by medium-sized and large firms [21]. Moreover, the analysis by firm size in [22] shows that industrial structure and corporate organization affect the benefits that arise from clustering within a given industry. This finding is strongest with regard to the size distribution of establishments; own-industry employment at small establishments presents a much greater attraction to potential new arrivals than does a comparable level of own-industry employment at larger establishments.
Multiple factors, both at the regional and local levels, affect the creation process of new firms, their survival, and their innovative capacity. The evidence in the literature indicates that these include, among others, population growth, a high percentage share in the population of people with high managerial and vocational skills, urban concentration, household wealth, and demand [23]. Factors that are conducive to the development of small and medium-sized enterprises (SMEs) differ locally. Evidence from Finland shows that urban and rural communities provide different environments for enterprise development, particularly with regards to human capital, access to technology, the number of cluster enterprises, and the intensity of communal cooperation [23]. However, in their analysis of SMEs' development in the UK, [24] conclude that business growth is possible under different territorial conditions, including different levels of competition and market demand between regions and differences in the occupational and skill structure of the labor market. Many SMEs in peripheral regions may actively work to develop strategies to overcome these constraints. Therefore, an initial locational disadvantage may ultimately benefit rather than inhibit a company's growth and performance. The growth of indigenous companies in peripheral regions results from more active and autonomous roles of the entrepreneurs as well as local governments in creating an entrepreneurship-friendly environment. In turn, this can contribute to the economic and social development of the local areas [25,26]. These considerations can also be placed in the context of the role of the MSMEs in the economic transition in Central and Eastern Europe (CEE). In recent years, there are three major spatial developments that can be observed in the CEE countries [27]. First, there are increasing differences in development between urban core regions that develop better than peripheral rural regions. 
Second, there are strong trends towards polarization between the main metropolitan area (usually the national capital) and the rest of the country. Third, there is an east-west gradient, with the western parts performing better than the eastern regions, which is also confirmed by [28]. As shown by [29], there is a risk that spatial development further concentrates in a smaller number of (metropolitan) regions, whereas more and more other regions might be affected by the processes of peripheralization. Ref. [30] shows that the regional inequalities in CEE countries are strong and persistent. Weaker CEE regions typically lost the greatest part of their industrial base which, being in capital-intensive sectors, was more exposed to international competition; as a result, they are more exposed to recession shocks, which gains importance given the recent COVID-19 pandemic situation. An analysis of the non-core regions in CEE by [31] shows that, after the financial crisis, the increasing number of SMEs, along with substantial R&D outlays and the development of human capital, were important stimuli for development. Analyses at the CEE intrastate level conducted by [32] show that while there was general catching-up with the EU-15 average by the state economies, there was also growing economic diversification between regions in the studied countries (internal divergence). The analysis of [33] indicates that, following Williamson's curve, disparities between regions are lower in the early stages of development, peak in the middle-income stages, and diminish again as a country becomes wealthy. Development policies must not focus extensively on the country as a whole but have to take into account the preferences and possibilities of their peripheral regions as well. To that end, [34] show that, when considering business stimulation policies, both the quantity and the quality of new firm start-ups should be taken into account.
The regional determinants of the growth of SMEs have also been investigated in Poland. According to [35], the human capital development, wages, unemployment, economic activity of the population, and disposable incomes are essential for the development of SMEs. The study of enterprises in southern Poland indicates that access to financial instruments at the regional level is an essential barrier to enterprise development [36]. Simultaneously, IT infrastructure, closeness to markets, suppliers, and cooperators are the most critical factors stimulating such development.
In this article, the authors focus on analyzing the local determinants of the spatial distribution of micro, medium, and small enterprises in the Kujawsko-Pomorskie region in Poland. This region for many years has lagged behind the economic development in the rest of the country. Its innovative capacity remains one of the lowest, both when taking into account human capital as well as investment in research and development [37]. Given these challenges, this article was developed as a part of action research activities that aim at co-learning leading to developing knowledge and understanding necessary to design effective policies aiming at boosting innovative MSME growth in the region [38] under the umbrella of the "REGIOGMINA" project, led by the regional authorities. As discussed by [39], the choice of incubation strategy depends on the characteristics of local areas. Therefore, providing knowledge on the differences and similarities between different municipalities is an important contribution to develop the regional strategy.
The number of micro, small, and medium-sized companies per 10,000 people of working age in the Kujawsko-Pomorskie region remains below the national average (Figure 1). In the case of micro and medium-sized companies, the gap between their number and the national average has widened in recent years, while in the case of small companies, it has remained stable. The distribution of the number of companies is asymmetric and the median number of companies is below average in the region. At the same time, there are several gminas with the number of companies per 10,000 people of working age distinctly above the regional average. As [24] emphasize, the close analysis of such "paradoxical" cases can open up new perspectives for regional and local policies.
The novelty of the authorial approach is threefold. Firstly, the authors focus on the lowest administrative level, gmina (meaning municipality in the urban context or commune in the rural context) as the main actor in creating a friendly environment for conducting business activity [40]. Secondly, the authors use a unique database of local-level enterprises obtained from an administrative register, namely the Social Insurance Institution database. Therefore, the authors use the best possible source of information on Poland's actual activity (and employment) of micro, medium, and small enterprises. Additionally, the authors use gmina-level data from the Local Data Bank of Statistics Poland. Thirdly, the authors also consider spatial factors to explain differences in the MSMEs' development in the Kujawsko-Pomorskie region. The two main research questions are:

• Which factors affect the current distribution of micro, small, and medium enterprises in the region?
• Are spatial factors important in explaining the local MSMEs' development in the region, and can they be successfully included in the analytical models?

In addition to a theoretical understanding of the factors determining the existing structure of micro, small, and medium companies, the authors also identify local areas with higher levels of entrepreneurship than is explained by the analyzed factors.

There are also differences in the sectoral structure of the MSMEs both depending on the size of the companies and at the municipal level. In larger firms, the share of industry is higher. There is also spatial variation: there is a clear gradient between the shares of firms in industry and services, which also depends on locality (Figure 2).

Figure 2. Sectoral distribution of enterprises by company size and municipality (source: own work based on Social Insurance Institution data). Panels: sectoral structure by enterprise size (NACE sections; NACE: Statistical classification of economic activities in the European Community); share of enterprises in services and industry in municipalities; ratio of enterprises (payers) in services to industry in the municipalities of the Kujawsko-Pomorskie region.

Data
The register of the Social Insurance Institution is the data source on the number of micro, small, and medium-sized enterprises at the gmina level. We define micro enterprises as those that have 1 to 9 employees, small as those with 10 to 49 employees, and medium as those with 50 to 249 employees. Employees are identified as those for whom the enterprises pay social insurance contributions. The information on the number of companies paying social insurance contributions was collected for December 2018 in the "REGIOGMINA" project. The use of the administrative register enables the identification of active companies based on their actual employment; therefore, it provides timely and accurate information on the size of the MSME sector and its structure at the local level. The analytical units are gminas in the Kujawsko-Pomorskie region. This choice forced the selection of explanatory variables that are available at this level of disaggregation. One should note that most of the statistical information on socio-economic development is provided at the powiat (district) (NUTS 4) or regional (NUTS 2) level.
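The size classes defined above, together with the per-10,000-inhabitant normalization that this study uses for its response variable, can be illustrated with a minimal sketch. The function names, the sample employee counts, and the population figure are hypothetical and serve only as an illustration; they are not taken from the register.

```python
def size_class(employees):
    """Return the MSME size class for a count of insured employees."""
    if 1 <= employees <= 9:
        return "micro"
    if 10 <= employees <= 49:
        return "small"
    if 50 <= employees <= 249:
        return "medium"
    return None  # 0 employees or 250+ fall outside the MSME definition


def enterprises_per_10k(employee_counts, population):
    """Count enterprises per size class and scale to 10,000 inhabitants."""
    counts = {"micro": 0, "small": 0, "medium": 0}
    for n in employee_counts:
        cls = size_class(n)
        if cls is not None:
            counts[cls] += 1
    return {cls: 10_000 * c / population for cls, c in counts.items()}


# Hypothetical gmina: six payers of contributions, 5,000 inhabitants.
rates = enterprises_per_10k([3, 7, 12, 60, 300, 1], population=5_000)
print(rates)
```

The enterprise with 300 employees is ignored because it exceeds the upper bound of the medium class.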
The response variable adopted in all the methods was the number of micro, small, and medium-sized enterprises (calculated per 10,000 inhabitants of the gmina; Figure 3). The Social Insurance Institution's registry has provided the number of enterprises. The authors used the registry data, as it includes up-to-date information on the number of people for whom an enterprise paid social security contributions in December 2018. This number differs from the data provided by Statistics Poland: the enterprises do not regularly update it, so the latter shows a higher number of companies.

ISPRS Int. J. Geo-Inf. 2020, 9, 426

The explanatory variables come from the following sources:
• Statistics Poland data, including population and labor market characteristics as well as community wealth, using tax income as a proxy:
o the number of people in gminas, including people in pre-productive, productive, and post-productive ages;
o the number of registered unemployed (total, men, women);
o the number of registered unemployed per 100 inhabitants (derived by the authors);
o gminas' own revenues from personal income taxes (PIT) per capita;
o local budgets' total expenditure per capita.
• Data from the local policy assessment supporting the entrepreneurship development, obtained from [41].

• Data from the Kujawsko-Pomorskie regional authorities: the number of projects supporting entrepreneurship development funded from the European Regional Development Fund (ERDF) and the amount of funding provided to these projects.

• Spatial data of the Head Office of Geodesy and Cartography from the Topographic Object Database with 1:10,000 Level of Detail regarding, in particular, the degree of urbanization. Following [42,43], we proxy the transportation infrastructure using the road density; we also take into account the characteristics of the land cover. Therefore, the authors include the following:
o land cover: built-up areas (PTZB class, namely land cover: built-up);
o land cover: agricultural areas (PTTR02, namely land cover: grassland and arable farming);
o land cover: orchards (PTUT03, land cover: permanent cultivation);
o transport network: roads (SKDR, namely transport route: roads);
o transport network: railway tracks (SKTR, namely transport network: a rail or tracks);
o location of large cities (from ADMS class, a territorial division unit: a town);
o the geometry of administrative areas (from ADJA class, an administrative division unit).
• Spatial data from other sources (Figure 4):
o Location of large enterprises in the voivodship, obtained using the ranking of the largest companies in Poland. Cooperation with large companies can be an essential lever for increasing the potential of smaller companies, especially in the innovative (technological and organizational) dimension. Large companies need smaller ones because they are more agile and can propose innovative solutions. MSMEs often better know the local markets on which they focus. Specialized MSMEs can meet the diverse and complex needs of large businesses [44].
o Distance to science and technological parks, obtained using the information provided by the website, "Invest in Kujawsko-Pomorskie". Science and technology parks create a base for the commercialization of scientific research, research cooperation, and knowledge transfer, which are vital for the development of MSMEs' innovation and entrepreneurship. These parks offer, among others, management support, training services, venture capital access, intellectual property consultations, and laboratory services [45].
o Distance to areas of the Pomeranian Special Economic Zone, obtained using the information from its website. Special Economic Zones are instruments that support the MSME sector. The zones assure favorable conditions for business activity and foreign investment. Foreign companies operating within the SEZ provide new business standards such as technology, experience in production processes, business contacts, and good practice in training employees, which are exceedingly significant to the development of the SME sector; they are also their primary source of new technologies [46].
o Distance to higher education institutions (HEIs) (source: National Court Register). HEIs are important knowledge alliance partners of the SMEs on the regional level; they constitute the source of tacit knowledge for innovative firms [47].
o Location of the A1 highway exits (source: General Director for National Roads and Motorways): the road network on the local level is vital for economic development at both the local and regional levels, as accessibility is one of the main deciding factors in the location of new businesses [48]. In Poland, as [49] found, the more significant the investment in regional transport infrastructure, including national, regional, and local roads, the more visible the financial and economic outcomes of SMEs.

Spatial data were not used directly; they were used to enrich the set of data on gminas directly with attributes constituting the formalization of the spatial context and spatial relations of or in the gmina. Consequently, there were additional attributes created, describing the following for each gmina:
• Percentage coverage of the gmina with a built-up area (Figure 3).
• Percentage coverage of the gmina with agricultural areas.
• Percentage coverage of the gmina with orchards.
• Road network density (Figure 3).
• Railway network density.
• Distances from the nearest of the following structures:
o Bydgoszcz or Toruń (largest cities in the region with administrative functions, referred to as "main cities" in Figures 2 and 3),
o another large town,
o Bydgoszcz, Toruń, or another large town,
o a university,
o a large company,
o a technology park,
o an economic zone,
o a highway entrance/exit,
o key road infrastructure in the voivodship (national roads).

Methodology
The study of factors influencing the number of MSMEs in a gmina was conducted using several methods:
• studying the correlation of variables,
• multivariate regression,
• models of spatial econometrics,
• classification trees.

First, the authors checked each potential explanatory variable's correlation to the number of micro, small, and medium enterprises separately. As in the more populated areas the number of MSMEs is obviously larger, the authors used in their analysis the number of enterprises relative to the population (that is, per 10,000 people).
The next step was to create classical multivariate regression models through two methods:
• forward selection: beginning with an empty model and adding further explanatory variables, starting from the one that affects the explained model the most;
• backward elimination: starting with a model with all the variables, then removing subsequent variables, starting from the variable with the least significance.
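Forward selection as described above can be sketched in a few lines. The text does not state the exact entry criterion the authors applied, so this sketch assumes a simple stand-in rule: a variable enters only if it raises R² by at least `min_gain`. The variable names (`pit`, `built`, `noise`) and the data are hypothetical, and the small OLS solver is written out only to keep the example self-contained.

```python
def ols_r2(columns, y):
    """R^2 of an OLS fit of y on the given columns plus an intercept."""
    n = len(y)
    X = [[1.0] + [col[i] for col in columns] for i in range(n)]
    k = len(X[0])
    # Normal equations (X'X) b = X'y stored as an augmented matrix.
    A = [[sum(X[r][i] * X[r][j] for r in range(n)) for j in range(k)]
         + [sum(X[r][i] * y[r] for r in range(n))] for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting
    # (assumes the columns are not perfectly collinear).
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(A[r][c]))
        A[c], A[p] = A[p], A[c]
        for r in range(k):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [a - f * b for a, b in zip(A[r], A[c])]
    beta = [A[i][k] / A[i][i] for i in range(k)]
    mean_y = sum(y) / n
    ss_tot = sum((v - mean_y) ** 2 for v in y)
    ss_res = sum((y[r] - sum(b * x for b, x in zip(beta, X[r]))) ** 2
                 for r in range(n))
    return 1.0 - ss_res / ss_tot


def forward_selection(candidates, y, min_gain=0.01):
    """Add the variable with the largest R^2 until the gain is too small."""
    selected, best_r2 = [], 0.0
    remaining = dict(candidates)
    while remaining:
        scores = {name: ols_r2([candidates[s] for s in selected] + [col], y)
                  for name, col in remaining.items()}
        name = max(scores, key=scores.get)
        if scores[name] - best_r2 < min_gain:
            break  # assumed stopping rule, not the authors' criterion
        selected.append(name)
        best_r2 = scores[name]
        del remaining[name]
    return selected, best_r2


# Hypothetical data: the response depends on pit and built, not on noise.
candidates = {
    "pit": [1, 2, 3, 4, 5, 6, 7, 8],
    "built": [3, 8, 1, 6, 2, 7, 4, 5],
    "noise": [5, 3, 8, 1, 9, 2, 7, 4],
}
y = [10 + 3 * p + 2 * b for p, b in zip(candidates["pit"], candidates["built"])]
selected, r2 = forward_selection(candidates, y)
print(selected, round(r2, 3))
```

Backward elimination follows the same pattern in reverse: start from the full set and repeatedly drop the variable whose removal costs the least, again under whatever significance rule is adopted.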
Subsequently, based on the sets of variables defined for the regression, spatial econometric models were built. Such models are also known as geographically weighted regression (GWR) and allow us to include the spatial heterogeneity of the variables in the analysis. In classical regression models, the influence of the spatial neighborhood is omitted, although it can have a significant impact on the explained variable. GWR models are used in particular in the modeling of economic indicators (e.g., the unemployment rate [50]).
The authors built four GWR models, which differ with regard to the type of model and the matrix of weights used. Regarding the type of model:
• a spatial lag model: assumes the influence of explanatory variables of neighbors on the response variable;
• a spatial error model: assumes the relationship between the model error of neighbors.
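For orientation, the two model types are commonly written in the spatial econometrics literature as shown below. The paper does not state its equations, so these are the standard textbook forms and may differ from the exact specification the authors estimated; in particular, the textbook spatial lag model places the spatially lagged response on the right-hand side. Here $y$ is the vector of the response variable over gminas, $X$ the matrix of explanatory variables, $W$ the matrix of weights, $\rho$ and $\lambda$ the spatial parameters, and $\varepsilon$ an independent error term.

```latex
% Spatial lag model: the response depends on the neighbors' responses.
\begin{equation}
  y = \rho W y + X\beta + \varepsilon
\end{equation}

% Spatial error model: the disturbances of neighbors are related.
\begin{equation}
  y = X\beta + u, \qquad u = \lambda W u + \varepsilon
\end{equation}
```

When $\rho = 0$ or $\lambda = 0$, both forms reduce to the classical regression model $y = X\beta + \varepsilon$.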
Regarding the method of determining the matrix of weights (defining the neighborhood) that determines the influence of particular municipalities' values on each other:
• a neighborhood-based matrix: a common border between the gminas indicates that a neighborhood exists;
• a distance-based matrix: the neighboring degree is inversely proportional to the distance between gminas.
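The two ways of defining the neighborhood can be illustrated with a small sketch. It assumes that distances are measured between gmina centroids and that each row of the matrix is standardized to sum to 1; neither detail is stated in the text. The three gminas and their coordinates are hypothetical.

```python
import math


def row_standardize(W):
    """Scale each row to sum to 1 (rows with no neighbors stay all-zero)."""
    out = []
    for row in W:
        s = sum(row)
        out.append([w / s if s else 0.0 for w in row])
    return out


def contiguity_weights(n, borders):
    """Neighborhood-based matrix: 1 if two units share a border, else 0."""
    W = [[0.0] * n for _ in range(n)]
    for i, j in borders:
        W[i][j] = W[j][i] = 1.0
    return row_standardize(W)


def inverse_distance_weights(centroids):
    """Distance-based matrix: weight proportional to 1 / distance."""
    n = len(centroids)
    W = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                W[i][j] = 1.0 / math.dist(centroids[i], centroids[j])
    return row_standardize(W)


def spatial_lag(W, values):
    """Weighted average of the neighbors' values for each unit."""
    return [sum(w * v for w, v in zip(row, values)) for row in W]


# Three hypothetical gminas in a row: 0 borders 1, and 1 borders 2.
W_border = contiguity_weights(3, [(0, 1), (1, 2)])
W_dist = inverse_distance_weights([(0.0, 0.0), (10.0, 0.0), (20.0, 0.0)])
lagged = spatial_lag(W_border, [100.0, 200.0, 300.0])
print(lagged)
```

With the border-based matrix, gmina 0 has a single neighbor, whereas the distance-based matrix gives it two neighbors with weights 2/3 and 1/3, so distant gminas still contribute a little.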
In studying correlation and in each of the regression and spatial econometric models, the authors use a significance level of 0.05 (results with p < 0.05 are treated as significant).
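How a single correlation can be checked against the 0.05 level is sketched below. The text does not say which test the authors used; this sketch assumes Fisher's z-transformation with a normal approximation, chosen only because it needs nothing beyond the standard library (a t-test on r is the more common choice and gives similar answers for samples of this size). The data are hypothetical.

```python
import math
from statistics import NormalDist


def pearson_r(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def fisher_p_value(r, n):
    """Two-sided p-value for H0: rho = 0 via Fisher's z (requires |r| < 1)."""
    z = math.atanh(r) * math.sqrt(n - 3)
    return 2.0 * (1.0 - NormalDist().cdf(abs(z)))


# Hypothetical explanatory variable and response for ten gminas.
x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
y = [2, 1, 4, 3, 6, 5, 8, 7, 10, 9]
r = pearson_r(x, y)
p = fisher_p_value(r, len(x))
print(round(r, 3), p < 0.05)
```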
The next step involved making models that explain the number of micro, small, and medium enterprises per person in particular gminas using classification trees (Classification And Regression Trees, CART) [51,52]. A significant advantage of this type of model is that, unlike linear regression or the spatial econometrics based on it, there are no preliminary assumptions about the linearity of the model. It is also possible to use explanatory variables at various levels of measurement (variables at the nominal and ordinal levels were added to the previously used variables).
However, the use of classification trees required the discretization of the response variable. The authors decided on the discretization into three classes (low, medium, high) based on the Jenks natural breaks classification method. While this approach results in a loss of informational content of the data (a "downgrade" of the level of measurement), it does produce higher readability and a more straightforward interpretation of the models created this way. The obtained results can be presented in the form of several logical IF-THEN conditions.
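The discretization step can be sketched as follows. The Jenks method looks for class boundaries that minimize the squared deviations within classes; this sketch finds the same optimum by exhaustive search over all boundary pairs, which is a simplification chosen for brevity and not necessarily the implementation the authors used. The enterprise rates are hypothetical.

```python
from itertools import combinations


def sum_sq_dev(values):
    """Sum of squared deviations from the mean of a non-empty list."""
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values)


def natural_breaks(values, k=3):
    """Upper bounds of the first k-1 classes minimizing within-class
    squared deviations (exhaustive search over contiguous splits)."""
    data = sorted(values)
    n = len(data)
    best_cost, best_cuts = None, None
    for cuts in combinations(range(1, n), k - 1):
        bounds = (0,) + cuts + (n,)
        cost = sum(sum_sq_dev(data[a:b]) for a, b in zip(bounds, bounds[1:]))
        if best_cost is None or cost < best_cost:
            best_cost, best_cuts = cost, cuts
    return [data[c - 1] for c in best_cuts]


def classify(value, breaks, labels=("low", "medium", "high")):
    """Assign the label of the first class whose upper bound covers value."""
    for upper, label in zip(breaks, labels):
        if value <= upper:
            return label
    return labels[-1]


# Hypothetical numbers of enterprises per 10,000 inhabitants.
rates = [110, 120, 125, 240, 250, 260, 480, 500]
breaks = natural_breaks(rates)
classes = [classify(v, breaks) for v in rates]
print(breaks, classes)
```

The resulting labels are what a classification tree would then predict, and each path through the tree reads as one IF-THEN condition.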
Before determining the trees, the authors checked the validity of the available predictors using the chi-square test to evaluate the results. The five most significant predictors (determined independently for each model), a breakpoint condition based on the Gini index, and a minimum number of nodes of 5 were employed to construct CART.

Correlation
Table 1 shows the correlation results. For any of the types of enterprises, there was no significant correlation between the area of the orchards, the distance from large cities (excluding voivodship cities), the distance to the A1 highway exit, the key road infrastructure in the voivodship, or the expenditure per capita in the gminas.
The strongest correlation with the response variables was found for the gmina's own revenue from PIT per capita, which indicates the wealth of the inhabitants of a given gmina. For all enterprises, there are strong and positive correlations between the percentage of built-up area and the density of the road network. Indirectly, both of these parameters show the urbanization level of a given gmina. The fewest significant variables were found for small enterprises.
However, one should note that the correlation values calculated in this way only indicate the relationship (or lack thereof) between pairs of variables, thus providing very general and limited information. One cannot exclude the existence of more complicated relationships, where, e.g., a combination of two variables that have insignificant correlations with a dependent variable will correlate with it significantly. Therefore, in the subsequent part of the study, regression models, spatial econometrics, and classification trees were created using more variables at the same time.

Table 1. Explanatory variables and their correlation with the response variables (the number of enterprises of a given size per 10,000 inhabitants of the gmina). Statistically significant values (p < 0.05) are in bold and with an asterisk (source: own analysis).

Regression with Economic and Spatial Explanatory Variables
The authors created a series of regression models using forward selection and backward elimination methods based on the numeric variables listed in Table 1. Table 2 shows a summary of the results obtained.
In general, models created using the forward selection method had fewer variables (and thus were simpler). At the same time, they had only a slightly lower coefficient of determination, and there was no problem with the mutual correlations of the explanatory variables. However, heteroscedasticity occurred, which indicates that the factors not explained by the model are not random. The models for small enterprises were an exception to this rule: there, backward elimination left a model with a single explanatory variable.
It was possible to explain the most variability (around 2/3) for the smallest (micro) enterprises. The models fit slightly worse for small enterprises (around half of the variability), while models for medium-sized enterprises had the lowest coefficient of determination R² (improved by using spatial econometrics; see Section 3.3).
A gmina's revenue from personal income tax played a vital role in all the models made for every size of the enterprise; it positively correlated with the number of enterprises. The percentage of the built-up area in a gmina was another frequently occurring variable. As for the other variables, the models most often differ according to the enterprise's size and the way a model has been created. The standard errors of the forward selection models ranged from 18% of the mean value for micro-enterprises through 26% for small enterprises to nearly 47% for medium enterprises. Figure 5 presents the number of enterprises per 10,000 inhabitants, projected according to these models and the residuals from regression models as a percentage of the expected value. These results identify the local communities, where the relative size of the analyzed group of enterprises is significantly above or below the level explained by the model. The further analysis of the local determinants might indicate which conditions accelerate or hinder entrepreneurship development.
Additionally, the authors performed regression models based solely on spatial variables. As a result, it was possible to check the extent to which the number of enterprises relative to population can be explained without referring to economic data. The authors used the backward elimination method; Table 3 presents the results. It is worth noting that, despite the use of a minimal set of output variables, the R² coefficient dropped by about 1/3 and remains between 0.31 and 0.45, depending on the enterprise's size. Thus, it is possible to explain this part of the variability by analyzing only the spatial environment of a given gmina, without information on, e.g., its own revenue from PIT per capita (although it was the most significant of the correlates). It is also important to note the differences between the models for different sizes of enterprises. For micro and small enterprises, the density of the road network, the distance to the highway interchange (positively), and the density of the rail network (negatively) were relevant. For medium-sized enterprises, it was the percentage of built-up and agricultural areas. The proximity of a large urban center (including the voivodship cities) had a positive impact on the number of small and medium enterprises. Figure 6 shows the values predicted by the models based solely on spatial variables and their residuals as a percentage of the predicted value.
In all the regression models, the negative residuals were much larger than the positive ones. Positive residuals rarely exceeded 50% of the expected number of enterprises (for the models in Figure 5, only for medium enterprises). At the same time, negative residuals (that is, underestimating the number of enterprises per capita) appeared more often, and a few isolated extreme outliers occurred.

Spatial Econometrics
For most models, the spatial econometric models (both lag and error) were not significant (regardless of the neighborhood matrix used). The models for medium-sized enterprises were an exception: in most cases, it was possible to improve the model prediction using information about the gmina's neighborhood (already indicated in Tables 2 and 3). In all the cases, models using a matrix of weights based on the distance between gminas worked slightly better (Table 4, Figure 7).
It is worth noting that the use of spatial econometric models made it possible to increase the R² coefficient for medium-sized enterprises: for complete models, from 0.43 to as much as 0.50; for models based solely on variables related to the gmina's environment, from 0.39 to 0.46.

What is interesting is the opposite (but comparable in strength) influence of two factors that would seem very similar: distance from large cities (including Bydgoszcz and Toruń) and distance from universities. It is interesting because the universities are in Bydgoszcz and Toruń. However, replacing these two oppositely acting variables with a single variable that measures the distance to large cities excluding these two centers does not bring the desired effect: the variable D_Cities remains insignificant in all the variants.
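The distance-based weight matrix used in the spatial lag and error models can be sketched as follows. The inverse-distance form, the row standardization, and the coordinates are assumptions made for illustration; the text states only that the weights are based on the distance between gminas.

```python
import math


def inverse_distance_weights(coords):
    """Row-standardized inverse-distance weights (assumed functional form)."""
    n = len(coords)
    w = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                w[i][j] = 1.0 / math.dist(coords[i], coords[j])
        total = sum(w[i])
        w[i] = [v / total for v in w[i]]  # each row sums to 1
    return w


def spatial_lag(w, values):
    """Weighted average of the neighbours' values for every unit (W y)."""
    return [sum(wij * v for wij, v in zip(row, values)) for row in w]


# Hypothetical gmina centroids (km) and enterprise rates.
coords = [(0.0, 0.0), (3.0, 4.0), (6.0, 8.0)]
w = inverse_distance_weights(coords)
lag = spatial_lag(w, [10.0, 20.0, 40.0])
```

A spatial lag model adds `lag` as an extra regressor, so each gmina's value is partly explained by the distance-weighted values of the others.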

Classification Using Classification and Regression Trees (CART).
The number of enterprises (micro, small, and medium) per 10,000 inhabitants served as response variables in the multivariate regression model. To make the results obtained with the CART-type trees comparable with the visualization of the multivariate linear regression results, the same range limits were used to divide the data into three classes (Table 5, Figures 3-7). The classes defined in this way were labeled low, medium, or high.
When explaining the number of micro enterprises, the analysis of the significance of predictors (Figure 8) indicates that the following are crucial in explaining the dependent variable: the gmina's own revenue from PIT per capita, the distance from large enterprises, the number of inhabitants in a given gmina, the percentage of built-up area, and the density of the road network.
The developed CART model uses only three of these decision variables, and it is easy to interpret. For instance, the properties of leaf ID = 6 (Figures 9 and 10), in which gminas with a large number of micro-enterprises in relation to the number of inhabitants predominate, mean that micro-entrepreneurship in a given gmina is high if its own revenue from PIT per capita is higher than EUR 175.2 (PLN 746.8; we used the average exchange rate from 2018, which is EUR 1 = PLN 4.2617) and the distance from large enterprises is smaller than 23.6 km. These data confirm the conclusions from the regression analysis; they also indicate the role of large enterprises in entrepreneurship development at the micro-level, which conforms to the conclusions from the literature.
Figure 11 shows the results of the classification carried out by the CART decision tree. Figure 10 also indicates the leaf number in the CART tree used to classify particular gminas. The spatial distribution of the CART results is similar to that in the source expert classification (see Figure 3). However, one should emphasize that the added value of using the decision tree is the assignment of particular gminas to a CART leaf identifier, which enables the explanation of the classification results (Figure 10). The hachure (Figure 11) indicates incorrectly classified gminas; there are 43 such gminas, representing 29.9%. This error arises from including only three explanatory variables in the decision tree; within individual decision classes, it assumes the values given in Table 6.
Figure 11. The results of CART classification and the prediction errors (source: own analysis).
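The leaf 6 rule and the figures quoted for the micro-enterprise tree can be checked with a few lines of arithmetic. This sketch reads the quoted rate as EUR 1 = PLN 4.2617, the reading consistent with PLN 746.8 being about EUR 175.2. The total of 144 gminas is inferred from the reported counts and percentages, and the function names are hypothetical.

```python
EUR_TO_PLN = 4.2617  # average 2018 rate quoted in the text (EUR 1 in PLN)
TOTAL_GMINAS = 144   # inferred from the reported counts and percentages


def pln_to_eur(amount_pln):
    """Convert PLN to EUR at the quoted 2018 average rate."""
    return amount_pln / EUR_TO_PLN


def is_leaf_6(pit_per_capita_eur, dist_large_enterprise_km):
    """Rule for leaf 6: a high relative number of micro-enterprises."""
    return pit_per_capita_eur > 175.2 and dist_large_enterprise_km < 23.6


threshold_eur = round(pln_to_eur(746.8), 1)
misclassified_share = round(100 * 43 / TOTAL_GMINAS, 1)

# Two hypothetical gminas: one inside the leaf, one too far from large enterprises.
inside = is_leaf_6(180.0, 10.0)
outside = is_leaf_6(180.0, 30.0)
```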
A similar model using CART decision trees was developed for small enterprises (Figures 12 and 13). In this model, as in the classification of micro enterprises, the dominant explanatory variable is the gmina's own revenue from PIT per capita. The second independent variable indicates the significance of distances from large companies located in the Kujawsko-Pomorskie region with which MSMEs may cooperate. In this model, the CART tree was created using only two independent variables; thus, the resulting model is even more straightforward and interpretable than the model for the micro-enterprises.
The results of the classification carried out by the complete CART decision tree indicate that:
• entrepreneurship in a given gmina is high if the revenue from personal income tax per person in this unit is higher than EUR 192.1 (leaf 7);
• entrepreneurship in a given gmina is average if the own revenue from PIT per capita in this unit is lower than EUR 102.8 and the distance from large companies is smaller than 15.7 km (leaf 4), or the revenue is higher than EUR 102.8 (although lower than EUR 182.1) (leaf 6);
• entrepreneurship in a given gmina is low if the own revenue from PIT per capita in this unit is lower than EUR 103.0, and the distance from large companies is greater than 15.7 km (leaf 5).
Figure 11 shows the results of the classification carried out by the CART decision tree for small enterprises. The visualization of the spatial distribution makes it possible to compare and evaluate the classification results; it also shows that the model predicts the highest number of small enterprises in the central part of the region, between Bydgoszcz and Toruń.
The number of incorrectly classified gminas is 46, representing 31.9% (for the error distribution among classes, see Table 6). The prediction results indicate a significant overestimation of the number of "medium" gminas in relation to the source expert classification (92 gminas compared to 68) and an underestimation of the number of "low" gminas (23 compared to 41). It is worth emphasizing that the CART classifier does not generate so-called "gross" errors: none of the gminas defined as "low" in the expert classification were classified as "high" or vice versa.
Finally, the model classifying gminas in relation to the number of medium-sized enterprises per 10,000 inhabitants was estimated. In this case, apart from the own revenue from PIT per capita, the keys to explaining the dependent variable are the road density, the built-up density, and the type of gmina: urban, urban-rural, or rural (Figure 14). The model results indicate that a low revenue from PIT per capita translates into a low number of medium-sized enterprises in the gmina. On the other hand, a high revenue from PIT per capita (over EUR 112.8 per person) combined with the gmina's high level of urbanization (measured by a percentage of built-up area above 7.2%) means a higher number of enterprises (Figure 15).
The obtained results indicate that the CART decision tree significantly underestimates the number of gminas with a high level of entrepreneurship development in medium-sized enterprises (17 compared to the 28 obtained in the expert classification) and overestimates the number of gminas classified at the "medium" level (75 compared to 59). One should principally focus on the CART model's underestimation of entrepreneurship development in the central area of the voivodship near Bydgoszcz and in the northern region (see Figure 11).
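The error summaries reported for the CART models (the share of misclassified gminas and the absence of "gross" errors, i.e., "low" classified as "high" or vice versa) can be computed from the expert and predicted labels. The sketch below uses a small hypothetical label set, not the study data, and the function name is illustrative.

```python
ORDER = {"low": 0, "medium": 1, "high": 2}


def classification_errors(expert, predicted):
    """Return (misclassified count, gross error count, misclassified %).

    A gross error is a jump of two classes: low <-> high.
    """
    pairs = list(zip(expert, predicted))
    wrong = sum(1 for e, p in pairs if e != p)
    gross = sum(1 for e, p in pairs if abs(ORDER[e] - ORDER[p]) == 2)
    return wrong, gross, 100.0 * wrong / len(pairs)


# Hypothetical labels for six gminas.
expert = ["low", "low", "medium", "medium", "high", "high"]
predicted = ["medium", "low", "medium", "medium", "medium", "high"]

wrong, gross, pct = classification_errors(expert, predicted)
```

Here both errors move a gmina into the neighbouring "medium" class, which mirrors the pattern of an inflated "medium" class without gross errors.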
The CART model based on a larger number of explanatory variables (one that takes into account the road network density, the distance from large companies in this voivodship, and the number of inhabitants of individual gminas) yields an almost 60% improvement in the quality of classification at the expense of doubling the complexity of the decision model.

Discussion
From the wide range of generalized regression models, the authors used three: a multivariate linear regression model, spatially weighted regression models (spatial econometric models), and CART nonlinear classification trees. Using more diverse regression models, both parametric and non-parametric, would yield more varied results. However, the purpose of this research is not to assess the adequacy or effectiveness of different generalized regression models, but to analyze the development of MSME entrepreneurship in the Kujawsko-Pomorskie region. The results indicate that it is possible to develop a relatively reliable model for estimating the level of entrepreneurship development, explaining as much as 2/3 of the variability in the phenomenon (for micro-enterprises), using simple multivariate linear regression models and several explanatory variables. In some models, accounting for the spatial distribution of the phenomenon through the spatial weight matrix increases the coefficient of determination, R², by several percent. It is possible to obtain similar results using non-parametric models, e.g., CART nonlinear regression models. This article is limited to the use of such classification models, which facilitate the creation of conceptually simple models that explain the spatial differentiation of entrepreneurship development in a voivodship. Thanks to CART classification trees, it was possible not only to extract the decision variables critical for understanding the phenomenon, but also to formulate the decision rules explicitly.
The conducted analyses show a significant correlation between the level of entrepreneurship development, measured by the number of enterprises per 10,000 inhabitants, and the gminas' own revenues from the PIT per capita.
A gmina's own revenue from PIT per capita is a proxy for household incomes at the local level and reflects the potential consumer demand. This result can indicate that the micro, small, and medium enterprises in the region belong to a relatively low tier focused on producing for local consumption, following the grouping proposed by [53], which divides SMEs in developing countries into three groups. At the lowest tier are small companies that produce for local consumption. The medium-tier companies are better endowed (in capital and skills) and can generate an investible surplus and produce, either directly or on contract, for the domestic and, often, export markets. The third tier includes technically innovative firms that maintain high quality, are capable of entering export markets, and aspire to grow. The relatively low tier of development of this sector in the analyzed region is also confirmed by the lack of significance of variables related to the distance to technological parks or higher education institutions. Furthermore, the CART classification indicated that the distance to large companies is significant for the density of micro and small companies at the local level, while it is not significant for medium enterprises. Thus, this confirms that local factors, such as consumption needs and cooperation between these companies and large enterprises, determine the activity of micro and small enterprises.
Moreover, the authors observed a substantial significance of variables that formalize the spatial context, e.g., the percentage of built-up areas in gminas, the density of the road network, or distances to some structures. The presence of such variables in both models shows the importance of the spatial context to entrepreneurship development. This is essential because economic analyses usually do not include these variables in such a comprehensive manner. However, this study shows that it is necessary to take into account an area's spatial characteristics when determining the possibilities of entrepreneurship development. Therefore, one should consider enriching the standard data used in economics with information on the spatial context as an essential step in the construction of econometric models, especially since the models based solely on spatial variables used in this study explain between 30 and 45 percent of the variability in the number of enterprises.
Another way of considering the spatial nature of data is to use spatial econometrics instead of classical regression models. These models prove useful when the residuals from the classical regression models are spatially autocorrelated (that is, the residuals are not randomly distributed in space). In this research, the only enterprises for which this phenomenon occurred were medium-sized enterprises, and spatial econometric models were constructed for them. This may be because the business activities of larger enterprises cover a larger area. Thus, their surroundings in neighboring gminas have a more significant impact on the businesses; in the classification proposed by [53], medium-sized enterprises belong to the group with a range beyond the local market. In all of the cases, it was better to use a matrix of weights based on the distances between gminas rather than on their neighborhood, which confirms that space and distance, not administrative divisions, are essential. This is consistent with analyses of the labor market areas in Poland that form beyond the administrative borders [40,54]. The use of spatial econometric models in place of classical regression models for medium-sized enterprises increased the R² coefficient and improved the model's quality of explaining the variable.
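Spatial autocorrelation of regression residuals is commonly measured with Moran's I. The sketch below is a plain implementation of that standard statistic; the text does not state which test the authors applied, so its use here is an assumption, and the weights and residuals are hypothetical.

```python
def morans_i(values, w):
    """Moran's I: (n / S0) * sum_ij w_ij z_i z_j / sum_i z_i^2,
    where z are deviations from the mean and S0 is the sum of all weights."""
    n = len(values)
    mean = sum(values) / n
    z = [v - mean for v in values]
    s0 = sum(sum(row) for row in w)
    num = sum(w[i][j] * z[i] * z[j] for i in range(n) for j in range(n))
    den = sum(zi * zi for zi in z)
    return (n / s0) * num / den


# Four hypothetical gminas in a row; neighbours share a border (binary weights).
w = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]

clustered = morans_i([1.0, 2.0, 3.0, 4.0], w)     # similar values adjoin
alternating = morans_i([1.0, -1.0, 1.0, -1.0], w)  # opposite values adjoin
```

Positive values indicate that similar residuals cluster in space, which is the situation in which a spatial lag or error model is worth fitting.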

Conclusions
The results presented in the paper indicate that spatial development largely determines the local development of micro, small, and medium enterprises. The number of companies at the local level is strongly correlated to the population size. Therefore, the analysis focused on the relative number of companies compared to the population size.
The authors' analysis indicates that the type of entrepreneurship observed in the Kujawsko-Pomorskie region mainly focuses on meeting the demand of local consumers and large companies, as there are more enterprises in those localities where the personal income taxes paid are higher, but also in those with a smaller distance to large enterprises. Furthermore, the results show that the existence of technological parks or special economic zones does not play a significant role in the current structure of the MSME sector in the Kujawsko-Pomorskie region. This may indicate that micro, small, and medium companies focus on local consumption. The further development of this sector requires providing the companies with the capital and skills needed to produce for broader markets (both domestic and foreign) and, ultimately, to move towards a high level of technical innovation with a high potential for growth and international competition. Thus, regional policies should focus on providing access to financial instruments and investing in the qualifications of the current and future workforce, including vocational or higher education that recognizes innovative skills.
The authors have also confirmed that the spatial context matters; models that include only geographical variables explain a large share of the variance related to the development of micro, small, and medium enterprises. In the case of micro-enterprises, it correlates with local factors, while in the case of medium-sized enterprises a more comprehensive (geographical) context is essential.
While the presented models do not indicate causality, they can direct the further monitoring of the MSME sector. The results show that there is a group of communities that enjoy a much higher or much lower enterprise development than is explained by the proposed models. These gminas should be further analyzed (using qualitative methods) to identify the factors that stimulate or hinder entrepreneurship development. This may inform the framing of regional policies focused on the development of the MSMEs in the Kujawsko-Pomorskie voivodship.
One should emphasize that conducting comprehensive analyses into the differentiation of entrepreneurship's spatial distribution in particular gminas of the Kujawsko-Pomorskie region required the integration of descriptive data collected by various institutions, as well as the consideration of spatially localized information. The use of the BDOT 10k Topographic Object Database and other spatial data sources (e.g., the location of special economic zones or technology parks) enabled data enrichment. The enrichment of source tabular data with spatial information on, e.g., the distance to the main cities of the region, highway exits, and large enterprises, enabled obtaining additional explanatory variables in generalized regression models. What is more, determining the distance and neighborhood matrices facilitated the creation of spatial econometric models and a more comprehensive explanation of the spatial variability of the phenomenon through a geographically weighted regression model.
The developed generalized regression models, as well as classification trees, have significant advantages over the frequently used non-parametric models. Multivariate linear regression models are straightforward to interpret and apply; they also allow the recipient to determine the impact of specific factors on the model intuitively. The CART classification trees used in this article enable the automatic extraction of decision rules that explain the hierarchical influence of particular predictors on the value of the dependent variable. The obtained results also show the analytical potential of a unified database that depicts entrepreneurship in the Kujawsko-Pomorskie voivodship, as well as the possibility for further data mining with other types of algorithms for generalized regression. This issue will be the subject of further research.
There are, of course, data limitations that need to be considered. As presented in Figure 1, there are outliers in the observed variables that affect the results, which is unavoidable in such an analysis. There are also specific factors that affect MSME development in particular localities (e.g., municipalities at the region's border that are affected by centers located outside the region) and that cannot be explained by any type of regression model. Furthermore, the authors did not take into account additional characteristics of the MSMEs in the region, such as sectoral structure, which requires further investigation.
Finally, the authors show that administrative data have additional utility for investigating local characteristics of the MSME sector. The presented models can be extended to other Polish regions, but also to the wider context of CEE or European countries (provided that similar data exist). The authors show that the differences observed between regions in the CEE countries, as discussed in the literature, are also observed within regions, with more developed regional capitals and peripheral municipalities lagging behind. Strengthening the capacity and innovativeness of the MSME sector requires designing policies that take such differences into account and tailoring MSME development strategies to the specificity of the local environment, both socio-economic and spatial.