This section details the data and methods used to address the research objective of unraveling the spatial interaction between planned settlements and small businesses, following the conceptual framework (Figure 2). To this end, we acquired and linked spatially embedded, highly granular data from multiple sources and applied a suite of statistical techniques suited to the nature of the data, namely spatial autocorrelation tests followed by spatial regressions.
2.1. Data
The analysis was underpinned by spatially embedded data on enterprises from Indonesia’s 2016 Economic Census, maintained by Statistics Indonesia/Badan Pusat Statistik. The data provide a comprehensive list of enterprises across the country, including their neighborhood (or kelurahan) locations, as well as their quantity, size (micro, small, medium, and large), and type (e.g., food and beverages, retail). The village/kelurahan level is the lowest administrative geographical unit in Indonesia and enables much finer observation than previous district-level studies [24,28]. Because the census is conducted every 10 years, with the next iteration likely to occur in 2026, the 2016 data may seem slightly outdated. However, they provide the detail and spatial resolution needed for the purpose of this study.
Our study area, the urban districts portion of the JMA, comprises 538 neighborhoods. The spatial distribution of MSMEs is shown in Figure 3 as a simulated dot density distribution (Figure 3, left) and as the total number of MSMEs (Figure 3, right) at the neighborhood level.
We assessed the relationship between planned settlements and MSMEs for three distinct categories of MSMEs: All MSMEs, F&B (Food and Beverage) MSMEs, and Retail Trade MSMEs. F&B and retail trade are the two largest of the 18 enterprise sectors captured in the 2016 Economic Census data.
In assessing these associations, we accounted for 16 literature-driven variables, grouped into four domains: Land use, Built environment, Economic, and Demographic. The first domain, Land use, is hypothesized to influence MSMEs’ spatial pattern, following findings from the firm location choice literature [23,30]. In this study, the land use domain was derived primarily from the 2016 land use data for the Jakarta Metropolitan Area, which were developed as part of the second iteration of the Jabodetabek Urban Transportation Policy Integration (JUTPI) [31]. The data capture 14 land use categories, including planned settlements as the main indicator of interest, as well as other typical land use classifications such as commercial, industrial, agriculture, kampong, and transportation infrastructure. The spatial distribution of planned settlements, the key variable of interest in this study, is depicted in Figure 4.
The next domain, Built Environment, describes various aspects of urban morphology that have been found to influence the spatial patterns of enterprises [23,25,30,32]. In this study, this domain includes four variables: household density, land use entropy, street node density, and Connected Node Ratio (CNR). Household density was derived from the Village Potentials/Potensi Desa (PODES) data. Land use entropy measures the degree of land use mixing and is calculated as follows [33,34]:
$$E_i = -\frac{\sum_{j=1}^{k} p_{ij} \ln p_{ij}}{\ln k}$$
where $E_i$ is the land use entropy of neighborhood $i$; $p_{ij}$ is the proportion of land use $j$ in the total land use area of neighborhood $i$; and $k$ refers to the number of land use categories incorporated in the computation. The entropy measure ranges from 0 to 1, where 0 indicates that the neighborhood is occupied by only one land use category and 1 represents an equal share of each land use category in the neighborhood.
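The entropy measure can be illustrated with a minimal Python sketch. It assumes the commonly used normalized form (the negative sum of each share times its natural logarithm, divided by the logarithm of the number of categories), which is consistent with the 0 to 1 range described above. The function name `land_use_entropy`, the category labels, and the areas are hypothetical and are not taken from the study's data.

```python
import math


def land_use_entropy(areas, k=None):
    """Normalized land use entropy for a single neighborhood.

    areas: dict mapping land use category -> area (any consistent unit).
    k: number of categories used for normalization; assumed here to be
       the number of categories supplied unless stated otherwise.
    """
    if k is None:
        k = len(areas)
    total = sum(areas.values())
    if total <= 0 or k < 2:
        return 0.0
    # Categories with zero area contribute nothing (p * ln p -> 0 as p -> 0).
    shares = [a / total for a in areas.values() if a > 0]
    return -sum(p * math.log(p) for p in shares) / math.log(k)


# Hypothetical neighborhoods (areas in hectares).
single_use = {"planned_settlement": 40.0, "commercial": 0.0, "kampong": 0.0}
even_mix = {"planned_settlement": 10.0, "commercial": 10.0, "kampong": 10.0}
print(land_use_entropy(single_use))  # one dominant category: no mixing
print(land_use_entropy(even_mix))    # equal shares: maximum mixing
```

Normalizing by the total number of categories considered, rather than only those present in a neighborhood, keeps values comparable across neighborhoods; whether the study normalized this way is an assumption of the sketch.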
Street node density and CNR were used to unravel the relationship between street network configuration and the prevalence of MSMEs. These measures were derived from OpenStreetMap (OSM) data retrieved using the Python-based OSMnx package [35]. Street node density is the total number of street nodes, including intersections and cul-de-sacs, divided by the neighborhood area, as applied in multiple studies [36,37]. CNR is the ratio of street intersections to the sum of street intersections and cul-de-sacs. Larger CNR values (maximum = 1.0) indicate higher street connectivity and fewer dead ends [36,37].
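As a small illustration of the two street network measures, the sketch below computes them from node counts that are assumed to be already available for a neighborhood. The function names, the counts, and the use of square kilometers are hypothetical. How nodes are classified in practice (for instance, treating a node that touches only one street segment as a cul-de-sac) is likewise an assumption and is not specified in the text.

```python
def street_node_density(n_intersections, n_culdesacs, area_km2):
    """Street nodes (intersections plus cul-de-sacs) per unit area."""
    if area_km2 <= 0:
        raise ValueError("area must be positive")
    return (n_intersections + n_culdesacs) / area_km2


def connected_node_ratio(n_intersections, n_culdesacs):
    """Share of street nodes that are intersections (1.0 = no dead ends)."""
    total = n_intersections + n_culdesacs
    if total == 0:
        return None  # undefined for a neighborhood without street nodes
    return n_intersections / total


# Hypothetical neighborhood: 30 intersections, 10 cul-de-sacs, 2 km^2.
print(street_node_density(30, 10, 2.0))
print(connected_node_ratio(30, 10))
```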
Drawing from previous studies aimed at disentangling the association between economic factors and enterprises’ spatial pattern [23,30], three variables were derived for the Economic domain: number of traditional markets, number of traditional markets (30-min), and number of large enterprise employees. The 2016 Economic Census provides the number of employees working at large enterprises. The village-level number of traditional markets was derived from the PODES data. The cumulative accessibility to traditional markets within a 30-min time band was computed following the method of Palacios and El-Geneidy [38], which builds on earlier works by Hansen [39] and Levinson and King [40], as follows:
$$A_i = \sum_{j} O_j \, f(C_{ij})$$
where $A_i$ represents cumulative accessibility from geographic boundary (e.g., neighborhood, census tract) $i$; $O_j$ accounts for the number of amenities ($O$) at destination $j$; $C_{ij}$ represents the cost incurred to travel from $i$ to $j$, typically quantified as travel time (e.g., minutes, hours); and $f(C_{ij})$ is an impedance function. This impedance function takes a value of either 1 or 0 depending on a threshold travel time $t$ [38,41]:
$$f(C_{ij}) = \begin{cases} 1 & \text{if } C_{ij} \le t \\ 0 & \text{if } C_{ij} > t \end{cases}$$
The r5r package [41], based on the R programming language [42] and run in RStudio [43], was used to compute cumulative accessibility ($A_i$) from travel time matrices between neighborhoods in the JMA. The resultant matrices were used to compute the cumulative number of traditional markets (or of any other amenity) that could be reached from each neighborhood’s centroid within a given time threshold.
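The cumulative accessibility computation can be sketched in a few lines of Python, assuming that a travel time matrix between neighborhood centroids has already been produced (in the study this step was carried out with r5r in R). The neighborhood labels, travel times, and market counts below are hypothetical, and the binary cutoff follows the description above: a destination counts fully if it lies within the threshold and not at all otherwise.

```python
def cumulative_accessibility(travel_time, opportunities, threshold=30.0):
    """Count opportunities reachable from each origin within a time threshold.

    travel_time: dict origin -> dict destination -> travel time in minutes.
    opportunities: dict destination -> number of amenities (e.g., markets).
    threshold: cutoff travel time in minutes (binary impedance).
    """
    access = {}
    for origin, row in travel_time.items():
        access[origin] = sum(
            opportunities.get(dest, 0)
            for dest, minutes in row.items()
            if minutes <= threshold  # impedance is 1 inside the band, else 0
        )
    return access


# Hypothetical three-neighborhood example.
times = {
    "A": {"A": 0, "B": 20, "C": 45},
    "B": {"A": 20, "B": 0, "C": 25},
    "C": {"A": 45, "B": 25, "C": 0},
}
markets = {"A": 1, "B": 2, "C": 3}
print(cumulative_accessibility(times, markets, threshold=30))
```

Markets located in the origin neighborhood itself are counted here because the travel time from a centroid to itself is zero; whether the study treated the origin this way is an assumption of the sketch.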
Following studies that posit the relative importance of demographic factors in shaping the spatial pattern of enterprises [30], the final domain, Demographic, was composed of two variables: households in poverty (%) and the Gini index. These variables were derived from the 2015 SMERU Poverty and Livelihood Map of Indonesia [44].
2.2. Statistical Analysis
For the statistical analyses, we first calculated descriptive statistics for all variables. Next, as outlined in the conceptual framework (Figure 2), we assessed the spatial autocorrelation of the outcome variables, encompassing both the total number of MSMEs and sector-specific MSMEs. In the final step, we performed spatial regressions between each MSME category and the 16 explanatory variables. Our approach to empirically estimating the spatial interaction between planned settlements and MSMEs adopts the analytical methods developed by Anselin [45] and later by Burkey [46].
The spatial autocorrelation analyses assessed whether MSMEs were distributed randomly across neighborhoods or whether there was an underlying spatial pattern exhibiting some degree of clustering [45,46,47,48]. Such spatial clustering was assessed through a Moran’s I test, as shown in the equation below, adopted from related studies [47,48,49]:
$$I = \frac{n}{\sum_{i=1}^{n}\sum_{j=1}^{n} w_{ij}} \cdot \frac{\sum_{i=1}^{n}\sum_{j=1}^{n} w_{ij}\,(x_i - \bar{x})(x_j - \bar{x})}{\sum_{i=1}^{n} (x_i - \bar{x})^2}$$
where $x_i$ and $x_j$ refer to the values observed at spatial units $i$ and $j$, correspondingly, and $\bar{x}$ is their mean; $w_{ij}$ represents the spatially defined weight between units $i$ and $j$ given distinct geographic boundaries, which in this case are the 538 neighborhoods in the study area; and $n$ is the number of units of the said geographic boundaries. Values derived from Moran’s I test range from −1 to 1, with zero indicating a lack of spatial autocorrelation, positive values indicating spatial clustering, and negative values indicating that the data are dispersed. Moran’s I was tested using the ‘spdep’ package [50] in R. In addition, statistically significant spatial clusters were mapped with Moran’s I cluster analyses using the ‘rgeoda’ package [51].
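To make the statistic concrete, the following sketch computes global Moran’s I in plain Python using the standard definition: the cross-product of deviations from the mean, weighted by the spatial weights and scaled by the number of units and the sum of the weights. It is an illustration only, not the ‘spdep’ implementation used in the study. The four-unit layout, the binary contiguity weights, and the values are hypothetical, and no significance test is performed.

```python
def morans_i(values, weights):
    """Global Moran's I for a list of values and a square weight matrix.

    values: list of n observations (e.g., MSME counts per neighborhood).
    weights: n x n nested list, weights[i][j] = spatial weight between i and j.
    """
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    s0 = sum(sum(row) for row in weights)  # sum of all spatial weights
    cross = sum(
        weights[i][j] * dev[i] * dev[j] for i in range(n) for j in range(n)
    )
    variance_sum = sum(d * d for d in dev)
    return (n / s0) * (cross / variance_sum)


# Four hypothetical units arranged in a line; neighbors share a border.
W = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
print(morans_i([1, 2, 3, 4], W))  # similar values adjacent: positive
print(morans_i([1, 4, 1, 4], W))  # alternating values: negative
```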
Aside from offering valuable insights into spatial patterns, testing for spatial clustering is critical for the next step. Conventional linear models, such as Ordinary Least Squares (OLS), are unsuitable when spatial clustering exists, and their use can introduce bias. In addition, because our assessment revealed that the dependent variables do not follow a Poisson distribution, we adopted spatial regression analyses [45,46,52].
To alleviate such issues, and as an alternative to OLS, the Spatial Lag Model (SLM) and Spatial Error Model (SEM) [53] were used. The SLM is expressed as follows:
$$y = \rho W y + X\beta + \varepsilon$$
where $y$ represents the dependent variable; $\rho$ captures the spatial autocorrelation parameter for the spatial weight matrix $W$, in which a statistically significant value of $\rho$ indicates the presence of spatial dependence; $X$ refers to the matrix of explanatory variables and $\beta$ is the vector of regression coefficients; and $\varepsilon$ is a random error term. The SLM posits that the dependent variable is correlated with and influenced by neighboring locations [54,55].
SEM, in contrast, posits that the spatial autocorrelation is explained by the interaction between error terms across neighbors [54,55]. The SEM is expressed as follows:
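$$y = X\beta + u, \qquad u = \lambda W u + \varepsilon$$
where $y$, $X$, and $\beta$ are the dependent variable, the matrix of explanatory variables, and the regression coefficients; $\lambda$ is the spatial autocorrelation parameter, in which a statistically significant value suggests that certain factors are responsible for the presence of spatial autocorrelation between error terms; $W$ is the spatial weight matrix; and $u$ refers to the error terms, with $\varepsilon$ a random error term.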
In the spatial regressions, we applied a log transformation to the outcome variables, encompassing both all and sector-specific MSMEs, given their non-normally distributed nature. The transformation also facilitates a more intuitive interpretation of the results.
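The interpretive benefit of a log-transformed outcome can be shown with a short sketch. With a natural-log outcome, a coefficient or estimated effect $b$ on an explanatory variable corresponds to an approximate change of $100\,(e^{b} - 1)$ percent in the outcome per one-unit increase in that variable. The function name and the coefficient values below are hypothetical and are not results from this study. The exact form of the transformation (for example, how zero counts were handled) is not stated in the text, so the sketch assumes a plain natural logarithm.

```python
import math


def percent_change(effect, delta=1.0):
    """Percent change in the outcome implied by an effect on ln(outcome).

    effect: coefficient or estimated effect on the log-transformed outcome.
    delta: size of the change in the explanatory variable.
    """
    return (math.exp(effect * delta) - 1.0) * 100.0


# Hypothetical effects, for illustration only.
for b in (0.02, 0.10, -0.05):
    print(b, round(percent_change(b), 2))
```

In a spatial lag specification the raw coefficient is not itself the marginal effect, so this conversion is more appropriately applied to estimated direct, indirect, or total effects than to the coefficient alone.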
The choice of spatial regression approach was based on an assessment of parsimony conducted with Lagrange Multiplier (LM) tests [52] using the ‘spdep’ package in R [50]. The tests yield LM statistics and p-values from the “LM” and “robust LM” tests, which are used to compare SEM and SLM. The preferred model is the one with the higher LM value and greater statistical significance.
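The selection logic can be sketched as a small decision function. This is one common reading of the rule, in which the standard LM tests are consulted first and the robust versions are used only when both are significant. It is an assumption about how the comparison is operationalized rather than the authors' exact procedure. The function name, the 0.05 significance level, and the example statistics are hypothetical.

```python
def choose_spatial_model(lm_lag, lm_error, robust_lag, robust_error, alpha=0.05):
    """Pick OLS, SLM, or SEM from LM test results.

    Each argument is a (statistic, p_value) tuple from the corresponding test.
    """
    lag_sig = lm_lag[1] < alpha
    err_sig = lm_error[1] < alpha
    if not lag_sig and not err_sig:
        return "OLS"  # no evidence of spatial dependence
    if lag_sig and not err_sig:
        return "SLM"
    if err_sig and not lag_sig:
        return "SEM"
    # Both standard tests are significant: fall back on the robust versions.
    r_lag_sig = robust_lag[1] < alpha
    r_err_sig = robust_error[1] < alpha
    if r_lag_sig and not r_err_sig:
        return "SLM"
    if r_err_sig and not r_lag_sig:
        return "SEM"
    # Still tied: prefer the model with the larger robust statistic.
    return "SLM" if robust_lag[0] >= robust_error[0] else "SEM"


# Hypothetical test results as (statistic, p-value) pairs.
print(choose_spatial_model((45.2, 0.0001), (30.1, 0.0001),
                           (16.4, 0.0001), (1.3, 0.25)))
```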
We also conducted two further tests, contingent on whether SLM was the recommended model. The first assessed the direct and indirect effects of planned settlements on MSMEs’ spatial patterns. The second compared the OLS, SEM, and SLM models to evaluate the sensitivity of the results to applying models with and without spatial autocorrelation.
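The distinction between direct and indirect effects follows from the reduced form of a spatial lag specification. The derivation below is a sketch using the standard notation from the spatial econometrics literature (assumed here, not quoted from the study): $y$ is the outcome vector, $W$ the spatial weight matrix, $\rho$ the spatial autoregressive parameter, $\beta_k$ the coefficient on the $k$-th explanatory variable $x_k$, $I_n$ the identity matrix, and $\iota_n$ a vector of ones.

$$y = \rho W y + X\beta + \varepsilon \;\;\Longrightarrow\;\; y = (I_n - \rho W)^{-1}(X\beta + \varepsilon)$$

$$S_k(W) = \frac{\partial y}{\partial x_k^{\top}} = (I_n - \rho W)^{-1}\beta_k$$

$$\text{Direct}_k = \frac{1}{n}\,\mathrm{tr}\big(S_k(W)\big), \qquad \text{Total}_k = \frac{1}{n}\,\iota_n^{\top} S_k(W)\,\iota_n, \qquad \text{Indirect}_k = \text{Total}_k - \text{Direct}_k$$

Under these assumptions, the direct effect averages the response of each neighborhood's outcome to a change in its own explanatory variable (including feedback through neighbors), while the indirect effect averages the spillover onto other neighborhoods. When $\rho = 0$, $S_k(W)$ reduces to $\beta_k I_n$, so the direct effect equals the coefficient and the indirect effect is zero, as in a non-spatial model.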