Analysis of Built-Up Areas of Small Polish Cities with the Use of Deep Learning and Geographically Weighted Regression

Abstract: Small cities are an important part of the settlement system, a link between rural areas and large cities. Although they perform important functions, research focuses on large cities and metropolises while marginalizing small cities, the study of which is of great importance to progress in social sciences, geography, and urban planning. The main goal of this paper was to verify the impact of selected socio-economic factors on the share of built-up areas in 665 small Polish cities in 2019. Data from the Database of Topographic Objects (BDOT), Sentinel-2 satellite imagery from 2015 and 2019, and the Local Data Bank of Statistics Poland from 2019 were used in the research. A machine learning segmentation procedure was used to obtain the data on the occurrence of built-up areas. Hot Spot (Getis-Ord Gi*) analysis and geographically weighted regression (GWR) were applied to explain the spatially varying impact of factors related to population, spatial and economic development, and living standards on the share of built-up areas in the area of small cities. A significant association was found between population density and the share of built-up areas in the area of the cities studied. The influence of the other socio-economic factors examined, related to the spatial and economic development of the cities and the quality of life of the inhabitants, showed great regional variation. The results also indicated that the share of built-up areas in the area of the cities under study is a result of the conditions under which they were established and developed throughout their existence, and not only of the socio-economic factors affecting them at present.


Introduction
Small cities are an important part of settlement systems. They are the glue that binds the network of villages, medium-sized cities, and large cities together. Small cities influence the level of local and regional development. They perform some of the functions necessary for rural areas (administrative, educational, commercial, service, cultural, etc.) and also provide facilities for large cities [1][2][3][4]. The term "small" in relation to a settlement unit recognized as a city varies from region to region or country to country. In China, this term is used for cities under 500,000 [5] or 100,000 inhabitants [6], in the United States, it is under 50,000 [7], while in developing countries it is between 5,000 and 20,000 [8], with local definitions also varying [9]. In Poland, cities with fewer than 20,000 inhabitants are considered small. The smallest Polish city (Wyśmierzyce) had 894 inhabitants in 2019 [10].
Some shortcomings of small city studies were pointed out by Bell and Jayne [8] and Demazière et al. [11]. They noted that researchers focus on big cities and metropolises while marginalizing small cities, the study of which is of great importance to progress in social sciences, geography, and urban planning. These authors also set out the directions that small city research should take. In the transnational context, these included the ways in which small cities are linked to the international division of labor, the globalization of finance, and investment. In the local context, studies of small cities should address population, social dynamics, and governance structure, as well as their economic position and links with larger national centers. It is also important to understand the history, heritage, cultural life, values, and goals of small cities. The study of the physical structure of the landscapes and the spatial organization of small cities was mentioned as another much-needed research direction.
Nowadays, the sustainable development of small cities [12,13], smart cities [14], the quality of life of inhabitants [15][16][17], and demographic problems [18] are also discussed. Various indicators have been proposed to assess their sustainability: economic, social and environmental [19], historical [20,21], as well as indicators aimed at implementing the idea of smart sustainable cities. Studies of small cities are most often carried out on a regional or local scale, but there are also studies with a broader spatial scope, addressing the topic of small cities even on a continental basis [22][23][24].
Geographically Weighted Regression (GWR) was used to explain the influence of demographic, social, and economic conditions on the spatial variation of the share of built-up areas in the area of small cities in Poland. This method allows the estimation of local spatial patterns of the influence of different variables on the phenomenon under study and has been successfully applied in various urban studies [51]. Li et al. [46] used GWR to investigate the spatial heterogeneity of determinants of spatial structure fragmentation in 289 cities in China. A study of the spatial variability of the influence of different anthropogenic determinants of landscape fragmentation using GWR was conducted for Shenzhen City, Guangdong Province, China [52]. Ivajnšič et al. [50] used GWR for urban heat island (UHI) studies, to model the relationship between mean air temperature and influencing factors on the example of the small city of Ljutomer in Slovenia. Using GWR, Zhao et al. [49] examined the effects of five variables describing socio-demographic, economic, and spatial situations on the compactness of built-up areas in 160 Chinese cities. Royuela et al. [53] examined the impact of quality of life on urban sprawl. Using the example of Barcelona, Spain, they used GWR to assess how variables describing quality of life influence household location decisions. Bagan and Yamagata [54] analyzed the spatial-temporal expansion of urbanized areas in Japan between 1990 and 2006. They used GWR to model population density using the example of the city of Sapporo, Hokkaido, Japan. Noresah and Ruslan [55] used GWR to estimate the strength of the relationship between urban built-up areas and the factors, described by 20 variables, affecting their changes, for Sungai Petani city in Malaysia. This method was also used by Shariff et al. [56], to investigate the effects of variables describing environmental, physical, and socio-economic factors on urban land use change in Penang Island, Malaysia.
The history of Poland's small cities is varied and mostly dates back to the distant past. Some of the cities had important administrative functions in the past, which they lost over time to other more rapidly developing centers. This was particularly evident during the period of industrialization and the development of the transport network. Some of them are currently experiencing economic development and population growth, and many are now looking for new development factors [57][58][59][60]. For a large group of cities, stagnation or regression processes in the social, economic and spatial spheres are more characteristic [9,11,17,57,61]. The diversity and role of small cities in the settlement system make them a very interesting subject of research.
The main objective of the study was to undertake research on the impact of selected socio-economic factors on the share of built-up areas in small Polish cities and their spatial differentiation across the country. Based on literature review, various variables describing the determinants of the share of built-up land in the urban area were identified and used in the study of Polish small cities (Table 1).

Study Area
The study area included small Polish cities. The area of Poland is 322,000 km². The capital and largest city is Warsaw. The population of Poland in 2019 was 38,253,955 [10], of which 60% lived in cities, and the number of administrative units defined as cities was 940. As many as 75% of these were small cities, i.e., those with fewer than 20,000 inhabitants. In the first two decades of the 21st century, the population of small cities in Poland fluctuated insignificantly, slightly exceeding 5 million people, which means that about 13% of Poles lived in small cities. Small cities can be found all over Poland and are quite evenly distributed, but there are areas where there are more of them, namely the central and southern parts of the country. In the study, small cities were defined as those units with a population of less than 20,000 inhabitants in 1999; a total of 665 cities were selected (Figure 1).

Data Sources
The study used data from three sources: the Local Data Bank of Statistics Poland, the Central Office of Geodesy and Cartography, and the European Space Agency (ESA). Sentinel-2 images were downloaded from the Copernicus Open Access Hub provided by ESA (https://scihub.copernicus.eu (accessed on 13 May 2021)). The images used were from 2015 and 2019, acquired between 1 April and 30 September each year. A wide timespan was used in order to acquire cloudless images, although images with 1-3% cloud cover were also accepted. Four Sentinel-2 channels, 4 (red), 3 (green), 2 (blue), and 8 (near infrared), with a resolution (ground sample distance, GSD) of 10 m, were combined into four-channel rasters.
The main source of data on built-up areas, current as of 2015, was the Database of Topographic Objects from the resources of the Central Office of Geodesy and Cartography. The thematic scope of the Database of Topographic Objects includes three levels of accuracy. The database contains nine object classes divided into 57 categories containing 244 types of topographic objects. Five object types were used in the study, representing different types of building objects, belonging to the category of built-up areas included in the Land Cover object class. The data on built-up areas were obtained with an accuracy corresponding to a map at a scale of 1:10,000, with a minimum patch area of 0.1 ha.
Socio-economic data were obtained from Statistics Poland in MS Excel spreadsheet format. In Polish statistics, data for cities are collected and disseminated according to the administrative division of Poland into municipalities. Some of the cities are separate municipalities, but a large group of them are parts of so-called urban-rural municipalities. In such cases, much of the information is aggregated for the entire municipality, i.e., its rural and urban parts together. For this reason, only a limited amount of data applicable exclusively to small cities could be obtained for the study.
Six variables were used, for which descriptive statistics were calculated (Table 2).
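The band combination described above can be illustrated with a minimal sketch. It assumes that the four 10 m bands have already been read into equally sized NumPy arrays; the function name `stack_bands` and the synthetic tiles are hypothetical and are not taken from the authors' pipeline.

```python
import numpy as np

def stack_bands(red, green, blue, nir):
    """Stack four equally sized 2-D band arrays into an (H, W, 4) raster."""
    bands = [np.asarray(b) for b in (red, green, blue, nir)]
    if len({b.shape for b in bands}) != 1:
        raise ValueError("all bands must share the same shape")
    # Channel order follows the text: band 4, band 3, band 2, band 8
    return np.stack(bands, axis=-1)

# Synthetic 3 x 3 tiles standing in for Sentinel-2 bands 4, 3, 2 and 8
rng = np.random.default_rng(0)
b04, b03, b02, b08 = (rng.integers(0, 10000, size=(3, 3)) for _ in range(4))
raster = stack_bands(b04, b03, b02, b08)
print(raster.shape)  # (3, 3, 4)
```

Keeping the near-infrared band as the last channel is only a convention of this sketch; any fixed order works as long as training and inference use the same one.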

Methods of Analysis
The share of built-up land cover within the area of a given spatial unit was calculated with the use of Sentinel-2 satellite images. In the first stage, a machine learning model was used to perform semantic segmentation and delineate built-up land cover. The results were then aggregated to the borders of the cities.
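The aggregation step can be sketched as follows, assuming (as a simplification) that the city borders have been rasterized onto the same 10 m grid as the segmentation result. The function `built_up_share` and the toy arrays are hypothetical illustrations, not the authors' implementation.

```python
import numpy as np

def built_up_share(built_mask, city_ids):
    """Return {city_id: share of built-up pixels in %}.

    built_mask : 2-D bool array, True where a pixel was segmented as built-up
    city_ids   : 2-D int array of the same shape, 0 = outside any city
    """
    shares = {}
    for cid in np.unique(city_ids):
        if cid == 0:
            continue  # skip pixels that lie outside every city border
        inside = city_ids == cid
        # Pixel area is constant, so it cancels out in the ratio
        shares[int(cid)] = 100.0 * built_mask[inside].sum() / inside.sum()
    return shares

built = np.array([[1, 1, 0, 0],
                  [1, 0, 0, 0],
                  [0, 0, 1, 1]], dtype=bool)
cities = np.array([[1, 1, 2, 2],
                   [1, 1, 2, 2],
                   [0, 0, 2, 2]])
shares = built_up_share(built, cities)
print(shares[1])  # 75.0
```

Because every pixel covers the same ground area, the share can be computed from pixel counts alone; multiplying the built-up count by 100 m² would give the built-up area itself.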
Semantic segmentation is a machine learning task of detecting a specific region of an image and assigning it a label that distinguishes it from the other detected regions, thus facilitating the interpretation of image content [67]. In the presented research, segmentation is the process of classifying pixels of Sentinel-2 satellite images into two categories: built-up areas and other land cover types (Figure 2). The mask used in supervised classification represented built-up areas formed by residential and industrial buildings, warehouses, agricultural production buildings, etc., together with small areas and devices functionally related to buildings, such as yards, squares, courtyards, passages, crossings, home playgrounds, etc. [68].
The machine learning pipeline was prepared to support DeepLabV3+ [69] model using the Xception [70] backbone. The model has been implemented in Python 3.7 using Tensorflow 2.2.0 [71] and Keras 2.3.1 [72] frameworks. DeepLabV3+ is a deep convolutional neural network architecture characterized by its outstanding capabilities to handle the problem of segmenting objects at multiple scales by designing its main modules to capture multi-scale context. The use of other machine learning model architectures was also considered. During preliminary studies, such solutions as U-Net [73], FPN [74], and PSPNet [75] were tested but they showed less effectiveness than DeepLabV3+.
The model was trained using a loss function that was the sum of binary focal loss [74] and Jaccard index loss [76]. Only the 2015 dataset was used for model training. The 2015 dataset was the only one covering the whole country, and its correct segmentation was necessary to achieve satisfactory results on the 2019 dataset. The 2015 dataset was split into three subsets. The training subset accounted for 90% of the dataset. The validation subset, formed from 5% of the dataset, was used to measure performance during intermediate training steps and for early stopping. Test data used during final model evaluation accounted for 5% of each set for 2015 and 2019. The accuracy of the model was monitored using the intersection over union (IoU), F1-score, pixel-wise binary accuracy, precision, and recall metrics. The foreground-to-background threshold was set to 0.5, meaning that probability map values above 0.5 were treated as built-up areas. A summary of the evaluation on all datasets is presented in Table 3. Score values were calculated in relation to the ground truth segmentation mask, which had some flaws the model had to deal with. Although the dataset used to produce the segmentation maps is the largest and most precise source of information on built-up areas available in Poland, it is not free from defects. Careful analysis shows that in multiple areas the delineated regions are visibly under- or oversegmented. These issues were mitigated by building a model that generalizes well enough to make the segmentation smoother. Overall, the model achieved satisfactory results, which was also indicated by the results of the perceptual evaluation.
It should be noted that a limitation of using the machine learning method to obtain data on built-up areas is the occurrence of cloud cover in the satellite images used as source materials.
The presence of cloud cover may limit or prevent the recognition of objects. However, the selection of cloudless satellite images or the use of cloud removal techniques makes this limitation quite easy to overcome [77][78][79].
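The training loss and the evaluation metrics described above can be sketched in NumPy as follows. This is an illustration only: the focal loss parameters `gamma=2.0` and `alpha=0.25` are commonly used defaults assumed here, not values reported in the paper, and all function names are hypothetical.

```python
import numpy as np

EPS = 1e-7

def binary_focal_loss(y_true, y_prob, gamma=2.0, alpha=0.25):
    """Mean binary focal loss; gamma and alpha are assumed defaults."""
    p = np.clip(y_prob, EPS, 1.0 - EPS)
    pos = -alpha * y_true * (1.0 - p) ** gamma * np.log(p)
    neg = -(1.0 - alpha) * (1.0 - y_true) * p ** gamma * np.log(1.0 - p)
    return float(np.mean(pos + neg))

def jaccard_loss(y_true, y_prob):
    """Soft Jaccard loss: 1 - intersection / union computed on probabilities."""
    inter = np.sum(y_true * y_prob)
    union = np.sum(y_true) + np.sum(y_prob) - inter
    return float(1.0 - (inter + EPS) / (union + EPS))

def total_loss(y_true, y_prob):
    """Sum of the two losses, as described in the text."""
    return binary_focal_loss(y_true, y_prob) + jaccard_loss(y_true, y_prob)

def evaluate(y_true, y_prob, threshold=0.5):
    """Binarize the probability map and compute pixel-wise metrics.

    Assumes at least one predicted and one true built-up pixel,
    otherwise precision or recall would divide by zero.
    """
    pred = y_prob > threshold
    truth = y_true.astype(bool)
    tp = np.sum(pred & truth)
    fp = np.sum(pred & ~truth)
    fn = np.sum(~pred & truth)
    tn = np.sum(~pred & ~truth)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return {
        "iou": tp / (tp + fp + fn),
        "f1": 2 * precision * recall / (precision + recall),
        "accuracy": (tp + tn) / pred.size,
        "precision": precision,
        "recall": recall,
    }

# Toy 2 x 4 ground truth mask and predicted probability map
y_true = np.array([[1, 1, 0, 0], [1, 0, 0, 0]], dtype=float)
y_prob = np.array([[0.9, 0.4, 0.6, 0.1], [0.8, 0.2, 0.1, 0.3]])
metrics = evaluate(y_true, y_prob)
loss = total_loss(y_true, y_prob)
```

In this toy example two pixels are true positives, one is a false positive, and one is a false negative, so the IoU is 0.5 and the pixel-wise accuracy is 0.75.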
The spatial diversity of cities in terms of the share of built-up areas in their area was analyzed using Geographic Information Systems (GIS) tools available in ArcGIS 10.7 software. Hot spot analysis using the Getis-Ord Gi* statistic was applied first [80][81][82]. It allows for finding statistically significant clusters of high (hot spots) and low values (cold spots) and is expressed by the formula:
$$G_i^* = \frac{\sum_{j=1}^{n} w_{i,j} x_j - \bar{X} \sum_{j=1}^{n} w_{i,j}}{S \sqrt{\dfrac{n \sum_{j=1}^{n} w_{i,j}^2 - \left(\sum_{j=1}^{n} w_{i,j}\right)^2}{n-1}}}$$
where $x_j$ is the attribute value for feature $j$, $w_{i,j}$ is the spatial weight between features $i$ and $j$, $n$ is the total number of features, and:
$$\bar{X} = \frac{\sum_{j=1}^{n} x_j}{n}, \qquad S = \sqrt{\frac{\sum_{j=1}^{n} x_j^2}{n} - \bar{X}^2}$$
The Getis-Ord Gi* local statistic returned for each feature in the data set is a z-score. Statistically significant positive z-scores indicate a hot spot: the larger the z-score, the more intense the clustering of high values. Statistically significant negative z-scores indicate a cold spot: the lower the score, the more intense the clustering of low values.
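A minimal NumPy sketch of the Gi* z-score is given below. It uses the standard form of the statistic (weighted local sum minus its expectation, divided by its standard deviation) with a simple binary distance-band weight matrix; the weighting scheme, the function name, and the toy data are assumptions made for illustration and do not reproduce the ArcGIS settings used in the study.

```python
import numpy as np

def getis_ord_gi_star(x, w):
    """Return the Gi* z-score of each feature.

    x : (n,) attribute values (here: share of built-up area per city)
    w : (n, n) spatial weights; w[i, i] must be non-zero so that each
        feature belongs to its own neighbourhood (the "star" variant)
    """
    x = np.asarray(x, dtype=float)
    w = np.asarray(w, dtype=float)
    n = x.size
    x_bar = x.mean()
    s = np.sqrt(np.mean(x ** 2) - x_bar ** 2)
    w_sum = w.sum(axis=1)
    w_sq_sum = (w ** 2).sum(axis=1)
    numerator = w @ x - x_bar * w_sum
    denominator = s * np.sqrt((n * w_sq_sum - w_sum ** 2) / (n - 1))
    return numerator / denominator

# Six features on a line; neighbours are features within distance 1 (self included)
coords = np.arange(6.0)
x = np.array([10.0, 9.0, 8.0, 1.0, 2.0, 1.0])
w = (np.abs(coords[:, None] - coords[None, :]) <= 1.0).astype(float)
z = getis_ord_gi_star(x, w)
```

High values cluster at the start of the line and low values at the end, so the first z-score is positive and the last is negative; whether they count as hot or cold spots depends on the chosen significance level.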
The use of GWR in the study was preceded by Ordinary Least Squares (OLS) regression, which aimed to identify the global influence of demographic, social, and economic factors on the share of built-up areas in the area of small cities in Poland [49,83]. The OLS model is expressed as follows:
$$y_i = \beta_0 + \sum_{j} \beta_j x_{ij} + \varepsilon_i$$
where $y_i$ is the observation of the dependent variable at location $i$, $\beta_0$ is a constant term, $\beta_j$ measures the relationship between the independent variable $x_{ij}$ and $y$ for the set of $i$ locations, and $\varepsilon_i$ is the error associated with location $i$. GWR is an extension of global regression models and allows the estimation of local influences of demographic, social, and economic factors on the share of built-up area in small cities in Poland, the examination of spatial relationships between variables in the model, and the identification of patterns [84][85][86]. The GWR model is expressed as follows:
$$y_i = \beta_0(u_i, v_i) + \sum_{j} \beta_j(u_i, v_i) x_{ij} + \varepsilon_i$$
where $y_i$ is the dependent variable, $i$ indexes the regions of the study area, $(u_i, v_i)$ denotes the location of the $i$-th observed region, $\beta_j(u_i, v_i)$ is the $j$-th regression parameter at the location of observation $i$, which is a function of the geographical position, $x_{ij}$ is the independent variable, and $\varepsilon_i$ is the random error of region $i$.
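The difference between the global and the local model can be illustrated with a short NumPy sketch in which GWR is computed as a weighted least squares fit repeated at every location. The Gaussian kernel, the bandwidth of 0.15, and the synthetic data are assumptions chosen for illustration; the study itself used the GWR tool in ArcGIS, whose kernel and bandwidth selection are not reproduced here.

```python
import numpy as np

def ols(X, y):
    """Global OLS coefficients; X must already include a column of ones."""
    return np.linalg.lstsq(X, y, rcond=None)[0]

def gwr(coords, X, y, bandwidth):
    """Local coefficients from a fixed Gaussian kernel, one row per location."""
    betas = np.empty((len(y), X.shape[1]))
    for i, point in enumerate(coords):
        d = np.linalg.norm(coords - point, axis=1)
        w = np.exp(-0.5 * (d / bandwidth) ** 2)  # fixed-bandwidth Gaussian weights
        XtW = X.T * w                            # equivalent to X.T @ diag(w)
        betas[i] = np.linalg.solve(XtW @ X, XtW @ y)
    return betas

# Synthetic data: the true slope grows from west (u = 0) to east (u = 1)
rng = np.random.default_rng(0)
n = 300
coords = rng.uniform(0.0, 1.0, size=(n, 2))
x1 = rng.normal(size=n)
slope = 1.0 + 2.0 * coords[:, 0]
y = 0.5 + slope * x1 + rng.normal(scale=0.05, size=n)
X = np.column_stack([np.ones(n), x1])

global_beta = ols(X, y)
local_beta = gwr(coords, X, y, bandwidth=0.15)
west, east = np.argmin(coords[:, 0]), np.argmax(coords[:, 0])
```

The global model returns a single slope for the whole study area, whereas the local slopes at the westernmost and easternmost points differ, which is the kind of spatial non-stationarity GWR is designed to reveal.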
The results of GWR application were presented on a map showing the coefficient of determination describing the compatibility of local models with empirical observations and on maps estimating the values of the influence of local parameters of independent variables on the share of built-up area in the area of small cities in Poland and the significance of the identified impacts [87,88].
The procedure of data preparation and the processing steps were summarized in a schematic workflow given in Figure 3.

Share of Built-Up Areas in the Area of Small Cities in Poland in 2019
Based on the data obtained from machine learning segmentation of Sentinel-2 satellite images, the acreage of built-up areas was calculated for each city (Figure 4). Built-up areas were related to the area of each city within its administrative borders, and the share of built-up areas in the urban area was determined.

Hot Spots and Cold Spots of the Share of Built-Up Areas in the Area of Small Cities in Poland in 2019
The applied Hot Spot analysis (Getis-Ord Gi*) allowed the identification of clusters of cities with a statistically significant high and low share of built-up areas in their area (Figure 6). There are two hot spots in the studied area: the first in the central part of Mazowiecki region, in the area of Warsaw, and the second, more extensive, covering Wielkopolski and Kujawsko-Pomorski regions. Cold spots occurred in three locations. The most extensive covered the southern part of Dolnośląski region, as well as Opolski and Śląski regions. The second was located on the border of three regions: Świętokrzyski, Mazowiecki, and Lubelski. The last cold spot was located in Podlaski region.

Global and Local Model of the Share of Built-Up Land in the Area of Small Cities in Poland
The global regression model (OLS) explaining the effect of selected variables on the share of built-up area in small cities in Poland in 2019 was evaluated (Table 4). The coefficient of determination for the global model (R²) reached 0.790, the adjusted R² 0.788, and the AICc (corrected Akaike's Information Criterion) 863.971. This result can be regarded as satisfactory and as pointing to significant associations between socio-economic variables and the share of built-up areas, all the more so as four out of six variables were identified as statistically significant (p-value < 0.01). Positive associations were noted for population density (Pop_Density) and the emergence of new residential buildings (Buildings). This may be caused by the increase in the wealth of the inhabitants and the need to improve housing conditions, or by local development policy favoring the creation of new built-up areas. On the other hand, in cities with a higher unemployment rate (Unemployment), the share of built-up areas in the area of the surveyed units is smaller. Interestingly, a negative association with the share of built-up areas in the area of small cities was shown for domestic economic entities (Enterprises). For GWR, R² reached 0.868, the adjusted R² 0.843, and the AICc 713.881, which indicates that the estimation of local models based on this method describes the phenomenon under study more effectively. Higher R² values indicate a stronger explanatory power of the regression model, and a lower AICc value indicates a better fit to the observed data. GWR with the Fixed Kernel type was used in the study because it achieved a smaller AICc than the Adaptive Kernel type, whose AICc was 717.095. The accuracy of GWR-based estimates of the local influence of various demographic, social, and economic factors on the share of built-up area in small cities in Poland varies spatially (Figure 7).
The best estimates of the share of built-up areas according to the components under study were obtained for cities located in the Zachodniopomorski, Lubuski, and Dolnośląski regions occupying the western part of the country, and for cities from the Opolski and Śląski regions located in the south of Poland. This also applied to cities located in the north-east of Poland in the Podlaski and partly in the Warmińsko-Mazurski regions.
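The adjusted R² reported for the global model can be checked with the standard adjustment formula, assuming 665 observations and six predictors as stated in the text. The snippet below is a worked arithmetic check, not part of the original analysis; the adjusted R² of the GWR model depends on the effective number of parameters and is therefore not recomputed here.

```python
def adjusted_r2(r2, n_obs, n_predictors):
    """Standard adjusted R-squared for a model with an intercept."""
    return 1.0 - (1.0 - r2) * (n_obs - 1) / (n_obs - n_predictors - 1)

# Values reported in the text for the global (OLS) model
ols_adj = adjusted_r2(0.790, n_obs=665, n_predictors=6)
print(round(ols_adj, 3))  # 0.788

# Difference in AICc between the global and the local (fixed kernel) model
delta_aicc = 863.971 - 713.881
print(round(delta_aicc, 2))  # 150.09
```

The recomputed value agrees with the reported adjusted R² of 0.788, and the AICc of the GWR model is lower than that of the OLS model by about 150.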
GWR coefficients for socio-economic factors describing the spatial variation of the share of built-up areas in the area of small cities are given in Table 5.
Population density (Pop_Density) was the most important spatial stimulant of the share of built-up areas in the area of the cities studied (Figure 8). The higher the population density, the greater the proportion of built-up areas. A significant relationship between population density and the share of built-up areas was found for the whole of Poland. The strongest association with this variable was observed in the south of Poland, in Małopolski, Podkarpacki, and Świętokrzyski regions, and in the east of the country, in Lubelski and Podlaski. The weakest impact of this variable is seen in northern Poland, in Pomorski, Kujawsko-Pomorski, and Warmińsko-Mazurski regions, as well as in the southern part of Dolnośląski and Opolski regions. These regions formed part of the cold spot of the share of built-up areas in the area of small cities (Figure 6) and are regions with an unfavorable economic structure, threatened by depopulation, which consequently also adversely affects the share of built-up areas in the area of small cities located within their borders. A significant positive association was also found between the share of built-up areas in the area of small cities and the number of new residential buildings put into use (Buildings) for the prevailing part of Poland, with the exception of Śląski, Małopolski, part of Lubelski, and Wielkopolski and Lubuski regions. This association was strongest in the northern part of Poland, in Pomorski and Zachodniopomorski regions, and partially in Warmińsko-Mazurski and Podlaski (Figure 9). This is an expected relationship, since the construction of new buildings is an obvious cause of an increasing share of built-up areas in the area of the surveyed units.
However, the lack of significance of this factor for the regions indicated above may mean that the buildings in the cities located there are relatively old and their large share in the area of the cities under study has been shaped by historical processes. This is particularly evident in the case of Wielkopolski region, where a hot spot of the share of built-up areas in the area of small cities is located (Figure 6).
The share of the registered unemployed in the working-age population (Unemployment) was also associated with the share of built-up areas (Figure 10). This association was mostly negative, meaning that the lower the unemployment rate, the higher the share of built-up areas in the area of the surveyed cities. The influence of this factor was strongest in southern Poland, in a belt running from the southern part of Lubuski, Dolnośląski, and Opolski regions to the southern part of Śląski. The association with this variable can be seen in the occurrence of hot spots of the share of built-up areas in Wielkopolski and Mazowiecki regions (Figure 6), which were the areas with the lowest unemployment rate in Poland in 2019. It can also be inferred that it partially influences the occurrence of cold spots of the share of built-up areas in the area of small cities visible in the south of Poland, especially in Opolski and partly in Dolnośląski, which are less developed regions at risk of depopulation, with higher unemployment. The share of working-age population in the total population (Work_Pop) is predominantly negatively associated with the share of built-up areas in the area of the surveyed cities. The smaller the share of the population of working age, the larger the share of built-up areas in the area of small cities. However, the impact of this variable is significant only for the central and northern part of Mazowiecki and the western part of Podlaski region (Figure 11).
Although the impact of this variable is very limited, it can be concluded that it is related to the influence of the capital city of Warsaw, whose developed labor market attracts people of working age, especially from small cities located in its immediate and near vicinity.
Another variable, the number of newly registered domestic economic entities in the REGON register (Enterprises), had a significant association with the share of built-up areas in the area of small cities in the majority of the country, except for the regions located in the south of Poland: Opolski, Śląski, Małopolski, Podkarpacki, and partly Łódzki, Świętokrzyski, and Lubelski (Figure 12). The impact of this factor is overwhelmingly negative. The smaller the number of enterprises, the greater the share of built-up areas in the area of the surveyed cities. This influence was strongest in the regions of western and northern Poland, from Dolnośląski region northward through Zachodniopomorski and eastward to Podlaski region. It can be assumed that buildings in these cities were mainly residential and that newly registered domestic economic entities, e.g., services, were located in the owners' places of residence.
The combined use of deep learning, Hot Spot analysis (Getis-Ord Gi*), and GWR made it possible to identify the significance and impact of individual determinants. It was found that population density and the construction of new residential buildings had the greatest influence on the share of built-up areas in the area of the surveyed cities. The results also indicated that the share of built-up areas in the area of the cities under study is a result of the conditions under which they were established and developed, and not only of the socio-economic factors affecting them at present.

Discussion and Future Directions
The fact that cities are built up is not debatable, but the regional variation of the built-up areas suggests a need to examine the reasons for and factors influencing this variation. The variables associated with the share of built-up areas in small cities in a national perspective have not yet been considered by researchers. One problem in determining the built-up areas in all cities in a given year is the lack of available up-to-date geodetic data sources. Information extracted from satellite images makes it possible to overcome this limitation [35,89,90,91]. The aim of this study was to investigate the spatial differentiation of the share of built-up areas in the area of small cities in Poland, as well as to search for factors influencing this differentiation. Sentinel-2 satellite images and data from Statistics Poland were used for this purpose.
The study does not support causal claims, but the results obtained allowed the identification of some statistically significant relationships between the studied variables. The adopted research procedure has already been successfully applied in urban studies [49,50,63]. Hot Spot Analysis (Getis-Ord Gi*) was used to identify regional hot spots and cold spots of the share of built-up areas (Figure 6). The GWR method was then applied, and it was found that the variables explaining the spatial variation in the share of built-up areas are regionally differentiated and have positive, negative, or mixed associations with the variable under study.
The variables used in the research have also been applied in other studies of small cities in Poland. The population-related variables used most frequently were the number of inhabitants and population density, but also population dynamics and migrations. These variables were often the basis for selecting the cities taken into account in the research; they were used to study the development of small cities and to create typologies of settlement units [18,57,62,64]. Variables describing the economic activity of inhabitants of small cities in Poland were also often used. Data on professional activity, unemployment level, working-age population, number of enterprises, or employment in a specific sector of the economy, e.g., in services, were used to study the development of small cities and the role they play in the country's settlement system [1,57,64,92]. Variables related to the spatial development of small cities, such as changes in the administrative area of cities, the share of the urbanized area of the city, changes in the number of flats and houses, and infrastructure, are used less frequently and mainly in studies of small city development [57,62]. The greatest differentiation occurs among the variables describing the living standards of small city residents. This issue can be described in various ways, often also through the subjective opinions of the inhabitants; however, the variables most frequently used in research are the level of infrastructure development and access to services related to education, health, trade, and culture [1,16,92]. Positive associations with the share of built-up areas in small Polish cities have been detected for population density (Figure 8) and the number of new residential buildings (Figure 9).
Negative relationships have been found for three socio-economic variables: the share of the registered unemployed in the working-age population (Figure 10), the share of working-age population in the total population (Figure 11), and, quite surprisingly, the number of newly registered domestic economic entities (Figure 12). The quality of life of inhabitants of small cities was described using the share of dwellings equipped with facilities (bathrooms) in the total number of dwellings. This variable showed notable spatial differentiation and various associations (Figure 13). In the Mazowiecki region around Warsaw, it had a positive association with the share of built-up areas, but in Wielkopolski region the relationship was the opposite. This indicates differences in the equipment of the buildings forming built-up areas in the cities of these regions.
The inclusion of a temporal variable can not only improve inference but also reveal new relationships. In future research, the research area can be narrowed down to a single city or a group of cities, but it can also be extended to include a network of cities in Poland's neighboring countries. Sentinel-2 satellite images offer this possibility. The issue to be resolved will be the selection of variables available for all the countries studied.
Unfortunately, a major barrier is the lack of available statistical data. In the databases of Statistics Poland, as well as of Eurostat (European Statistics), most data are aggregated to the level of municipalities, which limits the possibilities of statistical analysis for small cities.
The presented research scheme may be an inspiration to undertake similar research. Capturing the relationship between the share of built-up areas and socio-economic factors requires further research. The proportion of built-up areas in cities is the result of longstanding processes and urban morphology. Most of the cities studied have medieval origins, but there are also those established during the industrialization of Poland, as well as tourist and spa towns, and others [93]. They play different roles in the Polish settlement system [1,92,94,95,96]. More and more towns in the suburban areas of large cities are growing rapidly in terms of population and built-up areas. As a consequence of this development, they will obtain city rights and change the urban settlement network of Poland. Others are going through economic stagnation or regression, with population declining and built-up areas remaining unchanged. It is worth continuing research with new variables and also their dynamics [97]. New variables can also be of a qualitative nature, related to the origins of cities, geographical location (lowlands, highlands, mountains), and proximity to roads, transport hubs, ports, and large cities.
Data Availability Statement: Publicly available datasets were analyzed in this study. These data can be found here: https://bdl.stat.gov.pl/BDL, http://www.gugik.gov.pl/pzgik, https://scihub.copernicus.eu (accessed on 13 May 2021).