Different Ways Ambient and Immobile Population Distributions Influence Urban Crime Patterns

Abstract: The article aims to propose a new way of estimating the ambient and immobile urban population using geotagged tweets and age structure, and to test how they are related to urban crime patterns. Using geotagged tweets and age structure data in 37 neighborhoods of Szczecin, Poland, we analyzed the following crime types that occurred during 2015–2017: burglary in commercial buildings, drug crime, fight and battery, property damage, and theft. Using negative binomial regression models, we found a positive correlation between the size of the ambient population and all investigated crime types. Additionally, neighborhoods with more immobile populations (younger than 16 or older than 65) tend to experience more commercial burglaries, but not other crime types. This may be related to the urban structure of Szczecin, Poland. Neighborhoods with higher rates of poverty and unemployment tend to experience more commercial burglaries, drug problems, property damage, and thefts. Additionally, the count of liquor stores is positively related to drug crime, fight-battery, and theft. This article suggests that the age structure of the population has an influence on the distribution of crime; thus, it is necessary to tailor crime prevention strategies for different areas of the city.


Introduction
Each crime incident is geographic in nature. It is committed at a certain place and time. The offender usually travels to the place where the crime is committed from elsewhere, and he/she also has a permanent or most frequently used base, e.g., a place of residence, work, or study [1]. These places may be identical or adjacent to each other. Time and place play fundamental roles in understanding and explaining crime patterns. Spatiotemporal differentiation is one of the distinguishing features of crime pattern studies and a premise for research on this issue. In order to study crime patterns, various efforts have been made by scholars not only in criminology, but also in geography, sociology, and the social sciences at large. Geography of crime in the geography field and environmental criminology (also known as crime science) in the criminology field specifically focus on the location of crime incidents.
Theories from environmental criminology are helpful in understanding temporal and spatial crime patterns. The routine activity theory argues that crime is the result of the direct contact of three elements in space and time: a motivated offender, a suitable target, and the absence of a capable guardian [2,3]. It suggests that people's routine activities contribute to possible intersections in time and space with a potential offender, and when this happens, the probability of crime increases drastically [2].
Crime pattern theory also shows that the distribution of population in time and space is important to crime pattern studies. This theory suggests that the locations of crime incidents are not accidental; rather, they have clear spatio-temporal patterns. This theory highlights the activity spaces and routes connecting them. These activity spaces include (1) nodes, e.g., residences, shopping centers, workplaces, schools, recreation and entertainment areas, places to meet friends, etc., (2) paths connecting these nodes, and (3) edges, which are boundaries dividing areas with various forms of management, rulership, or functions. As the offender moves along the paths between the nodes, his/her awareness spaces are created. The space of action is reflected in the offender's consciousness in the form of a cognitive space map. According to this theory, a motivated offender has contact with a relatively small part of the city areas in the course of routine everyday activities. From perceived and realized nodes, paths, and edges, he/she selects appropriate objects or victims of crime in a multi-stage decision-making process. The distribution of crime in a city depends on its spatial structure, transport system, and street networks, and is shaped by the distribution of crime generators, attractors, and detractors [1,4]. Moreover, the city crime problem usually concentrates in relatively small areas, as the Law of Crime Concentration and Iron Law of Troublesome Places suggest [5][6][7][8][9]. In many cities, the downtown area or central business district falls into this category because of the concentration of population and opportunities.
Thus, obtaining information about the real-time or close-to-real-time location of the population is one of the key problems in crime pattern research. Publicly available census information does not reflect the actual whereabouts of the population, as it does not take into account the daily movements of citizens [10]. As many studies have pointed out, increased mobility and mobile range can change the dynamic distribution of the population, which, in turn, influences the crime pattern [11][12][13][14][15]. This population measurement issue can be even more acute when the population is used to measure crime and other socioeconomic factors. One frequently used crime indicator is the crime rate, which is calculated by dividing the number of recorded crimes in a certain area by the population of that area. Similar indicators are used in studies of various socioeconomic phenomena, e.g., GDP per capita, poverty rate, unemployment rate, etc. However, there are also problems when determining the actual residing population. For large areas, such as countries or regions, census data are generally collected every 10 years. Such censuses are not always available in all regions, and the census data can become obsolete because of the long gap between measurements, which could distort the results. Data from the current population registers kept by the government are commonly used in such studies, especially in European countries. Their credibility depends on the quality of administration and the provisions of the registration obligation. Considerable migratory movements, especially those of periodic residents like college students, tourists, and people on business trips, make it even more difficult to determine the actual residing population. In countries with weak administration and a low level of civil discipline, such registers are not kept, and the actually residing population can only be estimated.
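The sensitivity of the crime rate to its denominator can be illustrated with a minimal sketch. The function name, the per-1,000 scaling, and all numbers below are hypothetical and are not taken from the study data.

```python
def crime_rate(crime_count: int, population: float, per: int = 1000) -> float:
    """Recorded crimes per `per` people (the per-1,000 scaling is an assumption)."""
    if population <= 0:
        raise ValueError("population must be positive")
    return crime_count * per / population


# Hypothetical neighborhood: 120 recorded crimes, 8,000 residents at the last
# census, but 10,000 people actually residing there today.
rate_census = crime_rate(120, 8_000)    # rate based on the outdated census count
rate_actual = crime_rate(120, 10_000)   # rate based on the actual population

print(rate_census, rate_actual)
```

With the same number of recorded crimes, the outdated denominator yields a noticeably higher rate, which is the kind of distortion described above.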
The above-stated data quality concerns may also affect smaller areas, such as cities and their neighborhoods. Thus, emerging measures that can capture the high mobility of residents are getting more attention: yearly (holidays and other longer periods without work), weekly (weekdays/weekends), and even daily (day/night and hours of day or night) population distribution measures are becoming available. The earliest attempts to determine the number of people during the day and at night for individual districts were conducted as follows: the nighttime population was taken to be the registered population in the district, while the daytime population was the combination of the number of newcomers during the day (based on the number of jobs, services, education, places in hotels, and the load on public transport/car traffic) and the registered population in each district. Such estimates have been used in many studies on the spatial differentiation of crime and other phenomena in cities [16][17][18][19].
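The early day/night estimation approach described above can be sketched as follows. This is only an illustration: the field names are hypothetical, and treating the daytime inflow as an unweighted sum of jobs, school places, and hotel beds is an assumption, not the procedure of the cited studies.

```python
from dataclasses import dataclass


@dataclass
class District:
    registered: int         # registered (resident) population
    jobs: int = 0           # hypothetical proxy for incoming workers
    school_places: int = 0  # hypothetical proxy for incoming students
    hotel_beds: int = 0     # hypothetical proxy for visitors


def nighttime_population(d: District) -> int:
    """Nighttime population is taken as the registered population."""
    return d.registered


def daytime_population(d: District) -> int:
    """Registered population plus an assumed, unweighted estimate of newcomers."""
    newcomers = d.jobs + d.school_places + d.hotel_beds
    return d.registered + newcomers


downtown = District(registered=5000, jobs=3000, school_places=800, hotel_beds=200)
print(nighttime_population(downtown), daytime_population(downtown))
```

A real application would need weights for each proxy and would have to avoid double counting residents who work or study in their own district.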
Further, capturing the location of individuals has recently become possible due to the rapid expansion of social media platforms like Twitter. Twitter posts with specific spatial information, also known as geotagged tweets, have been found to be useful in indicating the real distribution of the population [11][12][13][14]. This makes it possible to predict people's behavior by analyzing the location of tweets [20] and verifying the theory of human behavior [21]. Tweets also offer a possibility of conducting large-scale research, where both local and non-local communities may participate in the data collection and brainstorming processes [22]. The analysis of tweets' contexts makes it possible to understand the users' age and profession based on the evidence of their hobbies, expenses, and leisure locations [23]. Researchers interested in dynamic population patterns also conduct research based on tweets because, in addition to the above-mentioned information, tweets may also contain real-time location information [24].
Geotagged tweets contain information about people's actual location at a given time. This opens new possibilities for geography of crime studies, e.g., predicting future crime patterns [25,26]; preventing crime in short- and medium-term periods [27]; detecting the ambient population for crime analysis [12,28–30]; analyzing crime's intraday variations and the spillover effect of the ambient population [31]; detecting emerging crimes, traffic accidents, emergencies, and hazards, etc. [32,33]; studying the importance of Twitter as a platform for the dissemination of crime news [34]; assessing how social media influence the number of different types of crime [35]; helping people to report suspicious activities or crimes [36]; providing security alerts for the real-time detection of phishing tweets and security alert proposal [37]; assessing how major events influence crime patterns in cities [13,38]; studying crime related to prejudice or intolerance towards the issues related to national origin, sentiment, religion, race, etc. [39][40][41]. Gerber (2014) found that adding Twitter-derived variables improved prediction performance for 19 of 25 crime types compared with a model based solely on historical crimes [25]. This study demonstrates the benefits of tweets for crime prediction. Bendler and colleagues (2014) used the number of points of interest (POI) as the independent variable to simulate crime patterns in San Francisco and added the count of tweets to the model [35]. Ristea and colleagues (2017) used the tweet count as an explanatory variable to test the crime-tweets relationship [42]. Lan and colleagues conducted a series of studies to test the crime-tweets relationship and suggested that tweets are a reliable and feasible dynamic population measure [12,31,43]. Hipp and colleagues (2018) also found that tweets can help explain the presence of crime in California [11].
However, as suggested by previous studies, geotagged tweets generally show the dynamic distribution of the population with higher mobility, as these people are the major Twitter users [11,23,44–50]. Consequently, using only geotagged tweets to study crime patterns may overlook the population with limited mobility, e.g., the very young (younger than 16) and the elderly (older than 65). This study fills this gap by using two data sources to include both the ambient and immobile populations in crime analysis. The first source of data is geotagged tweets, as a measure of the ambient population who have increased mobility during the day as they commute and move a lot (Group 1) [23]. The second source of data is census data, which are used to locate the immobile social groups (who do not tweet as much), such as children (age ≤ 15) and the elderly (age > 65), who usually stay in their residential neighborhoods (Group 2). As routine activity theory argues, crime happens when a motivated offender meets a suitable target/victim in space and time while no capable guardian is onsite [2]. Crime pattern theory also emphasizes the importance of space by arguing that the overlapping activity spaces of offenders and victims are riskier [1]. Therefore, both the mobile population (Group 1) and the immobile population (Group 2) are possibly involved in crime in their residential neighborhoods; additionally, the mobile population (Group 1) is also possibly involved in crime in the neighborhoods they frequent and visit [11–13,31]. Thus, we feel it is necessary to fill this gap and consider both groups to assess different ways ambient and immobile population distributions influence urban crime patterns.
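The age thresholds that define Group 2 can be made concrete with a short sketch. The function name and the age counts are hypothetical; only the cut-offs (age ≤ 15 or age > 65) come from the text.

```python
from typing import Mapping


def immobile_population(age_counts: Mapping[int, int]) -> int:
    """Sum residents aged <= 15 or > 65 (the Group 2 definition in the text)."""
    return sum(n for age, n in age_counts.items() if age <= 15 or age > 65)


# Hypothetical single-year age counts for one neighborhood.
ages = {5: 100, 15: 50, 16: 70, 40: 300, 65: 40, 66: 30, 80: 20}

immobile = immobile_population(ages)
remaining = sum(ages.values()) - immobile
print(immobile, remaining)
```

Note that residents aged exactly 16 and exactly 65 fall outside Group 2 under these thresholds.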

Research Purpose and Questions
Our goal is to provide information on the real location of the population by combining the location information of the mobile and immobile social groups in a city. More importantly, we want to check which social group, mobile or immobile, is more vulnerable to crime and to what types of crime. In order to achieve these goals, in this study we provide answers to the following research questions: (1) How can one estimate the size of the ambient population in individual neighborhoods of a large city? (2) What is the relationship between the size of the ambient population (Group 1) and different types of crime, with socioeconomic characteristics controlled? (3) What is the relationship between the size of the immobile population (Group 2) and different types of crime, with socioeconomic characteristics controlled?
These questions reflect the originality and novelty of the approach used in this study. It is crucial to distinguish between two groups of the population with varying degrees of risk of crime: mobile and immobile. This was done using two different estimation methods. The ambient population was estimated from the number of tweets posted in a given neighborhood, a measure that is rarely used in the region, and the immobile population was estimated from the census age structure. Another innovation is taking into account not only ambient and immobile population relations in the research, but also socioeconomic characteristics that are important for crime opportunities. It is also worth emphasizing the significant number of crime types being analyzed.

Data and Methodology
We collected 3-year crime data in Szczecin, Poland (2015–2017), including (1) burglary in commercial buildings, (2) drug crime, (3) fight and battery, (4) property damage, and (5) theft. Following established practices in the field, we combined 3 years of data to overcome the limitation of the small number of incidents in each year [51][52][53][54]. This is acceptable because the spatial distribution of crime in a city is stable over the years [5,53,55]. Szczecin is the capital of the West Pomeranian Voivodeship in northwestern Poland. It is a major seaport, as it is near the Baltic Sea and the German border. In 2015, it had a population of 404,712, and this number slightly declined to 404,000 in 2017. The area of the city is 301 km², and it is composed of 37 neighborhoods. The city lies on the delta of the Oder River and is known as the "Paris of the North" because of the characteristic star-shaped layout of streets and squares [56].
Burglaries in commercial buildings (2228) are distributed throughout the inhabited area of the city (Figure 1). They are mostly concentrated in the downtown area (the central-western area just west of the Oder River) because this area contains many, usually small, busy commercial facilities. These are usually located on the ground floors of mixed-use buildings and in small pavilions, which are typically easy to break into because of the lack of security measures. Large commercial and service centers, of which there are about ten in Szczecin, are well guarded by closed-circuit television (CCTV) and other security measures, so they are not frequent targets of commercial burglaries. This information comes from local law enforcement officers and scholars. Drug crime incidents (2060) occur throughout the entire inhabited area of the city but are clearly concentrated in the central part of downtown and an area north of it known for its high overall crime rate (Figure 2).
Fights and batteries (1709) have a similar spatial distribution. A high concentration occurs in the central part of downtown. This is related to the fact that there is a significant concentration of alcohol distribution points in this area (catering outlets, alcohol shops, grocery stores, etc.). As local residents and law enforcement officers indicate, alcohol consumption strongly influences this type of crime. Apart from that, the downtown area has a sizable number of people who are known to be bad-tempered (Figure 3). Property damages (1844) have a similar distribution too, which results from the conditions of the population described above. The concentration of this crime type in the downtown area is even more obvious (Figure 4).
The most common type of crime, theft (10,100), occurs almost everywhere across the inhabited areas of the city, and the concentration of these crime incidents in the downtown area is not as dominant. This is because the victims of theft are individuals, who can be mobile (Figure 5).
We collected tweets from 2015 to 2017 in the same city through TweetScraper, a Python script built on Scrapy that does not use Twitter's APIs [57]. Scrapy is an open-source and collaborative framework for extracting data from websites [58]. According to the tool description, TweetScraper mimics the Tweet Search on a web browser and can bypass the Twitter API's 1% limit, enabling it to crawl all publicly available tweets. The spatial distribution of tweets (Figure 6) is remarkably similar to that of the aforementioned crime types, especially thefts (Figure 5).
We use the neighborhood of the city as the unit of analysis, and there are 37 neighborhoods in total. The neighborhood is the finest unit for which sociodemographic data are available in this city. Therefore, although we can collect point-level crime data and tweets, we have to aggregate them to the neighborhood level to compare them with age structure and other sociodemographic factors. The dependent variables are the counts of each crime type. The independent variables are twofold: (1) tweet count as an ambient population measure [11,12,14,31,43], and (2) young (≤15) and elderly (>65) population as a measure of people with limited mobility, hereafter referred to as the immobile population [59]. The control socioeconomic variables are population density, population assisted by the Municipal Family Assistance Center, unemployed population, demographic load index (an age-structure indicator calculating the percentage of non-productive age population), and count of liquor stores in 2015 [52,60–65]. Table 1 shows descriptive statistics of variables. Five dependent variables are counts of burglary in commercial buildings (commercial burglaries), drug crime, fight and battery (assault), property damage (vandalism), and theft. Two major independent variables are the tweets-derived ambient population and the immobile population (population under age 16 or older than 65). Additional control variables are population density (a measure of population concentration), population assisted by the Municipal Family Assistance Center (people in poverty), unemployed population (people who are not employed), demographic load index (shows the burden on society of the unproductive population), and the count of liquor stores (known to be related to many violent and property crime types) [52,60–65]. The control variables in this study are additional independent variables that are not of direct interest to this study's goals but are controlled because previous studies have shown that they influence crime patterns.
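The aggregation of point-level records (crime incidents or tweets) to neighborhood counts can be sketched as below. This is a minimal illustration, not the authors' workflow: it assumes planar coordinates and simple polygons, uses a standard ray-casting test, and the neighborhood names and points are hypothetical. A real analysis would normally rely on GIS software with projected coordinates.

```python
def point_in_polygon(x, y, polygon):
    """Ray-casting test; `polygon` is a list of (x, y) vertices in order."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Consider only edges that straddle the horizontal line through y.
        if (y1 > y) != (y2 > y):
            # x-coordinate where the edge crosses that horizontal line.
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def count_by_neighborhood(points, neighborhoods):
    """Count points per neighborhood; points outside all polygons are skipped."""
    counts = {name: 0 for name in neighborhoods}
    for x, y in points:
        for name, polygon in neighborhoods.items():
            if point_in_polygon(x, y, polygon):
                counts[name] += 1
                break  # each point is assigned to at most one neighborhood
    return counts


# Two hypothetical square neighborhoods and four hypothetical tweet locations.
neighborhoods = {
    "A": [(0, 0), (2, 0), (2, 2), (0, 2)],
    "B": [(2, 0), (4, 0), (4, 2), (2, 2)],
}
tweets = [(0.5, 0.5), (1.5, 1.0), (3.0, 1.0), (5.0, 5.0)]

tweet_counts = count_by_neighborhood(tweets, neighborhoods)
print(tweet_counts)
```

The same function can be applied to each crime type separately, producing one count column per dependent variable at the neighborhood level.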
We include them to make sure the crime-tweets relationship and the crime-age structure relationship are not accidental. For each of these five dependent variables, two models are fit: one with the ambient population (tweets) and controls, and another with the immobile population (≤15 or >65) and controls. Consequently, 2 × 5 = 10 models are fit. To use a traditional linear regression model, the dependent variable needs to be approximately normally distributed. However, as the Law of Crime Concentration and the Iron Law of Troublesome Places suggest, few places are responsible for most of the crime, and most places do not experience any crime, so the distribution of crime is strongly skewed [5,6,8,9]. As evidenced in Figures 1–5, crime patterns in Szczecin are also clustered. This violates the basic assumption of linear regression; thus, Poisson or negative binomial regression should be used [66][67][68]. The Poisson regression model may be used for count data like crime incidents. However, Poisson regression requires the mean of the dependent variable to equal its variance, a condition that is often not satisfied in crime data. Therefore, following the general practice in crime studies, we use the negative binomial regression model to assess the relationship between crime and the ambient population versus the immobile population (≤15 and >65). The negative binomial regression model has been widely used in criminology studies because it does not assume homogeneity of variance and does not require normal distribution of the dependent variable [67]. The unit of analysis is the neighborhood (N = 37). The negative binomial regression model is:
$$y_i \mid x_{ik} \sim \mathrm{NB}\left(\exp\left(\sum_{k=0}^{p} \beta_k x_{ik}\right), \alpha\right)$$
where NB stands for negative binomial, $y_i$ is the crime count in the $i$th ($i = 1, \ldots, n$) neighborhood, $x_{ik}$ is the $k$th explanatory variable for neighborhood $i$ (with $x_{i0} = 1$ for the intercept), $\beta_k$ ($k = 0, 1, \ldots, p$) are the coefficients, and $\alpha$ is the overdispersion parameter [68]. Table 2 shows the results of negative binomial models for five different crime types.
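The reasoning behind choosing the negative binomial over the Poisson model can be illustrated with a small sketch. The counts below are hypothetical, not the Szczecin data. The sketch also assumes the common NB2 parameterization, in which the variance equals μ + αμ²; the source names α as the overdispersion parameter but does not state which parameterization was used.

```python
import math
from statistics import mean, variance


def dispersion_ratio(counts):
    """Sample variance divided by sample mean; close to 1 under a Poisson model."""
    return variance(counts) / mean(counts)


def expected_count(beta, x):
    """mu = exp(beta_0 + sum_k beta_k * x_k); `x` excludes the intercept term."""
    return math.exp(beta[0] + sum(b * v for b, v in zip(beta[1:], x)))


def nb_logpmf(y, mu, alpha):
    """Log-probability of count y under NB2: mean mu, variance mu + alpha * mu**2."""
    r = 1.0 / alpha  # shape parameter implied by the overdispersion alpha
    return (
        math.lgamma(y + r) - math.lgamma(r) - math.lgamma(y + 1)
        + r * math.log(r / (r + mu))
        + y * math.log(mu / (r + mu))
    )


# Hypothetical, strongly skewed neighborhood crime counts.
counts = [0, 1, 2, 3, 5, 8, 40, 120]
ratio = dispersion_ratio(counts)
print(f"variance / mean = {ratio:.1f}")  # far above 1 indicates overdispersion

# Sanity check: the NB2 probabilities sum to about 1 and have mean mu.
mu, alpha = 3.0, 0.5
probs = [math.exp(nb_logpmf(y, mu, alpha)) for y in range(500)]
total = sum(probs)
nb_mean = sum(y * p for y, p in enumerate(probs))
```

When the variance-to-mean ratio is far above 1, as in this skewed example, the Poisson equality of mean and variance is implausible, and the extra parameter α absorbs the excess variance.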
As specified in the methodology section, each of these five crime types is fit with two different models: one with the ambient population (tweets), and one with the immobile population (≤15 or >65). All models also contain the necessary socioeconomic control variables, and the sample size of each model is 37. The tweet-based ambient population measure has a significant positive relationship with all five crime types: commercial burglary (β = 0.53, p < 0.05), drug crime (β = 0.44, p < 0.01), fight-battery (β = 0.34, p < 0.01), property damage (β = 0.36, p < 0.05), and theft (β = 0.09, p < 0.01). This means that if a neighborhood has a larger ambient population, the risks of these crimes tend to be higher. These five types of crime are all closely related to the dynamic distribution of population on and near streets: commercial burglary happens in commercial facilities that people frequent [69]; drug crime (including distributing, dispensing, possessing, and dealing drugs), fight-battery, property damage (vandalism), and thefts tend to happen on and near streets that people visit [12,70–73]. The immobile population (≤15 or >65), however, is significantly and positively related only to commercial burglaries. A probable reason for this is that in Szczecin residential and commercial structures are highly mixed, so neighborhoods with more residential dwellings also have more commercial facilities. In addition, the less mobile populations tend to use only shops and service establishments close to their place of residence. Coincidentally, offenders of commercial burglaries are also mainly residents of a given area, and they tend to target small premises that are poorly protected [74,75].
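One common way to read the reported coefficients is to exponentiate them into incidence rate ratios. The sketch below assumes the usual log link of negative binomial regression. The scale of the tweet-based predictor (raw, rescaled, or transformed) is not stated in this section, so "one unit" refers to whatever scale the authors used.

```python
import math

# Ambient-population coefficients as reported in the text above.
coefficients = {
    "commercial burglary": 0.53,
    "drug crime": 0.44,
    "fight-battery": 0.34,
    "property damage": 0.36,
    "theft": 0.09,
}


def incidence_rate_ratio(beta: float) -> float:
    """Multiplicative change in the expected count per one-unit predictor increase."""
    return math.exp(beta)


irr = {crime: incidence_rate_ratio(b) for crime, b in coefficients.items()}
for crime, ratio in irr.items():
    print(f"{crime}: IRR = {ratio:.2f}")
```

Under these assumptions, a ratio above 1 means the expected crime count rises with the ambient population measure, holding the control variables constant.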

Results and Discussion
Regarding the control variables, population density is negatively related to commercial burglaries in Szczecin. This suggests that areas with lower population density experience more commercial burglaries. A potential explanation is that burglars tend to avoid very busy areas, where the risk of being identified is higher because more people are nearby, and where commercial facilities typically use more security measures to prevent burglaries [76,77]. Areas with larger populations in poverty and unemployment tend to experience more commercial burglaries, drug problems, property damage, and thefts, as suggested in previous studies [70,78–80]. Additionally, the count of liquor stores is positively related to drug crime, fight-battery, and theft, which is consistent with the existing literature [81][82][83].
It is necessary to acknowledge the limitations of social media data like tweets. As previous studies have discovered, tweets may not represent the whole picture of the census population in an area. This is because Twitter users are relatively young, urban residents, and mobile users [44,48,49]. Additionally, only 15% of internet-using adults use Twitter, and most of them are young adults [84,85]. Thus, analyses based on tweets should not overstate claims about the representativeness of the data. Nevertheless, these limitations do not undermine the findings of this study, as we use the tweet count as a measure of the population who are more mobile. Results suggest that the distribution of mobile and immobile populations is correlated with the patterns of different crime types. Due to data limitations, we were only able to conduct this study in Szczecin, Poland. More tests should be done in other European countries, or even other parts of the world, in order to advance knowledge in the discipline. Further, we use the neighborhood as the study unit because that is the finest unit of sociodemographic data we could obtain in this area. As suggested by a previous study, the study unit size may influence the results due to the modifiable areal unit problem (MAUP) [31]. Thus, if data are available, more tests may be needed at even finer spatial units, e.g., tracts, block groups, and blocks, in order to see whether such relationships persist across different spatial scales.

Conclusions
The main goal of the article has been achieved. The adopted method of estimating the ambient population based on geotagged tweets turned out to be effective and justified, and it enabled fruitful research, as suggested in various previous studies. The same applies to the estimation of the immobile population from age structure. Using both mobile and immobile social groups creates a new perspective for researching crime in cities. Utilizing negative binomial regression models, a positive correlation is found between the size of the ambient population and all investigated crime types (burglary in commercial buildings, drug crime, fight and battery, property damage, and theft), while the distribution of immobile social groups is related only to commercial burglaries. Areas with more population in poverty and unemployment tend to experience more commercial burglaries, drug problems, property damage, and thefts. Additionally, the count of liquor stores is positively related to drug crime, fight-battery, and theft. As found in this study, the residents' age structure may influence crime patterns in the city. Thus, when analyzing and preventing crime, the dynamic distribution of people with different levels of mobility needs to be considered.
As discussed in the earlier section, more tests in other regions and at different spatial scales are recommended. Comparison studies among different cities across countries using the same method can also be beneficial. This procedure, commonly found in experimental sciences, is seldom used in the social sciences. However, it should be taken into account that in different socioeconomic conditions, the control variables may have different meanings and require different interpretations. The overall level of crime in a country or region is also relevant in such studies. In Poland it is relatively low; in Szczecin it is average compared to the rest of the country, and the most serious crimes, such as murders and crimes involving weapons, are practically isolated cases.
The assumption that age structure and mobile ability need to be systematically considered in crime analysis studies should also be verified in other cities, but with a similar population age structure, level, and lifestyle. In more affluent countries, young people and the elderly are generally more mobile thanks to individual and collective means of transportation.
In future studies, it should be checked as well whether the applied methods of estimating the number of mobile and immobile populations reflect their real mobility. Such an experimental study, using detailed lists of residents obtained, for example, from property owners and CCTV cameras, could verify the assumptions of this method and indicate the level of possible error.

Conflicts of Interest:
The authors declare no conflict of interest.