Models of Geospatially Referenced People Distribution as a Basis for Studying the Daily Cycles of Urban Infrastructure Use by Residents

Parygin, Danila; Anokhin, Alexander; Anikin, Anton; Finogeev, Anton; Gurtyakov, Alexander

doi:10.3390/smartcities8010001


by Danila Parygin ¹,*, Alexander Anokhin ¹, Anton Anikin ¹, Anton Finogeev ² and Alexander Gurtyakov

¹ Department of Digital Technologies for Urban Studies, Architecture and Civil Engineering, Volgograd State Technical University, 1 Akademicheskaya Str., 400074 Volgograd, Russia

² Department of Computer-Aided Design Systems, Penza State University, 40 Krasnaya Str., 440026 Penza, Russia

* Author to whom correspondence should be addressed.

Smart Cities 2025, 8(1), 1; https://doi.org/10.3390/smartcities8010001

Submission received: 1 October 2024 / Revised: 27 November 2024 / Accepted: 16 December 2024 / Published: 24 December 2024

(This article belongs to the Section Applied Science and Humanities for Smart Cities)


Highlights

What are the main findings?

Data from statistics on demographics and professional specialization of the city’s population, combined with data on the composition of the urban infrastructure, make it possible to obtain a model of the distribution of people with a spatial–temporal reference.
The model of people’s placement in the city can reflect both the overall hourly load on the territory and the load on individual infrastructure facilities, as well as the specifics of their use in terms of the types of activity performed.

What is the implication of the main finding?

The proposed approach to modeling the placement of people in the city makes it possible to combine and analyze data of varying detail, taking into account the daily functioning of both individual environmental objects and complex infrastructure systems.
A customizable and updatable model of population distribution in the city can be integrated into the common data environment of smart city services and used to configure the operating modes of individual infrastructure facilities according to the predicted load.

Abstract

City services and infrastructure are consumer-oriented and can perform their functions effectively and with high quality only under normal workload conditions. Therefore, the proper organization of a public service system depends directly on knowing the quantitative and qualitative composition of the people present in the city throughout the day. The article discusses existing solutions for analyzing the distribution of people in a territory based on data collected by mobile operators, payment terminals, navigation systems and other network solutions, as well as the modeling methods derived from them. The scientific aim of the study is to propose a solution for modeling the daily distribution of people based on open statistics collected from the Internet and open web mapping data. The stages of development of the modeling software environment and the methods for spatial analysis of available data on a digital cartographic basis are described. The proposed approach uses archetypes of social groups, occupational statistics, and the gender and age composition of a given territory, as well as the characteristics of urban infrastructure objects in terms of composition and purpose. As a result of the study, solutions were created for modeling the 48 h distribution of city residents with reference to specific infrastructure facilities (residential, public and working) on working days and weekends, with an hourly breakdown of the simulated values. The daily distribution of people was simulated for the city of Volgograd, Russian Federation, yielding a picture of the daily distribution of city residents by district and by specific buildings. The proposed approach and the created algorithm can be applied to any city.

Keywords:

people distribution modeling; archetypes; social group scenarios; urban infrastructure; city; OpenStreetMap; geodata; data analysis; spatial distribution; geodata visualization

1. Introduction

The population of cities is constantly increasing, both for natural reasons associated with the growth of the world population and because of migration to larger and more developed settlements. This raises the problem of properly organizing urban space: an increase in the number of people present in a city and visiting certain places leads to a higher load on the infrastructure, which reduces the efficiency of its functioning [1]. Urban infrastructure planning is a complex task, but it can be simplified by assessing the distribution of service consumers throughout the city, which can then be analyzed to make proposals for modernizing the urban environment [2,3].

The main problem with existing solutions is that the most sophisticated of them rely on data received from mobile operators, payment terminals, navigation systems and other electronic devices. Telecom service providers charge a fee for such data. The data themselves are often just anonymized coordinates that do not allow people to be divided into groups by any criteria [4,5,6,7,8]. Access to these data is difficult for a wide range of specialists. The methods these solutions use to divide people into groups are not disclosed and may be applied only to a limited list of infrastructure objects (for example, only residential ones). Such solutions do not make it possible to track the territorial redistribution of the inhabitants of a particular city.

The purpose of this study is to develop methods and software for collecting and integrating population data, building a model, and creating a spatial visualization of the distribution of people in the city during the day using various open data sources. This work continues ongoing studies that demonstrate the possibilities of data analysis and processing, and it will be useful for developing effective strategies for detecting and solving existing problems in urban spaces, reducing the costs of eliminating them and achieving sustainable development of the city [9,10,11].

Data are not very informative for decisions on the development of the urban environment if they are presented only as general numerical indicators, even if they divide people into social groups. Data on the number of people in the city and their distribution among city districts and specific buildings are a prerequisite for proper city planning and the development of its territory and infrastructure. Such data inform a wide range of planning decisions: from the width of streets (so that they can cope with the flow of people and traffic) to the accessibility and capacity of shops or clinics (so that they can serve the required number of residents). The capabilities and functionality of the city’s infrastructure and its ability to withstand loads depend on the size of the population for which a particular facility is planned. Understanding how many people will use a particular facility and what they need makes it possible to identify problems and propose the changes necessary to meet the real needs of the population in each specific location of the city [12,13].

This task faces the problem of obtaining initial information. Statistics are collected by various institutions, but this information is fragmented and difficult to aggregate for a comprehensive study of the whole city. Currently, the most universal methods for obtaining such data rely mainly on the wide range of software and hardware information available from smartphones [14]. Various mobile applications can track where people are moving from and to. However, each application owner or mobile operator holds only its own portion of the data, and not every person has a phone or uses it regularly. Therefore, such data require additional processing and end-to-end analysis together with other sources [15,16,17].

Modeling the distribution of people in a certain area is a long-standing topic that has not lost its relevance. The works [18,19,20,21,22,23,24,25,26] confirm the relevance of the topic and substantiate the effectiveness and prospects of using agent-based simulation for solving problems of city management and beyond. Studies show the benefits of using these technologies to create decision-support systems for big city data management [27]. Simulation modeling is a powerful tool for studying complex dynamic systems [28,29], and it is argued to be accessible to a wide range of users because of its low cost compared to other modeling methods. The material [30] discusses the advantages of simulation modeling over other approaches, the difficulties that researchers have faced, and the opportunities for promoting this method and the prospects for its development.

The study [31] describes the aspects that are important when modeling urban environment data so that the data can be used in the development of urban structures. A large number of interrelated factors and processes affect data modeling. The work [32] can also be mentioned here, as it indicates the factors identified from the distribution of human activity. Its authors write that the distribution of human activity reveals so-called social–functional frameworks. These frameworks make it possible to establish the degree of mixture and diversity of activities, as well as to monitor their dynamics and changes in the spatial properties of the urban environment.

The use of such methods is demonstrated in [33], where population density and mobility maps are compiled from mobile operator data, showing that mobile phone location data can reveal the spatiotemporal distribution of city dwellers. The authors of [34,35] used agent-based modeling to study pedestrian movement. Such work can help identify the shortcomings of designed pedestrian infrastructure by analyzing the points that attract people.

Work [36] assesses the extent to which a city’s population is exposed to real or potential disasters using modeling of the distribution of people and an analysis of the resulting distribution. The result of the study was a picture of the congestion in certain areas of the city at a given time of day. This study does not classify people into groups or occupations.

The work [37] shows how distribution modeling can be useful. Its analysis is based on available data, although its reasoning relies on mobile phone data.

Methods for improving the accuracy of small-area population forecasting are presented in [38,39,40]; they make it possible to improve the spatial resolution of urban population distribution models. In addition, [41] presents a set of agent-based models designed to study the impact of people’s movements on a local scale.

An approach to extracting additional information for modeling was demonstrated in [8], where spatiotemporal patterns were identified from the available data. In addition, [42] demonstrates the effectiveness of modeling local spatial dependence with a conditional autoregressive process to represent the spatial distribution of the population.

Other approaches are being explored in addition to those already discussed. Paper [43] demonstrates a big data modeling method using geotagged Twitter posts. The material [44] provides an overview of decision-making methods in an agent-based modeling system which are integrated into various agent-based models, and provides insight into the future of decision modeling.

This work proposes a new approach to modeling the daily distribution of people in the city, taking into account the experience of previous studies.

Section 2 describes how data are collected and used in this approach, the algorithm that models the distribution of citizens across the city, and the architecture of the program that implements this algorithm. Section 3 discusses the program itself and the details of its implementation, and presents the results of modeling the distribution of residents in the city space depending on the time of day and day of the week. Section 4 discusses the advantages of the proposed solution. Section 5 presents the final conclusions and outlines future work.

2. Materials and Methods

First of all, the proposed concept of modeling the daily distribution of people in the city must be stated. The aim of this approach is to create a static picture of the distribution of citizens in the city for a given hour on a given day of the week, based on data about citizens and their occupations.

2.1. Data Sources

It is proposed to use data from the state statistical body or its local office as the basic source of population statistics, so that the model operates with official open data. Rosstat [45] serves as this source in the proposed approach. Rosstat is the Federal State Statistics Service of the Russian Federation, which conducts censuses of the population, agriculture, industry and other sectors of the economy. Rosstat also collects data from organizations, enterprises and government agencies for subsequent processing and analysis. The results of its work are statistical reports and analytical materials based on the collected data.

The following data were obtained from this source:

Data on the age distribution of the population. These data are necessary to create archetypes based on the age of the actors as well as to select a suitable profession for a specific age.
The total number of people of a given age. Data are needed for reconciliation and balance when assigning people to jobs.
Number of women and men of a given age. These data are also used when creating archetypes, since a corresponding scenario will be created for each archetype. Gender and age are also used when distributing actors among archetypes, since men and women of a given age fall into the archetypes differently.
Data on groups of professions. This is necessary to match actors with building tags corresponding to their professions.
The number of people belonging to a specific group of professions (industry). These data are used to determine the control total of people assigned to a particular specialty.

These data reflect the demographics and employment of people for the simulated area. They are able to characterize the location of certain social groups in specific time periods. The data collected from Rosstat are current (as of 2022).
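The Rosstat inputs listed above can be held in a small container that also performs the reconciliation the text mentions. The sketch below is a minimal illustration under stated assumptions: the class name `PopulationStats`, the `(age, sex)` keying and the check that employed people do not exceed the working-age population are hypothetical, not the authors’ data model.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PopulationStats:
    """Hypothetical container for the Rosstat-derived inputs described above."""

    # (age, sex) -> number of people; sex is "m" or "f"
    by_age_sex: dict[tuple[int, str], int]
    # occupational group (industry) -> number of people employed in it
    by_occupation: dict[str, int]

    def total(self) -> int:
        """Total population over all ages and both sexes."""
        return sum(self.by_age_sex.values())

    def count_in_ages(self, lo: int, hi: int, sex: Optional[str] = None) -> int:
        """People aged lo..hi inclusive, optionally restricted to one sex."""
        return sum(
            n
            for (age, s), n in self.by_age_sex.items()
            if lo <= age <= hi and (sex is None or s == sex)
        )

    def check_employment(self, lo: int, hi: int) -> int:
        """Return the employed total; fail if it exceeds the population aged lo..hi."""
        employed = sum(self.by_occupation.values())
        available = self.count_in_ages(lo, hi)
        if employed > available:
            raise ValueError(f"{employed} employed exceed {available} people aged {lo}-{hi}")
        return employed
```

The occupational totals then act as the control totals against which job assignments are balanced.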

The source of geospatial information about the objects of the territory is the open service of cartographic data OpenStreetMap (OSM) [46].

The OSM service is proposed for obtaining:

The type of the resulting geospatial feature. This is necessary to separate buildings and service tags. Each map object can only have one purpose.
The latitude and longitude of the building. These coordinates are used to accurately obtain the location of objects.
The tag assigned to the building. Tags are necessary to divide buildings into groups (work, public and residential) so that people corresponding to each group can later be assigned to these destinations.
Polygon. The building polygon is represented by point coordinates and is used to calculate the area of the building, the value of which determines the number of people in the building.

Data about the geographical features of the city allow buildings to be classified. This will make it possible to determine the number of people that can be distributed into specific groups of buildings using information about the characteristics of specific map objects.
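The tag-based grouping can be sketched as a lookup from OSM tag values to the three building groups. The specific tag values in `CATEGORIES` and the fallback from a generic `building=yes` to the `amenity` key are assumptions for illustration; the article does not list the exact tags used.

```python
from typing import Optional

# Hypothetical mapping of OSM tag values to the three building groups in the text.
CATEGORIES = {
    "residential": {"apartments", "house", "residential", "detached", "dormitory"},
    "work": {"office", "commercial", "industrial", "retail", "warehouse"},
    "public": {"school", "kindergarten", "university", "college", "hospital"},
}


def classify_building(tags: dict[str, str]) -> Optional[str]:
    """Return "residential", "work", "public", or None for an OSM tag dictionary.

    The `building` value is checked first; generic values such as
    building=yes fall through to the `amenity` tag.
    """
    for key in ("building", "amenity"):
        value = tags.get(key)
        for category, values in CATEGORIES.items():
            if value in values:
                return category
    return None
```

Returning a single category per object matches the rule that each map object has only one purpose.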

The next stage is the creation of archetypes. An archetype is considered a category through which one or more people can be identified using information about gender and age. An archetype has certain nuances of behavior that determine at what time, and on which day of the week, a representative of this archetype is in a particular area on the map.

The creation of archetypes requires information about the number of people belonging to a certain age and gender group, as well as about groups of organizations and the number of employees in them. This information is provided by Rosstat, as described above.

Gender and age group are specified for each archetype. The set of buildings in which a person can be at a certain hour of a given day is then determined. Then, a correspondence is drawn up that specifies in which building (according to its coordinates on the map) a representative of the archetype can be located at a specific hour on a weekday and at a specific hour on a weekend.
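The correspondence between an archetype, the day type and the hour can be represented as a schedule table. In this sketch the schedule entries, the archetype labels (which anticipate the archetypes listed below) and the rule that uncovered hours default to the residential building are all hypothetical; intervals that cross midnight are not handled.

```python
# Hypothetical daily scenarios: (archetype, day type) -> list of
# (start_hour, end_hour, building_category) intervals, end hour exclusive.
SCHEDULE = {
    ("schoolchild", "weekday"): [(8, 14, "public")],
    ("working_age_male", "weekday"): [(9, 18, "work")],
    ("working_age_female", "weekday"): [(9, 18, "work")],
    ("retired_female", "weekend"): [(10, 12, "public")],
}


def location_category(archetype: str, day_type: str, hour: int) -> str:
    """Building category where a representative of `archetype` is at `hour`.

    Any hour not covered by an interval defaults to the residential building.
    """
    if day_type not in ("weekday", "weekend"):
        raise ValueError(f"unknown day type: {day_type}")
    if not 0 <= hour <= 23:
        raise ValueError(f"hour out of range: {hour}")
    for start, end, category in SCHEDULE.get((archetype, day_type), []):
        if start <= hour < end:
            return category
    return "residential"
```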

Archetypes are necessary for the formation of social processes and the creation of a model that reflects the logic of the distribution of people in the space of infrastructure objects.

Eight archetypes were identified in total:

Baby (any gender);
Kindergarten-age child (any gender);
Schoolchild (any gender);
Student (any gender);
Working-age person (male);
Working-age person (female);
Retired (male);
Retired (female).

For most archetypes, behavior does not differ by gender. Gender needs to be considered only for the working-age population and pensioners, because the retirement ages for men and women differ.
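Assigning an actor to one of the eight archetypes from age and sex can be sketched as a chain of age thresholds. The thresholds below, including the retirement ages in `RETIREMENT_AGE`, are assumptions chosen for illustration rather than values taken from the article; in the described approach, student status would more plausibly come from the statistics than from age alone.

```python
# Assumed retirement ages by sex; illustrative values only.
RETIREMENT_AGE = {"m": 65, "f": 60}


def assign_archetype(age: int, sex: str) -> str:
    """Map age and sex ("m"/"f") to one of the eight archetypes (hypothetical thresholds)."""
    if age < 0:
        raise ValueError("age must be non-negative")
    if sex not in RETIREMENT_AGE:
        raise ValueError(f"unknown sex: {sex}")
    if age <= 2:
        return "baby"
    if age <= 6:
        return "kindergarten_child"
    if age <= 17:
        return "schoolchild"
    if age <= 22:
        return "student"
    # Only the adult archetypes depend on sex, via the retirement age.
    suffix = "male" if sex == "m" else "female"
    if age < RETIREMENT_AGE[sex]:
        return f"working_age_{suffix}"
    return f"retired_{suffix}"
```

The example shows why sex matters only for adults: a 62-year-old man and a 62-year-old woman land in different archetypes.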

All the data described are necessary to create algorithms for detailed distribution modeling and will be used to display statistics on the distribution of groups of people across districts of the selected city.

2.2. Algorithm for People Distribution in Buildings

To distribute people evenly among residential and work buildings, an algorithm based on the data obtained from Rosstat and OpenStreetMap must be compiled. The first step is to sort the entire list of buildings received from OpenStreetMap, marking each building as residential or work. Then, the areas of all buildings in the same category are summed and divided by the number of people. For residential buildings, the total number of people in the city is used; for work buildings, the number of people in the occupational group associated with the selected buildings is used. The result is the number of square meters per person, i.e., the area per city resident. The formula for calculating the number of square meters per person is presented below:

$$Q = \frac{\sum S_{\text{building}}}{N_{\text{group}}} \tag{1}$$

where $\sum S_{\text{building}}$ is the sum of the areas of residential/work buildings in square meters, and $N_{\text{group}}$ is the number of people in the city or the number of people belonging to a group of occupations.

This formula gives the exact number of square meters per person for the population group under consideration. With the area per person known, the algorithm can move on to the next step.
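Formula (1) can be computed per building category in a few lines. This is a minimal sketch; the function name `area_per_person` and the `(category, area)` input format are hypothetical.

```python
from collections import defaultdict


def area_per_person(buildings, group_sizes):
    """Compute Q = sum(S_building) / N_group for each building category.

    buildings: iterable of (category, area_m2) pairs
    group_sizes: category -> N_group (city population for "residential",
    size of the matching occupational group for work categories)
    """
    total_area = defaultdict(float)
    for category, area in buildings:
        total_area[category] += area
    q = {}
    for category, n_group in group_sizes.items():
        if n_group <= 0:
            raise ValueError(f"N_group must be positive for {category}")
        q[category] = total_area[category] / n_group
    return q
```

For example, 8000 m² of residential area shared by 400 residents gives Q = 20 m² per person.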

The next step is to iterate through the list of buildings. Since the area of each building is already known, the task of this step is to determine how many people are in each building. To do this, the area of the building is divided by the number of square meters per person obtained in the previous step. The result is the number of people in that building. The formula for calculating the number of people in a building is presented below:

$$N_{\text{building}} = \frac{S_{\text{building}}}{Q} \tag{2}$$

where $S_{\text{building}}$ is the building area in square meters and $Q$ is the number of square meters per person.

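After this step, the number of people who can be housed in each building of the city is known. Because $Q$ is derived from the same set of buildings, the values $N_{\text{building}}$ within a category add up to $N_{\text{group}}$.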
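Formula (2) yields fractional values, while the model places whole people. One possible way to round, assumed here and not stated in the article, is largest-remainder rounding, which keeps the integer counts summing to the group size. The helper `people_per_building` is hypothetical.

```python
import math


def people_per_building(areas_m2, q):
    """Apply N_building = S_building / Q and round to whole people.

    Largest-remainder rounding keeps the integer counts summing to the
    rounded total, so the group size is preserved after rounding.
    """
    if q <= 0:
        raise ValueError("Q must be positive")
    raw = [s / q for s in areas_m2]
    counts = [math.floor(x) for x in raw]
    missing = round(sum(raw)) - sum(counts)
    # Give the leftover people to buildings with the largest fractional parts.
    by_fraction = sorted(range(len(raw)), key=lambda i: raw[i] - counts[i], reverse=True)
    for i in by_fraction[:missing]:
        counts[i] += 1
    return counts
```

With areas of 1000, 2500 and 1500 m² shared by 7 people, Q ≈ 714.3 m², the raw values are 1.4, 3.5 and 2.1, and the rounded counts are 1, 4 and 2.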

2.3. Simulation Environment Architecture

The software implementation of the simulation environment was structured in accordance with the architecture presented in Figure 1. The proposed solution includes the following modules:

Data Collection Module: downloads and prepares data from the open, verified data services “Rosstat” and “OpenStreetMap” using dedicated libraries. Object data are saved to files and stored on the device.
Distribution Module: processes the collected data and distributes people for a specific time of day using predefined methods and archetypes. Allocation information is saved to files and stored on the device.
Simulation Module: processes the data prepared by the Distribution Module and uses a mapping library to create a map with markers for visualizing the distribution. The created models are saved to files and stored on the device.
Visualization Module: controls the program and displays the distribution of people on an interactive map together with distribution statistics. The module uses the distribution data to display statistics and the model to visualize the distribution.
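The file-based handoff between modules can be illustrated with a minimal two-stage sketch. The stage functions, the JSON file names and the tiny dataset are hypothetical stand-ins (no Rosstat or OSM data are downloaded), and the Simulation and Visualization Modules are omitted.

```python
import json
import tempfile
from pathlib import Path


def collect(out_dir: Path) -> Path:
    """Stand-in for the Data Collection Module: writes a tiny hypothetical dataset."""
    data = {
        "population": 300,
        "buildings": [
            {"id": "b1", "category": "residential", "area": 4000.0},
            {"id": "b2", "category": "residential", "area": 2000.0},
        ],
    }
    path = out_dir / "collected.json"
    path.write_text(json.dumps(data))
    return path


def distribute(in_path: Path, out_dir: Path) -> Path:
    """Stand-in for the Distribution Module: applies Eq. (1) and Eq. (2)."""
    data = json.loads(in_path.read_text())
    q = sum(b["area"] for b in data["buildings"]) / data["population"]
    allocation = {b["id"]: b["area"] / q for b in data["buildings"]}
    path = out_dir / "distribution.json"
    path.write_text(json.dumps(allocation))
    return path


def run_pipeline() -> dict:
    """Chain the stages through files in a temporary directory."""
    with tempfile.TemporaryDirectory() as tmp:
        out_dir = Path(tmp)
        distribution_file = distribute(collect(out_dir), out_dir)
        return json.loads(distribution_file.read_text())
```

Passing file paths rather than in-memory objects mirrors the described design, in which each module can be rerun independently from the files saved by the previous one.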

The proposed architecture makes it possible to implement software for analyzing available data sources and modeling the daily distribution of people in accordance with the proposed algorithms and archetypes.

3. Results

This section presents the results of implementing the model of the daily distribution of people in the city. Python [47] was chosen as the programming language for the simulation environment. Its libraries make it possible to automatically download, process and work with data from online services, whether cartographic or statistical, which is especially important for this study. The following libraries are used in the project:

Pandas [48]—used for data processing and analysis;
OSMnx [49]—allows for downloading geospatial data from OpenStreetMap;
Folium [50]—the library is used to visualize data on interactive maps;
Pyproj [51] and Shapely [52]—designed for spatial analysis;
PyQt5 [53]—a set of libraries for creating a graphical interface.

Computational and experimental tasks were carried out on the “High Performance Computing Complex” deployed at the Department of Digital Technologies for Urban Studies, Architecture and Civil Engineering at Volgograd State Technical University as part of the implementation of the strategic academic leadership program “Priority 2030”.

3.1. Data Collection

A test study was performed on data for the city of Volgograd, Russia. Data on the buildings and district boundaries of Volgograd are collected from the OSM service using the OSMnx library and its built-in functions.

A function was created to convert the resulting lists of buildings and boundaries into a spatial data format. This function receives the necessary data about buildings, tags and district boundaries and converts them into a form suitable for processing by the subsequent modules of the modeling environment. The same function is also used to calculate the area of buildings.

Before the area of buildings can be calculated, the coordinates must be converted from one coordinate system to another, since geographic coordinates in degrees must be projected onto a planar system in meters to obtain areas in square meters. The Pyproj library is included in the project for this purpose. The Shapely library is also used to construct building polygons and calculate their areas from the converted coordinates.
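The article uses Pyproj to reproject coordinates and Shapely to compute polygon areas. As a dependency-free stand-in that shows the same idea, the sketch below projects a building outline onto a local equirectangular plane in meters and applies the shoelace formula; this approximation is adequate only for building-sized polygons and is not the authors’ implementation.

```python
import math

EARTH_RADIUS_M = 6_371_008.8  # mean Earth radius in meters


def polygon_area_m2(ring):
    """Approximate area of a small polygon given as (lon, lat) pairs in degrees.

    Projects the vertices onto a local equirectangular plane centered on the
    polygon (meters), then applies the shoelace formula.
    """
    if len(ring) < 3:
        raise ValueError("a polygon needs at least three vertices")
    lon0 = sum(lon for lon, _ in ring) / len(ring)
    lat0 = sum(lat for _, lat in ring) / len(ring)
    scale_x = math.cos(math.radians(lat0)) * EARTH_RADIUS_M
    # Coordinates relative to the center keep the numbers small and precise.
    pts = [
        (math.radians(lon - lon0) * scale_x, math.radians(lat - lat0) * EARTH_RADIUS_M)
        for lon, lat in ring
    ]
    twice_area = 0.0
    for (x1, y1), (x2, y2) in zip(pts, pts[1:] + pts[:1]):
        twice_area += x1 * y2 - x2 * y1
    return abs(twice_area) / 2
```

For a 0.001° × 0.001° square near Volgograd’s latitude (about 48.7° N), the result is roughly 8160 m²; both open and closed rings are accepted.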

The received data are saved to a file after all the functions described above have been completed. The work of the Data Collection Module is completed at this stage, and the received data are sent to the next program module.

3.2. Distribution of People by Objects

The Distribution Module requires the data prepared by the previous module. These data are automatically loaded from files that store information about people, buildings, tags and areas. The Pandas library is used to filter the list of buildings, so that only residential buildings and buildings that are potential workplaces for the population remain for further processing.

A dedicated function then creates a dataframe from the processed data: a table containing records about individuals.

Next, a separately developed function creates a dictionary that maps city districts to their polygons on the map. These data are needed to determine the location of each person and to build statistics on the distribution of people by district and employment.
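Determining a person’s district amounts to a point-in-polygon test against the district dictionary. In practice a library call such as Shapely’s `Polygon.contains` would be used; the ray-casting sketch below is a self-contained substitute, and `find_district` with its plain `{name: [(lon, lat), ...]}` dictionary format is hypothetical.

```python
from typing import Optional


def point_in_ring(x, y, ring):
    """Ray-casting test: is (x, y) inside the polygon ring [(x, y), ...]?"""
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        # Count crossings of a horizontal ray extending to the right of the point.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def find_district(lon, lat, districts) -> Optional[str]:
    """Return the name of the first district whose polygon contains the point."""
    for name, ring in districts.items():
        if point_in_ring(lon, lat, ring):
            return name
    return None
```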

Then, a function is called that calculates the number of square meters per person in residential buildings and in buildings where workplaces are located. This value is required by the algorithm that distributes people among buildings. The function takes the day type (weekday or weekend) as input and accounts for it in the calculation.

The resulting data are passed to the next function, which prepares the data for the start of the distribution. The previously identified archetypes of people are also supplied as input to this function. The distribution itself is launched within the same function, which cyclically processes the list of people grouped by archetype.

These functions update the data about people, buildings and tags. At the final stage of this module, a separate function generates statistics on the distribution of people by district. Based on the distribution data, it calculates the number of people at rest, workers and students present in each city district. The final data on the distribution of people among infrastructure facilities and the statistics on the distribution of people by district are saved in separate files.
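The per-district statistics amount to counting people by district and activity. A minimal sketch, with hypothetical activity labels (`"rest"`, `"work"`, `"study"`) and a hypothetical function name:

```python
from collections import Counter, defaultdict


def district_stats(people):
    """Count people per district and activity.

    people: iterable of (district, activity) pairs.
    Returns {district: {activity: count}}.
    """
    stats = defaultdict(Counter)
    for district, activity in people:
        stats[district][activity] += 1
    return {district: dict(counts) for district, counts in stats.items()}
```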

3.3. Modeling by Periods

The Simulation Module reads the data from the people distribution file and then starts the simulation preparation function. In a loop, this function starts a process for each modeled period, passing the distribution data and the simulated hour as parameters. Data on each person’s archetype are used for correct modeling: the archetype, together with the current time of day and day of the week, determines where on the map a particular actor is located. The processes are created with the multiprocessing module built into Python (version 3.11.4) and run in parallel to speed up data processing.
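Parallel modeling by hour can be sketched with `multiprocessing.Pool`, one task per simulated hour. The placement rule in `locate_people` (working-age people at their workplace between 9:00 and 18:00 on weekdays, everyone else at home), the tuple layout of a person and the function names are assumptions for illustration only.

```python
from multiprocessing import Pool

WORK_HOURS = range(9, 18)  # assumed weekday working hours, end exclusive


def locate_people(task):
    """Assign each person to a building id for one simulated hour.

    task: (hour, day_type, people), where people is a list of
    (person_id, archetype, home_id, work_id) tuples.
    """
    hour, day_type, people = task
    locations = {}
    for person_id, archetype, home_id, work_id in people:
        at_work = (
            day_type == "weekday"
            and archetype.startswith("working_age")
            and hour in WORK_HOURS
            and work_id is not None
        )
        locations[person_id] = work_id if at_work else home_id
    return hour, locations


def simulate_day(people, day_type, processes=4):
    """Run locate_people for all 24 hours of one day type in parallel."""
    tasks = [(hour, day_type, people) for hour in range(24)]
    with Pool(processes) as pool:
        return dict(pool.map(locate_people, tasks))
```

Because some platforms start worker processes by spawning fresh interpreters that re-import the main module, `simulate_day` should be called from code guarded by `if __name__ == "__main__":`. Each task here receives a full copy of the people list; for large populations, loading shared data once per worker through the pool’s `initializer` argument would reduce copying.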

As part of the modeling process, a map instance is created that includes distribution layers for residential buildings, work buildings and educational buildings. Cluster groups are created to hold the generated person markers; the clusters also make the map display more efficient.

The latitude, longitude and distribution tag data are then looped through to create markers and add them to the cluster group based on the current person’s distribution tag. The populated clusters are added to the layers after the entire loop has been processed, and the layers are then added to the map. Thus, maps of the distribution of people for the specified hours in the distribution scenarios are compiled and saved.

3.4. Model Data Visualization

Model data are visualized in the program interface, which includes a widget that displays a map with the distribution of people and a widget for analyzing the distribution for a selected hour (Figure 2). The day and simulated hour selectors are used to update the map and display it for the selected values. The available day types are weekday and weekend, and the simulated hour ranges from 0 to 24 in one-hour increments.

The interface provides a data update control that invokes the Data Collection, Distribution, and Simulation Modules. This allows for correcting the models when the initial data on population statistics and urban infrastructure facilities change. The map update takes the value from the day selection and simulated hour selection windows and initiates the map update trigger. The program looks for a file with the specified day type and hour to simulate and then updates the displayed map on the user’s screen. Data updating in the distribution analysis statistics widget occurs in addition to updating the map. This is carried out by reading data from the area statistics file with the specified day type and hour.
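Locating the precomputed map and statistics files for the selected day type and hour can be sketched as a path-building helper. The naming scheme is hypothetical, and the helper assumes hours are indexed 0 to 23, so that two day types times 24 hours give the 48 hourly snapshots mentioned in the abstract.

```python
from pathlib import Path

DAY_TYPES = ("weekday", "weekend")


def model_files(base_dir, day_type, hour):
    """Return the (map, statistics) file paths for one day type and hour.

    Hypothetical naming scheme: map_<day>_<HH>.html and stats_<day>_<HH>.json.
    """
    if day_type not in DAY_TYPES:
        raise ValueError(f"unknown day type: {day_type}")
    if not 0 <= hour <= 23:
        raise ValueError(f"hour out of range: {hour}")
    stem = f"{day_type}_{hour:02d}"
    base = Path(base_dir)
    return base / f"map_{stem}.html", base / f"stats_{stem}.json"
```

Zero-padding the hour keeps the files sorted chronologically in a directory listing.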

Figure 2 shows the program interface. The map is displayed on the left side of the window, where the distribution of citizens is shown by circles. Each circle is a cluster labeled with the number of people it contains. The right side of the window shows the distribution by city district and by occupation, so it is possible to see how many people of a given occupation are present in a given district of the city.

The city of Volgograd, Russia, was chosen to test the proposed approach. The population of the city is 1.025 million people [45]. Figure 3 shows the boundaries of an urban area and an example of the distribution of people. The data are clustered automatically when the map scale changes, and the colors of the cluster markers indicate the quantitative gradation of the number of people. Figure 3b shows an example of the distribution of people by occupation.

The variant in which the colors of the cluster markers indicate different types of people’s activity was also considered during testing of the visual representation of the model data. Figure 4 shows the distribution of people by activity types with corresponding color differentiations in the markers. However, there is no population gradient with this solution. The value is indicated only by the number inside the marker.

Clustering becomes clearly visible when the map is scaled. Nearby points (individual people) are collected into a cluster as the map zooms out (farther away). Clusters are merged with neighboring clusters upon further scaling, and so on.
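The zoom-dependent merging can be illustrated with simple grid binning, where a coarser grid stands for a lower zoom level. This is only an illustration of the idea: Folium’s `MarkerCluster` plugin relies on the Leaflet.markercluster JavaScript library, which uses its own clustering algorithm. The function `grid_cluster` is hypothetical.

```python
import math
from collections import defaultdict


def grid_cluster(points, cell_deg):
    """Group (lat, lon) points into square grid cells of side cell_deg degrees.

    Returns (centroid_lat, centroid_lon, count) tuples, largest clusters first.
    In exact arithmetic, doubling cell_deg merges each 2 x 2 block of cells,
    which mimics zooming out.
    """
    cells = defaultdict(list)
    for lat, lon in points:
        key = (math.floor(lat / cell_deg), math.floor(lon / cell_deg))
        cells[key].append((lat, lon))
    clusters = []
    for members in cells.values():
        n = len(members)
        lat_c = sum(lat for lat, _ in members) / n
        lon_c = sum(lon for _, lon in members) / n
        clusters.append((lat_c, lon_c, n))
    return sorted(clusters, key=lambda c: c[2], reverse=True)
```

With a fine grid, three nearby people form one cluster and a distant person stays separate; with a coarse grid, all four merge into a single cluster.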

Layers of simulated data make it possible to display different levels of detail (Figure 5). Large infrastructure facilities include estimates for the entire working/living area of the buildings. The urban service objects marked in the OpenStreetMap data make it possible to estimate the number of individual companies within larger business spaces.

4. Discussion

An analysis of the subject area and of ready-made solutions shows that existing solutions are largely based on data from mobile operators or from aggregators of the GPS coordinates of users’ mobile devices. These data are limited to the audience of a particular digital/network product and are often not freely accessible. In addition, such source data lack meta-information, such as the division of people into social groups, even though this information is necessary for making decisions on rational distribution between different zones of the city. Existing analytical solutions also focus on population concentration without reference to specific urban infrastructure objects [54,55,56,57,58]. Such information only makes it possible to estimate the number of people in a selected area of the city, not the load on specific infrastructure facilities and residential buildings.

Data from cellular operators are reliable, but they significantly limit the breadth of possible applications. All the information provided by operators consists of the instantaneous location of a person at a given point in the city. It does not characterize who the person is or what they do, although this is a significant factor in optimizing the urban environment. It is impossible to create a comfortable and safe environment for everyone without understanding the specific needs of specific people.

In addition, information on the location of any person must be anonymized to protect that person's privacy and to comply with personal data legislation, such as the GDPR in the EU. As a result, approaches based on cellular operator data cannot provide the information that is especially helpful for achieving sustainable urban development.

The approach proposed in this study is qualitatively different from existing ones: it trades some reliability for a wider range of data applications. Instead of data on the actual location of a specific person on the map, it uses a broader array of data describing who a particular person may be. These data serve as the basis for an assumption about where in the city that person may be located. As a result, the algorithm does not use any personal information and cannot be used to violate anyone's privacy.

The approach proposed in this study simulates the distribution of city residents with reference to specific infrastructure facilities (residential, public and working). The simulation spans 48 h, covering a working day and a weekend day, with an hourly breakdown of the simulated values. This design was chosen on the basis of an analysis of existing solutions. It corresponds to modern approaches to analyzing the daily distribution of the population, which take work/day-off cycles into account and differentiate by hour of the day [59,60]. In [61], G. Boeing proposes a similar approach based on census information that combines open data with employer (payroll) data for the United States. A similar approach applied in China, with a focus on land use at different times of day, is presented in [62].
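The 48-hour, hour-by-hour structure described above can be sketched as follows. Assume that each archetype has a weekday schedule and a weekend schedule mapping hours to a facility type, and that the number of people per archetype is known. The output is then a series of 48 hourly head counts per facility type. The archetype names, schedules and counts below are hypothetical placeholders, not the archetypes identified in the study.

```python
from collections import Counter

# Hypothetical schedules: (start_hour, end_hour, facility_type), end hour exclusive
SCHEDULES = {
    "office_worker": {
        "weekday": [(0, 8, "home"), (8, 18, "workplace"), (18, 20, "service"), (20, 24, "home")],
        "weekend": [(0, 11, "home"), (11, 15, "service"), (15, 24, "home")],
    },
    "student": {
        "weekday": [(0, 8, "home"), (8, 15, "school"), (15, 24, "home")],
        "weekend": [(0, 24, "home")],
    },
}


def facility_at(archetype, day_type, hour):
    """Return the facility type an archetype occupies at a given hour."""
    for start, end, facility in SCHEDULES[archetype][day_type]:
        if start <= hour < end:
            return facility
    raise ValueError(f"hour {hour} not covered for {archetype}/{day_type}")


def simulate_48h(population):
    """Hourly head counts per facility type: one weekday followed by one weekend day."""
    timeline = []
    for day_type in ("weekday", "weekend"):
        for hour in range(24):
            counts = Counter()
            for archetype, n in population.items():
                counts[facility_at(archetype, day_type, hour)] += n
            timeline.append(((day_type, hour), counts))
    return timeline


result = simulate_48h({"office_worker": 1000, "student": 400})
```

Each of the 48 slots holds a `Counter`, so facility types that are empty in a given hour simply do not appear in it. Splitting the facility types further into individual buildings would follow the same pattern.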

The developed solution therefore complements existing approaches and provides a new tool for studying urban problems. In its current form, the modeling environment relies on open data sources available on the Internet. State statistics allow it to operate with official data. These data are also detailed enough to produce adequate results when calculating scenarios for the distribution of people within the proposed approach. Data from other sources can be used as well. Mobile operator data or navigation geodata, once converted into the format used by the modeling environment, can serve as the basis for refined scenarios for the distribution of people.

It is especially important to note once again the additional benefit of the proposed approach for solving problems of sustainable urban development. The study of development problems in densely populated urban areas is highly relevant for thousands of places in almost every country of the world. Identifying bottlenecks in the infrastructure provision of megacities associated with daily fluctuations in the number of people is an urgent task. However, the cost of data and the limited availability of tools for processing them restrict such studies. In this context, the results of this study can be regarded as a contribution to a toolkit for uncovering problem hotspots in the sustainable functioning of cities.

An important distinctive feature of this work is the functionality implemented in the project's software shell. It makes it possible not only to obtain data on the instantaneous location of people at various points in the city but also to build models of the distribution of people at specific hours on weekdays and weekends. This allows one to analyze how congestion at particular points rises and falls depending on the time of day and the day of the week. Such an analysis supports more accurate assumptions about how the city's infrastructure needs to be improved. Because the solution is not tied to one specific point in time but allows various intervals to be considered, it is especially valuable for analyzing the population of urban areas and supporting decisions on the placement and improvement of urban services.
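Once hourly counts are available for a facility, the rise and fall of congestion described above reduces to simple operations on a time series. The sketch below is a minimal illustration that assumes a 24-value list of simulated head counts for one facility and one day type. It reports the peak hour and the hours with the sharpest increase and the sharpest decrease relative to the previous hour. The function name and the values in the profile are invented for illustration.

```python
def congestion_profile(hourly_counts):
    """Summarize a 24-hour series of head counts for one facility.

    Returns the peak hour and the hours at which the largest rise and the
    largest drop (relative to the previous hour) occur; ties go to the earliest hour.
    """
    if len(hourly_counts) != 24:
        raise ValueError("expected one value per hour of the day")
    # deltas[i] is the change from hour i to hour i + 1
    deltas = [hourly_counts[h] - hourly_counts[h - 1] for h in range(1, 24)]
    peak_hour = max(range(24), key=lambda h: hourly_counts[h])
    steepest_rise = 1 + max(range(23), key=lambda i: deltas[i])
    steepest_drop = 1 + min(range(23), key=lambda i: deltas[i])
    return {"peak_hour": peak_hour, "steepest_rise": steepest_rise, "steepest_drop": steepest_drop}


# Illustrative weekday profile of a shopping centre (invented values, hours 0-23)
weekday = [0] * 9 + [150, 280, 420, 500, 520, 480, 450, 470, 610, 700, 640, 380, 100, 0, 0]
print(congestion_profile(weekday))
```

Running the same summary on the weekday and weekend series of one facility gives a compact way to compare the two day types.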

5. Conclusions

As a result of the study, an approach to modeling the distribution of people in the city was developed. It takes into account the daily activity scenarios of various social groups based on the identified archetypes. A method is proposed for linking people to specific infrastructure facilities according to their professional employment and their gender and age characteristics. This approach formed the basis for the design and software implementation of the modeling environment.
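The linking step can be pictured as a rule table that maps a person's gender, age and employment to the type of facility assigned to them on a working day. The sketch below is purely illustrative. The age bands, the gender-specific retirement ages (`RETIREMENT_AGE`) and the sector-to-facility mapping (`SECTOR_FACILITY`) are hypothetical parameters, not the rules or archetypes defined in the study.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical parameters; real values would come from statistics and legislation
RETIREMENT_AGE = {"female": 60, "male": 65}
SECTOR_FACILITY = {"retail": "shop", "education": "school", "health": "clinic", "office": "office"}


@dataclass
class Person:
    age: int
    gender: str            # "female" or "male"
    sector: Optional[str]  # employment sector, None if not employed


def daytime_facility(p: Person) -> str:
    """Assign the facility type a person is linked to on a working day."""
    if p.age < 3:
        return "home"
    if p.age < 7:
        return "kindergarten"
    if p.age < 18:
        return "school"
    # Employed people below the (hypothetical) retirement age go to a sector workplace
    if p.sector is not None and p.age < RETIREMENT_AGE[p.gender]:
        return SECTOR_FACILITY.get(p.sector, "workplace")
    return "home"
```

Keeping the thresholds and the sector mapping in tables rather than in code makes it easy to swap in values derived from official statistics for a particular city.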

The analysis of existing studies confirmed the theoretical significance of this study and the attention researchers pay to the topic. The practical significance of the developed tool stems from the fact that spatial modeling is one of the most important elements in the search for optimal measures for developing the urban environment. The developed software can support decision-making on the placement of new facilities in the city and on transforming the functions of existing infrastructure facilities, with an emphasis on creating a comfortable urban environment for all social categories of city residents.

The principles of the approach outlined here will allow researchers to create their own software solutions for studying urban mobility. The main advantage of the approach, its exclusive reliance on open data, opens up opportunities for supporting the sustainable development of the urban environment.

Further research may expand the existing set of archetypes and generate data sets from other open sources to refine their characteristics. More accurate point scenarios with a finer breakdown of the resulting archetypes are also needed. More detailed behavioral scenarios will make it possible to reflect differences in the use of urban infrastructure even within groups of citizens that are currently generalized.

Author Contributions

Conceptualization, D.P.; methodology, D.P. and A.A. (Alexander Anokhin); software, A.G.; validation, A.A. (Alexander Anokhin), A.A. (Anton Anikin) and A.F.; formal analysis, A.G. and A.A. (Alexander Anokhin); investigation, A.G. and A.A. (Alexander Anokhin); resources, A.F. and A.A. (Anton Anikin); data curation, D.P., A.A. (Alexander Anokhin) and A.A. (Anton Anikin); writing—original draft preparation, A.A. (Alexander Anokhin); writing—review and editing, D.P.; visualization, A.G.; supervision, D.P.; project administration, D.P.; funding acquisition, D.P. and A.F. All authors have read and agreed to the published version of the manuscript.

Funding

The study was supported by grant No. 22-11-20024 from the Russian Science Foundation (RSF) and the Administration of the Volgograd Oblast (Russia), https://rscf.ru/en/project/22-11-20024/ (accessed on 25 November 2024). The results of Section 3.1 were obtained within RSF grant project No. 20-71-10087.

Data Availability Statement

The data presented in this study were derived from the following resources available in the public domain: Geofabrik at https://download.geofabrik.de/russia/south-fed-district.html; Rosstat at https://rosstat.gov.ru/scripts/db_inet2/passport/pass.aspx?base=munst18&r=18701000.

Acknowledgments

The authors express gratitude to colleagues from the Department of Digital Technologies for Urban Studies, Architecture and Civil Engineering, VSTU and the Urban Computing Laboratory (UCLab) involved in the development of the UrbanBasis.com scientific projects. Special thanks are expressed to Maxim Anishchenko for his contribution to the development of software components and the formation of a draft model description.

Conflicts of Interest

The authors declare no conflicts of interest.

References

1. Sadovnikova, N.; Savina, O.; Parygin, D.; Churakov, A.; Shuklin, A. Application of Scenario Forecasting Methods and Fuzzy Multi-Criteria Modeling in Substantiation of Urban Area Development Strategies. Information 2023, 14, 241.
2. Zelenskiy, I.; Parygin, D.; Savina, O.; Finogeev, A.; Gurtyakov, A. Effective Implementation of Integrated Area Development Based on Consumer Attractiveness Assessment. Sustainability 2022, 14, 16239.
3. Pozoukidou, G.; Angelidou, M. Urban Planning in the 15-Minute City: Revisited under Sustainable and Smart City Developments until 2030. Smart Cities 2022, 5, 1356–1375.
4. Gong, Y.; Lin, Y.; Duan, Z. Exploring the spatiotemporal structure of dynamic urban space using metro smart card records. Comput. Environ. Urban Syst. 2017, 64, 169–183.
5. Wang, W.; Pei, T.; Chen, J.; Song, C.; Wang, X.; Shu, H.; Ma, T.; Du, Y. Population Distributions of Age Groups and Their Influencing Factors Based on Mobile Phone Location Data: A Case Study of Beijing, China. Sustainability 2019, 11, 7033.
6. Vazquez-Prokopec, G.M.; Bisanzio, D.; Stoddard, S.T.; Paz-Soldan, V.; Morrison, A.C.; Elder, J.P.; Ramirez-Paredes, J.; Halsey, E.S.; Kochel, T.J.; Scott, T.W. Using GPS technology to quantify human mobility, dynamic contacts and infectious disease dynamics in a resource-poor urban environment. PLoS ONE 2013, 8, e58802.
7. He, M.; Xu, Y.; Li, N. Population Spatialization in Beijing City Based on Machine Learning and Multisource Remote Sensing Data. Remote Sens. 2020, 12, 1910.
8. Ma, Y.; Xu, W.; Zhao, X.; Li, Y. Modeling the Hourly Distribution of Population at a High Spatiotemporal Resolution Using Subway Smart Card Data: A Case Study in the Central Area of Beijing. ISPRS Int. J. Geo-Inf. 2017, 6, 128.
9. Parygin, D.; Sadovnikova, N.; Gamidullaeva, L.; Finogeev, A.; Rashevskiy, N. Tools and Technologies for Sustainable Territorial Development in the Context of a Quadruple Innovation Helix. Sustainability 2022, 14, 9086.
10. Kaluarachchi, Y. Implementing Data-Driven Smart City Applications for Future Cities. Smart Cities 2022, 5, 455–474.
11. Shuklin, A.; Parygin, D.; Gurtyakov, A.; Savina, O.; Rashevskiy, N. Synthetic News as a Tool for Evaluating Urban Area Development Policies. In Proceedings of the 2022 International Conference on Engineering and Emerging Technologies (ICEET), Kuala Lumpur, Malaysia, 27–28 October 2022.
12. Vinh Ha, T.; Asada, T.; Arimura, M. Changes in mobility amid the COVID-19 pandemic in Sapporo City, Japan: An investigation through the relationship between spatiotemporal population density and urban facilities. Transp. Res. Interdiscip. Perspect. 2022, 17, 100744.
13. Zornoza-Gallego, C. Means of Transport and Population Distribution in Metropolitan Areas: An Evolutionary Analysis of the Valencia Metropolitan Area. Land 2022, 11, 657.
14. Shi, Y.; Yang, J.; Shen, P. Revealing the Correlation between Population Density and the Spatial Distribution of Urban Public Service Facilities with Mobile Phone Data. ISPRS Int. J. Geo-Inf. 2020, 9, 38.
15. Zhao, G.; Yang, M. Urban Population Distribution Mapping with Multisource Geospatial Data Based on Zonal Strategy. ISPRS Int. J. Geo-Inf. 2020, 9, 654.
16. Bhaduri, B.; Bright, E.; Coleman, P.; Urban, M.L. LandScan USA: A high-resolution geospatial and temporal modeling approach for population distribution and dynamics. GeoJournal 2007, 69, 103–117.
17. Grippa, T.; Linard, C.; Lennert, M.; Georganos, S.; Mboga, N.; Vanhuysse, S.; Gadiaga, A.; Wolff, E. Improving Urban Population Distribution Models with Very-High Resolution Satellite Information. Data 2019, 4, 13.
18. Omarov, B.; Altayeva, A.; Turganbayeva, A.; Abdulkarimova, G.; Gusmanova, F.; Sarbasova, A.; Omarov, B.; Dauletbek, Y.; Altayeva, A.; Omarov, N. Agent Based Modeling of Smart Grids in Smart Cities. Commun. Comput. Inf. Sci. 2019, 947, 3–13.
19. Cheliotis, K. An agent-based model of public space use. Comput. Environ. Urban Syst. 2020, 81, 101476.
20. Mehdizadeh, M.; Nordfjaern, T.; Klöckner, C.A. A systematic review of the agent-based modelling/simulation paradigm in mobility transition. Technol. Forecast. Soc. Change 2022, 184, 122011.
21. Parygin, D.; Usov, A.; Burov, S.; Sadovnikova, N.; Ostroukhov, P.; Pyannikova, A. Multi-agent Approach to Modeling the Dynamics of Urban Processes (on the Example of Urban Movements). Commun. Comput. Inf. Sci. 2020, 1135, 243–257.
22. Metzner, N. A comparison of agent-based and discrete event simulation for assessing airport terminal resilience. Transp. Res. Procedia 2019, 43, 209–218.
23. Kim, B.; Lim, C.-G.; Lee, S.-H.; Jung, Y.-J. A Study on the Population Distribution Prediction in Large City using Agent-Based Simulation. In Proceedings of the 2021 23rd International Conference on Advanced Communication Technology (ICACT), PyeongChang, Republic of Korea, 7–10 February 2021.
24. Happach, R.M.; Tilebein, M. Simulation as Research Method: Modeling Social Interactions in Management Science. Philos. Stud. Ser. 2015, 122, 239–259.
25. Termos, A.; Picascia, S.; Yorke-Smith, N. Agent-Based Simulation of West Asian Urban Dynamics: Impact of Refugees. J. Artif. Soc. Soc. Simul. 2021, 24, 2.
26. Makarov, V.L.; Bakhtizin, R.A.; Beklaryan, G.L.; Akopov, A.S. Simulation modeling of the Smart City System: The concept, methods, and cases. Natl. Interests Priorities Secur. 2019, 15, 200–224.
27. Ustugova, S.; Parygin, D.; Sadovnikova, N.; Yadav, V.; Prikhodkova, I. Geoanalytical System for Support of Urban Processes Management Tasks. Commun. Comput. Inf. Sci. 2017, 754, 430–440.
28. Crooks, A.; Heppenstall, A.; Malleson, N.; Manley, E. Agent-Based Modeling and the City: A Gallery of Applications. In Urban Informatics; Shi, W., Goodchild, M.F., Batty, M., Kwan, M., Zhang, A., Eds.; Springer: Singapore, 2021; pp. 885–910.
29. Antelmi, A.; Cordasco, G.; D’Ambrosio, G.; De Vinco, D.; Spagnuolo, C. Experimenting with Agent-Based Model Simulation Tools. Appl. Sci. 2023, 13, 13.
30. Heppenstall, A.; Crooks, A.; Malleson, N.; Manley, E.; Ge, J.; Batty, M. Future Developments in Geographical Agent-Based Models: Challenges and Opportunities. Geogr. Anal. 2021, 53, 76–91.
31. Hassan, M.I.; Elhassan, S.M.M. Modelling of Urban Growth and Planning: A Critical Review. J. Build. Constr. Plan. Res. 2020, 8, 245–262.
32. Emelianov, S.G.; Bakaeva, N.V.; Zuleta, D.P. Criteria for reconstruction of urban environment on principles of harmonizing nature, society and human being. IOP Conf. Ser. Mater. Sci. Eng. 2019, 687, 066002.
33. Vorobyev, A.N. Application of geodata for operational study of population placement and movement. IOP Conf. Ser. Earth Environ. Sci. 2021, 629, 012003.
34. Anokhin, A.; Burov, S.; Parygin, D.; Rent, V.; Sadovnikova, N.; Finogeev, A. Development of Scenarios for Modeling the Behavior of People in an Urban Environment. Stud. Syst. Decis. Control 2021, 333, 103–114.
35. Crooks, A.; Croitoru, A.; Lu, X.; Wise, S.; Irvine, J.M.; Stefanidis, A. Walk This Way: Improving Pedestrian Agent-Based Models through Scene Activity Analysis. ISPRS Int. J. Geo-Inf. 2015, 4, 1627–1656.
36. Carneiro Freire, S.M. Modeling of Population Distribution in Space and Time to Support Disaster Risk Management. Ph.D. Thesis, University of Twente, Twente, The Netherlands, 2020.
37. Kubíček, P.; Konečný, M.; Stachoň, Z.; Shen, J.; Herman, L.; Řezník, T.; Staněk, K.; Štampach, R.; Leitgeb, Š. Population distribution modelling at fine spatio-temporal scale based on mobile phone data. Int. J. Digit. Earth 2019, 12, 1319–1340.
38. Zou, Y.; Zhang, S.; Min, Y. Exploring Urban Population Forecasting and Spatial Distribution Modeling with Artificial Intelligence Technology. Comput. Model. Eng. Sci. 2019, 119, 295–310.
39. Dong, J.; Li, G.; Du, L. Research on Population Distribution Model Based on Real Estate Big Data. In Proceedings of the 8th International Conference on Management and Computer Science (ICMCS 2018), Shenyang, China, 10–12 August 2018.
40. Zakharov, K.; Aghajanyan, A.; Kovantsev, A.; Boukhanovsky, A. Forecasting Population Migration in Small Settlements Using Generative Models under Conditions of Data Scarcity. Smart Cities 2024, 7, 2495–2513.
41. Pizzitutti, F.; Pan, W.; Feingold, B.; Zaitchik, B.; Álvarez, C.A.; Mena, C.F. Out of the net: An agent-based model to study human movements influence on local-scale malaria transmission. PLoS ONE 2018, 13, e0193493.
42. Epifani, I.; Ghiringhelli, C.; Nicolini, R. Modeling Local Spatial Dependence in Shaping Population Distribution. Available online: https://papers.ssrn.com/sol3/papers.cfm?abstract_id=3136302 (accessed on 30 April 2023).
43. Kavak, H.; Padilla, J.J.; Lynch, C.J.; Diallo, S.Y. Big data, agents, and machine learning: Towards a data-driven agent-based modeling approach. In Proceedings of the Annual Simulation Symposium, Baltimore, MD, USA, 15–18 April 2018.
44. DeAngelis, D.L.; Diaz, S.G. Decision-Making in Agent-Based Modeling: A Current Review and Future Prospectus. Front. Ecol. Evol. 2019, 6, 237.
45. Federal State Statistics Service. Available online: https://rosstat.gov.ru/ (accessed on 25 May 2023). (In Russian)
46. OpenStreetMap. Available online: https://www.openstreetmap.org/ (accessed on 30 August 2023).
47. Python. Available online: https://www.python.org/ (accessed on 17 April 2023).
48. Pandas. Available online: https://pandas.pydata.org/ (accessed on 10 May 2023).
49. OSMnx. Available online: https://osmnx.readthedocs.io/en/stable/ (accessed on 2 May 2023).
50. Folium. Available online: https://python-visualization.github.io/folium/ (accessed on 26 April 2023).
51. Pyproj. Available online: https://pypi.org/project/pyproj/ (accessed on 4 June 2023).
52. Shapely. Available online: https://pypi.org/project/shapely/ (accessed on 7 June 2023).
53. PyQt5 Reference Guide. Available online: https://doc.bccnsoft.com/docs/PyQt5/ (accessed on 28 August 2023).
54. Ratti, C.; Pulselli, R.; Williams, S.; Frenchman, D. Mobile Landscapes: Using Location Data from Cell Phones for Urban Analysis. Environ. Plan. B Plan. Des. 2006, 33, 727–748.
55. Fujishima, S.; Fujiwara, N.; Akiyama, Y.; Shibasaki, R.; Sakuramachi, R. The size distribution of ‘cities’ delineated with a network theory-based method and mobile phone GPS data. Int. J. Econ. Theory 2020, 16, 38–50.
56. Garrido-Valenzuela, F.; Cats, O.; van Cranenburgh, S. Where are the people? Counting people in millions of street-level images to explore associations between people’s urban density and urban characteristics. Comput. Environ. Urban Syst. 2023, 102, 101971.
57. Todd, J.; Yano, K.; Hanaoka, K. A dashboard application to explore population distribution derived from GPS location data during the COVID-19 pandemic in Kyoto, Japan. Abstr. ICA 2023, 6, 1–2.
58. Zhao, S.; Luo, X.; Ma, X.; Bai, B.; Zhao, Y.; Zou, W.; Yang, Z.; Au, M.H.; Qiu, X. Exploiting Proximity-Based Mobile Apps for Large-Scale Location Privacy Probing. Secur. Commun. Netw. 2018, 2018, 3182402.
59. McPherson, T.N.; Brown, M. Estimating daytime and nighttime population distributions in US cities for emergency response activities. In Proceedings of the Symposium on Planning, Nowcasting, and Forecasting in the Urban Zone, Seattle, WA, USA, 10–12 January 2004.
60. Zhang, C.; Li, M.; Ma, D.; Guo, R. How Different Are Population Movements between Weekdays and Weekends: A Complex-Network-Based Analysis on 36 Major Chinese Cities. Land 2021, 10, 1160.
61. Boeing, G. Estimating local daytime population density from census and payroll data. Reg. Stud. Reg. Sci. 2018, 5, 179–182.
62. Qi, W.; Liu, S.; Gao, X.; Zhao, M. Modeling the spatial distribution of urban population during the daytime and at night based on land use: A case study in Beijing, China. J. Geogr. Sci. 2015, 25, 756–768.

Figure 1. Architecture of the simulation software environment.

Figure 2. Model data visualization interface.

Figure 3. Displaying numeric values within distribution boundaries: (a) test city boundaries; (b) cluster markers for the distribution of people by work activities in the city (marker colors show the visual difference in the quantitative gradation).

Figure 4. Distribution representation with color gradation of population activity types (green markers are people who are resting; yellow markers are those who are studying; orange markers represent those who are working).

Figure 5. The number of employees in one of the commercial and business districts of the city (the colors of the cluster markers show the visual difference in the quantitative gradation): (a) detailing at the level of individual buildings; (b) detailing at the level of departments and stores.

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

MDPI and ACS Style

Parygin, D.; Anokhin, A.; Anikin, A.; Finogeev, A.; Gurtyakov, A. Models of Geospatially Referenced People Distribution as a Basis for Studying the Daily Cycles of Urban Infrastructure Use by Residents. Smart Cities 2025, 8, 1. https://doi.org/10.3390/smartcities8010001