Density Estimation of Mobile Users’ Address Queries before and during the COVID-19 Pandemic

The aim of this study was to monitor social mobility using mobile users’ address searches before and during the outbreak of COVID-19. Mobile Google users’ address inquiries in the historical peninsula of Istanbul between 15 February 2020 and 27 July 2020 were gathered. The spatial distribution of the searches was examined, and a heat map was produced based on kernel density estimation (KDE). The density of the inquiries started to decline in March, the month in which the first cases were reported in Turkey. An increase in address queries was observed in June and July.


Introduction
A dramatic increase in the population of cities has led to problems in public health, disaster planning, employment, real estate, transportation, national defense, urban planning, natural resources management, pollution, and other areas. Information and communication technologies (ICT) make well-organized, scalable, efficient, and sustainable planning possible. Geographic information systems (GIS) are a subset of information systems that allow the collection, storage, querying, and display of data with associated spatial information. These technologies support the storage, monitoring, and analysis of real-time data collected from different sources, such as networks, services, cameras, and sensors. Because such data are streamed continuously, their volume can become very large. The growth of unstructured data obtained from different sources led to the introduction of the term big data, one of the most important trending topics in ICT. Rapid developments in technology and science have changed the way both academics and professionals work, and many business processes have changed as a result [1]. One of the most important developments in the business world in recent years has been the beginning of the age of big data. Because of its potential, big data can transform the relationships between individuals and institutions, customers and companies, patients and the healthcare system, students and universities, and voters and the government [2]. The means of processing big data and extracting useful information can be applied in many disciplines. Today, an important source of big data is geographic data that include spatial information. Geographic data provide the global location of an object or person, or indicate what is located at a particular point [3].
Spatial data have become increasingly important and valuable. The term "spatial" has acquired an additional sense: space that is monitored or sensed through electronic devices, as a new form of cognition. Ref. [4] states that most big data are georeferenced and that 80% are spatial. Furthermore, as indicated by [5], the amount of personal geolocation data is increasing by 20% each year. Approaches for handling such large volumes of geospatial big data, collected through remote sensing and positioning technologies (satellite images, GPS, Bluetooth, wireless networks, etc.), have become crucial in many GIS applications [4]. Recent developments in web-based applications support a new GIS architecture that draws geographical information from sources at different points and meets user requirements in real time. With the emergence of the new generation of GIS and of geospatial image transmission standards, many software packages are currently in use. In addition, owing to the spread of broadband communication and of personal devices that support high-performance graphics, professional data display functions have become popular through services such as Google Maps, NASA WorldWind, OpenStreetMap, Bing Maps, and GeaBios. These services combine different technologies, such as transmitting spatial data from server to client to display high-quality geospatial images at high frame rates, web mapping, and semantics [6]. Consumer electronics have made the use of spatial information natural and often involuntary; for example, a typical mobile phone uses its integrated sensors and embedded information services to record geographic coordinates, mobility patterns, search queries, and social interaction data without further user input. This widespread use of spatial information is made possible by software dedicated to GIS and related domains.
Another crucial dimension of spatial information is its use in large-scale applications at regional, national, and international levels [7].
In this study, we aimed to plot geospatial data using R and static maps from Google Maps. We demonstrate how geographic information mapping methods can visualize spatial data (e.g., points or polygons) for data analysis and to assist decision-makers. By pursuing a multidisciplinary approach that combines GIS, statistics, and public health, we aimed to enhance the role of mobility data in controlling the COVID-19 pandemic. We suggest an alternative approach to monitoring social mobility, which may offer new perspectives to decision-makers adopting precautions or planning during or after the pandemic.
The following sections of the paper provide an up-to-date literature review, a detailed description of GIS and its usage, an explanation of the kernel density estimation (KDE) method used for density mapping, information about the dataset, and a depiction of the results obtained from the KDE. In the final section, we discuss our results with a view to future interdisciplinary research and conclude with recommendations.

Literature Review
Due to the outbreak of COVID-19 (the disease caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2)) at the beginning of 2020, both consumption behavior [8] and mobility [9] changed unexpectedly. GIS-related studies published since the outbreak can be classified into five groups [10]: (1) spatiotemporal analyses measuring the spatial spread of the pandemic; (2) health and social geography; (3) environmental variables; (4) data mining; and (5) web-based mapping.
In line with the scope of this research, studies related to data mining and web-based mobility pattern analysis are discussed here (see Table 1). Ref. [11] used satellite images of different regions (such as North Korea, Russia, Germany, the USA, and China) to identify changes in vehicle volumes at regular points, such as ship traffic at ports and aircraft at gates, in order to examine population and economic changes during the pandemic. The aim of their study was consistent with that of the current study, apart from differences in the data source and methodology: they used descriptive statistics for comparison, whereas our paper takes an analytical approach to mapping intensity. Ref. [12] used anonymized geolocation data from the mobile devices of a sample of U.S. residents to measure daily population mobility. The resulting geospatial statistics were used to predict levels of disease spread and to evaluate the effectiveness of containment policies. Their results indicated a large reduction in mobility following the pandemic, both in the U.S. and globally. Ref. [13] used data similar to those of [12] to analyze mobility patterns of the U.S. population and introduced a new platform that enables users to plot mobility change patterns in the U.S. Ref. [14] undertook a global-scale analysis describing changes in mobility for various geographies and types of locations, including recreation, supermarkets, shops and pharmacies, parks, transit stations, workplaces, and private residences. The authors used anonymized Google mobility data covering the first 43 days of the pandemic (their data are available at https://osf.io/rzd8k/, accessed on 13 November 2020). Their data gathering method is similar to ours; however, for density mapping, Ref. [14] used the frequency of geolocations. Ref. [15] investigated changes in the park visitation behavior of U.S. citizens after COVID-19. Ref. [16] aimed to diagnose the reduction in daily mobility and the changes in modal distribution and journey purposes in Spain. The results of [14] address the aims of [15,16].
A number of published papers can be grouped as reviews. One of these is [17], whose authors highlighted the importance of GIS technology in aggregating big data from multiple sources to quickly visualize epidemic information, spatially track confirmed cases, predict regional transmission, spatially segregate epidemic risk and prevention levels, balance and manage the supply of and demand for material resources, and provide socio-emotional guidance to reduce panic. Their results provided strong spatial information support for decision-making, formulating measures, and evaluating the effectiveness of COVID-19 prevention and control. Ref. [18] reviewed cartographic web viewers and Chinese mobile applications for COVID-19 that were launched in January and February 2020 to slow contagion. Ref. [19] reviewed the role and usage of OpenStreetMap in the COVID-19 response. Ref. [20] reviewed the availability of spatial distribution data on confirmed cases.

Materials and Methods
This section describes GIS and its use, introduces our density mapping method, and provides information about the gathered dataset.

Geographical Data and Geographic Information Systems (GIS)
Geographic information mostly concerns the places where we live. That is, when we talk about objects, places, and similar elements on the earth's surface, data about their location constitute geographical information. Many questions can be answered with geographical information, such as the location of an individual, the location of the nearest hospital or on-duty pharmacy, the neighbors of a country, and the places from which a lunar eclipse can be observed. In this context, large amounts of data are analyzed, and modern data processing technologies are used for geographic information [21].
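To make one of these questions concrete, the following minimal Python sketch answers "where is the nearest hospital?" from coordinates alone, using the haversine great-circle distance on a spherical Earth (mean radius 6,371 km). The facility names and coordinates are hypothetical, and a real system would typically use a spatial index and road-network distances rather than straight-line distance.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; spherical approximation


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    # min() guards against a slightly exceeding 1 through rounding error.
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))


def nearest(user, facilities):
    """Return (name, distance_km) of the facility closest to the user's (lat, lon)."""
    lat, lon = user
    best = min(facilities, key=lambda name: haversine_km(lat, lon, *facilities[name]))
    return best, haversine_km(lat, lon, *facilities[best])


if __name__ == "__main__":
    # Hypothetical facilities inside the study area's bounding box.
    hospitals = {"Hospital A": (41.010, 28.950), "Hospital B": (41.025, 28.980)}
    name, dist = nearest((41.012, 28.955), hospitals)
    print(f"{name} is {dist:.2f} km away")
```

A linear scan is enough for a handful of facilities; for thousands of candidates, a k-d tree or geohash index would avoid computing every distance.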
GIS can be defined as a computer-based data system in which it is possible to collect, analyze, and store all kinds of data located anywhere in the world. GIS helps to address many social, economic, and environmental problems globally, owing to the large quantity of geographical information it provides and the results of data collection and analysis [22]. GIS is a concept with a wide range of uses. The system, which can also be used as a database, provides data to institutions and organizations in many areas, ranging from infrastructure management to topological mapping [23]. According to [24], GIS uses big data to support users in location-based decision-making and to address complex social, environmental, and economic problems. It is an integrated methodology that combines software, hardware, methods, geographic data, and personnel to jointly perform storage, collection, spatial analysis, processing, management, presentation, and querying. GIS focuses on geographically specific information and represents the locations and details of geographical features. Although many applications display geographical features of the earth, GIS consists of software that focuses specifically on this subject. Thus, GIS applications can meet a wide range of information needs related to geographical features. From this point of view, GIS can be defined as a platform in which map and location requirements are easily met [25]. The following points should be taken into consideration to prevent potential errors in the use of GIS [26]:

Components of Geographic Information Systems (GIS)
In the literature, many sources focus on five different components of GIS: software, hardware, method, data, and human. The role of these components is discussed below.

Software
A number of programs are used to perform operations such as storing, analyzing, and mapping the data collected in a GIS. GIS software (ArcGIS, ARC/INFO, ArcView, MapInfo, IDRISI) is used together with operating system software, application software such as Java, and network software. In GIS, software systems are divided into rule-based systems and information-based systems. Systems that support decision-making for domain problems are called spatial decision support systems (SDSS); they are frequently used in planning and in resource utilization and management [26]. The key tool for using this software is the Internet, which allows users to access the information stored in a GIS and is required to transfer spatial data. Furthermore, the Google Maps Application Programming Interface (API) offers significant assistance in creating, transferring, and using GIS data on the Internet, and it plays a critical role in preparing and using GIS at relatively low cost. Google Maps is a free application that provides most map-based services. Users can navigate to different locations using the arrows on the map and find locations by entering address information. They can zoom in and out with the "+" and "−" keys [27].

Hardware
Hardware performs operations such as storage, data input, and output generation. In addition, hardware is useful for developing software [28].

Method
The method is a set of procedures that determines the tasks in the GIS process and how they are accomplished. Its scope includes data collection, data storage, database design and management, data transformation, and analysis methodology. Failing to define methods in a GIS may lead to unsuccessful results. First, when the system is created, planning is required for many issues, such as the tasks of the personnel and the analysis methods. In addition, the institution that will use the GIS should establish procedures for matters such as information sharing and usage [28].

Data
Data models are classified into two groups: raster and vector. A raster model represents geographic data as a series of cells located at specific coordinates. Each cell represents its location independently of the others: every cell corresponds to a specific location, and every location falls in a cell. A set of cells and their associated values is called a layer. In raster models, spatial analyses can be performed faster and more easily, but raster models require large amounts of data. In vector models, coordinates are used to represent specific positions. Vector data models require less storage space, and less effort is required to create software for them [26]. The stages of using data, and the important issues in each stage, include the following [29]. Database design: flat data tables, relational databases, characterizing the database, and ensuring error-free data entry. GIS: planning of locations, linking the obtained data to the locations, data coding, marking regions and locations, and backgrounds. Implementation strategies: planning map formats, material selection, software selection, strategy creation, data flow planning, process development, and obtaining the necessary equipment.
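The raster/vector distinction can be illustrated by converting a vector layer of points into a raster layer of counts. The minimal Python sketch below uses the study area's bounding box from the Dataset and Method section; the 0.002° cell size and the function names are hypothetical choices for illustration. Each point falls in exactly one cell, and the resulting grid of cell values is a layer in the sense described above.

```python
# Bounding box of the study area (historical peninsula of Istanbul), in degrees.
LAT_MIN, LAT_MAX = 41.002, 41.028
LON_MIN, LON_MAX = 28.946, 28.986
CELL_DEG = 0.002  # hypothetical raster resolution in degrees

N_ROWS = round((LAT_MAX - LAT_MIN) / CELL_DEG)  # 13 rows
N_COLS = round((LON_MAX - LON_MIN) / CELL_DEG)  # 20 columns


def to_cell(lat, lon):
    """Map a vector point to its raster (row, col); row 0 is the southern edge.

    Returns None for points outside the bounding box.
    """
    if not (LAT_MIN <= lat <= LAT_MAX and LON_MIN <= lon <= LON_MAX):
        return None
    # Points on the northern/eastern edge are clamped into the last row/column.
    row = min(int((lat - LAT_MIN) / CELL_DEG), N_ROWS - 1)
    col = min(int((lon - LON_MIN) / CELL_DEG), N_COLS - 1)
    return row, col


def rasterize(points):
    """Build a count layer: layer[row][col] = number of (lat, lon) points in that cell."""
    layer = [[0] * N_COLS for _ in range(N_ROWS)]
    for lat, lon in points:
        cell = to_cell(lat, lon)
        if cell is not None:
            row, col = cell
            layer[row][col] += 1
    return layer
```

The trade-off described in the text shows up directly: the raster stores all 13 × 20 cells even when most are empty, whereas the vector layer stores only the points themselves.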

Human
The human component is the basic element of GIS. The scope of this element includes trained GIS personnel, system administrators, database managers, system analysts, software specialists, and programmers [30].

Dataset and Method
In this study, the dataset consisted of 15,730 anonymized address queries made by mobile users on the iOS and Android platforms between 15 February 2020 and 27 July 2020. The dataset was retrieved from Google's location-based services (the Google Maps Geolocation API). The coordinates were limited to the historical peninsula of Istanbul (latitudes 41.002 to 41.028 and longitudes 28.946 to 28.986). The dataset includes the users' location information and address queries. Missing queries and records without location information were excluded from the dataset. In some cases, only location names, zip codes, or addresses were provided rather than geographical coordinates. To obtain geographical coordinates for these addresses and zip codes, we used the geocode function from the ggmap package in R, and we used the Google Maps Geocoding platform for cross-checking. For plotting, we downloaded map images and computed contours from the geospatial data with two-dimensional KDE, using the stat_density2d and geom_density2d functions (from the ggplot2 package, on which ggmap builds) in R. In Turkey, the first confirmed COVID-19 case was officially announced on 10 March 2020. To determine the impact of the pandemic and display important patterns in the spatiotemporal data, we treated "month" as a discrete temporal variable and plotted the queries by month.
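The filtering and monthly grouping steps can be sketched as follows. This is a minimal Python illustration, not the authors' R code: the record fields (date, lat, lon) are a hypothetical schema, and records that carry only an address are assumed to have been geocoded beforehand (in the study, with ggmap's geocode); any record still lacking coordinates is dropped.

```python
from collections import defaultdict
from datetime import date

# Study window and bounding box as stated in the text.
START, END = date(2020, 2, 15), date(2020, 7, 27)
LAT_RANGE = (41.002, 41.028)
LON_RANGE = (28.946, 28.986)


def clean_and_group(records):
    """Keep in-window, in-area records with coordinates; group (lat, lon) points by month.

    Each record is a dict with hypothetical keys "date" (ISO string), "lat", and "lon".
    """
    by_month = defaultdict(list)
    for rec in records:
        lat, lon = rec.get("lat"), rec.get("lon")
        if lat is None or lon is None:
            continue  # no location information (or geocoding failed)
        day = date.fromisoformat(rec["date"])
        if not START <= day <= END:
            continue  # outside the study period
        if LAT_RANGE[0] <= lat <= LAT_RANGE[1] and LON_RANGE[0] <= lon <= LON_RANGE[1]:
            by_month[day.strftime("%Y-%m")].append((lat, lon))
    return dict(by_month)


if __name__ == "__main__":
    sample = [
        {"date": "2020-02-20", "lat": 41.010, "lon": 28.970},
        {"date": "2020-03-12", "lat": 41.020, "lon": 28.950},
        {"date": "2020-03-13", "lat": None, "lon": None},
    ]
    for month, pts in sorted(clean_and_group(sample).items()):
        print(month, len(pts))
```

The length of each month's list gives the monthly query volume, and each list of points can be passed on to a density estimator separately, which mirrors treating "month" as a discrete temporal variable.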
Data smoothing techniques play a fundamental role in data analytics because they help capture important patterns in data structures [31]. The probability density function (referred to simply as the density) serves as the smoother in such techniques. Density estimation is the problem of constructing an estimate of an unknown probability density function from a set of observed data points. Estimation approaches in the literature can be divided into two groups: parametric approaches have a fixed number of parameters, whereas non-parametric approaches have a number of parameters that grows with the size of the training data. Owing to its mathematical advantages, kernel density estimation (KDE), a non-parametric approach, is well suited to data arriving in real time from the field [32]. KDE can analyze density without prior knowledge of the underlying distribution and produces a continuous point density surface in two dimensions. KDE is a spatial analysis technique that brings a new form to interdisciplinary studies by combining geography and statistics. However, KDE is sensitive to data quality and noise. KDE measures the intensity of incidents by placing a circle of a given radius around each location and weighting the incidents with a kernel function that changes with distance: values are highest at the center of the circle and decrease toward its edge, so the regions of the circle reflect the probabilities of the incidents. Kernel functions differ (e.g., normal, uniform, quartic, triangular, and negative exponential), but the general formula of KDE in a 2D plane is given in Equation (1) [33]:

$$\hat{f}(\mathbf{x}) = \frac{1}{n h^{2}} \sum_{i=1}^{n} K\!\left(\frac{\mathbf{x} - \mathbf{x}_i}{h}\right) \qquad (1)$$

where n is the number of observations, K is usually taken to be a symmetric probability density function that determines the shape of the bumps, and h > 0 is the smoothing bandwidth, which determines the width of the bumps.
Here, x is the vector of coordinates of the location at which the density is estimated, and x_i is the vector of coordinates of point observation i. The choice of bandwidth h directly affects the smoothness of the density pattern, but no standard rule exists for specifying an appropriate bandwidth [33].
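Equation (1) translates almost directly into code. The minimal Python sketch below evaluates it with two interchangeable kernels: the bivariate normal and the quartic (biweight) kernel, which is zero beyond distance h and so matches the fixed-radius circle described above. For the bandwidth it uses a Scott-style normal-reference rule, sigma · n^(-1/6) with sigma the pooled standard deviation of the two coordinates; this is one common rule of thumb among several and is not necessarily the rule used in the study (ggplot2's stat_density_2d, for instance, defaults to MASS::bandwidth.nrd applied per axis). Coordinates are treated as planar; with longitude and latitude in degrees, projecting to metres first would be more accurate.

```python
import math
import statistics


def gaussian_kernel(u2):
    """Bivariate standard normal kernel, given the squared scaled distance u2 = |u|^2."""
    return math.exp(-0.5 * u2) / (2 * math.pi)


def quartic_kernel(u2):
    """Quartic (biweight) kernel: (3/pi)(1 - |u|^2)^2 inside the unit circle, 0 outside."""
    return 3 / math.pi * (1 - u2) ** 2 if u2 < 1 else 0.0


def kde2d(x, points, h, kernel=gaussian_kernel):
    """Equation (1): f_hat(x) = 1 / (n h^2) * sum_i K((x - x_i) / h)."""
    n = len(points)
    total = 0.0
    for px, py in points:
        u2 = ((x[0] - px) ** 2 + (x[1] - py) ** 2) / (h * h)  # |(x - x_i) / h|^2
        total += kernel(u2)
    return total / (n * h * h)


def scott_bandwidth(points):
    """Rule-of-thumb scalar bandwidth: pooled sample std. dev. times n^(-1/6) (d = 2)."""
    xs, ys = zip(*points)
    sigma = math.sqrt((statistics.variance(xs) + statistics.variance(ys)) / 2)
    return sigma * len(points) ** (-1 / 6)


if __name__ == "__main__":
    pts = [(0.0, 0.0), (0.2, 0.1), (0.1, 0.3), (1.0, 1.0)]
    h = scott_bandwidth(pts)
    print(h, kde2d((0.1, 0.1), pts, h), kde2d((0.1, 0.1), pts, h, quartic_kernel))
```

Because both kernels integrate to one over the plane, the estimate also integrates to one. The kernel mainly changes the shape of the bumps while h changes their width, which is why the bandwidth choice usually matters more than the kernel choice.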

Results
In the study, we computed smoothed values of the mobile users' address queries in R. Figure 1 shows the raw data points, with coordinates limited to the historical peninsula of Istanbul. The darkest points are centered on the locations with the highest density of address queries; for example, in February, the darkest points contain approximately 70% of all queries made during the month. For all months, the most concentrated regions include 80% of the cases. To investigate how the observations are related to each other and to estimate differences in density before and during the COVID-19 pandemic, we show the point coordinates of the address queries by month. In February 2020, before the first confirmed case of COVID-19 in Turkey was announced, mobile users' address queries were concentrated around tourist spots and historical places. The density of the queries decreased over the following months, with the lowest density occurring in May.
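Statements such as "the darkest points contain approximately 70% of all queries" can be made precise with a highest-density-region threshold: the density level whose superlevel set contains a chosen share of the observations. The minimal Python sketch below assumes a Gaussian kernel with a given bandwidth h and uses hypothetical helper names. It evaluates the KDE at every observed point and takes the level that the chosen share of points reaches or exceeds. It illustrates the idea only and does not reproduce the figures reported here.

```python
import math


def kde_at(x, points, h):
    """Gaussian KDE (Equation (1) with a bivariate normal kernel) evaluated at x."""
    s = sum(math.exp(-((x[0] - px) ** 2 + (x[1] - py) ** 2) / (2 * h * h)) for px, py in points)
    return s / (2 * math.pi * len(points) * h * h)


def hdr_level(points, h, share):
    """Density level whose superlevel set holds (at least) `share` of the points."""
    dens = sorted((kde_at(p, points, h) for p in points), reverse=True)
    # round() guards against floating-point error in share * n before ceil().
    k = max(1, math.ceil(round(share * len(dens), 9)))  # number of points to keep
    return dens[k - 1]


def share_at_or_above(points, h, level):
    """Fraction of points whose estimated density is at least `level`."""
    return sum(kde_at(p, points, h) >= level for p in points) / len(points)


if __name__ == "__main__":
    cluster = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1), (0.05, 0.05)]
    pts = cluster + [(5.0, 5.0)]
    lvl = hdr_level(pts, 0.2, 0.8)
    print(lvl, share_at_or_above(pts, 0.2, lvl))
```

Drawing the contour at this level on a density map outlines the region that holds the chosen share of queries; ties in density can make the realized share slightly larger than requested.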
In a geospatial contour map, the lines delimit bins, and higher values are located around the contour center: the number of observations is counted in each bin, and the values decrease toward the periphery. We plotted the density of the queries by adding polygonal density contours (layers) to Google Maps, adjusting the map boundaries by increasing the zoom level to 14. The density was calculated from the longitude and latitude coordinates of the queries, and the contours were filled according to the KDE results, with each contour shaded along a color gradient. Figure 2 depicts the queries using the density function instead of points; this plot provides useful information about concentrated regions in certain areas. The plot indicates that there was one main hotspot of activity in February, whereas density was lowest in March and April. The density became more apparent again from May to July.
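The contouring step amounts to evaluating the density on a regular grid over the map extent and cutting the values into bands. The minimal Python sketch below shows that idea under stated assumptions: a Gaussian kernel, a fixed bandwidth h in degrees, coordinates treated as planar, and a hypothetical choice of five equally spaced bands between zero and the maximum density. In the study, ggplot2's density functions performed the equivalent computation internally; drawing the contour polygons over a Google Maps basemap is not shown here.

```python
import math

# Study-area bounding box (degrees), as given in the Dataset and Method section.
LAT_MIN, LAT_MAX = 41.002, 41.028
LON_MIN, LON_MAX = 28.946, 28.986


def density_grid(points, h, n=25):
    """Evaluate a Gaussian KDE on an n x n grid over the bounding box.

    points are (lat, lon) pairs; returns (lats, lons, z) with z[j][i] the
    density at (lats[j], lons[i]).
    """
    lats = [LAT_MIN + j * (LAT_MAX - LAT_MIN) / (n - 1) for j in range(n)]
    lons = [LON_MIN + i * (LON_MAX - LON_MIN) / (n - 1) for i in range(n)]
    norm = 2 * math.pi * len(points) * h * h
    z = [
        [
            sum(math.exp(-((lat - plat) ** 2 + (lon - plon) ** 2) / (2 * h * h))
                for plat, plon in points) / norm
            for lon in lons
        ]
        for lat in lats
    ]
    return lats, lons, z


def density_bands(z, bins=5):
    """Assign each grid cell a band index 0..bins-1 using equal-width breaks on [0, max]."""
    zmax = max(max(row) for row in z)
    if zmax == 0:
        return [[0] * len(row) for row in z]  # no measurable density on the grid
    return [[min(int(v / zmax * bins), bins - 1) for v in row] for row in z]


if __name__ == "__main__":
    lats, lons, z = density_grid([(41.015, 28.966), (41.016, 28.967)], h=0.004)
    for row in reversed(density_bands(z)):  # print north at the top
        print("".join(str(b) for b in row))
```

Each band corresponds to one filled contour polygon; a finer grid (larger n) gives smoother contour lines at a higher computational cost.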

Discussion
Research on the geographical areas in which humans live has been conducted throughout history. The processes of recognizing, examining, and shaping the geographical area in which humans live are as old as human history. Therefore, it can be argued that the concept of geographical information is not new and has a history that dates back to ancient times. Geographical information has been researched and used in many situations that are necessary for survival throughout human history, such as building monumental tombs, determining suitable places for hunting animals, building shelters, finding new pastures, and determining water resources. Now, the scope of geographical information has become much more complex due to the increasing population and expanding residential areas. As a result, geographical information now involves comprehensive data flow.
In this study, we performed hot spot analysis with KDE-aided GIS. The analysis identified the regions of the addresses most frequently searched for by mobile users in the historical peninsula of Istanbul before and during the COVID-19 pandemic. The change in address search patterns was easily recognizable after the first confirmed case of COVID-19 was announced in March. This change was presumably due to the nationwide shutdown in response to the COVID-19 outbreak. Following the downward trend, the density of mobile users' queries in the study area started to increase after the COVID-19 restrictions were eased in the second half of May 2020. We conclude that the outbreak reduced the mobility of mobile users and, therefore, their address searches. Density analysis can be undertaken using various field definitions in GIS. For determining hot spots within a geographical area by considering different criteria, such as time, location, size, quantity, and shape, KDE offers valuable computational advantages over other approaches. For this reason, we used KDE to determine density values and to identify the most concentrated regions. Statistical analysis with geospatial packages in R provides easy-to-use, practical tools for spatial data analysis. In future research, the Google Maps API could be incorporated into R for additional capabilities.
The use of mobile phone users' geolocation data in geospatial analysis is widespread, and anonymized Google mobility data are also a common data source. Although our data collection methodology is similar to that of previous studies, our paper focuses on address searches in Istanbul's historical peninsula. In addition, owing to the scale of the data, detailed density estimates and novel results were obtained.
The results obtained with the proposed approach show that mobile users' address queries can be used to track mobility during the pandemic. In this context, it is widely recognized that greater social mobility can increase the spread of the virus. We believe that our proposed approach, which can be used to map social mobility easily, can support the decision systems of the relevant public health authorities.
Funding: This research received no external funding.

Data Availability Statement:
The data presented in this study are available on request from the corresponding author. The data are not publicly available due to privacy and confidentiality concerns.

Conflicts of Interest:
The authors declare no conflict of interest.