Urban Big Data Analytics: A Novel Approach for Tracking Urbanization Trends in Sri Lanka

Abstract: The dynamic nature of urbanization calls for datasets that are updated more frequently and are more reliable than those produced by conventional methods, if urbanization is to be understood for planning purposes. The methods currently used to study urbanization depend heavily on shifts in residential populations and building densities, for which the data are static and do not necessarily capture the dynamic nature of urbanization. This is particularly the case in low- and middle-income nations, where, according to the United Nations, most of this century's urbanization is taking place. This study aims to develop a more effective approach to comprehending urbanization patterns through big data fusion, using multiple data sources that provide more reliable information on urban activities. The study uses five open data sources (National Polar-orbiting Partnership/Visible Infrared Imaging Radiometer Suite night-time light images, point of interest data, mobile network coverage data, road network coverage data, and normalized difference vegetation index data) together with the Python programming language. The findings challenge the currently dominant understanding of Sri Lanka's urbanization patterns, which is based on census data and statistics and either underestimates or overestimates them. The proposed approach offers a more reliable and accurate alternative for authorities and planners in determining urbanization patterns and urban footprints.


Introduction
Urbanization is an ongoing process which is conventionally understood as the influx of populations towards urban centers from surrounding rural areas. At present, the magnitude of urbanization around the globe is generally discussed in terms of the number of megacities, that is, cities with populations of more than 10 million, which increased from 23 to 31 over the last decade and is projected to reach 41 by 2030 [1]. In addition, a large number of new urban centers are expected to emerge to accommodate this influx of populations. However, this 'urban nuclei'-centered, socio-demographic, and statistics-based view is unlikely to provide a comprehensive picture of the state and trends of urbanization processes.
Population statistics mostly come from surveys conducted once every 2-10 years, depending on each country's census policy [2,3]. The conventional methods used by most countries consider residential population distribution as the only or the main criterion for identifying urban areas and the patterns of urbanization [4,5]. Yet, these methods have several limitations. First, they are time-consuming, costly, and labor-intensive, which limits how frequently the data can be updated [6]. Second, the effects of urbanization and urban activities are not limited to residential populations, as commuter populations also have sizeable impacts on urban functions. Third, the delineation of urban areas is challenging in the absence of a universally accepted definition and methodology. Above all, there is a parallel process of 'hidden urbanization' that is not necessarily reflected in conventional statistics. Increased mobility, the use of advanced technology, and the changing aspirations of people inevitably change their lifestyles, irrespective of their place of residence [7]. Thus, urbanization needs to be understood more as a 'way of life' [8] than as a matter of the movement of populations.
This complex pattern of urbanization will have a profound impact on the spatial characteristics of cities in the future [9][10][11]. Under such circumstances, managing urban areas and providing their inhabitants with a pleasant urban life is becoming increasingly challenging for local governments. For national and provincial governments, the fair allocation of necessary resources is equally challenging, and a proper understanding of the state of urban areas is crucial for it, particularly when it comes to sustainable urban development [12,13].
As discussed, it is difficult to identify urban areas and urbanization patterns by relying purely on survey and census data, which only provide information for a specific point in time [14][15][16]. Still, identifying urban areas with some level of precision is important for any country or state, as it supports multiple functions such as governance, planning and implementation, and resource allocation and, more importantly, the formulation of national- and regional-level spatial policies to guide future urban development scenarios [17][18][19]. Furthermore, identifying the magnitude and nature of the complex and hidden urbanization in low- and middle-income countries is crucial for formulating appropriate development strategies to assure sustainable urban futures for these countries [20,21]. The use of alternative data sources to identify urban areas and their growth patterns is worth considering in this big data era, where data related to living things, devices, machines, and almost all objects can be traced and recorded conveniently and on a daily basis [22]. Nonetheless, utilizing open-source big data sources to study urban phenomena is not yet popular [23].
Against this backdrop, this paper presents a novel and more effective approach to determining the patterns of urbanization by fusing big data with multiple other datasets obtained from reliable sources of information on urbanization-related activities. Such an approach is particularly critical for low- and middle-income countries, as towns in these nations mostly reflect complex socio-economic and land use dynamics. For instance, informal urban spaces, small boutiques, and third places at the junctions of neighborhoods provide most of the services demanded by local neighborhoods, unlike the shopping malls and urban squares of high-income countries [24]. However, local community demands can be well met in such townships in low- and middle-income countries [17,23,25,26]. This study uses Sri Lanka as a testbed, representing a low- and middle-income country context, to demonstrate the proposed approach. Instead of population statistics as a single dataset, the study uses multiple openly available data sources that reflect urban lifestyles and urban facilities in urban areas. The developed approach is straightforward, user-friendly, and cost-effective, and will help to provide authorities and planners with a better understanding of urbanization patterns and processes.

Definitions of Urban, Urbanization and Urban Dynamics
The terms "Urban", "Urbanization", and "Urban dynamics" mostly go hand in hand. "Urban" refers to the characteristics and attributes associated with city environments, including a high population density and extensive infrastructure [1,27]. "Urbanization", on the other hand, describes the process by which rural areas transform into urban areas, marked by population migration and city expansion. "Urban dynamics" describes the patterns and processes of development and change that occur in urban settings. This encompasses the interplay among social, economic, and environmental elements that affect the development, configuration, and operation of urban areas. Urban dynamics is a broad term that covers a variety of phenomena, including changes in the population, the economy, land use, infrastructure, and social trends. Comprehending urban dynamics is imperative for proficient urban planning and administration, as it facilitates the anticipation and resolution of issues pertaining to urbanization, sustainability, and the standard of living in urban areas [28].
'Urban' may not be a universally defined term and, therefore, the delineation of 'urban areas' also lacks a commonly agreed criterion [1,3,27]. In the absence of a widely agreed definition or methodology, each country has adopted its own criterion to identify urban areas and the urban shares of populations. Such criteria can be categorized into four main groups, based on the core units they consider for this purpose. Of these, demographic characteristics and land uses are the proxies most often adopted to delineate urban areas [29]. However, urbanization is a more complex phenomenon than head counts and land uses can capture [30], and any account of it must consider the combination of economic, demographic, social, and technological processes that lead to an increasing share of populations embracing urban characteristics [8,31]. Louis Wirth (1938) suggested that "urbanism is a way of life", denoting the process by which an increasing number of individuals embrace urban lifestyles and linking urbanization to the transition from rural or less populated regions to urban environments, which affects social attitudes, lifestyles, and the overall organization of society [8].
Sri Lanka has adopted an approach to delineating urban areas that is based on local government institutions: municipal councils (MCs) and urban councils (UCs) are classified as urban areas, while the remaining units, the Pradeshiya Sabhas, are considered 'rural' areas [31]. India considers Municipal Corporation Areas as urban areas. Still, to be considered urban, Indian Municipal Corporation Areas must have a population over 5000, a population density over 400 people per km², and 75% of the male workforce engaged in non-agricultural activities [5].
Bangladesh follows a more comprehensive definition than Sri Lanka and India. It defines an area as urban if it is densely populated, has developed around a central place with the necessary infrastructure, and has most of its population engaged in the non-agricultural sector [29,32]. In the US, areas that are densely developed and have commercial, residential, and other non-residential land uses are considered urban [33].
Two basic drawbacks were observed in the aforementioned approaches. The first is that most of these definitions rely on statistical figures such as the population density and type of economic activities. The second is the use of administrative boundaries, which may not necessarily trace the complex behavioral characteristics and dynamic nature of urban areas and their functions. Hence, research on urbanization has to be based on more sophisticated methods that can capture the lifestyles and spatial practices that propel urban dynamics, rather than static spatial elements and periodic statistics.

Contemporary Tools and Techniques for Understanding Urbanization Patterns
The literature revealed some attempts to adopt different approaches to identifying the dynamics of urban areas and the patterns of their evolution. Remote sensing and geographic information system (GIS) technologies have been widely used in this regard, possibly because their data are reliable and convenient to acquire [34][35][36]. Remote sensing data with a medium to high resolution, e.g., Landsat, SPOT, AVHRR, QuickBird, IKONOS, WorldView, and Sentinel, are commonly used for urban monitoring and detection [37,38]. In these studies, urban areas are identified at the pixel level.
Nevertheless, the areas covered by such medium- and high-resolution imagery are very small relative to the regional and national scales [39]. In addition, state organizations seem to be reluctant to accept satellite-image-based methods to identify urban areas, possibly because of their cost and the high-end technological competencies required for their use [40]. Therefore, it is challenging to use satellite imagery frequently to study urban areas and urbanization patterns at the regional and national scales. Yet, the value of satellite imagery in understanding urbanization patterns should not be underestimated.
Based on satellite bands, contemporary research studies have used different urban indices to identify urban areas. Among them, the Normalized Difference Built-Up Index (NDBI) and Normalized Difference Vegetation Index (NDVI) are the most used. Still, these indices simply consider the spatial distribution of building footprints, which may lead to many misidentifications, such as the classification of abandoned built-up areas as urban. The Human Settlement Composite Index (HSCI) normalizes Nighttime Light (NTL) satellite images with the MODIS Normalized Difference Vegetation Index (NDVI) [41,42]. The Vegetation-adjusted NTL urban index (VANUI) and enhanced vegetation-adjusted NTL index (EVANTLI) are based on the conceptualization that vegetation and urban built-up areas are inversely related [42,43]. The use of NTL images can be identified as an emerging approach to identifying functional urban areas, as it differentiates functioning buildings from non-functioning buildings based on Digital Numbers (DN).
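For reference, the indices named above are commonly written as shown below. These are the standard forms from the remote sensing literature, given here as an assumption about what the cited studies use rather than as a restatement of them; $\rho$ denotes surface reflectance in the named band, and $\mathrm{NTL}_{norm}$ is a night-time light value rescaled to the range 0 to 1.

```latex
\mathrm{NDVI} = \frac{\rho_{NIR} - \rho_{Red}}{\rho_{NIR} + \rho_{Red}},
\qquad
\mathrm{NDBI} = \frac{\rho_{SWIR} - \rho_{NIR}}{\rho_{SWIR} + \rho_{NIR}},
\qquad
\mathrm{VANUI} = (1 - \mathrm{NDVI}) \times \mathrm{NTL}_{norm}
```

The VANUI form makes the inverse relationship explicit: a brightly lit pixel with dense vegetation is down-weighted, while a lit pixel with little vegetation keeps most of its light value.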
Apart from remote sensing and GIS-based approaches, urban big data are an emerging but understudied resource for urban research. Urban big data refers to the large and complex datasets generated by urban environments, which can be used for numerous applications [44]. Urban big data can be used to gain insights into a wide range of subject areas such as urban area extraction [45], urban traffic prediction [46], urban air pollution [47], and urban disaster management [48,49]. However, they have not yet been widely used to study the urbanization processes of countries.

Urban Big Data and Urbanization
Urban big data are massive amounts of dynamic and static data generated by subjects and objects, including various urban facilities, organizations, and individuals, which have been collected and collated by city governments, public institutions, enterprises, and voluntary individuals using new-generation information technology [44,50,51]. Urban big data have been used for a wide range of purposes, such as studying the urban spatial structure and function division [16,52,53], landscape analysis and design [54,55], and the connection intensity between cities [22,56,57].
Refs. [58][59][60] used urban big data to extract urban areas and then to understand the temporal variations in the urban spatial structure. Still, most studies limit their investigations to one or two data sources. As urbanization is a dynamic and complex process, its complexities may not be captured with a single data type or a limited number of data types. An analysis of urban big data of multiple types from a variety of sources is the only way to properly interpret urbanization based on the lifestyles of people, as elucidated by Wirth [8] in his seminal work on 'urbanism as a way of life'.

Case Study
Sri Lanka, as a middle-income country, forms an ideal testbed for the proposed novel approach to determining urbanization patterns with urban big data fusion, for the following reasons. First, according to the widely accepted classification by international organizations (e.g., the UN and World Bank), Sri Lanka is a middle-income country with a notably high urbanization rate [61]. However, a puzzling trend in its urbanization pattern has been observed during the past decades. For instance, according to official statistics, Sri Lanka's urban population dropped from 21.5% in 1981 to 14.6% by 2001, and marginally increased to 18.2% by 2012. This sudden drop in the urbanization level, which was quite inconsistent with the ground reality [62], was the result of a change in the definition of 'urban areas' used by the Department of Census and Statistics (DCS).
Second, the 'urban' areas considered for the national census before 1987 included municipal councils (MCs) and urban councils (UCs) in addition to town councils (TCs). The TCs were abolished in 1987 and merged with village councils to form a set of new units called 'Pradeshiya Sabha', which are considered 'rural' by the DCS [20]. The distribution of urban areas as per the DCS definition (MCs and UCs) is given in Figure 1. Third, this classification of urban areas by local government type is misleading and has had several repercussions for policy decisions, the prioritization of urban investments, the planning of future development activities, and so on. In this context, more revealing and descriptive methods are essential to better comprehend the complexities involved in capturing the urbanization patterns and urban footprint in Sri Lanka.

Methodology
Given the complexity of this task, this study adopted a big data fusion methodology to identify urbanization patterns in the testbed of a low- or middle-income country context, i.e., Sri Lanka. The Python programming language was used to handle the large nighttime light (NTL) images and to segment urban areas. A raw dataset of about 65 GB for the years 2013, 2017, and 2021 was collected, analyzed, and fused through the Geometric Mean fusion technique. The Adaptive Threshold Segmentation algorithm was used as the urban area extraction method. These methodological steps, summarized in Figure 2, are elaborated as follows.

Data Acquisition
Table 1 shows the datasets used to examine Sri Lanka's urbanization. Accordingly, NTL images, point of interest (POI) data, mobile network coverage (MNC) data, NDVI data, and road network coverage (RNC) data were utilized for the data fusion. In line with Louis Wirth's "Urbanism as a way of life", this study intended to capture information that reflected the different aspects of urban lifestyles.
The included datasets on point of interest (POI), Normalized Difference Vegetation Index (NDVI), road network coverage (RNC), mobile network coverage (MNC), and Nighttime Light (NTL) data are instrumental for understanding the heterogeneity of urban spaces, reflecting the diversity of services, amenities, and environmental features that characterize urban life. Meanwhile, MNC data, by detailing the distribution and intensity of mobile network usage, offer insights into the anonymity and impersonality prevalent in urban settings, highlighting the non-physical interactions that define modern urban life. Additionally, NTL data provide a unique perspective on the vibrancy and activity levels of urban areas, indirectly showcasing the diversity and intensity of urban existence. These datasets, carefully chosen for this study, enable a multifaceted analysis of urbanization that captures critical aspects of urban life and provides a comprehensive understanding of its complexity. In Table 1, census years are marked with an asterisk (*).
These datasets are particularly effective for capturing the unique urbanization patterns in Sri Lanka. NTL images from NPP/VIIRS reveal active urban areas and infrastructure growth, which is crucial for identifying urban expansion in densely populated regions [27]. POI data from OpenStreetMap highlight the distribution of essential services and amenities, reflecting their functional diversity [63]. MNC data from local providers are used to identify the urbanization pattern with a higher accuracy [64]. RNC data map the extensive road infrastructure, which is critical for analyzing connectivity and accessibility, both indicators of the growth of urban areas, especially in low- and middle-income countries [65]. Finally, NDVI data from Landsat 8 provide insights into land use changes and the environmental impacts of urbanization [66]. These datasets are important because they are cost-effective and readily accessible for urbanization mapping. Further, these characteristics make the aforementioned data sources well suited to a comprehensive, accurate, and economically feasible study of urbanization in any country, ensuring that the findings are relevant and actionable for local urban planning and development.

Nighttime Light Satellite Images
Compared to the satellite images of Landsat, SPOT, AVHRR, and QuickBird, NTL images help to identify functioning buildings [67]. NTL images are available from the Visible Infrared Imaging Radiometer Suite (VIIRS), distributed by the National Aeronautics and Space Administration (NASA), and from the Defense Meteorological Satellite Program Operational Linescan System (DMSP/OLS). NTL images can also be acquired from the National Oceanic and Atmospheric Administration (NOAA). The NTL image dataset used in this study was derived from the National Polar-orbiting Partnership (NPP)/VIIRS satellite. To understand urbanization trends, the NPP/VIIRS annual nighttime light (VNL) 2.1 average composite dataset was downloaded for the years 2013, 2017, and 2021. The composite NTL images had a spatial resolution of 15 arc seconds (~500 m). The spatial resolution of NPP/VIIRS is two times higher than that of DMSP/OLS. The NPP/VIIRS NTL images also had a higher radiometric resolution, 14-bit compared to the 6-bit resolution of DMSP/OLS. As a result, NPP/VIIRS reduces the saturation issues that exist with the 6-bit DMSP/OLS NTL images [68]. Figure 3 shows the NTL images of Sri Lanka for 2013, 2017, and 2021.

Mobile Network Coverage Data
Mobile network coverage maps were developed from the openly available data published by different mobile service providers in Sri Lanka. Figure 5 shows the mobile network coverage maps for Sri Lanka for 2013, 2017, and 2021. The maps in Figure 6 show the NDVI images for Sri Lanka for the years 2013, 2017, and 2021.

Normalized Difference Vegetation Index Data
The NDBI given in Figure 8 was calculated using Landsat 8 Collection 2 Tier 1 images. The NDBI dataset was composited by taking all the available images within a year and averaging them to obtain the value for that year. Equation (2) was used to extract the NDBI images for the whole of Sri Lanka over the study period.
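As a minimal sketch of the compositing step, assuming the standard normalized-difference form NDBI = (SWIR − NIR) / (SWIR + NIR) and the Landsat 8 convention of band 6 for SWIR1 and band 5 for NIR, the per-scene index and its annual mean could be computed as follows. The function names, the tiny 1 × 2 rasters, and the use of plain lists in place of raster arrays are illustrative assumptions, not the authors' code.

```python
def normalized_difference(band_a, band_b):
    """Per-pixel (a - b) / (a + b); None marks pixels where a + b is zero."""
    return [
        [(a - b) / (a + b) if (a + b) != 0 else None for a, b in zip(row_a, row_b)]
        for row_a, row_b in zip(band_a, band_b)
    ]


def annual_mean_composite(index_scenes):
    """Average several index rasters pixel by pixel, skipping None pixels."""
    rows, cols = len(index_scenes[0]), len(index_scenes[0][0])
    composite = []
    for r in range(rows):
        line = []
        for c in range(cols):
            valid = [s[r][c] for s in index_scenes if s[r][c] is not None]
            line.append(sum(valid) / len(valid) if valid else None)
        composite.append(line)
    return composite


# Two hypothetical scenes from the same year (reflectance values are made up).
# Arguments are (SWIR1, NIR), which gives NDBI under the assumed definition.
scene_1 = normalized_difference([[0.3, 0.2]], [[0.1, 0.2]])
scene_2 = normalized_difference([[0.1, 0.0]], [[0.3, 0.0]])
ndbi_annual = annual_mean_composite([scene_1, scene_2])
```

Passing (NIR, Red) instead of (SWIR1, NIR) to the same function yields NDVI, so one helper covers both indices.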

Administrative Boundaries
Adhering to the DCS definition of urban areas in Sri Lanka, MC and UC boundaries (Figure 1) were considered to examine the changes between the study outcomes and the existing urban areas.

Data Preprocessing
Data preprocessing was conducted to improve the quality of the raw dataset and make it suitable for the fusion exercise. Firstly, the downloaded NTL images were subset to the Sri Lankan area using the administrative boundary layer. Secondly, the raw POI dataset was cleaned and grouped into the categories of public places, education, health, leisure, catering, accommodation, shopping, and financial. During cleaning, duplicates and wrongly geocoded POI records were removed. A comprehensive summary of the cleaned dataset is given in Table 2. Thirdly, the MNC data, which were received as GeoTIFF files, were converted into points. Fourthly, the NDVI datasets were converted into (1-NDVI) using the raster calculator in ArcMap. Because vegetation density is usually lower in urban areas, (1-NDVI) has a positive relationship with urban areas [42]. Fifthly, population figures for 2013, 2017, and 2021 were estimated from the data obtained for the census years. Finally, all the raster data and shapefiles were projected to the WGS 1984 UTM Zone 44N coordinate system.
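Two of these steps, POI cleaning and the (1-NDVI) conversion, can be sketched in a few lines. This is only an illustration under stated assumptions: the bounding box is an approximate, hypothetical envelope around Sri Lanka used as a crude test for wrongly geocoded points, the duplicate rule (same name and same coordinates to five decimals) is invented for the example, and the study itself performed the NDVI conversion in ArcMap rather than in Python.

```python
# Approximate, illustrative envelope (min_lon, min_lat, max_lon, max_lat).
SRI_LANKA_BBOX = (79.5, 5.8, 82.0, 9.9)


def clean_pois(pois, bbox=SRI_LANKA_BBOX):
    """Drop duplicate records and points that fall outside the bounding box."""
    min_lon, min_lat, max_lon, max_lat = bbox
    seen = set()
    cleaned = []
    for name, category, lon, lat in pois:
        key = (name.strip().lower(), round(lon, 5), round(lat, 5))
        inside = min_lon <= lon <= max_lon and min_lat <= lat <= max_lat
        if inside and key not in seen:
            seen.add(key)
            cleaned.append((name, category, lon, lat))
    return cleaned


def invert_ndvi(ndvi):
    """Return the (1 - NDVI) raster, which is higher where vegetation is sparse."""
    return [[1.0 - value for value in row] for row in ndvi]


# Hypothetical records: one duplicate and one with latitude/longitude swapped.
raw_pois = [
    ("General Hospital", "health", 79.86, 6.93),
    ("general hospital ", "health", 79.86, 6.93),
    ("Lake View Hotel", "accommodation", 7.29, 80.64),
    ("Central Market", "shopping", 80.64, 7.29),
]
pois = clean_pois(raw_pois)
inverted = invert_ndvi([[0.8, 0.1]])
```

A real workflow would test points against the administrative boundary polygon instead of a rectangle; the rectangle is used here only to keep the sketch dependency-free.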

Pre-Data Fusion
Identifying urbanization in Sri Lanka required a pre-data fusion stage for two reasons: (a) to give equal weight to the five datasets used in this study, and (b) to prepare the various types of datasets for the data fusion. In this study, min-max normalization and resampling were used to put all datasets on an equal footing, and Kernel Density Analysis (KDA) was used to convert all datasets into the same raster format prior to the data fusion stage.
In the pre-data fusion stage, KDA was applied to the POI, RNC, and MNC data, and min-max normalization and resampling were applied to the NTL images and the POI, MNC, NDVI, and RNC data to obtain more accurate and reliable data fusion results.
Areas with a higher agglomeration of POI, MNC, and RNC data were identified through the KDA. For instance, the POI KDA showed higher values in urban areas than in rural areas, and lower values in the transition zones between urban and rural areas [69]. All the raster layers were normalized using the min-max normalization equation before the data fusion exercise. According to [70], resampling the data before fusion can lead to improved outcomes. Specifically, nearest neighbor resampling is advantageous because it retains the original values of the raster data and therefore minimizes the introduction of errors. For this reason, this study employed the nearest neighbor resampling technique. A 100 m spatial resolution was selected for all five fusion raster layers: the NTL raster, POI raster, MNC raster, (1-NDVI) raster, and RNC raster.
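The three pre-fusion operations can be sketched as follows. This is a minimal, dependency-free illustration and not the authors' implementation: the quartic kernel is an assumption (the kernel used in the study is not stated), the grid is assumed to start at the origin with square cells, and resampling is restricted to integer factors, e.g., a factor of 5 to go from roughly 500 m to 100 m cells.

```python
import math


def min_max_normalize(raster):
    """Rescale values to [0, 1] using (v - min) / (max - min)."""
    values = [v for row in raster for v in row]
    lo, hi = min(values), max(values)
    if hi == lo:  # constant raster: avoid division by zero
        return [[0.0 for _ in row] for row in raster]
    return [[(v - lo) / (hi - lo) for v in row] for row in raster]


def kernel_density(points, rows, cols, cell, bandwidth):
    """Quartic-kernel density evaluated at each cell centre of the grid."""
    grid = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            cx, cy = (c + 0.5) * cell, (r + 0.5) * cell
            total = 0.0
            for px, py in points:
                d2 = ((px - cx) ** 2 + (py - cy) ** 2) / bandwidth ** 2
                if d2 < 1.0:  # only points within one bandwidth contribute
                    total += (3.0 / math.pi) * (1.0 - d2) ** 2
            grid[r][c] = total / bandwidth ** 2
    return grid


def resample_nearest(raster, factor):
    """Split each cell into factor x factor copies; no new values are created."""
    return [
        [raster[r // factor][c // factor] for c in range(len(raster[0]) * factor)]
        for r in range(len(raster) * factor)
    ]


# Hypothetical inputs: a 2 x 2 raster and a single point on a 3 x 3 grid.
normalized = min_max_normalize([[2, 4], [6, 10]])
density = kernel_density([(150.0, 150.0)], rows=3, cols=3, cell=100.0, bandwidth=200.0)
finer = resample_nearest([[1, 2]], factor=2)
```

Because nearest neighbor resampling only copies existing cell values, the normalized range of each layer is unchanged by the change in resolution, which is the property the text relies on.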

Data Fusion
Data fusion is the integration of information from multiple sources through the application of advanced techniques, with the objective of deriving more precise and valuable insights than those obtainable from any single data source [71]. The advantage of data fusion is that fused data contain more detail than single-sourced data. As a result, data fusion methods have been shown to improve study accuracy and reliability [67,72]. Fusing different types of data, especially with NTL images, (a) reduces the blooming effect [73]; (b) reduces the over-saturation effect [23]; and (c) increases the accuracy [37,66]. Different studies have used different data fusion methods, such as Wavelet Transform [27,67], Geometric Mean [12], and Multi-Level Data Fusion [37].
The geometric mean was preferred over the other methods discussed, such as Wavelet Transform and Multi-Level Data Fusion, for its simplicity and accuracy. Unlike Wavelet Transform, which requires complex multi-scale image fusion processes [37], or Multi-Level Data Fusion, which involves intricate steps of sample selection, pixel resolution unification, and feature weighting [45], the geometric mean provides a straightforward yet effective way to fuse different data types. This method reduces the impact of extreme values and retains the original information of the datasets, ensuring accurate representations of urbanization patterns [12]. Consequently, the geometric mean method is not only easier to implement, but also captures with high accuracy the dynamic urbanization patterns that are crucial for planning, development, and urban sprawl identification, serving both as a real-time monitoring tool and as a predictive tool [74].
Based on the literature, this research used the 'geometric mean' data fusion method to combine the NTL, POI, MNC, RNC, and (1-NDVI) raster data into one detailed raster from which urban areas were identified. Equation (3) presents the calculation for the geometric mean:

$$GM = \sqrt[n]{x_1 \times x_2 \times \cdots \times x_n} \tag{3}$$

where $GM$ is the geometric mean of the $n$ variables, $x_1$, $x_2$, and $x_n$ represent the values of the variables, and $n$ is the number of variables.
The geometric mean is a frequently used method in data fusion that can efficiently minimize the influence of image extremum while preserving the original raster's details [18].
In this study, the datasets were assigned weights based on three criteria: the reliability of the data source, the direct usability of the data, and potential data errors, each scored as 1 (high) or 0 (low). The NTL dataset, sourced from NASA, received a weight of 1.5 due to its high reliability, direct usability from annual average maps, and low errors. The POI data from OpenStreetMap were assigned a weight of 0.5, reflecting their low reliability and potential errors, despite being directly usable. The NDVI data from the USGS were weighted at 1.0, given their credible source and low error rates, though they required the merging of layers, which could introduce errors. The RNC data, also from OpenStreetMap, received a weight of 0.5 due to similar issues of low reliability and potential data errors. The MNC dataset from local network service providers was assigned a weight of 0.25, as it required manual digitization, which introduces errors despite the credible source. Table 3 shows the reliability and weights of each of the five datasets.
In the fused index, $FI_i$ is the composite index at point $i$, $NTL_i$ is the nighttime light DN value at point $i$, $POI_i$ is the POI density raster value at point $i$, $MNC_i$ is the MNC raster value at point $i$, and $RNC_i$ and $(1 - NDVI)_i$ are the kernel density values of the RNC and NDVI at point $i$.
Under data fusion, all five datasets (NTL, POI, RNC, MNC, and NDVI) were fused into one raster and then subset for Sri Lanka for the years of 2013, 2017, and 2021 [57].
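The fusion step can be illustrated with a short, dependency-free sketch. The unweighted function follows Equation (3) directly. The weighted variant is an assumption: the text reports per-dataset weights but not how they enter the fused index, so using them as exponents in a weighted geometric mean is only one plausible reading. All function and variable names, and the pixel values, are hypothetical.

```python
import math

# Weights as reported in the text; their role as exponents is an assumption.
WEIGHTS = {"NTL": 1.5, "POI": 0.5, "MNC": 0.25, "RNC": 0.5, "1-NDVI": 1.0}


def geometric_mean(values):
    """Equation (3): the n-th root of the product of n values."""
    return math.prod(values) ** (1.0 / len(values))


def weighted_geometric_mean(pixel, weights):
    """Assumed weighted form: product of x**w, raised to 1 / sum(w)."""
    total_weight = sum(weights[name] for name in pixel)
    product = 1.0
    for name, value in pixel.items():
        product *= value ** weights[name]
    return product ** (1.0 / total_weight)


def fuse(rasters, weights=None):
    """Fuse co-registered, normalized rasters pixel by pixel."""
    names = list(rasters)
    rows, cols = len(rasters[names[0]]), len(rasters[names[0]][0])
    fused = []
    for r in range(rows):
        line = []
        for c in range(cols):
            pixel = {name: rasters[name][r][c] for name in names}
            if weights is None:
                line.append(geometric_mean(list(pixel.values())))
            else:
                line.append(weighted_geometric_mean(pixel, weights))
        fused.append(line)
    return fused


# Hypothetical 1 x 2 normalized layers; the second pixel has no POIs at all.
layers = {
    "NTL": [[0.5, 0.9]],
    "POI": [[0.5, 0.0]],
    "MNC": [[0.5, 0.8]],
    "RNC": [[0.5, 0.7]],
    "1-NDVI": [[0.5, 0.6]],
}
fused_plain = fuse(layers)
fused_weighted = fuse(layers, WEIGHTS)
```

One consequence worth noting is that a single zero-valued layer drives the fused value of that pixel to zero in both variants, so a pixel must show some signal in every dataset to be retained.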

Urban Area Extraction
The Adaptive Threshold Segmentation (ATS) algorithm from the Open-Source Computer Vision Library (OpenCV) Python library was used to extract the urban areas from the Fused Image (FI). In the ATS algorithm, there are two ways to calculate the optimal threshold values [75]: (a) cv2.ADAPTIVE_THRESH_MEAN_C, which takes the threshold as the mean value of the pixels within the given block size; and (b) cv2.ADAPTIVE_THRESH_GAUSSIAN_C, which takes the threshold as the weighted sum of the pixels within the block, with weights assigned by a Gaussian window.
In this algorithm, the block size can be set based on the area that needs to be scanned for the threshold. The larger the block size, the larger the area considered when calculating the threshold value. As this study was conducted for the whole of Sri Lanka, the block size was set to 6001 and the Gaussian method (option b) was used to obtain the threshold value. The value of 6001 was selected based on a visual inspection of the urban areas, and, accordingly, NTL, POI, MNC, RNC, and (1-NDVI) were separated by the image segmentation algorithm.
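To make the Gaussian option concrete, the sketch below re-implements the idea in plain Python on a tiny image. It is an illustration only and is not OpenCV itself: in OpenCV the corresponding call is `cv2.adaptiveThreshold(src, maxValue, cv2.ADAPTIVE_THRESH_GAUSSIAN_C, cv2.THRESH_BINARY, blockSize, C)` on an 8-bit single-channel image. The sigma rule and the replicated border are assumptions chosen to mirror OpenCV's defaults, and the constant `c` is hypothetical because the value used in the study is not reported.

```python
import math


def gaussian_kernel(block_size):
    """1-D Gaussian weights; sigma is derived from the block size (assumed rule)."""
    sigma = 0.3 * ((block_size - 1) * 0.5 - 1) + 0.8
    half = block_size // 2
    weights = [math.exp(-(i * i) / (2 * sigma * sigma)) for i in range(-half, half + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def _smooth_rows(image, kernel):
    """Convolve every row with the kernel, replicating edge pixels at the border."""
    half = len(kernel) // 2
    cols = len(image[0])
    return [
        [
            sum(k * row[min(max(c + j - half, 0), cols - 1)] for j, k in enumerate(kernel))
            for c in range(cols)
        ]
        for row in image
    ]


def _transpose(image):
    return [list(col) for col in zip(*image)]


def adaptive_threshold_gaussian(image, block_size, c=0.0, max_value=255):
    """Mark a pixel as max_value if it exceeds its Gaussian-weighted local mean minus c."""
    if block_size < 3 or block_size % 2 == 0:
        raise ValueError("block_size must be an odd number >= 3")
    kernel = gaussian_kernel(block_size)
    # Separable filter: smooth along rows, then along columns.
    local_mean = _transpose(_smooth_rows(_transpose(_smooth_rows(image, kernel)), kernel))
    return [
        [max_value if value > mean - c else 0 for value, mean in zip(row, mean_row)]
        for row, mean_row in zip(image, local_mean)
    ]


# Hypothetical fused image: one bright "urban" pixel on a uniform background.
fused_image = [[10] * 5 for _ in range(5)]
fused_image[2][2] = 200
urban_mask = adaptive_threshold_gaussian(fused_image, block_size=3, c=-5)
```

The block size controls how local the comparison is: a very large block, such as the 6001 used in the study, makes each pixel compete against a broad regional average rather than its immediate neighbours.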

Accuracy Assessment
Two methods were used to verify the accuracy of this data-fusion-based method of monitoring urbanization in Sri Lanka: Precision Assessment and Spatial Accuracy Assessment. The confusion-matrix formula used for the Accuracy, Precision, and Recall assessments is given below:

$$\text{Overall Accuracy} = \frac{TN + TP}{TN + FP + FN + TP} \tag{5}$$

where $TP$ is the number of times a predicted yes was an actual yes, $TN$ is the number of times a predicted no was an actual no, $FP$ is the number of times a predicted yes was an actual no, and $FN$ is the number of times a predicted no was an actual yes. In the Kappa Equation (6), $P_o$ denotes the observed agreement and $P_e$ denotes the expected agreement:

$$\kappa = \frac{P_o - P_e}{1 - P_e} \tag{6}$$
The spatial accuracy assessments were conducted by comparing the identified Urban Patches (UPs) with satellite imagery.
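The measures named above can be computed from the four confusion-matrix counts as sketched below. Precision and Recall use their standard definitions, which are assumed here, and the expected agreement for Kappa is taken from the row and column totals of the matrix. The function name and the counts in the example are hypothetical and are not results from this study.

```python
def classification_metrics(tp, tn, fp, fn):
    """Accuracy, precision, recall, and Cohen's kappa from confusion-matrix counts."""
    n = tp + tn + fp + fn
    accuracy = (tp + tn) / n
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # Expected agreement: chance that prediction and reference agree at random.
    p_e = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / (n * n)
    kappa = (accuracy - p_e) / (1 - p_e) if p_e != 1 else 0.0
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "kappa": kappa,
    }


# Hypothetical counts for 100 validation pixels.
metrics = classification_metrics(tp=40, tn=45, fp=5, fn=10)
```

With these counts, the overall accuracy is 0.85 while Kappa is 0.70, which shows how Kappa discounts the agreement that would be expected by chance alone.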

Definition of Urban Growth and Rate of Urban Growth
To evaluate the spatial distribution and rate of urban expansion, this study adopted two indicators: urban growth (UG) and rate of urban growth (RUG). Urban growth represents the change in the urban area within the study period. The rate of urban growth expresses that change per unit of time. Together, they were the two key indices for evaluating the spatial changes in urban expansion in Sri Lanka. In the definitions of UG and RUG, the two time indices denote year 1 and year 2, respectively.
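A minimal sketch of the two indicators follows. It assumes that UG is the difference in urban area between year 2 and year 1 and that RUG is that difference divided by the number of years, which matches the verbal description but may differ in detail from the study's equations (for instance, if growth is expressed as a percentage). The function names and the area values are hypothetical.

```python
def urban_growth(area_year_1, area_year_2):
    """Assumed UG: change in urban area between the two years (e.g., in km2)."""
    return area_year_2 - area_year_1


def rate_of_urban_growth(area_year_1, area_year_2, year_1, year_2):
    """Assumed RUG: urban growth per year over the interval."""
    if year_2 <= year_1:
        raise ValueError("year_2 must be later than year_1")
    return urban_growth(area_year_1, area_year_2) / (year_2 - year_1)


# Hypothetical urban patch that grows from 100 km2 in 2013 to 140 km2 in 2021.
ug = urban_growth(100.0, 140.0)
rug = rate_of_urban_growth(100.0, 140.0, 2013, 2021)
```

In this example the patch gains 40 km2 over eight years, so its average rate of growth is 5 km2 per year.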

Understanding Urbanization Patterns in Sri Lanka through Data Fusion
As shown in Figure 10, the urban areas identified through the big data fusion approach were divided into 61 Urban Patches (UPs).After individually screening the characteristics of the UPs, Wathupitiwala UP (UP 48) was excluded as it referred to the Wathupitiwala Export Processing Zone ('PZ') boundaries.Finally, 60 UPs were identified.
In this study, the term 'rate of urban growth (RUG)' refers to the annual average urban growth (UG) of a particular UP.Accordingly, all the above-identified UPs were categorized as large towns, medium-sized towns, and small towns.The distributions of towns under the above three categories are given in Figures 10 and 11. Figure 11 shows detailed descriptions of the selected UPs in Sri Lanka.The UPs were organized based on the area extent of the identified UPs in 2021.Accordingly, all the aforementioned UPs were categorized based on their RUG and urban area extent.This was to understand the urbanization pa erns of different UPs.  Figure 11 indicates UPs based on the status of urbanization pa erns with reference to the current definition of urbanization of the country: 'under-bounded (identified UP goes beyond the urban area identified by the CSD)', 'over-bounded' (identified UP does not go beyond the urban area identified by the CSD), and 'not-within (identified UP is not identified as an urban area by the CSD)'.
Kandy town (UP 1) and Colombo city (UP 2) had a higher RUG and expansion, which separated these two main areas from the rest. A clear pattern was also observed in the RUG and expansion identified in Gampaha town (UP 23) and Rathnapura town (UP 24). Balanced UPs had relatively small variations in both RUG and urban area extent. Table 4 shows the estimated accuracy and kappa values of the methodology adopted. According to these values, the data fusion approach developed to understand the urbanization patterns in Sri Lanka demonstrated high efficiency and accuracy across the evaluated years. The method consistently produced strong accuracy metrics and moderate to substantial kappa values, indicating a reliable classification performance for urban areas. Figure 11 categorizes the identified UPs based on the rate of urban growth (RUG) and urban area extent.
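For readers unfamiliar with the two metrics, the following sketch shows how overall accuracy and Cohen's kappa are commonly derived from a confusion matrix. The counts are invented for illustration and are not the values reported in Table 4; the function name is hypothetical.

```python
def accuracy_and_kappa(matrix):
    """Overall accuracy and Cohen's kappa from a square confusion matrix.

    matrix[i][j] is the number of samples whose reference class is i
    and whose predicted class is j.
    """
    n_classes = len(matrix)
    total = sum(sum(row) for row in matrix)

    # Observed agreement: share of samples on the diagonal.
    observed = sum(matrix[i][i] for i in range(n_classes)) / total

    # Chance agreement: product of row and column marginals per class.
    expected = sum(
        sum(matrix[i]) * sum(row[i] for row in matrix)
        for i in range(n_classes)
    ) / total ** 2

    if expected == 1:
        # Kappa is undefined when chance agreement is already perfect.
        return observed, float("nan")
    return observed, (observed - expected) / (1 - expected)


# Hypothetical urban / non-urban validation counts (not from the study).
confusion = [
    [40, 10],  # reference urban: 40 predicted urban, 10 predicted non-urban
    [5, 45],   # reference non-urban: 5 predicted urban, 45 predicted non-urban
]
accuracy, kappa = accuracy_and_kappa(confusion)
print(f"accuracy = {accuracy:.2f}, kappa = {kappa:.2f}")
```

On the widely used Landis and Koch scale, kappa values of 0.41 to 0.60 are read as moderate agreement and 0.61 to 0.80 as substantial agreement.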
Figure 12 visualizes all the identified UPs according to the category they belong to: large towns (L), medium towns (M), and small towns (S). Further, it displays 15 indicators related to each UP. They are: (1) name of the main town center; (2) UP code with reference to the codes given in Figure 10; ... (10) MC/UC name, which was included to understand the related administrative boundaries of each UP, as MC/UC areas are known to be urban areas as per the DCS definitions; (11) MC/UC area, which showed the MC/UC area extent relevant to the identified UP; (12) status, which categorized the identified UPs into three statuses: (a) 'under-bounded', where the identified UP went beyond the urban area identified by the DCS, (b) 'over-bounded', where the identified UP did not go beyond the urban area identified by the DCS, and (c) 'not-within', where the identified UP was not identified as an urban area by the DCS; (13) cities and towns in 2013, which showed the names of towns and cities existing within the UP identified in 2013; (14) cities and towns added to the UP by 2017, which showed the names of cities and towns newly added, due to the UP expansion, to the same UP identified in 2013; and (15) cities and towns added to the UP by 2021, which showed the names of cities and towns newly added, due to the UP expansion, to the same UP identified in 2021. None of the UPs identified in 2013 were observed to be shrinking. The letters "L", "M", and "S" in Figure 13 refer to the "large", "medium", and "small" town categories, as identified in Figure 12. In addition to these categories, the distribution of expressways and A, B, C, and minor roads is visualized in the map. Detailed explanations of the subclasses of the A, B, C, and minor road categories can be found at http://www.rda.gov.lk/source/rda_roads.htm (accessed on 18 April 2024).
Only two UPs in Sri Lanka were identified as belonging to the category of large towns: Kandy (UP 1) and Colombo (UP 2). These can be considered as renowned urban metro regions in Sri Lanka. Even though both UPs had a larger urban area extent and higher RUG, the analysis indicated that the RUG slowed down across the two considered time periods, from 2013 to 2017 (Period 1) and from 2017 to 2021 (Period 2). Further, both UPs have expanded well beyond their current administrative boundaries (the MC and UC areas), integrating the surrounding urban areas to form urban conurbations. By 2021, the Colombo UP and Kandy UP accounted for 15.51% and 35.65% of the country's total urban area, respectively. Therefore, around 57% of the country's urban area was located inside these two major UPs.
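The comparison of RUG between the two periods can be sketched as follows. The sketch assumes that RUG is the annual average change in urban area extent over a period, in line with the definition given earlier in this section; the area values are made up and do not correspond to any UP in the study.

```python
def rate_of_urban_growth(area_start_km2, area_end_km2, year_start, year_end):
    """Annual average urban growth (km2 per year) between two years."""
    if year_end <= year_start:
        raise ValueError("year_end must be later than year_start")
    return (area_end_km2 - area_start_km2) / (year_end - year_start)


# Hypothetical urban area extents (km2) of a single UP; not study values.
extent_km2 = {2013: 100.0, 2017: 140.0, 2021: 160.0}

rug_period_1 = rate_of_urban_growth(extent_km2[2013], extent_km2[2017], 2013, 2017)
rug_period_2 = rate_of_urban_growth(extent_km2[2017], extent_km2[2021], 2017, 2021)

# A UP can keep expanding while its RUG slows down from one period to the next.
slowed_down = rug_period_2 < rug_period_1
print(rug_period_1, rug_period_2, slowed_down)
```

In this toy case the UP still gains area in Period 2, but at half the annual rate of Period 1, which is the kind of slowdown described above.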
These medium-sized towns have the characteristic of being formed around a central urban core. Unlike the large Colombo and Kandy UPs, and with the exception of the Negombo, Kurunegala, Badulla, Nuwara Eliya, and Gampaha medium-sized UPs, the balanced towns experienced higher urban growth during Period 1 (2013-2017) and Period 2 (2017-2021). Most importantly, 11 of the 21 medium-sized UPs (Dambulla, Welimada, Polonnaruwa, Nuwara Eliya, Vavuniya, Mahiyanganaya, Rikillagaskada, Higurakgoda, Mawanella, Galewela, and Ibbagamuwa) were not even located within the official urban boundaries identified by the DCS; under the DCS definition, they were treated as non-urban areas. This shows that the DCS definitions have not accounted for the political and socio-economic changes that took place over the considered time period. For instance, after the end of the civil war, Jaffna and Vavuniya experienced rapid urban growth, which could explain why the fusion exercise identified them as urban areas.

Small Towns
UPs with a small RUG variation and a small urban area extent were identified as small towns. These UPs had an urban area of less than 20 km². Out of the 60 UPs, 37 were categorized as small towns. Figure 15 shows all the small towns in Sri Lanka identified through the data fusion approach. The majority of the small towns identified in this study do not fall within the current definition of urban areas presented by the DCS.

Assessment of the Accuracy of the Findings
To assess the accuracy of the identified urbanization patterns, a reliable reference dataset is needed against which the urban areas identified through this data fusion approach can be compared and contrasted with already established urban areas. In the absence of such a dataset, this study adopted a limited validation, which employed satellite imagery to visually assess the accuracy of the urban areas identified through data fusion. Figure 16 shows the Colombo and Kandy urban areas in 2021.
When examining Colombo (UP 2), a similar pattern was revealed between the urban areas identified by the data fusion model and those observable in the satellite images. This alignment suggested that the data fusion approach was able to accurately capture not only the urban sprawl of Colombo but also its urban conurbation, showcasing its effectiveness as a tool for urban area determination. The analysis of Kandy (UP 1) was challenging due to its diverse topography. Although the data fusion approach occasionally extended into less inhabitable hilly areas, the identified urban area closely resembled the actual urban footprint captured in the satellite overlays and NTL images. This finding was significant, as it demonstrated the data fusion approach's capacity to adapt and provide meaningful insights into urban dynamics, even in geographically complex areas.
The statistical analysis further highlights the effectiveness of the data fusion approach and its potential for capturing the dynamic urbanization patterns in low- and middle-income countries. For instance, the Colombo UP (UP 2) was 232.12, 604.76, and 1072.62 square kilometers larger than the urban area extent identified by the DCS in 2013, 2017, and 2021, respectively. Similarly, the Kandy UP (UP 1) was 183.58, 331.82, and 447.03 square kilometers larger than the urban area extent identified by the DCS in the same years. These figures highlight the suitability of the data fusion approach in delineating urban areas to obtain a broader understanding of the dynamic urban growth of a country. Table 5 shows the underestimation/overestimation of urban areas identified through data fusion. Even though the data fusion approach showed an acceptable level of accuracy, there are further areas that need to be improved. Yet, this data fusion approach has the relative advantages of quick adaptation, usability in situations where updated census data are scarce (especially in low- and middle-income countries), and the ability to reveal the urban lifestyles of citizens that extend beyond formally demarcated administrative boundaries. Depending on the context of use and data availability, this method is useful for understanding the dynamic nature of urbanization.

Discussion
Urbanization is a complex process that occurs at different scales, from the local to the regional level [23,75]. Understanding the urbanization process helps authorities to formulate appropriate urban development policies and infrastructure development projects. A single data source cannot capture the full extent of the urbanization process and might not provide adequate detail to understand the process at the local level. Urban areas are mostly delineated based on building footprints and density, but not on urban facilities such as mobile network coverage or accessibility to facilities such as schools, cinemas, and hospitals. To overcome these drawbacks, this study proposed a data fusion approach using openly available data. The study identified 2 UPs as large towns, 21 UPs as medium-sized towns, and 37 UPs as small towns. Accordingly, the majority of the identified UPs were small towns.
When it comes to the growth of all UPs, the Colombo and Kandy UPs had the largest urban area extents from 2013 to 2021. They appear to have grown over the last few decades into urban metro regions, forming conurbations by assimilating adjacent townships and localities. Colombo port was the catalyst for the city's rapid expansion as the commercial capital of the country. Unlike the medium-sized towns of Anuradhapura, Polonnaruwa, and Kurunegala, which are also historical cities, Kandy continued to grow even after its socio-political importance ceased, because of the continued prominence that it received under British rule of the island.
Except for Welimada, Nuwara Eliya, Higurakgoda, Rikillagaskada, Galewela, Kegalle, and Ibbagamuwa, all the other medium-sized towns (the Dambulla, Bandarawela, Anuradhapura, Polonnaruwa, Kurunegala, Vavuniya, Badulla, Mahiyanganaya, Negombo, Galle, Mawanella, Hatton, Jaffna, and Gampaha UPs) have had an urban footprint since 2013. Among them, Dambulla (overall RUG = 23.467), Bandarawela (overall RUG = 18.44), and Anuradhapura (overall RUG = 11.00) recorded the highest RUGs. Anuradhapura was an ancient capital city of the island that lost its prominence when the monarchy shifted to other locations. From the early 1950s, however, it has continued to grow as a famous pilgrimage destination with the government's implementation of the Anuradhapura Sacred Area Plan. Dambulla is predominantly a commercial center with Sri Lanka's largest wholesale produce market. It is also famous for the historical Royal Cave Temple, which is a popular tourist destination.
When compared with the study findings, it seems that the current delineation of urban areas does not reflect an adequate understanding of the real ground dynamics. Thus, the urban extents of these areas may need to be revisited to guide the direction of development and to provide the needed infrastructure and basic services, ensuring environments conducive to human habitation, business promotion, and proper urban functions.
Small towns can be regarded as being in the infant stage of urbanity and might grow into metropolitan regions of the country in the future. However, the life cycle of a small town will change based on the internal and external political and socio-economic factors that might emerge in and around the town. For instance, the Kandy UP, which was identified as a large town, emerged as a riverine small town. The urban character of 'small towns' is not exclusive to Sri Lanka; it is evident in numerous other countries, such as Bhutan (Paro), Malta (Mdina), Luxembourg (Vianden), and Andorra (Ordino).
The positive RUGs in Trincomalee, Dambulla, Sigiriya, Kurunegala, Mawanella, and Kegalle can be attributed to the development potential of the proposed Colombo-Trincomalee economic corridor in Sri Lanka (NPP, 2019). This corridor is aimed at enhancing economic activities and connectivity, which will lead to improved infrastructure and transportation networks and attract businesses and industries along the route.
This study indicates that, in Sri Lanka, the urbanization process, pattern, and rate are governed by natural and behavioral factors as well as socio-political and economic forces. This phenomenon may be common to many other nation states of the same category and with similar conditions supportive of urbanization. Accordingly, the understanding of urbanization cannot be confined to instantaneous and periodic statistical information and static boundaries. New approaches that capture the ground realities and dynamics of urbanization are therefore required to serve planning and development purposes. At the same time, the trends in urbanization patterns demand robust planning and development policies, guiding frameworks, and strategies to circumvent adverse impacts and ensure sustainable urban growth in the future.
The findings of this case study of Sri Lanka revealed that the proposed approach is valuable for determining urbanization patterns adequately and efficiently, despite some limitations, which are discussed below. The approach could also be adopted in other low- and middle-income country contexts, where it is especially beneficial because conventional census data may be limited or outdated. The approach's high efficiency and thorough data analysis make it an effective tool for urban planning and policy development. The significance of this issue extends beyond Sri Lanka, as several low- and middle-income countries encounter comparable difficulties in monitoring and controlling urbanization. Implementing this approach can improve the comprehension of growth patterns, facilitating better decision making and promoting sustainable solutions for urban development.
However, this study also showed a few technical limitations. Some obvious urban areas, such as Kalutara, Beruwala/Aluthgama, Minuwangoda, Batticaloa, Kaththankudy, Chilaw, Puttalam, Kilinochchi, Medawachchiya, Chavakachcheri, and Point Pedro, seem to have gone missing, while a few less obvious smaller urban areas emerged. For instance, the Kalutara area shows a high NTL and POI density, yet the UP coverage there is limited. This limited coverage likely resulted from the 500 m resolution of the NTL data, indicating a resolution constraint in the fused dataset. In the cases of both Batticaloa and Kaththankudy, small urban patches were identified by the data fusion approach in areas that had a higher concentration of NTL and POI. However, these results were not prominent, as the study was carried out at the national scale; their omission was therefore most likely due to the scale at which the study worked. Fusing additional data sources, such as building heights, and giving them different weights might help to bridge the gap between the real ground scenarios and the outputs of the data fusion methodology.
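The idea of adding a further layer with its own weight can be illustrated with a minimal weighted-average sketch. It assumes that every layer has already been normalized to the range 0 to 1 on a common grid; the layer names, values, and weights below are hypothetical and are not the weights used in the study.

```python
def fuse_layers(layers, weights):
    """Weighted average of per-cell scores across several data layers.

    layers:  dict mapping layer name -> list of normalized values (0 to 1),
             one value per grid cell, all lists of equal length.
    weights: dict mapping layer name -> non-negative weight.
    """
    names = list(layers)
    n_cells = len(layers[names[0]])
    if any(len(layers[name]) != n_cells for name in names):
        raise ValueError("all layers must cover the same grid cells")

    total_weight = sum(weights[name] for name in names)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")

    return [
        sum(weights[name] * layers[name][cell] for name in names) / total_weight
        for cell in range(n_cells)
    ]


# Two grid cells with made-up normalized scores per layer.
layers = {
    "ntl": [0.9, 0.1],
    "poi": [0.8, 0.0],
    "building_height": [0.5, 0.2],  # hypothetical additional layer
}
weights = {"ntl": 0.5, "poi": 0.3, "building_height": 0.2}  # illustrative only

fused = fuse_layers(layers, weights)
print([round(score, 2) for score in fused])
```

Raising the weight of a layer that is more reliable at fine scales would let small but dense coastal towns contribute more strongly to the fused score, which is the intent of the suggestion above.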

Conclusions
The study reported in this paper aimed to develop an effective approach to determining urbanization patterns through a novel big data fusion approach using multiple data sources that provide reliable information on urban activities, particularly in the context of low- and middle-income countries, where urbanization challenges, e.g., development control and disaster risks, are higher.
The study findings, considering the testbed case of Sri Lanka, challenged the current definitions used to delineate urban areas and the methods used to understand urbanization patterns. For instance, relative to the urban areas demarcated by the DCS, 21.6% of the identified UPs were under-bounded, 15% were over-bounded, and 63.3% were not within the urban areas defined by the DCS. Because the level of urbanization is used to understand the development status of a country, an inaccurate or inappropriate understanding of it would misguide decisions related to the country's development. Accordingly, this study emphasizes the need to move towards a data-driven approach to delineate urban areas and understand the dynamic nature of urbanization processes. In particular, the extant definitions used to identify urban areas have failed to recognize the role of small towns in shaping the future urbanization pattern of a country.
The study used different data sources to trace different functional dimensions of cities, such as mobile 4G network coverage data, crowdsourced data, and POI data. Cities are evolving and expanding, adding new layers as they do so. The data sources used to understand the urban extents and the present urbanization process may therefore need to change in the future, and such sources must also be accessible without constraints. Yet, the approach adopted in this study, based on big data fusion using openly available data, is a suitable way to understand the urbanization pattern of a country. While the findings primarily concern Sri Lanka, the method is also transferable to many other low- and middle-income country contexts. However, adopting the approach proposed in this paper might need careful tailoring to other national circumstances, e.g., considering planning regulations, governance system differences, data availability, and so on.
This study also has a few limitations. First, the study segmented images using the accurate LOT algorithm from the OpenCV library. However, it has been indicated that image segmentation using a Fully Convolutional Neural Network (FCNN) would have produced significantly more accurate results for identifying urban areas. Second, the POI dataset was used with equal weights for each category, whereas allocating higher weights to locations that are more closely associated with urban areas could improve the accuracy of identifying urban areas. Third, only spatial verification and precision verification could be conducted in this study; ground verification would have significantly strengthened the verification results. Fourth, the method struggled to clearly distinguish the urban pattern in the coastal belt, possibly due to the national scale of the study. For instance, coastal towns like Panadura and Kalutara were not effectively captured using this methodology. Finally, there was uncertainty associated with the boundaries, as these may change with changing datasets and the accuracy levels of the fusion methodology. Our future research will concentrate on addressing these limitations and fine-tuning the approach for applicability to other low-, middle-, and high-income country contexts.
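To make the first limitation more concrete, the sketch below illustrates threshold-based segmentation in general. The LOT algorithm is not described in this section, so the sketch does not attempt to reproduce it. It uses Otsu's method instead, a different and well-known global thresholding technique (available in OpenCV through the `cv2.THRESH_OTSU` flag), written here in plain Python so that it runs without dependencies. The input values are made up.

```python
def otsu_threshold(values, levels=256):
    """Return the level t that maximizes the between-class variance.

    values: iterable of integers in [0, levels). Cells with a value
    greater than t form the foreground class (candidate urban cells).
    """
    values = list(values)
    hist = [0] * levels
    for v in values:
        hist[v] += 1

    total = len(values)
    sum_all = sum(level * count for level, count in enumerate(hist))

    best_t, best_var = 0, -1.0
    w0 = 0    # number of cells with value <= t
    sum0 = 0  # sum of the values <= t
    for t in range(levels):
        w0 += hist[t]
        sum0 += t * hist[t]
        w1 = total - w0
        if w0 == 0:
            continue  # background class still empty
        if w1 == 0:
            break     # foreground class empty from here on
        mean0 = sum0 / w0
        mean1 = (sum_all - sum0) / w1
        # Proportional to the between-class variance (constant factor dropped).
        between_var = w0 * w1 * (mean0 - mean1) ** 2
        if between_var > best_var:
            best_t, best_var = t, between_var
    return best_t


# Toy 8-bit fused scores for ten grid cells (illustrative values only).
scores = [12, 15, 10, 18, 14, 200, 210, 190, 205, 220]
threshold = otsu_threshold(scores)
urban_mask = [s > threshold for s in scores]
print(threshold, urban_mask)
```

A single global threshold of this kind treats every cell independently, which is one reason a learned segmentation model such as an FCNN may capture urban form more accurately.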
(a) population based; (b) administrative boundary based; (c) land use based; and (d) multicriteria based.

Figure 1. Officially identified urban local government areas in Sri Lanka.
The methodological framework consists of six main phases: (a) data acquisition; (b) data pre-processing; (c) pre-data fusion; (d) data fusion; (e) urban area extraction; and (f) accuracy assessment. The detailed methodological framework is given in Figure 2.

Figure 11. Chart of rate of urban growth (RUG) and the urban area extent.

Figure 12. Maps of the 60 identified UPs in Sri Lanka with UGs and RUGs.

Figure 13. UPs identified as large towns in Sri Lanka.

Figure 14. UPs identified as medium-sized towns in Sri Lanka.

Figure 15. UPs identified as small towns in Sri Lanka.
Further, Figures 17 and 18 show the Colombo and Kandy urban areas in 2021 and a comparison of the fusion results with the ground realities in the cases of Batticaloa and Kaththankudy, respectively.

Figure 18. Comparison of fusion results with the real ground realities: cases of Batticaloa and Kaththankudy.

Table 1. Datasets used to examine Sri Lanka's urbanization.

Table 2. A comprehensive summary of the cleaned POI dataset.

Table 3. Reliability and weights of each of the five datasets.

Table 4. Estimated accuracy and kappa values.

Table 5. Underestimation/overestimation of urban areas identified through data fusion.